# Copyright (c) 2010-2012 OpenStack Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import itertools
from collections import defaultdict
import unittest
from mock import patch
from swift.proxy.controllers.base import headers_to_container_info, \
headers_to_account_info, headers_to_object_info, get_container_info, \
get_container_memcache_key, get_account_info, get_account_memcache_key, \
get_object_env_key, get_info, get_object_info, \
Controller, GetOrHeadHandler, _set_info_cache, _set_object_info_cache, \
bytes_to_skip
from swift.common.swob import Request, HTTPException, RESPONSE_REASONS
from swift.common import exceptions
from swift.common.utils import split_path
from swift.common.header_key_dict import HeaderKeyDict
from swift.common.http import is_success
from swift.common.storage_policy import StoragePolicy, POLICIES
from test.unit import fake_http_connect, FakeRing, FakeMemcache
from swift.proxy import server as proxy_server
from swift.common.request_helpers import get_sys_meta_prefix
from test.unit import patch_policies
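# Test doubles used below: FakeResponse and its subclasses carry just the
# headers that the proxy's info-gathering helpers read for each entity type.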
class FakeResponse(object):
base_headers = {}
def __init__(self, status_int=200, headers=None, body=''):
self.status_int = status_int
self._headers = headers or {}
self.body = body
@property
def headers(self):
if is_success(self.status_int):
self._headers.update(self.base_headers)
return self._headers
class AccountResponse(FakeResponse):
base_headers = {
'x-account-container-count': 333,
'x-account-object-count': 1000,
'x-account-bytes-used': 6666,
}
class ContainerResponse(FakeResponse):
base_headers = {
'x-container-object-count': 1000,
'x-container-bytes-used': 6666,
}
class ObjectResponse(FakeResponse):
base_headers = {
'content-length': 5555,
'content-type': 'text/plain'
}
class DynamicResponseFactory(object):
def __init__(self, *statuses):
if statuses:
self.statuses = iter(statuses)
else:
self.statuses = itertools.repeat(200)
self.stats = defaultdict(int)
response_type = {
'obj': ObjectResponse,
'container': ContainerResponse,
'account': AccountResponse,
}
def _get_response(self, type_):
self.stats[type_] += 1
class_ = self.response_type[type_]
return class_(next(self.statuses))
def get_response(self, environ):
(version, account, container, obj) = split_path(
environ['PATH_INFO'], 2, 4, True)
if obj:
resp = self._get_response('obj')
elif container:
resp = self._get_response('container')
else:
resp = self._get_response('account')
resp.account = account
resp.container = container
resp.obj = obj
return resp
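# Minimal stand-in for the proxy WSGI app: it records each request's
# swift.source and primes the env info cache the same way GETorHEAD_base
# does, so get_info and friends behave as they would in a real pipeline.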
class FakeApp(object):
recheck_container_existence = 30
recheck_account_existence = 30
def __init__(self, response_factory=None, statuses=None):
self.responses = response_factory or \
DynamicResponseFactory(*statuses or [])
self.sources = []
def __call__(self, environ, start_response):
self.sources.append(environ.get('swift.source'))
response = self.responses.get_response(environ)
reason = RESPONSE_REASONS[response.status_int][0]
start_response('%d %s' % (response.status_int, reason),
[(k, v) for k, v in response.headers.items()])
# It's a bit strange, but the get_info cache stuff relies on the
# app setting some keys in the environment as it makes requests
# (in particular GETorHEAD_base) - so our fake does the same
_set_info_cache(self, environ, response.account,
response.container, response)
if response.obj:
_set_object_info_cache(self, environ, response.account,
response.container, response.obj,
response)
return iter(response.body)
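# FakeMemcache variant that can be pre-populated via keyword arguments or
# forced to return a fixed stub from every get() call.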
class FakeCache(FakeMemcache):
def __init__(self, stub=None, **pre_cached):
super(FakeCache, self).__init__()
if pre_cached:
self.store.update(pre_cached)
self.stub = stub
def get(self, key):
return self.stub or self.store.get(key)
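# The tests below exercise the info-gathering helpers and Controller base
# class from swift.proxy.controllers.base against the fakes defined above.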
@patch_policies([StoragePolicy(0, 'zero', True, object_ring=FakeRing())])
class TestFuncs(unittest.TestCase):
def setUp(self):
self.app = proxy_server.Application(None, FakeMemcache(),
account_ring=FakeRing(),
container_ring=FakeRing())
def test_GETorHEAD_base(self):
base = Controller(self.app)
req = Request.blank('/v1/a/c/o/with/slashes')
ring = FakeRing()
nodes = list(ring.get_part_nodes(0)) + list(ring.get_more_nodes(0))
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'object', iter(nodes), 'part',
'/a/c/o/with/slashes')
self.assertTrue('swift.object/a/c/o/with/slashes' in resp.environ)
self.assertEqual(
resp.environ['swift.object/a/c/o/with/slashes']['status'], 200)
req = Request.blank('/v1/a/c/o')
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'object', iter(nodes), 'part',
'/a/c/o')
self.assertTrue('swift.object/a/c/o' in resp.environ)
self.assertEqual(resp.environ['swift.object/a/c/o']['status'], 200)
req = Request.blank('/v1/a/c')
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'container', iter(nodes), 'part',
'/a/c')
self.assertTrue('swift.container/a/c' in resp.environ)
self.assertEqual(resp.environ['swift.container/a/c']['status'], 200)
req = Request.blank('/v1/a')
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'account', iter(nodes), 'part',
'/a')
self.assertTrue('swift.account/a' in resp.environ)
self.assertEqual(resp.environ['swift.account/a']['status'], 200)
# Run the above tests again, but this time with concurrent_reads
# turned on
policy = next(iter(POLICIES))
concurrent_get_threads = policy.object_ring.replica_count
for concurrency_timeout in (0, 2):
self.app.concurrency_timeout = concurrency_timeout
req = Request.blank('/v1/a/c/o/with/slashes')
# NOTE: We are using slow_connect of fake_http_connect because using a
# concurrency timeout of 0 with a mocked connection is a little too fast
# for eventlet. Real network I/O would make this fine, but the mock is
# too instantaneous.
with patch('swift.proxy.controllers.base.http_connect',
fake_http_connect(200, slow_connect=True)):
resp = base.GETorHEAD_base(
req, 'object', iter(nodes), 'part', '/a/c/o/with/slashes',
concurrency=concurrent_get_threads)
self.assertTrue('swift.object/a/c/o/with/slashes' in resp.environ)
self.assertEqual(
resp.environ['swift.object/a/c/o/with/slashes']['status'], 200)
req = Request.blank('/v1/a/c/o')
with patch('swift.proxy.controllers.base.http_connect',
fake_http_connect(200, slow_connect=True)):
resp = base.GETorHEAD_base(
req, 'object', iter(nodes), 'part', '/a/c/o',
concurrency=concurrent_get_threads)
self.assertTrue('swift.object/a/c/o' in resp.environ)
self.assertEqual(resp.environ['swift.object/a/c/o']['status'], 200)
req = Request.blank('/v1/a/c')
with patch('swift.proxy.controllers.base.http_connect',
fake_http_connect(200, slow_connect=True)):
resp = base.GETorHEAD_base(
req, 'container', iter(nodes), 'part', '/a/c',
concurrency=concurrent_get_threads)
self.assertTrue('swift.container/a/c' in resp.environ)
self.assertEqual(resp.environ['swift.container/a/c']['status'],
200)
req = Request.blank('/v1/a')
with patch('swift.proxy.controllers.base.http_connect',
fake_http_connect(200, slow_connect=True)):
resp = base.GETorHEAD_base(
req, 'account', iter(nodes), 'part', '/a',
concurrency=concurrent_get_threads)
self.assertTrue('swift.account/a' in resp.environ)
self.assertEqual(resp.environ['swift.account/a']['status'], 200)
def test_get_info(self):
app = FakeApp()
# Do a non cached call to account
env = {}
info_a = get_info(app, env, 'a')
# Check that you got proper info
self.assertEqual(info_a['status'], 200)
self.assertEqual(info_a['bytes'], 6666)
self.assertEqual(info_a['total_object_count'], 1000)
# Make sure the env cache is set
self.assertEqual(env.get('swift.account/a'), info_a)
# Make sure the app was called
self.assertEqual(app.responses.stats['account'], 1)
# Do an env cached call to account
info_a = get_info(app, env, 'a')
# Check that you got proper info
self.assertEqual(info_a['status'], 200)
self.assertEqual(info_a['bytes'], 6666)
self.assertEqual(info_a['total_object_count'], 1000)
# Make sure the env cache is set
self.assertEqual(env.get('swift.account/a'), info_a)
# Make sure the app was NOT called AGAIN
self.assertEqual(app.responses.stats['account'], 1)
# This time do env cached call to account and non cached to container
info_c = get_info(app, env, 'a', 'c')
# Check that you got proper info
self.assertEqual(info_c['status'], 200)
self.assertEqual(info_c['bytes'], 6666)
self.assertEqual(info_c['object_count'], 1000)
# Make sure the env cache is set
self.assertEqual(env.get('swift.account/a'), info_a)
self.assertEqual(env.get('swift.container/a/c'), info_c)
# Make sure the app was called for container
self.assertEqual(app.responses.stats['container'], 1)
# This time do a non cached call to account then a non cached call to
# container
app = FakeApp()
env = {} # abandon previous call to env
info_c = get_info(app, env, 'a', 'c')
# Check that you got proper info
self.assertEqual(info_c['status'], 200)
self.assertEqual(info_c['bytes'], 6666)
self.assertEqual(info_c['object_count'], 1000)
# Make sure the env cache is set
self.assertEqual(env.get('swift.account/a'), info_a)
self.assertEqual(env.get('swift.container/a/c'), info_c)
# check app calls both account and container
self.assertEqual(app.responses.stats['account'], 1)
self.assertEqual(app.responses.stats['container'], 1)
# This time do an env cached call to container while account is not
# cached
del(env['swift.account/a'])
info_c = get_info(app, env, 'a', 'c')
# Check that you got proper info
self.assertEqual(info_a['status'], 200)
self.assertEqual(info_c['bytes'], 6666)
self.assertEqual(info_c['object_count'], 1000)
# Make sure the env cache is set and account still not cached
self.assertEqual(env.get('swift.container/a/c'), info_c)
# no additional calls were made
self.assertEqual(app.responses.stats['account'], 1)
self.assertEqual(app.responses.stats['container'], 1)
# Do a non cached call to account not found with ret_not_found
app = FakeApp(statuses=(404,))
env = {}
info_a = get_info(app, env, 'a', ret_not_found=True)
# Check that you got proper info
self.assertEqual(info_a['status'], 404)
self.assertEqual(info_a['bytes'], None)
self.assertEqual(info_a['total_object_count'], None)
# Make sure the env cache is set
self.assertEqual(env.get('swift.account/a'), info_a)
# and account was called
self.assertEqual(app.responses.stats['account'], 1)
# Do a cached call to account not found with ret_not_found
info_a = get_info(app, env, 'a', ret_not_found=True)
# Check that you got proper info
self.assertEqual(info_a['status'], 404)
self.assertEqual(info_a['bytes'], None)
self.assertEqual(info_a['total_object_count'], None)
# Make sure the env cache is set
self.assertEqual(env.get('swift.account/a'), info_a)
# and account was NOT called AGAIN
self.assertEqual(app.responses.stats['account'], 1)
# Do a non cached call to account not found without ret_not_found
app = FakeApp(statuses=(404,))
env = {}
info_a = get_info(app, env, 'a')
# Check that you got proper info
self.assertEqual(info_a, None)
self.assertEqual(env['swift.account/a']['status'], 404)
# and account was called
self.assertEqual(app.responses.stats['account'], 1)
# Do a cached call to account not found without ret_not_found
info_a = get_info(None, env, 'a')
# Check that you got proper info
self.assertEqual(info_a, None)
self.assertEqual(env['swift.account/a']['status'], 404)
# and account was NOT called AGAIN
self.assertEqual(app.responses.stats['account'], 1)
def test_get_container_info_swift_source(self):
app = FakeApp()
req = Request.blank("/v1/a/c", environ={'swift.cache': FakeCache()})
get_container_info(req.environ, app, swift_source='MC')
self.assertEqual(app.sources, ['GET_INFO', 'MC'])
def test_get_object_info_swift_source(self):
app = FakeApp()
req = Request.blank("/v1/a/c/o",
environ={'swift.cache': FakeCache()})
get_object_info(req.environ, app, swift_source='LU')
self.assertEqual(app.sources, ['LU'])
def test_get_container_info_no_cache(self):
req = Request.blank("/v1/AUTH_account/cont",
environ={'swift.cache': FakeCache({})})
resp = get_container_info(req.environ, FakeApp())
self.assertEqual(resp['storage_policy'], '0')
self.assertEqual(resp['bytes'], 6666)
self.assertEqual(resp['object_count'], 1000)
def test_get_container_info_no_account(self):
responses = DynamicResponseFactory(404, 200)
app = FakeApp(responses)
req = Request.blank("/v1/AUTH_does_not_exist/cont")
info = get_container_info(req.environ, app)
self.assertEqual(info['status'], 0)
def test_get_container_info_no_auto_account(self):
responses = DynamicResponseFactory(404, 200)
app = FakeApp(responses)
req = Request.blank("/v1/.system_account/cont")
info = get_container_info(req.environ, app)
self.assertEqual(info['status'], 200)
self.assertEqual(info['bytes'], 6666)
self.assertEqual(info['object_count'], 1000)
def test_get_container_info_cache(self):
cache_stub = {
'status': 404, 'bytes': 3333, 'object_count': 10,
'versions': u"\u1F4A9"}
req = Request.blank("/v1/account/cont",
environ={'swift.cache': FakeCache(cache_stub)})
resp = get_container_info(req.environ, FakeApp())
self.assertEqual(resp['storage_policy'], '0')
self.assertEqual(resp['bytes'], 3333)
self.assertEqual(resp['object_count'], 10)
self.assertEqual(resp['status'], 404)
self.assertEqual(resp['versions'], "\xe1\xbd\x8a\x39")
def test_get_container_info_env(self):
cache_key = get_container_memcache_key("account", "cont")
env_key = 'swift.%s' % cache_key
req = Request.blank("/v1/account/cont",
environ={env_key: {'bytes': 3867},
'swift.cache': FakeCache({})})
resp = get_container_info(req.environ, 'xxx')
self.assertEqual(resp['bytes'], 3867)
def test_get_account_info_swift_source(self):
app = FakeApp()
req = Request.blank("/v1/a", environ={'swift.cache': FakeCache()})
get_account_info(req.environ, app, swift_source='MC')
self.assertEqual(app.sources, ['MC'])
def test_get_account_info_no_cache(self):
app = FakeApp()
req = Request.blank("/v1/AUTH_account",
environ={'swift.cache': FakeCache({})})
resp = get_account_info(req.environ, app)
self.assertEqual(resp['bytes'], 6666)
self.assertEqual(resp['total_object_count'], 1000)
def test_get_account_info_cache(self):
# The original test that we prefer to preserve
cached = {'status': 404,
'bytes': 3333,
'total_object_count': 10}
req = Request.blank("/v1/account/cont",
environ={'swift.cache': FakeCache(cached)})
resp = get_account_info(req.environ, FakeApp())
self.assertEqual(resp['bytes'], 3333)
self.assertEqual(resp['total_object_count'], 10)
self.assertEqual(resp['status'], 404)
# Here is a more realistic test
cached = {'status': 404,
'bytes': '3333',
'container_count': '234',
'total_object_count': '10',
'meta': {}}
req = Request.blank("/v1/account/cont",
environ={'swift.cache': FakeCache(cached)})
resp = get_account_info(req.environ, FakeApp())
self.assertEqual(resp['status'], 404)
self.assertEqual(resp['bytes'], '3333')
self.assertEqual(resp['container_count'], 234)
self.assertEqual(resp['meta'], {})
self.assertEqual(resp['total_object_count'], '10')
def test_get_account_info_env(self):
cache_key = get_account_memcache_key("account")
env_key = 'swift.%s' % cache_key
req = Request.blank("/v1/account",
environ={env_key: {'bytes': 3867},
'swift.cache': FakeCache({})})
resp = get_account_info(req.environ, 'xxx')
self.assertEqual(resp['bytes'], 3867)
def test_get_object_info_env(self):
cached = {'status': 200,
'length': 3333,
'type': 'application/json',
'meta': {}}
env_key = get_object_env_key("account", "cont", "obj")
req = Request.blank("/v1/account/cont/obj",
environ={env_key: cached,
'swift.cache': FakeCache({})})
resp = get_object_info(req.environ, 'xxx')
self.assertEqual(resp['length'], 3333)
self.assertEqual(resp['type'], 'application/json')
def test_get_object_info_no_env(self):
app = FakeApp()
req = Request.blank("/v1/account/cont/obj",
environ={'swift.cache': FakeCache({})})
resp = get_object_info(req.environ, app)
self.assertEqual(app.responses.stats['account'], 0)
self.assertEqual(app.responses.stats['container'], 0)
self.assertEqual(app.responses.stats['obj'], 1)
self.assertEqual(resp['length'], 5555)
self.assertEqual(resp['type'], 'text/plain')
def test_options(self):
base = Controller(self.app)
base.account_name = 'a'
base.container_name = 'c'
origin = 'http://m.com'
self.app.cors_allow_origin = [origin]
req = Request.blank('/v1/a/c/o',
environ={'swift.cache': FakeCache()},
headers={'Origin': origin,
'Access-Control-Request-Method': 'GET'})
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.OPTIONS(req)
self.assertEqual(resp.status_int, 200)
def test_options_with_null_allow_origin(self):
base = Controller(self.app)
base.account_name = 'a'
base.container_name = 'c'
def my_container_info(*args):
return {
'cors': {
'allow_origin': '*',
}
}
base.container_info = my_container_info
req = Request.blank('/v1/a/c/o',
environ={'swift.cache': FakeCache()},
headers={'Origin': '*',
'Access-Control-Request-Method': 'GET'})
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.OPTIONS(req)
self.assertEqual(resp.status_int, 200)
def test_options_unauthorized(self):
base = Controller(self.app)
base.account_name = 'a'
base.container_name = 'c'
self.app.cors_allow_origin = ['http://NOT_IT']
req = Request.blank('/v1/a/c/o',
environ={'swift.cache': FakeCache()},
headers={'Origin': 'http://m.com',
'Access-Control-Request-Method': 'GET'})
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.OPTIONS(req)
self.assertEqual(resp.status_int, 401)
def test_headers_to_container_info_missing(self):
resp = headers_to_container_info({}, 404)
self.assertEqual(resp['status'], 404)
self.assertEqual(resp['read_acl'], None)
self.assertEqual(resp['write_acl'], None)
def test_headers_to_container_info_meta(self):
headers = {'X-Container-Meta-Whatevs': 14,
'x-container-meta-somethingelse': 0}
resp = headers_to_container_info(headers.items(), 200)
self.assertEqual(len(resp['meta']), 2)
self.assertEqual(resp['meta']['whatevs'], 14)
self.assertEqual(resp['meta']['somethingelse'], 0)
def test_headers_to_container_info_sys_meta(self):
prefix = get_sys_meta_prefix('container')
headers = {'%sWhatevs' % prefix: 14,
'%ssomethingelse' % prefix: 0}
resp = headers_to_container_info(headers.items(), 200)
self.assertEqual(len(resp['sysmeta']), 2)
self.assertEqual(resp['sysmeta']['whatevs'], 14)
self.assertEqual(resp['sysmeta']['somethingelse'], 0)
def test_headers_to_container_info_values(self):
headers = {
'x-container-read': 'readvalue',
'x-container-write': 'writevalue',
'x-container-sync-key': 'keyvalue',
'x-container-meta-access-control-allow-origin': 'here',
}
resp = headers_to_container_info(headers.items(), 200)
self.assertEqual(resp['read_acl'], 'readvalue')
self.assertEqual(resp['write_acl'], 'writevalue')
self.assertEqual(resp['cors']['allow_origin'], 'here')
headers['x-unused-header'] = 'blahblahblah'
self.assertEqual(
resp,
headers_to_container_info(headers.items(), 200))
def test_container_info_without_req(self):
base = Controller(self.app)
base.account_name = 'a'
base.container_name = 'c'
container_info = \
base.container_info(base.account_name,
base.container_name)
self.assertEqual(container_info['status'], 0)
def test_headers_to_account_info_missing(self):
resp = headers_to_account_info({}, 404)
self.assertEqual(resp['status'], 404)
self.assertEqual(resp['bytes'], None)
self.assertEqual(resp['container_count'], None)
def test_headers_to_account_info_meta(self):
headers = {'X-Account-Meta-Whatevs': 14,
'x-account-meta-somethingelse': 0}
resp = headers_to_account_info(headers.items(), 200)
self.assertEqual(len(resp['meta']), 2)
self.assertEqual(resp['meta']['whatevs'], 14)
self.assertEqual(resp['meta']['somethingelse'], 0)
def test_headers_to_account_info_sys_meta(self):
prefix = get_sys_meta_prefix('account')
headers = {'%sWhatevs' % prefix: 14,
'%ssomethingelse' % prefix: 0}
resp = headers_to_account_info(headers.items(), 200)
self.assertEqual(len(resp['sysmeta']), 2)
self.assertEqual(resp['sysmeta']['whatevs'], 14)
self.assertEqual(resp['sysmeta']['somethingelse'], 0)
def test_headers_to_account_info_values(self):
headers = {
'x-account-object-count': '10',
'x-account-container-count': '20',
}
resp = headers_to_account_info(headers.items(), 200)
self.assertEqual(resp['total_object_count'], '10')
self.assertEqual(resp['container_count'], '20')
headers['x-unused-header'] = 'blahblahblah'
self.assertEqual(
resp,
headers_to_account_info(headers.items(), 200))
def test_headers_to_object_info_missing(self):
resp = headers_to_object_info({}, 404)
self.assertEqual(resp['status'], 404)
self.assertEqual(resp['length'], None)
self.assertEqual(resp['etag'], None)
def test_headers_to_object_info_meta(self):
headers = {'X-Object-Meta-Whatevs': 14,
'x-object-meta-somethingelse': 0}
resp = headers_to_object_info(headers.items(), 200)
self.assertEqual(len(resp['meta']), 2)
self.assertEqual(resp['meta']['whatevs'], 14)
self.assertEqual(resp['meta']['somethingelse'], 0)
def test_headers_to_object_info_sys_meta(self):
prefix = get_sys_meta_prefix('object')
headers = {'%sWhatevs' % prefix: 14,
'%ssomethingelse' % prefix: 0}
resp = headers_to_object_info(headers.items(), 200)
self.assertEqual(len(resp['sysmeta']), 2)
self.assertEqual(resp['sysmeta']['whatevs'], 14)
self.assertEqual(resp['sysmeta']['somethingelse'], 0)
def test_headers_to_object_info_values(self):
headers = {
'content-length': '1024',
'content-type': 'application/json',
}
resp = headers_to_object_info(headers.items(), 200)
self.assertEqual(resp['length'], '1024')
self.assertEqual(resp['type'], 'application/json')
headers['x-unused-header'] = 'blahblahblah'
self.assertEqual(
resp,
headers_to_object_info(headers.items(), 200))
def test_base_have_quorum(self):
base = Controller(self.app)
# just throw a bunch of test cases at it
self.assertEqual(base.have_quorum([201, 404], 3), False)
self.assertEqual(base.have_quorum([201, 201], 4), False)
self.assertEqual(base.have_quorum([201, 201, 404, 404], 4), False)
self.assertEqual(base.have_quorum([201, 503, 503, 201], 4), False)
self.assertEqual(base.have_quorum([201, 201], 3), True)
self.assertEqual(base.have_quorum([404, 404], 3), True)
self.assertEqual(base.have_quorum([201, 201], 2), True)
self.assertEqual(base.have_quorum([404, 404], 2), True)
self.assertEqual(base.have_quorum([201, 404, 201, 201], 4), True)
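
    def test_base_have_quorum_rule_sketch(self):
        # Illustration only: sketch_have_quorum is a made-up helper, not
        # Swift's have_quorum. It captures the rule the cases above appear
        # to exercise: a quorum exists once any single status class
        # (2xx, 4xx, ...) covers at least node_count // 2 + 1 of the
        # responses seen so far.
        def sketch_have_quorum(statuses, node_count):
            quorum = node_count // 2 + 1
            status_classes = set(status // 100 for status in statuses)
            return any(
                sum(1 for s in statuses if s // 100 == klass) >= quorum
                for klass in status_classes)

        self.assertFalse(sketch_have_quorum([201, 404], 3))
        self.assertFalse(sketch_have_quorum([201, 201, 404, 404], 4))
        self.assertTrue(sketch_have_quorum([201, 201], 3))
        self.assertTrue(sketch_have_quorum([404, 404], 2))
        self.assertTrue(sketch_have_quorum([201, 404, 201, 201], 4))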
def test_best_response_overrides(self):
base = Controller(self.app)
responses = [
(302, 'Found', '', 'The resource has moved temporarily.'),
(100, 'Continue', '', ''),
(404, 'Not Found', '', 'Custom body'),
]
server_type = "Base DELETE"
req = Request.blank('/v1/a/c/o', method='DELETE')
statuses, reasons, headers, bodies = zip(*responses)
# First test that you can't make a quorum with only overridden
# responses
overrides = {302: 204, 100: 204}
resp = base.best_response(req, statuses, reasons, bodies, server_type,
headers=headers, overrides=overrides)
self.assertEqual(resp.status, '503 Service Unavailable')
# next make a 404 quorum and make sure the last delete (real) 404
# status is the one returned.
overrides = {100: 404}
resp = base.best_response(req, statuses, reasons, bodies, server_type,
headers=headers, overrides=overrides)
self.assertEqual(resp.status, '404 Not Found')
self.assertEqual(resp.body, 'Custom body')
def test_range_fast_forward(self):
req = Request.blank('/')
handler = GetOrHeadHandler(None, req, None, None, None, None, {})
handler.fast_forward(50)
self.assertEqual(handler.backend_headers['Range'], 'bytes=50-')
handler = GetOrHeadHandler(None, req, None, None, None, None,
{'Range': 'bytes=23-50'})
handler.fast_forward(20)
self.assertEqual(handler.backend_headers['Range'], 'bytes=43-50')
self.assertRaises(HTTPException,
handler.fast_forward, 80)
handler = GetOrHeadHandler(None, req, None, None, None, None,
{'Range': 'bytes=23-'})
handler.fast_forward(20)
self.assertEqual(handler.backend_headers['Range'], 'bytes=43-')
handler = GetOrHeadHandler(None, req, None, None, None, None,
{'Range': 'bytes=-100'})
handler.fast_forward(20)
self.assertEqual(handler.backend_headers['Range'], 'bytes=-80')
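
    def test_range_fast_forward_sketch(self):
        # Illustration only: sketch_fast_forward is a made-up helper, not
        # GetOrHeadHandler.fast_forward. It shows the Range arithmetic the
        # assertions above rely on: 'bytes=START-[END]' advances START by
        # the number of bytes already served, a suffix range 'bytes=-SUFFIX'
        # shrinks the suffix instead, and seeking past END is an error.
        def sketch_fast_forward(range_value, num_bytes):
            start, end = range_value.split('=', 1)[1].split('-', 1)
            if start:
                new_start = int(start) + num_bytes
                if end and new_start > int(end):
                    raise ValueError('cannot seek past the end of the range')
                return 'bytes=%d-%s' % (new_start, end)
            return 'bytes=-%d' % (int(end) - num_bytes)

        self.assertEqual(sketch_fast_forward('bytes=23-50', 20),
                         'bytes=43-50')
        self.assertEqual(sketch_fast_forward('bytes=23-', 20), 'bytes=43-')
        self.assertEqual(sketch_fast_forward('bytes=-100', 20), 'bytes=-80')
        self.assertRaises(ValueError, sketch_fast_forward,
                          'bytes=43-50', 80)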
def test_transfer_headers_with_sysmeta(self):
base = Controller(self.app)
good_hdrs = {'x-base-sysmeta-foo': 'ok',
'X-Base-sysmeta-Bar': 'also ok'}
bad_hdrs = {'x-base-sysmeta-': 'too short'}
hdrs = dict(good_hdrs)
hdrs.update(bad_hdrs)
dst_hdrs = HeaderKeyDict()
base.transfer_headers(hdrs, dst_hdrs)
self.assertEqual(HeaderKeyDict(good_hdrs), dst_hdrs)
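
    def test_transfer_headers_sysmeta_prefix_sketch(self):
        # Illustration only: sketch_is_sysmeta is a made-up helper, not the
        # proxy's actual check. It shows the rule the good/bad headers above
        # are built around: a header counts as system metadata only when
        # something follows the 'x-<type>-sysmeta-' prefix, so the bare
        # prefix by itself is rejected.
        prefix = get_sys_meta_prefix('base')

        def sketch_is_sysmeta(header):
            name = header.lower()
            return name.startswith(prefix) and len(name) > len(prefix)

        self.assertTrue(sketch_is_sysmeta('x-base-sysmeta-foo'))
        self.assertTrue(sketch_is_sysmeta('X-Base-sysmeta-Bar'))
        self.assertFalse(sketch_is_sysmeta('x-base-sysmeta-'))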
def test_generate_request_headers(self):
base = Controller(self.app)
src_headers = {'x-remove-base-meta-owner': 'x',
'x-base-meta-size': '151M',
'new-owner': 'Kun'}
req = Request.blank('/v1/a/c/o', headers=src_headers)
dst_headers = base.generate_request_headers(req, transfer=True)
expected_headers = {'x-base-meta-owner': '',
'x-base-meta-size': '151M',
'connection': 'close'}
for k, v in expected_headers.items():
self.assertTrue(k in dst_headers)
self.assertEqual(v, dst_headers[k])
self.assertFalse('new-owner' in dst_headers)
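
    def test_generate_request_headers_remove_sketch(self):
        # Illustration only: sketch_translate_remove is a made-up helper,
        # not the proxy's actual header handling. It shows the x-remove-*
        # convention the expectations above rely on: a client sends
        # x-remove-<type>-meta-<key> to delete metadata, and the backend
        # request carries x-<type>-meta-<key> with an empty value instead.
        def sketch_translate_remove(header, value):
            if header.lower().startswith('x-remove-'):
                return 'x-' + header[len('x-remove-'):], ''
            return header, value

        self.assertEqual(
            sketch_translate_remove('x-remove-base-meta-owner', 'x'),
            ('x-base-meta-owner', ''))
        self.assertEqual(
            sketch_translate_remove('x-base-meta-size', '151M'),
            ('x-base-meta-size', '151M'))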
def test_generate_request_headers_with_sysmeta(self):
base = Controller(self.app)
good_hdrs = {'x-base-sysmeta-foo': 'ok',
'X-Base-sysmeta-Bar': 'also ok'}
bad_hdrs = {'x-base-sysmeta-': 'too short'}
hdrs = dict(good_hdrs)
hdrs.update(bad_hdrs)
req = Request.blank('/v1/a/c/o', headers=hdrs)
dst_headers = base.generate_request_headers(req, transfer=True)
for k, v in good_hdrs.items():
self.assertTrue(k.lower() in dst_headers)
self.assertEqual(v, dst_headers[k.lower()])
for k, v in bad_hdrs.items():
self.assertFalse(k.lower() in dst_headers)
def test_generate_request_headers_with_no_orig_req(self):
base = Controller(self.app)
src_headers = {'x-remove-base-meta-owner': 'x',
'x-base-meta-size': '151M',
'new-owner': 'Kun'}
dst_headers = base.generate_request_headers(None,
additional=src_headers)
expected_headers = {'x-base-meta-size': '151M',
'connection': 'close'}
for k, v in expected_headers.items():
self.assertIn(k, dst_headers)
self.assertEqual(v, dst_headers[k])
self.assertEqual('', dst_headers['Referer'])
def test_client_chunk_size(self):
class TestSource(object):
def __init__(self, chunks):
self.chunks = list(chunks)
self.status = 200
def read(self, _read_size):
if self.chunks:
return self.chunks.pop(0)
else:
return ''
def getheader(self, header):
if header.lower() == "content-length":
return str(sum(len(c) for c in self.chunks))
def getheaders(self):
return [('content-length', self.getheader('content-length'))]
source = TestSource((
'abcd', '1234', 'abc', 'd1', '234abcd1234abcd1', '2'))
req = Request.blank('/v1/a/c/o')
node = {}
handler = GetOrHeadHandler(self.app, req, None, None, None, None, {},
client_chunk_size=8)
app_iter = handler._make_app_iter(req, node, source)
client_chunks = list(app_iter)
self.assertEqual(client_chunks, [
'abcd1234', 'abcd1234', 'abcd1234', 'abcd12'])
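
    def test_client_chunk_size_rechunk_sketch(self):
        # Illustration only: sketch_rechunk is a made-up generator, not the
        # proxy's _make_app_iter. It shows the regrouping asserted above:
        # however the backend chunks arrive, the client sees fixed
        # client_chunk_size pieces, with only the final piece allowed to be
        # short.
        def sketch_rechunk(chunks, client_chunk_size):
            buf = ''
            for chunk in chunks:
                buf += chunk
                while len(buf) >= client_chunk_size:
                    yield buf[:client_chunk_size]
                    buf = buf[client_chunk_size:]
            if buf:
                yield buf

        backend_chunks = ('abcd', '1234', 'abc', 'd1', '234abcd1234abcd1',
                          '2')
        self.assertEqual(list(sketch_rechunk(backend_chunks, 8)),
                         ['abcd1234', 'abcd1234', 'abcd1234', 'abcd12'])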
def test_client_chunk_size_resuming(self):
class TestSource(object):
def __init__(self, chunks):
self.chunks = list(chunks)
self.status = 200
def read(self, _read_size):
if self.chunks:
chunk = self.chunks.pop(0)
if chunk is None:
raise exceptions.ChunkReadTimeout()
else:
return chunk
else:
return ''
def getheader(self, header):
if header.lower() == "content-length":
return str(sum(len(c) for c in self.chunks
if c is not None))
def getheaders(self):
return [('content-length', self.getheader('content-length'))]
node = {'ip': '1.2.3.4', 'port': 6000, 'device': 'sda'}
source1 = TestSource(['abcd', '1234', 'abc', None])
source2 = TestSource(['efgh5678'])
req = Request.blank('/v1/a/c/o')
handler = GetOrHeadHandler(
self.app, req, 'Object', None, None, None, {},
client_chunk_size=8)
app_iter = handler._make_app_iter(req, node, source1)
with patch.object(handler, '_get_source_and_node',
lambda: (source2, node)):
client_chunks = list(app_iter)
self.assertEqual(client_chunks, ['abcd1234', 'efgh5678'])
def test_client_chunk_size_resuming_chunked(self):
class TestChunkedSource(object):
def __init__(self, chunks):
self.chunks = list(chunks)
self.status = 200
self.headers = {'transfer-encoding': 'chunked',
'content-type': 'text/plain'}
def read(self, _read_size):
if self.chunks:
chunk = self.chunks.pop(0)
if chunk is None:
raise exceptions.ChunkReadTimeout()
else:
return chunk
else:
return ''
def getheader(self, header):
return self.headers.get(header.lower())
def getheaders(self):
return self.headers
node = {'ip': '1.2.3.4', 'port': 6000, 'device': 'sda'}
source1 = TestChunkedSource(['abcd', '1234', 'abc', None])
source2 = TestChunkedSource(['efgh5678'])
req = Request.blank('/v1/a/c/o')
handler = GetOrHeadHandler(
self.app, req, 'Object', None, None, None, {},
client_chunk_size=8)
app_iter = handler._make_app_iter(req, node, source1)
with patch.object(handler, '_get_source_and_node',
lambda: (source2, node)):
client_chunks = list(app_iter)
self.assertEqual(client_chunks, ['abcd1234', 'efgh5678'])
def test_bytes_to_skip(self):
# if you start at the beginning, skip nothing
self.assertEqual(bytes_to_skip(1024, 0), 0)
# missed the first 10 bytes, so we've got 1014 bytes of partial
# record
self.assertEqual(bytes_to_skip(1024, 10), 1014)
# skipped some whole records first
self.assertEqual(bytes_to_skip(1024, 4106), 1014)
# landed on a record boundary
self.assertEqual(bytes_to_skip(1024, 1024), 0)
self.assertEqual(bytes_to_skip(1024, 2048), 0)
# big numbers
self.assertEqual(bytes_to_skip(2 ** 20, 2 ** 32), 0)
self.assertEqual(bytes_to_skip(2 ** 20, 2 ** 32 + 1), 2 ** 20 - 1)
self.assertEqual(bytes_to_skip(2 ** 20, 2 ** 32 + 2 ** 19), 2 ** 19)
# odd numbers
self.assertEqual(bytes_to_skip(123, 0), 0)
self.assertEqual(bytes_to_skip(123, 23), 100)
self.assertEqual(bytes_to_skip(123, 247), 122)
# prime numbers
self.assertEqual(bytes_to_skip(11, 7), 4)
self.assertEqual(bytes_to_skip(97, 7873823), 55)
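
        # All of the cases above are consistent with the modular arithmetic
        # below. sketch_bytes_to_skip is only an illustration of the
        # apparent rule, not a restatement of bytes_to_skip itself: skip
        # whatever remains of the current record before the next record
        # boundary.
        def sketch_bytes_to_skip(record_size, offset):
            return (record_size - (offset % record_size)) % record_size

        for record_size, offset in [(1024, 0), (1024, 10), (1024, 4106),
                                    (1024, 1024), (2 ** 20, 2 ** 32 + 1),
                                    (123, 247), (97, 7873823)]:
            self.assertEqual(sketch_bytes_to_skip(record_size, offset),
                             bytes_to_skip(record_size, offset))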