This change adds two new parameters to enable and control concurrent
GETs in Swift: 'concurrent_gets' and 'concurrency_timeout'.
'concurrent_gets' lets you turn concurrent GETs on or off. When on, it
sets the GET/HEAD concurrency to the replica count, or, in the case of
EC HEADs, to ndata. The proxy then serves only the first valid source
to respond. This applies to all account, container and object GETs
except for EC; for EC, only HEAD requests are affected.
It achieves this by changing the request-sending mechanism to use
GreenAsyncPile and green threads, with a timeout between each request.
'concurrency_timeout' is related to concurrent_gets: it is the amount
of time to wait before firing the next thread. A value of 0 fires all
requests at the same time (fully concurrent); any other value staggers
the firing, giving each node a short window to respond before the next
request fires. This value is a float and should be somewhere between 0
and node_timeout. The default is conn_timeout, meaning that by default
the firing is staggered.
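For illustration, a proxy-server.conf fragment enabling this might
look like the following (values are examples, not recommendations):
    [app:proxy-server]
    use = egg:swift#proxy
    concurrent_gets = on
    concurrency_timeout = 0.5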
DocImpact
Implements: blueprint concurrent-reads
Change-Id: I789d39472ec48b22415ff9d9821b1eefab7da867
There was a function in swift.common.utils that was importing
swob.HeaderKeyDict at call time. It couldn't import it at module
import time, since utils can't import from swob without blowing up
with a circular import error.
This commit just moves HeaderKeyDict into swift.common.header_key_dict
so that we can remove the inline import.
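With this change the import can live at module scope:
    from swift.common.header_key_dict import HeaderKeyDict
instead of a function-local 'from swift.common.swob import
HeaderKeyDict'.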
Change-Id: I656fde8cc2e125327c26c589cf1045cb81ffc7e5
The proxy server currently requires a Content-Length in the response
header when GETting an object, and does not support chunked transfer
encoding ("Transfer-Encoding: chunked"). This doesn't matter in normal
Swift, but it prevents us from adding middlewares that do something
like streaming processing of objects and therefore can't calculate the
length of their response body before they start to send their
response.
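As a minimal sketch of the kind of middleware this unblocks (the
filter and its transformation are hypothetical), consider a WSGI
filter that rewrites the object body and so cannot know its final
length:
    class StreamingFilter(object):
        # hypothetical middleware: transforms the body as it streams,
        # so it cannot provide a Content-Length up front
        def __init__(self, app):
            self.app = app

        def __call__(self, env, start_response):
            def _start_response(status, headers, exc_info=None):
                # drop Content-Length; the server must then fall back
                # to Transfer-Encoding: chunked
                headers = [(h, v) for (h, v) in headers
                           if h.lower() != 'content-length']
                return start_response(status, headers, exc_info)
            for chunk in self.app(env, _start_response):
                yield chunk.replace('a', 'aa')  # length-changing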
Change-Id: I60fc6c86338d734e39b7e5f1e48a2647995045ef
assertDictContainsSubset is being called multiple times with the
same arguments in a loop. Since assertDictContainsSubset has been
deprecated since Python 3.2, replace it with checks on individual
key/value pairs.
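For example (names illustrative):
    # before, deprecated since Python 3.2:
    #   self.assertDictContainsSubset(expected, actual)
    # after:
    for key, value in expected.items():
        self.assertEqual(actual[key], value)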
Change-Id: I7089487710147021f26bd77c36accf5751855d68
a1c32702, 736cf54a, and 38787d0f remove uses of `simplejson` from
various parts of Swift in favor of the standard library `json`
module (introduced in Python 2.6). This commit performs the remaining
`simplejson` to `json` replacements, removes two comments highlighting
quirks of simplejson with respect to Unicode, and removes the references
to it in setup documentation and requirements.txt.
There were a lot of places where we were importing json from
swift.common.utils, which is less intuitive than a direct `import json`,
so that replacement is made as well.
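That is:
    # before:
    from swift.common.utils import json
    # after:
    import json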
(And in two more tiny drive-bys, we add some pretty-indenting to an XML
fragment and use `super` rather than naming a base class explicitly.)
Change-Id: I769e88dda7f76ce15cf7ce930dc1874d24f9498a
This patch adds more unit tests to diminish missing pieces
of the coverage in the proxy_controllers_base unit test.
Change-Id: I85ba1955c681cc9d5b2a70ac31155678d2e5b6fd
In earlier versions of swift when a request was made with an
existing origin, but without any CORS settings in the container,
it was possible to get an unhandled exception due to a method call
on the "None" return of cors.get('allow_origin', '').
Unit tests have been added to assert that this problem cannot go
undetected again.
Change-Id: Ia74896dabe1cf5a307c551b15a43ab1fd789c213
Fixes: bug 1468782
The iteritems() of Python 2 dictionaries has been renamed to items() on
Python 3. According to a discussion on the openstack-dev mailing list,
the overhead of creating a temporary list using dict.items() on Python 2
is very low because most dictionaries are small:
http://lists.openstack.org/pipermail/openstack-dev/2015-June/066391.html
Patch generated by the following command:
    sed -i 's,iteritems,items,g' \
        $(find swift -name "*.py") \
        $(find test -name "*.py")
Change-Id: I6070bb6c684be76e8e77222a7d280ec6edd43496
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
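That is:
    # Python 2 only:
    item = it.next()
    # works on Python 2 and Python 3:
    item = next(it)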
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
This commit lets clients receive multipart/byteranges responses (see
RFC 7233, Appendix A) for erasure-coded objects. Clients can already
do this for replicated objects, so this brings EC closer to feature
parity (ha!).
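For reference, a multipart/byteranges exchange looks roughly like this
(boundary and lengths illustrative):
    GET /v1/AUTH_test/c/o HTTP/1.1
    Range: bytes=0-4,10-14

    HTTP/1.1 206 Partial Content
    Content-Type: multipart/byteranges; boundary=b1

    --b1
    Content-Type: application/octet-stream
    Content-Range: bytes 0-4/100

    hello
    --b1
    Content-Type: application/octet-stream
    Content-Range: bytes 10-14/100

    world
    --b1--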
GetOrHeadHandler got a base class extracted from it that treats an
HTTP response as a sequence of byte-range responses. This way, it can
continue to yield whole fragments, not just N-byte pieces of the raw
HTTP response, since an N-byte piece of a multipart/byteranges
response is pretty much useless.
There are a couple of bonus fixes in here, too. For starters, download
resuming now works on multipart/byteranges responses. Before, it only
worked on 200 responses or 206 responses for a single byte
range. Also, BufferedHTTPResponse grew a readline() method.
Also, the MIME response for replicated objects got tightened up a
little. Before, it had some leading and trailing CRLFs which, while
allowed by RFC 7233, provide no benefit. Now, both replicated and EC
multipart/byteranges avoid extraneous bytes. This let me re-use the
Content-Length calculation in swob instead of having to either hack
around it or add extraneous whitespace to match.
Change-Id: I16fc65e0ec4e356706d327bdb02a3741e36330a0
Currently, best_response could return "503 Internal Server Error".
However, 503 means "Service Unavailable" (the status int of Internal
Server Error is 500). This patch fixes the response status to be
"503 Service Unavailable".
Change-Id: I88b8c52c26b19e9e76ba3375f1e16ced555ed54c
This commit makes it possible to PUT an object into Swift and have it
stored using erasure coding instead of replication, and also to GET
the object back from Swift at a later time.
This works by splitting the incoming object into a number of segments,
erasure-coding each segment in turn to get fragments, then
concatenating the fragments into fragment archives. Segments are 1 MiB
in size, except the last, which is between 1 B and 1 MiB.
+================================================================+
| object data                                                    |
+================================================================+
                                 |
          +----------------------+----------------------+
          |                      |                      |
          v                      v                      v
+===================+  +===================+     +==============+
| segment 1         |  | segment 2         | ... | segment N    |
+===================+  +===================+     +==============+
          |                      |
          |                      |
          v                      v
     /=========\            /=========\
     | pyeclib |            | pyeclib |   ...
     \=========/            \=========/
          |                      |
          |                      |
          +--> fragment A-1      +--> fragment A-2
          |                      |
          |                      |
          +--> fragment B-1      +--> fragment B-2
          |                      |
         ...                    ...
Then, object server A gets the concatenation of fragment A-1, A-2,
..., A-N, so its .data file looks like this (called a "fragment archive"):
+=====================================================================+
| fragment A-1 | fragment A-2 | ... | fragment A-N |
+=====================================================================+
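A rough sketch of the segmentation step (the helper name is
hypothetical; the 1 MiB constant comes from the description above):
    SEGMENT_SIZE = 1024 * 1024  # 1 MiB

    def iter_segments(wsgi_input):
        # yield 1 MiB segments; the last one may be shorter
        while True:
            segment = wsgi_input.read(SEGMENT_SIZE)
            if not segment:
                break
            yield segment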
Since this means that the object server never sees the object data as
the client sent it, we have to do a few things to ensure data
integrity.
First, the proxy has to check the Etag if the client provided it; the
object server can't do it since the object server doesn't see the raw
data.
Second, if the client does not provide an Etag, the proxy computes it
and uses the MIME-PUT mechanism to provide it to the object servers
after the object body. Otherwise, the object would not have an Etag at
all.
Third, the proxy computes the MD5 of each fragment archive and sends
it to the object server using the MIME-PUT mechanism. With replicated
objects, the proxy checks that the Etags from all the object servers
match, and if they don't, returns a 500 to the client. This mitigates
the risk of data corruption in one of the proxy --> object connections,
and signals to the client when it happens. With EC objects, we can't
use that same mechanism, so we must send the checksum with each
fragment archive to get comparable protection.
On the GET path, the inverse happens: the proxy connects to a bunch of
object servers (M of them, for an M+K scheme), reads one fragment at a
time from each fragment archive, decodes those fragments into a
segment, and serves the segment to the client.
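A sketch of the encode/decode round trip, assuming pyeclib's ECDriver
interface (scheme parameters illustrative):
    from pyeclib.ec_iface import ECDriver

    driver = ECDriver(k=4, m=2, ec_type='liberasurecode_rs_vand')
    fragments = driver.encode('x' * 1024)  # one per fragment archive
    # any k fragments are enough to rebuild the segment
    assert driver.decode(fragments[:4]) == 'x' * 1024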
When an object server dies partway through a GET response, any
partially-fetched fragment is discarded, the resumption point is wound
back to the nearest fragment boundary, and the GET is retried with the
next object server.
GET requests for a single byterange work; GET requests for multiple
byteranges do not.
There are a number of things _not_ included in this commit. Some of
them are listed here:
* multi-range GET
* deferred cleanup of old .data files
* durability (daemon to reconstruct missing archives)
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2
The cached info object dict did not include
the sysmeta. This patch fixes that, and adds
a unit test.
Change-Id: I092200e76586af322ed4ff7d194a1034b1ca0433
Meaningless cleanup of the day. In my defence, I hope it saves the
next person from having to grep all over the tree (the test is
adjusted accordingly).
Change-Id: I9e10dd977d4ca48db1393519068ce0e286705433
This change adds an optional overrides map to the make_requests method
in the base Controller class:
    def make_requests(self, req, ring, part, method, path, headers,
                      query_string='', overrides=None)
This map is passed on to the best_response method. If it is set and no
quorum is reached, the override map is used to attempt to find quorum.
The overrides map is in the form:
    { <response>: <override response>, ... }
The ObjectController's DELETE method now passes an override map of
    { 404: 204 }
to make_requests in the base Controller class. Statuses/responses that
have been overridden are used in the quorum calculation but are never
returned to the user; they are replaced by
    (STATUS, '', '', '')
and left out of the search for the best response.
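For example, a caller might now do something like (sketch only):
    # treat backend 404s as 204s when computing the DELETE quorum
    resp = self.make_requests(req, obj_ring, partition, 'DELETE',
                              req.path_info, headers,
                              overrides={404: 204})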
Change-Id: Ibf969eac3a09d67668d5275e808ed626152dd7eb
Closes-Bug: 1318375
If upgrading from a non-storage-policy-enabled version of Swift to a
storage-policy-enabled version, it's possible that memcached will have
an info structure that does not contain the 'storage_policy' key,
resulting in an unhandled exception during the lookup. The fix is to
simply make sure we never return the dict without a storage_policy key
defined; if it doesn't exist, it's safe to set it to '0', as this
means you're in the upgrade scenario and there's no other possibility.
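The shape of the fix, roughly (the dict name is illustrative):
    # cached entries from before the upgrade lack the key; policy 0
    # is the only possibility for such data
    if 'storage_policy' not in info:
        info['storage_policy'] = '0'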
Change-Id: If8e8f66d32819c5bfb2d1308e14643f3600ea6e9
* add get_object
* allow extra headers passthrough on HEAD/metadata requests
* expose (account|container|get_object)_ring properties
Pipeline property access to the auto_create_account_prefix also allows
us to bypass the early exit on a container HEAD for
auto_create_accounts if the container-updater hasn't cycled yet.
Allow overriding of storage policy index.
This is something the reconciler will need so that it can GET from one
policy, PUT in another, and then DELETE from the first one again.
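A sketch of what such an override might look like on an internal
request (the X-Backend-Storage-Policy-Index header name is assumed
here):
    headers = {'X-Backend-Storage-Policy-Index': 1}
    # GET with policy 1's index, then PUT/DELETE with another index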
DocImpact
Implements: blueprint storage-policies
Change-Id: I9b287d15f2426022d669d1186c9e22dd8ca13fb9
Objects now have a storage policy index associated with them as well;
this is determined by their filesystem path. Like before, objects in
policy 0 are in /srv/node/$disk/objects; this provides compatibility
on upgrade. (Recall that policy 0 is given to all existing data when a
cluster is upgraded.) Objects in policy 1 are in
/srv/node/$disk/objects-1, objects in policy 2 are in
/srv/node/$disk/objects-2, and so on.
* the 'quarantined' dir already created an 'objects' subdir, so now
  objects-N subdirs will also be created at the same level
This commit does not address replicators, auditors, or updaters except
where method signatures changed. They'll still work if your cluster
has only one storage policy, though.
DocImpact
Implements: blueprint storage-policies
Change-Id: I459f3ed97df516cb0c9294477c28729c30f48e09
Middleware or core features may need to store metadata
against accounts or containers. This patch adds a
generic mechanism for system metadata to be persisted
in backend databases, without polluting the user
metadata namespace, by using the reserved header
namespace x-<server_type>-sysmeta-*.
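For example, headers such as the following (names and values
illustrative) would be persisted as system metadata:
    X-Account-Sysmeta-Project: blue
    X-Container-Sysmeta-Owner: alice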
Firstly, backend servers are modified to persist system metadata
headers alongside user metadata and other system state.
For accounts and containers, system metadata in PUT
and POST requests is treated in a similar way to user
metadata. System metadata is not yet supported for
object requests.
Secondly, changes in the proxy controllers ensure that
headers in the system metadata namespace will pass through
in requests to backend servers.
Thirdly, system metadata returned from backend servers
in GET or HEAD responses is added to the cached info
dict, which middleware can access.
Finally, a gatekeeper middleware module is provided
which filters all system metadata headers from requests
and responses by removing headers with names starting
x-account-sysmeta-, x-container-sysmeta-. The gatekeeper
also removes headers starting x-object-sysmeta- in
anticipation of future support for system metadata being
set for objects. This prevents clients from writing or
reading system metadata.
The required_filters list in swift/proxy/server.py is
modified to include the gatekeeper middleware so that
if the gatekeeper has not been configured in the
pipeline then it will be automatically inserted close
to the start of the pipeline.
blueprint cluster-federation
Change-Id: I80b8b14243cc59505f8c584920f8f527646b5f45
If you create a container with a non-ASCII name, and then make another
container with X-Versions-Location: first-cøntåîner, *and* you're
serializing stuff in memcache as json (the default), when the proxy
tries to make a versioned object, it will crash.
The fix is to make sure that get_container_info() always returns strs,
not unicodes.
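A sketch of the kind of coercion involved (Python 2 semantics, helper
name hypothetical):
    def to_str(value):
        # json decoding hands back unicode; callers expect UTF-8 strs
        if isinstance(value, unicode):
            return value.encode('utf-8')
        return value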
The long-term fix would be to get rid of simplejson entirely, as its
decoder can't make up its mind whether JSON strings should be Python
strs or unicodes, and that makes it really really easy to write bugs
like this.
Change-Id: Ib20ea5fb884484a4246d7a21a9f1e2ffd82eb04f
The proxy server was calling swob.Request.path_info_pop() prior to
instantiating a controller so that req.path_info was just /a/c/o (sans
/v1). The version got moved over into SCRIPT_NAME.
This led to some unfortunate behavior when trying to re-use a request
from middleware. Something like this:
    # Imagine we're a WSGIContext object here.
    #
    # To start, SCRIPT_NAME = '' and PATH_INFO = '/v1/a/c/o'
    resp_iter = self._app_call(env, start_response)
    # Now SCRIPT_NAME = '/v1' and PATH_INFO = '/a/c/o'
    if something_special in self._response_headers:
        env['REQUEST_METHOD'] = 'GET'
        env.pop('HTTP_RANGE', None)
        # 404 SURPRISE! The proxy calls path_info_pop() again,
        # and now SCRIPT_NAME = '/v1/a' and PATH_INFO = '/c/o', so
        # this gets treated as a container request. Yikes.
        resp_iter = self._app_call(env, start_response)
Now we just leave SCRIPT_NAME and PATH_INFO alone. To make life easy
for everyone who does want just /a/c/o, I defined
swob.Request.swift_entity_path, which just strips off the /v1.
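For example, for a request whose PATH_INFO is '/v1/AUTH_test/c/o':
    req.swift_entity_path  # -> '/AUTH_test/c/o'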
Note that there's still one call to path_info_pop() in tempauth, but
that's only for requests going to /auth, so it won't affect Swift API
requests. It might be a good idea to remove that one later, but let's
do one thing at a time.
Change-Id: I87557a11c01f3f3889b610578cda6ba7d3933e7a
If a source times out on read, try another one of them with a modified
Range header. A lot of code had to be moved around to get this
working, but it should all make sense.
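For example (offsets illustrative): if a GET with
    Range: bytes=0-9999
dies after 3000 bytes have been read, the retry against the next node
goes out with
    Range: bytes=3000-9999
so the client sees one uninterrupted body.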
Change-Id: Ieaf045690a8823927a6f38098a95b37a4d4adb70
Allow the proxy to respond to many types of requests as soon as it has a
quorum. This can help speed up responses (without changing the results),
especially when one node is acting up.
I had to fix a few unit tests that no longer match the backend http requests
made by our proxy.
Change-Id: Ieb070dc3019e217e717b96154a7a809409bf40a5
In 1.8.0 (Grizzly), your proxy logs would indicate which middleware
was responsible for an internal request, e.g. TU for tempurl or BD for
bulk delete. At some point, those all turned into GET_INFO, which does
not give you any idea which specific middleware was responsible, only
that it came from a get_account_info/get_container_info call.
This commit puts it back to how it was in 1.8.0. Also, the
new-since-1.8.0 function get_object_info() got swift_source plumbing
added to it, so source tracking for the quota middlewares'
get_object_info() calls will happen now too.
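Roughly (the swift_source value here is illustrative):
    info = get_object_info(req.environ, app, swift_source='CQ')
    # 'CQ' then appears as the source in the proxy access log line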
Note that due to the new-since-1.8.0 in-environment caching of
account/container info, you may not see as many lines in the proxy log
as you would with 1.8.0. This is because there are actually fewer
internal requests being made.
Change-Id: I2b2ff7823c612dc7ed7f268da979c4500bbbe911
The content length of the copied object is checked before allowing the
copy request, according to the account quota set by the reseller.
Fixes: bug #1200271
Change-Id: Ie4700f23466dd149ea5a497e6c72438cf52940fd
Consolidate the different ways in which info of account/container
is gathered, cached, used, updated, etc.
This refactoring increases code reuse and is a basis for later
addition of account ACLs.
Changing the users of get_info is left for the future; this staged
approach ensures the behaviour is unchanged.
Change-Id: I67b58030d3f9e3bc86bcd7ece0f1dc693c4e08c3
Fixes: Bug #1162199
- Fixes bug 1119282.
- Allow middleware to access an account's metadata without having to
  store it separately in a new memcache namespace.
- Add tests for get_container_info, which was previously missed.
- Add a get_account_info method based on get_container_info, a
  function for other middleware to query accounts.
- Rename container_info['count'] to container_info['object_count'].
Change-Id: I43787916c7a812cb08d278edf45370521f12c912
Currently, a container's info can be cached without its CORS data
intact after a container GET.
I made headers_to_container_info a function instead of a method and I crammed
all container metadata into container_info.
This is so e.g. staticweb can eventually re-use the same container info cache.
Fix pep8 in swift/proxy/controllers/container.py
Change-Id: I4bbb042dde79afac48395efc38bd80f0ff240e1f