The proxy-server makes GET requests to the container server to fetch
full lists of shard ranges when handling object PUT/POST/DELETE and
container GETs, but it only stores the Namespace attributes (lower
and name) of the shard ranges in memcache and reconstructs the list
of Namespaces from those attributes. A namespaces GET interface can
therefore be added to the backend container-server that returns only
a list of those Namespace attributes.
On a container server serving a container with ~12000 shard ranges,
benchmarks show a request rate of ~12 op/s for an HTTP GET of all
namespaces (states=updating), compared with ~3.2 op/s for an HTTP GET
of all shard ranges (states=updating).
The new namespace GET interface supports most of the headers and
parameters supported by the shard range GET interface, for example
marker, end_marker, include and reverse. There are two exceptions:
'x-backend-include-deleted' cannot be supported because there is no
way for a Namespace to indicate the deleted state, and the 'auditing'
state query parameter is not supported because it is specific to the
sharder, which only requests full shard ranges.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: If152942c168d127de13e11e8da00a5760de5ae0d
The Namespace class grew account/container properties to make it
easier to use in the proxy; these properties are subject to
consistency requirements similar to those of ShardRange's properties
in the related change.
No new assertions are added in this change; it merely consolidates
the py2/py3 validating helper, which was duplicated between the
Namespace and ShardRange TestCases.
Related-Change-Id: Iebb09d6eff2165c25f80abca360210242cf3e6b7
Change-Id: Ide7f1dd3d9c664fb57c47dcd50edb44ae90ff5f9
ShardRange.name is required to have the form <account/container>. We'd
like to be able to replace ShardRange instances with the Namespace
superclass but still have the convenience of the account and container
accessors.
The name is stored as a single attribute and split when accessing via
the account and container getters, rather than splitting into two
attributes in the name setter, to minimise the overhead of
constructing Namespace instances. Where performance can be critical
(e.g. fetching the entire set of namespaces from a container server)
the number of Namespace instances constructed can be much greater than
the number whose account and container properties are used. The author
found that splitting in the account and container getters became more
efficient than splitting in the name setter when the rate of
constructing instances was ~2x greater than the rate of calling the
account and container getters.
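The trade-off described above can be sketched as follows. This is a
minimal illustration rather than the actual Namespace class; the class
name LazyNamespace is hypothetical and exists only for this example.

```python
class LazyNamespace(object):
    """Hypothetical stand-in: stores name whole, splits only on access."""

    def __init__(self, name):
        # Construction stays cheap: no splitting happens here.
        self._name = name

    @property
    def name(self):
        return self._name

    @property
    def account(self):
        # Split on demand; only callers that need the account pay for it.
        return self._name.split('/', 1)[0]

    @property
    def container(self):
        # Everything after the first '/' is treated as the container.
        return self._name.split('/', 1)[1]


ns = LazyNamespace('.shards_a/c-0')
print(ns.account, ns.container)
```

When many instances are built but few have their account or container
read, the split cost is paid only for those few.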
The account and container property setters are removed from the
ShardRange class. The name setter is removed from the Namespace class.
These setters were never used.
Change-Id: Iebb09d6eff2165c25f80abca360210242cf3e6b7
This patch reorganizes the SLO read response handling. The main goal
was to push the response header replacement for both GET/HEAD SLO and
multipart-manifest=get paths all into a common return path. A new
RespAttrs primitive is used to carry around some metadata details from
requests made in SLO. The authors hope these changes make the code more
easily readable and easier to modify.
Drive-By: add new "friendly_close" function in common.utils so we can
drain empty/error responses more confidently (and use it in swob and
request_helpers).
Drive-By: the tests added in the Related-Change discovered a 500 on
If-[Un]Modified-Since conditional GET requests - it probably wasn't
important, but this refactor fixed it as a side effect.
Closes-Bug: #2040178
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Ashwin Nair <nairashwin952013@gmail.com>
Related-Change-Id: I54094f3d2098f56b755ec19cc9315d06a6ca8b15
Change-Id: Idc84e70539fc7480b6ecb86e2f0da904baf2c727
Ensure name/account/container are always consistent and always encode
utf8 in py2.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: Ia5374f55adf80fef92a92d916b3f89297463c673
See https://www.freedesktop.org/software/systemd/man/sd_notify.html#Description
for more information.
Note that this requires that we keep the NOTIFY_SOCKET env var
around for more than just the first READY message, so we want to be
careful about when we're sending the default "READY=1".
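For illustration, a notification of this kind can be sent roughly as
follows. This is a minimal sketch rather than Swift's implementation;
the function name sd_notify_sketch and its environ parameter are
assumptions made for the example.

```python
import os
import socket


def sd_notify_sketch(message, environ=None):
    """Send one datagram to $NOTIFY_SOCKET; return True if it was sent."""
    environ = os.environ if environ is None else environ
    address = environ.get('NOTIFY_SOCKET')
    if not address:
        return False
    if address.startswith('@'):
        # A leading '@' denotes a Linux abstract-namespace socket.
        address = '\0' + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(address)
        sock.sendall(message)
    # NOTIFY_SOCKET is deliberately left in the environment so that
    # messages after the first b'READY=1' can still be delivered.
    return True
```

Because the sketch never unsets NOTIFY_SOCKET, the same process can
call it again later with a different message.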
UpgradeImpact
=============
Since prior versions of Swift would unset the NOTIFY_SOCKET env var,
services must be fully restarted (rather than seamlessly reloaded) to
emit the new messages.
Related-Change: Ice224fc2a6ba0150be180955037c13fc90365479
Change-Id: I201734ae0d6232ecb1923e67864dd928f90b6586
It seems unreasonable to expect timings to be accurate to sub-100ns
resolution.
Why 4 places? We already had some tests for proxy-logging that would
assertAlmostEqual to that many places.
Change-Id: Ic7a0c4a416a46eb5198d7cce103358d677ec94ab
Previously, clients could bypass fallocate_reserve checks by uploading
with `Transfer-Encoding: chunked` rather than sending a `Content-Length`.
Now, a chunked transfer may still push a disk past the reserve threshold,
but once over the threshold, further PUTs and POSTs will 507. DELETEs
will still be allowed.
Closes-Bug: #2031049
Change-Id: I69ec7193509cd3ed0aa98aca15190468368069a5
It adds another layer of indirection and state for the sake of labeling;
longer term it'll be easier to be explicit at the point of emission.
Related-Change: I0522b1953722ca96021a0002cf93432b973ce626
Change-Id: Ieebafb19c3fa60334aff2914ab1ae70b8f140342
This reverts the fallocate- and punch_hole-related parts of commit
c78a5962b5f6c9e75f154cac924a226815236e98.
Closes-Bug: #2031035
Related-Change: I3e26f8d4e5de0835212ebc2314cac713950c85d7
Change-Id: I8050296d6982f70bb64a63765b25d287a144cb8d
The 'log_route' argument of utils.get_logger() determines which global
Logger instance is wrapped by the returned LogAdapter. Most middlewares
(s3api being the exception) explicitly set 'log_route' to equal the
middleware 'brief' name e.g. 'bulk', 'tempauth' etc. However, the
s3api middleware sets 'log_route' to be the config 'log_name', if that
key is found in config.
When a proxy pipeline is instantiated via wsgi.run_wsgi(), all
middlewares and the proxy app are passed a default conf with
'"log_name": "proxy-server"'. As a result, the s3api middleware calls
get_logger() with log_route='proxy-server' and its LogAdapter
therefore shares the same Logger instance used by proxy-server app
(and any other middleware that similarly fails to explicitly
differentiate 'log_route').
Each Logger instance has a StatsdClient instance bound to it by
get_logger(). The Related-Change added statsd metrics to the s3api
middleware and sets 's3api' as the 'statsd_tail_prefix' when calling
get_logger(). This had the unintended effect of replacing the shared
Logger instance's StatsdClient with one that has prefix 's3api', such
that stats emitted by the proxy app (e.g. memcache shard range
hit/miss stats) would be erroneously prefixed with 's3api'.
This patch modifies the s3api middleware logger instantiation to
explicitly set log_route='s3api', so that the s3api middleware
LogAdapter now wraps a unique global Logger instance, with a unique
StatsdClient instance bound to it.
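The sharing problem and the fix can be reproduced with the standard
logging module alone. This is a simplified sketch: get_logger_sketch is
a hypothetical stand-in for utils.get_logger(), and a plain string
attribute stands in for the bound StatsdClient.

```python
import logging


def get_logger_sketch(log_route, statsd_prefix):
    """Hypothetical, simplified stand-in for utils.get_logger()."""
    # logging.getLogger() returns the same Logger for the same name.
    logger = logging.getLogger(log_route)
    # Stand-in for binding a StatsdClient: the last caller wins.
    logger.statsd_prefix = statsd_prefix
    return logging.LoggerAdapter(logger, {})


proxy = get_logger_sketch('demo-proxy-server', 'proxy-server')
shared = get_logger_sketch('demo-proxy-server', 's3api')
# Both adapters wrap one Logger, so the proxy's prefix was replaced.
print(proxy.logger.statsd_prefix)

unique = get_logger_sketch('demo-s3api', 's3api')
proxy = get_logger_sketch('demo-proxy-server', 'proxy-server')
# With a distinct log_route the two Loggers no longer interfere.
print(proxy.logger.statsd_prefix, unique.logger.statsd_prefix)
```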
The 'server' attribute of the middleware's LogAdapter, which may be
included in log lines by the "%(server)s" format element, is not
affected by this change. Its value is derived from the config
'log_name' or the 'name' argument passed to get_logger().
Change-Id: Ia89485bae8f92f4f3d9f5375cab8ff08f70a11a7
Related-Change: I4976b3ee24e4ec498c66359f391813261d42c495
Currently, SLO manifest files are evicted from the page cache after
they are read, which keeps hard drives very busy when a user requests
many parallel byte range GETs for a particular SLO object.
This patch adds a new config option, 'keep_cache_slo_manifest', and
tries to keep manifest files in the page cache by not evicting them
after they are read, if the config settings allow it.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I557bd01643375d7ad68c3031430899b85908a54f
RemoteDisconnected inherits from both BadStatusLine and ConnectionResetError
(which in turn eventually inherits from OSError). We want to make
sure it gets handled as a BadStatusLine, as it doesn't get its errno
set and would otherwise get the default traceback handling.
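The inheritance described above can be checked directly in py3. The
classify() helper below is hypothetical and only illustrates why the
BadStatusLine check has to come before any OSError handling.

```python
import errno
from http.client import BadStatusLine, RemoteDisconnected


def classify(exc):
    """Hypothetical classifier: test BadStatusLine before OSError."""
    if isinstance(exc, BadStatusLine):
        return 'bad-status-line'
    if isinstance(exc, OSError):
        return 'os-error errno=%s' % exc.errno
    return 'other'


exc = RemoteDisconnected('Remote end closed connection without response')
# RemoteDisconnected is both a ConnectionResetError and a BadStatusLine.
print(issubclass(RemoteDisconnected, ConnectionResetError))
print(issubclass(RemoteDisconnected, BadStatusLine))
# Its errno is not populated, so errno-based handling cannot match it.
print(exc.errno)
print(classify(exc))
print(classify(OSError(errno.ECONNREFUSED, 'refused')))
```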
Change-Id: I0fb1f764722d73db6d3b79acc128f37f51499d35
When a client explicitly closes the connection before finishing a
download, the request ends with a 499, but the shutdown logging for
the proxy in py3 needed to be fixed. We have done so by killing all
running coroutines in the ContextPool.
Change-Id: Ic372ea9866bb7f2659e02f8796cdee01406e2079
Previously it was possible for an entire object PUT data transfer to
execute without the greenthread sleeping and allowing other
greenthreads to run. This was more likely with an EC PUT because the
computation of EC fragments might be slower than the rate at which
they are drained out of IO send buffers, so IO never blocks. In
extreme cases this could cause timeouts in other greenthreads to pop.
This patch adds a periodic zero-time sleep in the object PUT data
transfer loop. An existing pattern in the GET path is re-used, and
extracted to a new CooperativeIterator helper class.
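The idea of a periodic zero-time sleep can be sketched as below. This
is not the CooperativeIterator added by the patch: the class name, the
period default and the injectable sleep argument are assumptions, and
time.sleep is used only so the example runs without eventlet.

```python
import time


class CooperativeIteratorSketch(object):
    """Hypothetical sketch: yield control once every `period` items."""

    def __init__(self, iterable, period=5, sleep=time.sleep):
        self.wrapped = iter(iterable)
        self.period = period
        # Under eventlet this would be the green sleep; time.sleep is
        # an assumption that keeps the example standalone.
        self.sleep = sleep
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.period:
            # Zero-time sleep: gives other tasks a chance to run.
            self.sleep(0)
            self.count = 0
        self.count += 1
        return next(self.wrapped)


for chunk in CooperativeIteratorSketch([b'a', b'b', b'c'], period=2):
    print(chunk)
```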
Change-Id: Idd6b767f1a746c72c106199f5d1fada3615b1e97
Closes-Bug: #2019955
Related-Change: Iae27109f5a3d109ad21ec9a972e39f22150f6dbb
Previously swift.common.utils monkey patched logging.thread,
logging.threading, and logging._lock upon import with eventlet
threading modules, but that is no longer reasonable or necessary.
With py3, the existing logging._lock is not patched by eventlet,
unless the logging module is reloaded. The existing lock is not
tracked by the gc so would not be found by eventlet's
green_existing_locks().
Instead we group all monkey patching into a utils function and apply
patching consistently across daemons and WSGI servers.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Closes-Bug: #1380815
Change-Id: I6f35ad41414898fb7dc5da422f524eb52ff2940f
... and clean up WatchDog start a little.
If this pattern proves useful we could consider extending it.
Change-Id: Ia85f9321b69bc4114a60c32a7ad082cae7da72b3
The updating shard range cache has been restructured and upgraded to
v2, which only persists the essential attributes in memcache (see
Related-Change). This is the follow-up patch to restructure the
listing shard ranges cache for object listing in the same way.
UpgradeImpact
=============
The cache key for listing shard ranges in memcached is renamed
from 'shard-listing/<account>/<container>' to
'shard-listing-v2/<account>/<container>', and cache data is
changed to be a list of [lower bound, name]. As a result, this
will invalidate all existing listing shard ranges stored in the
memcache cluster.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Related-Change: If98af569f99aa1ac79b9485ce9028fdd8d22576b
Change-Id: I54a32fd16e3d02b00c18b769c6f675bae3ba8e01
Also:
- move some tests to test_utils.TestNamespace
- move ShardName class in file (no change to class)
- move end_marker method from ShardRange to Namespace
Related-Change: If98af569f99aa1ac79b9485ce9028fdd8d22576b
Change-Id: Ibd5614d378ec5e9ba47055ba8b67a42ab7f7453c
Restructure the shard ranges that are stored in memcache for
object updating to only persist the essential attributes of
shard ranges in memcache (lower bounds and names), so the
aggregate of memcache values is much smaller and retrieval
will be much faster too.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
UpgradeImpact
=============
The cache key for updating shard ranges in memcached is renamed
from 'shard-updating/<account>/<container>' to
'shard-updating-v2/<account>/<container>', and cache data is
changed to be a list of [lower bound, name]. As a result, this
will invalidate all existing updating shard ranges stored in the
memcache cluster.
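As an illustration of why [lower bound, name] pairs are sufficient, the
sketch below looks up the shard name for an object from such a list.
It is not the proxy's implementation: find_name is a hypothetical
helper, and it assumes the pairs are sorted by lower bound, that each
range excludes its lower bound, and that each range's upper bound is
the next pair's lower bound.

```python
import bisect


def find_name(cached, obj_name):
    """Return the name of the range containing obj_name, or None.

    `cached` is a list of [lower, name] pairs sorted by lower bound.
    """
    lowers = [lower for lower, _name in cached]
    # Index of the last pair whose lower bound is strictly below obj_name.
    index = bisect.bisect_left(lowers, obj_name) - 1
    if index < 0:
        return None
    return cached[index][1]


cached = [['', '.shards_a/c-0'],
          ['g', '.shards_a/c-1'],
          ['p', '.shards_a/c-2']]
print(find_name(cached, 'apple'))
print(find_name(cached, 'zebra'))
```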
Change-Id: If98af569f99aa1ac79b9485ce9028fdd8d22576b
Add a "use_replication" field to the node dict, and a helper function
that sets the use_replication value on a copy of a node by looking up
the x-backend-use-replication-network header value.
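A helper of this kind might look roughly like the sketch below. The
function name with_use_replication, the TRUE_VALUES set and the use of
a plain dict with lower-case header names are all assumptions made for
illustration.

```python
# Assumed set of strings treated as true; illustrative only.
TRUE_VALUES = {'true', '1', 'yes', 'on', 't', 'y'}


def with_use_replication(node, headers):
    """Return a copy of node with use_replication set from headers."""
    value = headers.get('x-backend-use-replication-network', '')
    # Copy so that the caller's node dict is left unmodified.
    node_copy = dict(node)
    node_copy['use_replication'] = str(value).lower() in TRUE_VALUES
    return node_copy


node = {'ip': '10.0.0.1', 'port': 6200, 'device': 'd1'}
print(with_use_replication(
    node, {'x-backend-use-replication-network': 'true'}))
```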
Change-Id: Ie05af464765dc10cf585be851f462033fc6bdec7
pytest still complains about some 20k warnings, but the vast majority
are actually because of eventlet, and a lot of those will get cleaned up
when upper-constraints picks up v0.33.2.
Change-Id: If48cda4ae206266bb41a4065cd90c17cbac84b7f
We've seen shards become stuck while sharding because they had
incomplete or stale deleted shard ranges. The root container had more
complete and useful shard ranges into which objects could have been
cleaved, but the shard never merged the root's shard ranges.
While the sharder is auditing shard container DBs it would previously
only merge shard ranges fetched from root into the shard DB if the
shard was shrinking or the shard ranges were known to be children of
the shard. With this patch the sharder will now merge other shard
ranges from root during sharding as well as shrinking.
Shard ranges from root are only merged if they would not result in
overlaps or gaps in the set of shard ranges in the shard DB. Shard
ranges that are known to be ancestors of the shard are never merged,
except the root shard range which may be merged into a shrinking
shard. These checks were not previously applied when merging
shard ranges into a shrinking shard.
The two substantive changes with this patch are therefore:
- shard ranges from root are now merged during sharding,
subject to checks.
- shard ranges from root are still merged during shrinking,
but are now subjected to checks.
Change-Id: I066cfbd9062c43cd9638710882ae9bd85a5b4c37
Lines like `Invalid response 500 from ::1` aren't terribly useful in an
all-in-one, while lines like
    Error syncing with node: {'device': 'd5', 'id': 3, 'ip': '::1',
    'meta': '', 'port': 6200, 'region': 1, 'replication_ip': '::1',
    'replication_port': 6200, 'weight': 8000.0, 'zone': 1, 'index': 0}:
    Timeout (60s)
are needlessly verbose.
While we're at it, introduce a node_to_string() helper, and use it in a
bunch of places.
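A compact node description could be produced along these lines. This is
a sketch only: the function name, the replication argument and the
"ip:port/device" output format are assumptions, not necessarily what
node_to_string() emits.

```python
def node_to_string_sketch(node, replication=False):
    """Format a node dict as ip:port/device (assumed format)."""
    ip = node['replication_ip' if replication else 'ip']
    port = node['replication_port' if replication else 'port']
    if ':' in ip:
        # Bracket IPv6 addresses so the port separator stays unambiguous.
        ip = '[%s]' % ip
    return '%s:%s/%s' % (ip, port, node['device'])


node = {'device': 'd5', 'id': 3, 'ip': '::1', 'port': 6200,
        'replication_ip': '::1', 'replication_port': 6200}
print(node_to_string_sketch(node))
```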
Change-Id: I62b12f69e9ac44ce27ffaed320c0a3563673a018
Adds an is_child_of method that infers the parent-child relationship
of two shard ranges from their names. This new method is limited to
use only under the same account.
Co-Authored-By: Jianjian Huo <jhuo@nvidia.com>
Change-Id: Iac3a8ec5d8947989b64aa27f40caa3d8d1423a7c
The setDaemon method of the threading.Thread was deprecated
in Python 3.10 (*).
Replace the setDaemon method with the daemon property.
*: https://docs.python.org/3.10/library/threading.html#threading.Thread.setDaemon
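A minimal example of the replacement, using only the standard library
(the run function is a placeholder):

```python
import threading


def run():
    """Placeholder thread target."""
    pass


thread = threading.Thread(target=run)
# Deprecated since Python 3.10: thread.setDaemon(True)
thread.daemon = True
thread.start()
thread.join()
```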
Change-Id: Ic854dc3c393d382a8acd20d89f56bff198a2ec5e
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
We've known this would eventually be necessary for a while [1], and
way back in 2017 we started seeing SHA-1 collisions [2].
This patch follows the approach of soft deprecation of SHA1 in tempurl.
It's still a default digest, but we'll start by warning when the
middleware is loaded and by exposing any deprecated digests
(if they're still allowed) in /info.
Further, because there is much shared code between formpost and tempurl, this
patch also goes and refactors shared code out into swift.common.digest.
Now that we have a digest module, we also move digest-related code there:
- get_hmac
- extract_digest_and_algorithm
[1] https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html
[2] https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
Change-Id: I581cadd6bc79e623f1dae071025e4d375254c1d9
SHA1 has been known to be deprecated for a while, so allow the formpost
middleware to use SHA256 and SHA512. Follow the tempurl model and
accept signatures of the form:
    <hex-encoded signature>
or
    sha1:<base64-encoded signature>
    sha256:<base64-encoded signature>
    sha512:<base64-encoded signature>
where the base64-encoding can be either standard or URL-safe, and the
trailing '=' chars may be stripped off.
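The accepted forms can be parsed roughly as sketched below. This is
not the function added by the patch: parse_signature and HEX_LENGTHS
are hypothetical names, and inferring the digest from the length of a
hex signature is an assumption of this example.

```python
import base64
import binascii

# Hex digest lengths for each supported algorithm.
HEX_LENGTHS = {40: 'sha1', 64: 'sha256', 128: 'sha512'}


def parse_signature(value):
    """Return (algorithm, hex_digest); raise ValueError if malformed."""
    if ':' in value:
        algorithm, encoded = value.split(':', 1)
        if algorithm not in ('sha1', 'sha256', 'sha512'):
            raise ValueError('unsupported digest: %r' % algorithm)
        # Map the URL-safe alphabet onto the standard one, then
        # restore any '=' padding that was stripped.
        encoded = encoded.replace('-', '+').replace('_', '/')
        encoded += '=' * (-len(encoded) % 4)
        try:
            raw = base64.b64decode(encoded, validate=True)
        except binascii.Error:
            raise ValueError('bad base64 signature')
        return algorithm, binascii.hexlify(raw).decode('ascii')
    try:
        algorithm = HEX_LENGTHS[len(value)]
        binascii.unhexlify(value)  # raises if value is not valid hex
    except (KeyError, ValueError):
        raise ValueError('bad hex signature')
    return algorithm, value.lower()
```

The sketch does not check that a base64 signature has the right length
for the named digest; a real implementation would likely want to.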
As part of this, pull the signature-parsing out to a new function, and
add detection for hex-encoded sha512 signatures to tempurl.
Change-Id: Iaba3725551bd47d75067a634a7571485b9afa2de
Related-Change: Ia9dd1a91cc3c9c946f5f029cdefc9e66bcf01046
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Closes-Bug: #1794601
Previously, we always needed to retry test_statsd_set_prefix_deprecation.
This was because the warning would be triggered in
test_get_logger_statsd_client_non_defaults and recorded in the
module-level warnings registry. Now, explicitly clear the warning
registry. From the docs [0]:
> One thing to be aware of is that if a warning has already been raised
> because of a once/default rule, then no matter what filters are set
> the warning will not be seen again unless the warnings registry
> related to the warning has been cleared.
[0] https://docs.python.org/3/library/warnings.html#testing-warnings
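Clearing a module-level registry can be done along these lines. The
helper name clear_warning_registry is hypothetical; the only thing it
relies on is the __warningregistry__ dict that the warnings module
creates in a module's globals.

```python
import types


def clear_warning_registry(module):
    """Forget which warnings have already been recorded for module."""
    registry = getattr(module, '__warningregistry__', None)
    if registry is not None:
        registry.clear()


# Demonstration with a throwaway module object standing in for the
# module whose warning must be seen again.
demo = types.ModuleType('demo')
demo.__warningregistry__ = {('old', DeprecationWarning, 1): True}
clear_warning_registry(demo)
print(demo.__warningregistry__)
```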
Change-Id: Icf4b381dcc04d04b5401e5ed3f43df049c1dd2b4