The repo supports both Python 2 and 3 now, so update hacking to
version 2.0, which supports both. Note that the latest hacking
release, 3.0, only supports Python 3.
Fix problems found.
Remove hacking and friends from lower-constraints; they are not needed
for installation.
Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
The contextmanager eventlet.timeout.Timeout schedules a call to throw
an exception every time it is entered. The swift-proxy uses
Chunk(Read|Write)Timeout for every chunk read/written from the client or
object-server. For a single upload/download of a big object, that means
tens of thousands of scheduled calls in eventlet, which is very costly.
This patch replaces these context managers with a watchdog greenthread
that schedules itself by sleeping until the next timeout expiration.
Only if a timeout has actually expired does it schedule a call to throw
the appropriate exception.
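A minimal sketch of the idea (names and details here are illustrative,
not Swift's actual Watchdog implementation): registering and cancelling
a timeout are plain dict operations, and only the single watchdog
greenthread ever sleeps or schedules anything:

    import time

    import eventlet
    from eventlet import greenthread, hubs

    class Watchdog(object):
        def __init__(self):
            self._timeouts = {}   # key -> (deadline, exc class, waiter)
            self._next_key = 0
            eventlet.spawn(self._run)

        def start(self, timeout, exc_cls):
            # Registering is just a dict insert; nothing is scheduled.
            self._next_key += 1
            self._timeouts[self._next_key] = (
                time.time() + timeout, exc_cls, greenthread.getcurrent())
            return self._next_key

        def stop(self, key):
            # Cancelling (the overwhelmingly common case) is as cheap.
            self._timeouts.pop(key, None)

        def _run(self):
            while True:
                now = time.time()
                for key, (deadline, exc_cls, waiter) in list(
                        self._timeouts.items()):
                    if deadline <= now:
                        del self._timeouts[key]
                        # Only an *expired* timeout costs a scheduled
                        # call to throw into the waiting greenthread.
                        hubs.get_hub().schedule_call_global(
                            0, waiter.throw, exc_cls())
                deadlines = [d for d, _, _ in self._timeouts.values()]
                eventlet.sleep(max(min(deadlines) - now, 0)
                               if deadlines else 0.1)

Each chunk read/write then brackets itself with key = watchdog.start(
timeout, ChunkReadTimeout) ... watchdog.stop(key); since most chunks
finish in time, most timeouts are cancelled without eventlet ever
scheduling a throw.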
The gain in bandwidth and CPU usage is significant. In a benchmark
environment, uploading a 6 GB object on a replica policy gave these
results (average of 3 runs):
master: 5.66 Gbps / 849 jiffies consumed by the proxy-server
this patch: 7.56 Gbps / 618 jiffies consumed by the proxy-server
Change-Id: I19fd42908be5a6ac5905ba193967cd860cb27a0b
Previously, logs would often show 499s in places where some other status
would be more appropriate.
Change-Id: I68dbb8593101cd3b5b64a1a947c68e340e36ce02
This allows static symlinks to be synced before their target. Dynamic
symlinks could already be synced even if the target object had not been
synced, but static links previously required that the target object
exist before they could be PUT. Now, have the container_sync middleware
plumb in an override like it does for SLO.
Change-Id: I3bfc62b77b247003adcee6bd4d374168bfd6707d
If we move it to constraints, it's more globally accessible in our
code; but more importantly, it's more obvious to ops that everything
breaks if you try to misconfigure different values per-service.
Change-Id: Ib8f7d08bc48da12be5671abe91a17ae2b49ecfee
Previously, a newline in a container name would cause the sharder to
fail on Python 2.7.10+ [0], complaining about invalid header values when
trying to create the shard containers. On older versions of Python, it
would most likely cause a parsing error in the container-server that was
trying to handle the PUT.
Now, quote all places that we pass around container paths. This includes:
* The X-Container-Sysmeta-Shard-(Quoted-)Root sent when creating the (empty)
remote shards
* The X-Container-Sysmeta-Shard-(Quoted-)Root included when initializing the
local handoff for cleaving
* The X-Backend-(Quoted-)Container-Path the proxy sends to the object-server
for container updates
* The Location header the container-server sends to the object-updater
Note that a new header was required in requests so that servers would
know whether the value should be unquoted or not. We can get away with
reusing Location in responses by having clients opt-in to quoting with
a new X-Backend-Accept-Quoted-Location header.
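On the response side, the opt-in check might look like this hedged
sketch (maybe_quoted_location is a hypothetical helper, not the actual
container-server code):

    from six.moves.urllib.parse import quote
    from swift.common.utils import config_true_value

    def maybe_quoted_location(req, container_path):
        # New clients opt in to quoting; old clients keep getting the
        # raw path in Location, same as before the upgrade.
        if config_true_value(req.headers.get(
                'X-Backend-Accept-Quoted-Location')):
            return quote(container_path)
        return container_path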
During a rolling upgrade,
* old object-servers servicing requests from new proxy-servers will
not know about the container path override and so will try to update
the root container,
* in general, object updates are more likely to land in the root
container; the sharder will deal with them as misplaced objects, and
* shard containers created by new code on servers running old code
will think they are root containers until the server is running new
code, too; during this time they'll fail the sharder audit and report
stats to their account, but both of these should get cleared up upon
upgrade.
Drive-by: fix a "conainer_name" typo that prevented us from testing that
we can shard a container with unicode in its name. Also, add more UTF8
probe tests.
[0] See https://bugs.python.org/issue22928
Change-Id: Ie08f36e31a448a547468dd85911c3a3bc30e89f1
Closes-Bug: 1856894
Since we don't use 404s from handoffs anymore, we need to make sure
that errors on handoffs don't overwhelm primary responses either.
Change-Id: I2624e113c9d945542f787e5f18f487bd7be3d32e
Closes-Bug: #1857909
Otherwise, we waste a request on some 416/206 response that won't be
helpful.
To do this, add a new X-Backend-Ignore-Range-If-Metadata-Present header
whose value is a comma-separated list of header names. Middlewares may
include this header to tell object-servers to send the whole object
(rather than a 206 or 416) if *any* of the metadata are present.
Have dlo and symlink use it, too; it won't save us any round-trips, but
it should clean up some object-server logging.
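For instance, symlink's backend GET could opt in like this hedged
sketch (the request path is illustrative; the sysmeta name is the one
symlink already uses):

    from swift.common.swob import Request

    req = Request.blank('/v1/AUTH_test/c/maybe-a-symlink',
                        headers={'Range': 'bytes=0-99'})
    # If this turns out to be a symlink, skip the Range handling and
    # send the whole (tiny) symlink object in one round-trip.
    req.headers['X-Backend-Ignore-Range-If-Metadata-Present'] = \
        'X-Object-Sysmeta-Symlink-Target'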
Change-Id: I4ff2a178d0456e7e37d561109ef57dd0d92cbd4e
Reserve the namespace starting with the NULL byte for internal
use-cases. Backend services will allow path names to include the NULL
byte in urls and validate names in the reserved namespace. Database
services will filter all names starting with the NULL byte from
responses unless the request includes the header:
X-Backend-Allow-Reserved-Names: true
The proxy server will not allow path names to include the NULL byte in
urls unless a middleware has set the X-Backend-Allow-Reserved-Names
header. Middlewares can use the reserved namespace to create objects
and containers that cannot be directly manipulated by clients. Any
objects and bytes created in the reserved namespace will be aggregated
to the user's account totals.
When deploying internal proxies, developers and operators may configure
the gatekeeper middleware to translate the X-Allow-Reserved-Names header
to the Backend header so they can manipulate the reserved namespace
directly through the normal API.
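A hedged sketch of how a middleware might use this (assuming the
get_reserved_name helper in swift.common.request_helpers; the paths
are illustrative):

    from swift.common.request_helpers import get_reserved_name
    from swift.common.swob import Request

    def put_hidden_object(app, account, container, obj, body):
        # get_reserved_name prefixes each component with a NULL byte,
        # so clients can't name (or even list) the result directly.
        path = '/v1/%s/%s/%s' % (
            account, get_reserved_name(container), get_reserved_name(obj))
        req = Request.blank(
            path, method='PUT', body=body,
            headers={'X-Backend-Allow-Reserved-Names': 'true'})
        return req.get_response(app)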
UpgradeImpact: it's not safe to rollback from this change
Change-Id: If912f71d8b0d03369680374e8233da85d8d38f85
The test test_PUT_ec_fragment_quorum_archive_etag_mismatch
busts the md5 in server.py, so it ends up damaging the md5 of the
footers instead of the fragment archive. It appears that the
intention of the test was to check the integrity verification
of the fragment archive, so change the test to bust diskfile.py
instead.
Change-Id: I54a203bb637d5f5814e8df2b4297758b0b72adac
A Python 3 bug causes us to abort header parsing in some cases. We
mostly worked around that in the related change, but that was *after*
eventlet used the parsed headers to determine things like message
framing. As a result, a client sending a malformed request (for example,
sending both Content-Length *and* Transfer-Encoding: chunked headers)
might have that request parsed properly and authorized by a proxy-server
running Python 2, but the proxy-to-backend request could get misparsed
if the backend is running Python 3. As a result, the single client
request could be interpreted as multiple requests by an object server,
only the first of which was properly authorized at the proxy.
Now, after we find and parse additional headers that weren't parsed by
Python, fix up eventlet's wsgi.input to reflect the message framing we
expect given the complete set of headers. As an added precaution, if the
client included Transfer-Encoding: chunked *and* a Content-Length,
ensure that the Content-Length is not forwarded to the backend.
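The precaution amounts to something like this sketch (illustrative,
not the exact proxy code):

    def safe_backend_headers(client_headers):
        # RFC 7230 sec. 3.3.3: when both are present, Transfer-Encoding
        # wins; forwarding Content-Length as well invites the backend
        # to frame the body differently than the proxy did.
        headers = dict(client_headers)
        if 'chunked' in headers.get('Transfer-Encoding', '').lower():
            headers.pop('Content-Length', None)
        return headers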
Change-Id: I70c125df70b2a703de44662adc66f740cc79c7a9
Related-Change: I0f03c211f35a9a49e047a5718a9907b515ca88d7
Closes-Bug: 1840507
Previously we'd
- complain that a client disconnected even though they finished their
chunked transfer just fine, and
- on EC, send an X-Backend-Obj-Content-Length for pre-allocation even
though Content-Length doesn't determine request body size.
Change-Id: Ia80e595f713695cbb41dab575963f2cb9bebfa09
Related-Bug: 1840507
Previously, our unit tests with socket servers would let eventlet
capitalize headers on the way out, which
- isn't something we want to have eventlet do, because it
- breaks unicode-in-header-names on py3, so it
- is already disabled in swift.common.wsgi.run_server() for real servers.
Include a test to make sure we don't forget about it in the future.
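For reference, the knob lives on eventlet's wsgi server; a minimal
sketch of running with it disabled (the app here is illustrative):

    import eventlet
    import eventlet.wsgi

    def app(environ, start_response):
        # Only with capitalization off do header names pass through
        # verbatim; otherwise eventlet title-cases them, which breaks
        # non-ASCII header names on py3.
        start_response('200 OK', [('X-Object-Meta-Née', 'yes')])
        return [b'']

    sock = eventlet.listen(('127.0.0.1', 0))
    eventlet.wsgi.server(sock, app, capitalize_response_headers=False)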
Change-Id: I0156d0059092ed414b296c65fb70fc18533b074a
We were playing a little fast & loose with types before; as a result,
marker/end_marker weren't quite working right. In particular, we were
checking whether a WSGI string was contained in a shard range, while
ShardRange assumes all comparisons are against native strings.
Now, get everything to native strings before making comparisons, and
get them back to wsgi when we shove them in the params dict.
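Under PEP 3333, a py3 WSGI string is unicode carrying latin-1-encoded
bytes, so the round-trip looks like this sketch (Swift has helpers
along these lines in swob):

    import six

    def wsgi_to_str(wsgi_str):
        # py2 native strings already are bytes; on py3, unwrap the
        # latin-1 smuggling and decode the real UTF-8 value.
        if six.PY2:
            return wsgi_str
        return wsgi_str.encode('latin-1').decode('utf-8')

    def str_to_wsgi(native_str):
        if six.PY2:
            return native_str
        return native_str.encode('utf-8').decode('latin-1')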
Change-Id: Iddf9e089ef95dc709ab76dc58952a776246991fd
It's probably weird that StreamingPile has this interface that swallows
exceptions, but this seems better than hanging.
Change-Id: I8fe45c0f0d291efc84f3edf5d6b7cd116b5c7835
We previously realized we needed to do that for accounts and containers
where the consequences of treating the 404 as authoritative were more
obvious: we'd cache the non-existence, which prevented writes until it
fell out of cache.
The same basic logic applies for objects, though: if we see
(Timeout, Timeout, Timeout, 404, 404, 404)
on a triple-replica policy, we don't really have any reason to think
that a 404 is appropriate. In fact, it seems reasonably likely that
there's a thundering-herd problem where there are too many concurrent
requests for data that *definitely is there*. By responding with a 503,
we apply some back-pressure to clients, who hopefully have some
exponential backoff in their retries.
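A hedged sketch of the decision (None stands in for a Timeout;
best_response is a stand-in for the proxy's existing selection logic):

    def final_status(primary_statuses, handoff_statuses, best_response):
        primaries = [s for s in primary_statuses if s is not None]
        if not primaries and set(handoff_statuses) <= {None, 404}:
            # e.g. (Timeout, Timeout, Timeout, 404, 404, 404): no
            # primary ever said the object is gone, so the 404s
            # aren't authoritative; a 503 applies back-pressure.
            return 503
        return best_response(primaries + [
            s for s in handoff_statuses if s is not None])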
The situation gets a bit more complicated with erasure-coded data, but
the same basic principle applies. We're just more likely to have
confirmation that there *is* data out there; we just can't reconstruct
it (right now).
Note that we *still want to check* those handoffs, of course. Our
fail-in-place strategy has us replicate (and, more recently,
reconstruct) to handoffs to maintain durability; it'd be silly *not* to
look.
UpgradeImpact:
--------------
Be aware that this may cause an increase in 503 Service Unavailable
responses served by proxy-servers. However, this should more accurately
reflect the state of the system.
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Change-Id: Ia832e9bab13167948f01bc50aa8a61974ce189fb
Closes-Bug: #1837819
Related-Bug: #1833612
Related-Change: I53ed04b5de20c261ddd79c98c629580472e09961
Related-Change: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
Previously, we issued a GET to the root container for every object PUT,
POST, and DELETE. This puts load on the container server, potentially
leading to timeouts, error limiting, and erroneous 404s (!).
Now, cache the complete set of 'updating' shards, and find the shard for
this particular update in the proxy. Add a new config option,
recheck_updating_shard_ranges, to control the cache time; it defaults to
one hour. Set to 0 to fall back to previous behavior.
Note that we should be able to tolerate stale shard data just fine; we
already have to worry about async pendings that got written down with
one shard but may not get processed until that shard has itself sharded
or shrunk into another shard.
Also note that memcache has a default value limit of 1MiB, which may be
exceeded if a container has thousands of shards. In that case, set()
will act like a delete(), causing increased memcache churn but otherwise
preserving existing behavior. In the future, we may want to add support
for gzipping the cached shard ranges as they should compress well.
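A hedged sketch of the lookup (the cache-key format and fetch callback
are illustrative, not the actual proxy code):

    from swift.common.utils import ShardRange

    def updating_shard_for(memcache, account, container, obj_name,
                           backend_fetch, recheck_time=3600):
        key = 'shard-updating/%s/%s' % (account, container)
        cached = memcache.get(key)
        if cached is None:
            # One GET to the root refills the cache for a whole
            # recheck_updating_shard_ranges period of object updates.
            cached = backend_fetch()   # -> list of ShardRange dicts
            memcache.set(key, cached, time=recheck_time)
        for sr_dict in cached:
            sr = ShardRange.from_dict(sr_dict)
            if obj_name in sr:         # ShardRange.__contains__
                return sr
        return None   # fall back to updating the root container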
Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
Closes-Bug: 1781291
Sometimes we want 404, sometimes we want 503 - it's tricky
Change-Id: I30f5af07e2e1fc7cbb6bdb1c334a0a161caf0906
Related-Change: I53ed04b5de20c261ddd79c98c629580472e09961
Previously, we stored the WSGI strings in memcached and returned them when
responding to get_account/container_info calls. This would lead to cache
corruption in a heterogeneous py2/py3 cluster such as you would have during
a rolling upgrade.
Now, only store and return native strings.
Change-Id: I8d6f66dfe846493972e433f70bad76a33d204562
Instead of taking an X-Backend-Allow-Method that *must match* the
REQUEST_METHOD, take a truish X-Backend-Allow-Private-Methods and
expand the set of allowed methods. This allows us to also expose
the full list of available private methods when returning a 405.
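A hedged sketch of opting in (the path and the example verb are
illustrative):

    from swift.common.swob import Request

    req = Request.blank('/sda1/0/AUTH_test/c',
                        environ={'REQUEST_METHOD': 'UPDATE'},
                        headers={'X-Backend-Allow-Private-Methods': 't'})
    # Without the header the server 405s, and that 405's Allow header
    # can now advertise the full set of private methods.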
Drive-By: make async-delete tests a little more robust:
* check that end_marker and prefix are preserved on subsequent
listings
* check that objects with a leading slash are correctly handled
Change-Id: I5542623f16e0b5a0d728a6706343809e50743f73
Adds a tool, swift-container-deleter, that takes an account/container
and optional prefix, marker, and/or end-marker; spins up an internal
client; makes listing requests against the container; and pushes the
found objects into the object-expirer queue with a special
application/async-deleted content-type.
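Invocation looks something like swift-container-deleter AUTH_test logs
--prefix 2019/ (flag spellings assumed). Internally it boils down to
roughly this hedged sketch (conf path and enqueueing details are
illustrative):

    from swift.common.internal_client import InternalClient

    swift = InternalClient('/etc/swift/internal-client.conf',
                           'Swift Container Deleter', 3)
    names = [obj['name'] for obj in swift.iter_objects(
        'AUTH_test', 'logs', prefix='2019/')]
    # Each name is then written into the expirer queue with the
    # application/async-deleted content-type for the expirer to find.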
In order to do this enqueuing efficiently, a new internal-to-the-cluster
container method is introduced: UPDATE. It takes a JSON list of object
entries and runs them through merge_items.
The object-expirer is updated to look for work items with this
content-type and skip the X-If-Deleted-At check that it would normally
do.
Note that the target-container's listing will continue to show the
objects until data is actually deleted, bypassing some of the concerns
raised in the related change about clearing out a container entirely and
then deleting it.
Change-Id: Ia13ee5da3d1b5c536eccaadc7a6fdcd997374443
Related-Change: I50e403dee75585fc1ff2bb385d6b2d2f13653cf8