702 Commits

Author SHA1 Message Date
Clay Gerrard
fc731198ac Quiet eventlet exceptions in test
This traceback would probably show up in the wild, but it's not relevant
to this test.

Change-Id: I9e6e7679f674bddddcc4440b38d5741aaece3393
2020-06-05 09:49:01 -05:00
Clay Gerrard
ede9dad9f6 Better functest quarantine cleanup
Change-Id: I9218aaeb5fcd21f1bc2a5d655e3216059a209aeb
2020-06-01 14:43:05 +00:00
Tim Burke
4c8512afb1 Use separate name for HeaderKeyDict var vs list of response headers
Closes-Bug: #1875538
Change-Id: I1bcef61157594329f6978f7380bf7293aa1ca65e
Related-Change: Ia832e9bab13167948f01bc50aa8a61974ce189fb
2020-04-28 07:48:27 -07:00
Zuul
3cceec2ee5 Merge "Update hacking for Python3" 2020-04-09 15:05:28 +00:00
Zuul
62fc62bb12 Merge "py3: stop barfing on message/rfc822 Content-Types" 2020-04-07 08:59:05 +00:00
Andreas Jaeger
96b56519bf Update hacking for Python3
The repo uses both Python 2 and 3 now, so update hacking to
version 2.0, which supports Python 2 and 3. Note that the latest hacking
release, 3.0, only supports Python 3.

Fix problems found.

Remove hacking and friends from lower-constraints, they are not needed
for installation.

Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
2020-04-03 21:21:07 +02:00
Romain LE DISEZ
8378a11d11 Replace all "with Chunk*Timeout" by a watchdog
The context manager eventlet.timeout.Timeout schedules a call to
throw an exception every time it is entered. The swift-proxy uses
Chunk(Read|Write)Timeout for every chunk read/written from the client or
object-server. For a single upload/download of a big object, that means
tens of thousands of scheduling calls in eventlet, which is very costly.

This patch replaces the usage of these context managers with a watchdog
greenthread that schedules itself by sleeping until the next timeout
expiration. Only if a timeout has expired does it schedule a call to
throw the appropriate exception.

The gain in bandwidth and CPU usage is significant. On a benchmark
environment, it gave this result for an upload of 6 Gbps on a replica
policy (average of 3 runs):
    master: 5.66 Gbps / 849 jiffies consumed by the proxy-server
    this patch: 7.56 Gbps / 618 jiffies consumed by the proxy-server

Change-Id: I19fd42908be5a6ac5905ba193967cd860cb27a0b
2020-04-02 07:38:47 -04:00
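
The watchdog idea described in 8378a11d11 can be sketched roughly as follows. This is not the code from the patch: the Watchdog class, its method names, and the explicit `now` argument are assumptions made for illustration, and plain Python stands in for an eventlet greenthread. The point is that starting or stopping a timeout is only bookkeeping; a single poller sleeps until `next_deadline()` and raises only for entries that actually expired.

```python
import heapq
import itertools


class Watchdog:
    """Track many deadlines so one poller can replace one timer per chunk."""

    def __init__(self):
        self._heap = []      # (deadline, key) pairs, earliest first
        self._active = {}    # key -> exception to raise on expiry
        self._keys = itertools.count()

    def start(self, now, timeout, exc):
        # Registering a timeout is a heap push, not a scheduler call.
        key = next(self._keys)
        self._active[key] = exc
        heapq.heappush(self._heap, (now + timeout, key))
        return key

    def stop(self, key):
        # Cancelling is a dict removal; the heap entry is skipped lazily.
        self._active.pop(key, None)

    def next_deadline(self):
        # Drop cancelled entries, then report when the poller must wake up.
        while self._heap and self._heap[0][1] not in self._active:
            heapq.heappop(self._heap)
        return self._heap[0][0] if self._heap else None

    def expired(self, now):
        # Collect the exceptions whose deadline has passed.
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, key = heapq.heappop(self._heap)
            exc = self._active.pop(key, None)
            if exc is not None:
                fired.append(exc)
        return fired
```

In the common case every chunk finishes in time, so `stop()` runs before the deadline and the poller never has to raise anything.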
Tim Burke
04cc11b938 py3: stop barfing on message/rfc822 Content-Types
Closes-Bug: #1863053
Change-Id: I7493d3e201e26df9f200e16bc081d8a0f30308b9
2020-03-26 12:55:54 -07:00
Zuul
71cc368179 Merge "sharding: filter shards based on prefix param when listing" 2020-03-05 06:53:05 +00:00
Zuul
efe084adab Merge "Use float consistently for proxy timeout settings" 2020-03-04 07:29:42 +00:00
Clay Gerrard
dc40779307 Use float consistently for proxy timeout settings
Change-Id: I433c97df99193ec31c863038b9b6fd20bb3705b8
2020-03-02 10:44:48 -06:00
Tim Burke
2a8d47f00e middlewares: Clean up app iters better
Previously, logs would often show 499s in places where some other status
would be more appropriate.

Change-Id: I68dbb8593101cd3b5b64a1a947c68e340e36ce02
2020-02-12 21:27:15 -08:00
Tim Burke
09b7ed600b sharding: filter shards based on prefix param when listing
Otherwise, we make a bunch of backend requests where we have no
real expectation of finding data.

Change-Id: I7eaa012ba938eaa7fc22837c32007d1b7ae99709
2020-02-05 08:36:27 -08:00
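
The filtering in 09b7ed600b can be illustrated with a small sketch. It assumes each shard covers the half-open name range (lower, upper], with an empty string meaning "unbounded" on that side; the function names and the tuple layout are hypothetical and are not taken from the patch.

```python
def may_contain_prefix(lower, upper, prefix):
    """Could a shard covering (lower, upper] hold a name starting with prefix?

    An empty string for lower or upper means that side is unbounded.
    """
    if upper and upper < prefix:
        return False  # every name in the shard sorts before the prefix
    if lower and lower >= prefix and not lower.startswith(prefix):
        return False  # every name in the shard sorts after the prefixed names
    return True


def filter_shards(shards, prefix):
    # shards is a list of (lower, upper, shard_name) tuples.
    return [s for s in shards if may_contain_prefix(s[0], s[1], prefix)]
```

Only the shards that survive the filter need a backend listing request.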
Thiago da Silva
26ff2eb1cb container-sync: Sync static links similar to how we sync SLOs
This allows static symlinks to be synced before their target. Dynamic
symlinks could already be synced even if the target object had not been
synced, but static links previously required that the target object exist
before the link could be PUT. Now, have the container_sync middleware plumb
in an override like it does for SLO.

Change-Id: I3bfc62b77b247003adcee6bd4d374168bfd6707d
2020-01-24 17:15:57 -08:00
Zuul
68924d920c Merge "Have slo tell the object-server that it wants whole manifests" 2020-01-18 13:31:32 +00:00
Zuul
7eccca9344 Merge "Early-return on non-Swift get_info requests" 2020-01-17 00:06:14 +00:00
Zuul
e32689a96d Merge "Deprecate per-service auto_create_account_prefix" 2020-01-07 01:30:20 +00:00
Zuul
fb538a9afe Merge "sharding: Better-handle newlines in container names" 2020-01-05 20:02:30 +00:00
Clay Gerrard
4601548dab Deprecate per-service auto_create_account_prefix
If we move it to constraints it's more globally accessible in our code,
but more importantly it's more obvious to ops that everything breaks if
you try to mis-configure different values per-service.

Change-Id: Ib8f7d08bc48da12be5671abe91a17ae2b49ecfee
2020-01-05 09:53:30 -06:00
Tim Burke
3f88907012 sharding: Better-handle newlines in container names
Previously, if you were on Python 2.7.10+ [0], such a newline would cause the
sharder to fail, complaining about invalid header values when trying to create
the shard containers. On older versions of Python, it would most likely cause a
parsing error in the container-server that was trying to handle the PUT.

Now, quote all places that we pass around container paths. This includes:

  * The X-Container-Sysmeta-Shard-(Quoted-)Root sent when creating the (empty)
    remote shards
  * The X-Container-Sysmeta-Shard-(Quoted-)Root included when initializing the
    local handoff for cleaving
  * The X-Backend-(Quoted-)Container-Path the proxy sends to the object-server
    for container updates
  * The Location header the container-server sends to the object-updater

Note that a new header was required in requests so that servers would
know whether the value should be unquoted or not. We can get away with
reusing Location in responses by having clients opt-in to quoting with
a new X-Backend-Accept-Quoted-Location header.

During a rolling upgrade,

  * old object-servers servicing requests from new proxy-servers will
    not know about the container path override and so will try to update
    the root container,
  * in general, object updates are more likely to land in the root
    container; the sharder will deal with them as misplaced objects, and
  * shard containers created by new code on servers running old code
    will think they are root containers until the server is running new
    code, too; during this time they'll fail the sharder audit and report
    stats to their account, but both of these should get cleared up upon
    upgrade.

Drive-by: fix a "conainer_name" typo that prevented us from testing that
we can shard a container with unicode in its name. Also, add more UTF8
probe tests.

[0] See https://bugs.python.org/issue22928

Change-Id: Ie08f36e31a448a547468dd85911c3a3bc30e89f1
Closes-Bug: 1856894
2020-01-03 16:04:57 -08:00
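
The quoted/unquoted header pairing described in 3f88907012 can be sketched as below. The header names come from the commit message; the helper functions are hypothetical, and the real patch may decide when to send each header differently. The sketch only shows why a separate "Quoted" header lets a receiver know whether to unquote.

```python
from urllib.parse import quote, unquote


def quoted_path_headers(path):
    # Percent-encode the path so a newline never appears in a header value.
    return {"X-Backend-Quoted-Container-Path": quote(path)}


def read_container_path(headers):
    # Prefer the quoted header; fall back to the older unquoted one.
    quoted = headers.get("X-Backend-Quoted-Container-Path")
    if quoted is not None:
        return unquote(quoted)
    return headers.get("X-Backend-Container-Path")
```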
Zuul
f73a190837 Merge "Use less responses from handoffs" 2020-01-03 03:13:58 +00:00
Clay Gerrard
286082222d Use less responses from handoffs
Since we don't use 404s from handoffs anymore, we need to not let errors
on handoffs overwhelm primary responses either

Change-Id: I2624e113c9d945542f787e5f18f487bd7be3d32e
Closes-Bug: #1857909
2020-01-02 16:44:05 -08:00
Tim Burke
e8b654f318 Have slo tell the object-server that it wants whole manifests
Otherwise, we waste a request on some 416/206 response that won't be
helpful.

To do this, add a new X-Backend-Ignore-Range-If-Metadata-Present header
whose value is a comma-separated list of header names. Middlewares may
include this header to tell object-servers to send the whole object
(rather than a 206 or 416) if *any* of the metadata are present.

Have dlo and symlink use it, too; it won't save us any round-trips, but
it should clean up some object-server logging.

Change-Id: I4ff2a178d0456e7e37d561109ef57dd0d92cbd4e
2020-01-02 15:48:39 -08:00
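
A minimal sketch of the check that e8b654f318 describes, assuming request headers and object metadata are plain dicts. The function name is hypothetical; only the header name and the "comma-separated list, any match" semantics come from the commit message.

```python
def should_ignore_range(request_headers, object_metadata):
    """Return True if the whole object should be sent despite a Range header."""
    raw = request_headers.get("X-Backend-Ignore-Range-If-Metadata-Present", "")
    wanted = {name.strip().lower() for name in raw.split(",") if name.strip()}
    present = {name.lower() for name in object_metadata}
    # Any single listed metadata key being present is enough.
    return bool(wanted & present)
```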
Tim Burke
d246bf20ed sharding: Tolerate blank limits when listing
Otherwise, we can 500 with

   ValueError: invalid literal for int() with base 10: ''

Change-Id: I35614aa4b42e61d97929579dcb16f7dfc9fef96f
2019-12-19 22:27:27 -08:00
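
The failure fixed by d246bf20ed is the usual int('') problem. A tolerant parser could look like the sketch below; the function name and the default of 10000 are assumptions for illustration, not values taken from the patch.

```python
def parse_limit(raw, default=10000):
    """Parse a listing limit, treating a missing or blank value as the default."""
    if raw is None or not raw.strip():
        return default
    return int(raw)
```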
Tim Burke
b65d8b10c5 Early-return on non-Swift get_info requests
Change-Id: Iadc61a1c3bcbfbc47f65ec65df36d8da3694ee74
2019-12-14 01:38:31 +00:00
Clay Gerrard
698717d886 Allow internal clients to use reserved namespace
Reserve the namespace starting with the NULL byte for internal
use-cases.  Backend services will allow path names to include the NULL
byte in urls and validate names in the reserved namespace.  Database
services will filter all names starting with the NULL byte from
responses unless the request includes the header:

    X-Backend-Allow-Reserved-Names: true

The proxy server will not allow path names to include the NULL byte in
urls unless a middleware has set the X-Backend-Allow-Reserved-Names
header.  Middlewares can use the reserved namespace to create objects
and containers that can not be directly manipulated by clients.  Any
objects and bytes created in the reserved namespace will be aggregated
to the user's account totals.

When deploying internal proxies, developers and operators may configure
the gatekeeper middleware to translate the X-Allow-Reserved-Names header
to the Backend header so they can manipulate the reserved namespace
directly through the normal API.

UpgradeImpact: it's not safe to rollback from this change

Change-Id: If912f71d8b0d03369680374e8233da85d8d38f85
2019-11-27 11:22:00 -06:00
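
The listing filter described in 698717d886 can be sketched as follows. The header name and the NULL-byte prefix come from the commit message; the function name and the simple "true" comparison are assumptions, and the real code may accept other truthy spellings.

```python
RESERVED_PREFIX = "\x00"


def visible_names(names, headers):
    """Hide reserved-namespace names unless the backend header allows them."""
    allow = headers.get("X-Backend-Allow-Reserved-Names", "").lower() == "true"
    if allow:
        return list(names)
    return [n for n in names if not n.startswith(RESERVED_PREFIX)]
```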
Tim Burke
d270596b67 Consistently use io.BytesIO
Change-Id: Ic41b37ac75b5596a8307c4962be86f2a4b0d9731
2019-10-15 15:09:46 +02:00
Pete Zaitcev
bb5fa0ea2e tests: bust md5 of object not footers
The test test_PUT_ec_fragment_quorum_archive_etag_mismatch
busts the md5 in server.py, so it ends up damaging the md5 of
footers instead of the fragment archive. It appears that the
intention of the test was to check the integrity verification
for fragment archive, so change the test to bust diskfile.py
instead.

Change-Id: I54a203bb637d5f5814e8df2b4297758b0b72adac
2019-10-04 21:24:30 -07:00
Zuul
6114965ab9 Merge "Fix some request-smuggling vectors on py3" 2019-10-02 23:09:48 +00:00
Tim Burke
bf9346d88d Fix some request-smuggling vectors on py3
A Python 3 bug causes us to abort header parsing in some cases. We
mostly worked around that in the related change, but that was *after*
eventlet used the parsed headers to determine things like message
framing. As a result, a client sending a malformed request (for example,
sending both Content-Length *and* Transfer-Encoding: chunked headers)
might have that request parsed properly and authorized by a proxy-server
running Python 2, but the proxy-to-backend request could get misparsed
if the backend is running Python 3. As a result, the single client
request could be interpreted as multiple requests by an object server,
only the first of which was properly authorized at the proxy.

Now, after we find and parse additional headers that weren't parsed by
Python, fix up eventlet's wsgi.input to reflect the message framing we
expect given the complete set of headers. As an added precaution, if the
client included Transfer-Encoding: chunked *and* a Content-Length,
ensure that the Content-Length is not forwarded to the backend.

Change-Id: I70c125df70b2a703de44662adc66f740cc79c7a9
Related-Change: I0f03c211f35a9a49e047a5718a9907b515ca88d7
Closes-Bug: 1840507
2019-10-02 08:20:20 -07:00
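
The "added precaution" in bf9346d88d, not forwarding Content-Length when the request is chunked, can be sketched like this. The function name and the dict-based header handling are hypothetical; the patch itself works on the WSGI environment and eventlet's wsgi.input, which this sketch does not attempt to model.

```python
def backend_headers(client_headers):
    """Copy headers for the backend, dropping Content-Length on chunked requests."""
    headers = {k.title(): v for k, v in client_headers.items()}
    encodings = headers.get("Transfer-Encoding", "").split(",")
    chunked = "chunked" in [e.strip().lower() for e in encodings]
    if chunked:
        # With both headers present, only chunked framing is trusted.
        headers.pop("Content-Length", None)
    return headers
```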
Tim Burke
291873e784 proxy: Don't trust Content-Length for chunked transfers
Previously we'd
- complain that a client disconnected even though they finished their
  chunked transfer just fine, and
- on EC, send a X-Backend-Obj-Content-Length for pre-allocation even
  though Content-Length doesn't determine request body size.

Change-Id: Ia80e595f713695cbb41dab575963f2cb9bebfa09
Related-Bug: 1840507
2019-09-23 10:49:26 -07:00
Zuul
0790b62e1f Merge "tests/py3: Improve header casing" 2019-09-14 00:09:56 +00:00
Zuul
3ec6ce2a0f Merge "py3: fix up listings on sharded containers" 2019-08-28 07:37:16 +00:00
Tim Burke
4d83b9b95e tests/py3: Improve header casing
Previously, our unit tests with socket servers would let eventlet
capitalize headers on the way out, which

- isn't something we want to have eventlet do, because it
- breaks unicode-in-header-names on py3, so it
- is already disabled in swift.common.wsgi.run_server() for real servers.

Include a test to make sure we don't forget about it in the future.

Change-Id: I0156d0059092ed414b296c65fb70fc18533b074a
2019-08-26 14:44:05 -07:00
Tim Burke
3750285bc8 py3: fix up listings on sharded containers
We were playing a little fast & loose with types before; as a result,
marker/end_marker weren't quite working right. In particular, we were
checking whether a WSGI string was contained in a shard range, while
ShardRange assumes all comparisons are against native strings.

Now, get everything to native strings before making comparisons, and
get them back to wsgi when we shove them in the params dict.

Change-Id: Iddf9e089ef95dc709ab76dc58952a776246991fd
2019-08-15 12:34:02 -07:00
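
Why the WSGI-versus-native distinction in 3750285bc8 matters for range checks can be shown with a small sketch. On Python 3, a WSGI string carries the UTF-8 bytes of a name decoded as Latin-1, so it sorts differently from the native string. The helper names below are local stand-ins written for this example, and in_shard is a hypothetical (lower, upper] check rather than the ShardRange API.

```python
def native_to_wsgi(native):
    # UTF-8 bytes reinterpreted as Latin-1, as WSGI does on Python 3.
    return native.encode("utf-8").decode("latin-1")


def wsgi_to_native(wsgi):
    return wsgi.encode("latin-1").decode("utf-8")


def in_shard(lower, upper, name):
    # Bounds are native strings, so name must be native too.
    return lower < name <= upper


name = "caf\u00e9"
wsgi_name = native_to_wsgi(name)
# Comparing the WSGI form against native bounds gives the wrong answer.
wrong = in_shard("caf", "caf\u00d0", wsgi_name)
right = in_shard("caf", "caf\u00d0", wsgi_to_native(wsgi_name))
```

Here the WSGI form falls inside the shard while the native name does not, which is the kind of mismatch that breaks marker/end_marker handling.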
Clay Gerrard
25aeb0ca49 Make GreenAsyncPile not hang
It's probably weird that StreamingPile has this interface that swallows
exceptions, but this seems better than hanging.

Change-Id: I8fe45c0f0d291efc84f3edf5d6b7cd116b5c7835
2019-08-02 08:19:41 -07:00
Tim Burke
3189410f9d Ignore 404s from handoffs for objects when calculating quorum
We previously realized we needed to do that for accounts and containers
where the consequences of treating the 404 as authoritative were more
obvious: we'd cache the non-existence which prevented writes until it
fell out of cache.

The same basic logic applies for objects, though: if we see

    (Timeout, Timeout, Timeout, 404, 404, 404)

on a triple-replica policy, we don't really have any reason to think
that a 404 is appropriate. In fact, it seems reasonably likely that
there's a thundering-herd problem where there are too many concurrent
requests for data that *definitely is there*. By responding with a 503,
we apply some back-pressure to clients, who hopefully have some
exponential backoff in their retries.

The situation gets a bit more complicated with erasure-coded data, but
the same basic principle applies. We're just more likely to have
confirmation that there *is* data out there, we just can't reconstruct
it (right now).

Note that we *still want to check* those handoffs, of course. Our
fail-in-place strategy has us replicate (and, more recently,
reconstruct) to handoffs to maintain durability; it'd be silly *not* to
look.

UpgradeImpact:
--------------
Be aware that this may cause an increase in 503 Service Unavailable
responses served by proxy-servers. However, this should more accurately
reflect the state of the system.

Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Change-Id: Ia832e9bab13167948f01bc50aa8a61974ce189fb
Closes-Bug: #1837819
Related-Bug: #1833612
Related-Change: I53ed04b5de20c261ddd79c98c629580472e09961
Related-Change: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
2019-08-01 14:07:39 -07:00
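
The quorum reasoning in 3189410f9d can be reduced to a toy decision function. This is a heavily simplified sketch under stated assumptions: statuses are plain integers, None stands for a timeout or error, the majority quorum is computed from the primaries only, and the function name is hypothetical. The real proxy logic handles many more cases, including erasure coding.

```python
def choose_status(primary_statuses, handoff_statuses):
    """Pick a client status; 404s from handoffs never count toward quorum."""
    if 200 in primary_statuses or 200 in handoff_statuses:
        return 200  # handoffs are still checked for data
    quorum = len(primary_statuses) // 2 + 1
    primary_404s = sum(1 for s in primary_statuses if s == 404)
    if primary_404s >= quorum:
        return 404  # enough primaries agree the object is absent
    return 503      # not enough authoritative answers; ask the client to retry
```

With (Timeout, Timeout, Timeout) from primaries and (404, 404, 404) from handoffs, this returns 503 rather than 404.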
Tim Burke
a1af3811a7 sharding: Cache shard ranges for object writes
Previously, we issued a GET to the root container for every object PUT,
POST, and DELETE. This puts load on the container server, potentially
leading to timeouts, error limiting, and erroneous 404s (!).

Now, cache the complete set of 'updating' shards, and find the shard for
this particular update in the proxy. Add a new config option,
recheck_updating_shard_ranges, to control the cache time; it defaults to
one hour. Set to 0 to fall back to previous behavior.

Note that we should be able to tolerate stale shard data just fine; we
already have to worry about async pendings that got written down with
one shard but may not get processed until that shard has itself sharded
or shrunk into another shard.

Also note that memcache has a default value limit of 1MiB, which may be
exceeded if a container has thousands of shards. In that case, set()
will act like a delete(), causing increased memcache churn but otherwise
preserving existing behavior. In the future, we may want to add support
for gzipping the cached shard ranges as they should compress well.

Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
Closes-Bug: 1781291
2019-07-11 10:40:38 -07:00
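
Once the full set of 'updating' shards is cached, as a1af3811a7 describes, picking the shard for one object is a local lookup. The sketch below assumes the cached value is a list of (upper, shard_name) pairs sorted by upper bound, that each shard covers names up to and including its upper bound, and that the last shard is unbounded (upper of ""). The function name and data layout are hypothetical.

```python
import bisect


def find_shard(shards, obj_name):
    """Return the shard whose range contains obj_name, with no backend GET."""
    # Skip the last upper bound: the final shard catches everything beyond.
    uppers = [upper for upper, _ in shards[:-1]]
    # First shard whose upper bound is >= obj_name.
    idx = bisect.bisect_left(uppers, obj_name)
    return shards[idx][1]
```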
Clay Gerrard
044c919871 More tests for 404 handoff skipping
Sometimes we want 404, sometimes we want 503 - it's tricky

Change-Id: I30f5af07e2e1fc7cbb6bdb1c334a0a161caf0906
Related-Change-Id: I53ed04b5de20c261ddd79c98c629580472e09961
2019-06-25 13:42:26 -05:00
Clay Gerrard
563e1671cf Return 503 when primary containers can't respond
Closes-Bug: #1833612

Change-Id: I53ed04b5de20c261ddd79c98c629580472e09961
2019-06-25 12:23:12 -05:00
Clay Gerrard
82169ead1c Don't handle object without container
Closes-Bug: #1833616

Change-Id: I16ca1589abcea2bca942da7e97719286a8961ea6
2019-06-24 16:58:09 -05:00
Tim Burke
ef8818a639 Fix up how we memcache on py3
Previously, we stored the WSGI strings in memcached and returned them when
responding to get_account/container_info calls. This would lead to cache
corruption in a heterogeneous py2/py3 cluster such as you would have during
a rolling upgrade.

Now, only store and return native strings.

Change-Id: I8d6f66dfe846493972e433f70bad76a33d204562
2019-06-14 08:20:36 -07:00
Tim Burke
aa2f1db1b7 Ensure get_*_info keys are native strings
Change-Id: I29bbea48ae38cfabf449a9f4cca1f5f27769405a
2019-06-11 14:50:49 -07:00
Tim Burke
ff04ef05cd Rework private-request-method interface
Instead of taking a X-Backend-Allow-Method that *must match* the
REQUEST_METHOD, take a truish X-Backend-Allow-Private-Methods and
expand the set of allowed methods. This allows us to also expose
the full list of available private methods when returning a 405.

Drive-By: make async-delete tests a little more robust:
  * check that end_marker and prefix are preserved on subsequent
    listings
  * check that objects with a leading slash are correctly handled

Change-Id: I5542623f16e0b5a0d728a6706343809e50743f73
2019-05-22 16:36:50 -07:00
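
The interface change in ff04ef05cd can be sketched as below. The header name comes from the commit message; the method sets, the set of "truish" values, and the function names are assumptions for illustration. The sketch shows how a single flag expands the allowed set and how the same set can populate the Allow header on a 405.

```python
PUBLIC_METHODS = {"GET", "HEAD", "PUT", "POST", "DELETE"}
PRIVATE_METHODS = {"UPDATE"}
TRUE_VALUES = {"true", "1", "yes", "on", "t", "y"}


def allowed_methods(headers):
    flag = headers.get("X-Backend-Allow-Private-Methods", "").lower()
    if flag in TRUE_VALUES:
        return PUBLIC_METHODS | PRIVATE_METHODS
    return set(PUBLIC_METHODS)


def check_method(method, headers):
    """Return (status, response_headers) for the requested method."""
    allowed = allowed_methods(headers)
    if method not in allowed:
        # The 405 can advertise private methods when they were requested.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}
```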
Tim Burke
83d0161991 Add operator tool to async-delete some or all objects in a container
Adds a tool, swift-container-deleter, that takes an account/container
and optional prefix, marker, and/or end-marker; spins up an internal
client; makes listing requests against the container; and pushes the
found objects into the object-expirer queue with a special
application/async-deleted content-type.

In order to do this enqueuing efficiently, a new internal-to-the-cluster
container method is introduced: UPDATE. It takes a JSON list of object
entries and runs them through merge_items.

The object-expirer is updated to look for work items with this
content-type and skip the X-If-Deleted-At check that it would normally
do.

Note that the target-container's listing will continue to show the
objects until data is actually deleted, bypassing some of the concerns
raised in the related change about clearing out a container entirely and
then deleting it.

Change-Id: Ia13ee5da3d1b5c536eccaadc7a6fdcd997374443
Related-Change: I50e403dee75585fc1ff2bb385d6b2d2f13653cf8
2019-05-22 13:22:50 -07:00
Zuul
72a0115514 Merge "Get functional/tests.py running under py3" 2019-05-18 07:33:03 +00:00
Tim Burke
8b3d0a6c64 py3: finish porting proxy/test_server.py
Change-Id: I8287db75b4f19581203360c646e72f64fe45f170
2019-05-08 17:47:40 -07:00
Tim Burke
506279235d Get functional/tests.py running under py3
Note that you need a pretty recent eventlet to pick up
https://github.com/eventlet/eventlet/commit/f0bc79e

Change-Id: I6b006b972e7431c406039f4e0f6890a8f74a4432
2019-05-08 17:44:03 -07:00
Tim Burke
259224f009 py3: port unit/proxy/test_server.py
All except the versioned_writes tests, which are hairy.

Change-Id: Ieb54869f93a70c8887d33bd2ad46cf04a190d896
2019-05-08 15:59:06 -07:00
Zuul
4473fd9ba1 Merge "py3: port unit/proxy/test_sysmeta.py" 2019-05-07 19:17:31 +00:00