* Add a new config option, proxy_base_url
* Support HTTPS as well as HTTP connections
* Monkey-patch eventlet early so we never import an unpatched version
from swiftclient
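A minimal sketch of the early-patching pattern (the guarded import is
illustrative):

    import eventlet
    eventlet.monkey_patch()  # patch the stdlib before anything else imports it

    import swiftclient  # now guaranteed to see the patched socket module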
Change-Id: I4945d512966d3666f2738058f15a916c65ad4a6b
The repo is Python using both Python 2 and 3 now, so update hacking to
version 2.0 which supports Python 2 and 3. Note that the latest hacking
release, 3.0, only supports Python 3.
Fix problems found.
Remove hacking and friends from lower-constraints; they are not needed
for installation.
Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
This patch adds a new object versioning mode. This new mode provides
a new set of APIs for users to interact with older versions of an
object. It also changes the naming scheme of older versions and adds
a version-id to each object.
This new mode is not backwards compatible or interchangeable with the
other two modes (i.e., stack and history), especially due to the changes
in the naming scheme of older versions. This new mode will also serve
as a foundation for adding S3 versioning compatibility in the s3api
middleware.
Note that this does not (yet) support using a versioned container as
a source in container-sync. Container sync should be enhanced to sync
previous versions of objects.
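A hedged sketch of the new API surface (exact header and parameter
names should be checked against the middleware documentation):

    import requests

    base = 'http://saio:8080/v1/AUTH_test'
    hdrs = {'X-Auth-Token': 'AUTH_tk...'}

    # enable the new versioning mode on a container
    requests.post(base + '/c',
                  headers=dict(hdrs, **{'X-Versions-Enabled': 'true'}))

    # every write now gets a version-id back
    resp = requests.put(base + '/c/o', data=b'v2', headers=hdrs)
    version_id = resp.headers.get('X-Object-Version-Id')

    # older versions can be read (or deleted) by version-id
    requests.get(base + '/c/o', params={'version-id': version_id},
                 headers=hdrs)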
Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
This allows static symlinks to be synced before their target. Dynamic
symlinks could already be synced even if the target object had not been
synced, but static links previously required that the target object
exist before they could be PUT. Now, have the container_sync middleware
plumb in an override like it does for SLO.
Change-Id: I3bfc62b77b247003adcee6bd4d374168bfd6707d
This works fine; we continue processing the other rows in the DB. But it
*does* take longer than it really ought to. See the related bug; we
ought to be able to shave some 17s off the test time by not retrying
on the 404.
Change-Id: I9ca2511651e9b2bc0045894baa4062d20bc15369
Related-Bug: #1849841
If we move it to constraints, it's more globally accessible in our code,
but more importantly it's more obvious to ops that everything breaks if
you try to mis-configure different values per-service.
Change-Id: Ib8f7d08bc48da12be5671abe91a17ae2b49ecfee
Previously, if you were on Python 2.7.10+ [0], a newline in the
container path would cause the sharder to fail, complaining about
invalid header values when trying to create the shard containers. On
older versions of Python, it would most likely cause a parsing error in
the container-server that was trying to handle the PUT.
Now, quote all places that we pass around container paths. This includes:
* The X-Container-Sysmeta-Shard-(Quoted-)Root sent when creating the (empty)
remote shards
* The X-Container-Sysmeta-Shard-(Quoted-)Root included when initializing the
local handoff for cleaving
* The X-Backend-(Quoted-)Container-Path the proxy sends to the object-server
for container updates
* The Location header the container-server sends to the object-updater
Note that a new header was required in requests so that servers would
know whether the value should be unquoted or not. We can get away with
reusing Location in responses by having clients opt-in to quoting with
a new X-Backend-Accept-Quoted-Location header.
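A minimal sketch of the round-trip using stdlib quoting (Swift's own
helpers may differ in detail):

    from urllib.parse import quote, unquote

    path = '.shards_AUTH_test/container\nwith-newline'
    quoted = quote(path)  # newline-free, safe as a header value
    assert unquote(quoted) == path  # receiver recovers the original path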
During a rolling upgrade,
* old object-servers servicing requests from new proxy-servers will
not know about the container path override and so will try to update
the root container,
* in general, object updates are more likely to land in the root
container; the sharder will deal with them as misplaced objects, and
* shard containers created by new code on servers running old code
will think they are root containers until the server is running new
code, too; during this time they'll fail the sharder audit and report
stats to their account, but both of these should get cleared up upon
upgrade.
Drive-by: fix a "conainer_name" typo that prevented us from testing that
we can shard a container with unicode in its name. Also, add more UTF8
probe tests.
[0] See https://bugs.python.org/issue22928
Change-Id: Ie08f36e31a448a547468dd85911c3a3bc30e89f1
Closes-Bug: 1856894
Starting with Python 3.4, newly-created file descriptors are non-inheritable
[0], which causes trouble when we try to use a pipe for IPC. Fortunately, the
same PEP that implemented this change *also* provided a new API to mark file
descriptors as being inheritable -- so just do that.
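A minimal sketch of the fix, using the PEP 446 API:

    import os

    r, w = os.pipe()             # on py3.4+, both ends are non-inheritable
    os.set_inheritable(r, True)  # explicitly mark them inheritable again
    os.set_inheritable(w, True)  # so they survive a re-exec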
While we're at it,
* Fix up the probe tests to work on py3
* Fix up the probe tests to work when policy-0 is erasure-coded
* Decode the bytes read so py3 doesn't log a b'pid'
* Log a warning if the read() is empty; something surely went wrong
in the re-exec
[0] https://www.python.org/dev/peps/pep-0446/
Change-Id: I2a8a9f3dc78abb99bf9cbcf6b44c32ca644bb07b
Related-Change: I3e5229d2fb04be67e53533ff65b0870038accbb7
Reserve the namespace starting with the NULL byte for internal
use-cases. Backend services will allow path names to include the NULL
byte in urls and validate names in the reserved namespace. Database
services will filter all names starting with the NULL byte from
responses unless the request includes the header:
X-Backend-Allow-Reserved-Names: true
The proxy server will not allow path names to include the NULL byte in
urls unless a middleware has set the X-Backend-Allow-Reserved-Names
header. Middlewares can use the reserved namespace to create objects
and containers that cannot be directly manipulated by clients. Any
objects and bytes created in the reserved namespace will be aggregated
into the user's account totals.
When deploying internal proxies, developers and operators may configure
the gatekeeper middleware to translate the X-Allow-Reserved-Names header
to the Backend header so they can manipulate the reserved namespace
directly through the normal API.
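A hedged sketch of how a middleware might use the namespace (the
container name here is illustrative):

    RESERVED = '\x00'  # names starting with NULL are reserved

    # a middleware-managed container clients can't address directly
    hidden = RESERVED + 'versions' + RESERVED + 'user-container'
    headers = {'X-Backend-Allow-Reserved-Names': 'true'}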
UpgradeImpact: it's not safe to roll back from this change
Change-Id: If912f71d8b0d03369680374e8233da85d8d38f85
Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
(instead of a SIGHUP). The server forks off a synchronized child to
wait to close the old listen socket(s) until the new server has started
up and bound its listen socket(s). The new server is exec'ed from the
old one so its PID doesn't change. This makes Systemd happier, so a
ReloadExec= stanza can now be used.
The seamless part means that incoming connections will always get
accepted either by the old server or the new one. This eliminates
client-perceived "downtime" during server reloads, while allowing the
server to fully reload, re-reading configuration, becoming a fresh
Python interpreter instance, etc. The SO_REUSEPORT socket option was
already in use, so nothing had to change there.
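A hedged sketch of the hand-off; the synchronization channel and helper
details are illustrative, not Swift's actual implementation:

    import os
    import sys

    def seamless_reload(listen_socks):
        rfd, wfd = os.pipe()
        if os.fork() == 0:       # child: keep the old socks accepting
            os.close(wfd)
            os.read(rfd, 1)      # block until the new server is bound
            for sock in listen_socks:
                sock.close()     # hand over to the new server's socks
            os._exit(0)
        os.close(rfd)
        os.set_inheritable(wfd, True)        # survives the exec below
        os.environ['NOTIFY_FD'] = str(wfd)   # hypothetical readiness channel
        os.execv(sys.executable, [sys.executable] + sys.argv)  # same PID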
This patch also includes a non-invasive fix for a current eventlet bug;
see https://github.com/eventlet/eventlet/pull/590
That bug prevents a SIGHUP "reload" from properly servicing existing
requests before old worker processes close sockets and exit. The
existing probe tests missed this, but the new ones, in this patch, caught
it.
New probe tests cover both old SIGHUP "reload" behavior as well as the
new SIGUSR1 seamless reload behavior.
Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
These are known not to work until https://bugs.python.org/issue37093
is addressed in CPython upstream.
Change-Id: I4a6877907d14b632a9a477c887913488427b62b7
AWS seems to support this, so let's allow s3api to do it, too.
Previously, S3 clients trying to use multi-character delimiters would
get 500s back, because s3api didn't know how to handle the 412s that the
container server would send.
As long as we're adding support for container listings, may as well do
it for accounts, too.
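A hedged example of what now works (endpoint and credentials are
illustrative):

    import boto3

    s3 = boto3.client('s3', endpoint_url='http://saio:8080',
                      aws_access_key_id='test:tester',
                      aws_secret_access_key='testing')
    # multi-character delimiters previously drew a 500 from s3api
    resp = s3.list_objects(Bucket='bucket', Delimiter='-segments/')
    prefixes = [p['Prefix'] for p in resp.get('CommonPrefixes', [])]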
Change-Id: I62032ddd50a3493b8b99a40fb48d840ac763d0e7
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Closes-Bug: #1797305
There's still one problem, though: since swiftclient on py3 doesn't
support non-ASCII characters in metadata names, none of the tests in
TestReconstructorRebuildUTF8 will pass.
Change-Id: I4ec879ade534e09c3a625414d8aa1f16fd600fa4
We previously realized we needed to do that for accounts and containers
where the consequences of treating the 404 as authoritative were more
obvious: we'd cache the non-existence, which prevented writes until it
fell out of cache.
The same basic logic applies for objects, though: if we see
(Timeout, Timeout, Timeout, 404, 404, 404)
on a triple-replica policy, we don't really have any reason to think
that a 404 is appropriate. In fact, it seems reasonably likely that
there's a thundering-herd problem where there are too many concurrent
requests for data that *definitely is there*. By responding with a 503,
we apply some back-pressure to clients, who hopefully have some
exponential backoff in their retries.
The situation gets a bit more complicated with erasure-coded data, but
the same basic principle applies. We're just more likely to have
confirmation that there *is* data out there; we just can't reconstruct
it (right now).
Note that we *still want to check* those handoffs, of course. Our
fail-in-place strategy has us replicate (and, more recently,
reconstruct) to handoffs to maintain durability; it'd be silly *not* to
look.
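A hedged sketch of the decision (the quorum math is simplified and the
function is not the proxy's actual code):

    def best_status(primary_statuses, replicas):
        # e.g. (Timeout, Timeout, Timeout) from primaries plus
        # (404, 404, 404) from handoffs on a 3-replica policy
        confirmed_404s = primary_statuses.count(404)
        if confirmed_404s < (replicas // 2 + 1):
            return 503  # back-pressure instead of an unfounded 404
        return 404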
UpgradeImpact:
--------------
Be aware that this may cause an increase in 503 Service Unavailable
responses served by proxy-servers. However, this should more accurately
reflect the state of the system.
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Change-Id: Ia832e9bab13167948f01bc50aa8a61974ce189fb
Closes-Bug: #1837819
Related-Bug: #1833612
Related-Change: I53ed04b5de20c261ddd79c98c629580472e09961
Related-Change: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
Previously, we issued a GET to the root container for every object PUT,
POST, and DELETE. This puts load on the container server, potentially
leading to timeouts, error limiting, and erroneous 404s (!).
Now, cache the complete set of 'updating' shards, and find the shard for
this particular update in the proxy. Add a new config option,
recheck_updating_shard_ranges, to control the cache time; it defaults to
one hour. Set to 0 to fall back to previous behavior.
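A hedged sketch of the lookup (the cache key and helper names are
illustrative):

    def get_update_shard(memcache, account, container, obj):
        key = 'shard-updating/%s/%s' % (account, container)
        shards = memcache.get(key)
        if shards is None:
            shards = fetch_updating_shards(account, container)  # GET root
            memcache.set(key, shards, time=3600)  # recheck_updating_shard_ranges
        return find_shard_range(obj, shards)  # bisect on shard bounds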
Note that we should be able to tolerate stale shard data just fine; we
already have to worry about async pendings that got written down with
one shard but may not get processed until that shard has itself sharded
or shrunk into another shard.
Also note that memcache has a default value limit of 1MiB, which may be
exceeded if a container has thousands of shards. In that case, set()
will act like a delete(), causing increased memcache churn but otherwise
preserving existing behavior. In the future, we may want to add support
for gzipping the cached shard ranges as they should compress well.
Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
Closes-Bug: 1781291
Give storage nodes more time to complete requests for multi-node upgrade
and probe tests.
Also slightly decouple probetests from default configs.
Change-Id: I334ef517d833916a3b7be3151a812d4f9c66a6e1
Change the behavior of the EC reconstructor to perform a fragment
rebuild to a handoff node when a primary peer responds with 507 to the
REPLICATE request.
Each primary node in an EC ring will sync with exactly three primary
peers: in addition to the left and right nodes, we now select a third
node from the far side of the ring. If any of these partners responds
unmounted, the reconstructor will rebuild its fragments to a handoff
node with the appropriate index.
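A hedged sketch of the expanded partner selection (indices wrap modulo
the primary count; not necessarily the reconstructor's exact code):

    def get_sync_partners(frag_index, part_nodes):
        n = len(part_nodes)
        return [
            part_nodes[(frag_index - 1) % n],       # left neighbor
            part_nodes[(frag_index + 1) % n],       # right neighbor
            part_nodes[(frag_index + n // 2) % n],  # far side of the ring
        ]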
To prevent ssync (which is uninterruptible) from receiving a 409
(Conflict), we must give the remote handoff node the correct
backend_index for the fragments it will receive. In the common case we
will use deterministically different handoffs for each fragment index to
prevent multiple unmounted primary disks from forcing a single handoff
node to hold more than one rebuilt fragment.
Handoff nodes will continue to attempt to revert rebuilt handoff
fragments to the appropriate primary until that primary is remounted or
rebalanced away. After a rebalance of EC rings (potentially removing
unmounted/failed devices), it's most IO efficient to run in
handoffs_only mode to avoid unnecessary rebuilds.
Closes-Bug: #1510342
Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
In test_replication_servers_working, we delete a bunch of directories
without deleting hashes.pkl, then verify that nothing at that level is a
directory.
This would be trivially true except that throughout the test, we have
the replicators running constantly. However, we never verified that the
replicators actually *have* run and had a chance to re-create the
missing directories.
Now, stop the replicators before doing the deletes, run them
synchronously between doing the deletes and verifying that there are no
directories, and start them again before the final set of assertions.
Change-Id: I841f8250eb7abfb0fcdfca5c106f65e6e94dce0c
When we abort the replication process because we've got shard ranges and
the sharder is now responsible for ensuring object-row durability, we
log a warning like "refusing to replicate objects" which sounds scary.
That's because it *is*, of course -- if the sharder isn't running,
whatever rows that DB has may only exist in that DB, meaning we're one
drive failure away from losing track of them entirely.
However, when the sharder *is* running and everything's happy, we reach
a steady-state where the root containers are all sharded and none of
them have any object rows to lose. At that point, the warning does more
harm than good.
Only print the scary "refusing to replicate" warning if we're still
responsible for some object rows, whether deleted or not.
Change-Id: I35de08d6c1617b2e446e969a54b79b42e8cfafef
Note that eventlet 0.22.0+ closes connections between requests when
it stops accepting connections.
Partial-Bug: #1792615
Change-Id: Ia8d9ab95e2aad40e8d797acc3423a917e809ffdb
Otherwise, a sharded container AUTH_test/sharded will have its stats
included in the totals for both AUTH_test *and* .shards_AUTH_test
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I7fa74e13347601c5f44fd7e6cf65656cc3ebc2c5
Resolve outstanding TODOs. One TODO is removed because there isn't an
easy way to arrange for an async pending to be targeted at a shard
container.
Change-Id: I0b003904f73461ddb995b2e6a01e92f14283278d
Previously test_misplaced_object_movement() deleted objects from both
shards and then relied on override-partitions option to selectively
run the sharder on root or shard containers and thereby control when
each shard range was identified for shrinking. This approach is flawed
when the second shard container lands in the same partition as the
root: running the sharder on the empty second shard's partition would
also cause the sharder to process the root and identify the second
shard for shrinking, resulting in premature shrinking of the second
shard.
Now, objects are only deleted from each shard range at the point when
we want that shard to shrink.
Change-Id: I9f51621e8414e446e4d3f3b5027f6c40e01192c3
Drive-by: use the run_sharders() helper more often.
Before, merge_objects() always used storage policy index of 0 when
inserting a fake misplaced object into a shard container. If the shard
broker had a different policy index then the misplaced object would
not show up in listings, causing test_misplaced_object_movement() to
fail. This test bug might be exposed by having policy index 0 be an EC
policy, since the probe test requires a replication policy and would
therefore choose a non-zero policy index.
The fix is simply to specify the shard's policy index when inserting
the fake object.
Change-Id: Iec3f8ec29950220bb1b2ead9abfdfb1a261517d6
The sharder daemon visits container dbs and, when necessary, executes
the sharding workflow on the db.
The workflow is, in overview:
- perform an audit of the container for sharding purposes.
- move any misplaced objects that do not belong in the container
to their correct shard.
- move shard ranges from FOUND state to CREATED state by creating
shard containers.
- move shard ranges from CREATED to CLEAVED state by cleaving objects
to shard dbs and replicating those dbs. By default this is done in
batches of 2 shard ranges per visit.
Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.
The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them into a container in order to trigger
sharding. This is currently the recommended way to shard a container.
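A hedged example of that manual workflow (the db path and row count are
illustrative; check the tool's help for exact subcommands):

    swift-manage-shard-ranges <container-db> find 500000 > shard_ranges.json
    swift-manage-shard-ranges <container-db> replace shard_ranges.json
    swift-manage-shard-ranges <container-db> enable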
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f
...in preparation for the container sharding feature.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I4455677abb114a645cff93cd41b394d227e805de
We've already got it in the response, so we may as well apply it now rather
than wait for the other end to get around to running its replicators.
Change-Id: Ie36a6dd075beda04b9726dfa2bba9ffed025c9ef
When PUTting an object with `If-None-Match: *`, we rely on 100-continue
support: the proxy checks the responses from all object-servers, and if
any of them respond 412, it closes down the connections. When there's
actual data for the object, this ensures that even nodes that *don't*
respond 412 will hit a ChunkReadTimeout and abort the PUT.
However, if the client does a PUT with a Content-Length of 0, that would
get sent all the way to the object server, which had all the information
it needed to respond 201. After replication, the PUT propagates to the
other nodes and the old object is lost, despite the client receiving a
412 indicating the operation failed.
Now, when PUTting a zero-byte object, switch to a chunked transfer so
the object-server still gets a ChunkReadTimeout.
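A hedged sketch of the proxy-side switch (attribute spellings follow
swob conventions but are illustrative here):

    if req.headers.get('If-None-Match') == '*' and req.content_length == 0:
        # send it chunked so object-servers that ought to fail still
        # hit a ChunkReadTimeout when we tear down the connections
        req.headers['Transfer-Encoding'] = 'chunked'
        req.headers.pop('Content-Length', None)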
Change-Id: Ie88e41aca2d59246c3134d743c1531c8e996f9e4
After deleting an object, the object expirer deletes the corresponding
row from the expirer queue by making DELETE requests directly to the
container servers. The same thing happens after attempting to delete
an object, but failing because the object has already been deleted. If
the DELETE requests fail, then the expirer will encounter that row
again on its next pass and retry the DELETE at that time. Therefore,
it is not necessary for the object server to write an async_pending
for that queue row's deletion.
Currently, however, two of the object servers do write such
async_pendings. Given Rc container replicas, that's 2 * Rc updates
from async_pendings and another Rc from the object expirer
directly. Given a typical Rc of 3, that's 9 container updates per
expiring object.
This commit makes the object server write no async_pendings for DELETE
requests coming from the object expirer. This reduces the number of
container server requests to Rc (typically 3), all issued directly
from the object expirer.
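A hedged sketch of the object-server check (the header spelling is
purely illustrative of the opt-out mechanism):

    def should_write_async_pending(req):
        # the expirer retries failed queue-row DELETEs on its next
        # pass, so its DELETEs need no async_pending fallback
        return req.headers.get('X-Backend-From-Expirer') != 'true'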
Closes-Bug: 1076202
Change-Id: Icd63c80c73f864d2561e745c3154fbfda02bd0cc