The sharder's find_compactible_shard_sequences function was vulnerable
to looping indefinitely with some combinations of the shrink_threshold and
merge_size parameters. The inner loop might not consume a shard range, resulting
in the same shard range being submitted to the inner loop again.
This patch simplifies the function in an attempt to make it more
obvious that the loops are always making progress towards termination
by consuming shard ranges from the list.
Change-Id: Ia87ab6feaf5172d91f1c60c2e0f72e03182e3c9b
Calls to process_compactible_shard_sequences were always followed by
calls to finalize_shrinking, so simplify the sharder module interface
by making process_compactible_shard_sequences call finalize_shrinking.
Change-Id: I22b8d23f32a5e776c37f711a913e4b40425d5e54
This patch adds shrinking candidates to the sharding recon dump.
Shrinking candidates will always be SHARDED root containers, and
get added to the candidate list if they have any ranges that are
compactible, that is to say ranges that can be compacted into
an upper neighbour.
The shrinking_candidates data comes out something like:
    {
        'found': 1,
        'top': [
            {
                'object_count': <some number>,
                'account': 'a',
                'meta_timestamp': <ts1>,
                'container': 'c',
                'file_size': <something>,
                'path': <something>,
                'root': <something>,
                'node_index': 0,
                'compactible_ranges': 2
            }]
    }
In this case 'compactible_ranges' is the number of donors that can be shrunk
in a single command.
Change-Id: I63fc9ae39e164c2ce82865d055527b52c86b5b2a
If a sequence of shard ranges is already shrinking then in some
circumstances we do not want to report it as a candidate for
shrinking. For backwards compatibility, allow already-shrinking
sequences to be optionally included in the return value of
find_compactible_shard_sequences.
Also refactor to add an is_shrinking_candidate() function.
Change-Id: Ifa20b7c08aba7254185918dfcee69e8206f51cea
Fix the find_compactible_shard_sequences function to prevent skipping
a shard range after finding a sequence of compactible shard ranges
that approaches the merge size.
Previously a compactible sequence would correctly terminate on the nth
shard range if the n+1th shard range would take the object count over
the merge_size, but the n+1th shard range would then be skipped and
not considered for the start of the next sequence.
Change-Id: I670441e7426b28ab2247563c7fa854d1cd502316
This method was previously only indirectly covered by
test_manage_shard_ranges. Add unit tests in test_sharder.
Change-Id: I9d0403f6dfa7a988e79f79a38ff713d05476cb84
This patch adds a 'compact' command to swift-manage-shard-ranges that
enables sequences of contiguous shards with low object counts to be
compacted into another existing shard, or into the root container.
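An illustrative invocation (the db path is a placeholder; see the command's
--help for the full set of options):

    swift-manage-shard-ranges <path_to_container_db> compact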
Change-Id: Ia8f3297d610b5a5cf5598d076fdaf30211832366
Shard containers learn about their own shard range by fetching shard
ranges from the root container during the sharder audit phase. Since
[1], if the shard is shrinking, it may also learn about acceptor
shards in the shard ranges fetched from the root. However, the
fetched shard ranges do not currently include the root's own shard
range, even when the root is to be the acceptor for a shrinking shard.
This prevents the mechanism from being used to perform shrinking to the root.
This patch modifies the root container behaviour to include its own
shard range in responses to shard containers when the container GET
request param 'states' has value 'auditing'. This parameter is used to
indicate that a particular GET request is from the sharder during
shard audit; the root does not otherwise include its own shard range
in GET responses.
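A minimal sketch of the audit-time fetch from the root (only the
'states=auditing' parameter is defined by this change; the record-type
header and the commented request call are assumed for illustration):

    params = {'format': 'json', 'states': 'auditing'}
    headers = {'X-Backend-Record-Type': 'shard'}
    # e.g. internal_client.make_request('GET', root_path, headers,
    #                                   acceptable_statuses=(2,), params=params)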
When the 'states=auditing' parameter is used with a container GET
request the response includes all shard ranges except those in the
FOUND state. The shard ranges of relevance to a shard are its own
shard range and any overlapping shard ranges that may be acceptors if
the shard is shrinking. None of these relevant shard ranges should be
in state FOUND: the shard itself cannot be in FOUND state since it has
been created; acceptor ranges should not be in FOUND state. The FOUND
state is therefore excluded from the 'auditing' states to prevent an
unintended overlapping FOUND shard range that has not yet been
resolved at the root container being fetched by a shrinking shard,
which might then proceed to create and cleave to it.
The shard only merges the root's shard range (and any other shard
ranges) when the shard is shrinking. If the root shard range is ACTIVE
then it is the acceptor and will be used when the shard cleaves. If
the root shard range is in any other state then it will be ignored
when the shard cleaves to other acceptors.
The sharder cleave loop is modified to break as soon as cleaving is
done, i.e. once cleaving has been completed up to the shard's upper bound.
This prevents misleading logging that cleaving has stopped when
in fact cleaving to a non-root acceptor has completed but the shard
range list still contains an irrelevant root shard range in SHARDED
state. This also prevents cleaving to more than one acceptor in the
unexpected case that multiple active acceptors overlap the shrinking
shard - cleaving will now complete once the first acceptor has
cleaved.
[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I5d48b67217f705ac30bb427ef8d969a90eaad2e5
We don't normally issue any DELETEs to shards when an empty root accepts
a DELETE from the client. If we allow root dbs to reclaim while they
still have shards we risk letting undeleted shards get orphaned.
Partial-Bug: 1911232
Change-Id: I4f591e393a526bb74675874ba81bf743936633c1
Use predictable timestamp iterators to avoid some sharder audit tests
intermittently failing due to timestamps not advancing as assumed.
Change-Id: I1caea0925a6e9a853c7d6d7ad23c27bd37c5056f
Shard shrinking can be instigated by a third party modifying shard
ranges, moving one shard to shrinking state and expanding the
namespace of one or more other shard(s) to act as acceptors. These
state and namespace changes must propagate to the shrinking and
acceptor shards. The shrinking shard must also discover the acceptor
shard(s) into which it will shard itself.
The sharder audit function already updates shards with their own state
and namespace changes from the root. However, there is currently no
mechanism for the shrinking shard to learn about the acceptor(s) other
than by a PUT request being made to the shrinking shard container.
This patch modifies the shard container audit function so that other
overlapping shards discovered from the root are merged into the
audited shard's db. In this way, the audited shard will have acceptor
shards to cleave to if shrinking.
This new behavior is restricted to when the shard is shrinking. In
general, a shard is responsible for processing its own sub-shard
ranges (if any) and reporting them to root. Replicas of a shard
container synchronise their sub-shard ranges via replication, and do
not rely on the root to propagate sub-shard ranges between shard
replicas. The exception to this is when a third party (or
auto-sharding) wishes to instigate shrinking by modifying the shard
and other acceptor shards in the root container. In other
circumstances, merging overlapping shard ranges discovered from the
root is undesirable because it risks shards inheriting other unrelated
shard ranges. For example, if the root has become polluted by
split-brain shard range management, a sharding shard may have its
sub-shards polluted by an undesired shard from the root.
During the shrinking process a shard's own shard range state may
be either shrinking or, prior to this patch, sharded. The sharded
state could occur when one replica of a shrinking shard completed
shrinking and moved the own shard range state to sharded before other
replica(s) had completed shrinking. This makes it impossible to
distinguish a shrinking shard (with sharded state), which we do want
to inherit shard ranges, from a sharding shard (with sharded state),
which we do not want to inherit shard ranges.
This patch therefore introduces a new shard range state, 'SHRUNK', and
applies this state to shard ranges that have completed shrinking.
Shards are now restricted to inherit shard ranges from the root only
when their own shard range state is either SHRINKING or SHRUNK.
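A minimal sketch of that guard (the helper name is invented for
illustration; the real check lives in the sharder's audit code):

    from swift.common.utils import ShardRange

    def may_inherit_from_root(own_shard_range):
        # only a shrinking (or shrunk) shard merges overlapping ranges
        # discovered from the root
        return own_shard_range.state in (ShardRange.SHRINKING, ShardRange.SHRUNK)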
This patch also:
- Stops overlapping shrinking shards from generating audit warnings:
overlaps are cured by shrinking and we therefore expect shrinking
shards to sometimes overlap.
- Extends an existing probe test to verify that overlapping shard
ranges may be resolved by shrinking a subset of the shard ranges.
- Adds a --no-auto-shard option to swift-container-sharder to enable the
probe tests to disable auto-sharding.
- Improves sharder logging, in particular by decrementing ranges_todo
when a shrinking shard is skipped during cleaving.
- Adds a ShardRange.sort_key class method to provide a single definition
of ShardRange sort ordering.
- Improves unit test coverage for sharder shard auditing.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I9034a5715406b310c7282f1bec9625fe7acd57b6
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
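For example, a typical non-security usage after this change looks something
like the following (the data being hashed is illustrative):

    from swift.common.utils import md5

    # ETag-style checksumming is not a security use, so annotate it as such
    etag = md5(b'object data', usedforsecurity=False).hexdigest()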
While this patch seems large, it is really just the same change over and
over again. Reviewers need to pay particular attention to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, none
of them appear to be used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS-enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
A new header `X-Backend-Use-Replication-Network` is added; if true, use
the replication network instead of the client-data-path network.
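A hedged illustration of how a request opts in (only the header name and
value come from this change; the surrounding usage is assumed):

    # any backend request that should avoid the client-data-path network
    # simply includes the new header
    headers = {'X-Backend-Use-Replication-Network': 'true'}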
Several background daemons are updated to use the replication network:
* account-reaper
* container-reconciler
* container-sharder
* container-sync
* object-expirer
Note that if container-sync is being used to sync data within the same
cluster, the replication network will only be used when communicating
with the "source" container; the "destination" traffic will continue to
use the configured realm endpoint.
The direct and internal client APIs still default to using the
client-data-path network; this maintains backwards compatibility for
external tools written against them.
UpgradeImpact
=============
Until recently, servers configured with
replication_server = true
would only handle REPLICATE (and, in the case of object servers, SSYNC)
requests, and would respond 405 Method Not Allowed to other requests.
Since the daemons listed above now send ordinary requests over the
replication network, servers restricted in this way would reject that
traffic. When upgrading from Swift 2.25.0 or earlier, remove the
replication_server option and restart services prior to upgrading to avoid
a flood of background daemon errors in logs.
Note that some background daemons find work by querying Swift rather
than walking local drives, so they need access to the replication
network:
* container-reconciler
* object-expirer
Previously these may have been configured without access to the
replication network; ensure they have access before upgrading.
Closes-Bug: #1883302
Related-Bug: #1446873
Related-Change: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87
Change-Id: Ieef534bf5d5fb53602e875b51c15ef565882fbff
The existing tests cover a lot of behaviors and carry around a lot of
state, which makes them hard to extend in a descriptive manner to cover
new or changed behaviors.
Change-Id: Ie52932d8d4a66b11c295d5568aa3a60895b84f3b
In the previous patch, we could clean up all container DBs, but only if
the daemons went in a specific order (which cannot be guaranteed in a
production system).
Once a reclaim age passes, there's a race: If the container-replicator
processes the root container before the container-sharder processes the
shards, the deleted shards would get reaped from the root so they won't
be available for the sharder. The shard containers then hang around
indefinitely.
Now, be willing to mark shard DBs as deleted even when we can't find our
own shard range in the root. Fortunately, the shard already knows that
its range has been deleted; we don't need to get that info from the root.
Change-Id: If08bccf753490157f27c95b4038f3dd33d3d7f8c
Related-Change: Icba98f1c9e17e8ade3f0e1b9a23360cf5ab8c86b
The idea is, if none of
- timestamp,
- object_count,
- bytes_used,
- state, or
- epoch
has changed, we shouldn't need to send an update back to the root
container.
This is more-or-less comparable to what the container-updater does to
avoid unnecessary writes to the account.
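A minimal sketch of the comparison being described (the helper name is
invented; the attribute names are those listed above):

    def shard_range_update_needed(old, new):
        # only report back to the root if something meaningful changed
        return any(getattr(old, attr) != getattr(new, attr)
                   for attr in ('timestamp', 'object_count', 'bytes_used',
                                'state', 'epoch'))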
Closes-Bug: #1834097
Change-Id: I1ee7ba5eae3c508064714c4deb4f7c6bbbfa32af
The repo now supports both Python 2 and 3, so update hacking to
version 2.0, which supports both. Note that the latest hacking
release, 3.0, only supports Python 3.
Fix problems found.
Remove hacking and friends from lower-constraints, they are not needed
for installation.
Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
If we move it to constraints it's more globally accessible in our code,
but more importantly it's more obvious to ops that everything breaks if
you try to mis-configure different values per-service.
Change-Id: Ib8f7d08bc48da12be5671abe91a17ae2b49ecfee
Previously, if you were on Python 2.7.10+ [0], a newline in a container
name would cause the sharder to fail, complaining about invalid header
values when trying to create
the shard containers. On older versions of Python, it would most likely cause a
parsing error in the container-server that was trying to handle the PUT.
Now, quote all places that we pass around container paths. This includes:
* The X-Container-Sysmeta-Shard-(Quoted-)Root sent when creating the (empty)
remote shards
* The X-Container-Sysmeta-Shard-(Quoted-)Root included when initializing the
local handoff for cleaving
* The X-Backend-(Quoted-)Container-Path the proxy sends to the object-server
for container updates
* The Location header the container-server sends to the object-updater
Note that a new header was required in requests so that servers would
know whether the value should be unquoted or not. We can get away with
reusing Location in responses by having clients opt-in to quoting with
a new X-Backend-Accept-Quoted-Location header.
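A rough sketch of the quoting involved (the container name is illustrative
and the header plumbing in the real code differs):

    from swift.common.utils import quote

    # a newline (or other unsafe byte) in the path gets %-encoded before
    # being placed in a header such as X-Backend-Quoted-Container-Path
    quoted = quote('.shards_AUTH_test/bad\nname')  # -> '.shards_AUTH_test/bad%0Aname'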
During a rolling upgrade,
* old object-servers servicing requests from new proxy-servers will
not know about the container path override and so will try to update
the root container,
* in general, object updates are more likely to land in the root
container; the sharder will deal with them as misplaced objects, and
* shard containers created by new code on servers running old code
will think they are root containers until the server is running new
code, too; during this time they'll fail the sharder audit and report
stats to their account, but both of these should get cleared up upon
upgrade.
Drive-by: fix a "conainer_name" typo that prevented us from testing that
we can shard a container with unicode in its name. Also, add more UTF8
probe tests.
[0] See https://bugs.python.org/issue22928
Change-Id: Ie08f36e31a448a547468dd85911c3a3bc30e89f1
Closes-Bug: 1856894
When a container is being cleaved there is a possibility that we're
dealing with an empty or near-empty container created on a handoff node.
These containers may have a valid list of shard ranges, so would need
to cleave to the new shards.
Currently, when using a `cleave_batch_size` that is smaller than the
number of shard ranges on the cleaving container, these containers will
take multiple sharder passes to shard, even though there may be
nothing in them.
This is worse when a really large container is sharding and, because it is
slow, a node gets error-limited, causing a new container DB to be created
in a handoff location. This empty container would have a large number of
shard ranges and could take a _very_ long time to shard away, slowing the
process down.
This patch eliminates the issue by detecting when no objects are
returned for a shard range. The `_cleave_shard_range` method now
returns 3 possible results:
- CLEAVE_SUCCESS
- CLEAVE_FAILED
- CLEAVE_EMPTY
They are all pretty self-explanatory (a short sketch follows below). When
`CLEAVE_EMPTY` is returned
the code will:
- Log
- Not replicate the empty temp shard container sitting in a
handoff location
- Not count the shard range in the `cleave_batch_size` count
- Update the cleaving context so sharding can move forward
If there is already an existing shard range DB on a handoff node to use,
then the sharder won't skip it even if there are no objects; it'll
replicate it and treat it as normal, including using a `cleave_batch_size`
slot.
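To make the three results concrete, a hedged sketch (the constant values
and every name other than the three results are assumed for illustration):

    CLEAVE_SUCCESS, CLEAVE_FAILED, CLEAVE_EMPTY = 0, 1, 2

    def handle_cleave_result(result, batch_count):
        if result == CLEAVE_EMPTY:
            # logged, context advanced, but no replication and no batch slot
            return batch_count
        if result == CLEAVE_SUCCESS:
            return batch_count + 1  # consumes one cleave_batch_size slot
        return None  # CLEAVE_FAILED: stop and retry on a later pass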
Change-Id: Id338f6c3187f93454bcdf025a32a073284a4a159
Closes-Bug: #1839355
This is a follow-up to the patch that cleans up cleave contexts
(patch 681970). Instead of tracking a last_modified timestamp and storing
it in the context metadata, use the timestamp we already record when
storing any metadata.
Reducing duplication is nice, but there's a more significant reason to
do this: affected container DBs can start getting cleaned up as soon as
they're running the new code rather than needing to wait for an
additional reclaim_age.
Change-Id: I2cdbe11f06ffb5574e573c4a60ba4e5d41a00c50
There is a sharding edge case where more and more CleaveContexts are
generated and stored in the sharding container DB. If enough CleaveContexts
build up in the DB, as in the linked bug, this can lead to 503s when
attempting to list the container, due to all the
`X-Container-Sysmeta-Shard-Context-*` headers.
This patch resolves this by tracking each CleaveContext's last-modified
time. During the sharding audit, any context that hasn't been touched
within reclaim_age is deleted.
This, together with the skip-empty-ranges patch, should improve the
situation for these handoff shards.
Change-Id: I1e502c328be16fca5f1cca2186b27a0545fecc16
Closes-Bug: #1843313
This started with ShardRanges and its CLI. The sharder is at the
bottom of the dependency chain. Even container backend needs it.
Once we started tinkering with the sharder, it all snowballed to
include the rest of the container services.
Beware, this does affect some of the Python 2 code. Mostly it's trivial
and obviously correct, but it needs checking by reviewers.
About killing the stray "from __future__ import unicode_literals":
we do not do it in general. The specific problem it caused was
a failure of functional tests because unicode leaked into a field
that was supposed to be encoded. It is just too hard to track the
types when rules change from file to file, so off with its head.
Change-Id: Iba4e65d0e46d8c1f5a91feb96c2c07f99ca7c666
Previously, _check_node() wouldn't catch the ValueError raised when
a drive was unmounted. Therefore the error would bubble up, uncaught,
and stop the shard cycle. The practical effect is that an unmounted
drive on a node would prevent sharding from happening.
This patch updates _check_node() to properly use the check_drive()
method. Furthermore, the _check_node() return value has been modified
to be more similar to what check_drive() actually returns. This
should help prevent similar errors from being introduced in the future.
Closes-Bug: #1806500
Change-Id: I3da9b5b120a5980e77ef5c4dc8fa1697e462ce0d
Most daemons have a "go as fast as you can then sleep for 30 seconds"
strategy towards resource utilization; the object-updater and
object-auditor however have some "X_per_second" options that allow
operators much better control over how they spend their I/O budget.
This change extends that pattern into the account-replicator,
container-replicator, and container-sharder which have been known to peg
CPUs when they're not IO limited.
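For example, something like the following would cap the sharder's db
visiting rate (the option name and value here are assumed for illustration;
see the sample configs for the exact names):

    [container-sharder]
    databases_per_second = 50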
Partial-Bug: #1784753
Change-Id: Ib7f2497794fa2f384a1a6ab500b657c624426384
...instead of 10,000,000. The sample configs were already using one
million, all of our testing with non-SAIO containers was done with
one million, and the resulting container DBs were around 100MB which
seems like a comfortable size. Pretty sure this was just a typo during
some code cleanup.
Change-Id: Icd31f9d8efaac2d5dc0f021cad550687859558b9
The sharder daemon visits container dbs and when necessary executes
the sharding workflow on the db.
The workflow is, in overview:
- perform an audit of the container for sharding purposes.
- move any misplaced objects that do not belong in the container
to their correct shard.
- move shard ranges from FOUND state to CREATED state by creating
shard containers.
- move shard ranges from CREATED to CLEAVED state by cleaving objects
to shard dbs and replicating those dbs. By default this is done in
batches of 2 shard ranges per visit.
Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.
The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them to a container in order to trigger
sharding. This is currently the recommended way to shard a container.
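An illustrative sequence (the db path, threshold and shard-ranges file are
placeholders; consult the tool's --help for the exact subcommands and
options):

    swift-manage-shard-ranges <path_to_container_db> find 500000
    swift-manage-shard-ranges <path_to_container_db> replace <shard_ranges_file>
    swift-manage-shard-ranges <path_to_container_db> enact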
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f