20 Commits

Author SHA1 Message Date
Clay Gerrard
df22032d79 object-expirer: add round_robin_cache_size option
Drive-Bys:
 * DRY out redundant configuration examples in expiring objects overview
   documentation.
 * Add missing delay_reaping man page docs.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>

Change-Id: I8879dbd13527233c878dff764ec411ce9619ee39
2024-11-01 09:54:54 +00:00
indianwhocodes
11eb17d3b2 support x-open-expired header for expired objects
If the global configuration option 'enable_open_expired' is set
to true in the config, a client can make a request with the
header 'x-open-expired' set to true in order to access an object
that has expired, provided it is still within its grace period.
If the flag is set to false (the default), clients cannot access
expired objects even with the header.

When a client sets the 'x-open-expired' header to a true value on a
GET/HEAD/POST request, the proxy will forward x-backend-open-expired to
the storage server. The storage server will allow clients that set
x-backend-open-expired to open and read an object that has not yet
been reaped by the object-expirer, even after its x-delete-at time
has passed.

The header is always ignored when used with temporary URLs.
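
As a sketch of the intended usage (the config section is left unspecified
here because the message only calls enable_open_expired a global
configuration option; the header flow is taken from the description above):

  # enable_open_expired is described above only as a global configuration
  # option, so no particular config section is assumed here
  enable_open_expired = true

  # a client within the grace period can then send, on a GET/HEAD/POST:
  #   X-Open-Expired: true
  # the proxy forwards it to the storage server as X-Backend-Open-Expired
  # (the header is always ignored for temporary URLs)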

Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Related-Change: I106103438c4162a561486ac73a09436e998ae1f0
Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3
2024-04-26 10:13:40 +01:00
Mandell Degerness
5961ba0ca7 expirer: account and container level delay_reaping
The object expirer can be configured to delay the reaping of
objects from disk after their expiration time using account
and container level delay_reaping values. The per-account and
per-container delay_reaping values, in seconds, are configured
in the object server config. The object expirer references these
configured values to only reap objects from the specified accounts
and containers after their corresponding delays have elapsed.

The goal of the delay_reaping feature is to prevent accidental or
premature data loss when an object marked for deletion with the
'x-delete-at' feature should not, for whatever reason, be reaped
immediately.

Configuring the delay_reaping value at a granular account and
container level makes it possible to keep storage capacity
consumption under control while maintaining a desired data
recovery window.

This patch also adds a sample configuration, documentation, and
tests for bad configurations and grace period functionality.
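
A sketch of what such a configuration might look like (the account and
container names are made up, and the per-account/per-container key format
is an assumption based on the description above):

  [object-expirer]
  # reap expired objects in this account one day after their x-delete-at
  delay_reaping_AUTH_test = 86400
  # reap expired objects in this container one week after their x-delete-at
  delay_reaping_AUTH_test/backups = 604800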

Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Change-Id: I106103438c4162a561486ac73a09436e998ae1f0
2024-04-25 13:59:36 -07:00
Takashi Kajinami
49b19613d2 Remove per-service auto_create_account_prefix
The per-service option was deprecated almost 4 years ago[1].

[1] 4601548dabdec0a4dc89cefba11e963217255be3

Change-Id: I45f7678c9932afa038438ee841d1b262d53c9da8
2023-11-22 01:58:03 +09:00
Jianjian Huo
cb1e584e64 Object-server: keep SLO manifest files in page cache.
Currently, SLO manifest files will be evicted from the page cache
after being read, which makes hard drives very busy when a user
requests a lot of parallel byte-range GETs for a particular
SLO object.

This patch adds a new config option, 'keep_cache_slo_manifest', and
tries to keep the manifest files in the page cache by not evicting
them after reading when the config settings allow it.
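
A minimal sketch of the new option (the section name and the default shown
are assumptions):

  [app:object-server]
  # keep SLO manifest files in the page cache after reads (assumed to
  # default to false, preserving the previous eviction behaviour)
  keep_cache_slo_manifest = true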

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I557bd01643375d7ad68c3031430899b85908a54f
2023-07-07 12:48:24 -07:00
Matthew Oliver
2edd3e65da docs: Add memcache.conf config doc
Change-Id: I29d00e939a3842bd064382575955fa3e255242eb
2023-02-22 16:18:37 +11:00
Tim Burke
5c6407bf59 proxy: Add a chance to skip memcache for get_*_info calls
If you've got thousands of requests per second for objects in a single
container, you basically NEVER want that container's info to ever fall
out of memcache. If it *does*, all those clients are almost certainly
going to overload the container.

Avoid this by allowing some small fraction of requests to bypass and
refresh the cache, pushing out the TTL as long as there continue to be
requests to the container. The likelihood of skipping the cache is
configurable, similar to what we did for shard range sets.
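
A hedged sketch of how this might be configured (the option names below are
assumptions modelled on the existing shard-range skip-cache settings; the
message above does not name them):

  [app:proxy-server]
  # let a small percentage of requests bypass memcache and refresh the
  # cached container/account info (names and values are illustrative)
  container_existence_skip_cache_pct = 0.1
  account_existence_skip_cache_pct = 0.1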

Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
Closes-Bug: #1883324
2022-08-30 18:49:48 +10:00
Matthew Oliver
bf4edefce4 DB Replicator: Add handoff_delete option
Currently the object-replicator has an option called `handoff_delete`
which allows us to define the number of replicas which must be ensured
in swift. Once a handoff node sees that many successful responses it
can go ahead and delete the handoff partition.

By default it's 'auto', i.e. the number of primary nodes, but it can
be reduced. This is useful in draining full disks, but has to be used
carefully.

This patch adds the same option to the DB replicator, where it works
the same way, except that the deletion happens at the per-DB level
rather than per partition.

Because it's done at the DB replicator level, the option is now
available to both the Account and Container replicators.
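
For example (the value is illustrative), the option can now be set in the
account and container replicator sections just as for the object replicator:

  [container-replicator]
  # delete the handoff DB once this many replicas have been confirmed;
  # 'auto' (the default) means the number of primary nodes
  handoff_delete = 2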

Change-Id: Ide739a6d805bda20071c7977f5083574a5345a33
2022-07-21 13:35:24 +10:00
Alistair Coles
8ee631ccee reconstructor: restrict max objects per revert job
Previously the ssync Sender would attempt to revert all objects in a
partition within a single SSYNC request. With this change the
reconstructor daemon option max_objects_per_revert can be used to limit
the number of objects reverted inside a single SSYNC request for revert
type jobs i.e. when reverting handoff partitions.

If more than max_objects_per_revert are available, the remaining objects
will remain in the sender partition and will not be reverted until the
next call to ssync.Sender, which would currently be the next time the
reconstructor visits that handoff partition.

Note that the option only applies to handoff revert jobs, not to sync
jobs.
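
For illustration (the value shown is arbitrary):

  [object-reconstructor]
  # limit the number of objects reverted in a single SSYNC request for
  # handoff revert jobs; remaining objects wait for the next visit
  max_objects_per_revert = 1000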

Change-Id: If81760c80a4692212e3774e73af5ce37c02e8aff
2021-12-03 12:43:23 +00:00
Alistair Coles
bbaed18e9b diskfile: don't remove recently written non-durables
DiskFileManager will remove any stale files during
cleanup_ondisk_files(): these include tombstones and nondurable EC
data fragments whose timestamps are older than reclaim_age. It can
usually be safely assumed that a non-durable data fragment older than
reclaim_age is not going to become durable. However, if an agent PUTs
objects with specified older X-Timestamps (for example the reconciler
or container-sync) then there is a window of time during which the
object server has written an old non-durable data file but has not yet
committed it to make it durable.

Previously, if another process (for example the reconstructor) called
cleanup_ondisk_files during this window then the non-durable data file
would be removed. The subsequent attempt to commit the data file would
then result in a traceback due to there no longer being a data file to
rename, and of course the data file is lost.

This patch modifies cleanup_ondisk_files to not remove old, otherwise
stale, non-durable data files that were only written to disk in the
preceding 'commit_window' seconds. 'commit_window' is configurable for
the object server and defaults to 60.0 seconds.
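
For example (the section name is an assumption; the default value is taken
from the description above):

  [app:object-server]
  # never reclaim a non-durable data file written within this many seconds
  commit_window = 60.0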

Closes-Bug: #1936508
Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
Change-Id: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
2021-07-19 21:18:02 +01:00
Zuul
17489ce7bf Merge "sharder: avoid small tail shards" 2021-07-08 17:00:52 +00:00
Zuul
8066efb43a Merge "sharder: support rows_per_shard in config file" 2021-07-07 23:06:08 +00:00
Alistair Coles
2a593174a5 sharder: avoid small tail shards
A container is typically sharded when it has grown to have an object
count of shard_container_threshold + N, where N <<
shard_container_threshold.  If sharded using the default
rows_per_shard of shard_container_threshold / 2 then this would
previously result in 3 shards: the tail shard would typically be
small, having only N rows. This behaviour caused more shards to be
generated than desirable.

This patch adds a minimum-shard-size option to
swift-manage-shard-ranges, and a corresponding option in the sharder
config, which can be used to avoid small tail shards. If set to
greater than one then the final shard range may be extended to more
than rows_per_shard in order to avoid a further shard range with less
than minimum-shard-size rows. In the example given, if
minimum-shard-size is set to M > N then the container would shard into
two shards having rows_per_shard rows and rows_per_shard + N
respectively.

The default value for minimum-shard-size is rows_per_shard // 5. If
all options have their default values this results in
minimum-shard-size being 100000.
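
A sketch of the corresponding sharder config setting (the underscore
spelling of the option name is an assumption derived from the CLI flag):

  [container-sharder]
  # never produce a tail shard with fewer rows than this; with all
  # defaults this is rows_per_shard // 5 = 100000
  minimum_shard_size = 100000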

Closes-Bug: #1928370
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: I3baa278c6eaf488e3f390a936eebbec13f2c3e55
2021-07-07 13:59:36 +01:00
Alistair Coles
a87317db6e sharder: support rows_per_shard in config file
Make rows_per_shard an option that can be configured
in the [container-sharder] section of a config file.

For auto-sharding, this option was previously hard-coded to
shard_container_threshold // 2.

The swift-manage-shard-ranges command line tool already supported
rows_per_shard on the command line and will now also load it from a
config file if specified. Any value given on the command line takes
precedence over any value found in a config file.
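
For example (the value shown assumes the default shard_container_threshold
of 1000000, giving the previously hard-coded shard_container_threshold // 2):

  [container-sharder]
  # number of object rows per shard range when finding shard ranges
  rows_per_shard = 500000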

Change-Id: I820e133a4e24400ed1e6a87ebf357f7dac463e38
2021-07-07 13:59:36 +01:00
Alistair Coles
2fd5b87dc5 reconstructor: make quarantine delay configurable
Previously the reconstructor would quarantine isolated durable
fragments that were more than reclaim_age old. This patch adds a
quarantine_age option for the reconstructor which defaults to
reclaim_age but can be used to configure the age that a fragment must
reach before quarantining.
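
For illustration (the value is arbitrary; the option defaults to
reclaim_age):

  [object-reconstructor]
  # age in seconds an isolated durable fragment must reach before it is
  # quarantined; defaults to reclaim_age
  quarantine_age = 604800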

Change-Id: I867f3ea0cf60620c576da0c1f2c65cec2cf19aa0
2021-07-06 16:41:08 +01:00
Alistair Coles
18f20daf38 Add absolute values for shard shrinking config options
Add two new sharder config options for configuring shrinking
behaviour:

  - shrink_threshold: the size below which a shard may shrink
  - expansion_limit: the maximum size to which an acceptor shard
    may grow

The new options match the 'swift-manage-shard-ranges' command line
options and take absolute values.

The new options provide alternatives to the current equivalent options
'shard_shrink_point' and 'shard_shrink_merge_point', which are
expressed as percentages of 'shard_container_threshold'.
'shard_shrink_point' and 'shard_shrink_merge_point' are deprecated and
will be overridden by the new options if the new options are
explicitly set in a config file.

The default values of the new options are the same as the values that
would result from the default 'shard_container_threshold',
'shard_shrink_point' and 'shard_shrink_merge_point' i.e.:

  - shrink_threshold: 100000
  - expansion_limit: 750000
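
In the sharder config these would be set, for example, as:

  [container-sharder]
  # absolute object counts replacing the deprecated percentage-based
  # shard_shrink_point and shard_shrink_merge_point options
  shrink_threshold = 100000
  expansion_limit = 750000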

Change-Id: I087eac961c1eab53540fe56be4881e01ded1f60e
2021-05-20 21:00:02 +01:00
Alistair Coles
f7fd99a880 Use ContainerSharderConf class in sharder and manage-shard-ranges
Change the swift-manage-shard-ranges default expansion-limit to equal
the sharder daemon default merge_size, i.e. 750000. The previous default
of 500000 had erroneously differed from the sharder default value.

Introduce a ContainerSharderConf class to encapsulate loading of
sharder conf and the definition of defaults. ContainerSharder inherits
this and swift-manage-shard-ranges instantiates it.

Rename ContainerSharder member vars to match the equivalent vars and
cli options in manage_shard_ranges:

  shrink_size -> shrink_threshold
  merge_size -> expansion_limit
  split_size -> rows_per_shard

(This direction of renaming is chosen so that the manage_shard_ranges
cli options are not changed.)

Rename ContainerSharder member vars to match the conf file option name:
  scanner_batch_size -> shard_scanner_batch_size

Remove some ContainerSharder member vars that were not used outside of
the __init__ method:

  shrink_merge_point
  shard_shrink_point

Change-Id: I8a58a82c08ac3abaddb43c11d26fda9fb45fe6c1
2021-05-20 20:59:56 +01:00
Matthew Oliver
fb186f6710 Add a config file option to swift-manage-shard-ranges
While working on the shrinking recon drops, we want to display numbers
that directly relate to how the tool should behave. But currently all
options of the s-m-s-r tool are driven by cli options.

This creates a disconnect: defining what should be used in the sharder
and in the tool via separate sets of options is bound to fail. It would
be much better to define the required default options for your
environment in one place that both the sharder and the tool can use.

This patch does some refactoring, adds max_shrinking and max_expanding
options to the sharding config, and adds a --config option to the
tool.

The --config option expects a config with a '[container-sharder]'
section. It only supports the shard options:
 - max_shrinking
 - max_expanding
 - shard_container_threshold
 - shard_shrink_point
 - shard_merge_point

The latter 2 are used to generate the s-m-s-r's:
 - shrink_threshold
 - expansion_limit
 - rows_per_shard

CLI arguments take precedence over values given in the config.
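
A sketch of the intended usage (the file path, values and placeholder
arguments are illustrative):

  # in e.g. /etc/swift/container-server.conf
  [container-sharder]
  max_shrinking = 1
  max_expanding = -1
  shard_container_threshold = 1000000

  # then point the tool at that file:
  #   swift-manage-shard-ranges --config /etc/swift/container-server.conf \
  #       <db_file> <command>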

Change-Id: I4d0147ce284a1a318b3cd88975e060956d186aec
2021-03-12 10:49:46 +11:00
Alistair Coles
4f94ac263a Add sharder section to container config doc
Change-Id: I33c0168c1bb89f780be6fc317a4df89322cbc28d
2021-02-22 09:29:49 +00:00
Alistair Coles
72786533ea Move config option documentation to separate docs
This patch moves the tables describing configuration options for each
server type from the deployment_guide.rst doc to separate per-server
documents.  The new per-server documents are grouped under a config
directory with a config index doc. The config index doc is listed in
the top level index and provides a single starting point to navigate
to the individual server docs.

Change-Id: I6cedd98586febb5dc949c088ee44e160385ed324
2020-11-05 14:40:05 +00:00