20 Commits

Author SHA1 Message Date
Clay Gerrard
df22032d79 object-expirer: add round_robin_cache_size option
Drive-Bys:
 * DRY out redundant configuration examples in expiring objects overview
   documentation.
 * Add missing delay_reaping man page docs.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>

Change-Id: I8879dbd13527233c878dff764ec411ce9619ee39
2024-11-01 09:54:54 +00:00
indianwhocodes
11eb17d3b2 support x-open-expired header for expired objects
If the global configuration option 'enable_open_expired' is set
to true in the config, a client can make a request with the
header 'x-open-expired' set to true in order to access an object
that has expired, provided it is still within its grace period.
If the flag is set to false (the default), clients cannot access
expired objects even with the header.

When a client sets the 'x-open-expired' header to a true value on a
GET/HEAD/POST request, the proxy will forward x-backend-open-expired to
the storage server. The storage server will allow clients that set
x-backend-open-expired to open and read an object that has not yet
been reaped by the object-expirer, even after its x-delete-at time
has passed.

The header is always ignored when used with temporary URLs.
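
As a sketch of the intended usage (the config section is left unspecified
here because the message only calls enable_open_expired a global
configuration option; the header flow is taken from the description above):

  # enable_open_expired is described above only as a global configuration
  # option, so no particular config section is assumed here
  enable_open_expired = true

  # a client within the grace period can then send, on a GET/HEAD/POST:
  #   X-Open-Expired: true
  # the proxy forwards it to the storage server as X-Backend-Open-Expired
  # (the header is always ignored for temporary URLs)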

Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Related-Change: I106103438c4162a561486ac73a09436e998ae1f0
Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3
2024-04-26 10:13:40 +01:00
Mandell Degerness
5961ba0ca7 expirer: account and container level delay_reaping
The object expirer can be configured to delay the reaping of
objects from disk after their expiration time using account
and container level delay_reaping values. The per-account and
per-container delay_reaping values, in seconds, are configured
in the object server config. The object expirer references these
configured values to only reap objects from the specified accounts
and containers after their corresponding delays have elapsed.

The goal of the delay_reaping feature is to prevent accidental or
premature data loss when an object marked for deletion with the
'x-delete-at' feature should not, for whatever reason, be reaped
immediately.

Configuring the delay_reaping value at a granular account and
container level makes it possible to keep storage capacity
consumption under control while maintaining a desired data
recovery window.

This patch also adds a sample configuration, documentation, and
tests for bad configurations and grace period functionality.
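
A sketch of what such a configuration might look like (the account and
container names are made up, and the per-account/per-container key format
is an assumption based on the description above):

  [object-expirer]
  # reap expired objects in this account one day after their x-delete-at
  delay_reaping_AUTH_test = 86400
  # reap expired objects in this container one week after their x-delete-at
  delay_reaping_AUTH_test/backups = 604800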

Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Change-Id: I106103438c4162a561486ac73a09436e998ae1f0
2024-04-25 13:59:36 -07:00
Takashi Kajinami
49b19613d2 Remove per-service auto_create_account_prefix
The per-service option was deprecated almost 4 years ago[1].

[1] 4601548dabdec0a4dc89cefba11e963217255be3

Change-Id: I45f7678c9932afa038438ee841d1b262d53c9da8
2023-11-22 01:58:03 +09:00
Jianjian Huo
cb1e584e64 Object-server: keep SLO manifest files in page cache.
Currently, SLO manifest files will be evicted from the page cache
after being read, which makes hard drives very busy when a user
requests a lot of parallel byte-range GETs for a particular
SLO object.

This patch adds a new config option, 'keep_cache_slo_manifest', and
tries to keep the manifest files in the page cache by not evicting
them after reading when the config settings allow it.
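
A minimal sketch of the new option (the section name and the default shown
are assumptions):

  [app:object-server]
  # keep SLO manifest files in the page cache after reads (assumed to
  # default to false, preserving the previous eviction behaviour)
  keep_cache_slo_manifest = true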

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I557bd01643375d7ad68c3031430899b85908a54f
2023-07-07 12:48:24 -07:00
Matthew Oliver
2edd3e65da docs: Add memcache.conf config doc
Change-Id: I29d00e939a3842bd064382575955fa3e255242eb
2023-02-22 16:18:37 +11:00
Tim Burke
5c6407bf59 proxy: Add a chance to skip memcache for get_*_info calls
If you've got thousands of requests per second for objects in a single
container, you basically NEVER want that container's info to ever fall
out of memcache. If it *does*, all those clients are almost certainly
going to overload the container.

Avoid this by allowing some small fraction of requests to bypass and
refresh the cache, pushing out the TTL as long as there continue to be
requests to the container. The likelihood of skipping the cache is
configurable, similar to what we did for shard range sets.
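
A hedged sketch of how this might be configured (the option names below are
assumptions modelled on the existing shard-range skip-cache settings; the
message above does not name them):

  [app:proxy-server]
  # let a small percentage of requests bypass memcache and refresh the
  # cached container/account info (names and values are illustrative)
  container_existence_skip_cache_pct = 0.1
  account_existence_skip_cache_pct = 0.1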

Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
Closes-Bug: #1883324
2022-08-30 18:49:48 +10:00
Matthew Oliver
bf4edefce4 DB Replicator: Add handoff_delete option
Currently the object-replicator has an option called `handoff_delete`
which allows us to define the number of replicas which must be ensured
in swift. Once a handoff node sees that many successful responses it
can go ahead and delete the handoff partition.

By default it's 'auto', i.e. the number of primary nodes, but it can
be reduced. This is useful in draining full disks, but has to be used
carefully.

This patch adds the same option to the DB replicator, where it works
the same way, except that the deletion happens at the per-DB level
rather than per partition.

Because it's done at the DB replicator level, the option is now
available to both the Account and Container replicators.
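
For example (the value is illustrative), the option can now be set in the
account and container replicator sections just as for the object replicator:

  [container-replicator]
  # delete the handoff DB once this many replicas have been confirmed;
  # 'auto' (the default) means the number of primary nodes
  handoff_delete = 2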

Change-Id: Ide739a6d805bda20071c7977f5083574a5345a33
2022-07-21 13:35:24 +10:00
Alistair Coles
8ee631ccee reconstructor: restrict max objects per revert job
Previously the ssync Sender would attempt to revert all objects in a
partition within a single SSYNC request. With this change the
reconstructor daemon option max_objects_per_revert can be used to limit
the number of objects reverted inside a single SSYNC request for revert
type jobs i.e. when reverting handoff partitions.

If more than max_objects_per_revert are available, the remaining objects
will remain in the sender partition and will not be reverted until the
next call to ssync.Sender, which would currently be the next time the
reconstructor visits that handoff partition.

Note that the option only applies to handoff revert jobs, not to sync
jobs.
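
For illustration (the value shown is arbitrary):

  [object-reconstructor]
  # limit the number of objects reverted in a single SSYNC request for
  # handoff revert jobs; remaining objects wait for the next visit
  max_objects_per_revert = 1000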

Change-Id: If81760c80a4692212e3774e73af5ce37c02e8aff
2021-12-03 12:43:23 +00:00
Alistair Coles
bbaed18e9b diskfile: don't remove recently written non-durables
DiskFileManager will remove any stale files during
cleanup_ondisk_files(): these include tombstones and nondurable EC
data fragments whose timestamps are older than reclaim_age. It can
usually be safely assumed that a non-durable data fragment older than
reclaim_age is not going to become durable. However, if an agent PUTs
objects with specified older X-Timestamps (for example the reconciler
or container-sync) then there is a window of time during which the
object server has written an old non-durable data file but has not yet
committed it to make it durable.

Previously, if another process (for example the reconstructor) called
cleanup_ondisk_files during this window then the non-durable data file
would be removed. The subsequent attempt to commit the data file would
then result in a traceback due to there no longer being a data file to
rename, and of course the data file is lost.

This patch modifies cleanup_ondisk_files to not remove old, otherwise
stale, non-durable data files that were only written to disk in the
preceding 'commit_window' seconds. 'commit_window' is configurable for
the object server and defaults to 60.0 seconds.
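
For example (the section name is an assumption; the default value is taken
from the description above):

  [app:object-server]
  # never reclaim a non-durable data file written within this many seconds
  commit_window = 60.0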

Closes-Bug: #1936508
Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
Change-Id: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
2021-07-19 21:18:02 +01:00
Zuul
17489ce7bf Merge "sharder: avoid small tail shards" 2021-07-08 17:00:52 +00:00
Zuul
8066efb43a Merge "sharder: support rows_per_shard in config file" 2021-07-07 23:06:08 +00:00
Alistair Coles
2a593174a5 sharder: avoid small tail shards
A container is typically sharded when it has grown to have an object
count of shard_container_threshold + N, where N <<
shard_container_threshold.  If sharded using the default
rows_per_shard of shard_container_threshold / 2 then this would
previously result in 3 shards: the tail shard would typically be
small, having only N rows. This behaviour caused more shards to be
generated than desirable.

This patch adds a minimum-shard-size option to
swift-manage-shard-ranges, and a corresponding option in the sharder
config, which can be used to avoid small tail shards. If set to
greater than one then the final shard range may be extended to more
than rows_per_shard in order to avoid a further shard range with less
than minimum-shard-size rows. In the example given, if
minimum-shard-size is set to M > N then the container would shard into
two shards having rows_per_shard rows and rows_per_shard + N
respectively.

The default value for minimum-shard-size is rows_per_shard // 5. If
all options have their default values this results in
minimum-shard-size being 100000.
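
A sketch of the corresponding sharder config setting (the underscore
spelling of the option name is an assumption derived from the CLI flag):

  [container-sharder]
  # never produce a tail shard with fewer rows than this; with all
  # defaults this is rows_per_shard // 5 = 100000
  minimum_shard_size = 100000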

Closes-Bug: #1928370
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: I3baa278c6eaf488e3f390a936eebbec13f2c3e55
2021-07-07 13:59:36 +01:00
Alistair Coles
a87317db6e sharder: support rows_per_shard in config file
Make rows_per_shard an option that can be configured
in the [container-sharder] section of a config file.

For auto-sharding, this option was previously hard-coded to
shard_container_threshold // 2.

The swift-manage-shard-ranges command line tool already supported
rows_per_shard on the command line and will now also load it from a
config file if specified. Any value given on the command line takes
precedence over any value found in a config file.
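
For example (the value shown assumes the default shard_container_threshold
of 1000000, giving the previously hard-coded shard_container_threshold // 2):

  [container-sharder]
  # number of object rows per shard range when finding shard ranges
  rows_per_shard = 500000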

Change-Id: I820e133a4e24400ed1e6a87ebf357f7dac463e38
2021-07-07 13:59:36 +01:00
Alistair Coles
2fd5b87dc5 reconstructor: make quarantine delay configurable
Previously the reconstructor would quarantine isolated durable
fragments that were more than reclaim_age old. This patch adds a
quarantine_age option for the reconstructor which defaults to
reclaim_age but can be used to configure the age that a fragment must
reach before quarantining.
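
For illustration (the value is arbitrary; the option defaults to
reclaim_age):

  [object-reconstructor]
  # age in seconds an isolated durable fragment must reach before it is
  # quarantined; defaults to reclaim_age
  quarantine_age = 604800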

Change-Id: I867f3ea0cf60620c576da0c1f2c65cec2cf19aa0
2021-07-06 16:41:08 +01:00
Alistair Coles
18f20daf38 Add absolute values for shard shrinking config options
Add two new sharder config options for configuring shrinking
behaviour:

  - shrink_threshold: the size below which a shard may shrink
  - expansion_limit: the maximum size to which an acceptor shard
    may grow

The new options match the 'swift-manage-shard-ranges' command line
options and take absolute values.

The new options provide alternatives to the current equivalent options
'shard_shrink_point' and 'shard_shrink_merge_point', which are
expressed as percentages of 'shard_container_threshold'.
'shard_shrink_point' and 'shard_shrink_merge_point' are deprecated and
will be overridden by the new options if the new options are
explicitly set in a config file.

The default values of the new options are the same as the values that
would result from the default 'shard_container_threshold',
'shard_shrink_point' and 'shard_shrink_merge_point' i.e.:

  - shrink_threshold: 100000
  - expansion_limit: 750000
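
In the sharder config these would be set, for example, as:

  [container-sharder]
  # absolute object counts replacing the deprecated percentage-based
  # shard_shrink_point and shard_shrink_merge_point options
  shrink_threshold = 100000
  expansion_limit = 750000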

Change-Id: I087eac961c1eab53540fe56be4881e01ded1f60e
2021-05-20 21:00:02 +01:00
Alistair Coles
f7fd99a880 Use ContainerSharderConf class in sharder and manage-shard-ranges
Change the swift-manage-shard-ranges default expansion-limit to equal
the sharder daemon default merge_size, i.e. 750000. The previous default
of 500000 had erroneously differed from the sharder default value.

Introduce a ContainerSharderConf class to encapsulate loading of
sharder conf and the definition of defaults. ContainerSharder inherits
this and swift-manage-shard-ranges instantiates it.

Rename ContainerSharder member vars to match the equivalent vars and
cli options in manage_shard_ranges:

  shrink_size -> shrink_threshold
  merge_size -> expansion_limit
  split_size -> rows_per_shard

(This direction of renaming is chosen so that the manage_shard_ranges
cli options are not changed.)

Rename ContainerSharder member vars to match the conf file option name:
  scanner_batch_size -> shard_scanner_batch_size

Remove some ContainerSharder member vars that were not used outside of
the __init__ method:

  shrink_merge_point
  shard_shrink_point

Change-Id: I8a58a82c08ac3abaddb43c11d26fda9fb45fe6c1
2021-05-20 20:59:56 +01:00
Matthew Oliver
fb186f6710 Add a config file option to swift-manage-shard-ranges
While working on the shrinking recon drops, we want to display numbers
that directly relate to how the tool should behave. But currently all
options of the s-m-s-r tool are driven by cli options.

This creates a disconnect: defining what should be used in the sharder
and in the tool via separate sets of options is bound to fail. It would
be much better to define the required default options for your
environment in one place that both the sharder and the tool can use.

This patch does some refactoring, adds max_shrinking and max_expanding
options to the sharding config, and adds a --config option to the
tool.

The --config option expects a config with a '[container-sharder]'
section. It only supports the shard options:
 - max_shrinking
 - max_expanding
 - shard_container_threshold
 - shard_shrink_point
 - shard_merge_point

The latter 2 are used to generate the s-m-s-r's:
 - shrink_threshold
 - expansion_limit
 - rows_per_shard

CLI arguments take precedence over values given in the config.
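
A sketch of the intended usage (the file path, values and placeholder
arguments are illustrative):

  # in e.g. /etc/swift/container-server.conf
  [container-sharder]
  max_shrinking = 1
  max_expanding = -1
  shard_container_threshold = 1000000

  # then point the tool at that file:
  #   swift-manage-shard-ranges --config /etc/swift/container-server.conf \
  #       <db_file> <command>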

Change-Id: I4d0147ce284a1a318b3cd88975e060956d186aec
2021-03-12 10:49:46 +11:00
Alistair Coles
4f94ac263a Add sharder section to container config doc
Change-Id: I33c0168c1bb89f780be6fc317a4df89322cbc28d
2021-02-22 09:29:49 +00:00
Alistair Coles
72786533ea Move config option documentation to separate docs
This patch moves the tables describing configuration options for each
server type from the deployment_guide.rst doc to separate per-server
documents.  The new per-server documents are grouped under a config
directory with a config index doc. The config index doc is listed in
the top level index and provides a single starting point to navigate
to the individual server docs.

Change-Id: I6cedd98586febb5dc949c088ee44e160385ed324
2020-11-05 14:40:05 +00:00