This is a fairly blunt tool: ratelimiting is per device and
applied independently in each worker, but this at least provides
some limit to disk IO on backend servers.
GET, HEAD, PUT, POST, DELETE, UPDATE and REPLICATE methods may be
rate-limited.
Only requests with a path starting '<device>/<partition>', where
<partition> can be cast to an integer, will be rate-limited. Other
requests, including, for example, recon requests with paths such as
'recon/version', are unconditionally forwarded to the next app in the
pipeline.
OPTIONS and SSYNC methods are not rate-limited. Note that
SSYNC sub-requests are passed directly to the object server app
and will not pass through this middleware.
Change-Id: I78b59a081698a6bff0d74cbac7525e28f7b5d7c1
Replace github with opendev because opendev is currently the source and
github is its mirror.
Also, update links for repositories managed by SwiftStack organization.
Unfortunately some repositories are no longer available, so they are
removed from the list.
Change-Id: Ic223650eaf7a1934f489c8b713c6d8da1239f3c5
The swauth project is already retired[1]. The documentation is updated
to reflect the status of the project.
Also, this change removes reference to this middleware in unit tests.
[1] https://opendev.org/x/swauth/
Change-Id: I3d8e46d85ccd965f9b51006c330e391dcdc24a34
Several headers and query params were previously revealed in logs but
are now redacted:
* X-Auth-Token header (previously redacted in the {auth_token} field,
but not the {headers} field)
* temp_url_sig query param (used by tempurl middleware)
* Authorization header and X-Amz-Signature and Signature query
parameters (used by s3api middleware)
This patch adds some new middleware helper methods to track headers and
query parameters that should be redacted by proxy-logging. While
instantiating the middleware, authors can call either:
register_sensitive_header('case-insensitive-header-name')
register_sensitive_param('case-sensitive-query-param-name')
to add items that should be redacted. The redaction uses proxy-logging's
existing reveal_sensitive_prefix config option to determine how much to
reveal.
Note that query params will still be logged in their entirety if
eventlet_debug is enabled.
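As an illustration, a middleware that wants its own secrets redacted
might register them at instantiation time. This is a minimal sketch
only; the import path and the header/param names used here
(x-example-auth-token, example_sig) are assumptions, not part of this
change:

    # Minimal sketch; the import path is assumed and may differ by release.
    from swift.common.registry import (
        register_sensitive_header, register_sensitive_param)

    def filter_factory(global_conf, **local_conf):
        # Register once, while the middleware is being instantiated.
        register_sensitive_header('x-example-auth-token')  # case-insensitive
        register_sensitive_param('example_sig')            # case-sensitive

        def example_filter(app):
            return app  # a real middleware would wrap the app here
        return example_filter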
UpgradeImpact
=============
The reveal_sensitive_prefix config option now applies to more items;
operators should review their currently-configured value to ensure it
is appropriate for these new contexts. In particular, operators should
consider reducing the value if it is more than 20 or so, even if that
previously offered sufficient protection for auth tokens.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Closes-Bug: #1685798
Change-Id: I88b8cfd30292325e0870029058da6fb38026ae1a
Previously, the set_statsd_prefix method was used to mutate a logger's
StatsdClient tail prefix after a logger was instantiated. This pattern
had led to unexpected mutations (see Related-Change). The tail_prefix
can now be passed as an argument to get_logger(), and is then
forwarded to the StatsdClient constructor, for a more explicit
assignment pattern.
The set_statsd_prefix method is left in place for backwards
compatibility. A DeprecationWarning will be raised if it is used
to mutate the StatsdClient tail prefix.
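For illustration, the two patterns compare roughly as follows; this is
a sketch only, and the exact keyword argument name accepted by
get_logger() is an assumption here:

    from swift.common.utils import get_logger

    conf = {'log_statsd_host': 'localhost', 'log_statsd_port': '8125'}

    # New, explicit pattern: pass the tail prefix at construction time
    # (keyword name assumed to be statsd_tail_prefix).
    logger = get_logger(conf, log_route='object-server',
                        statsd_tail_prefix='object-server')

    # Old pattern, now deprecated: mutating the prefix after the fact
    # raises a DeprecationWarning.
    logger.set_statsd_prefix('object-server')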
Change-Id: I7692860e3b741e1bc10626e26bb7b27399c325ab
Related-Change: I0522b1953722ca96021a0002cf93432b973ce626
Previously the ssync Sender would attempt to revert all objects in a
partition within a single SSYNC request. With this change the
reconstructor daemon option max_objects_per_revert can be used to limit
the number of objects reverted inside a single SSYNC request for revert
type jobs i.e. when reverting handoff partitions.
If more than max_objects_per_revert are available, the remaining objects
will remain in the sender partition and will not be reverted until the
next call to ssync.Sender, which would currently be the next time the
reconstructor visits that handoff partition.
Note that the option only applies to handoff revert jobs, not to sync
jobs.
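A rough sketch of the limiting behaviour described above (illustrative
only, not the actual ssync.Sender code; it assumes a value of 0
disables the limit):

    import itertools

    def limit_revert(object_hashes, max_objects_per_revert):
        # Yield at most max_objects_per_revert objects for this SSYNC
        # request; the remainder stays in the partition for a later pass.
        if max_objects_per_revert <= 0:
            return iter(object_hashes)
        return itertools.islice(object_hashes, max_objects_per_revert)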
Change-Id: If81760c80a4692212e3774e73af5ce37c02e8aff
There have been members of the community running sharding in production
and it's awesome. It's just the auto-sharding part of Swift that
remains experimental.
This patch removes the big sharding warning from the top of the
sharding overview page and better emphasises that it's the auto_shard
option that isn't ready for production use.
Change-Id: Id2c842cffad58fb6fd5e1d12619c46ffcb38f8a5
This patch plumbs the object-reconstructor stats that are dropped
into recon cache out through the middleware and swift-recon tool.
This adds a '/recon/reconstruction/object' endpoint to the middleware.
As such the swift-recon tool has grown a '-R' or '--reconstruction'
option to access this data from each node.
Plus some tests and documentation updates.
Change-Id: I98582732ca5ccb2e7d2369b53abf9aa8c0ede00c
The documentation currently uses the sysctl parameter:
'net.ipv4.netfilter.ip_conntrack_max', but it's been deprecated
for a long time. This patch switches it to:
'net.netfilter.nf_conntrack_max', which is the modern equivalent.
Change-Id: I3fd5d4060840092bca53af7da7dbaaa600e936a3
DiskFileManager will remove any stale files during
cleanup_ondisk_files(): these include tombstones and nondurable EC
data fragments whose timestamps are older than reclaim_age. It can
usually be safely assumed that a non-durable data fragment older than
reclaim_age is not going to become durable. However, if an agent PUTs
objects with specified older X-Timestamps (for example the reconciler
or container-sync) then there is a window of time during which the
object server has written an old non-durable data file but has not yet
committed it to make it durable.
Previously, if another process (for example the reconstructor) called
cleanup_ondisk_files during this window then the non-durable data file
would be removed. The subsequent attempt to commit the data file would
then result in a traceback due to there no longer being a data file to
rename, and of course the data file is lost.
This patch modifies cleanup_ondisk_files to not remove old, otherwise
stale, non-durable data files that were only written to disk in the
preceding 'commit_window' seconds. 'commit_window' is configurable for
the object server and defaults to 60.0 seconds.
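Conceptually, the new check looks something like the following sketch
(not the actual DiskFileManager code): an old non-durable data file is
only reclaimed if it also landed on disk more than commit_window
seconds ago.

    import os
    import time

    def is_reclaimable(data_file_path, data_timestamp, reclaim_age,
                       commit_window=60.0):
        now = time.time()
        if data_timestamp > now - reclaim_age:
            return False  # not yet older than reclaim_age
        # mtime reflects when the file was written to disk, which may be
        # much later than its X-Timestamp for reconciler/container-sync PUTs
        mtime = os.stat(data_file_path).st_mtime
        return mtime < now - commit_window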
Closes-Bug: #1936508
Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
Change-Id: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
A container is typically sharded when it has grown to have an object
count of shard_container_threshold + N, where N <<
shard_container_threshold. If sharded using the default
rows_per_shard of shard_container_threshold / 2 then this would
previously result in 3 shards: the tail shard would typically be
small, having only N rows. This behaviour caused more shards to be
generated than desirable.
This patch adds a minimum-shard-size option to
swift-manage-shard-ranges, and a corresponding option in the sharder
config, which can be used to avoid small tail shards. If set to a value
greater than one then the final shard range may be extended to more
than rows_per_shard in order to avoid a further shard range with fewer
than minimum-shard-size rows. In the example given, if
minimum-shard-size is set to M > N then the container would shard into
two shards having rows_per_shard and rows_per_shard + N rows
respectively.
The default value for minimum-shard-size is rows_per_shard // 5. If
all options have their default values this results in
minimum-shard-size being 100000.
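To illustrate with the defaults, a sketch of the splitting arithmetic
(not the sharder's actual code):

    def plan_shard_sizes(object_count, rows_per_shard, minimum_shard_size):
        # Split object_count into shard sizes, extending the final shard
        # rather than creating a tail smaller than minimum_shard_size.
        sizes = []
        remaining = object_count
        while remaining > 0:
            if 0 < remaining - rows_per_shard < minimum_shard_size:
                sizes.append(remaining)
                remaining = 0
            else:
                sizes.append(min(rows_per_shard, remaining))
                remaining -= sizes[-1]
        return sizes

    # With rows_per_shard=500000 and minimum_shard_size=100000, a
    # container of 1,050,000 objects now shards into [500000, 550000]
    # rather than [500000, 500000, 50000].
    plan_shard_sizes(1050000, 500000, 100000)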
Closes-Bug: #1928370
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: I3baa278c6eaf488e3f390a936eebbec13f2c3e55
Make rows_per_shard an option that can be configured
in the [container-sharder] section of a config file.
For auto-sharding, this option was previously hard-coded to
shard_container_threshold // 2.
The swift-manage-shard-ranges command line tool already supported
rows_per_shard on the command line and will now also load it from a
config file if specified. Any value given on the command line takes
precedence over any value found in a config file.
Change-Id: I820e133a4e24400ed1e6a87ebf357f7dac463e38
Previously the reconstructor would quarantine isolated durable
fragments that were more than reclaim_age old. This patch adds a
quarantine_age option for the reconstructor which defaults to
reclaim_age but can be used to configure the age that a fragment must
reach before quarantining.
Change-Id: I867f3ea0cf60620c576da0c1f2c65cec2cf19aa0
Add two new sharder config options for configuring shrinking
behaviour:
- shrink_threshold: the size below which a shard may shrink
- expansion_limit: the maximum size to which an acceptor shard
may grow
The new options match the 'swift-manage-shard-ranges' command line
options and take absolute values.
The new options provide alternatives to the current equivalent options
'shard_shrink_point' and 'shard_shrink_merge_point', which are
expressed as percentages of 'shard_container_threshold'.
'shard_shrink_point' and 'shard_shrink_merge_point' are deprecated and
will be overridden by the new options if the new options are
explicitly set in a config file.
The default values of the new options are the same as the values that
would result from the default 'shard_container_threshold',
'shard_shrink_point' and 'shard_shrink_merge_point' i.e.:
- shrink_threshold: 100000
- expansion_limit: 750000
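For reference, the default absolute values follow from the deprecated
percentage options; a small worked example, assuming the default
shard_container_threshold of 1,000,000 and the historical percentage
defaults of 10% and 75%:

    shard_container_threshold = 1000000   # default
    shard_shrink_point = 10               # deprecated, percent
    shard_shrink_merge_point = 75         # deprecated, percent

    shrink_threshold = shard_container_threshold * shard_shrink_point // 100
    expansion_limit = (shard_container_threshold
                       * shard_shrink_merge_point // 100)

    assert shrink_threshold == 100000
    assert expansion_limit == 750000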
Change-Id: I087eac961c1eab53540fe56be4881e01ded1f60e
Change the swift-manage-shard-ranges default expansion-limit to equal
the sharder daemon default merge_size, i.e. 750000. The previous default
of 500000 had erroneously differed from the sharder default value.
Introduce a ContainerSharderConf class to encapsulate loading of
sharder conf and the definition of defaults. ContainerSharder inherits
this and swift-manage-shard-ranges instantiates it.
Rename ContainerSharder member vars to match the equivalent vars and
cli options in manage_shard_ranges:
shrink_size -> shrink_threshold
merge_size -> expansion_limit
split_size -> rows_per_shard
(This direction of renaming is chosen so that the manage_shard_ranges
cli options are not changed.)
Rename ContainerSharder member vars to match the conf file option name:
scanner_batch_size -> shard_scanner_batch_size
Remove some ContainerSharder member vars that were not used outside of
the __init__ method:
shrink_merge_point
shard_shrink_point
Change-Id: I8a58a82c08ac3abaddb43c11d26fda9fb45fe6c1
There are times when it is convenient to specify a policy by name or
by index (see Related-Change), but policy names can unfortunately
collide with indexes. Using a number as a policy name should at least
be discouraged.
Change-Id: I0cdd3b86b527d6656b7fb50c699e3c0cc566e732
Related-Change: Icf1517bd930c74e9552b88250a7b4019e0ab413e
When I tried to follow documentation, the expirer said:
object-expirer[391675]: This node is not configured to dequeue
tasks from the legacy queue.
Our documentation was incorrect, the actual name of the option is
"dequeue_from_legacy".
Change-Id: I5ca7ac589a405d0b6250922aa9bcaabecb3c4fb0
While working on the shrinking recon drops, we want to display numbers
that directly relate to how the tool should behave. But currently all
options of the s-m-s-r tool are driven by cli options.
This creates a disconnect: defining what should be used in the sharder
and in the tool via separate sets of options is bound to fail. It would
be much better to be able to define the required default options for
your environment in one place that both the sharder and tool could use.
This patch does some refactoring and adds max_shrinking and
max_expanding options to the sharding config, as well as adding a
--config option to the tool.
The --config option expects a config with a '[container-sharder]'
section. It only supports the shard options:
- max_shrinking
- max_expanding
- shard_container_threshold
- shard_shrink_point
- shard_merge_point
The latter 2 are used to generate the s-m-s-r's:
- shrink_threshold
- expansion_limit
- rows_per_shard
Cli arguments take precedence over values in the config.
Change-Id: I4d0147ce284a1a318b3cd88975e060956d186aec
Previously, we would rely on replication/reconstruction to rehash
once the part power increase was complete. This would lead to large
I/O spikes if operators didn't proactively tune down replication.
Now, do the rehashing in the relinker as it completes work. Operators
should already be mindful of the relinker's I/O usage and are likely
limiting it through some combination of cgroups, ionice, and/or
--files-per-second.
Also remove now-empty partitions during cleanup. Operators with metrics on
cluster-wide primary/handoff partition counts can use them to monitor
relinking/cleanup progress:
P 3N .----------.
a / '.
r / '.
t 2N / .-----------------
i / |
t / |
i N -----'---------'
o
n
s 0 ----------------------------------
t0 t1 t2 t3 t4 t5
At t0, prior to relinking, there are
N := <replica count> * 2 ** <old part power>
primary partitions throughout the cluster and a negligible number of
handoffs.
At t1, relinking begins. In any non-trivial, low-replica-count cluster,
the probability of a new-part-power partition being assigned to the same
device as its old-part-power equivalent is low, so handoffs grow while
primaries remain constant.
At t2, relinking is complete. The total number of partitions is now 3N
(N primaries + 2N handoffs).
At t3, the ring with increased part power is distributed. The notion of
what's a handoff and what's a primary inverts.
At t4, cleanup begins. The "handoffs" are cleaned up as hard links and
now-empty partitions are removed.
At t5, cleanup is complete and there are now 2N total partitions.
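A quick worked example with illustrative numbers (3 replicas, old part
power 10):

    replica_count = 3
    old_part_power = 10

    N = replica_count * 2 ** old_part_power   # 3072 primary partitions at t0
    total_during_relink = 3 * N               # 9216 partitions at t2/t3
    total_after_cleanup = 2 * N               # 6144 partitions at t5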
Change-Id: Ib5bf426cf38559091917f2d25f4f60183cd16354
UpgradeImpact
=============
Operators should verify that encryption is not enabled in their
reconciler pipelines; having it enabled there may harm data durability.
For more information, see https://launchpad.net/bugs/1910804
Change-Id: I1a1d78ed91d940ef0b4eba186dcafd714b4fb808
Closes-Bug: #1910804
When you start getting more than 3k shards in a root container the
cached shard range listing can get bigger than the default max size for
memcache (1MB).
So add a mention about it in the configuration guide.
Fixes bug 1890643
Change-Id: If380410c17ed9ebc014b8198af0ea8d502deacc8
We have a separate doc page for middlewares that pulls the
documentation from each middleware's docstring[0]. This makes it easy
to look up the docs in our documentation and easy to find the
middleware doc by looking in the code of the middleware itself.
This patch does the same with the audit watchers. There is now a page
that generates a list of audit watchers, even though currently there is
only one, and pulls the docs from their docstrings. This gives us an
easy way to maintain each audit watcher's doc along with its code.
[0] - https://docs.openstack.org/swift/latest/middleware.html
Change-Id: I1456aba0158d29fa0a879dcc2dfb13245c45ad16
Swift operators may find it useful to operate on each object in their
cluster in some way. This commit provides them a way to hook into the
object auditor with a simple, clearly-defined boundary so that they
can iterate over their objects without additional disk IO.
For example, a cluster operator may want to ensure that all SLO
segments are accounted for in their manifests, or locate objects that
aren't in container listings. Now that Swift
has encryption support, this could be used to locate unencrypted
objects. The list goes on.
This commit makes the auditor locate, via entry points, the watchers
named in its config file.
A watcher is a class with at least these four methods:
__init__(self, conf, logger, **kwargs)
start(self, audit_type, **kwargs)
see_object(self, object_metadata, data_file_path, **kwargs)
end(self, **kwargs)
The auditor will call watcher.start(audit_type) at the start of an
audit pass, watcher.see_object(...) for each object audited, and
watcher.end() at the end of an audit pass. All method arguments are
passed as keyword args.
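For illustration, a minimal watcher might look like the sketch below
(ExampleWatcher is hypothetical; a real watcher would be exposed via an
entry point named in the auditor's config):

    class ExampleWatcher(object):
        def __init__(self, conf, logger, **kwargs):
            self.logger = logger
            self.seen = 0

        def start(self, audit_type, **kwargs):
            # called at the start of each audit pass
            self.seen = 0

        def see_object(self, object_metadata, data_file_path, **kwargs):
            # called once per audited object; e.g. check SLO segments or
            # look for unencrypted objects here
            self.seen += 1

        def end(self, **kwargs):
            # called at the end of each audit pass
            self.logger.info('ExampleWatcher saw %d objects', self.seen)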
This version of the API is implemented in the context of the
auditor itself, without spawning any additional processes.
If the plugins are not working well -- hang, crash, or leak --
it's easier to debug them when there's no additional complication
of processes that run by themselves.
In addition, we include a reference implementation of a plugin for
the watcher API, as a help to plugin writers.
Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a
This patch moves the tables describing configuration options for each
server type from the deployment_guide.rst doc to separate per-server
documents. The new per-server documents are grouped under a config
directory with a config index doc. The config index doc is listed in
the top level index and provides a single starting point to navigate
to the individual server docs.
Change-Id: I6cedd98586febb5dc949c088ee44e160385ed324
Capture the on-the-wire status code for logging because we sometimes
change the logged status code.
Closes-Bug: #1896518
Change-Id: I27feabe923a6520e983637a9c68a19ec7174a0df
docs: Removing the use of NameVirtualHost from the apache examples
It's not used anymore; in fact, it's deprecated: https://httpd.apache.org/docs/2.4/mod/core.html#namevirtualhost
Change-Id: I76999cfacc10a244024ee0cca66dda95a0169a67
docs: Added more spacing to the apache2 examples
They're easier to read and a bit less bloated.
Change-Id: I5e21a66018b7ef309918fbbde93f2494286d291e
docs: Switching to /srv/www to be more FHS 3.0 conformant
It's more modern and well supported to use /srv/www now in place of
/var/www.
Change-Id: Icd09ed4d5fb4e2b9b84ddead21313ea1c0a87c91
ref: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s17.html
docs: Added user, group and display name in WSGI examples
This properly sets the user and group for the wsgi processes in the
examples; as well as adding a display name for easier identification.
Change-Id: Ie5783081e4054e5b2fbf3a856716101a1aaf61b8
docs: Replace apachectl for systemctl commands
It's safe to assume that all modern distros supported by OpenStack will
have systemd implemented. It's better to favor systemctl in those cases.
Change-Id: Ic0d2e47c1ac53502ce638d6fc2424ab9df037262
docs: Emphasis to file paths and command options
I've enclosed configuration options or parameters in interpreted text
quotes.
Also, I've enclosed file paths with inline literal quotes.
Change-Id: Iec54b7758bce01fc8e8daff48498383cb70c62ce
docs: Fixed wording used to indicate the restart of apache
Just a little commit to make it clearer what we're going to do.
Change-Id: Id5ab3e94519bcfe1832b92e456a1d1fa81dd54e3
Note that existing SAIOs with 60xx ports should still work fine.
Change-Id: If5dd79f926fa51a58b3a732b212b484a7e9f00db
Related-Change: Ie1c778b159792c8e259e2a54cb86051686ac9d18
Lower the part-power -- 18 is way higher than is needed for a dev
environment.
Add commands for reduced-redundancy and erasure-coded storage policies.
Related-Change: Ibe46011d8e6a6482d39b3a20ac9c091d9fbc6ef7
Related-Change: I6f11f7a1bdaa6f3defb3baa56a820050e5f727f1
Related-Change: I0403016a4bb7dad9535891632753b0e5e9d402eb
Change-Id: I13de27674c81977c2470d43bbb2126ecc4bdd85a