DiskFile already fills in the _ondisk_info attribute when it tries to open
a diskfile - even if the DiskFile's fileset is not valid or deleted.
During this process the rsync tempfiles would be discovered and logged,
but no-one would attempt to clean them up - even if they were really old.
Instead of logging and ignoring unexpected files when validating a DiskFile
fileset, we'll add unexpected files to the unexpected key in the
_ondisk_info attribute.
With a little bit of re-organization in the auditor's object_audit method
to get things into a single return path we can add an unconditional check
for unexpected files and remove those that are "old enough".
Since the replicator will kill any rsync processes that are running longer
than the configured rsync_timeout we know that any rsync tempfiles older
than this can be deleted.
Split unlink_older_than in common.utils into two functions to allow an
explicit list of previously discovered paths to be passed in to avoid an
extra listdir. Since the getmtime handling already ignores OSError,
there's less concern about a race condition where a previously discovered
unexpected file is reaped by rsync while we're attempting to clean it up.
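
A minimal sketch of the split described above, not the actual
common.utils code: the name unlink_paths_older_than and the returned
list of removed paths are assumptions made for illustration. One function
takes an explicit list of paths, and the directory-based function
delegates to it after a single listdir.

```python
import os


def unlink_paths_older_than(paths, mtime):
    """Remove each path whose mtime is older than `mtime`.

    Hypothetical helper; returning the removed paths is an illustration
    choice, not something the commit message specifies.
    """
    removed = []
    for path in paths:
        try:
            if os.path.getmtime(path) < mtime:
                os.unlink(path)
                removed.append(path)
        except OSError:
            # The file vanished between discovery and cleanup (for
            # example, rsync finished with it); nothing left to reap.
            pass
    return removed


def unlink_older_than(dirpath, mtime):
    """Directory-based wrapper: one listdir, then delegate."""
    paths = [os.path.join(dirpath, name) for name in os.listdir(dirpath)]
    return unlink_paths_older_than(paths, mtime)
```

A caller that has already collected the unexpected files can pass them
straight to the list-based function and skip the extra listdir.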
Update some doc on the new config option.
Closes-Bug: #1554005
Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283
Updates docs to remove warnings that container sync only
works with object_post_as_copy=True. Since commit e91de49
container sync will also sync POST updates when using
object_post_as_copy=False.
Change-Id: I5cc3cc6e8f9ba2fef6f896f2b11d2a4e06825f7f
Bring overview_auth.rst and proxy server man page
up to date with changes made in [1]
[1] Change-Id: I373734933189c87c4094203b0752dd3762689034
Change-Id: Ia16f0c391e7c357ccb9c13945839dc5647e49a13
Swift now uses the SSYNC verb instead of the old REPLICATION verb for the
ssync protocol. This patch replaces REPLICATION with SSYNC throughout the
docs and fixes some wording in the explanations.
Change-Id: I1253210d4f49749e7d425d6252dd262b650d9548
The variable max_large_object_get_time is no longer used and was
removed to reflect the change.
Change-Id: I43051181dcb38245de6d13fab63876e83f46fc39
Closes-Bug: #1538834
The log_statsd_host value can now be an IPv6 address or a hostname
which only resolves to an IPv6 address. In both cases, the new
behavior is to use an AF_INET6 socket on which .sendto() is called
with the originally-configured hostname (or IP). This means the
Swift process is not caching a DNS resolution for the lifetime of
the process (a good thing).
If a hostname resolves to both an IPv6 and an IPv4 address, an AF_INET
socket is used (i.e. only the IPv4 address will receive the UDP
packet).
The old behavior is preserved: any invalid IP address literals and
failures in DNS resolution or actual StatsD packet sending do not
halt the process or bubble up; they are caught, logged, and
otherwise ignored.
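
The socket family selection described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code
from this change: the function names, the injectable resolve parameter,
and the fallback to AF_INET when nothing resolves are all hypothetical.

```python
import socket


def pick_statsd_family(host, port, resolve=socket.getaddrinfo):
    """Return AF_INET if `host` has an IPv4 address, else AF_INET6.

    `resolve` is injectable only so the sketch can be exercised
    without real DNS (an assumption of this example).
    """
    try:
        resolve(host, port, socket.AF_INET)
        return socket.AF_INET
    except socket.gaierror:
        pass
    try:
        resolve(host, port, socket.AF_INET6)
        return socket.AF_INET6
    except socket.gaierror:
        # Assumed fallback: keep IPv4 and let the later send fail
        # quietly, matching the "caught, logged, ignored" behavior.
        return socket.AF_INET


def send_stat(host, port, payload, resolve=socket.getaddrinfo):
    """Send one UDP packet; return False instead of raising."""
    sock = socket.socket(pick_statsd_family(host, port, resolve),
                         socket.SOCK_DGRAM)
    try:
        # Use the configured name, not a cached address, so a DNS
        # change is picked up on the next send.
        sock.sendto(payload, (host, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()
```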
Change-Id: Ibddddcf140e2e69b08edf3feed3e9a5fa17307cf
These errors produce lintian warnings, so fixing them leaves
fewer errors when checking the Debian packages.
Change-Id: Iff99a8d5f2276515f42d758d110a43cae757db28
This option sends SIGKILL to a daemon after the kill_wait period.
When a daemon hangs and doesn't respond to SIGTERM/SIGHUP,
there is currently no way to stop it using swift-init. Classic
init scripts in Linux kill a hung process after a grace
period, and this patch adds the same behaviour. This is most
useful when using "restart" on a hung daemon.
Change-Id: I8c932b673a0f51e52132df87ea2f4396f4bba9d8
Currently, swift-init returns zero if it can't locate a config on start.
Because of this, it is not possible to tell whether the server was
actually started.
For legacy reasons, two new complementary options are added. The default
is context dependent.
--strict returns non-zero if some config is missing (default mode
if a server is explicitly named)
--non-strict returns zero even if some config is missing (default mode
if an alias is used)
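
The two modes and their context-dependent default can be sketched like
this. The function names and the exit value 1 are hypothetical; the
message above only says "non-zero".

```python
def default_strict(used_alias):
    """Strict by default for an explicitly named server, non-strict
    when an alias was used."""
    return not used_alias


def start_exit_status(missing_configs, strict):
    """Return the exit status for a start attempt.

    1 stands in for "non-zero"; the real value is not specified here.
    """
    if missing_configs and strict:
        return 1
    return 0
```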
As a side effect:
If some of the requested servers are already running, it does not try to
start the ones that are not running, and it also returns non-zero (in strict
mode). That is still sufficient for the goal of this patch.
For future improvements, LSB status codes should be considered.
DocImpact
Change-Id: I7750abd4a94875b46f83f4aeee8509388d543c2b
The doc for these sections was missing because of an rst error - the
source is there in rst file but didn't make it into the html output.
Add doc for per_diff and max_diffs in account and container doc sections.
Also, fix a bunch of other sphinx build errors and most of the warnings.
Change-Id: If9ed2619b2f92c6c65a94f41d8819db8726d3893
Currently, the rsync module to which the replicators send data is static. This
prevents administrators from setting rsync configuration based on their current
deployment or needs.
As an example, the rsyncd configuration example encourages setting a connection
limit for the account, container and object modules. This protects
devices from excessive parallel connections, which would hurt
performance.
On a server with many devices, it is tempting to increase this number
proportionally, but nothing guarantees that the distribution of the connections
will be balanced. In the worst scenario, a single device can receive all the
connections, which severely impacts performance.
This commit adds a new option named 'rsync_module' to the *-replicator sections
of the *-server configuration file. This configuration variable can be
interpolated with device attributes like ip, port, device, zone, ... by using
the format {NAME}, e.g.:
rsync_module = {replication_ip}::object_{device}
With this configuration, an administrator can solve the problem of connection
distribution by creating one module per device in the rsyncd configuration.
The default values are backward compatible:
{replication_ip}::account
{replication_ip}::container
{replication_ip}::object
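
As a rough illustration of the {NAME} expansion, Python's str.format
produces the same shape of result. The helper name and the device values
below are made up for the example and are not taken from the replicator
code.

```python
def expand_rsync_module(template, device):
    """Fill {NAME} placeholders from a device's attributes."""
    return template.format(**device)


# Hypothetical ring device entry, for illustration only.
device = {
    'replication_ip': '10.0.0.5',
    'replication_port': 6200,
    'device': 'sdb1',
    'zone': 1,
}

per_device = expand_rsync_module('{replication_ip}::object_{device}', device)
default = expand_rsync_module('{replication_ip}::object', device)
```

Here per_device is '10.0.0.5::object_sdb1' and default is
'10.0.0.5::object', so each device can map to its own rsyncd module.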
Option vm_test_mode is deprecated by this commit, but backward compatibility is
maintained. The option is only effective when rsync_module is not set. In that
case, {replication_port} is appended to the default value of rsync_module.
Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
This provides the capability to specify a project_name,
project_domain_name and user_domain_name in /etc/swift/dispersion.conf.
If these values are set in dispersion.conf, they are passed on to the
swift client. This makes it possible to specify a dedicated dispersion
project that is not in the keystone default domain. Changes
were applied to swift-dispersion-populate and swift-dispersion-report.
Relevant man pages, the example dispersion.conf and the admin guide were
updated accordingly.
DocImpact
Closes-Bug: #1468374
Change-Id: I0e716f8d281b4d0f510bc568bcee4a13fc480ff7
This change adds a time call to the recon middleware and a --time
parameter to the recon CLI. This is useful for checking whether time
in the cluster is synchronized.
Change-Id: I62373e681f64d0bd71f4aeb287953dd3b2ea5662
The actual server-side changes are simple. The tests are a different
matter. Many changes were needed to the object server tests to
handle the now-async calls to the container server. In an effort to
test this properly, some drive-by changes were made to improve tests.
I tested this patch by doing zero-byte object writes to one container
as fast as possible. Then I did it again while also saturating 2 of the
container replica's disks. The results are linked below.
https://gist.github.com/notmyname/2bb85acfd8fbc7fc312a
DocImpact
Change-Id: I737bd0af3f124a4ce3e0862a155e97c1f0ac3e52
Previously, the reseller prefix needed to be provided in the host name
even when the domain was unique to that reseller. With the
default_reseller_prefix option, any domain which matches in this middleware
will be passed on with a reseller prefix, whether or not it was
provided.
Change-Id: I5aa5ce78ad1ee2e3660cce4c3e07306f8999f02a
Implements: blueprint domainremap-reseller-domains
The deprecated directive `run_pause` should be replaced with the more
standard `interval`. `run_pause` should still be supported for
backward compatibility. This patch updates object replicator to use
`interval` and support `run_pause`. It also updates its sample config
and documentation.
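
One way to read both option names with `interval` taking precedence is
sketched below. The helper name and the default of 30 seconds are
assumptions for the example, not values confirmed by this message.

```python
def get_interval(conf, default=30):
    """Prefer `interval`; fall back to the deprecated `run_pause`."""
    return int(conf.get('interval') or conf.get('run_pause') or default)
```

With this shape, an old config that only sets run_pause keeps working,
and a config that sets both uses interval.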
Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com>
Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com>
Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09
Closes-Bug: #1364735
The swift-recon tool has had several functional additions
recently, but not all of these have been added to the docs.
This change adds the following options to the manpages:
--human-readable
--validate-servers
--sockstat
--driveaudit
--region
--timeout
Also fixes a typo on line 78 (cop -> copy)
Change-Id: Id083b32a60473ad5a2b9ac1d092528d230521c86
Currently there is a "--top" option when running swift-recon for
disk usage stats. This option lists the x disks with the highest
disk usage in descending order.
This feature adds a "--lowest" option which does the opposite and
lists the y disks with lowest disk usage in ascending order.
Also updates the docs section with the --top and --lowest options.
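
The two orderings amount to a sort in opposite directions. A minimal
sketch, with hypothetical function names and a made-up record layout
(the real recon data structure is not shown in this message):

```python
def top_disks(disks, count):
    """Return the `count` fullest disks, highest usage first."""
    return sorted(disks, key=lambda d: d['used_pct'], reverse=True)[:count]


def lowest_disks(disks, count):
    """Return the `count` emptiest disks, lowest usage first."""
    return sorted(disks, key=lambda d: d['used_pct'])[:count]


# Made-up sample data for illustration.
disks = [
    {'device': 'sdb1', 'used_pct': 71.5},
    {'device': 'sdc1', 'used_pct': 12.0},
    {'device': 'sdd1', 'used_pct': 48.3},
]
```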
Change-Id: Ic15d407fe010a31995c2bdd9fb88548a1057f569
This patch changes container sync to use Internal Client instead
of Direct Client.
In the current design, container sync uses direct_get_object to
get the newest source object (which talks to the storage node directly).
This works fine for replication storage policies; however, in
erasure coding policies, direct_get_object would only return part
of the object (it's encoded as several pieces). Using Internal
Client retrieves the original object in the EC case.
Note that the container sync put/delete part already works with
EC since it uses Simple Client.
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
DocImpact
Change-Id: I91952bc9337f354ce6024bf8392046a1ecf6ecc9
The way we do this now involves a conf change and a proxy
reload which is a pain. You can now just set these:
X-Account-Sysmeta-Global-Write-Ratelimit: WHITELIST
or
X-Account-Sysmeta-Global-Write-Ratelimit: BLACKLIST
NOTE:
The existing proxy config settings: account_whitelist
and account_blacklist will continue to work.
Change-Id: I532663f1d2c75d03170c5fdb9b330416822fbc88
This change allows the user to use a "--no-overlap" parameter when
running the tool multiple times. It will increase the coverage by
whatever is specified in the dispersion_coverage field of the conf
file in a manner where existing container/objects are left in place
and no partition is populated more than once.
Related-Bug: #1233045
Change-Id: I139fed2f4c967ba18d073b7ecd1e946ed4da1271
There is a simple typo in the man page of proxy-server.conf,
"client_timeout" is written as "client_timeoutt".
This commit fixes it.
Closes-Bug: #1326237
Change-Id: I98777f523906e4ed625de8f20a96979ea627aa1f
This is a very simple swift tool to retrieve information
about an account that is located on the storage node.
One can call the tool with a given account db file
as it is stored on the storage node system.
It will then return several pieces of information about that account.
Change-Id: Ibfeee790adc000fc177b4b3c03d22ff785fda325
This is a very simple swift tool to retrieve information
about a container that is located on the storage node.
One can call the tool with a given container db file
as it is stored on the storage node system.
It will then return several pieces of information about that container.
Change-Id: Ifebaed6c51a9ed5fbc0e7572bb43ef05d7dd254b
If auth is set up in the env then it needs to be copied over with the
make_request wsgi helper. Also renamed make_request to
make_subrequest: when I grepped for make_request I got > 250 results,
so this'll make it easier to find references to this function in the
future.
Updated docs and sample confs to show tempurl needs to be before dlo and
slo as well as auth.
Change-Id: I9750555727f520a7c9fedd5f4fd31ff0f63d8088
Used groff to recreate the errors. I believe all the issues
except `binary-without-manpage` are solved. Would like
confirmation from someone using Lintian.
Closes-Bug: #1210114
Change-Id: I533205c53efdb7cdf3645cc3e3dc487f9ee5640a
- Makes swift-dispersion-populate a bit faster when using a larger
dispersion_coverage with a larger part_power.
- Adds an option to only run population for containers OR objects
- Adds an option to let you resume population at a given point (useful if
you need to resume population after a previous run errored out or the
like) by specifying which suffix to start at.
The original populate just randomly used uuid4().hex as a suffix on the
container/object names until all the required partitions were covered.
This isn't a big deal if you're only doing 1% coverage on a ring with a
small part power but takes ages if you're doing 100% on a larger ring.
Change-Id: I52f890a774412c1d6179f12db9081aedc58b6bc2