544 Commits

Author SHA1 Message Date
Matthew Oliver
52c80d652d cli: add --sync to db info to show syncs
When looking at containers and accounts it's sometimes nice to know who
they've been replicating with. This patch adds a `--sync|-s` option to
swift-{container|account}-info which will also dump the incoming and
outgoing sync tables:

  $ swift-container-info /srv/node3/sdb3/containers/294/624/49b9ff074c502ec5e429e7af99a30624/49b9ff074c502ec5e429e7af99a30624.db -s
  Path: /AUTH_test/new
    Account: AUTH_test
    Container: new
    Deleted: False
    Container Hash: 49b9ff074c502ec5e429e7af99a30624
  Metadata:
    Created at: 2022-02-16T05:34:05.988480 (1644989645.98848)
    Put Timestamp: 2022-02-16T05:34:05.981320 (1644989645.98132)
    Delete Timestamp: 1970-01-01T00:00:00.000000 (0)
    Status Timestamp: 2022-02-16T05:34:05.981320 (1644989645.98132)
    Object Count: 1
    Bytes Used: 7
    Storage Policy: default (0)
    Reported Put Timestamp: 1970-01-01T00:00:00.000000 (0)
    Reported Delete Timestamp: 1970-01-01T00:00:00.000000 (0)
    Reported Object Count: 0
    Reported Bytes Used: 0
    Chexor: 962368324c2ca023c56669d03ed92807
    UUID: f33184e7-56d5-4c74-9d2e-5417c187d722-sdb3
    X-Container-Sync-Point2: -1
    X-Container-Sync-Point1: -1
  No system metadata found in db file
  No user metadata found in db file
  Sharding Metadata:
    Type: root
    State: unsharded
  Incoming Syncs:
    Sync Point	Remote ID                                	Updated At
    1         	ce7268a1-f5d0-4b83-b993-af17b602a0ff-sdb1	2022-02-16T05:38:22.000000 (1644989902)
    1         	2af5abc0-7f70-4e2f-8f94-737aeaada7f4-sdb4	2022-02-16T05:38:22.000000 (1644989902)
  Outgoing Syncs:
    Sync Point	Remote ID	Updated At
  Partition	294
  Hash     	49b9ff074c502ec5e429e7af99a30624

As a follow up to the device in DB ID patch we can see that the replicas
at sdb1 and sdb4 have replicated with this node.

Change-Id: I23d786e82c6710bea7660a9acf8bbbd113b5b727
2024-01-16 08:19:08 -08:00
kim woo seok
62a9dbca76 Add unittest of swift-recon-cron
Move the swift-recon-cron bin script into the cli module so that it
can be unit tested.

Delete unused `logger` parameter in get_async_count function

Partial-Bug: #1743656
Change-Id: I4ca91e3b519a99f3096b95b286779a183e936eb7
2023-09-22 12:59:09 +00:00
Philippe SERAPHIN
1c210d2e49 Change getting major:minor of blkdev
Replace the method for determining the major:minor of a block device,
because stat can't detect major:minor in some cases.

Change-Id: Idcc7cd7a41e225d1052c03ba846dff02851758f8
2023-06-28 10:35:13 +02:00
Tim Burke
0a5f0253b1 Add --test-config option to WSGI servers
Previously, seamless reloads were a little risky: when they worked, they
worked great, but if they failed (say, because you wrote out an invalid
config), you were left with no usable server processes and possible
client downtime.

Now, add the ability to do a preflight check before reloading processes
to reduce the likelihood of the reloaded process immediately dying. For
example, you might use a systemd unit that includes something like

    ExecReload=swift-proxy-server --test-config /etc/swift/proxy-server.conf
    ExecReload=kill -USR1 $MAINPID

Change-Id: I9e5e158ce8be92535430b9cabf040063f5188bf4
2023-04-05 20:51:46 -07:00
Matthew Vernon
e838d8a947 swift-drive-audit: reload systemd after editing fstab
Systemd does not monitor /etc/fstab for changes; so a filesystem
unmounted and commented-out in fstab will be re-mounted by systemd
after some time.

This change means that swift-drive-audit will call systemctl
daemon-reload (which causes systemd to reload its configuration
including /etc/fstab) after editing /etc/fstab on systems where
systemd is the running init. Check for that case by looking for the
existence of the directory /run/systemd/system, as documented in
sd_booted(3).
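
The check described above can be sketched roughly as follows. This is
a minimal illustration, not the patch itself: the helper names and the
injectable `runner` argument are hypothetical; only the
/run/systemd/system test and the `systemctl daemon-reload` command come
from the message.

```python
import os
import subprocess


def systemd_is_running(run_dir="/run/systemd/system"):
    """Return True if systemd appears to be the running init.

    sd_booted(3) documents the existence of this directory as the check.
    """
    return os.path.isdir(run_dir)


def reload_systemd_if_needed(run_dir="/run/systemd/system",
                             runner=subprocess.call):
    """Ask systemd to re-read its configuration (including /etc/fstab).

    `runner` is a hypothetical hook so the command can be stubbed out.
    Returns the runner's result, or None when systemd is not running.
    """
    if systemd_is_running(run_dir):
        return runner(["systemctl", "daemon-reload"])
    return None
```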

Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>
Change-Id: I8830e3da9b6b085224511ac351f2d2860119c432
2022-09-30 09:47:24 +01:00
Matthew Vernon
89ee843080 swift-ring-builder: exit ERROR (2) on uncaught exceptions
swift-ring-builder has three exit statuses: 0 (OK), 1 (WARNING),
2 (ERROR). Uncaught exceptions in python result in an exit code of 1,
so for example problems writing a builder file to disk will result in
an exit of 1 (warning) rather than 2 (error).

This addresses that by overriding sys.excepthook to produce the usual
backtrace and then exit 2 (error); excepthook is called when an
exception is unhandled, unless that is SystemExit.
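
A minimal sketch of that approach, assuming nothing about the actual
patch beyond what is described here (the names `exit_error_excepthook`,
`install_excepthook` and `EXIT_ERROR` are hypothetical):

```python
import sys

EXIT_ERROR = 2  # swift-ring-builder's ERROR status, per the message above


def exit_error_excepthook(exc_type, exc_value, exc_tb):
    """Print the usual backtrace, then exit with the ERROR status."""
    # sys.__excepthook__ is the interpreter's original hook, so the
    # traceback output is the same as for any unhandled exception.
    sys.__excepthook__(exc_type, exc_value, exc_tb)
    sys.exit(EXIT_ERROR)


def install_excepthook():
    """Route unhandled exceptions (other than SystemExit) to the hook."""
    sys.excepthook = exit_error_excepthook
```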

Closes-Bug: 1960657
Change-Id: I7cfeff4f436ade319cf21d0d29853931aef6d20f
2022-02-15 14:29:14 +00:00
Matthew Oliver
85e36f7122 recon: refactor common recon names into a common location
Change-Id: I0a0766cfb6672377de0f152ce179c874c327ec54
2021-06-29 15:22:57 -07:00
Tim Burke
39ad468dfe Add async_pending_last time to object.recon
The async_pending count isn't nearly as useful when we don't know how
out of date it is.

Change-Id: I3e5e904ffc0eba7a7e141e1c2d9f9840e4952041
2021-06-15 08:12:05 -07:00
Tim Burke
c8de76c7fd swift-account-audit: Log the bad status
Change-Id: Ib28d1948a571acf31926df82dd8c24910c227053
2021-04-08 17:34:49 -07:00
Tim Burke
e72aaf0c57 relinker: Pull arg parsing into module
This allows us to do testing that's more end-to-end.

Change-Id: Ifc47b00c597217efb4d705bd84dc8f7df117ae9d
2021-02-08 13:46:28 -08:00
Tim Burke
1f9b879547 Run flake8 on bin/ files
Change-Id: I58d4b5a00e97785584c6d3bd8b06243f481c1934
2021-02-01 13:26:53 -08:00
its-not-a-bug-its-a-feature
ea0cab6e3e Adjust initial month value from int to str
swift-drive-audit checks to see if a new year has recently ticked over
by checking to see if the current month is January and the logs we are
checking are in December. The logs use abbreviated month names, so we
need to extract that from "now" to make valid comparisons.
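
The comparison could look something like the sketch below. The
function name `log_year` is hypothetical, and `%b` is locale-dependent,
so this assumes the default C/English locale used by the logs.

```python
import datetime


def log_year(log_month, now):
    """Guess the year of a log line that only carries a month name.

    `log_month` is an abbreviated month such as "Dec"; `now` is a
    datetime. Both sides of the comparison are strings.
    """
    now_month = now.strftime("%b")  # e.g. "Jan", not the integer 1
    if now_month == "Jan" and log_month == "Dec":
        # The year ticked over after the log line was written.
        return now.year - 1
    return now.year
```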

Closes-Bug: 1912508
Change-Id: Iabb53f5e4081d580d016bbf75d86e1d75e1f20bb
2021-01-23 05:31:33 +00:00
Zuul
817528e8ae Merge "Add option to swift-oldies to only print pids" 2021-01-09 09:18:03 +00:00
Clay Gerrard
d277960161 Populate shrinking shards with shard ranges learnt from root
Shard shrinking can be instigated by a third party modifying shard
ranges, moving one shard to shrinking state and expanding the
namespace of one or more other shard(s) to act as acceptors. These
state and namespace changes must propagate to the shrinking and
acceptor shards. The shrinking shard must also discover the acceptor
shard(s) into which it will shard itself.

The sharder audit function already updates shards with their own state
and namespace changes from the root. However, there is currently no
mechanism for the shrinking shard to learn about the acceptor(s) other
than by a PUT request being made to the shrinking shard container.

This patch modifies the shard container audit function so that other
overlapping shards discovered from the root are merged into the
audited shard's db. In this way, the audited shard will have acceptor
shards to cleave to if shrinking.

This new behavior is restricted to when the shard is shrinking. In
general, a shard is responsible for processing its own sub-shard
ranges (if any) and reporting them to root. Replicas of a shard
container synchronise their sub-shard ranges via replication, and do
not rely on the root to propagate sub-shard ranges between shard
replicas. The exception to this is when a third party (or
auto-sharding) wishes to instigate shrinking by modifying the shard
and other acceptor shards in the root container.  In other
circumstances, merging overlapping shard ranges discovered from the
root is undesirable because it risks shards inheriting other unrelated
shard ranges. For example, if the root has become polluted by
split-brain shard range management, a sharding shard may have its
sub-shards polluted by an undesired shard from the root.

During the shrinking process a shard range's own shard range state may
be either shrinking or, prior to this patch, sharded. The sharded
state could occur when one replica of a shrinking shard completed
shrinking and moved the own shard range state to sharded before other
replica(s) had completed shrinking. This makes it impossible to
distinguish a shrinking shard (with sharded state), which we do want
to inherit shard ranges, from a sharding shard (with sharded state),
which we do not want to inherit shard ranges.

This patch therefore introduces a new shard range state, 'SHRUNK', and
applies this state to shard ranges that have completed shrinking.
Shards are now restricted to inherit shard ranges from the root only
when their own shard range state is either SHRINKING or SHRUNK.

This patch also:

 - Stops overlapping shrinking shards from generating audit warnings:
   overlaps are cured by shrinking and we therefore expect shrinking
   shards to sometimes overlap.

 - Extends an existing probe test to verify that overlapping shard
   ranges may be resolved by shrinking a subset of the shard ranges.

 - Adds a --no-auto-shard option to swift-container-sharder to enable the
   probe tests to disable auto-sharding.

 - Improves sharder logging, in particular by decrementing ranges_todo
   when a shrinking shard is skipped during cleaving.

 - Adds a ShardRange.sort_key class method to provide a single definition
   of ShardRange sort ordering.

 - Improves unit test coverage for sharder shard auditing.

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I9034a5715406b310c7282f1bec9625fe7acd57b6
2020-12-18 11:33:48 +00:00
Tim Burke
cec9cb881b Add option to swift-oldies to only print pids
Change-Id: I8877cf482616404eb7023b2975a24ad827efe2b6
2020-12-14 13:59:17 -08:00
Tim Burke
5b8073c770 py3: Fix swift-dispersion-populate
We fixed swift-dispersion-report already; -populate needed the same fix
or else it'd hit a "maximum recursion depth exceeded" error.

Change-Id: I2d22e087a88c9e8003621feb26460ab6e5ce2a57
Related-Change: I24f4bcc3d62dc37fd9559032bfd25f5b15f98745
Closes-Bug: #1895346
Related-Bug: #1863680
2020-09-11 19:35:32 +00:00
Tim Burke
67e3830ab9 swift-container-info: Show shard ranges summary
The current behavior is really painful when you've got hundreds of shard
ranges in a DB.  The new summary with the states is default.  Users can
add a -v/--verbose flag to see the old full detail view.

Change-Id: I0a7d65f64540f99514c52a70f9157ef060a8a892
2020-07-22 12:29:53 -05:00
Zuul
0b86f681f5 Merge "swift-get-nodes: Allow users to specify either quoted or unquoted paths" 2020-06-09 17:54:26 +00:00
Tim Burke
1dfa41dada swift-get-nodes: Allow users to specify either quoted or unquoted paths
Now that we can have null bytes in Swift paths, we need a way for
operators to be able to locate such containers and objects. Our usual
trick of making sure the name is properly quoted for the shell won't
suffice; running something like

   swift-get-nodes /etc/swift/container.ring.gz $'AUTH_test/\0versions\0container'

has the path get cut off after "AUTH_test/" because of how argv works.

So, add a new option, --quoted, to let operators indicate that they
already quoted the path.
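
As a rough illustration of why a pre-quoted path helps, the sketch
below percent-decodes the path only when asked to. `parse_path` and its
`quoted` argument are hypothetical stand-ins for the tool's own
handling; `urllib.parse.unquote` is the standard library call.

```python
from urllib.parse import unquote


def parse_path(path, quoted=False):
    """Split a Swift path into [account, container, object] parts.

    When `quoted` is true the caller has percent-encoded the path, so
    bytes such as NUL (%00) survive being passed through argv.
    """
    if quoted:
        path = unquote(path)
    return path.lstrip("/").split("/", 2)
```

With this, a hypothetical invocation passing
'AUTH_test/%00versions%00container' would yield a container name that
still contains its null bytes.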

Drive-bys:

  * If account, container, or object are explicitly blank, treat them
    as though they were not provided. This provides better errors when
    account is explicitly blank, for example.
  * If account, container, or object are not provided or explicitly
    blank, skip printing them. This resolves ambiguities about things
    like objects whose name is actually "None".
  * When displaying account, container, and object, quote them (since
    they may contain newlines or other control characters).

Change-Id: I3d10e121b403de7533cc3671604bcbdecb02c795
Related-Change: If912f71d8b0d03369680374e8233da85d8d38f85
Closes-Bug: #1875734
Closes-Bug: #1875735
Closes-Bug: #1875736
Related-Bug: #1791302
2020-06-08 12:03:56 -07:00
Romain LE DISEZ
3061ec803f relinker: Improve performance by limiting I/O
This commit reduces the amount of I/O done by swift-object-relinker.

First, it saves the progress state of relinking and cleanup in case the
process is interrupted during the operation. This allows the operation
to resume without rescanning all partitions.

Secondly, it prevents relink and cleanup from scanning any partitions
that are bigger than 2^part_power (or (2^next_part_power)/2). These
partitions did not exist before the part_power increase began, so there
is nothing to relink or clean up.

Thirdly, it reverse-orders the partitions to scan so that some useless
work is avoided. If a device contains partitions 1 and 3, relinking
partition 1 will create "new" objects in partition 3, which would then
need to be scanned when the relinker works on partition 3. That work is
useless. If partition 3 is done first, it will only contain the objects
that need to be relinked.
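
The second and third points can be sketched together as below. This is
an illustration only: `partitions_to_scan` is a hypothetical name, and
the cutoff assumes partitions are numbered from 0, so anything at or
above 2^part_power is treated as new.

```python
def partitions_to_scan(partitions, part_power):
    """Return the partitions worth scanning, highest first.

    `part_power` is the partition power before the increase.
    """
    upper = 2 ** part_power
    # Partitions at or beyond the old partition count did not exist
    # before the increase, so they hold nothing to relink or clean up.
    candidates = [p for p in partitions if p < upper]
    # Reverse order: a partition is handled before any lower partition
    # gets the chance to link "new" objects into it.
    return sorted(candidates, reverse=True)
```

For a device holding partitions 1 and 3 with part_power 2, this yields
partition 3 first and then partition 1.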

Fourthly, it allows specifying a single device to work on.

To do that, some hooks were added to audit_location_generator so that
custom code can be executed before/after iterating over a
device/partition/suffix/hash.

Change-Id: If1bf8ed9036fb0ec619b0d4f16061a81a1af2082
2020-03-31 17:33:06 -04:00
Zuul
02aea34c46 Merge "swift-account-audit: work with encryption" 2020-01-22 16:59:19 +00:00
Zuul
f4fc811862 Merge "swift-account-audit: clean up some error formatting" 2019-10-16 04:03:03 +00:00
Zuul
294472464a Merge "py3: fix swift-account-audit" 2019-10-15 20:01:24 +00:00
Zuul
1e77d6834f Merge "py3: fix swift-dispersion-populate" 2019-10-15 14:54:03 +00:00
Tim Burke
4c66596e63 py3: fix swift-dispersion-populate
Change-Id: I1f140ae00cbd25b23c9a40ee91dccee8c7c15d81
2019-10-14 11:34:31 -07:00
Tim Burke
3e90ddb37d swift-account-audit: clean up some error formatting
"127.0.0.1s:sdas" is confusing at best.

Change-Id: I37f78d5993082ac29b001e9563aa4b24fd009a27
2019-10-14 11:21:22 -07:00
Tim Burke
2f4fe56ca4 py3: fix swift-account-audit
Previously, we'd get a KeyError trying to read headers.

Change-Id: I5d9f86784a3e39577ab010d29d8d03b26ffda357
2019-10-14 11:21:06 -07:00
Tim Burke
e58840c571 swift-account-audit: work with encryption
Change-Id: I26c5fe9d45a9765da0d30138ea2df16fd4e73d57
2019-10-14 11:01:59 -07:00
Tim Burke
405a2b2a55 py3: Fix swift-drive-audit
Walking through the kernel logs backwards requires that we open them
in binary mode. Add a new option to allow users to specify which
encoding should be used to interpret those logs; default to the same
encoding that open() uses for its default.

Change-Id: Iae332bb58388b5521445e75beba6ee2e9f06bfa6
Closes-Bug: #1847955
2019-10-13 21:55:58 -07:00
Tim Burke
e6e31410e0 Find .d pid files with swift-orphans
Change-Id: I7a2f19862817abf15e51463bd124293730451602
2019-08-30 11:54:47 -07:00
Tim Burke
27e7e80e92 py3: fix up swift-orphans
Change-Id: Id1280abd92e8bb02fcaa4701a0e9d211d9d6e33e
2019-08-15 10:34:54 -07:00
Thiago da Silva
a7c5ca0806 Fix locking in swift-recon-cron
The previous locking method would leave the lock dir lying around
if the process died unexpectedly, preventing other swift-recon-cron
processes from running successfully and requiring manual cleanup.
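
One generic way to get a lock that cannot go stale is an advisory file
lock, which the kernel releases when the holding process exits. The
sketch below shows that technique with the standard fcntl module; it is
not necessarily what this patch does, and `exclusive_lock` is a
hypothetical helper.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold a non-blocking exclusive flock on `path` for the block.

    Raises OSError straight away if another holder has the lock. The
    lock goes away when the descriptor is closed, including when the
    process dies, so a leftover file never blocks the next run.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
    finally:
        os.close(fd)
```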

Change-Id: Icb328b2766057a2a4d126f63e2d6dfa5163dd223
2018-08-15 21:51:14 +00:00
Zuul
a3cc7ccc69 Merge "Experimental swift-ring-composer CLI to build composite rings" 2018-06-15 04:27:43 +00:00
Alistair Coles
6b626f2f98 Experimental swift-ring-composer CLI to build composite rings
Provides a simple, experimental, CLI tool to generate a
composite ring from a list of component builder files.

For example:

  swift-ring-composer <composite-file> compose \
      <builder-file> <builder-file> --output <ring-file>

Commands available:

- compose: compose a list of builder files into a composite ring
- show: show the metadata for a composite ring

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: I25a79e71c13af352e19e4358f60545265b51584f
2018-06-14 09:50:55 +01:00
Matthew Oliver
2641814010 Add sharder daemon, manage_shard_ranges tool and probe tests
The sharder daemon visits container dbs and when necessary executes
the sharding workflow on the db.

The workflow is, in overview:

- perform an audit of the container for sharding purposes.

- move any misplaced objects that do not belong in the container
  to their correct shard.

- move shard ranges from FOUND state to CREATED state by creating
  shard containers.

- move shard ranges from CREATED to CLEAVED state by cleaving objects
  to shard dbs and replicating those dbs. By default this is done in
  batches of 2 shard ranges per visit.

Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.

The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them to a container in order to trigger
sharding. This is currently the recommended way to shard a container.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f
2018-05-18 18:48:13 +01:00
Zuul
0bc52a6036 Merge "swift-recon-cron: do not get confused by files in /srv/node" 2018-04-13 04:14:31 +00:00
Zuul
c9ddee7aac Merge "swift-(account|container)-info: tolerate LockTimeouts" 2018-04-11 07:24:53 +00:00
Zuul
e4660a3e31 Merge "Add manpage for swift-object-relinker" 2018-04-11 01:35:07 +00:00
Tim Burke
5b68eb5396 swift-(account|container)-info: tolerate LockTimeouts
I'm not really clear on why a sqlite3.OperationalError should cause us to
retry with stale_reads_ok=True, but swift.common.exceptions.LockTimeout
*definitely* should.

Change-Id: I707dec1d11b8db80bc8fbee30662b319bf10d6a5
2018-04-10 17:09:07 -07:00
Zuul
b20893f540 Merge "Support -d <devs> and -p <partitions> in DB replicators." 2018-03-20 00:04:01 +00:00
Zuul
2c7e12289f Merge "Optionally drop common prefixes in swift-*-info output" 2018-03-13 03:39:02 +00:00
Samuel Merritt
b08c70d38e Support -d <devs> and -p <partitions> in DB replicators.
Similar to the object replicator and reconstructor, these arguments
are comma-separated lists of device names and partitions,
respectively, on which the account or container replicator will
operate. Other devices and partitions are ignored.
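
The filtering this implies can be sketched as follows. Both helper
names are hypothetical and the parsing is an assumption about how such
comma-separated values might be handled, not the replicator's own code.

```python
def list_from_csv(value, cast=str):
    """Turn 'a, b,c' into ['a', 'b', 'c'], skipping empty items."""
    return [cast(item.strip())
            for item in (value or "").split(",") if item.strip()]


def should_process(device, partition, devices, partitions):
    """Return True if this device/partition pair is selected.

    An empty list means no restriction was given for that dimension.
    """
    return ((not devices or device in devices) and
            (not partitions or partition in partitions))
```

For example, with `-d sdb1,sdb2 -p 294` only partition 294 on sdb1 or
sdb2 would be visited.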

Change-Id: Ic108f5c38f700ac4c7bcf8315bf4c55306951361
2018-03-05 16:26:19 -08:00
Tim Burke
36c42974d6 py3: Port more CLI tools
Bring under test

 - test/unit/cli/test_dispersion_report.py
 - test/unit/cli/test_info.py and
 - test/unit/cli/test_relinker.py

I've verified that swift-*-info (at least) behave reasonably under
py3, even swift-object-info when there's non-utf8 metadata on the
data/meta file.

Change-Id: Ifed4b8059337c395e56f5e9f8d939c34fe4ff8dd
2018-02-28 21:10:01 +00:00
Clay Gerrard
55a1b63db5 Let recon-cron work with conf.d
Change-Id: I862b74e0d9b20ba149581c1add6473dc1e5b2859
2018-01-11 12:36:45 -08:00
Ondřej Nový
611b28f73a Add manpage for swift-object-relinker
Change-Id: I56dd9c646faba91e9f124f343ea0e08f8c3c4249
2017-12-09 19:10:35 +01:00
Tim Burke
250da37a7b Remove swift-temp-url script
This has been deprecated since Swift 2.10.0 (Newton) including a
message that it would go away. Let's actually remove it.

Change-Id: I7d3659761c71119363ff2c0c750e37b4c6374a39
Related-Change: Ifa8bf636f20f82db4845b02d1b58699edaa39356
2017-10-13 23:28:09 +00:00
Tim Burke
79905ae794 Replace SOSO auth prefix in examples with more-standard AUTH
Change-Id: I98643d6acf248840a8360f31e446bc8ecb834898
2017-09-27 23:49:59 +00:00
Tim Burke
4716d3da11 swift-account-audit: compare each etag to the hash from container
...rather than only comparing the ETag from the last response over and
over again.

NB: This tool *does not* like EC data :-(

Change-Id: Idd37f94b07f607ab8a404dd986760361c39af029
Closes-Bug: 1266636
2017-09-27 23:49:59 +00:00
Tim Burke
f95befb37f Optionally drop common prefixes in swift-*-info output
Add a --drop-prefixes flag to swift-account-info, swift-container-info,
and swift-object-info. This makes the output between the three more
consistent.

Change-Id: I98252ff74c4983eaad0a93d9a9fc527c74ffce68
2017-09-13 22:47:04 +00:00
Christian Schwede
cbddec340e Add bin/swift-dispersion-report
Change-Id: I81736080fc478c2b69d5b71edd0cada39aad9400
2017-09-13 05:57:30 +00:00