swift/test/unit/account
Alistair Coles a41873ce3d Fix and simplify container replication
This patch simplifies the rsync replication paths so they revert to
being closer to master, and fixes some of the rsync scenarios for
sharding or sharded containers.

Details of changes:

1. Before, when the destination was sharded, the first REPLICATE call
would return 404 which would cause rsync to always be used. Now, the
existence of any valid db on the destination will result in a success
response to the first REPLICATE request.

2. Before, in some scenarios, rows from a hash.db might be rsync'd to
a destination *hash.db* even when the destination was sharding or
sharded. That could be problematic given that the shard db would have
previously initialised its row id from a supposedly read-only
hash.db. Now, sync'd rows are always merged into the newest db on the
destination, regardless of the sharding state of the source or
destination. This does mean that the destination shard db could become
bloated if an unsharded source is significantly out of sync, but in
practice this was likely to have happened anyway because large db's
with large rowid's are likely to have been using usync.

3. Before, only usync was used when either source or destination was
in sharding state, or when they were in different states. This
restriction is no longer necessary so has been removed to minimise the
differences w.r.t. master branch. The restriction to usync may be
useful as a means of restricting the rate of sync'ing when a container
is sharding, given that the correct destination for rows is actually a
shard. If so, this feature can be re-introduced. However, in practice
it is likely that usync will be chosen anyway since larger containers
will tend to use usync.

4. If source is in sharding state, each source db is replicated
separately. This means that the decision to use rsync or usync is made
as appropriate for the difference between each db and the
destination. However, if an older source db uses usync then the newer
source db will also use usync and will only usync rows if the older db
completed usyncing all of its rows. That avoids any discontinuity in
the order of usynced rows.

Note that if the destination is also in sharding state,
its sync status is represented by the broker that encapsulates both old
hash.db and shard db, regardless of the source db being sync'd.

As a consequence, the _rsync_db and _rsync_file methods only need to
handle one file at a time, so these methods revert to being very
similar to master. _rsync_db does still append an extra arg
w.r.t. master, which is the source db filename. This is used when
complete_rsync is renaming a new db file on the destination.

5. When doing an rsync_then_merge, merge incoming syncs from the old
destination db to the new destination db. This is significant when
rsync_then_merge is used to sync both db's from a sharding
source. Before, the first source db (hash.db) would rsync_then_merge
and add its sync point to the destination db. When the second source
db was rsync_then_merge'd, the sync point from the previous
rsync_then_merge was lost. This caused the sync of the first db to be
repeated on the next replication cycle. Now the sync point for the
hash.db is retained when the shard.db is rsync_then_merge'd.

6. Adds unit tests for many replication scenarios.
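
As an illustration only (not code from this patch), the ordering rule
described in item 4 can be sketched as below. SourceDb,
plan_replication, prefers_usync and usync_completes are hypothetical
names; the real replicator decides between rsync and usync by comparing
each db with the destination, which this sketch reduces to a flag.

```python
from dataclasses import dataclass


@dataclass
class SourceDb:
    name: str
    # Hypothetical flag: True if comparing this db with the destination
    # would select usync rather than rsync.
    prefers_usync: bool
    # Hypothetical flag: True if a usync of this db would push all rows.
    usync_completes: bool = True


def plan_replication(source_dbs):
    """Return a list of (db name, action) for dbs ordered oldest first.

    Each db is considered separately, but once an older db has used
    usync every newer db also uses usync, and it only usyncs rows if
    every older usync pushed all of its rows.
    """
    plan = []
    usync_required = False  # set once any older db has used usync
    usync_unbroken = True   # False once an older usync left rows behind
    for db in source_dbs:
        if usync_required or db.prefers_usync:
            usync_required = True
            if usync_unbroken:
                plan.append((db.name, 'usync'))
                usync_unbroken = db.usync_completes
            else:
                # Syncing now would break the order of usynced rows.
                plan.append((db.name, 'skip'))
        else:
            plan.append((db.name, 'rsync'))
    return plan
```

For example, if hash.db uses usync but does not finish, shard.db is
planned as 'skip' for this cycle rather than being sync'd out of order.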

Change-Id: I8cea31896262ee2ac26a440847f6b6cfc836ad37
2018-02-22 16:53:38 +00:00
__init__.py         Initial commit of Swift code                          2010-07-12 17:03:45 -05:00
test_auditor.py     Use more specific asserts in test/unit/account tests  2017-06-22 13:57:11 +02:00
test_backend.py     Use more specific asserts in test/unit/account tests  2017-08-23 17:22:54 +02:00
test_reaper.py      Use check_drive consistently                          2017-11-01 16:33:40 +00:00
test_replicator.py  Fix and simplify container replication                2018-02-22 16:53:38 +00:00
test_server.py      Merge branch 'master' into feature/deep               2017-11-03 13:14:13 -07:00
test_utils.py       Move HeaderKeyDict to avoid an inline import          2016-03-07 12:26:48 -08:00