Otherwise, swift-in-the-small can fill up logs with
object-replicator: Error syncing partition:
Traceback (most recent call last):
File ".../swift/obj/replicator.py", line 419, in update
node = next(nodes)
StopIteration
...which simultaneously sounds worse than it is and isn't helpful in
diagnosing/debugging the issue.
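For illustration, a guard along these lines (a sketch only - names such as
job and self.logger are assumed, and this is not necessarily the actual
fix) turns the bare traceback into an actionable log line:
    try:
        node = next(nodes)
    except StopIteration:
        # all candidate nodes have already been tried; say so instead of
        # letting a bare StopIteration traceback land in the error log
        self.logger.warning('No more nodes to try for partition %s',
                            job['partition'])
        return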
Change-Id: I2f5bb12f3704880df1750229425f64f419ff9aef
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.
This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.
Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.
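The shape of the check being added can be sketched like this (illustrative
only - the xattr key names and layout here are assumptions, not
necessarily what the patch actually writes):
    import hashlib
    import pickle
    import xattr

    META_KEY = 'user.swift.metadata'               # assumed key names
    CHECKSUM_KEY = 'user.swift.metadata_checksum'

    def write_metadata(path, metadata):
        blob = pickle.dumps(metadata, protocol=2)
        xattr.setxattr(path, META_KEY, blob)
        xattr.setxattr(path, CHECKSUM_KEY,
                       hashlib.md5(blob).hexdigest().encode('ascii'))

    def read_metadata(path):
        blob = xattr.getxattr(path, META_KEY)
        expected = xattr.getxattr(path, CHECKSUM_KEY).decode('ascii')
        if hashlib.md5(blob).hexdigest() != expected:
            # caller can quarantine the file instead of serving junk
            raise ValueError('corrupt object metadata')
        return pickle.loads(blob)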
As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.
The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?
Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.
Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (e.g. tmpfs, or more than 4k of xattrs on ext4).
References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.
THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS
With this patch, every test environment will require TMPDIR to be
using a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tmpfs supports this. XFS is recommended.
So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.
`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.
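A sketch of what such a skip helper can look like (illustrative; the real
helper may differ): probe whether the test tempdir's filesystem accepts an
xattr and skip the test if it doesn't.
    import tempfile
    import unittest
    import xattr

    def skip_if_no_xattrs():
        with tempfile.NamedTemporaryFile() as tmp:
            try:
                xattr.setxattr(tmp.name, 'user.test.xattr-check', b'1')
            except (IOError, OSError):
                raise unittest.SkipTest(
                    'xattrs not supported on %s' % tempfile.gettempdir())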
The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.
Co-Authored-By: John Dickinson <me@not.mn>
Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
We added check_drive to the account/container servers to unify how all
the storage WSGI servers treat device dirs/mounts. This pushes that
unification down into the consistency engine.
Drive-by:
* use FakeLogger less
* clean up some repetition in the probe utility for device re-"mounting"
Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a
Insufficient arguments are passed when creating MockProcess instances,
resulting in StopIteration errors being raised during the repeated
replicator run_once cycles added in [1]. The test passes because
the replicator just logs these exceptions, but the logger noise is
distracting when running the test [2].
[1] Related-Change: Ib5c9dd17e40150450ec57a728ae8652fbc730af6
[2] nosetests ./test/unit/obj/test_replicator.py:\
TestObjectReplicator.test_run_once -s
Change-Id: I36208e93c81744068a3454577a30d0c5a8d9cb9b
This patch adds methods to increase the partition power of an existing
object ring without downtime for users, using a 3-step process. Data
won't be moved to other nodes; objects using the new increased partition
power will be located on the same device and are hardlinked to avoid
data movement.
1. A new setting "next_part_power" will be added to the rings, and once
the proxy server has reloaded the rings it will send this value to the
object servers on any write operation. Object servers will then create a
hard link in the new location to the original DiskFile object. Already
existing data will be relinked into the new locations, using hard links,
by a new relinker tool (a short sketch of the partition math follows this
list).
2. The actual partition power itself will be increased. Servers will now
read from and write to locations based on the new partition power. Hard
links in the old object locations that are no longer required are then
removed by the relinker tool; the relinker tool reads the next_part_power
setting to find object locations that need to be cleaned up.
3. The "next_part_power" flag will be removed.
This mostly implements the spec in [1]; however it's not using an
"epoch" as described there. The idea of the epoch was to store data
using different partition powers in their own namespace to avoid
conflicts with auditors and replicators as well as being able to abort
such an operation and just remove the new tree. This would require a
heavy change to the on-disk data layout, and other object-server
implementations would be required to adopt this scheme too.
Instead the object-replicator is now aware that there is a partition
power increase in progress and will skip replication of data in that
storage policy; the relinker tool should simply be run, and afterwards
the partition power will be increased. This shouldn't take that much
time (it's only walking the filesystem and hardlinking); the impact
should therefore be low. The relinker should be run on all storage nodes
at the same time in parallel to decrease the required time (though this
is not mandatory). Failures during relinking should not affect cluster
operations - relinking can even be aborted manually and restarted later.
Auditors do not quarantine objects written to a path with a different
partition power and therefore work as before (though in the worst case
they read each object twice before the no-longer-needed hard links are
removed).
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
[1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/
increasing_partition_power.html
Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb
Some public functions in the diskfile manager expect or return full
file paths, which implies a filesystem diskfile implementation.
To make it easier to plug in alternate diskfile implementations, this
patch changes those functions to take more generic arguments.
This commit changes DiskFileManager _get_hashes() arguments from:
- partition_path, recalculate=None, do_listdir=False
to:
- device, partition, policy, recalculate=None, do_listdir=False
Callers are modified accordingly, in diskfile.py, reconstructor.py,
and replicator.py
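For clarity, the before/after signatures look roughly like this (a
sketch; bodies elided):
    # before: the caller had to know the on-disk layout
    def _get_hashes(self, partition_path, recalculate=None, do_listdir=False):
        ...

    # after: callers pass device/partition/policy and the manager resolves
    # any filesystem path internally
    def _get_hashes(self, device, partition, policy,
                    recalculate=None, do_listdir=False):
        ...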
Change-Id: I8e2d7075572e466ae2fa5ebef5e31d87eed90fec
Because random.randint includes both endpoints, the random.randint(0, 9)
that is assigned in the replicator yields values in [0, 9]. Hence the
assertion on replication_cycle should be *less than or equal to* 9, and
the replication_cycle should be taken mod 10.
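That is (standard library behaviour, shown for clarity):
    import random

    # randint includes both endpoints, so the cycle counter can reach 9
    assert all(0 <= random.randint(0, 9) <= 9 for _ in range(1000))

    replication_cycle = random.randint(0, 9)
    # per run:
    replication_cycle = (replication_cycle + 1) % 10   # stays within [0, 9]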
Change-Id: I81da375a4864256e8f3b473d4399402f83fc6aeb
The reclaim_age is a DiskFile option; it doesn't make sense for two
different object services or nodes to use different values.
As a drive-by, I also cleaned up the reclaim_age plumbing from get_hashes
to cleanup_ondisk_files, since it's a method on the Manager and has
access to the configured reclaim_age. This fixes a bug where finalize_put
wouldn't use the [DEFAULT]/object-server configured reclaim_age - which
is normally benign but leads to weird behavior on DELETE requests with a
really small reclaim_age.
There are a couple of places in the replicator and reconstructor that
reach into their manager to borrow the reclaim_age when emptying out
the aborted PUTs that failed to clean up their files in tmp - but that
timeout doesn't really need to be coupled with reclaim_age, and that
method could just as reasonably have been implemented on the Manager.
UpgradeImpact: Previously the reclaim_age was documented to be
configurable in various object-* services config sections, but that did
not work correctly unless you also configured the option for the
object-server because of REPLICATE request rehash cleanup. All object
services must use the same reclaim_age. If you require a non-default
reclaim age it should be set in the [DEFAULT] section. If there are
different non-default values, the greater should be used for all object
services and configured only in the [DEFAULT] section.
If you specify a reclaim_age value in any object-related config you
should move it to *only* the [DEFAULT] section before you upgrade. If
you configure a reclaim_age less than your consistency window you are
likely to be eaten by a Grue.
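For example, a single cluster-wide value would look roughly like this in
each object server's config (a sketch; 604800 seconds is the one-week
default):
    [DEFAULT]
    # one reclaim_age for all object services on this node
    reclaim_age = 604800

    [object-server]
    # do not set reclaim_age here (nor in object-replicator,
    # object-auditor, or object-reconstructor sections)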
Closes-Bug: #1626296
Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
This patch fixes the object-reconstructor to calculate device_count
as the total number of local devices across all policies. Previously
Swift counted it per policy, and reconstruction_device_count - which
means the number of devices Swift actually needs to reconstruct - was
counted as the sum of the per-policy counts.
With this patch, Swift will gather all local devices for all policies
first, and then collect parts for each device as it does currently.
This way we can see the status of the remaining jobs/disks percentage
via the stats_line output.
To enable this change, this patch also touches the object replicator
to get a DiskFileManager via the DiskFileRouter class so that
DiskFileManager instances are policy-specific. Currently the same
replication-policy DiskFileManager class is always used, but this
change future-proofs the replicator for possible other DiskFileManager
implementations.
The change also gives the ObjectReplicator a _df_router variable,
making it consistent with the ObjectReconstructor, and allowing a
common way for ssync.Sender to access DiskFileManager instances via
its daemon's _df_router instance.
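Schematically (a sketch; conf, logger and job are assumed to be in
scope):
    from swift.obj.diskfile import DiskFileRouter

    # both daemons hold a router that hands out policy-specific managers
    df_router = DiskFileRouter(conf, logger)
    df_mgr = df_router[job['policy']]

    # ssync.Sender reaches the same manager through its daemon:
    # df_mgr = self.daemon._df_router[job['policy']]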
Also, remove the use of FakeReplicator from the ssync test suite. It
was not necessary and risked masking divergence between ssync and the
replicator and reconstructor daemon implementations.
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Closes-Bug: #1488608
Change-Id: Ic7a4c932b59158d21a5fb4de9ed3ed57f249d068
Previously, the do_listdir option was set on every 10th replication run.
Due to the randomness of the job listing, this might update a given
partition much less often than expected - for example, with 1000
partitions per replicator, only every ~70th run.
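One way to make this deterministic (a sketch of the idea, not necessarily
the exact implementation) is to tie the listdir decision to the partition
number and the cycle counter, so every partition gets exactly one listdir
per ten cycles regardless of job ordering:
    def _do_listdir(partition, replication_cycle):
        # each partition hits a full listdir exactly once every 10 cycles
        return (int(partition) + replication_cycle) % 10 == 0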
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Christian Schwede <cschwede@redhat.com>
Related-Bug: #1634967
Closes-Bug: 1644807
Change-Id: Ib5c9dd17e40150450ec57a728ae8652fbc730af6
Ignore `auditor_status_*.json` files while collecting jobs so that the
replicator won't use these bogus paths to find objects, which caused an
exception that increased the failure count in the replicator report.
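Schematically, the job collection only needs to consider real partition
directories (a sketch; the actual filtering may differ):
    import os

    def partition_dirs(objects_dir):
        for entry in os.listdir(objects_dir):
            # partitions are numeric directories; skip files such as
            # auditor_status_ALL.json / auditor_status_ZBF.json
            if entry.startswith('auditor_status_') or not entry.isdigit():
                continue
            yield entry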
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Change-Id: Ib15a0987288d9ee32432c1998aefe638ca3b223b
Closes-Bug: #1583305
The object replicator can log some junk about the cluster ip instead
of the replication ip in some specific error log lines that can make
you think either you're crazy or your rings are crazy.
... in this case it was just the logging that was crazy - so fix that.
Change-Id: Ie5cbb2d1b30feb2529c17fc3d72af7df1aa3ffdd
Before this commit, when a local device had not been found in an
object-replication run, the policy was not mentioned in the error log.
But it is of interest to know the policy, for example when searching for
errors, when no local device has been found.
Change-Id: Icb9f9f1d4aec5c4a70dd8abdf5483d4816720418
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202,
so they do not conflict with X-Windows or other services.
Updated SAIO docs.
DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
In situations where rsync may inadvertently be unable to clean up its
temporary files, we shouldn't spread them around the cluster.
By asking our rsync subexec to --exclude patterns that match its own
convention for temporary naming, we'll only ever transfer real replicated
artifacts and never temporary artifacts, which should always be ignored
until they are fully transferred.
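For illustration, rsync names its in-flight temporary files with a
leading dot and a random six-character suffix, so an exclude along these
lines (a sketch, not necessarily the exact pattern used) keeps them out
of further replication:
    # sketch: when building the rsync argument list
    args = ['rsync', '--recursive', '--whole-file', '--ignore-existing']
    # never pick up another rsync's ".<name>.<6 random chars>" temp files
    args.append('--exclude=.*.??????')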
Cleanup of stale rsync droppings should be performed by the auditor and
will be addressed in a separate change related to lp bug #1554005.
Closes-Bug: #1553995
Change-Id: Ibe598b339af024d05e4d89c34d696e972d8189ff
Based on experience using handoffs_first and feedback from other
operators it has become clear that handoffs_first is only used during
periods of problematic cluster behavior (e.g. full disks) when
replication attempts are failing to quickly drain off the partitions
from the nodes which they have been rebalanced from.
In order to focus on the most important work (getting handoff partitions
off the node) handoffs_first mode will abort the current replication
sweep before attempting any primary suffix syncing if any of the handoff
partitions were not removed for any reason - and start over with
replication of handoffs jobs as the highest priority.
Note that handoffs_first being enabled will emit a warning on startup,
even if no handoff jobs fail, because of the negative impact it can have
during normal operations by dog-piling on a node that was temporarily
unavailable.
Change-Id: Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea
Example:
* Different port in config and in ring file.
* Running daemon on server not in ring file.
In both cases the replication daemon is running but nothing is
replicated. The error log helps to show that a local device can't be
identified.
Closes-Bug: 1508228
Change-Id: I99351b7d9946f250b7750df91c13d09352a145ce
assertEquals is deprecated in py3, so replace it with assertEqual.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
This patch fixes the exception (AttributeError: 'list' object has no
attribute 'intersection') raised when the replicator tries to sync data
from a handoff to primary partitions in more than one remote region.
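Sketched generically (not the patch's actual code), the delete-candidate
collection has to stay a set so it can be intersected with the result
from every additional remote region:
    objs_from_region_1 = ['a/hash1', 'a/hash2', 'a/hash3']
    objs_from_region_2 = ['a/hash2', 'a/hash3']

    # a plain list here would raise
    #   AttributeError: 'list' object has no attribute 'intersection'
    # as soon as a second remote region reports back
    candidates = set(objs_from_region_1)
    candidates = candidates.intersection(objs_from_region_2)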
Change-Id: I565c45dda8c99d36e24dbf1145f2d2527d593ac0
Closes-Bug: 1503152
This change cleans up test/unit/obj/test_replicator.py's imports
to use only one style of multiline import syntax (' \' vs '()').
I don't really mind which, but we should be consistent, at least
within the same file.
This is a follow up for patch 215857.
Change-Id: Ie2d328c25865b19092c493981a803ee246a9d7a5
Under some concurrency the object-replicator could potentially send the
wrong X-Backend-Storage-Policy-Index header to its partner nodes during
replication if there were multiple storage policies on the same node,
because of a race where multiple jobs being processed concurrently would
mutate some shared state on the ObjectReplicator instance.
Instead of mutating shared state on the ObjectReplicator instance when
setting the default headers sent with REPLICATION requests, each job
will copy them into a local dict where they can safely be updated.
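Schematically (a sketch; the attribute and job key names are
illustrative):
    # before: every job mutated an instance-wide headers dict - racy when
    # jobs for different policies run concurrently
    # self.headers['X-Backend-Storage-Policy-Index'] = int(job['policy'])

    # after: each job works on its own copy
    headers = dict(self.default_headers)
    headers['X-Backend-Storage-Policy-Index'] = int(job['policy'])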
Change-Id: I5522db57af7e308b1f9d4181f14ea14e386a71fd
When the object-replicator encounters the handoffs_first and
handoff_delete options enabled, it should emit a log warning
indicating that they should be changed back to the defaults
before the next "normal" rebalance.
Closes-Bug: #1457262
Change-Id: If9dc2796c18ed3cf13da920831e2d5c2ae9f12a0
This patch adds the count of object replication failures to recon.
And "failure_nodes" is added to the Account Replicator and
Container Replicator.
Recon shows the count of object replication failures as follows:
$ curl http://<ip>:<port>/recon/replication/object
{
    "replication_last": 1416334368.60865,
    "replication_stats": {
        "attempted": 13346,
        "failure": 870,
        "failure_nodes": {
            "192.168.0.1": {"sdb1": 3},
            "192.168.0.2": {"sdb1": 851,
                            "sdc1": 1,
                            "sdd1": 8},
            "192.168.0.3": {"sdb1": 3,
                            "sdc1": 4}
        },
        "hashmatch": 0,
        "remove": 0,
        "rsync": 0,
        "start": 1416354240.9761429,
        "success": 1908
    },
    "replication_time": 2316.5563162644703,
    "object_replication_last": 1416334368.60865,
    "object_replication_time": 2316.5563162644703
}
Note that 'object_replication_last' and 'object_replication_time' are
considered to be transitional and will be removed in the subsequent
releases. Use 'replication_last' and 'replication_time' instead.
Additionally, this patch adds the count to swift-recon; it will be
shown as follows:
$ swift-recon object -r
===============================================================================
--> Starting reconnaissance on 4 hosts
===============================================================================
[2014-11-27 16:14:09] Checking on replication
[replication_failure] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%,
no_result: 0, reported: 4
[replication_success] low: 3, high: 3, avg: 3.0, total: 12,
Failed: 0.0%, no_result: 0, reported: 4
[replication_time] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%,
no_result: 0, reported: 4
[replication_attempted] low: 1, high: 1, avg: 1.0, total: 4,
Failed: 0.0%, no_result: 0, reported: 4
Oldest completion was 2014-11-27 16:09:45 (4 minutes ago) by
192.168.0.4:6002.
Most recent completion was 2014-11-27 16:14:19 (-10 seconds ago) by
192.168.0.1:6002.
===============================================================================
In the case of a cluster in which one server runs with this patch and
the other servers run without it, if swift-recon is executed on the
patched server, the [failure], [success] and [attempted] entries in the
output will not be meaningful, because the servers running without this
patch are not able to send a response with the information that this
patch needs. Therefore, once you apply this patch, you should also apply
it to the other servers before you execute swift-recon.
DocImpact
Change-Id: Iecd33655ae2568482833131f422679996c374d78
Co-Authored-By: Kenichiro Matsuda <matsuda_kenichi@jp.fujitsu.com>
Co-Authored-By: Brian Cline <bcline@softlayer.com>
Implements: blueprint enable-object-replication-failure-in-recon
This patch mostly eliminates the duplicate code that was
deliberately left in place during EC review to avoid major
churn of the diskfile module prior to the kilo release.
This focuses on obvious de-duplication and shuffling code
between classes. It deliberately does not attempt to
hammer out every last piece of de-duplication where that
would introduce more complex changes - that can come later.
Code is moved from the module level and from ECDiskFile*
classes into new BaseDiskFile* classes.
Concrete classes for replication and EC policy retain their
existing names i.e. DiskFile[Manager|Writer|Reader|] and
ECDiskFile[Manager|Writer|Reader|] respectively.
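The resulting layout, roughly (a sketch of the class relationships only):
    class BaseDiskFileManager(object):
        ...   # shared behaviour moved here from module level / ECDiskFile*

    class DiskFileManager(BaseDiskFileManager):
        ...   # replication policies

    class ECDiskFileManager(BaseDiskFileManager):
        ...   # EC policies

    # and likewise BaseDiskFile / BaseDiskFileWriter / BaseDiskFileReader
    # with DiskFile* and ECDiskFile* concrete subclasses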
Knock-on changes:
- fix bug whereby get_hashes was ignoring self.reclaim_age
and always using the default arg value.
- replication diskfile manager now deletes a tombstone that is older
than reclaim_age even when there is a newer .meta file.
- replication diskfile manager will no longer raise an
AssertionError if only a .meta file is found during
hash_cleanup_listdir.
- fix stale test in test_auditor.py: test_with_tombstone test
setup was convoluted (probably dates back to when object puts
did not clean up the object dir). Now that they do you have to
try harder to create a dir with a tombstone and a data file.
Change-Id: I963e0d0ae0d6569ad1de605034c529529cbb4f9a
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
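For example (a sketch of the relevant object-server config; the ring must
already define one unique port per disk):
    [DEFAULT]
    # fork 3 object-server workers per unique local port found in the ring
    servers_per_port = 3
    # how often (seconds) the parent re-stats the ring files
    ring_check_interval = 15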
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
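A sketch of how the parent process can use these helpers (the
BindPortsCache constructor arguments and method name here are
assumptions beyond what is described above):
    from swift.common.ring import RingData
    from swift.common.storage_policy import BindPortsCache

    bind_ip = '203.0.113.10'   # this server's IP

    # cheap: skips replica2part2dev_id, so the parent and its forked
    # children don't hold full copies of every storage policy ring
    ring_meta = RingData.load('/etc/swift/object.ring.gz', header_only=True)

    # all ports this node must listen on, across every ring; ring file
    # mtimes are tracked so they aren't re-read more often than necessary
    cache = BindPortsCache('/etc/swift', bind_ip)
    ports = cache.all_bind_ports_for_node()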
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
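That is:
    it = iter([1, 2, 3])
    # Python 2 only:
    #   it.next()
    # works on both Python 2 and Python 3:
    next(it)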
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
- There is no notion of update() or update_deleted().
- There is a single job processor
- Jobs are processed partition by partition.
- At the end of processing a rebalanced or handoff partition, the
reconstructor will remove successfully reverted objects if any.
It also includes various ssync changes, such as the addition of the
reconstruct_fa() function, called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.
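To illustrate the kind of fragment rebuild reconstruct_fa() performs,
here is a standalone PyECLib sketch (the scheme parameters are arbitrary;
this is not Swift's actual wiring):
    from pyeclib.ec_iface import ECDriver

    driver = ECDriver(k=4, m=2, ec_type='liberasurecode_rs_vand')
    frags = driver.encode(b'some object body ' * 1000)

    missing = 3                                   # fragment index to rebuild
    available = [f for i, f in enumerate(frags) if i != missing]
    rebuilt = driver.reconstruct(available, [missing])
    assert rebuilt[0] == frags[missing]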
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
Adds specific disk file classes for EC policy types.
The new ECDiskFile and ECDiskFileWriter classes are used by the
ECDiskFileManager.
ECDiskFileManager is registered with the DiskFileRouter for use with
EC_POLICY type policies.
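Schematically, registration and per-policy lookup look like this (a
sketch built from the names above, not a verbatim excerpt):
    @DiskFileRouter.register(EC_POLICY)
    class ECDiskFileManager(DiskFileManager):
        diskfile_cls = ECDiskFile

    # a router handed an EC policy then returns an ECDiskFileManager
    df_mgr = DiskFileRouter(conf, logger)[policy]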
Refactors diskfile tests into BaseDiskFileMixin and BaseDiskFileManagerMixin
classes which are then extended in subclasses for the legacy
replication-type DiskFile* and ECDiskFile* classes.
Refactor to prefer use of a policy instance reference over a policy_index
int to refer to a policy.
Add additional verification to DiskFileManager.get_dev_path to validate the
device root with common.constraints.check_dir, even when mount_check is
disabled, for use on a virtual swift-all-in-one.
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I22f915160dc67a9e18f4738c1ddf068344e8ad5d
* Get FakeConn ready for expect 100 continue
* Use debug_logger more and with better interfaces
* Fix patch_policies to be less annoying
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I28c0a3539d994cbb8e6b94d63a23ed4ea6cb956d
From rsync's man page:
-z, --compress
With this option, rsync compresses the file data as it is sent to the
destination machine, which reduces the amount of data being transmitted --
something that is useful over a slow connection.
A configurable option has been added to allow rsync to compress, but only
if the remote node is in a different region than the local one.
NOTE: Objects that are already compressed (for example: .tar.gz, .mp3)
might slow down the syncing process.
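For example (a sketch of the object-replicator config; compression is
only applied when the destination is in a different region):
    [object-replicator]
    # pass rsync's -z/--compress for cross-region syncs only
    rsync_compress = true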
On-wire compression can also be extended to ssync later in a different
change if required. In the case of ssync, we could explore faster
compression libraries like lz4; rsync uses zlib, which is slow but offers
a higher compression ratio.
Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Probetests discovered two issues with the current state of the
object-replicator as a result of the attempts to clean up changes
related to efficient cross-region replication.
Known failures are:
* rsync replication, when configured with no sync_method in the config,
fails to clean up a handoff partition
* ssync replication, when there is only one region, fails to clean up a
handoff partition
In both cases the path resulting in the failure moved through the
implicit else clause (dangling elif) of the partition cleanup code path.
In the ssync case the failure came from a miss on the first if branch,
where delete_objs would be None if there are no remote regions. In the
rsync case the failure came from a miss on the second elif condition
when looking for an entry in the conf dict without setting a default.
This change adds unittests for both failures that should fail in a
reasonable way against master without requiring a probetest run against
other configs, as well as rephrasing the logic in the partition cleanup
handling to try and make the logic flow more explicit.
Change-Id: Ic59d998a3e36a3eb3e509d9fdf7096e812281357
The current code might incorrectly delete local handoff objects when the
remote node requires all of the objects at poke time, because an empty
cand_objs won't be applied to the delete-candidate objects list.
This patch ensures the delete-candidate objects list is always updated
(i.e. it will be an empty list when the poke job finds that all local
objects are required by the remote), and then handles deleting objects
correctly according to the delete candidates.
This patch includes a test written by Clay Gerrard at [1].
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
1: https://review.openstack.org/#/c/155542/
Change-Id: Ie8f75ed65c7bfefbb18ddccd9fe0e41b72dca0a4
One log line had a typo, and I refactored the per-object cleanup code out of
update_deleted into the per-object hashdir cleanup method.
Change-Id: I19d03d0706a75bd8ec2fe327a1eb1b5ec36de6d2
This change provides an efficient way of replication
between regions of a globally distributed cluster.
This approach makes the object-replicator push replicas
to a primary node in a remote region, and then skip
pushing them to the next primary node in that region,
expecting asynchronous replication within the region.
This implementation includes a couple of changes to
ssync_sender to allow the object-replicator to delete local
handoff objects correctly. One is to return a list of existing
objects in the remote region; the list includes the local paths of the
objects which exist both on the local device and the remote device.
The other is support for an existence check of specified objects.
It requires the object list built by the first change: when
the object list is given, ssync_sender only does a missing_check
based on the list. These changes are needed because current
Swift cannot handle the existence check at the object level.
Note that this feature will work partially (i.e. only when
primary-to-primary) with rsync.
Implements: blueprint efficient-replication
Change-Id: I5d990444d7977f4127bb37f9256212c893438df1