9 Commits

Author SHA1 Message Date
Alistair Coles
2d55960a22 Fix inconsistent suffix hashes after ssync of tombstone
Consider two replicas of the same object whose ondisk files
have diverged due to failures:

  A has t2.ts
  B has t1.data, t4.meta

(The DELETE at t2 did not make it to B. The POST at t4 was
rejected by A.)

After ssync replication the two ondisk file sets will not be
consistent:

  A has t2.ts  (ssync cannot POST t4.meta to this node)
  B has t2.ts, t4.meta (ssync should not delete t4.meta,
                        there may be a t3.data somewhere)

Consequently the two nodes will report different hashes for the
object's suffix, and replication will repeat, always with the
inconsistent outcome. This scenario is reproduced by the probe
test added in this patch.

(Note that rsync replication does result in (t2.ts, t4.meta)
on both nodes.)

The solution is to change the way that suffix hashes are
calculated. Currently the names of *all* files found in each
object dir are added to the hash.  With this patch the
timestamps of only those files that could be used to
construct a valid diskfile are added to the hash. File
extensions are appended to the timestamp so that in most
'normal' situations the result of the hashing is the same
as before this patch. That avoids a storm of hash mismatches
when this patch is deployed in an existing cluster.

In the problem case described above, t4.meta is no longer
added to the hash, since it is not useful for constructing
a diskfile. (Note that t4.meta is not deleted because it
may become useful should a t3.data be replicated in future.)
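A minimal Python sketch of the difference between the two hashing schemes, applied to the A/B example above. The helper names (old_suffix_hash, useful_files, new_suffix_hash) and the simplified rule for deciding which files are useful are assumptions for illustration, not Swift's diskfile code. The real rules also cover EC .durable files, reclaim age and more.

```python
import hashlib

# Hypothetical sketch: Swift's real diskfile rules (EC .durable files,
# reclaim age, and so on) are deliberately omitted.


def split_name(name):
    """Split '0000000002.00000.ts' into ('0000000002.00000', 'ts')."""
    timestamp, ext = name.rsplit('.', 1)
    return timestamp, ext


def old_suffix_hash(filenames):
    """Old scheme: hash the name of every file found in the object dir."""
    md5 = hashlib.md5()
    for name in sorted(filenames, reverse=True):
        md5.update(name.encode('ascii'))
    return md5.hexdigest()


def useful_files(filenames):
    """Return (timestamp, ext) pairs that could build a valid diskfile.

    Simplified rule (assumption): the newest .data or .ts wins, and a
    .meta is useful only if the winner is a .data older than the .meta.
    """
    parsed = [split_name(n) for n in filenames]
    candidates = [p for p in parsed if p[1] in ('data', 'ts')]
    if not candidates:
        return []
    winner = max(candidates, key=lambda p: float(p[0]))
    useful = [winner]
    if winner[1] == 'data':
        metas = [p for p in parsed
                 if p[1] == 'meta' and float(p[0]) > float(winner[0])]
        if metas:
            useful.append(max(metas, key=lambda p: float(p[0])))
    return useful


def new_suffix_hash(filenames):
    """New scheme: hash timestamp + extension of useful files only."""
    md5 = hashlib.md5()
    names = [ts + '.' + ext for ts, ext in useful_files(filenames)]
    for name in sorted(names, reverse=True):
        md5.update(name.encode('ascii'))
    return md5.hexdigest()


node_a = ['0000000002.00000.ts']
node_b = ['0000000002.00000.ts', '0000000004.00000.meta']
print(old_suffix_hash(node_a) == old_suffix_hash(node_b))  # False
print(new_suffix_hash(node_a) == new_suffix_hash(node_b))  # True

# A 'normal' dir hashes identically under both schemes.
normal = ['0000000001.00000.data', '0000000004.00000.meta']
print(old_suffix_hash(normal) == new_suffix_hash(normal))  # True
```

Because the extension is appended to each timestamp, a dir in which every file is useful produces the same hash as before. That is what avoids the storm of mismatches when the patch is deployed.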

Closes-Bug: 1534276
Change-Id: I99e88b8d5f5d9bc22b42112a99634ba942415e05
2016-02-18 15:45:10 +00:00
Alistair Coles
29c10db0cb Add POST capability to ssync for .meta files
ssync currently does the wrong thing when replicating object dirs
containing both a .data and a .meta file. The ssync sender uses a
single PUT to send both object content and metadata to the receiver,
using the metadata (.meta file) timestamp. This results in the object
content timestamp being advanced to the metadata timestamp,
potentially overwriting newer object data on the receiver and causing
an inconsistency with the container server record for the object.

For example, replicating an object dir with {t0.data(etag=x), t2.meta}
to a receiver with t1.data(etag=y) will result in the creation of
t2.data(etag=x) on the receiver. However, the container server will
continue to list the object as t1(etag=y).

This patch modifies ssync to replicate the content of .data and .meta
separately using a PUT request for the data (no change) and a POST
request for the metadata. In effect, ssync replication replicates the
client operations that generated the .data and .meta files so that
the result of replication is the same as if the original client requests
had persisted on all object servers.

Apart from maintaining correct timestamps across sync'd nodes, this has
the added benefit of not needing to PUT objects when only the metadata
has changed and a POST will suffice.

Taking the same example, ssync sender will no longer PUT t0.data but will
POST t2.meta resulting in the receiver having t1.data and t2.meta.
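The example above can be modelled as a toy in Python to compare the old single-PUT behaviour with the new PUT-plus-POST behaviour. ObjectDir, old_sync and new_sync are hypothetical names. Integer timestamps stand in for Swift Timestamps, and the real receiver logic is more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectDir:
    """Hypothetical in-memory model of one object's on-disk files."""
    data_ts: Optional[int] = None
    etag: Optional[str] = None
    meta_ts: Optional[int] = None

    def files(self):
        out = []
        if self.data_ts is not None:
            out.append('t%d.data(etag=%s)' % (self.data_ts, self.etag))
        if self.meta_ts is not None:
            out.append('t%d.meta' % self.meta_ts)
        return out


def old_sync(sender, receiver):
    """Old behaviour: a single PUT of the data, stamped with the .meta time."""
    ts = max(t for t in (sender.data_ts, sender.meta_ts) if t is not None)
    if receiver.data_ts is None or ts > receiver.data_ts:
        receiver.data_ts, receiver.etag = ts, sender.etag
        receiver.meta_ts = None  # metadata is folded into the new .data


def new_sync(sender, receiver):
    """New behaviour: PUT only newer data, POST only newer metadata."""
    if sender.data_ts is not None and (
            receiver.data_ts is None or sender.data_ts > receiver.data_ts):
        receiver.data_ts, receiver.etag = sender.data_ts, sender.etag  # PUT
    local_meta = receiver.meta_ts
    if local_meta is None:
        local_meta = receiver.data_ts
    if sender.meta_ts is not None and (
            local_meta is None or sender.meta_ts > local_meta):
        receiver.meta_ts = sender.meta_ts  # POST


sender = ObjectDir(data_ts=0, etag='x', meta_ts=2)

receiver = ObjectDir(data_ts=1, etag='y')
old_sync(sender, receiver)
print(receiver.files())  # ['t2.data(etag=x)']

receiver = ObjectDir(data_ts=1, etag='y')
new_sync(sender, receiver)
print(receiver.files())  # ['t1.data(etag=y)', 't2.meta']
```

The old result advances the content timestamp to t2 and disagrees with the container listing (t1, etag=y). The new result keeps t1.data and adds only the newer metadata.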

The changes are backwards compatible: an upgraded sender will only sync
data files to a legacy receiver and will not sync meta files (fixing the
erroneous behavior described above); a legacy sender will operate as
before when sync'ing to an upgraded receiver.

Changes:
- diskfile API provides methods to get the data file timestamp
  as distinct from the diskfile timestamp.

- diskfile yield_hashes return tuple now passes a dict mapping data and
  meta (if any) timestamps to their respective values in the timestamp
  field.

- ssync_sender will encode data and meta timestamps in the
  (hash_path, timestamp) tuple sent to the receiver during
  missing_checks.

- ssync_receiver compares sender's data and meta timestamps to any
  local diskfile and may specify that only data or meta parts are sent
  during updates phase by appending a qualifier to the hash returned
  in its 'wanted' list.

- ssync_sender now sends POST subrequests when a meta file
  exists and its content needs to be replicated.

- ssync_sender may send *only* a POST if the receiver indicates that
  is the only part required to be sync'd.

- object server will allow PUT and DELETE with an earlier timestamp than
  a POST.

- Fixed TODO related to replicated objects with fast-POST and ssync.
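The missing_check exchange described in the list above can be sketched as follows: the sender puts both timestamps on one line, and the receiver replies with a qualifier saying whether it wants the data (PUT), the metadata (POST), or both. The line format ("<hash> <ts_data> m:<ts_meta>"), the 'd'/'m' qualifiers and the function names are illustrative assumptions. Swift's actual wire encoding may differ.

```python
from typing import Dict, Optional


def encode_missing(object_hash: str, ts_data: str,
                   ts_meta: Optional[str] = None) -> str:
    """Sender side: one missing_check line with data and meta timestamps."""
    line = '%s %s' % (object_hash, ts_data)
    if ts_meta is not None and ts_meta != ts_data:
        line += ' m:%s' % ts_meta
    return line


def decode_missing(line: str) -> Dict[str, str]:
    """Receiver side: parse a line; ts_meta defaults to ts_data."""
    parts = line.split()
    result = {'object_hash': parts[0], 'ts_data': parts[1],
              'ts_meta': parts[1]}
    for extra in parts[2:]:
        key, _, value = extra.partition(':')
        if key == 'm':
            result['ts_meta'] = value
    return result


def encode_wanted(remote: Dict[str, str],
                  local: Optional[Dict[str, str]]) -> Optional[str]:
    """Receiver side: ask for 'd' (data/PUT), 'm' (meta/POST) or both."""
    if local is None:
        want = 'dm'  # nothing on disk locally, so want everything
    else:
        want = ''
        if float(remote['ts_data']) > float(local['ts_data']):
            want += 'd'
        if float(remote['ts_meta']) > float(local['ts_meta']):
            want += 'm'
    return '%s %s' % (remote['object_hash'], want) if want else None


HASH = 'd41d8cd98f00b204e9800998ecf8427e'
line = encode_missing(HASH, '0000000000.00000', '0000000002.00000')
print(line)  # d41d8cd98f00b204e9800998ecf8427e 0000000000.00000 m:0000000002.00000

# Receiver holds t1.data and no .meta: only the metadata is wanted.
local = {'ts_data': '0000000001.00000', 'ts_meta': '0000000001.00000'}
print(encode_wanted(decode_missing(line), local))  # d41d8cd98f00b204e9800998ecf8427e m
```

A sender that sees only "m" in the reply would send just a POST, which matches the "*only* a POST" case in the list.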

Related spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: 1501528
Change-Id: I97552d194e5cc342b0a3f4b9800de8aa6b9cb85b
2015-10-02 11:24:19 +00:00
paul luse
647b66a2ce Erasure Code Reconstructor
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
  - There is no notion of update() or update_deleted().
  - There is a single job processor.
  - Jobs are processed partition by partition.
  - At the end of processing a rebalanced or handoff partition, the
    reconstructor will remove successfully reverted objects if any.

The patch also makes various ssync changes, such as the addition of a
reconstruct_fa() function, called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.
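The job-processing rules listed above (a single processor, partition-by-partition jobs, removal of successfully reverted objects from rebalanced or handoff partitions) can be sketched as a loop. Job, process_jobs and the sync callable are hypothetical stand-ins; the real reconstructor drives ssync and builds jobs from the ring.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Job:
    """Hypothetical reconstructor job: one partition on one device."""
    partition: int
    objects: Set[str]
    revert: bool = False  # True for a rebalanced or handoff partition


def process_jobs(jobs: List[Job], sync: Callable[[Job], Set[str]]) -> None:
    """Single job processor that handles jobs partition by partition.

    `sync` stands in for the ssync transfer and returns the set of
    objects the remote node accepted.
    """
    for job in sorted(jobs, key=lambda j: j.partition):
        synced = sync(job)
        if job.revert:
            # Remove only the objects that were successfully reverted.
            job.objects -= synced


jobs = [Job(7, {'a', 'b'}, revert=True), Job(3, {'c'})]
process_jobs(jobs, lambda job: {'a'} if job.revert else set(job.objects))
print(jobs[0].objects)  # {'b'}
print(jobs[1].objects)  # {'c'}
```

Objects that failed to revert stay on the local node so a later pass can retry them.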

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
2015-04-14 00:52:17 -07:00
Leah Klearman
ca0fce8542 more probe test refactoring
* move get_to_final_state into ProbeTest
* get rid of kill_servers
* add replicators manager and updaters manager to ProbeTest

(this is all going someplace, I promise)

Change-Id: I8393a2ebc0d04051cae48cc3c49580f70818dbf2
2015-02-13 16:55:45 -08:00
Jenkins
28c99763e9 Merge "Fix ssync send_delete"
2015-02-13 00:18:38 +00:00
Alistair Coles
82e5090848 Fix ssync send_delete
The ssync_sender send_delete method treats its
timestamp argument as a string when in fact it is
passed a Timestamp object. As a result the method
always raises an exception and deletes are never
replicated.

This patch fixes the bug and adds unit and probe tests
to verify expected behavior.
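The bug described above is a type mismatch: concatenating a str with a Timestamp object raises TypeError. The sketch below reproduces it with a minimal Timestamp stand-in (an assumption, not swift.common.utils.Timestamp) whose internal property formats the value with '%016.05f'. The send_delete_* helpers are hypothetical and return the subrequest lines instead of writing them to a connection.

```python
class Timestamp:
    """Minimal stand-in for Swift's Timestamp class (assumption)."""

    def __init__(self, value):
        self.value = float(value)

    @property
    def internal(self):
        return '%016.05f' % self.value


def send_delete_buggy(url_path, timestamp):
    # str + Timestamp raises TypeError, so the DELETE is never sent.
    return ['DELETE ' + url_path, 'X-Timestamp: ' + timestamp]


def send_delete_fixed(url_path, timestamp):
    # Use the Timestamp's string form explicitly.
    return ['DELETE ' + url_path, 'X-Timestamp: ' + timestamp.internal]


ts = Timestamp(1.5)
try:
    send_delete_buggy('/a/c/o', ts)
except TypeError as err:
    print('buggy send_delete raised TypeError:', err)
print(send_delete_fixed('/a/c/o', ts))
# ['DELETE /a/c/o', 'X-Timestamp: 0000000001.50000']
```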

Closes-Bug: 1421425

Change-Id: I664fb8d5dfea7362313037a67927ea90021c3f62
2015-02-12 21:44:36 +00:00
Leah Klearman
2c1b5af062 refactor probe tests
* refactor probe tests to use probe.common.ProbeTest
* move reset_environment functionality to ProbeTest.setUp()
* choose rings and policies that meet the criteria - raise SkipTest if
  nothing matches
* replace all AssertionErrors in setup with SkipTest
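The setUp changes listed above (choose a policy that meets the test's criteria, otherwise raise SkipTest rather than fail) can be sketched with the standard unittest module. The ProbeTest class here is a hypothetical illustration; min_replicas and the hard-coded available_policies list are assumptions standing in for whatever the real probe tests read from the rings.

```python
import unittest


class ProbeTest(unittest.TestCase):
    """Hypothetical base class: pick a policy that fits, else skip."""

    # Subclasses state what they need (assumption: a minimum replica count).
    min_replicas = 3
    # Stand-in for the cluster's configured policies (assumption).
    available_policies = [{'name': 'gold', 'replicas': 2},
                          {'name': 'silver', 'replicas': 3}]

    def setUp(self):
        matches = [p for p in self.available_policies
                   if p['replicas'] >= self.min_replicas]
        if not matches:
            # Skip rather than fail: the environment, not the code, is lacking.
            raise unittest.SkipTest(
                'no policy with %d replicas' % self.min_replicas)
        self.policy = matches[0]


class NeedsFourReplicas(ProbeTest):
    min_replicas = 4

    def test_something(self):
        pass


class NeedsThreeReplicas(ProbeTest):
    def test_policy_chosen(self):
        self.assertEqual(self.policy['name'], 'silver')


loader = unittest.TestLoader()
suite = unittest.TestSuite()
suite.addTests(loader.loadTestsFromTestCase(NeedsFourReplicas))
suite.addTests(loader.loadTestsFromTestCase(NeedsThreeReplicas))
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.wasSuccessful())  # 1 True
```

Reporting a skip keeps an under-provisioned test environment from masquerading as a code failure.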

Change-Id: Id56c497d58083f5fd55f5283cdd346840df039d3
2015-02-12 11:30:21 -08:00
Clay Gerrard
01f6e86006 Add Expected Failure for ssync with sys-meta
Sysmeta included with an object PUT persists with the PUT data: if an
internal operation such as POST-as-copy during partial failure, or ssync
with fast-POST (not supported), causes that data to be lost, then the
associated sysmeta will also be lost.

Since persisting object sys-meta across a POST when the original .data
is unavailable requires fast-POST with .meta files, the probe test that
validates object sys-meta persistence across a POST when the most
up-to-date copy of the object with sys-meta is unavailable configures
an InternalClient with object_post_as_copy = false.

This non-default configuration option is not supported by ssync and
results in a loss of sys-meta very similar to the object sys-meta
failure you would see with object_post_as_copy = true when the COPY part
of the POST is unable to retrieve the most recently written object with
sys-meta.

Until we can fix the default POST behavior to make metadata updates
without stomping on newer data file timestamps, we should expect object
sys-meta to be "very very best possible but not really guaranteed
effort".

Until we can fix ssync to replicate metadata updates without stomping on
newer data file timestamps, we should expect this test to fail.

When ssync replication of fast-POST metadata updates is fixed, this test
will fail, signaling that the expected failure cruft should be removed,
but other parts of ssync replication will still work and some other bugs
can be fixed while we wait.

Change-Id: Ifc5d49514de79b78f7715408e0fe0908357771d3
2014-11-25 14:28:00 -08:00
anc
4286f36a60 Enable object system metadata on PUTs
This patch takes a first step towards support
for object system metadata by enabling headers
in the x-object-sysmeta- namespace to be
persisted when objects are PUT. This should be
useful for other pending patches such as on-demand
migration and server-side encryption
(https://review.openstack.org/#/c/64430/ and
https://review.openstack.org/#/c/76578/1).

The x-object-sysmeta- namespace is already
reserved/protected by the gatekeeper and
passed through the proxy. This patch modifies
the object server to persist these headers
alongside user metadata when an object is
PUT.

This patch will preserve existing object
system metadata and ignore any new system
metadata when handling object POSTs,
including POST-as-copy operations. Support
for modification of object system metadata
with a POST request requires further work
as discussed in the blueprint.

This patch will preserve existing object
system metadata and update it with new
system metadata when copying an object.
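The PUT, POST and COPY rules above can be sketched as three header-filtering functions. The names are hypothetical. Two behaviours are assumptions not stated in this commit: that a POST replaces the existing user metadata wholesale, and that a COPY keeps the source's user metadata unless new headers override it. Other persisted headers such as Content-Type are ignored here.

```python
SYSMETA_PREFIX = 'x-object-sysmeta-'
USERMETA_PREFIX = 'x-object-meta-'


def is_sysmeta(key):
    return key.lower().startswith(SYSMETA_PREFIX)


def is_usermeta(key):
    return key.lower().startswith(USERMETA_PREFIX)


def metadata_for_put(headers):
    """PUT: persist sysmeta alongside user metadata."""
    return {k: v for k, v in headers.items()
            if is_usermeta(k) or is_sysmeta(k)}


def metadata_for_post(existing, headers):
    """POST: keep existing sysmeta and ignore any new sysmeta.

    Assumption: new user metadata replaces the old user metadata.
    """
    kept = {k: v for k, v in existing.items() if is_sysmeta(k)}
    kept.update({k: v for k, v in headers.items() if is_usermeta(k)})
    return kept


def metadata_for_copy(source, headers):
    """COPY: keep source metadata and update it with new sysmeta.

    Assumption: new user metadata in the request also overrides the source.
    """
    result = dict(source)
    result.update({k: v for k, v in headers.items()
                   if is_usermeta(k) or is_sysmeta(k)})
    return result


stored = metadata_for_put({'X-Object-Meta-Color': 'red',
                           'X-Object-Sysmeta-Crypto': 'abc',
                           'Content-Type': 'text/plain'})
after_post = metadata_for_post(stored, {'X-Object-Meta-Shape': 'round',
                                        'X-Object-Sysmeta-Crypto': 'evil'})
print(after_post)
# {'X-Object-Sysmeta-Crypto': 'abc', 'X-Object-Meta-Shape': 'round'}
print(metadata_for_copy(stored, {'X-Object-Sysmeta-Crypto': 'xyz'}))
# {'X-Object-Meta-Color': 'red', 'X-Object-Sysmeta-Crypto': 'xyz'}
```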

A new probe test is added which makes use of
the BrainSplitter class that has been moved
from test_container_merge_policy_index.py to
a new module brain.py.

blueprint object-system-metadata

Change-Id: If716bc15730b7322266ebff4ab8dd31e78e4b962
2014-08-01 16:41:33 -07:00