44 Commits

Author SHA1 Message Date
Andreas Jaeger
96b56519bf Update hacking for Python3
The repo now runs under both Python 2 and 3, so update hacking to
version 2.0, which supports both. Note that the latest hacking
release, 3.0, only supports Python 3.

Fix problems found.

Remove hacking and friends from lower-constraints, they are not needed
for installation.

Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
2020-04-03 21:21:07 +02:00
Tim Burke
d270596b67 Consistently use io.BytesIO
Change-Id: Ic41b37ac75b5596a8307c4962be86f2a4b0d9731
2019-10-15 15:09:46 +02:00
Tim Burke
d9cafca246 py3: port ssync
Change-Id: I63a502be13f5dcda2a457d38f2fc5f1ca469d562
2019-06-05 13:26:18 -07:00
Clay Gerrard
fb0e7837af Cleanup EC and SSYNC frag index parameters
An object node should reject a PUT with 409 when the timestamp is less
than or equal to the timestamp of an existing version of the object.

However, if the PUT is part of an SSYNC, and the fragment archive has a
different index than the one on disk, we may store it.

We should store it if we're the primary holder for that fragment index.

Back before the related change we used to revert fragments to handoffs
and it caused a lot of problems.  Mainly multiple frag indexes piling up
on one handoff node.  Eventually we settled on handoffs only reverting
to primaries but there was some crufty flailing left over.

When EC frag duplication (multi-region EC) came in, we also added new
complexity because a node's primary index (the index in the part_nodes
list) was no longer universally equal to the EC frag index (the storage
policy backend index).  There were a few places where we assumed
node_index == frag_index, some of which caused bugs that we've since fixed.
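
A minimal sketch of the mapping duplication implies; the modulo
relationship follows from duplicated fragments as described above, but
the function name and signature here are illustrative:

    # Illustrative only: with duplicated fragments there are more
    # primary nodes than unique frag indexes, so several node indexes
    # map onto the same frag index.
    def node_index_to_frag_index(node_index, ec_n_unique_fragments):
        return node_index % ec_n_unique_fragments

    # e.g. 14 primaries holding 7 unique fragments:
    assert node_index_to_frag_index(7, 7) == 0  # same frag as node 0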

This change tries to clean all that up.

Related-Change-Id: Ie351d8342fc8e589b143f981e95ce74e70e52784

Change-Id: I3c5935e2d5f1cd140cf52df779596ebd6442686c
2019-02-04 17:02:17 -06:00
Zuul
b9d2c08e8d Merge "Fix SSYNC concurrency on partition" 2018-12-11 23:34:54 +00:00
Romain LE DISEZ
014d46f9a7 Fix SSYNC concurrency on partition
Commit e199192caefef068b5bf57da8b878e0bc82e3453 introduced the ability
to have multiple SSYNC sessions running on a single device. It missed a
safeguard to ensure that only one SSYNC request can be running on a
given partition.

This commit updates replication_lock to allow up to N concurrent locks
on the device, then take a single lock on the partition related to an
SSYNC request.
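
A minimal sketch of the two-level scheme, assuming the limit-aware
lock_path() from the related change further down this log (names are
illustrative):

    # up to N concurrent SSYNC sessions per device, but at most one
    # per (device, partition) pair
    with lock_path(device_path, limit=replication_concurrency_per_device):
        with lock_path(os.path.join(device_path, partition), limit=1):
            process_ssync_request()  # hypothetical handler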

Change-Id: Id053ed7dd355d414d7920dda79a968a1c6677c14
2018-12-04 14:47:26 +01:00
Tim Burke
3420921a33 Clean up HASH_PATH_* patching
Previously, we'd sometimes shove strings into HASH_PATH_PREFIX or
HASH_PATH_SUFFIX, which would blow up on py3. Now, always use bytes.

Change-Id: Icab9981e8920da505c2395eb040f8261f2da6d2e
2018-11-01 20:52:33 +00:00
Samuel Merritt
728b4ba140 Add checksum to object extended attributes
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.

This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.

Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.

As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.

The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?

Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.

Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (e.g. tmpfs) or cannot hold more than 4k of
xattrs (e.g. ext4).
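
A rough sketch of the kind of probe such skipping needs; the helper
name and 4k threshold here are illustrative, not the exact test code:

    import tempfile

    import xattr  # same third-party module the diskfile code uses

    def supports_big_xattrs(size=4096):
        # try to write one large xattr where the tests will run
        with tempfile.NamedTemporaryFile() as f:
            try:
                xattr.setxattr(f.name, 'user.swift.test', b'x' * size)
            except (IOError, OSError):
                return False
        return True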

References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.

THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS

With this patch, every test environment will require TMPDIR to be
on a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tmpfs supports this. XFS is recommended.

So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.

`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.

The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.

Co-Authored-By: John Dickinson <me@not.mn>

Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
2017-11-03 13:30:05 -04:00
Romain LE DISEZ
e199192cae Replace replication_one_per_device by custom count
This commit replaces the boolean replication_one_per_device with an
integer replication_concurrency_per_device. The new configuration
parameter is passed to utils.lock_path(), which now accepts as an
argument a limit on the number of locks that can be acquired for a
specific path.

Instead of trying to lock path/.lock, utils.lock_path() now tries to
lock files path/.lock-X, where X is in range(0, N), N being the limit
on the number of locks allowed for the path. The default limit is 1.
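
A condensed sketch of the slot-locking idea using flock; the real
utils.lock_path() differs in its details (timeout class, naming), so
treat this as an outline:

    import fcntl
    import os
    import time
    from contextlib import contextmanager

    @contextmanager
    def lock_path(path, timeout=10, limit=1):
        # acquire any one of .lock-0 .. .lock-(N-1); every slot held
        # by another process counts against the limit
        fd = None
        deadline = time.time() + timeout
        while fd is None:
            for i in range(limit):
                slot = os.open(os.path.join(path, '.lock-%d' % i),
                               os.O_WRONLY | os.O_CREAT)
                try:
                    fcntl.flock(slot, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    fd = slot
                    break
                except (IOError, OSError):
                    os.close(slot)
            if fd is None:
                if time.time() > deadline:
                    raise RuntimeError('timed out locking %s' % path)
                time.sleep(0.01)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)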

Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572
2017-10-24 16:17:41 +01:00
Pavel Kvasnička
163fb4d52a Always require device dir for containers
For test purposes (e.g. SAIO probe tests), even if mount_check is False,
still require check_dir for account/container server storage when real
mount points are not used.

This behavior is consistent with the object-server's checks in diskfile.

Co-Author: Clay Gerrard <clay.gerrard@gmail.com>
Related lp bug #1693005
Related-Change-Id: I344f9daaa038c6946be11e1cf8c4ef104a09e68b
Depends-On: I52c4ecb70b1ae47e613ba243da5a4d94e5adedf2
Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
2017-09-01 10:32:12 -07:00
lingyongxu
ee9458a250 Using assertIsNone() instead of assertEqual(None)
Following OpenStack Style Guidelines:
[1] http://docs.openstack.org/developer/hacking/#unit-tests-and-assertraises
[H203] Unit test assertions tend to give better messages for more specific
assertions. As a result, assertIsNone(...) is preferred over
assertEqual(None, ...) and assertIs(..., None)

Change-Id: If4db8872c4f5705c1fff017c4891626e9ce4d1e4
2017-06-07 14:05:53 +08:00
Jenkins
45b17d89c7 Merge "Fix SSYNC failing to replicate unexpired object" 2017-05-31 22:49:55 +00:00
Pete Zaitcev
5dfc3a75fb Open-code eventlet.listen()
Recently our gate started blowing up intermittently with a strange
case of mixed-up ports. Sometimes a functional test tries to
authorize on a port that's clearly an object server port, and the
like. As it turns out, eventlet developers added an unavoidable
SO_REUSEPORT into listen(), which makes listen(('localhost', 0))
reuse ports.

There's an issue about it:
 https://github.com/eventlet/eventlet/issues/411

This patch is working around the problem while eventlet people
consider the issue.
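
A hedged sketch of the open-coded helper; the point is simply to bind
a listening socket without eventlet's implicit SO_REUSEPORT (wrapper
details are illustrative):

    import socket

    from eventlet import greenio

    def listen(addr, backlog=50):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # crucially, no SO_REUSEPORT: two test servers asking for
        # port 0 must never end up sharing the same port
        sock.bind(addr)
        sock.listen(backlog)
        return greenio.GreenSocket(sock)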

Change-Id: I67522909f96495a6a30e1acdb79835dce2189549
2017-05-11 01:39:14 -06:00
Romain LE DISEZ
38d35797df Fix SSYNC failing to replicate unexpired object
Fix a situation where SSYNC would fail to replicate a valid object because
the datafile contains expired X-Delete-At information while a metafile
contains no X-Delete-At information. Example:
 - 1454619054.02968.data => contains X-Delete-At: 1454619654
 - 1454619056.04876.meta => does not contain X-Delete-At info

In this situation, the replicator tries to PUT the datafile, and then
to POST the metadata. Previously, if the receiver has the datafile but
current time is greater than the X-Delete-At, then it considers it to
be expired and requests no updates from the sender, so the metafile is
never synced. If the receiver does not have the datafile then it does
request updates from the sender, but the ssync PUT subrequest is
refused if the current time is greater than the X-Delete-At (the
object is expired). If the datafile is transferred, the ssync POST
subrequest fails because the object does not exist (expired).

This commit allows PUT and POST to work so that the object can be
replicated, by enabling the receiver object server to open expired
diskfiles when handling replication requests.

Closes-Bug: #1683689
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I919994ead2b20dbb6c5671c208823e8b7f513715
2017-04-26 11:29:40 +01:00
Jenkins
264e728364 Merge "Prevent ssync writing bad fragment data to diskfile" 2016-10-14 23:29:29 +00:00
Alistair Coles
3218f8b064 Prevent ssync writing bad fragment data to diskfile
Previously, if a reconstructor sync type job failed to provide
sufficient bytes from a reconstructed fragment body iterator to match
the content-length that the ssync sender had already sent to the ssync
receiver, the sender would still proceed to send the next
subrequest. The ssync receiver might then write the start of the next
subrequest to the partially complete diskfile for the previous
subrequest (including writing subrequest headers to that diskfile)
until it has received content-length bytes.

Since a reconstructor ssync job does not send an ETag header (it
cannot because it does not know the ETag of a reconstructed fragment
until it has been sent) then the receiving object server does not
detect the "bad" data written to the fragment diskfile, and worse,
will label it with an ETag that matches the md5 sum of the bad
data. The bad fragment file will therefore appear good to the auditor.

There is no easy way for the ssync sender to communicate a lack of
source data to the receiver other than by disconnecting the
session. So this patch adds a check in the ssync sender that the sent
byte count is equal to the sent Content-Length header value for each
subrequest, and disconnect if a mismatch is detected.
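
In outline, the guard is a byte count checked after the fragment
iterator is exhausted; the function and exception here are
illustrative, not the actual sender code:

    def send_subrequest_body(connection, frag_iter, expected_length):
        # stream the fragment as HTTP chunks, counting what was sent
        nsent = 0
        for chunk in frag_iter:
            connection.send(('%x\r\n' % len(chunk)).encode('ascii')
                            + chunk + b'\r\n')
            nsent += len(chunk)
        if nsent != expected_length:
            # a short body cannot be un-sent; disconnecting is the
            # only way to stop the receiver finalizing the partial
            # diskfile under a matching-but-wrong ETag
            raise Exception('sent %d bytes, expected %d'
                            % (nsent, expected_length))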

The disconnect prevents the receiver finalizing the bad diskfile, but
also prevents subsequent fragments in the ssync job being sync'd until
the next cycle.

Closes-Bug: #1631144
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>

Change-Id: I54068906efdb9cd58fcdc6eae7c2163ea92afb9d
2016-10-13 17:15:10 +01:00
Alistair Coles
b13b49a27c EC - eliminate .durable files
Instead of using a separate .durable file to indicate
the durable status of a .data file, rename the .data
to include a durable marker in the filename. This saves
one inode for every EC fragment archive.

An EC policy PUT will, as before, first rename a temp
file to:

   <timestamp>#<frag_index>.data

but now, when the object is committed, that file will be
renamed:

   <timestamp>#<frag_index>#d.data

with the '#d' suffix marking the data file as durable.
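
The commit step then reduces to a rename; a minimal sketch, assuming
a helper with these (illustrative) arguments:

    import os

    def commit_fragment(datadir, timestamp, frag_index):
        src = os.path.join(datadir, '%s#%s.data' % (timestamp, frag_index))
        dst = os.path.join(datadir, '%s#%s#d.data' % (timestamp, frag_index))
        os.rename(src, dst)  # '#d' marks the fragment archive durable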

Diskfile suffix hashing returns the same result when the
new durable-data filename or the legacy durable file is
found in an object directory. A fragment archive that has
been created on an upgraded object server will therefore
appear to be in the same state, as far as the consistency
engine is concerned, as the same fragment archive created
on an older object server.

Since legacy .durable files will still exist in deployed
clusters, many of the unit tests scenarios have been
duplicated for both new durable-data filenames and legacy
durable files.

Change-Id: I6f1f62d47be0b0ac7919888c77480a636f11f607
2016-10-10 18:11:02 +01:00
Alistair Coles
d456d9e934 Don't ssync data when only a durable is missing
If an EC diskfile is missing its .durable file (for example
due to a partial PUT failure) then the ssync missing check
will fail to open the file and will consider it
missing. This can result in possible reconstruction of the
fragment archive (for a sync job) and definite transmission
of the fragment archive (for sync and revert jobs), which is
wasteful.

This patch makes the ssync receiver inspect the diskfile
state after attempting to open it, and if fragments exist at
the timestamp of the sender's diskfile, but a .durable file
is missing, then the receiver will commit the diskfile at
the sender's timestamp. As a result, there is no longer any
need to send a fragment archive.

Change-Id: I4766864fcc0a3553976e8fd85bbb2fc782f04abd
2016-03-04 15:39:52 +00:00
Alistair Coles
e91de49d68 Update container on fast-POST
This patch makes a number of changes to enable content-type
metadata to be updated when using the fast-POST mode of
operation, as proposed in the associated spec [1].

* the object server and diskfile are modified to allow
  content-type to be updated by a POST and the updated value
  to be stored in .meta files.

* the object server accepts PUTs and DELETEs with older
  timestamps than existing .meta files. This is to be
  consistent with replication that will leave a later .meta
  file in place when replicating a .data file.

* the diskfile interface is modified to provide accessor
  methods for the content-type and its timestamp (see the
  sketch after this list).

* the naming of .meta files is modified to encode two
  timestamps when the .meta file contains a content-type value
  that was set prior to the latest metadata update; this
  enables consistency to be achieved when rsync is used for
  replication.

* ssync is modified to sync meta files when content-type
  differs between local and remote copies of objects.

* the object server issues container updates when handling
  POST requests, notifying the container server of the current
  immutable metadata (etag, size, hash, swift_bytes),
  content-type with their respective timestamps, and the
  mutable metadata timestamp.

* the container server maintains the most recently reported
  values for immutable metadata, content-type and mutable
  metadata, each with their respective timestamps, in a single
  db row.

* new probe tests verify that replication achieves eventual
  consistency of containers and objects after discrete updates
  to content-type and mutable metadata, and that container-sync
  sync's objects after fast-post updates.

[1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e
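
A sketch of how those accessors read; the property names here may
differ in detail from the final diskfile API:

    def describe_timestamps(df):
        # illustrative: after a fast-POST a diskfile tracks three
        # separate timestamps
        with df.open():
            return {
                'data': df.data_timestamp,  # from the .data file
                'content-type': df.content_type_timestamp,  # .data or .meta
                'meta': df.timestamp,  # newest metadata overall
            }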

Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
2016-03-03 14:25:10 +00:00
Alistair Coles
6858510b59 Re-organise ssync tests
We have some tests that exercise both the sender and receiver,
but are spread across test_ssync_sender.py and test_ssync_receiver.py.
This creates a new module test_ssync.py and moves the end-to-end tests
into there.

Change-Id: Iea3e9932734924453f7241432afda90abbc75c06
2015-11-05 14:50:28 +00:00
Samuel Merritt
e31ecb24b6 Get rid of contextlib.nested() for py3
contextlib.nested() is missing completely in Python 3.

Since 2.7, we can use multiple context managers in a 'with' statement,
like so:

    with thing1() as t1, thing2() as t2:
        do_stuff()

Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.

Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
2015-10-23 11:44:54 -07:00
Victor Stinner
8f85427939 py3: Replace gen.next() with next(gen)
The next() method of Python 2 generators was renamed to __next__().
Call the builtin next() function instead which works on Python 2 and
Python 3.

The patch was generated by the next operation of the sixer tool.

Change-Id: Id12bc16cba7d9b8a283af0d392188a185abe439d
2015-10-08 15:40:06 +02:00
Alistair Coles
29c10db0cb Add POST capability to ssync for .meta files
ssync currently does the wrong thing when replicating object dirs
containing both a .data and a .meta file. The ssync sender uses a
single PUT to send both object content and metadata to the receiver,
using the metadata (.meta file) timestamp. This results in the object
content timestamp being advanced to the metadata timestamp,
potentially overwriting newer object data on the receiver and causing
an inconsistency with the container server record for the object.

For example, replicating an object dir with {t0.data(etag=x), t2.meta}
to a receiver with t1.data(etag=y) will result in the creation of
t2.data(etag=x) on the receiver. However, the container server will
continue to list the object as t1(etag=y).

This patch modifies ssync to replicate the content of .data and .meta
separately using a PUT request for the data (no change) and a POST
request for the metadata. In effect, ssync replication replicates the
client operations that generated the .data and .meta files so that
the result of replication is the same as if the original client requests
had persisted on all object servers.

Apart from maintaining correct timestamps across sync'd nodes, this has
the added benefit of not needing to PUT objects when only the metadata
has changed and a POST will suffice.

Taking the same example, ssync sender will no longer PUT t0.data but will
POST t2.meta resulting in the receiver having t1.data and t2.meta.

The changes are backwards compatible: an upgraded sender will only sync
data files to a legacy receiver and will not sync meta files (fixing the
erroneous behavior described above); a legacy sender will operate as
before when sync'ing to an upgraded receiver.

Changes:
- diskfile API provides methods to get the data file timestamp
  as distinct from the diskfile timestamp.

- the tuple returned by diskfile yield_hashes now carries, in its
  timestamp field, a dict mapping 'data' and (if present) 'meta' to
  their respective timestamps.

- ssync_sender will encode data and meta timestamps in the
  (hash_path, timestamp) tuple sent to the receiver during
  missing_checks.

- ssync_receiver compares the sender's data and meta timestamps to any
  local diskfile and may specify that only data or meta parts are sent
  during the updates phase by appending a qualifier to the hash returned
  in its 'wanted' list (see the sketch after this list).

- ssync_sender now sends POST subrequests when a meta file
  exists and its content needs to be replicated.

- ssync_sender may send *only* a POST if the receiver indicates that
  is the only part required to be sync'd.

- object server will allow PUT and DELETE with earlier timestamp than
  a POST

- Fixed TODO related to replicated objects with fast-POST and ssync
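
A hedged sketch of the receiver-side decision those bullets describe;
the helper name and qualifier letters are illustrative:

    def parts_wanted(local_t_data, local_t_meta,
                     remote_t_data, remote_t_meta):
        # decide which parts of the remote object this node still needs
        wanted = ''
        if remote_t_data > local_t_data:
            wanted += 'd'  # need the .data; sender will PUT it
        if remote_t_meta > local_t_meta:
            wanted += 'm'  # need the .meta; sender will POST it
        return wanted      # '', 'd', 'm' or 'dm'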

Related spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: 1501528
Change-Id: I97552d194e5cc342b0a3f4b9800de8aa6b9cb85b
2015-10-02 11:24:19 +00:00
paul luse
a3facce53c Fix invalid frag_index header in ssync_sender when reverting EC tombstones
Back in d124ce [1] we failed to recognize the situation where a revert
job would have an explicit frag_index key with the literal value None,
which would take precedence over dict.get's default value of ''.
Later in ssync_receiver we'd bump into the ValueError converting 'None'
to an int (again).
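
The Python pitfall in miniature:

    job = {'frag_index': None}
    job.get('frag_index', '')  # -> None; the default only applies
                               #    when the key is absent
    # the guard needed instead:
    frag_index = job.get('frag_index')
    if frag_index is None:
        frag_index = ''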

In ssync_sender we now handle literal None correctly and should
hopefully no longer put these invalid headers on the wire - but for
belts and braces we'll also update ssync_receiver to raise a 400 series
error and ssync_sender to better log the error messages.

1. https://review.openstack.org/#/c/195457/

Co-Author: Clay Gerrard <clay.gerrard@gmail.com>
Co-Author: Alistair Coles <alistair.coles@hp.com>
Change-Id: Ic71ba7cc82487773214030207bb193f425319449
Closes-Bug: 1489546
2015-09-08 14:58:11 -07:00
Clay Gerrard
05de1305a9 Make ssync_sender send valid chunked requests
The connect method of ssync_sender tells the remote connection that it's
going to send a valid HTTP chunked request, but if the remote end needs
to respond with an error of any kind sender throws HTTP right out the
window, picks up his ball, and closes the socket down hard - much to the
surprise of the eventlet.wsgi server who up to this point had been
playing along quite nicely with this 'SSYNC' nonsense assuming that
everyone here is consenting mature adults.

If you're going to make a "Transfer-Encoding: chunked" request have the
good decency to finish the job with a proper '0\r\n\r\n'. [1]
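
The well-mannered goodbye, in sketch form (connection object assumed):

    def disconnect(connection):
        # terminate the chunked body before hanging up, so the
        # eventlet.wsgi server sees a complete (if short) request
        try:
            connection.send(b'0\r\n\r\n')  # zero-length final chunk
        finally:
            connection.close()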

N.B. It might be possible to handle an error status during the
initialize_request phase with some sort of 100-continue support, but
honestly it's not entirely clear to me when the server isn't going to
close the connection if the client is still expected to send the body
[2] - further if the error comes later during missing_check or updates
we'll for sure want to send the chunk transfer termination line before
we close down the socket and this way we cover both.

1. Really, eventlet.wsgi shouldn't be so blasted brittle about this [3]
2. https://lists.w3.org/Archives/Public/ietf-http-wg/2005AprJun/0007.html
3. c3ce3eef0b

Closes-Bug #1489587
Change-Id: Ic17c6c3075553f8cf6ef6213e62a00282f0d01cf
2015-08-28 11:38:05 -07:00
janonymous
9456af35a2 pep8 fix: assertEquals -> assertEqual
assertEquals is deprecated in py3; changes in dirs:
*test/unit/obj/*
*test/unit/test_locale/*

Change-Id: I3dd0c1107165ac529f1cd967363e5cf408a1d02b
2015-08-07 19:28:35 +05:30
Alistair Coles
6f89f71f9b Filter Etag key from ssync replication-headers
ssync rx sends a header X-Backend-Replication-Headers whose value is a
list of headers that the source object has. This list extends the list
of allowed headers on the target object server, so that the target
object metadata is faithfully reconstructed to match the source.

Unfortunately the combination of lower() and title() operations on
header keys results in the source 'ETag' value being added to the target
metadata under the key 'Etag' in addition to the 'ETag' key that the
receiving server adds (note the different capitalization), both having
the same value.
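
The round-trip that creates the duplicate, in two lines:

    'ETag'.lower()  # -> 'etag'
    'etag'.title()  # -> 'Etag', not 'ETag': title() only knows word
                    #    boundaries, not HTTP header conventions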

The spurious 'Etag' metadata is potentially confusing for humans
inspecting the object metadata and complicates tests that wish to
assert the equality of two object metadata dicts. See for example the
test in test_ssync_sender.py that this patch cleans up.

Furthermore, the possibility of having both Etag and ETag keys has
required a workaround in the EC reconstructor [1].

[1] reconstructor fix change id: Ie59ad93a67a7f439c9a84cd9cff31540f97f334a

Change-Id: I0c89cf7924a4471bb6d268b5ef3884e2d2cb4286
2015-07-24 21:51:10 -07:00
Victor Stinner
6e70f3fa32 Get StringIO and cStringIO from six.moves
* replace "from cStringIO import StringIO"
  with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
  with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
  with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
  with "import six" and "six.StringIO()"

This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
2015-07-15 16:56:33 +02:00
Kota Tsuyuzaki
81ceee056d Add one more test for ssync_receiver
To prevent a 409 conflict on a primary node during ssyncing, the
ssync-receiver should add an x-ssync-backend-frag-index generated from
the x-ssync-backend-node-index of the SSYNC replication request header.
That change was made in previous work [1], but we need more tests for it.

This patch adds one more assertion checking that the
x-ssync-backend-frag-index is in the ssync subrequest correctly.

*BONUS*
Fix some weird mocks and add some sanity checks.

1: https://review.openstack.org/#/c/191521/

Change-Id: I27aff713c69017c0bc4c60b4833184e1285595d7
2015-06-24 22:30:47 -07:00
paul luse
ac8a769585 EC Ssync: Update parms to include node and frag indices
Previously we sent the ssync backend frag index based on the node
index.  We need to be more specific for ssync to handle both sync
and revert cases, so now we send the frag index based on the job
contents (as determined by the EC reconstructor) and the node index
as a new header based on, well, the node index.

The rcvr can now validate the incoming pair and reject (400) when
a primary node is being asked to accept fragments that don't
belong to it.  Additionally, by having the frag index, the
rcvr can reject (409) an attempt to accept a fragment when it's
a handoff and already has one that needs to be reverted.
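
A sketch of that validation; the status codes come from the message
above, while the helper shape is illustrative:

    def validate_frag_index(node_is_primary, frag_index,
                            expected_frag_index, frag_index_on_disk):
        # primary: the fragment must be one this node is meant to hold
        if node_is_primary and frag_index != expected_frag_index:
            return 400
        # handoff: don't pile a second frag index onto one that is
        # waiting to be reverted
        if not node_is_primary and frag_index_on_disk not in (
                None, frag_index):
            return 409
        return 200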

Fixes-bug: #1452619
Change-Id: I8287b274bbbd00903c1975fe49375590af697be4
2015-06-19 16:30:11 -07:00
Alistair Coles
3aa06f185a Make SSYNC receiver return a response when initial checks fail
The ssync Receiver performs some checks on request parameters
in initialize_request() before starting the exchange of missing
hashes and updates e.g. the destination device must be available;
the policy must be valid. Currently if any of these checks fails
then the receiver just closes the connection, so the Sender gets
no useful response code and noise is generated in logs by httplib
and wsgi Exceptions.

This change moves the request parameter checks to the Receiver
constructor so that the HTTPExceptions raised are actually sent
as responses. (The 'connection close' exception handling still
applies once the 'missing_check' and 'updates' handshakes are in
progress.)

Moving initialize_request() revealed the following lurking bug:
 * initialize_request() sets
       req.environ['eventlet.minimum_write_chunk_size'] = 0
 * this was previously ineffective because the Response environ
   had already been copied from Request environ before this value
   was set, so the Response never used the value :/
 * Now that it is effective (a good thing) it causes the empty string
   yielded by the receiver when there are no missing hashes in
   missing_checks() to be sent to the sender immediately. This makes
   the Sender.readline() think there has been an early disconnect
   and raise an Exception (a bad thing), as revealed by
   test/unit/obj/test_ssync_sender.py:TestSsync.test_nothing_to_sync

The fix for this is to simply make the receiver skip sending the empty
string if there are no missing object_hashes.

Change-Id: I036a6919fead6e970505dccbb0da7bfbdf8cecc3
2015-05-27 15:01:11 +01:00
Clay Gerrard
a3559edc23 Exclude local_dev from sync partners on failure
If the primary left or right hand partners are down, the next best thing
is to validate the rest of the primary nodes, where the rest should
exclude not just the left and right hand partners but ourselves as well.

This fixes an accidental noop when a partner node is unavailable and
another node is missing data.

Validation:

Add probetests to cover ssync failures for the primary sync_to nodes for
sync jobs.

Drive-by:

Add additional plumbing for the check_mount and check_dir constraints
into the remaining daemons.

Change-Id: I4d1c047106c242bca85c94b569d98fd59bb255f4
2015-05-26 12:50:31 -07:00
paul luse
647b66a2ce Erasure Code Reconstructor
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
  - There is no notion of update() or update_deleted().
  - There is a single job processor
  - Jobs are processed partition by partition.
  - At the end of processing a rebalanced or handoff partition, the
    reconstructor will remove successfully reverted objects if any.

It also makes various ssync changes, such as the addition of a
reconstruct_fa() function, called from ssync_sender, which performs
the actual reconstruction while sending the object to the receiver.

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
2015-04-14 00:52:17 -07:00
Samuel Merritt
b1eda4aef8 Allow sending object metadata after data
This lets the proxy server send object metadata to the object server
after the object data. This is necessary for EC, as it allows us to
compute the etag of the object in the proxy server and still store it
with the object.

The wire format is a multipart MIME document. For sanity during a
rolling upgrade, the multipart MIME document is only sent to the
object server if it indicates, via 100 Continue header, that it knows
how to consume it.

Example 1 (new proxy, new obj server):

   proxy: PUT /p/a/c/o
          X-Backend-Obj-Metadata-Footer: yes

     obj: 100 Continue
        X-Obj-Metadata-Footer: yes

   proxy: --MIMEmimeMIMEmime...

Example 2 (new proxy, old obj server):

   proxy: PUT /p/a/c/o
          X-Backend-Obj-Metadata-Footer: yes

     obj: 100 Continue

   proxy: <obj body>

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: Id38f7e93e3473f19ff88123ae0501000ed9b2e89
2015-04-14 00:52:17 -07:00
Paul Luse
873c52e608 Replace POLICY and POLICY_INDEX with string literals
Replaced throughout the code base & tox'd. Functional as well
as probe tests pass with and without policies defined.

POLICY --> 'X-Storage-Policy'
POLICY_INDEX --> 'X-Backend-Storage-Policy-Index'

Change-Id: Iea3d06de80210e9e504e296d4572583d7ffabeac
2014-06-23 12:52:50 -07:00
Paul Luse
b9707d497c Add Storage Policy Support to ssync
This patch makes ssync policy aware so that clusters using storage
policies and ssync replication will replicate objects in all policies.

DocImpact
Implements: blueprint storage-policies
Change-Id: I64879077676d764c6330e03734fc6665bb26f552
2014-06-18 17:31:38 -07:00
Clay Gerrard
3824ff3df7 Add Storage Policy support to Object Server
Objects now have a storage policy index associated with them as well;
this is determined by their filesystem path. Like before, objects in
policy 0 are in /srv/node/$disk/objects; this provides compatibility
on upgrade. (Recall that policy 0 is given to all existing data when a
cluster is upgraded.) Objects in policy 1 are in
/srv/node/$disk/objects-1, objects in policy 2 are in
/srv/node/$disk/objects-2, and so on.
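
The path convention above as a one-liner; treat the function name as
illustrative of whatever helper the diskfile code actually uses:

    def get_data_dir(policy_index):
        # policy 0 keeps the legacy name for upgrade compatibility
        return 'objects' if policy_index == 0 else (
            'objects-%d' % policy_index)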

 * 'quarantined' dir already created 'objects' subdir so now there
   will also be objects-N created at the same level

This commit does not address replicators, auditors, or updaters except
where method signatures changed. They'll still work if your cluster
has only one storage policy, though.

DocImpact
Implements: blueprint storage-policies
Change-Id: I459f3ed97df516cb0c9294477c28729c30f48e09
2014-06-18 17:31:38 -07:00
Alex Gaynor
032f0bfc7c Fix several typos in the codebase.
These were found using https://github.com/intgr/topy

Change-Id: I0dc7b76c44b8b17b1dcd79184dad1516fb11173c
2014-04-25 20:14:09 -07:00
Jenkins
9a2275cfd2 Merge "Make ssync_receiver drop conn on exc" 2013-12-19 20:06:32 +00:00
John Dickinson
883eb48502 update ssync tests to use req.get_response()
Change-Id: I2fa752b42669fab27d1a8e7562d52d8f8b848351
2013-12-16 12:33:08 -08:00
gholt
34c2a45d8d Make ssync_receiver drop conn on exc
Before, the ssync receiver wouldn't hang up the connection on all
errors. Now it will. Hanging up early ensures the remote end gets a
"broken pipe" type error right away instead of possibly sending more
data for quite some time before finally reading the error.

Change-Id: I2d3d55baaf10f99e7edce7843a7106875752020a
2013-12-06 04:23:31 +00:00
gholt
c859ebf5ce Per device replication_lock
New replication_one_per_device option (True by default)
that restricts incoming REPLICATION requests to
one per device, replication_concurrency allowing.

Also a new replication_lock_timeout option (15 by default)
to control how long a request will wait to obtain
a replication device lock before giving up.

This should be very useful in that you can be
assured any concurrent REPLICATION requests are
each writing to distinct devices. If you have 100
devices on a server, you can set
replication_concurrency to 100 and be confident
that, even if 100 replication requests were
executing concurrently, they'd each be writing to
separate devices. Before, all 100 could end up
writing to the same device, bringing it to a
horrible crawl.

NOTE: This is only for ssync replication. The
current default rsync replication still has the
potentially horrible behavior.

Change-Id: I36e99a3d7e100699c76db6d3a4846514537ff685
2013-11-22 21:40:29 +00:00
Samuel Merritt
b468f79daa Fix test to work with mock 0.8.0
This particular test case used a construct that worked in mock 1.0,
but not in 0.8. Our test-requirements.txt says mock>=0.8.0.

Change-Id: I93ed4b7b5d169572bbed6490cb9b0dd421a3b9e2
2013-11-13 12:52:21 -08:00
gholt
a80c720af5 Object replication ssync (an rsync alternative)
For this commit, ssync is just a direct replacement for how
we use rsync. Assuming we switch over to ssync completely
someday and drop rsync, we will then be able to improve the
algorithms even further (removing local objects as we
successfully transfer each one rather than waiting for whole
partitions, using an index.db with hash-trees, etc., etc.)

For easier review, this commit can be thought of in distinct
parts:

1)  New global_conf_callback functionality for allowing
    services to perform setup code before workers, etc. are
    launched. (This is then used by ssync in the object
    server to create a cross-worker semaphore to restrict
    concurrent incoming replication; see the sketch after
    this list.)

2)  A bit of shifting of items up from object server and
    replicator to diskfile or DEFAULT conf sections for
    better sharing of the same settings. conn_timeout,
    node_timeout, client_timeout, network_chunk_size,
    disk_chunk_size.

3)  Modifications to the object server and replicator to
    optionally use ssync in place of rsync. This is done in
    a generic enough way that switching to FutureSync should
    be easy someday.

4)  The biggest part, and (at least for now) completely
    optional part, are the new ssync_sender and
    ssync_receiver files. Nice and isolated for easier
    testing and visibility into test coverage, etc.
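
A sketch of the part-1 hook, assuming the callback receives the
preloaded app config and the shared global config; the signature and
default are illustrative:

    import multiprocessing

    def global_conf_callback(preloaded_app_conf, global_conf):
        # runs once, before the WSGI workers fork, so every worker's
        # ssync receiver ends up sharing the same semaphore
        concurrency = int(preloaded_app_conf.get(
            'replication_concurrency') or 4)
        if concurrency:
            global_conf['replication_semaphore'] = \
                multiprocessing.BoundedSemaphore(concurrency)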

All the usual logging, statsd, recon, etc. instrumentation
is still there when using ssync, just as it is when using
rsync.

Beyond the essential error and exceptional condition
logging, I have not added any additional instrumentation at
this time. Unless there is something someone finds super
pressing to have added to the logging, I think such
additions would be better as separate change reviews.

FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION
CLUSTERS. Some of us will be using it in a limited fashion to look
for any subtle issues, tuning, etc., but generally ssync is
an experimental feature. In its current implementation it is
probably going to be a bit slower than rsync, but if all
goes according to plan it will end up much faster.

There are no comparisons yet between ssync and rsync other
than some raw virtual machine testing I've done to show it
should compete well enough once we can put it in use in the
real world.

If you Tweet, Google+, or whatever, be sure to indicate it's
experimental. It'd be best to keep it out of deployment
guides, howtos, etc. until we all figure out if we like it,
find it to be stable, etc.

Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6
2013-11-07 16:52:01 +00:00