To minimize external library dependencies for Swift unit
tests and SAIO, PyECLib 1.1.1 introduces a native backend,
'liberasurecode_rs_vand'. This patch migrates the unit
tests to the new ec_type when it is available.
This change will work with current pyeclib requirements
(==1.0.7) and also future requirements (>=1.0.7).
When we're able to raise *our* requirements to >=1.1.1 we
should remove jerasure from the list of preferred backends.
Related SAIO doc and example config changes should be
included with that patch.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idf657f0acf0479bc8158972e568a29dbc08eaf3b
On a full disk, a call to delete an object will fail when it tries to
write tombstones. Handle the DiskFileNoSpace exception raised by
swift.common.utils.
Change-Id: I8f0cfcc4159ee154fcd3e7ca90c422aa5aadf0b3
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Closes-Bug: 1491675
The next() method of Python 2 generators was renamed to __next__() in
Python 3. Call the builtin next() function instead, which works on both
Python 2 and Python 3.
The patch was generated by the next operation of the sixer tool.
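
A minimal illustration of the portable form; the generator count_up is a
hypothetical example and is not taken from the Swift code base:

```python
def count_up(limit):
    """Yield 0, 1, ..., limit - 1."""
    for value in range(limit):
        yield value


gen = count_up(3)
# The builtin next() dispatches to gen.next() on Python 2 and to
# gen.__next__() on Python 3, so the call site is the same on both.
first = next(gen)
second = next(gen)
```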
Change-Id: Id12bc16cba7d9b8a283af0d392188a185abe439d
ssync currently does the wrong thing when replicating object dirs
containing both a .data and a .meta file. The ssync sender uses a
single PUT to send both object content and metadata to the receiver,
using the metadata (.meta file) timestamp. This results in the object
content timestamp being advanced to the metadata timestamp,
potentially overwriting newer object data on the receiver and causing
an inconsistency with the container server record for the object.
For example, replicating an object dir with {t0.data(etag=x), t2.meta}
to a receiver with t1.data(etag=y) will result in the creation of
t2.data(etag=x) on the receiver. However, the container server will
continue to list the object as t1(etag=y).
This patch modifies ssync to replicate the content of .data and .meta
separately using a PUT request for the data (no change) and a POST
request for the metadata. In effect, ssync replication replicates the
client operations that generated the .data and .meta files so that
the result of replication is the same as if the original client requests
had persisted on all object servers.
Apart from maintaining correct timestamps across sync'd nodes, this has
the added benefit of not needing to PUT objects when only the metadata
has changed and a POST will suffice.
Taking the same example, ssync sender will no longer PUT t0.data but will
POST t2.meta resulting in the receiver having t1.data and t2.meta.
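
The example can be sketched as a receiver-side decision. This is only an
illustration under stated assumptions: wanted_parts, the 'ts_data' and
'ts_meta' keys and the use of plain numbers for timestamps are all
hypothetical and are not the actual ssync API.

```python
def wanted_parts(sender, receiver=None):
    """Return the parts ('data', 'meta') the receiver should ask for.

    sender and receiver are hypothetical dicts holding 'ts_data' and,
    when a .meta file exists, 'ts_meta'. receiver is None when the
    receiver has no diskfile for the object at all.
    """
    never = float('-inf')
    r_data = receiver['ts_data'] if receiver else never
    # Without a .meta file the metadata is as old as the data file.
    r_meta = receiver.get('ts_meta', r_data) if receiver else never

    parts = set()
    if sender['ts_data'] > r_data:
        parts.add('data')
    if 'ts_meta' in sender and sender['ts_meta'] > r_meta:
        parts.add('meta')
    return parts


# Sender has {t0.data, t2.meta}; receiver has t1.data.
example = wanted_parts({'ts_data': 0, 'ts_meta': 2}, {'ts_data': 1})
```

Under these assumptions only the metadata is requested, so the receiver
keeps t1.data and gains t2.meta.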
The changes are backwards compatible: an upgraded sender will only sync
data files to a legacy receiver and will not sync meta files (fixing the
erroneous behavior described above); a legacy sender will operate as
before when sync'ing to an upgraded receiver.
Changes:
- diskfile API provides methods to get the data file timestamp
as distinct from the diskfile timestamp.
- diskfile yield_hashes return tuple now passes a dict mapping data and
meta (if any) timestamps to their respective values in the timestamp
field.
- ssync_sender will encode data and meta timestamps in the
(hash_path, timestamp) tuple sent to the receiver during
missing_checks.
- ssync_receiver compares sender's data and meta timestamps to any
local diskfile and may specify that only data or meta parts are sent
during updates phase by appending a qualifier to the hash returned
in its 'wanted' list.
- ssync_sender now sends POST subrequests when a meta file
exists and its content needs to be replicated.
- ssync_sender may send *only* a POST if the receiver indicates that
is the only part required to be sync'd.
- object server will allow PUT and DELETE with earlier timestamp than
a POST
- Fixed TODO related to replicated objects with fast-POST and ssync
Related spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: 1501528
Change-Id: I97552d194e5cc342b0a3f4b9800de8aa6b9cb85b
This patch fixes small nits for inline comments for
https://review.openstack.org/#/c/211338
as a follow-up patch, plus some other typos in comments.
Change-Id: Ibf7dc5683b39d6662573dbb036da146174a965fd
There are a few places in the PUT path where the object server is
reading WSGI input and can find that there's nothing there. e.g. in the
middle of a 2 phase commit and the proxy goes away for whatever reason,
like maybe it timed out because things are really busy. Anyway, this
results in the ugly ValueError coming out of eventlet.wsgi about a
zillion levels away from the PUT path.
Expanding on the test cases from lp bug #1496205 and lp bug #1469094
this change carefully narrows into our read/readline calls to
wsgi_input and makes sure to translate the ValueError to a
ChunkReadError - which the object.server can handle along with
ChunkReadTimeout. When it made sense, this change attempts to stay
consistent throughout the code path in logging/raising client disconnect
instead of timeout.
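
A rough sketch of the translation, not the actual object server code: the
ChunkReadError class below is a local stand-in for the Swift exception,
and read_chunk and BrokenInput are hypothetical names used only for
illustration.

```python
import io


class ChunkReadError(Exception):
    """Local stand-in for the Swift exception of the same name."""


def read_chunk(wsgi_input, size):
    """Read from wsgi_input, narrowing the try block to the read itself."""
    try:
        return wsgi_input.read(size)
    except ValueError as err:
        # The generic error is re-raised as something the PUT path can
        # treat as a client disconnect.
        raise ChunkReadError(str(err))


class BrokenInput(object):
    """Hypothetical input that fails the way a vanished proxy might."""

    def read(self, size):
        raise ValueError('invalid literal for int() with base 16')


good = read_chunk(io.BytesIO(b'abc'), 2)
```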
It's unfortunate the error coming out of eventlet is so generic, but
that will be improved in future versions [1].
1. c3ce3eef0b
Related-Bug: #1469094
Related-Bug: #1496205
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I9e4dbf26623c0c6fc5c87afd14349466aa157385
This patch adds a test to determine the failure-case behavior of the
object-server when the connection from the proxy-server is dropped
during the commit phase. In particular, the test checks whether or
not container updates occur in that situation.
In the process of working on that test we made the behavior of the
object-server reasonable when the connection from the proxy-server is
dropped during the commit phase.
We capture the IOError/ValueError's that eventlet.wsgi might barf out
really close to the wsgi_input read and translate them to a
swift.common.exceptions.ChunkReadError so we can handle them at a higher
level in the ObjectController's generic PUT disconnect handling.
Since that test went so well, we refactored the other ones to use some
common context management and wrote a few more.
Co-Author: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I60c98172e524869b06bdf23fd1c4e1bce7a98f80
There is a duplicate 'X-Backend-Storage-Policy-Index' dictionary key in unit.obj.test_server.py.
One key has a fixed policy index value and the other has a random value.
The unit test should run with a random policy index, so remove the key that is set to the fixed value.
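
Why such a duplicate is easy to miss: Python accepts repeated keys in a
dict literal without an error and silently keeps the last one. The header
values below are made up for illustration.

```python
headers = {
    'X-Timestamp': '1',
    'X-Backend-Storage-Policy-Index': '0',   # fixed value
    'X-Backend-Storage-Policy-Index': '37',  # stands in for the random value
}
# Only one entry survives for the repeated key, and it is the last one.
policy_index = headers['X-Backend-Storage-Policy-Index']
```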
Change-Id: Ic91fcf44d48297d0feee33c928ca682def9790a3
When using fast-post, if a POST (i.e. a metadata update) is requested for
an SLO manifest file, current Swift drops the 'X-Static-Large-Object'
header from the existing metadata. This breaks the SLO state, because a
manifest missing the 'X-Static-Large-Object' metadata is treated as a
normal file.
This patch fixes the object-server to keep the existing
'X-Static-Large-Object' flag and thereby preserve the SLO state.
Change-Id: Ib1eb569071372c322dd105c52baeeb094003291e
Closes-bug: #1453807
Fix pep8 warnings of the E category of hacking 0.10:
* E113: unexpected indentation
* E121: continuation line under-indented for hanging indent
* E122: continuation line missing indentation or outdented
* E123: closing bracket does not match indentation of opening bracket's
line
* E126: continuation line over-indented for hanging indent
* E251: unexpected spaces around keyword / parameter equals
Change-Id: I0b24eebdf1a37dc1b572b6c9a3d3d4832d050237
If we're going to have a subclass of BytesIO, having "StringIO" in its
name is just asking for confusion.
Change-Id: I695ab3105b1a02eb158dcf0399ae91888bc1c0ac
ssync rx sends a header X-Backend-Replication-Headers whose value is a
list of headers that the source object has. This list extends the list
of allowed headers on the target object server, so that the target
object metadata is faithfully reconstructed to match the source.
Unfortunately the combination of lower() and title() operations on
header keys results in the source 'ETag' value being added to the target
metadata under the key 'Etag' in addition to the 'ETag' key that the
receiving server adds (note different capitalization), both having
the same value.
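
The effect of combining lower() and title() can be reproduced with plain
strings. The metadata dict and etag value here are illustrative only.

```python
source_key = 'ETag'
# lower() gives 'etag'; title() then capitalizes only the first letter.
rebuilt_key = source_key.lower().title()

metadata = {'ETag': 'abc123'}      # key added by the receiving server
metadata[rebuilt_key] = 'abc123'   # key rebuilt from the header list
```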
The spurious 'Etag' metadata is potentially confusing for humans
inspecting the object metadata and complicates tests that wish to
assert the equality of two object metadata dicts. See for example the
test in test_ssync_sender.py that this patch cleans up.
Furthermore, the possibility of having both Etag and ETag keys has
required a workaround in the EC reconstructor [1].
[1] reconstructor fix change id: Ie59ad93a67a7f439c9a84cd9cff31540f97f334a
Change-Id: I0c89cf7924a4471bb6d268b5ef3884e2d2cb4286
The actual server-side changes are simple. The tests are a different
matter. Many changes were needed to the object server tests to
handle the now-async calls to the container server. In an effort to
test this properly, some drive-by changes were made to improve tests.
I tested this patch by doing zero-byte object writes to one container
as fast as possible. Then I did it again while also saturating 2 of the
container replica's disks. The results are linked below.
https://gist.github.com/notmyname/2bb85acfd8fbc7fc312a
DocImpact
Change-Id: I737bd0af3f124a4ce3e0862a155e97c1f0ac3e52
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.
Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
wsgi.input is a binary stream (bytes), not a text stream (unicode).
* Replace StringIO with BytesIO for WSGI input
* Replace StringIO('') with StringIO() and replace WsgiStringIO('') with
WsgiStringIO(): an empty string is already the default value
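
A small illustration using only the standard library; the environ dict is
a hypothetical, stripped-down WSGI environment.

```python
from io import BytesIO

# wsgi.input carries bytes, so the body is built from a bytes literal.
env = {'REQUEST_METHOD': 'PUT', 'wsgi.input': BytesIO(b'object body')}
body = env['wsgi.input'].read()

# With no argument the stream starts out empty, so BytesIO(b'') is redundant.
empty = BytesIO().read()
```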
Change-Id: I09c9527be2265a6847189aeeb74a17261ddc781a
* replace "from cStringIO import StringIO"
with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
with "import six" and "six.StringIO()"
This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer
Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
If the test ran across a one second boundary it would fail because while
the timestamp normalization was doing some rounding it was making no
attempt to reuse the same timestamp on subsequent requests.
Change-Id: Ic560032bcfacd6f0d10cfc0f4f10e5d6c2bc8dd5
The iteritems() of Python 2 dictionaries has been renamed to items() on
Python 3. According to a discussion on the openstack-dev mailing list,
the overhead of creating a temporary list using dict.items() on Python 2
is very low because most dictionaries are small:
http://lists.openstack.org/pipermail/openstack-dev/2015-June/066391.html
Patch generated by the following command:
sed -i 's,iteritems,items,g' \
$(find swift -name "*.py") \
$(find test -name "*.py")
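
For illustration, the portable spelling on a small, made-up dictionary:

```python
metadata = {'Content-Type': 'text/plain', 'X-Timestamp': '1'}
# items() exists on both versions: it returns a list on Python 2 and a
# view on Python 3, and either can be iterated or sorted directly.
pairs = sorted(metadata.items())
```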
Change-Id: I6070bb6c684be76e8e77222a7d280ec6edd43496
Previously we sent the ssync backend frag index based on the node
index. We need to be more specific for ssync to handle both sync
and revert cases, so now we send the frag index based on the job
contents (as determined by the ec recon) and the node index
as a new header based on, well, the node index.
The rcvr can now validate the incoming pair to reject (400) when
a primary node is being asked to accept fragments that don't
belong to it. Additionally, by having the frag index the
rcvr can reject (409) an attempt to accept a fragment when it's
a handoff and already has one that needs to be reverted.
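
A sketch of the 400 check only, under two assumptions that are not stated
above: that the node index is absent (None) when the target is a handoff,
and that a fragment belongs to a primary node exactly when the two indexes
are equal. The function name is hypothetical, and the 409 case is not
modelled because it depends on the receiver's local state.

```python
def check_index_pair(frag_index, node_index):
    """Return a status code for the (frag index, node index) pair."""
    if node_index is not None and frag_index != node_index:
        # A primary node is being asked to take a fragment that is
        # not its own.
        return 400
    return 200
```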
Fixes-bug: #1452619
Change-Id: I8287b274bbbd00903c1975fe49375590af697be4
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
An operation that removes an existing .ts or .meta out from under another
concurrent operation at the right point can cause the whole object to be
needlessly quarantined.
Closes-Bug: #1451520
Change-Id: I37d660199e54411d0610889f9ee230b13747244b
The ssync Receiver performs some checks on request parameters
in initialize_request() before starting the exchange of missing
hashes and updates e.g. the destination device must be available;
the policy must be valid. Currently if any of these checks fails
then the receiver just closes the connection, so the Sender gets
no useful response code and noise is generated in logs by httplib
and wsgi Exceptions.
This change moves the request parameter checks to the Receiver
constructor so that the HTTPExceptions raised are actually sent
as responses. (The 'connection close' exception handling still
applies once the 'missing_check' and 'updates' handshakes are in
progress.)
Moving initialize_request() revealed the following lurking bug:
* initialize_request() sets
req.environ['eventlet.minimum_write_chunk_size'] = 0
* this was previously ineffective because the Response environ
had already been copied from Request environ before this value
was set, so the Response never used the value :/
* Now that it is effective (a good thing) it causes the empty string
yielded by the receiver when there are no missing hashes in
missing_checks() to be sent to the sender immediately. This makes
the Sender.readline() think there has been an early disconnect
and raise an Exception (a bad thing), as revealed by
test/unit/obj/test_ssync_sender.py:TestSsync.test_nothing_to_sync
The fix for this is to simply make the receiver skip sending the empty
string if there are no missing object_hashes.
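
The fix can be sketched as a generator that never yields an empty chunk.
The function name and the START/END marker strings are assumptions made
for illustration and may not match the real protocol text.

```python
def missing_check_response(object_hashes):
    """Yield the receiver's reply chunks for the missing_check phase."""
    yield ':MISSING_CHECK: START\r\n'
    if object_hashes:
        # Only send the joined hashes when there is something to send;
        # '\r\n'.join([]) would be '' and look like a disconnect.
        yield '\r\n'.join(object_hashes)
    yield '\r\n'
    yield ':MISSING_CHECK: END\r\n'


nothing_missing = list(missing_check_response([]))
```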
Change-Id: I036a6919fead6e970505dccbb0da7bfbdf8cecc3
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
- There is no notion of update() or update_deleted().
- There is a single job processor
- Jobs are processed partition by partition.
- At the end of processing a rebalanced or handoff partition, the
reconstructor will remove successfully reverted objects if any.
It also includes various ssync changes, such as the addition of the
reconstruct_fa() function called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
This lets the proxy server send object metadata to the object server
after the object data. This is necessary for EC, as it allows us to
compute the etag of the object in the proxy server and still store it
with the object.
The wire format is a multipart MIME document. For sanity during a
rolling upgrade, the multipart MIME document is only sent to the
object server if it indicates, via 100 Continue header, that it knows
how to consume it.
Example 1 (new proxy, new obj server):
   proxy: PUT /p/a/c/o
          X-Backend-Obj-Metadata-Footer: yes
     obj: 100 Continue
          X-Obj-Metadata-Footer: yes
   proxy: --MIMEmimeMIMEmime...
Example 2 (new proxy, old obj server):
   proxy: PUT /p/a/c/o
          X-Backend-Obj-Metadata-Footer: yes
     obj: 100 Continue
   proxy: <obj body>
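
The proxy-side choice in the two examples can be sketched as follows. The
function name and return values are hypothetical; only the header name
comes from the examples above.

```python
def body_format(continue_headers):
    """Pick the PUT body format from the 100 Continue response headers."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = dict((k.lower(), v) for k, v in continue_headers.items())
    if lowered.get('x-obj-metadata-footer', '').lower() == 'yes':
        return 'multipart'   # new object server: send the MIME document
    return 'plain'           # old object server: send the bare object body


new_server = body_format({'X-Obj-Metadata-Footer': 'yes'})
old_server = body_format({})
```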
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: Id38f7e93e3473f19ff88123ae0501000ed9b2e89
Adds specific disk file classes for EC policy types.
The new ECDiskFile and ECDiskFileWriter classes are used by the
ECDiskFileManager.
ECDiskFileManager is registered with the DiskFileRouter for use with
EC_POLICY type policies.
Refactors diskfile tests into BaseDiskFileMixin and BaseDiskFileManagerMixin
classes which are then extended in subclasses for the legacy
replication-type DiskFile* and ECDiskFile* classes.
Refactor to prefer use of a policy instance reference over a policy_index
int to refer to a policy.
Add additional verification to DiskFileManager.get_dev_path to validate the
device root with common.constraints.check_dir, even when mount_check is
disabled for use in a virtual swift-all-in-one.
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I22f915160dc67a9e18f4738c1ddf068344e8ad5d
* Get FakeConn ready for expect 100 continue
* Use debug_logger more and with better interfaces
* Fix patch_policies to be less annoying
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I28c0a3539d994cbb8e6b94d63a23ed4ea6cb956d
This patch extends the StoragePolicy class for non-replication storage
policies, the first one being "erasure coding".
Changes:
- Add 'policy_type' support to BaseStoragePolicy class
- Disallow direct instantiation of BaseStoragePolicy class
- Subclass BaseStoragePolicy
  - "StoragePolicy":
    . Replication policy, default
    . policy_type = 'replication'
  - "ECStoragePolicy":
    . Erasure Coding policy
    . policy_type = 'erasure_coding'
    . Private member variables
        ec_type (EC backend),
        ec_num_data_fragments (number of fragments original
          data split into after erasure coding operation),
        ec_num_parity_fragments (number of parity fragments
          generated during erasure coding)
    . Private methods
        EC specific attributes and ring validator methods.
- Swift will use PyECLib, a Python Erasure Coding library, for
  erasure coding operations. PyECLib is already an approved
  OpenStack core requirement.
  (https://bitbucket.org/kmgreen2/pyeclib/)
- Add test cases for
  - 'policy_type' StoragePolicy member
  - policy_type == 'erasure_coding'
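
A simplified sketch of the class layout described in the list. The
constructor signatures, the idx and name attributes and the way direct
instantiation is blocked are assumptions for illustration, not the real
implementation.

```python
class BaseStoragePolicy(object):
    policy_type = None  # each subclass sets its own value

    def __init__(self, idx, name):
        if type(self) is BaseStoragePolicy:
            raise TypeError("Can't instantiate BaseStoragePolicy directly")
        self.idx = idx
        self.name = name


class StoragePolicy(BaseStoragePolicy):
    """Replication policy, the default."""
    policy_type = 'replication'


class ECStoragePolicy(BaseStoragePolicy):
    """Erasure coding policy."""
    policy_type = 'erasure_coding'

    def __init__(self, idx, name, ec_type, ec_ndata, ec_nparity):
        super(ECStoragePolicy, self).__init__(idx, name)
        # Kept private, as in the description above.
        self._ec_type = ec_type
        self._ec_num_data_fragments = ec_ndata
        self._ec_num_parity_fragments = ec_nparity


gold = StoragePolicy(0, 'gold')
ec = ECStoragePolicy(1, 'ec', 'liberasurecode_rs_vand', 10, 4)
```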
DocImpact
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: Ie0e09796e3ec45d3e656fb7540d0e5a5709b8386
Implements: blueprint ec-proxy-work
mkstemp() can fail with ENOSPC when filesystem runs out of inodes.
And fallocate() used to raise DiskFileNoSpace for all OSErrors.
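
One way to express the narrower handling, as a sketch only: the
DiskFileNoSpace class below is a local stand-in, and create_tmp_file and
its mkstemp_func parameter are hypothetical names. Only ENOSPC is mapped
here because that is the errno named above.

```python
import errno


class DiskFileNoSpace(Exception):
    """Local stand-in for the Swift exception of the same name."""


def create_tmp_file(mkstemp_func):
    """Call mkstemp_func, mapping only out-of-space errors."""
    try:
        return mkstemp_func()
    except OSError as err:
        if err.errno == errno.ENOSPC:
            raise DiskFileNoSpace()
        # Any other OSError is a different problem and is passed through.
        raise


def full_disk():
    raise OSError(errno.ENOSPC, 'No space left on device')


def no_permission():
    raise OSError(errno.EACCES, 'Permission denied')
```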
Change-Id: I8c95cb710107d8e481d068b00eda53dd805c00a5
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Crypto middleware needs to arrange for alternative
values of etag and content-type to be sent to container
servers with updates, since these will be encrypted with
a different key than the etag and content-type stored on
the object server.
Erasure coding apparently needs a similar capability.
This patch modifies the object server to overwrite the etag
and content-type values in the container update headers with
values that may optionally be specified by middleware in
X-Backend-Container-Update-Override-* headers.
Using the X-Backend- prefix ensures that these headers
cannot be sent or seen by clients.
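
A sketch of how the override might be applied when building a container
update. The function name, the metadata keys and the shape of the returned
dict are assumptions; only the header prefix comes from the text above.

```python
OVERRIDE_PREFIX = 'x-backend-container-update-override-'


def container_update_values(metadata, request_headers):
    """Return etag/content_type for the container update."""
    update = {
        'etag': metadata['ETag'],
        'content_type': metadata['Content-Type'],
    }
    for key, value in request_headers.items():
        lowered = key.lower()
        if lowered.startswith(OVERRIDE_PREFIX):
            field = lowered[len(OVERRIDE_PREFIX):].replace('-', '_')
            # Only the two documented fields may be overridden.
            if field in update:
                update[field] = value
    return update


stored = {'ETag': 'ciphertext-etag', 'Content-Type': 'text/plain'}
sent = container_update_values(
    stored, {'X-Backend-Container-Update-Override-Etag': 'override-etag'})
```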
A new probe test verifies the propagation of override
values from an internal client through the proxy, to
object server, to container server and then returned
in a container listing.
Change-Id: I7d846ed54ff173d08c66c6d5b0ecf7dff27f5a87
To make it easier for Swift operators to identify problematic devices,
a policy index will be recorded in log files of proxy and storage servers
for each user request which is related to storage policy.
This patch simply adds 'storage_policy_index' field in a log format.
If there is no specified policy index, '-' is output in this field.
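
A small sketch of the field formatting; the helper name is hypothetical.
The check is against None rather than truthiness because 0 is a valid
policy index.

```python
def policy_index_field(policy_index):
    """Format the 'storage_policy_index' log field."""
    return '-' if policy_index is None else str(policy_index)


with_policy = policy_index_field(0)
without_policy = policy_index_field(None)
```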
Extra fix: Doc about the log line of storage nodes now properly reflects
'server_pid' field.
DocImpact
Change-Id: I7286ae85bcbcec73b5377dc115cbdb0f57d1b025
Implements: blueprint logging-policy-number
Many times new deployers get mysterious errors after first setting up their
Swift clusters. Most of the time, the errors are because the values in the ring
are incorrect (e.g. a bad port number). OPTIONS will be used in a ring checker
(which is WIP) that validates values in the ring.
This patch includes OPTIONS for storage nodes and respective tests.
Change-Id: Ia0033756d070bef11d921180e8d32a1ab2b88acf