Crypto middleware needs to arrange for alternative
values of etag and content-type to be sent to container
servers with updates, since these will be encrypted with
a different key than the etag and content-type stored on
the object server.
Erasure coding apparently needs a similar capability.
This patch modifies the object server to overwrite the etag
and content-type values in the container update headers with
values that may optionally be specified by middleware in
X-Backend-Container-Update-Override-* headers.
Using the X-Backend- prefix ensures that these headers
cannot be sent or seen by clients.
A new probe test verifies the propagation of override
values from an internal client through the proxy, to
object server, to container server and then returned
in a container listing.
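This sketch shows one way middleware-supplied overrides could be folded into the container update. It makes two assumptions. First, the update headers are a plain dict with lower-case keys such as `x-etag` and `x-content-type`. Second, each `X-Backend-Container-Update-Override-<name>` header replaces `x-<name>`. `apply_container_update_overrides` is a hypothetical helper, not the object server's actual code.

```python
OVERRIDE_PREFIX = "x-backend-container-update-override-"


def apply_container_update_overrides(update_headers, request_headers):
    """Return a copy of update_headers with middleware overrides applied.

    Hypothetical helper: each X-Backend-Container-Update-Override-<name>
    request header replaces the x-<name> container update header.
    """
    updated = dict(update_headers)  # never mutate the caller's dict
    for key, value in request_headers.items():
        lower = key.lower()
        if lower.startswith(OVERRIDE_PREFIX):
            name = lower[len(OVERRIDE_PREFIX):]
            updated["x-" + name] = value
    return updated
```

Because only X-Backend- headers are consulted, ordinary client headers (for example X-Object-Meta-*) are ignored, and the values stored on the object server are left untouched.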
Change-Id: I7d846ed54ff173d08c66c6d5b0ecf7dff27f5a87
* move get_to_final_state into ProbeTest
* get rid of kill_servers
* add replicators manager and updaters manager to ProbeTest
(this is all going someplace, I promise)
Change-Id: I8393a2ebc0d04051cae48cc3c49580f70818dbf2
The ssync_sender send_delete method treats its
timestamp argument as a string when in fact it is
passed a Timestamp object. As a result the method
always raises an exception and deletes are never
replicated.
This patch fixes the bug and adds unit and probe tests
to verify expected behavior.
Closes-Bug: 1421425
Change-Id: I664fb8d5dfea7362313037a67927ea90021c3f62
* refactor probe tests to use probe.common.ProbeTest
* move reset_environment functionality to ProbeTest.setUp()
* choose rings and policies that meet the criteria - raise SkipTest if
nothing matches
* replace all AssertionErrors in setup with SkipTest
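This is a minimal sketch of the "raise SkipTest if nothing matches" pattern using the standard library's `unittest.SkipTest`. The policy list of `(name, replica_count)` pairs, `choose_policy` and `ExampleProbeTest` are hypothetical, and the sketch does not reproduce the real ProbeTest criteria.

```python
import unittest


def choose_policy(policies, min_replicas):
    """Return the first policy meeting the criteria, else skip the test.

    `policies` is a hypothetical list of (name, replica_count) pairs.
    """
    for name, replicas in policies:
        if replicas >= min_replicas:
            return name
    # An unsuitable environment is reported as a skip, not a failure.
    raise unittest.SkipTest("no policy with >= %d replicas" % min_replicas)


class ExampleProbeTest(unittest.TestCase):
    POLICIES = [("gold", 3), ("silver", 2)]

    def setUp(self):
        # Environment checks raise SkipTest instead of AssertionError.
        self.policy = choose_policy(self.POLICIES, min_replicas=3)

    def test_policy_chosen(self):
        self.assertEqual(self.policy, "gold")
```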
Change-Id: Id56c497d58083f5fd55f5283cdd346840df039d3
Sysmeta included with an object PUT persists with the PUT data - if an
internal operation such as POST-as-copy during partial failure, or ssync
with fast-POST (not supported), causes that data to be lost, then the
associated sysmeta will also be lost.
Since object sys-meta persistence in the face of a POST when the
original .data is unavailable requires fast-POST with .meta files, the
probetest that validates object sys-meta persistence of a POST when the
most up-to-date copy of the object with sys-meta is unavailable
configures an InternalClient with object_post_as_copy = false.
This non-default configuration option is not supported by ssync and
results in a loss of sys-meta very similar to the object sys-meta
failure you would see with object_post_as_copy = true when the COPY part
of the POST is unable to retrieve the most recently written object with
sys-meta.
Until we can fix the default POST behavior to make metadata updates
without stomping on newer data file timestamps we should expect object
sys-meta to be "very very best possible but not really guaranteed
effort".
Until we can fix ssync to replicate metadata updates without stomping on
newer data file timestamps we should expect this test to fail.
When ssync replication of fast-POST metadata updates is fixed, this test
will fail, signaling that the expected-failure cruft should be removed,
but other parts of ssync replication will still work and some other bugs
can be fixed while we wait.
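The standard library offers one way to encode "expect this test to fail until ssync is fixed": `unittest.expectedFailure`. While the bug exists, the failure is recorded as expected. Once the fix lands, the passing test is reported as an unexpected success, which is the signal to remove the cruft. The class and its flag are hypothetical stand-ins, and the actual probe test may use a different mechanism.

```python
import unittest


class SysmetaPostProbeSketch(unittest.TestCase):
    # Hypothetical stand-in for the ssync + fast-POST scenario; the real
    # probe test drives a cluster instead of checking a flag.
    ssync_replicates_fast_post = False

    @unittest.expectedFailure
    def test_sysmeta_survives_post(self):
        # Fails today, so unittest records an expected failure. Once the
        # fix lands this assertion passes and unittest records an
        # "unexpected success" instead.
        self.assertTrue(self.ssync_replicates_fast_post)
```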
Change-Id: Ifc5d49514de79b78f7715408e0fe0908357771d3
A deprecated policy in swift.conf causes errors in
probe tests that may attempt to use that policy.
This patch introduces a list ENABLED_POLICIES in
test/probe/common.py and changes probe tests to only
use policies contained in that list.
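This is a minimal sketch of how such a list could be built. It assumes each policy exposes an `is_deprecated` flag. The `Policy` namedtuple and the sample policies are illustrative only, not the definitions in test/probe/common.py.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a storage policy entry.
Policy = namedtuple("Policy", ["idx", "name", "is_deprecated"])

POLICIES = [
    Policy(0, "gold", False),
    Policy(1, "silver", True),   # deprecated in swift.conf
    Policy(2, "bronze", False),
]

# Probe tests pick only from this list, so a deprecated policy is never used.
ENABLED_POLICIES = [p for p in POLICIES if not p.is_deprecated]
```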
Change-Id: Ie65477c15d631fcfc3a4a5772fbe6d7d171b22b0
This patch takes a first step towards support
for object system metadata by enabling headers
in the x-object-sysmeta- namespace to be
persisted when objects are PUT. This should be
useful for other pending patches such as on-demand
migration and server-side encryption
(https://review.openstack.org/#/c/64430/ and
https://review.openstack.org/#/c/76578/1).
The x-object-sysmeta- namespace is already
reserved/protected by the gatekeeper and
passed through the proxy. This patch modifies
the object server to persist these headers
alongside user metadata when an object is
PUT.
This patch will preserve existing object
system metadata and ignore any new system
metadata when handling object POSTs,
including POST-as-copy operations. Support
for modification of object system metadata
with a POST request requires further work
as discussed in the blueprint.
This patch will preserve existing object
system metadata and update it with new
system metadata when copying an object.
A new probe test is added which makes use of
the BrainSplitter class that has been moved
from test_container_merge_policy_index.py to
a new module brain.py.
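The three behaviors described above can be sketched as pure functions over header dicts: PUT persists sysmeta, POST preserves existing sysmeta and ignores new values, and COPY preserves existing sysmeta and updates it. The function names are hypothetical. User-metadata handling is simplified (POST replaces user metadata, COPY overlays it), and none of this is the object server's actual code.

```python
SYSMETA_PREFIX = "x-object-sysmeta-"
USERMETA_PREFIX = "x-object-meta-"


def _select(headers, prefix):
    """Lower-cased subset of headers whose names start with prefix."""
    return {k.lower(): v for k, v in headers.items()
            if k.lower().startswith(prefix)}


def metadata_for_put(request_headers):
    # PUT: persist sysmeta alongside user metadata.
    meta = _select(request_headers, USERMETA_PREFIX)
    meta.update(_select(request_headers, SYSMETA_PREFIX))
    return meta


def metadata_for_post(existing, request_headers):
    # POST: keep existing sysmeta, ignore any new sysmeta,
    # and replace the user metadata.
    meta = _select(existing, SYSMETA_PREFIX)
    meta.update(_select(request_headers, USERMETA_PREFIX))
    return meta


def metadata_for_copy(source, request_headers):
    # COPY: start from the source object's metadata, then apply any
    # user metadata and new sysmeta supplied with the request.
    meta = dict(source)
    meta.update(_select(request_headers, USERMETA_PREFIX))
    meta.update(_select(request_headers, SYSMETA_PREFIX))
    return meta
```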
blueprint object-system-metadata
Change-Id: If716bc15730b7322266ebff4ab8dd31e78e4b962
With the two-vector timestamp change, some resolution was lost in the queue
entries, which could leave the reconciler unable to successfully remove
a processed item from the queue in pop_queue. To ensure that queue entries
with a significant offset can be successfully removed while still handling
the re-enqueued object case, issue the DELETE with a timestamp slightly later
than the maximum of the queue entry's last-modified time (q_record) and the
misplaced object's timestamp (q_ts).
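This sketch shows the timestamp choice. It assumes both inputs are float seconds and that "slightly later" means one step of the 5-decimal normalized format. The `delete_timestamp` helper and the `RESOLUTION` constant are hypothetical, and the reconciler itself may compute this value differently.

```python
RESOLUTION = 0.00001  # assumed: smallest step of the "%016.05f" normalized form


def delete_timestamp(q_record, q_ts):
    """Timestamp for the queue-entry DELETE, strictly after both inputs.

    q_record: last-modified time of the queue entry (float seconds)
    q_ts: timestamp of the misplaced object (float seconds)
    """
    later = max(float(q_record), float(q_ts))
    # Format like a normalized X-Timestamp: fixed width, 5 decimals.
    return "%016.05f" % (later + RESOLUTION)
```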
Change-Id: I4726243b3f7c4c1e98f0c578e7ffdecf4ec22199
Replaced throughout code base & tox'd. Functional as well
as probe tests pass with and without policies defined.
POLICY --> 'X-Storage-Policy'
POLICY_INDEX --> 'X-Backend-Storage-Policy-Index'
Change-Id: Iea3d06de80210e9e504e296d4572583d7ffabeac
The normalized form of the X-Timestamp header looks like a float with a fixed
width to ensure stable string sorting - normalized timestamps look like
"1402464677.04188"
To support overwrites of existing data without modifying the original
timestamp, while still maintaining consistency, a second internal offset
vector is appended to the normalized timestamp form; the result compares and
sorts greater than the fixed-width float format but less than a newer
timestamp. The internalized format of timestamps looks like
"1402464677.04188_0000000000000000" - the portion after the underscore
is the offset and is a formatted hexadecimal integer.
The internalized form is not exposed to clients in responses from Swift.
Normal client operations will not create a timestamp with an offset.
The Timestamp class in common.utils supports internalized and normalized
formatting of timestamps and also comparison of timestamp values. When the
offset value of a Timestamp is 0, it's considered insignificant and need not
be represented in the string format; to support backwards compatibility during
a Swift upgrade, the internalized and normalized forms of a Timestamp with an
insignificant offset are identical. When a timestamp includes an offset, it
will always be represented in the internalized form, but it is still excluded
from the normalized form. Timestamps with an equivalent timestamp portion
(the float part) will compare and order by their offset. A Timestamp with a
greater timestamp portion will always compare and order greater than a
Timestamp with a lesser timestamp portion, regardless of its offset. String
comparison and ordering is guaranteed for the internalized string format, and
is backwards compatible for normalized timestamps which do not include an
offset.
The reconciler currently uses an offset bump to ensure that objects can move to
the wrong storage policy and be moved back. This use-case is valid because
the content represented by the user-facing timestamp is not modified in any way.
Future consumers of the offset vector of timestamps should be mindful of the HTTP
semantics of If-Modified-Since and take care to avoid deviation in the response from
the object server without an accompanying change to the user-facing timestamp.
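This is a minimal sketch of the formatting and ordering rules above. For the normalized form it assumes the fixed-width "%016.05f" float format, which matches the 16-character example. For the offset it assumes a 16-digit hex value after the underscore. `SketchTimestamp` is a hypothetical stand-in, not the Timestamp class in common.utils.

```python
from functools import total_ordering

NORMAL_FORMAT = "%016.05f"                  # e.g. "1402464677.04188"
INTERNAL_FORMAT = NORMAL_FORMAT + "_%016x"  # float part + hex offset


@total_ordering
class SketchTimestamp(object):
    """Hypothetical stand-in for swift.common.utils.Timestamp."""

    def __init__(self, timestamp, offset=0):
        if offset < 0:
            raise ValueError("offset must be non-negative")
        # Round to the 5-decimal resolution of the normalized form.
        self.timestamp = round(float(timestamp), 5)
        self.offset = offset

    @property
    def normal(self):
        # The offset is never part of the normalized form.
        return NORMAL_FORMAT % self.timestamp

    @property
    def internal(self):
        # A zero offset is insignificant, so both forms are identical.
        if self.offset:
            return INTERNAL_FORMAT % (self.timestamp, self.offset)
        return self.normal

    def _key(self):
        # Float part first, offset only breaks ties.
        return (self.timestamp, self.offset)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())
```

Ordering on the (float, offset) tuple gives the same result as comparing the internalized strings. A shorter string that is a prefix sorts first, so "1402464677.04188" < "1402464677.04188_0000000000000001" < "1402464677.04189".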
DocImpact
Implements: blueprint storage-policies
Change-Id: Id85c960b126ec919a481dc62469bf172b7fb8549
Currently, if the object-expirer goes to delete an object and the primary nodes
are unavailable, or the object is on handoffs, the object servers are unable
to verify the x-if-delete-at timestamp and return 412 without writing a
tombstone or updating the containers. The expirer treats 412 as success, so
the dark data is not removed from the object servers, nor is the object removed
from the listing.
As a side effect of this bug, if the expirer encounters split brain the delete
would never get processed in the correct storage policy.
It seems it's just not correct to treat the lack of data as success. Now the
object server will treat an x-if-delete-at request against a non-existent object
as a 404, and will return 204 when it successfully processes an x-if-delete-at
request, to distinguish the two cases.
The expirer will treat a 404 response from Swift as a failure, and will
continue to attempt to expire the object until it is older than its
configurable reclaim age. However, Swift will only return 404 if the majority
of nodes are able to return success; if even a single node is able to
accept the x-if-delete-at request, the containers will get updated and
replication will settle the tombstone, so the subsequent x-if-delete-at request
will return 412 and be removed from the queue.
It's worth noting that if an object with x-delete-at meta is DELETEd (by a
client request), an async update for the expiring update containers will be
processed to remove the queue entry; but if no primary nodes handle the
DELETE request, replication will never remove the expiring entry, and, assuming
it's scheduled for beyond the tombstone's reclaim age, the queue entry will
not be processable. In this case the expirer will attempt to DELETE the
object (and get 404s) in vain until the queue entry passes the configurable
reclaim age.
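The status handling described above can be sketched as two hypothetical functions. The first gives the object server's answer to an X-If-Delete-At DELETE. The second decides whether the expirer drops the queue entry. This sketch treats any 2xx or 412 as done, and a 404 as done only once the entry is past `reclaim_age`. That is an interpretation of the description above, not the expirer's code.

```python
def object_server_status(on_disk_delete_at, if_delete_at):
    """Hypothetical object-server answer to a DELETE with X-If-Delete-At.

    on_disk_delete_at is None when no object (or only a tombstone) exists.
    """
    if on_disk_delete_at is None:
        return 404  # nothing to verify: no longer treated as success
    if on_disk_delete_at != if_delete_at:
        return 412  # X-Delete-At changed since the entry was queued
    return 204      # verified and deleted


def expirer_should_dequeue(status, entry_age, reclaim_age):
    """Hypothetical expirer decision for a proxy DELETE response."""
    if status == 404:
        # Keep retrying until the queue entry is older than reclaim_age.
        return entry_age > reclaim_age
    # 2xx means deleted; 412 means the object no longer expires then.
    return 200 <= status < 300 or status == 412
```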
DocImpact
Implements: blueprint storage-policies
Change-Id: I66260e99fda37e97d6d2470971b6f811ee9e01be
Extract X-Storage-Policy-Index header from container listing request
and use it when making direct object DELETE requests.
DocImpact
Implements: blueprint storage-policies
Change-Id: Icd4b2611b4169e46f216ff9a9839af732971a2bf
Have container sync get its object ring from POLICIES now,
update tests to use policy index from container_info and pass
that along for use in ring selection.
This change also introduces the option of specifying in the cluster info
which of the realms/clusters is the current realm/cluster.
DocImpact
Implements: blueprint storage-policies
Change-Id: If57d3b0ff8c395f21c81fda76458bc34fcb23257
Add headers param to direct_client.direct_get_object, which is used in
probetests to pass through the X-Storage-Policy-Index header.
DocImpact
Implements: blueprint storage-policies
Change-Id: I19adbbcefbc086c8467bd904a275d55cde596412
After a container database is replicated, a _post_replicate_hook will enqueue
misplaced objects for the container-reconciler into the .misplaced_objects
containers. Items to be reconciled are "batch loaded" into the reconciler
queue at the end of a container replication cycle by leveraging container
replication itself.
DocImpact
Implements: blueprint storage-policies
Change-Id: I3627efcdea75403586dffee46537a60add08bfda
Keep status_changed_at in container databases current with status changes that
occur as a result of container creation, deletion, or re-creation.
Merge container put/delete/created timestamps when handling replicate
responses from remote servers in addition to during the handling of the
REPLICATE request.
When storage policies are configured on a cluster send status_changed_at,
object_count and storage_policy_index as part of container replication sync
args.
Use status_changed_at during replication to determine the oldest active
container and merge storage_policy_index.
DocImpact
Implements: blueprint storage-policies
Change-Id: Ib9a0dd42c271145e641437dc04d0ebea1e11fc47
You can manually set up a split-brain scenario for reconciler testing with the
enqueue script using the machinery from the included probetest. Invoke the
test as a script with the 'split-brain' command for more help.
DocImpact
Implements: blueprint storage-policies
Change-Id: I3a7b3167d674eba5f6e4072b176f6c4d29cdcd72
See comments from: https://review.openstack.org/55991
Change-Id: Ibb4153702b3dc4c60f66abb11cd3fa1953449827
Signed-off-by: Peter Portante <peter.portante@redhat.com>
Fix for a probe test that failed every once in a
while due to the early-majority change previously
committed. Sometimes a write would return success
before the third node had succeeded and the probe
test would look for on-disk evidence and fail,
when it would've been fine had it waited just a
bit longer for the third node to complete.
Since there's no real way for the probe test to
know when all three nodes are done, I just made
it retry once a second for several seconds before
reporting an error.
There may be more tests like this we'll have to
fix as we run across them.
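This is a hedged sketch of the "retry once a second for several seconds" approach. `retry_until` is a hypothetical helper, and the default of 10 attempts is an arbitrary choice. Injecting `sleep` lets the helper be tested without real delays.

```python
import time


def retry_until(check, attempts=10, interval=1.0, sleep=time.sleep):
    """Call check() until it returns True, sleeping between tries.

    Returns True on success; raises AssertionError after the last attempt
    so a probe test still reports a clear failure.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(interval)  # give slower nodes time to finish
    raise AssertionError("condition not met after %d attempts" % attempts)
```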
Change-Id: I749e43d4580a7c726a9a8648f71bafefa70a05f5
As it happens, diskfile.read_metadata() and diskfile.write_metadata()
can take either an open file or a filename as their first arguments
(since xattr.[get|set]xattr() can), so we can clean up a couple places
where we were opening a file just to call read_metadata() or
write_metadata() on it. This results in 2 fewer system calls.
Example strace output:
/* read_metadata(filename) */
getxattr("/mnt/sdb1/1/node/sdb1/afile", "user.some.key", 0x0, 0) = 10
getxattr("/mnt/sdb1/1/node/sdb1/afile", "user.some.key", "some-value", 10) = 10
/* fp = open(filename); read_metadata(fp) */
open("/mnt/sdb1/1/node/sdb1/afile", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fgetxattr(4, "user.some.key", 0x0, 0) = 10
fgetxattr(4, "user.some.key", "some-value", 10) = 10
Change-Id: I321d8663b9e9e47b8f3ee6c21a1b65b408bb80e6
This reverts commit 7760f41c3ce436cb23b4b8425db3749a3da33d32
Change-Id: I95e57a2563784a8cd5e995cc826afeac0eadbe62
Signed-off-by: Peter Portante <peter.portante@redhat.com>
except x,y: was deprecated and is removed in Python 3.x.
Use "except x as y:" instead, which works in any Python
version >= 2.6.
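Here is a small example of the portable syntax. `parse_port` is purely illustrative.

```python
def parse_port(value):
    try:
        return int(value)
    except ValueError as err:  # valid in 2.6+ and 3.x; "except ValueError, err:" is 2.x only
        raise ValueError("invalid port %r: %s" % (value, err))
```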
Change-Id: I7008c74b807340f3457d3a0c8bd0b83f23169d14
Currently probetests take advantage of a number of assumptions about the SUT.
Unfortunately after some time with a working SAIO, configuration drift may
result in a system that is no longer compatible with these assumptions. To
help weary developers more quickly identify the changes they've made since
they last ran probetests successfully, some handy validators have been added
to test.probe.common.
Additionally, a new option 'validate_rsync' in test.conf, when enabled, will
run a series of up-front validations during the setup of each probetest by
inspecting the ring, the mounted devices, and the rsync exports ("modules") in
order to ensure that when probetests fail they do so early and with specific
complaints.
To preserve existing failures, the option is disabled by default.
Change-Id: I2be11c7e67ccd0bc0589c360c170049b6288c152
* new module swift.obj.diskfile
I parameterized two constants from obj.server into the DiskFile's __init__
* DATADIR -> obj_dir
* DISALLOWED_HEADERS -> disallowed_metadata_keys
I'm not sure if this is the right long term abstraction but for now it avoids
circular imports.
Change-Id: I3962202c07c4b2fbfc26f9776c8a5c96292ae199
If account autocreation is on and the proxy receives a GET request for
a nonexistent account, it'll fake up a response that makes it look as
if the account exists, but without reifying that account into sqlite
DB files.
That faked-out response was just fine as long as you wanted a
text/plain response, but it didn't handle the case of format=json or
format=xml; in those cases, the response would still be
text/plain. This can break clients, and certainly causes crashes in
swift3. Now, those responses match as closely as possible.
The code for generating an account-listing response has been pulled
into (the new) swift.account.utils module, and both the fake response
and the real response use it, thus ensuring that they can't
accidentally diverge. There's also a new probe test for that
non-divergence.
Also, cleaned up a redundant matching of the Accept header in the code
for generating the account listing.
Note that some of the added tests here pass with or without this code
change; they were added because the code I was changing (parts of the
real account GET) wasn't covered by tests.
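This sketch illustrates one shared renderer used by both the real and the faked-out account GET, so the formats cannot diverge. `account_listing_body` is a hypothetical function with a simplified signature. The XML layout is an assumption loosely modelled on Swift's listing format, not a copy of swift.account.utils.

```python
import json
from xml.sax.saxutils import escape


def account_listing_body(account, containers, fmt):
    """Render an account listing (hypothetical shared helper).

    containers: list of dicts with 'name', 'count' and 'bytes' keys.
    fmt: 'plain', 'json' or 'xml'. Returns (content_type, body).
    """
    if fmt == "json":
        return "application/json; charset=utf-8", json.dumps(containers)
    if fmt == "xml":
        items = "".join(
            "<container><name>%s</name><count>%d</count>"
            "<bytes>%d</bytes></container>"
            % (escape(c["name"]), c["count"], c["bytes"])
            for c in containers)
        body = ('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<account name="%s">%s</account>'
                % (escape(account, {'"': "&quot;"}), items))
        return "application/xml; charset=utf-8", body
    # Default: one container name per line.
    body = "".join(c["name"] + "\n" for c in containers)
    return "text/plain; charset=utf-8", body
```

The faked-out response for an autocreated account would call the same function with an empty container list, so a format=json request gets JSON either way.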
Bug 1183169
Change-Id: I2a3b8e5d9053e4d0280a320f31efa7c90c94bb06
* Fixed issue with running probetests with the latest update
of python-swiftclient that removed eventlet
* Fixed issue with replication server tests to not require hard
coded paths
Change-Id: Ibbf727ae99c0f3893ae58e270e2f879a1f618e49
Support separate replication ip address:
- Added a new function in utils that provides the ability
to select a separate IP address for the replication service.
- Db_replicator and object replicators were changed; the
replication process now uses the new function.
Replication network parameters:
- Support for the replication network fields (replication_ip, replication_port)
was added to the device dictionary in the swift-ring-builder script.
- Changes were made to support the new fields in the search, show and set_info
functions.
Implementation of replication servers:
- Separate replication servers use the same code as normal replication
servers, but with replication_server parameter = True. When using a
separate replication network, the non-replication servers set
replication_server = False. When there is no separate replication
network (the default case), replication_server is not included in the config.
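This is a minimal sketch of selecting the replication address for a ring device. It assumes devices are dicts carrying the `replication_ip`/`replication_port` fields named above, and falls back to `ip`/`port` when those fields are absent. `replication_address` is hypothetical, since the text does not name the new utils function.

```python
def replication_address(dev):
    """Pick the (ip, port) a replicator should use for a ring device.

    Hypothetical helper: falls back to the regular ip/port when the
    device has no replication_ip/replication_port fields.
    """
    ip = dev.get("replication_ip") or dev["ip"]
    port = dev.get("replication_port") or dev["port"]
    return ip, port
```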
DocImpact
Change-Id: Ie9af5bdcdf9241c355e36053ca4adfe49dc35bd0
Implements: blueprint dedicated-replication-network
If mount_check is true (i.e. an SAIO with "real" devices, not loopback),
then the servers will correctly return 507 when given a nonsense path.
The first element is treated as a drive path, and that path isn't
mounted. This patch adds 507 as a valid status response to the server
check.
Change-Id: I1d1bb0ab78fd9ea17323635da7e686182fbdbf13
Currently the timeout for a wsgi server successfully binding to a port
and for a probetest background service to finish starting are hard coded
to 30 seconds. While a reasonable default for most configurations, a
small virtualized environment may need a little more time in order for
probe tests to complete successfully.
This patch adds a 'bind_timeout' option to the DEFAULT section of the
main wsgi servers' config. It also adds a new [probe_test] section with a
'check_server_timeout' option to test.conf.
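This sketch reads the new test.conf option with the standard library. It keeps the old 30-second value as the default when the section or option is missing. `check_server_timeout` is a hypothetical helper named after the option, and the `bind_timeout` server option is not shown.

```python
import configparser


def check_server_timeout(conf_text, default=30):
    """Read [probe_test] check_server_timeout from test.conf-style text.

    Falls back to the previous hard-coded 30 seconds when unset.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback covers both a missing section and a missing option.
    return parser.getint("probe_test", "check_server_timeout",
                         fallback=default)
```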
DocImpact
Change-Id: Ibcaff153c7633bbf32e460fd9dbf04932eddb56f
This change makes the dots prettier during probetests
When calling the resetswift script, the probetests will use subprocess
to redirect stderr to stdout and capture stdout into a buffer. We print
the captured buffer from resetswift's combined stdout/stderr and let
nosetests stdout capturing handle printing the output for debug only if a
test fails.
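This sketch implements the capture technique with only the standard library. `stderr=subprocess.STDOUT` folds stderr into the stdout pipe. Printing the combined text leaves it to the test runner's output capture to show it only for failing tests. `run_and_capture` is hypothetical, and the demo runs a tiny Python command instead of resetswift.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd with stderr folded into stdout; return the combined text."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    raw, _ = proc.communicate()  # second item is None: stderr is not piped
    output = raw.decode("utf-8", "replace")
    # Printing (not writing to the terminal directly) lets the test
    # runner's stdout capture show this only when a test fails.
    print(output)
    if proc.returncode:
        raise subprocess.CalledProcessError(proc.returncode, cmd, output)
    return output


if __name__ == "__main__":
    run_and_capture([sys.executable, "-c", "print('hello')"])
```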
Change-Id: I022512f2ef5a4c43b0e49264bad1bca98c1f0299
This patch merely fixes a selection of files to the point where
pep8 1.3.3 is happy. Most of the errors are indentation related to
continued lines (E126, E127, E128), bracket positions (E124) and the
use of backslash (E502).
Patch 2 fixes David's comments regarding backslash and an odd comment
- thanks David!
Change-Id: I4fbd77ecf5395743cb96acb95fa946c322c16560