If we move it to constraints it's more globally accessible in our code,
but more importantly it's more obvious to ops that everything breaks if
you try to misconfigure different values per service.
Change-Id: Ib8f7d08bc48da12be5671abe91a17ae2b49ecfee
Adds a tool, swift-container-deleter, that takes an account/container
and optional prefix, marker, and/or end-marker; spins up an internal
client; makes listing requests against the container; and pushes the
found objects into the object-expirer queue with a special
application/async-deleted content-type.
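For illustration, an invocation might look like this (flag spellings
assumed from the description above, not verified against the tool's
option parser):

    swift-container-deleter AUTH_test my-container \
        --prefix logs/ --marker logs/2018-01 --end-marker logs/2018-02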
In order to do this enqueuing efficiently, a new internal-to-the-cluster
container method is introduced: UPDATE. It takes a JSON list of object
entries and runs them through merge_items.
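For illustration, the request body is a JSON list whose entries might
look roughly like this (field names assumed to mirror the record format
that merge_items already consumes):

    [{"name": "1515544966-AUTH_test/my-container/some-obj",
      "created_at": "1515544966.00000",
      "size": 0,
      "content_type": "application/async-deleted",
      "etag": "d41d8cd98f00b204e9800998ecf8427e",
      "deleted": 0,
      "storage_policy_index": 0}]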
The object-expirer is updated to look for work items with this
content-type and skip the X-If-Delete-At check that it would normally
do.
Note that the target-container's listing will continue to show the
objects until data is actually deleted, bypassing some of the concerns
raised in the related change about clearing out a container entirely and
then deleting it.
Change-Id: Ia13ee5da3d1b5c536eccaadc7a6fdcd997374443
Related-Change: I50e403dee75585fc1ff2bb385d6b2d2f13653cf8
To simplify unit tests for the object-expirer, this patch unifies the
expirer's task queue setup across the unit tests.
In this patch, the following changes are applied:
1: Unify the expirer's task queue
2: Remove redundant log checking, because tests that care about
logs already check them individually
3: Use mocked methods instead of dummy methods that raise
exceptions with specialized messages
Change-Id: I839f0bb43eb827384727356877e33a6be7d9b81d
To prepare for implementing a general task queue mode in the expirer,
this patch splits the expirer's methods into smaller ones and parameterizes
the task account. This change will make the expirer's general task queue
patch [1] simpler.
This patch takes the following approaches:
1: Split methods into smaller ones
2: Parameterize the task account name to accommodate the multiple
task accounts of the general task queue
3: Include task account names in log messages
4: Skip a task account when the account has no task containers
[1]: https://review.openstack.org/#/c/517389/
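For illustration, the parameterized shape might look roughly like this
(hypothetical names, not the actual methods):

    def iter_task_accounts():
        # the legacy queue has a single task account; the general task
        # queue [1] will yield several
        yield '.expiring_objects'

    for task_account in iter_task_accounts():
        # log messages now carry the task account name, and accounts
        # with no task containers are skipped
        print('Pass beginning for task account %s' % task_account)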
Change-Id: I907612f7c258495e9ccc53c1d57de4791b3e7ab7
The object-expirer's task names should be in the format
"<timestamp>-<account>/<container>/<obj>". In the object-expirer
implementation, a ValueError is caught and handled when the expirer's
task objects have invalid names. But in an actual Swift cluster, invalid
task object names are never created, because task objects are created by
the object-server.
However, without catching the ValueError, some unit tests fail, because
those unit tests create invalid task object names.
This patch fixes the invalid task object names in the unit tests. The
ValueError catch is retained for unexpected errors, but in that case the
task will be skipped.
This patch will help with refactoring the expirer's task object parsing.
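For illustration, parsing such a task object name boils down to
something like this (hypothetical helper, not the expirer's actual
code):

    def parse_task_obj(task_obj):
        # '<timestamp>-<account>/<container>/<obj>'
        timestamp, target_path = task_obj.split('-', 1)
        account, container, obj = target_path.split('/', 2)
        return int(float(timestamp)), account, container, obj

A malformed name makes the tuple unpacking or the int() conversion
raise ValueError, which is exactly the case the expirer now merely
skips.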
Change-Id: I8fab8fd180481ce9e97c945904c5c89eec037110
In particular, test that each work item is only done *once*.
Change-Id: I9cc610bffb2aa9a2f2b05f4c49e574ab56d05201
Related-Change: Ic0075a3718face8c509ed0524b63d9171f5b7d7a
In test_expirer.TestObjectExpirer.test_process_based_concurrency,
an assertion checks that the expirer executes tasks in round-robin
order across target containers. But the assertion depends on the task
object paths, because the task assignment for each process depends on
the md5 of the task object path. This dependency makes the assertion
confusing.
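For reference, the assignment involved is essentially this (simplified,
not the exact expirer code):

    import hashlib

    def is_my_task(task_obj_path, processes, process):
        # each process claims the tasks whose path hashes into its slot
        digest = hashlib.md5(task_obj_path.encode('utf8')).hexdigest()
        return int(digest, 16) % processes == process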
Now we have test_expirer.TestObjectExpirer.test_round_robin_order, which
was added in [1]. So this patch removes the confusing assertion.
This patch will help with refactoring the expirer's task object parsing.
I will push a patch for the refactoring after this one.
[1]: https://review.openstack.org/#/c/538171
Change-Id: Ic0075a3718face8c509ed0524b63d9171f5b7d7a
The object-expirer changes the order of expiration tasks to avoid
continuously deleting objects in a single container.
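A simplified sketch of that ordering (illustrative only; the real code
also deals with task-queue details):

    from collections import defaultdict
    from six.moves import zip_longest

    def round_robin_order(tasks):
        # tasks: iterable of dicts with a 'task_container' key; yield
        # them interleaved so no single container sees a burst of work
        by_container = defaultdict(list)
        for task in tasks:
            by_container[task['task_container']].append(task)
        for batch in zip_longest(*by_container.values()):
            for task in batch:
                if task is not None:
                    yield task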
To make review of the expirer's task queue update patch [1] easier,
this patch refactors the implementation of the order change. In this
patch, the order change is split out into its own function.
In [1], there will be two implementations: one for the legacy task
queue and one for the general task queue. The two implementations share
similar code, so this patch helps avoid duplicating it.
Besides splitting out the function, this patch also:
- Separates container iteration from object iteration, to avoid
terminating the generator with a (container, None) tuple
- Uses the Timestamp class for delete_timestamp, to be consistent with
other modules
- Changes the yielded delete task info from a tuple to a dict, because
it carries several pieces of information (e.g. task_container,
task_object, and target_path)
- Fixes minor docs and tests that depend on the changes above
[1]: https://review.openstack.org/#/c/517389
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de
In the expirer's unit tests, FakeInternalClient instances simulate the
expirer's task queue behavior. But the get_account_info method of
FakeInternalClient returns container count = 1 and object count = 2,
even when the fake client is simulating a different number of containers
or objects.
This patch fixes the behavior: the return values of get_account_info
now equal the simulated container and object counts.
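For illustration, the fixed fake boils down to this (the aco_dict shape
is an assumption of this sketch):

    class FakeInternalClient(object):
        def __init__(self, aco_dict):
            # aco_dict: {account: {container: [object, ...]}}
            self.aco_dict = aco_dict

        def get_account_info(self, account):
            containers = self.aco_dict.get(account, {})
            # return the simulated counts instead of hard-coded 1 and 2
            return len(containers), sum(
                len(objs) for objs in containers.values())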
This patch will make review of the expirer's task queue upgrade patch
[1] easier.
[1]: https://review.openstack.org/#/c/517389
Change-Id: Id5339ea7e10e4577ff22daeb91ec90f08704c98d
In the expirer's unit tests, fake InternalClient classes are defined
whose instances simulate the expirer's task queue behavior.
To make review of the expirer's task queue update patch [1] easier,
this patch refactors the implementation of the fake InternalClient
classes. The unit tests are refactored using the following two
approaches:
#1: Consolidate the duplicated fake InternalClient implementations
#2: Make the task account name variable
Approach #2 is for the multiple task accounts in [1].
The patch [1] will be rebased after this patch is merged.
[1]: https://review.openstack.org/#/c/517389
Change-Id: I10a7151cfdd43460ad38c47f672d3c31b77e7990
Previously, if the expirer had a stale work item (because the object
was overwritten or deleted, or some other process handled the delete),
then it would keep retrying for reclaim_age, but every time it'd get
back a 412.
Now, have the object-server be smart enough to say, "I have more recent
information than you" and let the expirer accept that as success.
Change-Id: I0a94482ed16cb30ce79074e053e6177fe97bcaa9
This boils down to 404, 412, or 416; or 409 when we provided an
X-Timestamp.
This means, among other things, that the expirer won't issue 3 DELETEs
every cycle for every stale work item.
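For illustration, the set of responses the expirer can now take as
"already handled" (simplified, not the exact code):

    def is_stale_work_item_handled(status, sent_x_timestamp):
        # the object-server has newer information than our work item
        if status in (404, 412, 416):
            return True
        # 409 means the X-Timestamp we sent is older than what's on disk
        return status == 409 and sent_x_timestamp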
Related-Change: Icd63c80c73f864d2561e745c3154fbfda02bd0cc
Change-Id: Ie5f2d3824e040bbc76d511a54d1316c4c2503732
After deleting an object, the object expirer deletes the corresponding
row from the expirer queue by making DELETE requests directly to the
container servers. The same thing happens after attempting to delete
an object, but failing because the object has already been deleted. If
the DELETE requests fail, then the expirer will encounter that row
again on its next pass and retry the DELETE at that time. Therefore,
it is not necessary for the object server to write an async_pending
for that queue row's deletion.
Currently, however, two of the object servers do write such
async_pendings. Given Rc container replicas, that's 2 * Rc updates
from async_pendings and another Rc from the object expirer
directly. Given a typical Rc of 3, that's 9 container updates per
expiring object.
This commit makes the object server write no async_pendings for DELETE
requests coming from the object expirer. This reduces the number of
container server requests to Rc (typically 3), all issued directly
from the object expirer.
Closes-Bug: 1076202
Change-Id: Icd63c80c73f864d2561e745c3154fbfda02bd0cc
If you want more information, you need to go check out the *other* node.
Maybe this should be further refined to only log at debug for specific
statuses like 404 and 412?
Partial-Bug: 1688558
Related-Bug: 1455221
Change-Id: Ieefd8841154faba40dcf2a03abc5f056bdccd54f
Improve the test_get_process_values_* methods in obj/test_expirer that
are of the form assertRaises(ValueError, x.get_process_values, {}/vals)
to use the assertRaises context-manager form. This improves
understandability by validating the error strings in addition to the
ValueError.
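For illustration, the shape of the change (the expected message shown
here is illustrative):

    # before
    self.assertRaises(ValueError, x.get_process_values, vals)

    # after
    with self.assertRaises(ValueError) as ctx:
        x.get_process_values(vals)
    self.assertEqual(str(ctx.exception),
                     'process must be less than processes')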
Related-Change: I3d12b79470d122b2114f9ee486b15d381f290f95
Change-Id: I1c66b8894cba8328d19cf99491a8ad18ded71078
This is a follow-up to a change that improved the error message.
Related-Change: I3d12b79470d122b2114f9ee486b15d381f290f95
Change-Id: I093801f3516a60b298c13e2aa026c11c68a63792
Currently, the expirer daemon treats 412 (Precondition Failed)
responses as successful DELETEs.
On the other hand, it treats 404 as a failure until reclaim_age
(usually a week) has passed.
This patch unifies both cases to the same handling: waiting for
reclaim_age to pass, then deleting the entry.
The reason the expirer should not delete a 412 entry right away is
that the 412 might be returned because of a split brain, where the
updated object servers are currently down. The same reasoning holds
for a 404 response.
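For illustration, the unified handling boils down to this (simplified;
names assumed):

    import time

    def should_pop_queue_entry(status, task_timestamp, reclaim_age):
        if 200 <= status < 300:
            return True      # really deleted; clear the queue entry
        if status in (404, 412):
            # possibly a split brain; only give up once the task is
            # older than reclaim_age
            return time.time() - float(task_timestamp) > reclaim_age
        return False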
Change-Id: Icabbdd72746a211b68f266a49231881f0f4ace94
As reported in bug/1546067, the expirer might accidentally delete an
object that was created after the x-delete-at timestamp. This is because
the expirer sends a request with "X-Timestamp: <current_timestamp>" and
the resulting tombstone is named <requested_x_timestamp>.ts, so if the
object's creation time falls between x-delete-at and the X-Timestamp of
the expirer's DELETE request, the object can be hidden by the tombstone.
This possibility can be removed simply by having the expirer send an
X-Timestamp equal to the x-delete-at of the actual object. Namely, the
expirer pretends to delete the object at the time the user really
wanted it deleted.
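For illustration, the expirer's DELETE now looks roughly like this
(self.swift is the expirer's InternalClient; the acceptable-status
tuple is illustrative):

    self.swift.make_request(
        'DELETE', '/v1/%s' % target_path,
        {'X-If-Delete-At': str(delete_timestamp),
         'X-Timestamp': str(delete_timestamp)},
        (2,))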
Change-Id: I53e343f4e73b0b1c4ced9a3bc054541473d26cf8
Closes-Bug: #1546067
The urllib, urllib2 and urlparse modules of Python 2 were reorganized
into a new urllib namespace on Python 3. Replace urllib, urllib2 and
urlparse imports with six.moves.urllib to make the modified code
compatible with Python 2 and Python 3.
The initial patch was generated by the urllib operation of the sixer
tool on: bin/* swift/ test/.
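The resulting import style looks like:

    from six.moves.urllib.parse import quote, urlparse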
Change-Id: I61a8c7fb7972eabc7da8dad3b3d34bceee5c5d93
The unicode type was renamed to str in Python 3. Use six.text_type to
make the modified code compatible with Python 2 and Python 3.
The initial patch was generated by the unicode operation of the sixer
tool on: bin/* swift/ test/.
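The typical replacement pattern:

    import six

    name = u'\N{SNOWMAN}'
    if isinstance(name, six.text_type):  # u'' on py2, str on py3
        name = name.encode('utf8')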
Change-Id: I9e13748ccde36ee8110756202d55d3ae945d4860
* Get FakeConn ready for expect 100 continue
* Use debug_logger more and with better interfaces
* Fix patch_policies to be less annoying
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I28c0a3539d994cbb8e6b94d63a23ed4ea6cb956d
Looks like I wasn't careful enough last time I fixed this bug - this
is basically the same deal, just further down the line. What happens
without this fix is you upload an object with a unicode name and
an x-delete-after header, and when it does expire the customer object
gets removed but the expirer marker object doesn't.
Change-Id: I82359c87a0919d71693e49ecf08e0f1eedc9d18e
This patch fixes the unit tests to remove the temporary directories
created during unit test runs. Some unit tests did not correctly tear
down what they had set up, which over time would bloat the tmp
directory. As of this writing, around 49 tmp directories were left
uncleared per round of unit tests. This patch fixes that.
Change-Id: If591375ca9cc87d52c7c9c6dc16c9fb4b49e99fc
When the expirer tries to delete customer objects, if it just walks through
the containers in order, the daemon will tend to send DELETEs to the same
container highly concurrently. This in turn creates a lot of asyncs because
of all the concurrent deletes. Spreading the deletes across multiple
containers improves performance and decreases the number of asyncs created.
Change-Id: I3d08118c197b7f18dd7e880bd5664508934ffd24
If the container names in the expirer's account are returned as
unicode strings (as is the case with some json libraries), the
expirer compared e.g. u'1' == '1', which is problematic. This patch
ensures that the unicode is coerced to ascii so the comparison
is correct.
Change-Id: I72b322e7513f7da32e8dc75c6bf0e7e016948c88
Currently, if the object-expirer goes to delete an object and the primary
nodes are unavailable, or the object is on handoffs - the object servers are
unable to verify the x-if-delete-at timestamp and return 412, without writing
a tombstone or updating the containers. The expirer treats 412 as success, so
the dark data is not removed from the object servers, nor is the object
removed from the listing.
As a side effect of this bug, if the expirer encounters split brain the delete
would never get processed in the correct storage policy.
It seems it's just not correct to treat the lack of data as success. Now the
object server will treat x-if-delete-at against a non-existent object as a
404, and to distinguish that from successful processing of an x-if-delete-at
request, will return 204 on success.
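For illustration, the object-server's new decision is roughly this (a
sketch, not the actual diff):

    from swift.common.swob import (HTTPNoContent, HTTPNotFound,
                                   HTTPPreconditionFailed)

    def check_x_if_delete_at(request, orig_timestamp, orig_delete_at):
        if not orig_timestamp:
            # no data at all: we cannot verify the precondition, so
            # report 404 instead of pretending the check succeeded
            return HTTPNotFound(request=request)
        if int(request.headers['x-if-delete-at']) != int(orig_delete_at or 0):
            return HTTPPreconditionFailed(request=request)
        # success returns 204, distinguishing a processed expirer
        # delete from the 404 above
        return HTTPNoContent(request=request)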
The expirer will treat a 404 response from swift as a failure, and will
continue to attempt to expire the object until it is older than its
configurable reclaim age. However, swift will only return 404 if the majority
of nodes do so; if even a single node is able to accept the x-if-delete-at
request, the containers will get updated and replication will settle the
tombstone - the subsequent x-if-delete-at request will 412 and the entry will
be removed from the queue.
It's worth noting that if an object with x-delete-at metadata is DELETEd (by
a client request), an async update for the expiring-objects containers will
be processed to remove the queue entry - but if no primary nodes handle the
DELETE request, replication will never remove the expiring entry, and
assuming it's scheduled beyond the tombstone's reclaim age, the queue entry
will not be processable. In this case the expirer will attempt to DELETE the
object (and get 404s) in vain until the queue entry passes the configurable
reclaim age.
DocImpact
Implements: blueprint storage-policies
Change-Id: I66260e99fda37e97d6d2470971b6f811ee9e01be
One can argue that it makes sense for the client-facing proxy server
to have certain middlewares like gatekeeper in its pipeline, but that
is not desirable for InternalClient. In particular, it prevents you
from passing in sysmeta headers using InternalClient, and I found
myself wanting to do that earlier today.
Now InternalClient's proxy application gets exactly what's configured;
no more, no less. This will mean that the object expirer can read and
write sysmeta headers, but I think we can trust it to keep our
secrets.
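For example, something along these lines now works (the conf path and
sysmeta header are illustrative):

    from swift.common.internal_client import InternalClient

    client = InternalClient('/etc/swift/internal-client.conf',
                            'example-agent', request_tries=3)
    client.set_object_metadata(
        'AUTH_test', 'c', 'o',
        {'X-Object-Sysmeta-Example': 'stash me'})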
Change-Id: I17b4a89c24e600754701ee1645b40406421fa6f3
Remove the useless arg ("start index" = 0) in files, since its default
value is 0, to make the code cleaner.
Fixes bug #1259750
Change-Id: I52afac28a3248895bb1c012a5934d39e7c2cc5a9
except x,y: was deprecated and is removed in Python 3.x.
Use "except x as y:" instead, which works in any Python
version >= 2.6.
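For example:

    try:
        int('not a number')
    # old form, invalid on Python 3: "except ValueError, err:"
    except ValueError as err:  # works on Python >= 2.6 and Python 3
        print(err)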
Change-Id: I7008c74b807340f3457d3a0c8bd0b83f23169d14
Address all the "hacking" lines that are flagged, and all the modules
that just have one item flagged.
Change-Id: I372a4bdf9c7748f73e38c4fd55e5954f1afade5b
Signed-off-by: Peter Portante <peter.portante@redhat.com>
Two types of parallelism are added:
- concurrency to speed up what a single process does
- a way to run multiple daemons to work on different parts of the work
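The corresponding object-expirer.conf knobs look something like this
(values illustrative):

    [object-expirer]
    # greenthreads within one daemon
    concurrency = 3
    # run 4 cooperating daemons; give each a distinct process value 0-3
    processes = 4
    process = 0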
DocImpact
Change-Id: I48997f68eb2fd8de19a5ee8b9fcdf76dde2ba0ab
These bug fixes are lumped together because they all caused problems
with the object expirer doing its job.
There was a bug with the internal client doing listings that happened
to run across a Unicode object name for use as a marker.
There was a bug with the object expirer not utf8 encoding object
names it got from json listings, causing deletes to fail.
There was a bug with the object expirer url-quoting object names when
calling the internal client's make_request, even though make_request
already handles that.
Change-Id: I29fdd351fd60c8e63874b44d604c5fdff35169d4
Expand recon middleware to include support for account and container
servers in addition to the existing object servers. Also add support
for retrieving recent information from auditors, replicators, and
updaters. In the case of certain checks (such as container auditors)
the stats returned are only for the most recent path processed.
The middleware has also been refactored and should now also handle
errors better in cases where stats are unavailable.
While new checks have been added, the output from pre-existing
checks has not changed. This should allow existing 3rd party
utilities, such as the Swift ZenPack, to continue to function.
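For example, one might query a container server's recon endpoint
directly (host, port, and path illustrative):

    curl http://127.0.0.1:6201/recon/replication/container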
Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7
Documentation, including a list of metrics reported and their semantics,
is in the Admin Guide in a new section, "Reporting Metrics to StatsD".
An optional "metric prefix" may be configured which will be prepended to
every metric name sent to StatsD.
Here is the rationale for doing a deep integration like this versus only
sending metrics to StatsD in middleware. It's the only way to report
some internal activities of Swift in a real-time manner. So to have one
way of reporting to StatsD and one place/style of configuration, even
some things (like, say, timing of PUT requests into the proxy-server)
which could be logged via middleware are consistently logged the same
way (deep integration via the logger delegate methods).
When log_statsd_host is configured, get_logger() injects a
swift.common.utils.StatsdClient object into the logger as
logger.statsd_client. Then a set of delegate methods on LogAdapter
either pass through to the StatsdClient object or become no-ops. This
allows StatsD logging to look like:
self.logger.increment('some.metric.here')
and do the right thing in all cases and with no messy conditional logic.
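The relevant configuration looks something like this (option names per
the description above; values illustrative):

    log_statsd_host = statsd.example.com
    log_statsd_port = 8125
    log_statsd_metric_prefix = proxy01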
I wanted to use the pystatsd module for the StatsD client, but the
version on PyPi is lagging the git repo (and is missing both the prefix
functionality and timing_since() method). So I wrote my own
swift.common.utils.StatsdClient. The interface is the same as
pystatsd.Client, but the code was written from scratch. It's pretty
simple, and the tests I added cover it. This also frees Swift from an
optional dependency on the pystatsd module, making this feature easier
to enable.
There's test coverage for the new code and all existing tests continue
to pass.
Refactored out _one_audit_pass() method in swift/account/auditor.py and
swift/container/auditor.py.
Fixed some misc. PEP8 violations.
Misc test cleanups and refactorings (particularly the way "fake logging"
is handled).
Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2