DiskFile already fills in the _ondisk_info attribute when it tries to open
a diskfile - even if the DiskFile's fileset is invalid or deleted.
During this process the rsync tempfiles would be discovered and logged,
but no one would attempt to clean them up - even if they were really old.
Instead of logging and ignoring unexpected files when validating a
DiskFile fileset, we'll add unexpected files under the 'unexpected' key
in the _ondisk_info attribute.
With a little bit of re-organization in the auditor's object_audit method
to get things into a single return path we can add an unconditional check
for unexpected files and remove those that are "old enough".
Since the replicator will kill any rsync processes that are running longer
than the configured rsync_timeout we know that any rsync tempfiles older
than this can be deleted.
Split unlink_older_than in common.utils into two functions so that an
explicit list of previously discovered paths can be passed in, avoiding
an extra listdir. Since the getmtime handling already ignores OSError,
there's less concern of a race condition where a previously discovered
unexpected file is reaped by rsync while we're attempting to clean it up.
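A minimal sketch of the split; the exact bodies here are assumed from
the description above:

    import os

    def unlink_paths_older_than(filepaths, mtime):
        # remove each path whose mtime is older than the given time;
        # OSError is ignored, e.g. when rsync reaps the file first
        for fpath in filepaths:
            try:
                if os.path.getmtime(fpath) < mtime:
                    os.unlink(fpath)
            except OSError:
                pass

    def unlink_older_than(path, mtime):
        # original behavior, now delegating to the helper above
        paths = (os.path.join(path, f) for f in os.listdir(path))
        unlink_paths_older_than(paths, mtime)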
Update some doc on the new config option.
Closes-Bug: #1554005
Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283
This change adds two new parameters to enable and control concurrent
GETs in swift: 'concurrent_gets' and 'concurrency_timeout'.
'concurrent_gets' allows you to turn concurrent GETs on or off. When
on, it sets the GET/HEAD concurrency to the replica count - or, in the
case of EC HEADs, to ndata.
The proxy will then serve only the first valid source to respond.
This applies to all account, container and object GETs except for EC.
For EC, only HEAD requests are affected.
It achieves this by changing the request sending mechanism to use
GreenAsyncPile and green threads, with a timeout between each request.
'concurrency_timeout' is related to concurrent_gets and is the amount
of time to wait before firing the next thread. A value of 0 fires all
requests at the same time (fully concurrent); any other value staggers
the firing, giving each node a short window to respond before the next
request is fired. This value is a float and should be somewhere between
0 and node_timeout. The default is conn_timeout, meaning that by
default the firing is staggered.
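A simplified sketch of the staggered fan-out (the real code checks for
responses while it staggers rather than sleeping unconditionally, and
'valid' is more involved than a 200 check):

    import eventlet
    from swift.common.utils import GreenAsyncPile

    def first_valid_response(nodes, make_request, concurrency_timeout):
        pile = GreenAsyncPile(len(nodes))
        for node in nodes:
            pile.spawn(make_request, node)
            # 0 means fully concurrent; anything larger staggers
            eventlet.sleep(concurrency_timeout)
        for resp in pile:  # results arrive in completion order
            if resp is not None and resp.status == 200:
                return resp  # serve the first valid source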
DocImpact
Implements: blueprint concurrent-reads
Change-Id: I789d39472ec48b22415ff9d9821b1eefab7da867
There was a function in swift.common.utils that was importing
swob.HeaderKeyDict at call time. It couldn't import it at module load
time, since utils can't import from swob or else it blows up with a
circular import error.
This commit just moves HeaderKeyDict into swift.common.header_key_dict
so that we can remove the inline import.
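The resulting pattern, roughly (module bodies elided):

    # swift/common/header_key_dict.py -- imports nothing from swob or
    # utils, so anyone can import it without creating a cycle
    class HeaderKeyDict(dict):
        """dict that normalizes header-name keys (body elided)."""

    # swift/common/utils.py -- the import can now be at module level
    from swift.common.header_key_dict import HeaderKeyDict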
Change-Id: I656fde8cc2e125327c26c589cf1045cb81ffc7e5
This patch makes a number of changes to enable content-type
metadata to be updated when using the fast-POST mode of
operation, as proposed in the associated spec [1].
* the object server and diskfile are modified to allow
content-type to be updated by a POST and the updated value
to be stored in .meta files.
* the object server accepts PUTs and DELETEs with older
timestamps than existing .meta files. This is to be
consistent with replication that will leave a later .meta
file in place when replicating a .data file.
* the diskfile interface is modified to provide accessor
methods for the content-type and its timestamp.
* the naming of .meta files is modified to encode two
timestamps when the .meta file contains a content-type value
that was set prior to the latest metadata update; this
enables consistency to be achieved when rsync is used for
replication (a naming sketch follows this list).
* ssync is modified to sync meta files when content-type
differs between local and remote copies of objects.
* the object server issues container updates when handling
POST requests, notifying the container server of the current
immutable metadata (etag, size, hash, swift_bytes),
content-type with their respective timestamps, and the
mutable metadata timestamp.
* the container server maintains the most recently reported
values for immutable metadata, content-type and mutable
metadata, each with their respective timestamps, in a single
db row.
* new probe tests verify that replication achieves eventual
consistency of containers and objects after discrete updates
to content-type and mutable metadata, and that container-sync
syncs objects after fast-post updates.
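A purely hypothetical sketch of how a .meta name might carry both
timestamps; the actual on-disk encoding may differ:

    from swift.common.utils import Timestamp

    def make_meta_filename(t_meta, t_ctype):
        # hypothetical helper: if content-type is older than the last
        # metadata update, encode its offset into the file name so
        # rsync replicates both timestamps in one file
        if t_ctype == t_meta:
            return '%s.meta' % t_meta.internal
        delta = t_meta.raw - t_ctype.raw
        return '%s-%x.meta' % (t_meta.internal, delta)

    make_meta_filename(Timestamp('1441203580.98264'),
                       Timestamp('1441203574.44418'))
    # => '1441203580.98264-9fa16.meta'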
[1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e
Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
Proxy-server currently requires Content-Length in the response header
when getting an object and does not support chunked transfer with
"Transfer-Encoding: chunked".
This doesn't matter in normal swift, but it prohibits us from adding
middlewares that do something like streaming processing of objects,
which can't calculate the length of their response body before they
start to send the response.
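For instance, a hypothetical streaming middleware can only drop
Content-Length and emit chunks as they are produced ('transform' here
is an assumed, length-changing function):

    class StreamingFilter(object):
        # hypothetical WSGI middleware: output length is unknowable
        def __init__(self, app):
            self.app = app

        def __call__(self, env, start_response):
            def _start_response(status, headers, exc_info=None):
                # the transformed length is unknown, so Content-Length
                # must go; the server falls back to chunked transfer
                headers = [(h, v) for h, v in headers
                           if h.lower() != 'content-length']
                return start_response(status, headers, exc_info)

            app_iter = self.app(env, _start_response)
            return (transform(chunk) for chunk in app_iter)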
Change-Id: I60fc6c86338d734e39b7e5f1e48a2647995045ef
In common/test_utils.py, TestStatsdLogging had the majority of its
test cases calling the real socket.getaddrinfo(), which uses real
DNS. This is very slightly slower than using a mock getaddrinfo() when
the machine running the tests has functioning DNS, but on a machine
with no network connection at all, the tests are excruciatingly slow
due to timeouts.
This commit mocks things out as appropriate. There's still one user of
the real getaddrinfo(), but it's for ::1, so that's just local
resolution based on /etc/hosts.
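The mocking looks something like this (conf values illustrative):

    import socket
    import mock  # unittest.mock on Python 3
    from swift.common import utils

    fake_addrinfo = [(socket.AF_INET, socket.SOCK_DGRAM,
                      socket.IPPROTO_UDP, '', ('127.0.0.1', 8125))]
    with mock.patch('socket.getaddrinfo', return_value=fake_addrinfo):
        logger = utils.get_logger({'log_statsd_host': 'some.host'},
                                  'some-name')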
Timing numbers for "./.unittests test.unit.common.test_utils:TestStatsdLogging":
* network, without this patch: 1.8s
* no network, without this patch: 221.2s (ouch)
* network, with this patch: 1.1s
* no network, with this patch: 1.1s
Change-Id: I1a2d6f24fc9bb928894fb1fd8383516250e29e0c
As swift no longer supports Python 2.6, replace assertEqual(None, *)
with assertIsNone in tests to get clearer messages in case of
failure.
Change-Id: I94af3e8156ef40465d4f7a2cb79fb99fc7bbda56
Closes-Bug: #1280522
The log_statsd_host value can now be an IPv6 address or a hostname
which only resolves to an IPv6 address. In both cases, the new
behavior is to use an AF_INET6 socket on which .sendto() is called
with the originally-configured hostname (or IP). This means the
Swift process is not caching a DNS resolution for the lifetime of
the process (a good thing).
If a hostname resolves to both an IPv6 and an IPv4 address, an AF_INET
socket is used (i.e. only the IPv4 address will receive the UDP
packet).
The old behavior is preserved: any invalid IP address literals and
failures in DNS resolution or actual StatsD packet sending do not
halt the process or bubble up; they are caught, logged, and
otherwise ignored.
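A sketch of the family-selection logic described above (function name
assumed):

    import socket

    def statsd_socket_family(host, port):
        # prefer AF_INET when the name resolves to IPv4 (or to both
        # families); otherwise fall back to AF_INET6
        addrs = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                                   socket.SOCK_DGRAM)
        families = set(info[0] for info in addrs)
        if socket.AF_INET in families:
            return socket.AF_INET
        return socket.AF_INET6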
Change-Id: Ibddddcf140e2e69b08edf3feed3e9a5fa17307cf
Make the result of Timestamp(x) != Timestamp(x) be False.
In python 2.7 this requires the __ne__ method to be defined [1].
"The truth of x==y does not imply that x!=y is false." The
functools.total_ordering decorator does not autocreate a __ne__
method.
In python 3 the __ne__ method is not required [2]. "By default,
__ne__() delegates to __eq__() and inverts the result".
This patch puts back the __ne__ method removed in [3]. Whilst no tests
fail on master with python 2.7, they do with change [4] applied, and it
seems dangerous to leave this absurd behaviour lurking.
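A minimal illustration of the fix:

    from functools import total_ordering

    @total_ordering
    class Timestamp(object):  # simplified
        def __init__(self, timestamp):
            self.timestamp = float(timestamp)

        def __float__(self):
            return self.timestamp

        def __eq__(self, other):
            return self.timestamp == float(other)

        def __lt__(self, other):
            return self.timestamp < float(other)

        def __ne__(self, other):
            # needed on Python 2: neither the language nor
            # functools.total_ordering derives __ne__ from __eq__
            return not (self == other)

    assert not (Timestamp(1) != Timestamp(1))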
[1] https://docs.python.org/2/reference/datamodel.html#object.__ne__
[2] https://docs.python.org/3.4/reference/datamodel.html#object.__ne__
[3] Change-Id: Id26777ac2c780316ff10ef7d954c48cc1fd480b5
[4] Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
Change-Id: I01fbfa310df3c74390f8e8c2e9ffff81bbf05e47
If "Permission Denied" has happen in NamedTemporaryFile function in
dump_recon_cache method, swift will log a message of reference to a variable
without assignment and not log a message of "Permission Denied".
This patch fixes the handling and add an unit test.
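The shape of the fix might look like this (simplified; the real
function also merges existing cache contents):

    import json
    import os
    import tempfile

    def dump_recon_cache(cache_dict, cache_file, logger):
        # initialize tf *before* the try block so the error path can
        # never reference it before assignment when
        # NamedTemporaryFile() itself fails (e.g. Permission Denied)
        tf = None
        try:
            with tempfile.NamedTemporaryFile(
                    dir=os.path.dirname(cache_file), delete=False) as tf:
                tf.write(json.dumps(cache_dict).encode('utf8') + b'\n')
            os.rename(tf.name, cache_file)
        except Exception:
            logger.exception('Exception dumping recon cache')
            if tf is not None:
                try:
                    os.unlink(tf.name)
                except OSError:
                    pass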
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Iafdd94905e9e9c81f5966a923324b50c18fcf592
On Python 3, next(obj) calls obj.__next__(), not obj.next(). Add an
alias from __next__() to next() to be compatible with Python 2 and
Python 3.
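The pattern, as applied to an iterator class:

    class FileLikeIter(object):  # illustrative excerpt
        def __init__(self, iterable):
            self.iterator = iter(iterable)

        def next(self):
            return next(self.iterator)

        __next__ = next  # Python 3 calls obj.__next__(), not obj.next()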
Change-Id: Ida104d3bd7cdba557e523f18df43d56847060054
Port swift.common.utils.parse_mime_headers() to Python 3:
* On Python 3, try to decode headers from UTF-8. If a header was not
encoded as UTF-8, decode it from Latin1 (sketched below).
* Update the parse_mime_headers() tests: on Python 3, HTTP header
values are Unicode strings.
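The decoding strategy, roughly (helper name is an assumption):

    def _decode_header(raw):  # hypothetical helper
        # prefer UTF-8; anything that is not valid UTF-8 is decoded
        # as Latin1, which cannot fail
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return raw.decode('latin1')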
This change is a follow-up of the change
Ia5ee2ead67e36e8c6416183667f64ae255887736.
Change-Id: I042dd13e9eb0e9844ccd832d538cdac84359ed42
Port FileLikeIter and _MultipartMimeFileLikeObject in
swift.common.utils to Python 3:
* Add a __next__() alias to the next() method. On Python 3, the
next() method is no longer used; __next__() is required.
* Use literal byte strings: FileLikeIter and
_MultipartMimeFileLikeObject are written to handle binary files.
* test_close(): replace FileLikeIter('abcdef') with
FileLikeIter([b'a', b'b', b'c']). On Python 3, list(b'abc') returns
[97, 98, 99], whereas ['a', 'b', 'c'] is returned on Python 2.
* Update unit FileLikeIter tests to use byte strings.
Change-Id: Ibacddb70b22f624ecd83e374749578feddf8bca8
The patch moves the MemcacheConnPool._get_addr() method to a function
in swift.common.utils. The function is renamed to parse_socket_string()
and the documentation is updated accordingly. The test for it has also
been moved.
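Illustrative usage (return values shown as comments):

    from swift.common.utils import parse_socket_string

    parse_socket_string('1.2.3.4:5000', 11211)  # ('1.2.3.4', '5000')
    parse_socket_string('[::1]', 11211)         # ('::1', 11211)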
Change-Id: Ida65b2fded28d0a059e668646f5b89714298f348
Refactor the disk file get_ondisk_files logic to enable
ECDiskFile to gather *all* fragments found on disk (not just those
with a matching .durable file) and make the fragments available
via the DiskFile interface as a dict mapping:
Timestamp --> list of fragment indexes
Also, if a durable fragment has been found then the timestamp
of the durable file is exposed via the diskfile interface.
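The shape of the interface might look like this (attribute names
assumed, values illustrative; df_mgr is an existing diskfile manager):

    df = df_mgr.get_diskfile(device, partition, acct, cont, obj)
    df.fragments
    # => {Timestamp('1425354620.12392'): [0, 2],
    #     Timestamp('1425354621.55301'): [1]}
    df.durable_timestamp  # timestamp of the .durable file, if found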
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I55e20a999685b94023d47b231d51007045ac920e
* StatsdClient._send(): on Python 3, encode parts to UTF-8 and
replace '|' with b'|' to join parts.
* timing_stats(): replace func.func_name with func.__name__. The
func_name attribute of functions was removed on Python 3, whereas
the __name__ attribute is available on Python 2 and Python 3.
* Fix unit tests to use bytes
Change-Id: Ic279c9b54e91aabcc52587eed7758e268ffb155e
The addition of range support for SLO segments (commit 25d5e68)
required the range size to be at least the SLO minimum segment size
(default 1 MiB). However, if you're doing something like assembling a
video of short clips out of a larger one, then you might not need a
full 1 MiB.
The reason for the 1 MiB restriction was to protect Swift from
resource overconsumption. It takes CPU, RAM, and internal bandwidth to
connect to an object server, so it's much cheaper to serve a 10 GiB
SLO if it has 10 MiB segments than if it has 10 B segments.
Instead of a strict limit, now we apply ratelimiting to small
segments. The threshold for "small" is configurable and defaults to 1
MiB. SLO segments may now be as small as 1 byte.
If a client makes SLOs as before, it'll still be able to download the
objects as fast as Swift can serve them. However, a SLO with a lot of
small ranges or segments will be slowed down to avoid resource
overconsumption. This is similar to how DLOs work, except that DLOs
ratelimit *every* segment, not just small ones.
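Internally this can be expressed as a conditional rate limit,
something like the following sketch (names are assumptions):

    from swift.common.utils import RateLimitedIterator

    def ratelimit_small_segments(segment_iter, rate, threshold):
        # hypothetical wrapper: only segments under the threshold
        # count against the rate limit
        return RateLimitedIterator(
            segment_iter, rate,
            ratelimit_if=lambda seg: seg['bytes'] < threshold)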
UpgradeImpact
For operators: if your cluster has enabled ratelimiting for SLO, you
will want to set rate_limit_under_size to a large number prior to
upgrade. This will preserve your existing behavior of ratelimiting all
SLO segments. 5368709123 is a good value, as that's 1 greater than the
default max object size. Alternately, hold down the 9 key until you
get bored.
If your cluster has not enabled ratelimiting for SLO (the default), no
action is needed.
Change-Id: Id1ff7742308ed816038a5c44ec548afa26612b95
keystoneclient uses threading.Lock(), but swift doesn't monkeypatch
threading; this results in a lockup when two greenthreads try to
acquire a non-green lock.
This change fixes that.
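The essence of the fix, as a sketch (the actual call site and
arguments may differ):

    import eventlet

    # make threading.Lock cooperative, so a lock held by one
    # greenthread cannot deadlock another
    eventlet.monkey_patch(thread=True)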
Change-Id: I9b44284a5eb598a6978364819f253e031f4eaeef
Closes-bug: #1508424
This patch adds some unit tests to prevent regression and to describe
the validate_hash_path behavior I found while reviewing
https://review.openstack.org/#/c/231864/
*bonus*
- Fix test_hash_path to use "with" syntax instead of "try/finally"
for assigning a testing value to a global variable.
Change-Id: I948999a8fb8addb9a378dbf8bee853b205aeafad
contextlib.nested() is missing completely in Python 3.
Since 2.7, we can use multiple context managers in a 'with' statement,
like so:
with thing1() as t1, thing2() as t2:
do_stuff()
Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.
Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
* LogAdapter.exception(): on Python 3, socket.error is an alias of
OSError, so group both exceptions to support Python 3.
* ThreadPool: call GreenPipe() with positional parameters; don't pass
the third parameter by name, since the parameter name changed
between Python 2 (bufsize) and Python 3 (buffering).
* strip_value() in test_utils: StringIO.getvalue() now requires
seek(0), otherwise the buffer is filled with null characters.
* test_lock_file(): on Python 3, seek(0) is now required to go to the
beginning of a file opened in append mode. In append mode, Python
starts at the end of the file.
Change-Id: I4e56a51690f016a0a2e1354380ce11cff1891f64
backward() is written to handle binary files:
* Replace literal native strings with literal byte strings: add the
b'...' prefix
* Update unit tests: use byte strings
* TemporaryFile(): use the default mode 'w+b', instead of using 'r+w'
('r+w' mode creates a Unicode file on Python 3).
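Illustrative usage (the path is an example):

    from swift.common.utils import backward

    # backward() yields a binary file's lines in reverse order, so
    # the file must be opened in binary mode
    with open('/var/log/swift/storage.log', 'rb') as f:
        for line in backward(f):
            print(line)  # a byte string, last line first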
Change-Id: Ic91f7e6c605db0b888763080d49f0f501029837f
assertEquals is deprecated in py3, so replace it with assertEqual.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
The next() method of Python 2 generators was renamed to __next__().
Call the builtin next() function instead which works on Python 2 and
Python 3.
The patch was generated by the next operation of the sixer tool.
Change-Id: Id12bc16cba7d9b8a283af0d392188a185abe439d
The rfc822 module has been deprecated since Python 2.3, and in
particular is absent from the Python 3 standard library. However, Swift
uses instances of rfc822.Message in a number of places, relying on its
behavior of immediately parsing the headers of a file-like object
without consuming the body, leaving the position of the file at the
start of the body. Python 3's http.client has an undocumented
parse_headers function with the same behavior, which inspired the new
parse_mime_headers utility introduced here. (The HeaderKeyDict returned
by parse_mime_headers doesn't have a `.getheader(key)` method like
rfc822.Message did; the dictionary-like `[key]` or `.get(key)` interface
should be used exclusively.)
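Usage is roughly as follows (values illustrative):

    import io
    from swift.common.utils import parse_mime_headers

    fp = io.BytesIO(b'Content-Type: text/plain\r\n'
                    b'X-Timestamp: 1441203574.44418\r\n'
                    b'\r\n'
                    b'body bytes')
    headers = parse_mime_headers(fp)
    headers.get('content-type')  # 'text/plain' (case-insensitive)
    fp.read()                    # b'body bytes'; position at the body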
The implementation in this commit won't actually work with Python 3, the
email.parser.Parser().parsestr of which expects a Unicode string, but it
is believed that this can be addressed in followup work.
Change-Id: Ia5ee2ead67e36e8c6416183667f64ae255887736
ssync currently does the wrong thing when replicating object dirs
containing both a .data and a .meta file. The ssync sender uses a
single PUT to send both object content and metadata to the receiver,
using the metadata (.meta file) timestamp. This results in the object
content timestamp being advanced to the metadata timestamp,
potentially overwriting newer object data on the receiver and causing
an inconsistency with the container server record for the object.
For example, replicating an object dir with {t0.data(etag=x), t2.meta}
to a receiver with t1.data(etag=y) will result in the creation of
t2.data(etag=x) on the receiver. However, the container server will
continue to list the object as t1(etag=y).
This patch modifies ssync to replicate the content of .data and .meta
separately using a PUT request for the data (no change) and a POST
request for the metadata. In effect, ssync replication replicates the
client operations that generated the .data and .meta files so that
the result of replication is the same as if the original client requests
had persisted on all object servers.
Apart from maintaining correct timestamps across sync'd nodes, this has
the added benefit of not needing to PUT objects when only the metadata
has changed and a POST will suffice.
Taking the same example, ssync sender will no longer PUT t0.data but will
POST t2.meta resulting in the receiver having t1.data and t2.meta.
The changes are backwards compatible: an upgraded sender will only sync
data files to a legacy receiver and will not sync meta files (fixing the
erroneous behavior described above); a legacy sender will operate as
before when sync'ing to an upgraded receiver.
Changes:
- diskfile API provides methods to get the data file timestamp
as distinct from the diskfile timestamp.
- the tuple returned by diskfile yield_hashes now carries, in its
timestamp field, a dict mapping data and meta (if any) file types to
their respective timestamps (see the sketch after this list).
- ssync_sender will encode data and meta timestamps in the
(hash_path, timestamp) tuple sent to the receiver during
missing_checks.
- ssync_receiver compares sender's data and meta timestamps to any
local diskfile and may specify that only data or meta parts are sent
during updates phase by appending a qualifier to the hash returned
in its 'wanted' list.
- ssync_sender now sends POST subrequests when a meta file
exists and its content needs to be replicated.
- ssync_sender may send *only* a POST if the receiver indicates that
is the only part required to be sync'd.
- object server will allow PUT and DELETE with an earlier timestamp
than a POST
- Fixed TODO related to replicated objects with fast-POST and ssync
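An illustrative sketch of the new shapes (values and wire format here
are illustrative, not authoritative):

    from swift.common.utils import Timestamp

    # yield_hashes item: both timestamps ride in the timestamp field
    (object_hash, {'ts_data': Timestamp('1441203574.44418'),
                   'ts_meta': Timestamp('1441203580.98264')})

    # missing_check line from the sender, with a meta qualifier, and
    # the receiver's 'wanted' reply asking for data and meta parts:
    # '<hash> <ts_data> m:<offset>'
    # '<hash> dm'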
Related spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: 1501528
Change-Id: I97552d194e5cc342b0a3f4b9800de8aa6b9cb85b
Currently, the rsync module to which the replicators send data is
static. It prevents administrators from setting the rsync configuration
based on their current deployment or needs.
As an example, the rsyncd configuration example encourages setting a
connection limit for the account, container and object modules. This
helps protect devices from excessive parallel connections, which would
hurt performance.
On a server with many devices, it is tempting to increase this number
proportionally, but nothing guarantees that the distribution of the
connections will be balanced. In the worst case, a single device can
receive all the connections, which severely impacts performance.
This commit adds a new option named 'rsync_module' to the *-replicator
sections of the *-server configuration files. This configuration
variable can be interpolated with device attributes like ip, port,
device, zone, ... by using the format {NAME}, e.g.:
rsync_module = {replication_ip}::object_{device}
With this configuration, an administrator can solve the problem of
connection distribution by creating one module per device in the
rsyncd configuration.
The default values are backward compatible:
{replication_ip}::account
{replication_ip}::container
{replication_ip}::object
Option vm_test_mode is deprecated by this commit, but backward
compatibility is maintained. The option is only effective when
rsync_module is not set; in that case, {replication_port} is appended
to the default value of rsync_module.
Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
And if they are not, exhaust the node iter to go get more. The
problem without this implementation is a simple overwrite where a GET
arrives before the handoff has put the newer object back on the 'alive
again' node, such that the proxy gets n-1 fragments of the newest set
and 1 of the older.
This patch bucketizes the fragments by etag and, if it doesn't have
enough, continues to exhaust the node iterator until it has a large
enough matching set.
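A sketch of the bucketing idea (surrounding names assumed):

    from collections import defaultdict

    def find_matching_set(responses, policy):
        # group sources by EC etag until one group can decode
        buckets = defaultdict(list)
        for resp in responses:  # keep pulling sources off the node iter
            etag = resp.headers.get('X-Object-Sysmeta-Ec-Etag')
            buckets[etag].append(resp)
            if len(buckets[etag]) >= policy.ec_ndata:
                return buckets[etag]  # a set large enough to decode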
Change-Id: Ib710a133ce1be278365067fd0d6610d80f1f7372
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Closes-Bug: 1457691
The 'print' function is compatible with Python 2.x and 3.x.
Link: https://www.python.org/dev/peps/pep-3105/
Python 2.6 added a __future__ import that removes print as language
syntax, letting you use the functional form instead.
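For example:

    from __future__ import print_function  # required on Python 2

    import sys

    print('oh no!', file=sys.stderr)  # same syntax on Python 2 and 3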
Change-Id: I94e1bc6bd83ad6b05695c7ebdf7cbfd8f6d9f9af