The log_statsd_host value can now be an IPv6 address or a hostname
which only resolves to an IPv6 address. In both cases, the new
behavior is to use an AF_INET6 socket on which .sendto() is called
with the originally-configured hostname (or IP). This means the
Swift process is not caching a DNS resolution for the lifetime of
the process (a good thing).
If a hostname resolves to both an IPv6 and an IPv4 address, an AF_INET
socket is used (i.e. only the IPv4 address will receive the UDP
packet).
The old behavior is preserved: any invalid IP address literals and
failures in DNS resolution or actual StatsD packet sending do not
halt the process or bubble up; they are caught, logged, and
otherwise ignored.
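A minimal sketch of the address-family selection described above (the
helper name and structure are illustrative, not the actual StatsdClient
code):

    import socket

    def _pick_statsd_socket(host, port):
        # Prefer AF_INET when the name resolves to any IPv4 address;
        # otherwise use AF_INET6.
        addrinfo = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                                      socket.SOCK_DGRAM)
        if any(info[0] == socket.AF_INET for info in addrinfo):
            family = socket.AF_INET
        else:
            family = socket.AF_INET6
        return socket.socket(family, socket.SOCK_DGRAM)

    sock = _pick_statsd_socket('statsd.example.com', 8125)
    # sendto() is then given the configured name each time, so DNS is
    # re-resolved per packet instead of cached for the process lifetime.
    sock.sendto(b'swift.timing:42|ms', ('statsd.example.com', 8125))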
Change-Id: Ibddddcf140e2e69b08edf3feed3e9a5fa17307cf
Make the result of Timestamp(x) != Timestamp(x) be False.
In python 2.7 this requires the __ne__ method to be defined [1].
"The truth of x==y does not imply that x!=y is false." The
functools.total_ordering decorator does not autocreate a __ne__
method.
In python 3 the __ne__ method is not required [2]. "By default,
__ne__() delegates to __eq__() and inverts the result".
This patch puts back the __ne__ method removed in [3]. Whilst no tests
fail on master with python2.7, they do on this patch [4] and it seems
dangerous to have this absurd behaviour lurking.
[1] https://docs.python.org/2/reference/datamodel.html#object.__ne__
[2] https://docs.python.org/3.4/reference/datamodel.html#object.__ne__
[3] Change-Id: Id26777ac2c780316ff10ef7d954c48cc1fd480b5
[4] Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
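A minimal sketch of the reinstated method, assuming a simplified
constructor (the real Timestamp class does more normalization):

    import functools

    @functools.total_ordering
    class Timestamp(object):
        def __init__(self, timestamp):
            self.timestamp = float(timestamp)

        def __lt__(self, other):
            return self.timestamp < other.timestamp

        def __eq__(self, other):
            return self.timestamp == other.timestamp

        def __ne__(self, other):
            # Needed on Python 2.7: total_ordering does not generate
            # __ne__, and defining __eq__ alone does not imply it.
            return not (self == other)

    assert not (Timestamp(1.0) != Timestamp(1.0))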
Change-Id: I01fbfa310df3c74390f8e8c2e9ffff81bbf05e47
If "Permission Denied" has happen in NamedTemporaryFile function in
dump_recon_cache method, swift will log a message of reference to a variable
without assignment and not log a message of "Permission Denied".
This patch fixes the handling and add an unit test.
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Iafdd94905e9e9c81f5966a923324b50c18fcf592
On Python 3, next(obj) calls obj.__next__(), not obj.next(). Add an
alias from __next__() to next() to be compatible with Python 2 and
Python 3.
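A minimal sketch of the alias pattern (the class here is hypothetical):

    class CompatIterator(object):
        def __init__(self, items):
            self._it = iter(items)

        def __iter__(self):
            return self

        def next(self):          # Python 2 iterator protocol
            return next(self._it)

        __next__ = next          # Python 3 looks up __next__() instead

    assert list(CompatIterator('ab')) == ['a', 'b']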
Change-Id: Ida104d3bd7cdba557e523f18df43d56847060054
Port swift.common.utils.parse_mime_headers() to Python 3:
* On Python 3, try to decode headers from UTF-8. If a header is not
valid UTF-8, decode it from Latin-1 instead (see the sketch below).
* Update the parse_mime_headers() tests: on Python 3, HTTP header
values are Unicode strings.
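A hedged sketch of that fallback (the helper name is hypothetical;
Latin-1 decoding cannot fail, so it is a safe last resort):

    def _decode_header(raw):
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return raw.decode('latin1')

    assert _decode_header(b'caf\xc3\xa9') == u'caf\xe9'  # valid UTF-8
    assert _decode_header(b'caf\xe9') == u'caf\xe9'      # Latin-1 fallback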
This change is a follow-up of the change
Ia5ee2ead67e36e8c6416183667f64ae255887736.
Change-Id: I042dd13e9eb0e9844ccd832d538cdac84359ed42
Port FileLikeIter and _MultipartMimeFileLikeObject in
swift.common.utils to Python 3:
* Add a __next__() alias to the next() method. On Python 3, the
next() method is no longer used; __next__() is required.
* Use literal byte strings: FileLikeIter and _MultipartMimeFileLikeObject
are written to handle binary files.
* test_close(): replace FileLikeIter('abcdef') with
FileLikeIter([b'a', b'b', b'c']). On Python 3, list(b'abc') returns
[97, 98, 99], whereas ['a', 'b', 'c'] is returned on Python 2 (see the
usage sketch below).
* Update the FileLikeIter unit tests to use byte strings.
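A usage sketch of the byte-chunk form, assuming read() with no argument
drains the iterator:

    from swift.common.utils import FileLikeIter

    fli = FileLikeIter([b'a', b'b', b'c'])
    assert fli.read() == b'abc'   # same result on Python 2 and Python 3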
Change-Id: Ibacddb70b22f624ecd83e374749578feddf8bca8
Refactor the disk file get_ondisk_files logic to enable
ECDiskfile to gather *all* fragments found on disk (not just those
with a matching .durable file) and make the fragments available
via the DiskFile interface as a dict mapping:
Timestamp --> list of fragment indexes
Also, if a durable fragment has been found then the timestamp
of the durable file is exposed via the diskfile interface.
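Illustrative shape of that mapping (timestamps and fragment indexes are
invented):

    from swift.common.utils import Timestamp

    frag_index_map = {
        Timestamp('1449431416.00000'): [0, 2, 5],
        Timestamp('1449431300.00000'): [1],
    }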
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I55e20a999685b94023d47b231d51007045ac920e
* StatsdClient._send(): on Python 3, encode parts to UTF-8 and
replace '|' with b'|' to join parts (see the sketch below).
* timing_stats(): replace func.func_name with func.__name__. The
func_name attribute of functions was removed on Python 3, whereas
the __name__ attribute is available on Python 2 and Python 3.
* Fix unit tests to use bytes
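A sketch of the join change, simplified from the real _send():

    parts = ['object-server.timing:42', 'ms']
    # Encode any text parts, then join with a byte delimiter:
    payload = b'|'.join(
        part if isinstance(part, bytes) else part.encode('utf-8')
        for part in parts)
    assert payload == b'object-server.timing:42|ms'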
Change-Id: Ic279c9b54e91aabcc52587eed7758e268ffb155e
The addition of range support for SLO segments (commit 25d5e68)
required the range size to be at least the SLO minimum segment size
(default 1 MiB). However, if you're doing something like assembling a
video of short clips out of a larger one, then you might not need a
full 1 MiB.
The reason for the 1 MiB restriction was to protect Swift from
resource overconsumption. It takes CPU, RAM, and internal bandwidth to
connect to an object server, so it's much cheaper to serve a 10 GiB
SLO if it has 10 MiB segments than if it has 10 B segments.
Instead of a strict limit, now we apply ratelimiting to small
segments. The threshold for "small" is configurable and defaults to 1
MiB. SLO segments may now be as small as 1 byte.
If a client makes SLOs as before, it'll still be able to download the
objects as fast as Swift can serve them. However, a SLO with a lot of
small ranges or segments will be slowed down to avoid resource
overconsumption. This is similar to how DLOs work, except that DLOs
ratelimit *every* segment, not just small ones.
UpgradeImpact
For operators: if your cluster has enabled ratelimiting for SLO, you
will want to set rate_limit_under_size to a large number prior to
upgrade. This will preserve your existing behavior of ratelimiting all
SLO segments. 5368709123 is a good value, as that's 1 greater than the
default max object size. Alternatively, hold down the 9 key until you
get bored.
If your cluster has not enabled ratelimiting for SLO (the default), no
action is needed.
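For illustration, that pre-upgrade setting would look like this in
proxy-server.conf (option name per this change; the value is the
suggestion above):

    [filter:slo]
    use = egg:swift#slo
    # 1 greater than the default max object size: every segment counts as
    # "small", so every segment is rate-limited, as before the upgrade.
    rate_limit_under_size = 5368709123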
Change-Id: Id1ff7742308ed816038a5c44ec548afa26612b95
keystoneclient uses threading.Lock(), but Swift doesn't monkeypatch
threading; this results in a lockup when two greenthreads try to
acquire a non-green lock.
This change fixes that.
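A minimal sketch of the fix, assuming it runs where Swift already sets
up eventlet:

    import eventlet

    # Patch the thread primitives so threading.Lock() objects created by
    # keystoneclient are green locks and cannot dead-lock greenthreads.
    eventlet.monkey_patch(thread=True)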
Change-Id: I9b44284a5eb598a6978364819f253e031f4eaeef
Closes-bug: #1508424
This patch adds some unit tests to prevent regressions and to document
validate_hash_path behavior; I found the need for them while reviewing
https://review.openstack.org/#/c/231864/
*bonus*
- Fix test_hash_path to use the "with" syntax instead of "try/finally"
when temporarily assigning a test value to a global variable (see the
sketch below).
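A hedged sketch of that "with" pattern, using mock.patch.object to swap
a module-level global (the global name and value here are illustrative):

    import mock
    from swift.common import utils

    # The patched value is restored when the block exits, even if an
    # assertion fails -- no try/finally needed.
    with mock.patch.object(utils, 'HASH_PATH_SUFFIX', 'endcap'):
        hashed = utils.hash_path('a', 'c', 'o')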
Change-Id: I948999a8fb8addb9a378dbf8bee853b205aeafad
contextlib.nested() is missing completely in Python 3.
Since Python 2.7, we can use multiple context managers in a 'with' statement,
like so:
with thing1() as t1, thing2() as t2:
    do_stuff()
Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.
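For instance, a typical test conversion looks like this (the mock
targets are illustrative):

    import mock

    # Before (Python 2 only):
    #     with contextlib.nested(mock.patch('os.path.exists'),
    #                            mock.patch('os.listdir')) as (ma, mb):
    #         ...
    # After, with no extra indentation depth:
    with mock.patch('os.path.exists') as ma, mock.patch('os.listdir') as mb:
        ma.return_value = True
        mb.return_value = []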
Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
* LogAdapter.exception(): on Python 3, socket.error is an alias to
OSError, so group both exceptions to support Python 3.
* ThreadPool: call GreenPipe() with indexed parameter, don't pass the
third parameter by its name, since the parameter name changed
between Python 2 (bufsize) and Python 3 (buffering).
* strip_value() in test_utils: StringIO.getvalue() now requires
seek(0), otherwise the buffer is filled with null characters.
* test_lock_file(): on Python 3, seek(0) is now required to go to the
beginning of a file opened in append mode. In append mode, Python
goes to the end of the file.
Change-Id: I4e56a51690f016a0a2e1354380ce11cff1891f64
backward() is written to handle binary files:
* Replace literal native strings with literal byte strings: add b'...'
prefix
* Update unit tests: use byte strings
* TemporaryFile(): use the default mode 'w+b', instead of using 'r+w'
('r+w' mode creates a Unicode file on Python 3).
Change-Id: Ic91f7e6c605db0b888763080d49f0f501029837f
assertEquals is deprecated in Python 3; this patch replaces it with
assertEqual.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
The next() method of Python 2 generators was renamed to __next__().
Call the builtin next() function instead which works on Python 2 and
Python 3.
The patch was generated by the next operation of the sixer tool.
Change-Id: Id12bc16cba7d9b8a283af0d392188a185abe439d
The rfc822 module has been deprecated since Python 2.3, and in
particular is absent from the Python 3 standard library. However, Swift
uses instances of rfc822.Message in a number of places, relying on its
behavior of immediately parsing the headers of a file-like object
without consuming the body, leaving the position of the file at the
start of the body. Python 3's http.client has an undocumented
parse_headers function with the same behavior, which inspired the new
parse_mime_headers utility introduced here. (The HeaderKeyDict returned
by parse_mime_headers doesn't have a `.getheader(key)` method like
rfc822.Message did; the dictionary-like `[key]` or `.get(key)` interface
should be used exclusively.)
The implementation in this commit won't actually work with Python 3, the
email.parser.Parser().parsestr of which expects a Unicode string, but it
is believed that this can be addressed in followup work.
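A hedged usage sketch (the surrounding setup is invented):

    from io import BytesIO
    from swift.common.utils import parse_mime_headers

    fp = BytesIO(b'Content-Type: text/plain\r\n'
                 b'X-Object-Meta-Mood: jubilant\r\n'
                 b'\r\n'
                 b'the body')
    headers = parse_mime_headers(fp)
    assert headers['Content-Type'] == 'text/plain'
    assert fp.read() == b'the body'  # position left at the start of the body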
Change-Id: Ia5ee2ead67e36e8c6416183667f64ae255887736
ssync currently does the wrong thing when replicating object dirs
containing both a .data and a .meta file. The ssync sender uses a
single PUT to send both object content and metadata to the receiver,
using the metadata (.meta file) timestamp. This results in the object
content timestamp being advanced to the metadata timestamp,
potentially overwriting newer object data on the receiver and causing
an inconsistency with the container server record for the object.
For example, replicating an object dir with {t0.data(etag=x), t2.meta}
to a receiver with t1.data(etag=y) will result in the creation of
t2.data(etag=x) on the receiver. However, the container server will
continue to list the object as t1(etag=y).
This patch modifies ssync to replicate the content of .data and .meta
separately using a PUT request for the data (no change) and a POST
request for the metadata. In effect, ssync replication replicates the
client operations that generated the .data and .meta files so that
the result of replication is the same as if the original client requests
had been applied to all object servers.
Apart from maintaining correct timestamps across sync'd nodes, this has
the added benefit of not needing to PUT objects when only the metadata
has changed and a POST will suffice.
Taking the same example, ssync sender will no longer PUT t0.data but will
POST t2.meta resulting in the receiver having t1.data and t2.meta.
The changes are backwards compatible: an upgraded sender will only sync
data files to a legacy receiver and will not sync meta files (fixing the
erroneous behavior described above); a legacy sender will operate as
before when sync'ing to an upgraded receiver.
Changes:
- diskfile API provides methods to get the data file timestamp
as distinct from the diskfile timestamp.
- diskfile yield_hashes now passes, in the timestamp field of its return
tuple, a dict mapping data and meta (if any) timestamps to their
respective values (see the illustrative sketch after this list).
- ssync_sender will encode data and meta timestamps in the
(hash_path, timestamp) tuple sent to the receiver during
missing_checks.
- ssync_receiver compares sender's data and meta timestamps to any
local diskfile and may specify that only data or meta parts are sent
during updates phase by appending a qualifier to the hash returned
in its 'wanted' list.
- ssync_sender now sends POST subrequests when a meta file
exists and its content needs to be replicated.
- ssync_sender may send *only* a POST if the receiver indicates that
is the only part required to be sync'd.
- object server will allow PUT and DELETE with an earlier timestamp than
a POST
- Fixed TODO related to replicated objects with fast-POST and ssync
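Illustrative shape of the yield_hashes timestamp field (the key names
are assumed and the values invented):

    from swift.common.utils import Timestamp

    # 'ts_meta' appears only when a .meta file newer than the .data exists.
    example = ('d41d8cd98f00b204e9800998ecf8427e',
               {'ts_data': Timestamp('1449431416.00000'),
                'ts_meta': Timestamp('1449431496.00000')})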
Related spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: 1501528
Change-Id: I97552d194e5cc342b0a3f4b9800de8aa6b9cb85b
Currently, the rsync module to which the replicators send data is static,
which prevents administrators from adapting the rsync configuration to
their deployment or needs.
As an example, the sample rsyncd configuration encourages setting a
connection limit for the account, container, and object modules. This
protects devices from excessive parallel connections, which would hurt
performance.
On a server with many devices, it is tempting to increase this limit
proportionally, but nothing guarantees that the connections will be
distributed evenly across devices. In the worst case, a single device
can receive all the connections, severely impacting performance.
This commit adds a new option named 'rsync_module' to the *-replicator sections
of the *-server configuration file. This configuration variable can be
interpolated with device attributes like ip, port, device, zone, ... by using
the format {NAME}, e.g.:
rsync_module = {replication_ip}::object_{device}
With this configuration, an administrator can solve the connection
distribution problem by creating one module per device in the rsyncd
configuration, as sketched below.
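For example, a per-device module in rsyncd.conf might look like this
(the module name, path, and limit are illustrative):

    [object_sda1]
    path = /srv/node
    read only = false
    lock file = /var/lock/object_sda1.lock
    max connections = 4

A setting of rsync_module = {replication_ip}::object_{device} would then
route each device's replication traffic to its own module and connection
limit.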
The default values are backward compatible:
{replication_ip}::account
{replication_ip}::container
{replication_ip}::object
Option vm_test_mode is deprecated by this commit, but backward compatibility is
maintained. The option is only effective when rsync_module is not set. In that
case, {replication_port} is appended to the default value of rsync_module.
Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
And if they are not, exhaust the node iter to go get more. The problem
without this is a simple overwrite where a GET arrives before the
handoff has put the newer object back on the 'alive again' node, so the
proxy gets n-1 fragments of the newest set and 1 of the older.
This patch bucketizes the fragments by etag and, if it doesn't have a
large enough matching set, continues to exhaust the node iterator until
it does.
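A simplified sketch of the bucketing idea (the helper is hypothetical,
not the exact proxy code):

    from collections import defaultdict

    def best_bucket(fragment_responses, ec_ndata):
        # Group fragment responses by the etag they claim to belong to,
        # then keep the largest set; if it is still smaller than ec_ndata,
        # the caller keeps exhausting the node iter for more candidates.
        buckets = defaultdict(list)
        for resp in fragment_responses:
            buckets[resp.headers['X-Object-Sysmeta-Ec-Etag']].append(resp)
        return max(buckets.values(), key=len)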
Change-Id: Ib710a133ce1be278365067fd0d6610d80f1f7372
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Closes-Bug: 1457691
The 'print' function is compatible with both Python 2.x and 3.x.
Link: https://www.python.org/dev/peps/pep-3105/
Python 2.6 has a __future__ import that removes print as language syntax,
letting you use the functional form instead
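The standard conversion:

    from __future__ import print_function

    print('hello', 'swift')  # identical function-call syntax on 2.x and 3.x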
Change-Id: I94e1bc6bd83ad6b05695c7ebdf7cbfd8f6d9f9af
Get configparser, queue, http_client modules from six.moves.
Patch generated by the six_moves operation of the sixer tool:
https://pypi.python.org/pypi/sixer
Change-Id: I666241ab50101b8cc6f992dd80134ce27327bd7d
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.
Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
* replace "from cStringIO import StringIO"
with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
with "import six" and "six.StringIO()"
This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer
Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
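For example, in the object-server config (the value is illustrative):

    [DEFAULT]
    # Fork 4 object-server workers for each unique local port in the ring.
    servers_per_port = 4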
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
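A sketch of the metadata-only load (kwarg name as described above; the
path is illustrative):

    from swift.common.ring import RingData

    # Loads everything except replica2part2dev_id, keeping memory use low
    # in the parent and the per-port children.
    ring_meta = RingData.load('/etc/swift/object.ring.gz', header_only=True)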
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d