The proxy can now be configured to prefer local object servers for PUT
requests, where "local" is governed by the "write_affinity" setting. The
"write_affinity_node_count" setting controls how many local object
servers to try before giving up and going on to remote ones.
I chose to simply re-order the object servers instead of filtering out
nonlocal ones so that, if all of the local ones are down, clients can
still get successful responses (just slower).
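
As a rough illustration of re-ordering rather than filtering, here is a
minimal sketch. It assumes nodes are plain dicts with a "region" key;
order_for_write and the is_local predicate are hypothetical names, not
the proxy's actual code.

```python
def order_for_write(nodes, is_local, local_count):
    """Move up to local_count local nodes to the front; drop nothing.

    Remote nodes stay in the list, so they remain usable as a fallback
    when every local node is down.
    """
    local = [n for n in nodes if is_local(n)][:local_count]
    preferred = {id(n) for n in local}
    rest = [n for n in nodes if id(n) not in preferred]
    return local + rest


nodes = [{"region": 2, "ip": "10.2.0.1"},
         {"region": 1, "ip": "10.1.0.1"},
         {"region": 2, "ip": "10.2.0.2"},
         {"region": 1, "ip": "10.1.0.2"}]
ordered = order_for_write(nodes, lambda n: n["region"] == 1, local_count=2)
```

All four nodes are still present in ordered; only their order changes.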
The goal is to trade availability for throughput. By writing to local
object servers across fast LAN links, clients get better throughput
than if the object servers were far away over slow WAN links. The
downside, of course, is that data availability (not durability) may
suffer when drives fail.
The default configuration has no write affinity in it, so the default
behavior is unchanged.
Added some words about these settings to the admin guide.
DocImpact
Change-Id: I09a0bd00524544ff627a3bccdcdc48f40720a86e
A really simple version of this was in container sync already, and I
needed a more complete version for work I'm doing, and I noticed
https://review.openstack.org/#/c/33405/ was also making use of it.
So, here's a more full version.
If https://review.openstack.org/#/c/33405/ lands before this, I'll
update it accordingly.
Change-Id: Iba66b6a97f65e312e04fdba273e8f4ad1d3e1594
Now you can configure the proxy server to read from "local" primary
nodes first, where "local" is governed by the newly-introduced
"read_affinity" setting in the proxy config. This is desirable when
the network links between regions/zones are of varying capacities; in
such a case, it's a good idea to prefer fetching data from closer
backends.
The new setting looks like rN[zM]=P, where N is the region number, M
is the optional zone number, and P is the priority. Multiple values
can be specified by separating them with commas. The priority for
nodes that don't match anything is a very large number, so they'll
sort last.
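
To make the rN[zM]=P format concrete, here is a small sketch of parsing
and sorting. The function names and the node dict layout are
assumptions for illustration, and float("inf") stands in for the "very
large number"; lower priority values are assumed to sort first.

```python
import re

_RULE = re.compile(r"^r(\d+)(?:z(\d+))?=(\d+)$")
UNMATCHED = float("inf")  # stand-in for the "very large number"


def parse_read_affinity(value):
    """Parse 'r1z1=100, r1=200' into [(region, zone_or_None, priority)]."""
    rules = []
    for piece in value.split(","):
        piece = piece.strip()
        if not piece:
            continue
        match = _RULE.match(piece)
        if match is None:
            raise ValueError("invalid read_affinity rule: %r" % piece)
        region, zone, priority = match.groups()
        zone = None if zone is None else int(zone)
        rules.append((int(region), zone, int(priority)))
    return rules


def priority_for(node, rules):
    """Return the lowest priority among matching rules, else UNMATCHED."""
    matches = [prio for region, zone, prio in rules
               if node["region"] == region and zone in (None, node["zone"])]
    return min(matches) if matches else UNMATCHED


rules = parse_read_affinity("r1z1=100, r1=200")
nodes = [{"region": 2, "zone": 1},
         {"region": 1, "zone": 2},
         {"region": 1, "zone": 1}]
ordered = sorted(nodes, key=lambda n: priority_for(n, rules))
```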
This only affects the ordering of the primary nodes; it doesn't affect
handoffs at all. Further, while the primary nodes are reordered for
all requests, it only matters for GET/HEAD requests since handling the
other verbs ends up making concurrent requests to *all* the primary
nodes, so ordering is irrelevant.
Note that the default proxy config does not have this setting turned
on, so the default configuration's behavior is unaffected.
blueprint multi-region
Change-Id: Iea4cd367ed37fe5ee69b63234541d358d29963a4
Without a (per-disk) threadpool, requests to a slow disk would affect
all clients by blocking the entire eventlet reactor on
read/write/etc. The slower the disk, the worse the performance. On an
object server, you frequently have at least one slow disk due to
auditing and replication activity sucking up all the available IO. By
kicking those blocking calls out to a separate OS thread, we let the
eventlet reactor make progress in other greenthreads, and by having a
per-disk pool, we ensure that one slow disk can't suck up all the
resources of an entire object server.
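
The per-disk idea can be sketched with the standard library alone. This
is not the eventlet-based implementation: PerDiskPools is a
hypothetical class, and .result() blocks the calling OS thread here,
whereas the real code lets other greenthreads run while waiting.

```python
from concurrent.futures import ThreadPoolExecutor


class PerDiskPools:
    """One small thread pool per device, so a slow disk starves only itself."""

    def __init__(self, threads_per_disk):
        self.threads_per_disk = threads_per_disk
        self._pools = {}

    def run(self, device, func, *args, **kwargs):
        if self.threads_per_disk <= 0:
            # Pools disabled: this sketch simply calls the function inline.
            return func(*args, **kwargs)
        pool = self._pools.get(device)
        if pool is None:
            pool = self._pools[device] = ThreadPoolExecutor(
                max_workers=self.threads_per_disk)
        return pool.submit(func, *args, **kwargs).result()

    def close(self):
        for pool in self._pools.values():
            pool.shutdown()


pools = PerDiskPools(threads_per_disk=2)
result = pools.run("sdb1", sum, [1, 2, 3])
pools.close()
```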
There were a few blocking calls that were done with eventlet.tpool,
but that's a fixed-size global threadpool, so I moved them to the
per-disk threadpools. If the object server is configured not to use
per-disk threadpools (i.e. threads_per_disk = 0, which is the
default), those call sites will still ultimately end up using
eventlet.tpool.execute. You won't end up blocking a whole object
server while waiting for a huge fsync.
If you decide not to use threadpools, the only extra overhead should
be a few extra Python function calls here and there. This is
accomplished by setting threads_per_disk = 0 in the config.
blueprint concurrent-disk-io
Change-Id: I490f8753d926fdcee3a0c65c5aaf715bc2b7c290
If you have StatsD logging turned on, then the iterator returned by
Ring.get_part_nodes() will emit StatsD packets when it yields a
handoff node. That network IO may make eventlet trampoline to another
greenthread before returning from next(). Now, if that other
greenthread tries to call next() on that same iterator, it blows up
with a ValueError.
Any socket IO inside a generator's next() method can cause this. It's
easiest to reproduce with StatsD logging turned on, but logging to
syslog can trigger it too.
You can see this happen sometimes in the proxy's make_requests method
if two of the primary nodes are down. Greenthread A goes into next()
to get a handoff node, then sends a StatsD packet, and so eventlet
trampolines to Greenthread B. Now, Greenthread B also calls next() to
get a handoff node, and dies with a ValueError.
This commit wraps up concurrently-accessed iter_nodes() iterators in a
new thing called a GreenthreadSafeIterator that serializes access.
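
The serializing wrapper can be sketched as follows. This version uses
threading.Lock and OS threads as a stand-in for eventlet's primitives,
and LockedIterator is a hypothetical name, not the class added by this
commit.

```python
import threading


class LockedIterator:
    """Serialize next() calls on an iterator shared by several workers."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one caller at a time may be inside the wrapped generator.
        with self._lock:
            return next(self._it)


def handoffs():
    for n in range(5):
        yield "handoff-%d" % n


shared = LockedIterator(handoffs())
first = next(shared)
```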
Bug 1180110
Change-Id: I8fe13d7295c056a2cab9e084f5966078a49bdc13
Autocreate was messy - now cleaned.
Auto-create now occurs at account POST, and container PUT only.
A method for autocreation was added.
Autocreation was removed from account_info and container_info.
Fake-it as if the account exists on account HEAD and account GET.
Return 404 on everything else when the account does not exist.
Fix: Bug #1172223
Fix: Bug #1179140
Change-Id: Iac54c1438eb09883fbc29a1ad2ac2245b95efc92
This will be needed in future replication work to avoid circular
imports.
I used swift.obj.base as the module name just because we seemed to
avoid putting code in __init__.py files so far and I didn't want to
buck the trend.
I would love to see other obj things like *_metadata and DiskFile
move into swift.obj.base as well and swift.obj.server just be the
WSGI server logic, but I'll leave that for the future.
I have changed the tests as little as possible (just the references
to where they get the code to test) to show the refactor has not
broken anything. I did add a test for tpool_reraise since there was
none before.
There will be a follow on patch for moving the tests to their new
location(s). I figured I'd wait to put the bikes in the shed until
everyone's done painting it.
Change-Id: I32b4ac88be21eb76c877d3f4cc1e6ac33304835b
Allow Swift daemons and servers to optionally accept a directory as the
configuration parameter. Directory based configuration leverages
ConfigParser's native multi-file support. Files ending in '.conf' in the
given directory are parsed in lexicographical order. Filenames starting with
'.' are ignored. A mixture of file and directory configuration paths is not
supported - if the configuration path is a file, behavior is unchanged.
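
The directory rules above can be sketched with ConfigParser's
multi-file read(). read_conf_dir is a hypothetical helper for
illustration, not the function added by this patch.

```python
import configparser
import os


def read_conf_dir(path):
    """Parse every non-hidden '*.conf' file in path, in sorted name order."""
    names = sorted(
        name for name in os.listdir(path)
        if name.endswith(".conf") and not name.startswith(".")
    )
    parser = configparser.ConfigParser()
    # read() accepts a list of files; values in later files override
    # values from earlier ones.
    parser.read([os.path.join(path, name) for name in names])
    return parser
```

With files named 10_base.conf and 20_override.conf, a value set in both
ends up with the one from 20_override.conf.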
* update swift-init to search for conf.d paths when building servers
(e.g. /etc/swift/proxy-server.conf.d/)
* new script swift-config can be used to inspect the cumulative configuration
* pull a little bit of code out of run_wsgi and test separately
* fix example config bug for the proxy server's client_disconnect option
* added section on directory based configuration to deployment guide
DocImpact
Implements: blueprint confd
Change-Id: I89b0f48e538117f28590cf6698401f74ef58003b
Including the time inside the trans_id can be very useful for knowing
which logs to scan. I made this so the trans_id will still be the
same length (the randomness of the remaining uuid4 should be enough
for this use). I also added a convenience function for retrieving the
time information from a trans_id.
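
One way to keep the length unchanged is to trade some random hex
digits for a hex timestamp. The layout below ('tx', 21 random hex
characters, '-', 10 hex characters of Unix time) and both function
names are assumptions for illustration, not necessarily the format this
patch uses.

```python
import time
import uuid


def make_trans_id(now=None):
    """Return 'tx' + 21 random hex chars + '-' + 10 hex chars of Unix time."""
    now = time.time() if now is None else now
    return "tx%s-%010x" % (uuid.uuid4().hex[:21], int(now))


def trans_id_time(trans_id):
    """Recover the embedded Unix timestamp, or None if it cannot be parsed."""
    try:
        return int(trans_id.rsplit("-", 1)[1], 16)
    except (IndexError, ValueError):
        return None


tid = make_trans_id(now=1366000000)
```

The result has 34 characters, the same as 'tx' followed by a full
uuid4 hex string.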
If you're wondering why I didn't just use uuid1, which embeds the time,
it's because it also embeds uuid.getnode(), and of that function the
docs say: "The first time this runs, it may launch a separate program,
which could be quite slow."
We could supply our own getnode value, but then we have to guarantee
its uniqueness, yada yada yada.
Change-Id: Ie33caf1e839fd1a21b01a928a8b301126bef7396
A new configuration parameter is added to /etc/swift/swift.conf
[swift-hash]
swift_hash_path_prefix = 'random unique string'
New installations are advised to set this parameter to a random secret
that is not disclosed outside the organization.
The same secret needs to be used by all swift servers of the same cluster.
Existing installations should set this parameter to an empty string
(the default).
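
The reason existing installations keep an empty prefix is that the
prefix is mixed into every path hash, so changing it would move every
object. A minimal sketch, assuming MD5 and simple concatenation (both
are illustrative assumptions, as is the helper's signature):

```python
import hashlib


def hash_path(path, prefix=b"", suffix=b""):
    """Hash a storage path together with the cluster-wide secrets."""
    return hashlib.md5(prefix + path + suffix).hexdigest()


old = hash_path(b"/a/c/o", suffix=b"s3cret")
same = hash_path(b"/a/c/o", prefix=b"", suffix=b"s3cret")
moved = hash_path(b"/a/c/o", prefix=b"new", suffix=b"s3cret")
```

Here old and same are identical, while moved differs, which is why a
non-empty prefix is only safe on a new cluster.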
DocImpact
Fixes: Bug #1157454
Change-Id: I63b10d0b7d6dd3f74e0f10bb41b5f240fa03578a
Swift never fsyncs, it only fdatasyncs. That is dumb, we have important
metadata we need to save. Also, the code was weird and had no tests.
Change-Id: I6ec875c14560820b686266a28043a2b7631781e9
Different versions of syslog-ng and probably other syslog services
handle multi line log messages differently and sometimes quite
poorly. This patch collapses multi line log messages into single
lines before sending them on to syslog.
It's just a copy of what was already in Python's logging.Formatter
but altered to replace the newlines with #012. I used #012 since
that's a convention we've already used elsewhere in Swift.
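
A simplified sketch of the effect follows. Unlike the patch, which
copies and alters Formatter.format, this hypothetical
SingleLineFormatter just post-processes the formatted string.

```python
import logging


class SingleLineFormatter(logging.Formatter):
    """Collapse multi-line messages by replacing newlines with #012."""

    def format(self, record):
        return super().format(record).replace("\n", "#012")


record = logging.LogRecord(
    "swift", logging.ERROR, "example.py", 1,
    "first line\nsecond line", None, None)
line = SingleLineFormatter().format(record)
```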
Change-Id: I8d0509b7cf48e45c2cf6480b51c67eec5bc94fe2
This change supports kern.log rotation in order to avoid losing
significant information.
Year-change handling is added because kern.log does not record
the year.
A backwards function is also added, which allows reading logs
from back to front and speeds up execution. A unit test for it
is included.
Fixes Bug 1080682
Change-Id: I93436c405aff5625396514000cab774b66022dd0
Some systems behave badly when they completely run out of space. To
alleviate this problem, you can set the fallocate_reserve conf value
to a number of bytes to "reserve" on each disk. When the disk free
space falls at or below this amount, fallocate calls will fail, even
if the underlying OS fallocate call would succeed. For example, a
fallocate_reserve of 5368709120 (5G) would make all fallocate calls
fail, even for zero-byte files, when the disk free space falls under
5G.
The default fallocate_reserve is 0, meaning "no reserve", and so the
software behaves exactly as it always has unless you set this conf
value to something non-zero.
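
The check can be sketched as below. check_reserve and free_bytes are
hypothetical names, and subtracting the requested size before comparing
is an assumption of this sketch; the injectable get_free argument is
there only so the logic can be exercised without a real disk.

```python
import errno
import os


def free_bytes(fd):
    """Space available to unprivileged users on the filesystem holding fd."""
    st = os.fstatvfs(fd)
    return st.f_frsize * st.f_bavail


def check_reserve(fd, size, reserve, get_free=free_bytes):
    """Raise ENOSPC if allocating size bytes leaves <= reserve bytes free."""
    if reserve <= 0:
        return  # 0 means "no reserve": never fail early
    if get_free(fd) - size <= reserve:
        raise OSError(errno.ENOSPC, "fallocate_reserve would be breached")
```

With reserve set to 5368709120, even a zero-byte request fails once the
free space is at or below 5G.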
Also fixed ring builder's search_devs doc bugs.
Related: To get rsync to do the same, see
https://github.com/rackspace/cloudfiles-rsync
Specifically, see this patch:
https://github.com/rackspace/cloudfiles-rsync/blob/master/debian/patches/limit-fs-fullness.diff
DocImpact
Change-Id: I8db176ae0ca5b41c9bcfeb7cb8abb31c2e614527
As Dieter pointed out in bug 1090495
(https://bugs.launchpad.net/swift/+bug/1090495), the volume of data
can vary wildly between StatsD metrics.
This patch implements a partial solution by reducing the sample_rate
used for known high-volume metrics (operational experience will need to
inform this over time) and introducing a new tunable,
log_statsd_sample_rate_factor which is multiplied by the sample_rate for
every statsd stat. This tunable can be used to reduce StatsD traffic
proportionally for all metrics and is intended to replace
log_statsd_default_sample_rate, which is left alone for
backward-compatibility, should anyone be using it.
This patch also includes a drive-by fix for log_udp_port which wasn't
being converted to an int (I didn't verify that actually causes trouble
in SysLogHandler(), but it's definitely an improvement regardless).
Change-Id: Id404636e3629f6431cf1c4e64a143959750a3c23
E.g. if HOME is not set, swift-proxy will create the
keystone_signing file not in HOME but in /root.
This is because the swift user doesn't have a shell
in /etc/passwd and so it doesn't set environment variables
when impersonating.
Change-Id: I3013007e0dadf6ddccc176e142b7c78c5d63a351
There have been a bunch of Jenkins failures lately where the StatsD
tests fail because they can't bind to their desired port. There's
nothing special about the particular port they're using, so now we let
the kernel pick an available one for us.
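
Letting the kernel choose is done by binding to port 0 and then asking
the socket which port it received. A minimal sketch
(bind_ephemeral_udp is a hypothetical helper, not the test's code):

```python
import socket


def bind_ephemeral_udp(host="127.0.0.1"):
    """Bind a UDP socket to a kernel-chosen free port; return (sock, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0 asks the kernel for any available port
    return sock, sock.getsockname()[1]


sock, port = bind_ephemeral_udp()
sock.close()
```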
This also lets us get rid of a sleep() in the test that looked like an
attempt to alleviate EADDRINUSE errors, so now in the happy case, the
tests are a few fractions of a second faster.
Change-Id: Idee11349254107a59643539b1566f3588eee7ef4
It's there to let administrators turn down the barrage of stats data
that StatsD must cope with, but it wasn't actually honored. Worse, if
the sample rate was set to e.g. 0.2, the stats would all be multiplied
by its inverse, e.g. 5. This patch actually drops packets when
sample_rate < 1, so you get correct measurements.
Fortunately, the default sample rate is 1 (i.e. drop nothing), and
multiplying by 1/1 doesn't change anything, so stats with the default
sample rate of 1.0 are, and have been, just fine.
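
Honoring a sample rate means sending each packet only with that
probability and tagging it with the rate so StatsD can scale it back
up. A small sketch, where maybe_send and its injectable rng are
hypothetical names used for illustration:

```python
import random


def maybe_send(send, metric, sample_rate=1.0, rng=random.random):
    """Send a counter increment with probability sample_rate."""
    if sample_rate < 1 and rng() >= sample_rate:
        return False  # packet dropped
    payload = "%s:1|c" % metric
    if sample_rate < 1:
        # Tell StatsD the rate so it can multiply by the inverse.
        payload += "|@%s" % sample_rate
    send(payload)
    return True


sent = []
maybe_send(sent.append, "proxy.errors")
maybe_send(sent.append, "proxy.errors", 0.2, rng=lambda: 0.1)
maybe_send(sent.append, "proxy.errors", 0.2, rng=lambda: 0.9)
```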
Fixes bug 1065643.
Also, make the two touched files compliant with pep8 v1.3.3.
Change-Id: I66663144009ae4c9ee96f6a111745d8f5d2f5ca3
A warning log line is emitted whenever the proxy has to use a handoff
node. Monitoring these warnings can indicate a problem within your
cluster; however, you can disable these log lines by setting the
proxy conf's log_handoffs to false.
While working on this, I also noticed why many proxy log lines did
not have txn_id and client_ip -- subcoroutines. Now the logger thread
locals are copied to the subcoroutines.
Change-Id: Ibac086e1b985f566c068d083620287509de35da8
Based on PatchSet 3 of https://review.openstack.org/#/c/7569/ , make all functional tests pass with both webob 1.x and 1.2.
The following additional compatibility issues were addressed:
- Until the patch for the range header issue is merged into an official webob release, testRangedGetsWithLWSinHeader() should be skipped against webob 1.2
(49c175aec2)
- common.constraints.check_utf8() now accepts both utf-8 str and unicode.
- Unicode is converted to utf-8 str where necessary.
- proxy_logging can now handle invalid utf-8 str.
bug 888371
bug 959881
blueprint webob-support
Change-Id: I00e5fd04cd1653259606a4ffdd4926db3c84c496
swift.common.utils.validate_device_partition is a new function to check
that a device and a partition are valid. This means that they don't
contain '/' and are not '.' or '..'.
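
A minimal sketch of such a check is shown below. It is written from the
description above rather than copied from the patch; rejecting empty
values and raising ValueError are assumptions of the sketch.

```python
def validate_device_partition(device, partition):
    """Reject values that could escape the devices directory."""
    for label, value in (("device", device), ("partition", partition)):
        # Empty values are rejected too (an assumption of this sketch).
        if not value or "/" in value or value in (".", ".."):
            raise ValueError("Invalid %s: %r" % (label, value))


validate_device_partition("sdb1", "1234")  # passes silently
```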
We use this new function every time we get devices and partitions from a
request.
Fix bug 1005908
Change-Id: Ia545ba8f877e85b4b576d6d7d09d890877ea6d34
Documentation, including a list of metrics reported and their semantics,
is in the Admin Guide in a new section, "Reporting Metrics to StatsD".
An optional "metric prefix" may be configured which will be prepended to
every metric name sent to StatsD.
Here is the rationale for doing a deep integration like this versus only
sending metrics to StatsD in middleware. It's the only way to report
some internal activities of Swift in a real-time manner. So to have one
way of reporting to StatsD and one place/style of configuration, even
some things (like, say, timing of PUT requests into the proxy-server)
which could be logged via middleware are consistently logged the same
way (deep integration via the logger delegate methods).
When log_statsd_host is configured, get_logger() injects a
swift.common.utils.StatsdClient object into the logger as
logger.statsd_client. Then a set of delegate methods on LogAdapter
either pass through to the StatsdClient object or become no-ops. This
allows StatsD logging to look like:
self.logger.increment('some.metric.here')
and do the right thing in all cases and with no messy conditional logic.
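
The delegate pattern can be sketched as follows. Both classes here are
simplified stand-ins written for illustration; the real LogAdapter and
StatsdClient have more methods and different constructors.

```python
class RecordingClient:
    """Stand-in StatsD client that records metrics instead of sending UDP."""

    def __init__(self):
        self.sent = []

    def increment(self, metric):
        self.sent.append(metric)


class LogAdapter:
    """Logger wrapper whose metric methods are no-ops without a client."""

    def __init__(self, statsd_client=None):
        self.statsd_client = statsd_client

    def increment(self, metric):
        if self.statsd_client is not None:
            self.statsd_client.increment(metric)


client = RecordingClient()
LogAdapter(client).increment("some.metric.here")  # delegated to the client
LogAdapter().increment("some.metric.here")        # silently does nothing
```

Callers never need to check whether StatsD is configured.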
I wanted to use the pystatsd module for the StatsD client, but the
version on PyPI lags behind the git repo (and is missing both the prefix
functionality and timing_since() method). So I wrote my own,
swift.common.utils.StatsdClient. The interface is the same as
pystatsd.Client, but the code was written from scratch. It's pretty
simple, and the tests I added cover it. This also frees Swift from an
optional dependency on the pystatsd module, making this feature easier
to enable.
There's test coverage for the new code and all existing tests continue
to pass.
Refactored out _one_audit_pass() method in swift/account/auditor.py and
swift/container/auditor.py.
Fixed some misc. PEP8 violations.
Misc test cleanups and refactorings (particularly the way "fake logging"
is handled).
Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2
Corrected its/it's mistakes, harmonized line wrapping within some docs
and clarified doc wording in several places.
Change-Id: Ib9ac6d5e859f770a702e1fad6de8d4abe0390b47
Fix bug 942644.
Use constant time string comparisons when doing authentication to help
guard against timing attacks.
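
A constant-time comparison looks at every character instead of
returning at the first mismatch. The sketch below shows the idea;
streq_const_time is an illustrative name, and on Python 3.3+ the
standard library's hmac.compare_digest is the better choice.

```python
def streq_const_time(s1, s2):
    """Compare two strings without short-circuiting on the first mismatch."""
    if len(s1) != len(s2):
        return False
    result = 0
    for a, b in zip(s1, s2):
        # Accumulate differences; the loop always runs to the end.
        result |= ord(a) ^ ord(b)
    return result == 0


ok = streq_const_time("AUTH_tk123", "AUTH_tk123")
bad = streq_const_time("AUTH_tk123", "AUTH_tk124")
```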
Change-Id: I88c4c5cd9edd9e5d60db07b6ae2638b74a2a2e17
Fixes bug 989569.
This patch ensures that the list of groups is completely reset when dropping
privileges.
Change-Id: I049f75e66e08a4a6361504b013bc68c4c38ef093
Updated eventlet.TimeoutError (deprecated) references to
Timeout and, more importantly, updated many except Exception
clauses to except (Exception, Timeout).
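
The reason the second change matters: eventlet's Timeout derives from
BaseException, so a bare "except Exception" does not catch it. The
sketch below uses a local stand-in class instead of importing eventlet,
and call_safely is a hypothetical helper.

```python
class Timeout(BaseException):
    """Stand-in for eventlet's Timeout, which also derives from BaseException."""


def call_safely(func):
    """Run func, handling both ordinary errors and timeouts."""
    try:
        return func()
    except (Exception, Timeout) as err:
        return "handled %s" % type(err).__name__


def times_out():
    raise Timeout()


outcome = call_safely(times_out)
```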
Change-Id: Ib089265551bd20b94c00ea84f11140ccd795d301