This is an alternative approach to that proposed in [1].
Adds support for optional per-policy config sections
in proxy-server.conf. This is highly desirable
to allow per-policy affinity options to be set for use with
duplicated EC policies [2] and composite rings [3].
Certain options found in per-policy conf sections will
override their equivalents that may be set in the
[app:proxy-server] section. Currently the options
handled that way are:
sorting_method
read_affinity
write_affinity
write_affinity_node_count
For example:
[proxy-server:policy:0]
sorting_method = affinity
read_affinity = r1=100
write_affinity = r1
write_affinity_node_count = 1 * replicas
The corresponding attributes of the proxy-server Application
are now available from instances of an OverrideConf object
that is obtained from Application.get_policy_options(policy).
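For illustration, reading the per-policy options might look something
like this (the policy lookup and the app variable are illustrative, not
part of this change):

from swift.common.storage_policy import POLICIES

policy = POLICIES.get_by_index(0)      # illustrative: pick a policy
options = app.get_policy_options(policy)
print(options.sorting_method, options.read_affinity,
      options.write_affinity, options.write_affinity_node_count)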
[1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
[3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc
Recently our gate started blowing up intermittently with a strange
case of mixed-up ports. Sometimes a functional test tries to
authorize on a port that's clearly an object server port, and
the like. As it turns out, eventlet developers added an unavoidable
SO_REUSEPORT into listen(), which makes listen(("localhost", 0))
reuse ports.
There's an issue about it:
https://github.com/eventlet/eventlet/issues/411
This patch is working around the problem while eventlet people
consider the issue.
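For illustration, one way to sidestep that behaviour is to bind the
test socket by hand instead of going through eventlet.listen() (a
sketch, not necessarily what this patch does):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# no SO_REUSEPORT set here, so two listeners can't share one port
sock.bind(('localhost', 0))
sock.listen(50)
port = sock.getsockname()[1]   # a genuinely free ephemeral port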
Change-Id: I67522909f96495a6a30e1acdb79835dce2189549
This patch moves some code from the crypto files
to a more common module that will be used by symlinks.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I1758693c5dd428f9f2157966aac49d97c2c7ab12
Signed-off-by: Thiago da Silva <thiago@redhat.com>
Previously, we would set the TZ environment variable to the result of
time.strftime("%z", time.gmtime())
This has a few problems.
1. The "%z" format does not appear in the table of formatting
directives for strftime [1]. While it *does* appear in a
footnote [2] for that section, it is described as "not supported by
all ANSI C libraries." This may explain the next point.
2. On the handful of Linux platforms I've tested, the above produces
"+0000" regardless of the system's timezone. This seems to run
counter to the intent of the patches that introduced the TZ
mangling. (See the first two related changes.)
3. The above does not produce a valid Posix TZ format, which expects
(at minimum) a name consisting of three or more alphabetic
characters followed by the offset to be added to the local time to
get Coordinated Universal Time (UTC).
Further, while we would change os.environ['TZ'], we would *not* call
time.tzset like it says in the docs [3], which seems like a Bad Thing.
Some combination of the above has the net effect of changing some of the
functions in the time module to use UTC. (Maybe all of them? At the very
least, time.localtime and time.mktime.) However, it does *not* change
the offset stored in time.timezone, which causes bad behavior when
dealing with local timestamps [4].
Now, set TZ to "UTC+0" and call tzset. Apparently we don't have a good
way of getting local timezone info, we were (unintentionally?) using UTC
before, and you should probably be running your servers in UTC anyway.
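In code, that boils down to something like:

import os
import time

os.environ['TZ'] = 'UTC+0'   # valid POSIX TZ: name "UTC", zero offset
time.tzset()                 # make the time module pick up the change, per [3]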
[1] https://docs.python.org/2/library/time.html#time.strftime
[2] https://docs.python.org/2/library/time.html#id2
[3] https://docs.python.org/2/library/time.html#time.tzset
[4] Like in email.utils.mktime_tz, prior to being fixed in
https://hg.python.org/cpython/rev/a283563c8cc4
Change-Id: I007425301914144e228b9cfece5533443e851b6e
Related-Change: Ifc78236a99ed193a42389e383d062b38f57a5a31
Related-Change: I8ec80202789707f723abfe93ccc9cf1e677e4dc6
Related-Change: Iee7488d03ab404072d3d0c1a262f004bb0f2da26
We're functioning as a WSGI server here, so this bit from PEP-3333 seems
to apply:
> The start_response callable must not actually transmit the response
> headers. Instead, it must store them for the server or gateway to
> transmit only after the first iteration of the application return
> value that yields a non-empty bytestring ... . In other words, response
> headers must not be sent until there is actual body data available, or
> until the application's returned iterable is exhausted.
Plus, it mirrors what swob.Request.call_application does.
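A minimal sketch of the buffering pattern (simplified, not the exact
code in this change):

import itertools

def call_app(app, environ):
    captured = []

    def start_response(status, headers, exc_info=None):
        captured[:] = [status, headers]   # store; do not transmit yet

    body_iter = iter(app(environ, start_response))
    chunk = b''
    for chunk in body_iter:
        if chunk:
            break   # first non-empty chunk: headers can now be sent
    return captured[0], captured[1], itertools.chain([chunk], body_iter)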
Change-Id: I1e8501f8ce91ea912780db64fee1c56bef809a98
The mimetools module has been removed from Python 3: modify
monkey_patch_mimetools() to do nothing on Python 3.
Skip test_monkey_patch_mimetools() on Python 3.
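A sketch of the shape of the change (the Python 2 body is elided):

import six

def monkey_patch_mimetools():
    if six.PY3:
        return   # mimetools no longer exists; nothing to patch
    ...          # existing Python 2 patching stays as it was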
Change-Id: I50f01ec159efedbb4df759ddd1e13928ac28fba6
Previously, if you called get_account_info, get_container_info, or
get_object_info, then the results of that call would be cached in the
WSGI environment as top-level keys. This is okay, except that if you,
in middleware, copy the WSGI environment and then make a subrequest
using the copy, information retrieved in the subrequest is cached
only in the copy and not in the original. This can mean lots of extra
trips to memcache for, say, SLO validation where the segments are in
another container; the object HEAD ends up getting container info for
the segment container, but then the next object HEAD gets it again.
This commit moves the cache for get_*_info into a dictionary at
environ['swift.infocache']; this way, you can shallow-copy the request
environment and still get the benefits from the cache.
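A sketch of the pattern (helper names here are illustrative):

def cache_info(env, key, info):
    # everything lives under one nested dict, so a shallow copy of the
    # environ still shares (and populates) the same cache
    env.setdefault('swift.infocache', {})[key] = info

def lookup_info(env, key):
    return env.get('swift.infocache', {}).get(key)

# middleware making a subrequest:
#   sub_env = dict(env)   # shallow copy
#   ... anything cached while handling sub_env is visible via env too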
Change-Id: I3481b38b41c33cd1e39e19baab56193c5f9bf6ac
Rewrite server side copy and 'object post as copy' feature as middleware to
simplify the PUT method in the object controller code. COPY is no longer
a verb implemented as a public method in the proxy application.
The server side copy middleware is inserted to the left of dlo, slo and
versioned_writes middlewares in the proxy server pipeline. As a result,
dlo and slo copy_hooks are no longer required. SLO manifests are now
validated when copied so when copying a manifest to another account the
referenced segments must be readable in that account for the manifest
copy to succeed (previously this validation was not made, meaning the
manifest was copied but could be unusable if the segments were not
readable).
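For illustration, a pipeline with the middleware in place could look
like this (the other filters shown are just an example):

[pipeline:main]
pipeline = catch_errors cache tempauth copy slo dlo versioned_writes proxy-server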
With this change, there should be no change in functionality or existing
behavior. This is asserted with (almost) no changes required to existing
functional tests.
Some notes (for operators):
* The middleware is required to be auto-inserted before slo, dlo and
versioned_writes.
* Turning off server side copy is not configurable.
* object_post_as_copy is no longer a configurable option of the proxy
server but of this middleware. However, for a smooth upgrade, a config
option set in the proxy server app is also read.
DocImpact: Introducing server side copy as middleware
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Change-Id: Ic96a92e938589a2f6add35a40741fd062f1c29eb
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Signed-off-by: Thiago da Silva <thiago@redhat.com>
Oslo.messaging's pika driver requires patching of the select module if
thread is patched.
The pika driver uses a select call, and if it is not patched, consuming
messages blocks the whole eventlet loop.
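For illustration, the kind of patching involved looks like this (exact
call site aside):

import eventlet

# patching select alongside thread keeps pika's select() call green, so
# a blocking consume can't stall the whole hub
eventlet.monkey_patch(thread=True, select=True)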
Closes-Bug: #1570242
Change-Id: I9756737309f401ebddb7475eb84725f65bca01bf
As swift no longer supports Python 2.6, replace assertEqual(None, *)
with assertIsNone in tests to have clearer messages in case of
failure.
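For example (variable name illustrative):

# before
self.assertEqual(None, resp)
# after: the failure message names the offending value directly
self.assertIsNone(resp)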
Change-Id: I94af3e8156ef40465d4f7a2cb79fb99fc7bbda56
Closes-Bug: #1280522
If not explicitly configured the versioned_writes middleware
should be auto-inserted in the pipeline after slo and dlo, which
is where the versioned_writes filter section's comments say it
should be in proxy-server.conf-sample. At the moment it can end up
being placed ahead of slo and dlo if they have been explicitly
configured, which results in the linked bug manifesting.
Closes-Bug: #1537042
Change-Id: I6ac95a331f4ef0d4887311940acc6f8bc00fb4eb
Currently a HTTP_REFERER (Referer) header isn't passed down to
subrequests. This means *LO subrequests to segment containers
return a 403 on a *LO GET when accessed by requests using referer
ACLs.
Currently the only workaround for referer access to *LOs is to make the
segments container world readable.
This change makes sure the referer header is passed into subrequests
allowing a segments container to only need to be locked down with
the same referer as the *LO container.
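A sketch of the idea (illustrative; the actual change is a one-liner in
the subrequest plumbing):

referer = req.environ.get('HTTP_REFERER')
if referer:
    sub_req.environ.setdefault('HTTP_REFERER', referer)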
This is a 1 line change to the code, but also adds a unit test and 2
functional tests (one for DLO and one for SLO).
Change-Id: I1fa5328979302d9c8133aa739787c8dae6084f54
Closes-Bug: #1526575
keystoneclient uses threading.Lock(), but swift doesn't
monkeypatch threading; this results in a lockup when two
greenthreads try to acquire a non-green lock.
This change fixes that.
Change-Id: I9b44284a5eb598a6978364819f253e031f4eaeef
Closes-bug: #1508424
contextlib.nested() is missing completely in Python 3.
Since 2.7, we can use multiple context managers in a 'with' statement,
like so:
with thing1() as t1, thing2() as t2:
    do_stuff()
Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.
Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
assertEquals is deprecated in py3, replacing it.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
The urllib, urllib2 and urlparse modules of Python 2 were reorganized
into a new urllib namespace on Python 3. Replace urllib, urllib2 and
urlparse imports with six.moves.urllib to make the modified code
compatible with Python 2 and Python 3.
The initial patch was generated by the urllib operation of the sixer
tool on: bin/* swift/ test/.
Change-Id: I61a8c7fb7972eabc7da8dad3b3d34bceee5c5d93
The TestCase.assert_() has been deprecated in Python 2.7. Replace it
with assertTrue() or even better methods (assertIn, assertNotIn,
assertIsInstance) which provide better error messages.
Change-Id: I21c730351470031a2dabe5238693095eabdb8964
Keep HTTP_X_USER_ID and HTTP_X_PROJECT_ID so they remain available as
user_id and project_id in storage.objects.outgoing.bytes in
ceilometer when downloading a multipart object.
Change-Id: I0f4734f021e5d6e84d48ed9bebeb321d7a9590ad
Closes-Bug: #1477283
Rewrite object versioning as middleware to simplify the PUT method
in the object controller.
The functionality remains basically the
same, with the only major difference being the ability to now
version SLO manifest files. DLO manifests are still not
supported as part of this patch.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
DocImpact
Change-Id: Ie899290b3312e201979eafefb253d1a60b65b837
Signed-off-by: Thiago da Silva <thiago@redhat.com>
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Receipt of SIGHUP used to pop us out of an os.wait(); now we're in
a "green" wait() and Timeout() combo, some part of which eats the signal
receipt. This causes the while loop condition to never get checked, and
SIGHUP no longer works as a server reload command.
The fix is to loop at least every 0.5 seconds, as a trade-off between
not busy-waiting and checking the "keep running" condition often enough
to feel responsive.
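A sketch of the loop shape (helper names are illustrative):

import eventlet

keep_running = True                      # flipped by the signal handlers

while keep_running:
    with eventlet.Timeout(0.5, False):   # False: just wake up, don't raise
        reap_one_child()                 # stand-in for the green wait() above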
Change-Id: I95283b8b7cfc2998ab5813e0ad3ca1fa231696c8
Closes-Bug: #1479972
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.
Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
wsgi.input is a binary stream (bytes), not a text stream (unicode).
* Replace StringIO with BytesIO for WSGI input
* Replace StringIO('') with StringIO() and replace WsgiStringIO('') with
WsgiStringIO(): an empty string is already the default value
Change-Id: I09c9527be2265a6847189aeeb74a17261ddc781a
* replace "from cStringIO import StringIO"
with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
with "import six" and "six.StringIO()"
This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer
Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
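For example, in the object-server config (the value is illustrative):

[DEFAULT]
bind_ip = 0.0.0.0
# fork 4 object-server workers per unique local port found in the ring
servers_per_port = 4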
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
This patch changes container sync to use Internal Client instead
of Direct Client.
In the current design, container sync uses direct_get_object to
get the newest source object (which talks to storage nodes directly).
This works fine for replication storage policies; however, in
erasure coding policies, direct_get_object would only return part
of the object (it's encoded as several pieces). Using Internal
Client can get the original object in the EC case.
Note that for the container sync put/delete part, it's working in
EC since it's using Simple Client.
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
DocImpact
Change-Id: I91952bc9337f354ce6024bf8392046a1ecf6ecc9
The "logging" available in eventlet.wsgi.server/BaseHTTPServer doesn't
generally suite our needs, so it should be bypassed using a NullLogger in
production. But in development it can be useful if tracebacks generated from
inside eventlet.wsgi (say a NameError in DiskFile.__iter__) end up in logs.
Since we already have eventlet_debug parsed inside of run_server we can skip
the NullLogger bypass and let stuff blast out to STDERR when configured for
development/debug logging.
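A sketch of the resulting logic (names approximate):

import eventlet.wsgi
from swift.common.utils import NullLogger

logger = NullLogger()        # production: swallow eventlet.wsgi's log lines
if eventlet_debug:
    logger = None            # debug: let eventlet write straight to STDERR
eventlet.wsgi.server(sock, app, log=logger)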
Change-Id: I20a9e82c7fed8948bf649f1f8571b4145fca201d
In a long-term effort to change the recommended ports for Swift,
the first step is to require the bind_port in config files. Later,
we can change the recommended setting.
Anyone currently explicitly setting the ports will not be affected.
Anyone not setting the ports will need to specify them to match their
rings.
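For example (the port value is illustrative; use whatever your rings
expect):

[DEFAULT]
bind_port = 6000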
DocImpact
Change-Id: Icca83a263acdd0afc9016424a3e9f8c15e944789
RFC 2616 says that HTTP header fields are case-insensitive. However, there are
some S3 clients that don't accept the headers as normalized by Swift and Eventlet.
For example, the AWS Java SDK expects the etag header to be 'ETag', not 'Etag'.
This patch disables Eventlet's header capitalization so that the swift3
middleware can normalize the response headers as those clients expect.
Note that this change requires a fix for Eventlet, which will be included in
the next Eventlet release (v0.15).
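Presumably that means turning off the wsgi server's header
capitalization, something like this (the keyword argument is an
assumption about Eventlet >= 0.15):

import eventlet.wsgi

eventlet.wsgi.server(sock, app, capitalize_response_headers=False)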
Change-Id: I6d3428b0dafef776bdb3ebac7639b3126fa5e60d
Discovered some tests that were coupling the code under test with the
storage policies configured in /etc/swift/swift.conf. There were some
tests that created fake rings in their tempdirs, but didn't reset or
patch the POLICIES global. So if your local config needed more rings
than the fakes were setting up (just 2), the tests would puke when they
loaded up an app that looked for rings. I think this probably started
happening when we added eager object ring loading back into the proxy.
* two TestCases in test_wsgi were missing @patch_policies
* fixed issue with patch_policies that could cause state to bleed
between tests
* patch_policies' legacy and default collections get a FakeRing by
default
* drive-by cleanup for test_loadapp_proxy() ring serialized path
handling
* drive-by cleanup for test_internal_client that was doing basically
the same thing as test_wsgi
Change-Id: Ia706000ba961ed24f2c22b81041e53a0c3f302fc
* add get_object
* allow extra headers passthrough on HEAD/metadata requests
* expose (account|container|get_object)_ring properties
Pipeline property access to the auto_create_account_prefix also allows us to
bypass the early exit on a container HEAD for auto_create_accounts if the
container-updater hasn't cycled yet.
Allow overriding of storage policy index.
This is something the reconciler will need so that it can GET from one
policy, PUT in another, and then DELETE from the first one again.
DocImpact
Implements: blueprint storage-policies
Change-Id: I9b287d15f2426022d669d1186c9e22dd8ca13fb9
Objects now have a storage policy index associated with them as well;
this is determined by their filesystem path. Like before, objects in
policy 0 are in /srv/node/$disk/objects; this provides compatibility
on upgrade. (Recall that policy 0 is given to all existing data when a
cluster is upgraded.) Objects in policy 1 are in
/srv/node/$disk/objects-1, objects in policy 2 are in
/srv/node/$disk/objects-2, and so on.
* the 'quarantined' dir already created an 'objects' subdir, so now
objects-N subdirs will also be created at the same level
This commit does not address replicators, auditors, or updaters except
where method signatures changed. They'll still work if your cluster
has only one storage policy, though.
DocImpact
Implements: blueprint storage-policies
Change-Id: I459f3ed97df516cb0c9294477c28729c30f48e09
FakeLogger gets better log level handling
Parameterize the logger on some daemons which were previously
unparameterized, and try to use the interface in tests.
FakeRing uses more real code
The existing FakeRing mock's implementation bit me on a pretty subtle
character encoding issue by bypassing the hash_path code that is normally
part of get_part_nodes. This change tries to exercise more of the real
ring code paths when it makes sense and provide a better Fake for use in
testing.
Add write_fake_ring helper to test.unit for when you need a real ring.
DocImpact
Implements: blueprint storage-policies
Change-Id: Id2e3740b1dd569050f4e083617e7dd6a4249027e
As seen in #1174809, this changes uses of mutable types as default
arguments and defaults them within the method. Otherwise, those
defaults can be unexpectedly persisted with the function between
invocations and erupt into mass hysteria on the streets.
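The classic shape of the problem and the fix (signatures illustrative,
not SimpleClient's exact API):

# risky: one dict is created at def time and shared by every call
def get(self, url, headers={}):
    headers['X-Auth-Token'] = self.token
    ...

# safe: default to None and build a fresh dict inside the method
def get(self, url, headers=None):
    headers = dict(headers or {})
    headers['X-Auth-Token'] = self.token
    ...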
There was indeed a test (TestSimpleClient.test_get_with_retries)
that was erroneously relying on this behavior. Since previous tests
had populated their own instantiations with a token, this test only
passed because the modified headers dict from previous tests was
being overridden. As expected, with the mutable defaults fix in
SimpleClient, this test began to fail since it never specified any
token, yet it has always passed anyway. This change also now provides
the expected token.
Change-Id: If95f11d259008517dab511e88acfe9731e5a99b5
Related-Bug: #1174809
When current code modifies the pipeline, it prints the entry point
names instead of the names used to construct the pipeline. This is
inconvenient because a sysadmin cannot copy and paste from the log.
We already save the pipeline name into contexts in most cases, so
the fix simply reuses that to provide friendly names.
Fixes bug: 1311802
Change-Id: Ic76baf1360cd521f140fa1980029ccbce58f1717
One can argue that it makes sense for the client-facing proxy server
to have certain middlewares like gatekeeper in its pipeline, but that
is not desirable for InternalClient. In particular, it prevents you
from passing in sysmeta headers using InternalClient, and I found
myself wanting to do that earlier today.
Now InternalClient's proxy application gets exactly what's configured;
no more, no less. This will mean that the object expirer can read and
write sysmeta headers, but I think we can trust it to keep our
secrets.
Change-Id: I17b4a89c24e600754701ee1645b40406421fa6f3
This is for the same reason that SLO got pulled into middleware, which
includes stuff like automatic retry of GETs on broken connection and
the multi-ring storage policy stuff.
The proxy will automatically insert the dlo middleware at an
appropriate place in the pipeline the same way it does with the
gatekeeper middleware. Clusters will still support DLOs after upgrade
even with an old config file that doesn't mention dlo at all.
Includes support for reading config values from the proxy server's
config section so that upgraded clusters continue to work as before.
Bonus fix: resolve 'after' vs. 'after_fn' in proxy's required filters
list. Having two was confusing, so I kept the more-general one.
DocImpact
blueprint multi-ring-large-objects
Change-Id: Ib3b3830c246816dd549fc74be98b4bc651e7bace
Middleware or core features may need to store metadata
against accounts or containers. This patch adds a
generic mechanism for system metadata to be persisted
in backend databases, without polluting the user
metadata namespace, by using the reserved header
namespace x-<server_type>-sysmeta-*.
Modifications are firstly that backend servers persist
system metadata headers alongside user metadata and
other system state.
For accounts and containers, system metadata in PUT
and POST requests is treated in a similar way to user
metadata. System metadata is not yet supported for
object requests.
Secondly, changes in the proxy controllers ensure that
headers in the system metadata namespace will pass through
in requests to backend servers.
Thirdly, system metadata returned from backend servers
in GET or HEAD responses is added to the cached info
dict, which middleware can access.
Finally, a gatekeeper middleware module is provided
which filters all system metadata headers from requests
and responses by removing headers with names starting
x-account-sysmeta-, x-container-sysmeta-. The gatekeeper
also removes headers starting x-object-sysmeta- in
anticipation of future support for system metadata being
set for objects. This prevents clients from writing or
reading system metadata.
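A sketch of the filtering idea (prefix list taken from the description
above):

SYSMETA_PREFIXES = ('x-account-sysmeta-', 'x-container-sysmeta-',
                    'x-object-sysmeta-')

def strip_sysmeta(headers):
    # drop any system metadata the client tried to send or would see
    return {k: v for k, v in headers.items()
            if not k.lower().startswith(SYSMETA_PREFIXES)}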
The required_filters list in swift/proxy/server.py is
modified to include the gatekeeper middleware so that
if the gatekeeper has not been configured in the
pipeline then it will be automatically inserted close
to the start of the pipeline.
blueprint cluster-federation
Change-Id: I80b8b14243cc59505f8c584920f8f527646b5f45