This is to fix invalid assert states like:
self.assertTrue('sync_point2: 5', lines.pop().strip())
self.assertTrue('sync_point1: 5', lines.pop().strip())
self.assertTrue('bytes: 1100', lines.pop().strip())
self.assertTrue('deletes: 2', lines.pop().strip())
self.assertTrue('puts: 3', lines.pop().strip())
self.assertTrue('1', jobs_to_delete[0]['partition'])
in which assertEqual should be used.
Change-Id: Ide5af2ae68fae0e5d6eb5c233a24388bb9942144
This patch removes default value for logger passed to
faked BrokerClass in some unit tests for account reaper,
to make sure that a logger instance is properly passed
when generation BrokerClass instance.
This patch also removes unnecessory parameter, which is
added to confirm logger passed to broker, but is not
useful in fact.
NOTE: This patch is a follow-up of my previous commit[1]
[1] https://review.opendev.org/#/c/295875/
Change-Id: Ica10e3ac42375c2dc141478e647408df90028ae9
There are some processes which don't pass their logger instances
when creating AccountBrokers/ContainerBrokers, which causes some
error messages with a different setting from the other logs
generated by the processes.
This patch makes them pass logger instances, to make sure they
generate all logs according to their log settings.
Change-Id: I914c3a2811e1a2b7f19ad2bc9b3d042fcba63820
This started with ShardRanges and its CLI. The sharder is at the
bottom of the dependency chain. Even container backend needs it.
Once we started tinkering with the sharder, it all snowballed to
include the rest of the container services.
Beware, this does affect some of Python 2 code. Mostly it's trivial
and obviously correct, but needs checking by reviewers.
About killing the stray "from __future__ import unicode_literals":
we do not do it in general. The specific problem it caused was
a failure of functional tests because unicode leaked into a field
that was supposed to be encoded. It is just too hard to track the
types when rules change from file to file, so off with its head.
Change-Id: Iba4e65d0e46d8c1f5a91feb96c2c07f99ca7c666
Previously, we'd sometimes shove strings into HASH_PATH_PREFIX or
HASH_PATH_SUFFIX, which would blow up on py3. Now, always use bytes.
Change-Id: Icab9981e8920da505c2395eb040f8261f2da6d2e
Add a symbolic link ("symlink") object support to Swift. This
object will reference another object. GET and HEAD
requests for a symlink object will operate on the referenced object.
DELETE and PUT requests for a symlink object will operate on the
symlink object, not the referenced object, and will delete or
overwrite it, respectively.
POST requests are *not* forwarded to the referenced object and should
be sent directly. POST requests sent to a symlink object will
result in a 307 Error.
Historical information on symlink design can be found here:
https://github.com/openstack/swift-specs/blob/master/specs/in_progress/symlinks.rst.
https://etherpad.openstack.org/p/swift_symlinks
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Janie Richling <jrichli@us.ibm.com>
Co-Authored-By: Kazuhiro MIYAHARA <miyahara.kazuhiro@lab.ntt.co.jp>
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I838ed71bacb3e33916db8dd42c7880d5bb9f8e18
Signed-off-by: Thiago da Silva <thiago@redhat.com>
Following OpenStack Style Guidelines:
[1] http://docs.openstack.org/developer/hacking/#unit-tests-and-assertraises
[H203] Unit test assertions tend to give better messages for more specific
assertions. As a result, assertIsNone(...) is preferred over
assertEqual(None, ...) and assertIs(..., None)
Change-Id: If4db8872c4f5705c1fff017c4891626e9ce4d1e4
Currently the container sync daemon fails to copy
an SLO manifest, and the error will stall progress
of the sync process on that container. There are
several reasons why the sync of an SLO manifest
may fail:
1. The GET of the manifest from the source
container returns an X-Static-Large-Object header
that is not allowed to be included with a PUT
to the destination container.
2. The format of the manifest object that is read
from the source is not in the syntax required
for a SLO manifest PUT.
3. Assuming 2 were fixed, the PUT of the manifest
includes an ETag header which will not match the
md5 of the manifest generated by the receiving
proxy's SLO middleware.
4. If the manifest is being synced to a different
account and/or cluster, then the SLO segments may
not have been synced and so the validation of the
PUT manifest will fail.
This patch addresses all of these obstacles by
enabling the destination container-sync middleware to
cause the SLO middleware to be bypassed by setting a
swift.slo_override flag in the request environ. This
flag is only set for request that have been validated
as originating from a container sync peer.
This is justifed by noting that a SLO manifest PUT from
a container sync peer can be assumed to have valid syntax
because it was already been validated when written to
the source container.
Furthermore, we must allow SLO manifests to be synced
without requiring the semantic of their content to be
re-validated because we have no way to enforce or check
that segments have been synced prior to the manifest, nor
to check that the semantic of the manifest is still valid
at the source.
This does mean that GETs to synced SLO manifests may fail
if segments have not been synced. This is however
consistent with the expectation for synced DLO manifests
and indeed for the source SLO manifest if segments have
been deleted since it was written.
Co-Authored-By: Oshrit Feder <oshritf@il.ibm.com>
Change-Id: I8d503419b7996721a671ed6b2795224775a7d8c6
Closes-Bug: #1605597
In addition to the container sync stat. report, keeping per container
statistics allows administrator with more control over bytes
transfered over a specific time per user account: The per container stats
are crucial for billing purposes and provides the operator a 'progress
bar' equivalent on the container's replication status.
Change-Id: Ia8abcdaf53e466e8d60a957c76e32c2b2c5dc3fa
This change adds a remote HEAD object request before each call to
sync_row.
Currently, container-sync-row attempts to replicate the object
(using PUT) regardless of the existance of the object on the remote side,
thus causing each object to be transferred on the wire several times
(depending on the replication factor)
An alternative to HEAD is to do a conditional PUT (using, 100-continue).
However, this change is more involved and requires upgrade of both the
client and server side clusters to work.
In the Tokyo design summit it was decided to start with the HEAD approach.
Change-Id: I60d982dd2cc79a0f13b0924507cd03d7f9c9d70b
Closes-Bug: #1277223
Since commit "Update container sync to use internal client" get_object is
done using internal_client and not directly on nodes which makes the block
of code to shuffle the nodes redundant.
Change-Id: I45a6dab05f6f87510cf73102b1ed191238209efe
This patch makes a number of changes to enable content-type
metadata to be updated when using the fast-POST mode of
operation, as proposed in the associated spec [1].
* the object server and diskfile are modified to allow
content-type to be updated by a POST and the updated value
to be stored in .meta files.
* the object server accepts PUTs and DELETEs with older
timestamps than existing .meta files. This is to be
consistent with replication that will leave a later .meta
file in place when replicating a .data file.
* the diskfile interface is modified to provide accessor
methods for the content-type and its timestamp.
* the naming of .meta files is modified to encode two
timestamps when the .meta file contains a content-type value
that was set prior to the latest metadata update; this
enables consistency to be achieved when rsync is used for
replication.
* ssync is modified to sync meta files when content-type
differs between local and remote copies of objects.
* the object server issues container updates when handling
POST requests, notifying the container server of the current
immutable metadata (etag, size, hash, swift_bytes),
content-type with their respective timestamps, and the
mutable metadata timestamp.
* the container server maintains the most recently reported
values for immutable metadata, content-type and mutable
metadata, each with their respective timestamps, in a single
db row.
* new probe tests verify that replication achieves eventual
consistency of containers and objects after discrete updates
to content-type and mutable metadata, and that container-sync
sync's objects after fast-post updates.
[1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e
Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
When container-sync PUTs an object to a destination container
it uses the timestamp from the container row rather than the
actual timestamp of the object being copied. The actual timestamp
of the object can be newer, so the sync'd object may end up with
the right content but at the wrong, older, timestamp.
This patch changes the timestamp sent with the sync'd object
to be that of the actual source object being sent.
Drive-by fix to make code more readable by removing a variable
rename mid-function, fix a typo and remove a redundant function
call.
Change-Id: I800e6de4cdeea289864414980a96f5929281da04
Closes-Bug: #1540884
This change introduces a sync_store which holds only containers that
are enabled for sync. The store is implemented using a directory
structure that resembles that of the containers directory, but has
entries only for containers enabled for sync.
The store is maintained in two ways:
1. Preemptively by the container server when processing
PUT/POST/DELETE operations targeted at containers with
x-container-sync-key / x-container-sync-to
2. In the background using the containers replicator
whenever it processes a container set up for sync
The change updates [1]
[1] http://docs.openstack.org/developer/swift/overview_container_sync.html
Change-Id: I9ae4d4c7ff6336611df4122b7c753cc4fa46c0ff
Closes-Bug: #1476623
contextlib.nested() is missing completely in Python 3.
Since 2.7, we can use multiple context managers in a 'with' statement,
like so:
with thing1() as t1, thing2() as t2:
do_stuff()
Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.
Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explict "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
This patch changes container sync to use Internal Client instead
of Direct Client.
In the current design, container sync uses direct_get_object to
get the newest source object(which talks to storage node directly).
This works fine for replication storage policies however in
erasure coding policies, direct_get_object would only return part
of the object(it's encoded as several pieces). Using Internal
Client can get the original object in EC case.
Note that for the container sync put/delete part, it's working in
EC since it's using Simple Client.
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
DocImpact
Change-Id: I91952bc9337f354ce6024bf8392046a1ecf6ecc9
* Get FakeConn ready for expect 100 continue
* Use debug_logger more and with better interfaces
* Fix patch_policies to be less annoying
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I28c0a3539d994cbb8e6b94d63a23ed4ea6cb956d
Container sync might get stuck without a connection timeout if the remote proxy
is not responding.
This patch sets a default timeout of 5.0 seconds for the connection attempt. The
value is much higher than other connection timeouts inside Swift (0.5); however
there might be a much higher latency to the remote peer, thus playing it safe.
There is also a retry if the attempt timed out.
Note that this setting only applies to the connection request itself. Setting
this timeout does not apply when the remote proxy goes away during a request.
Also added a short test to ensure urlopen is called with the timeout value.
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Change-Id: Ic08a55157fa91fe1316653781adf4d66eead61bc
Partial-Bug: 1419916
Replaced throughout code base & tox'd. Functional as well
as probe tests pass with and without policies defined.
POLICY --> 'X-Storage-Policy'
POLICY_INDEX --> 'X-Backend-Storage-Policy-Index'
Change-Id: Iea3d06de80210e9e504e296d4572583d7ffabeac
Have container sync get its object ring from POLICIES now,
update tests to use policy index from container_info and pass
that along for use in ring selection.
This change also introduced the option of specifiying in the cluster info
which of the relam/cluster's is the current realm/cluster.
DocImpact
Implements: blueprint storage-policies
Change-Id: If57d3b0ff8c395f21c81fda76458bc34fcb23257
Also fixed a bug where SimpleClient would send ?format=json for object
requests, which is not necessary.
Change-Id: If06a7dcebc9de2d7c8b28a046d60b902dae821c1
Container sync had a bug where it'd send out the trailing
"; swift_bytes=xxx" part of the content-type header. That trailing part
is just for internal cluster usage by SLO. Since that needed to be
stripped in two places now, I separated it out to a function that both
spots call.
Change-Id: Ibd6035d7a6b78205344bcc9d98bc1b7a9d463427
If container sync got a non-ClientException and then a ClientException
during a get/put operation, it previously would break in the error
handling.
Change-Id: Ib2fa70270f3ec870bf60d5bbb3e8f514aeeb0927
Summary of the new configuration option:
The cluster operators add the container_sync middleware to their
proxy pipeline and create a container-sync-realms.conf for their
cluster and copy this out to all their proxy and container servers.
This file specifies the available container sync "realms".
A container sync realm is a group of clusters with a shared key that
have agreed to provide container syncing to one another.
The end user can then set the X-Container-Sync-To value on a
container to //realm/cluster/account/container instead of the
previously required URL.
The allowed hosts list is not used with this configuration and
instead every container sync request sent is signed using the realm
key and user key.
This offers better security as source hosts can be faked much more
easily than faking per request signatures. Replaying signed requests,
assuming it could easily be done, shouldn't be an issue as the
X-Timestamp is part of the signature and so would just short-circuit
as already current or as superceded.
This also makes configuration easier for the end user, especially
with difficult networking situations where a different host might
need to be used for the container sync daemon since it's connecting
from within a cluster. With this new configuration option, the end
user just specifies the realm and cluster names and that is resolved
to the proper endpoint configured by the operator. If the operator
changes their configuration (key or endpoint), the end user does not
need to change theirs.
DocImpact
Change-Id: Ie1704990b66d0434e4991e26ed1da8b08cb05a37
All the other daemons do this, and since the out deamon wrapper
scripts pass all the command line options through directly, seems
simple enough to handle them by ignoring.
This is also applied to run_once().
Change-Id: I1df83bdf78f0dc3d911019f67f78301967b5da72
Closes-Bug: #1253891
Signed-off-by: Peter Portante <peter.portante@redhat.com>
Refactor on-disk knowledge out of the object server by pushing the
async update pickle creation to the new DiskFileManager class (name is
not the best, so suggestions welcome), along with the REPLICATOR
method logic. We also move the mount checking and thread pool storage
to the new ondisk.Devices object, which then also becomes the new home
of the audit_location_generator method.
For the object server, a new setup() method is now called at the end
of the controller's construction, and the _diskfile() method has been
renamed to get_diskfile(), to allow implementation specific behavior.
We then hide the need for the REST API layer to know how and where
quarantining needs to be performed. There are now two places it is
checked internally, on open() where we verify the content-length,
name, and x-timestamp metadata, and in the reader on close where the
etag metadata is checked if the entire file was read.
We add a reader class to allow implementations to isolate the WSGI
handling code for that specific environment (it is used no-where else
in the REST APIs). This simplifies the caller's code to just use a
"with" statement once open to avoid multiple points where close needs
to be called.
For a full historical comparison, including the usage patterns see:
https://gist.github.com/portante/5488238
(as of master, 2b639f5, Merge
"Fix 500 from account-quota This Commit
middleware")
--------------------------------+------------------------------------
DiskFileManager(conf)
Methods:
.pickle_async_update()
.get_diskfile()
.get_hashes()
Attributes:
.devices
.logger
.disk_chunk_size
.keep_cache_size
.bytes_per_sync
DiskFile(a,c,o,keep_data_fp=) DiskFile(a,c,o)
Methods: Methods:
*.__iter__()
.close(verify_file=)
.is_deleted()
.is_expired()
.quarantine()
.get_data_file_size()
.open()
.read_metadata()
.create() .create()
.write_metadata()
.delete() .delete()
Attributes: Attributes:
.quarantined_dir
.keep_cache
.metadata
*DiskFileReader()
Methods:
.__iter__()
.close()
Attributes:
+.was_quarantined
DiskWriter() DiskFileWriter()
Methods: Methods:
.write() .write()
.put() .put()
* Note that the DiskFile class * Note that the DiskReader() object
implements all the methods returned by the
necessary for a WSGI app DiskFileOpened.reader() method
iterator implements all the methods
necessary for a WSGI app iterator
+ Note that if the auditor is
refactored to not use the DiskFile
class, see
https://review.openstack.org/44787
then we don't need the
was_quarantined attribute
A reference "in-memory" object server implementation of a backend
DiskFile class in swift/obj/mem_server.py and
swift/obj/mem_diskfile.py.
One can also reference
https://github.com/portante/gluster-swift/commits/diskfile for the
proposed integration with the gluster-swift code based on these
changes.
Change-Id: I44e153fdb405a5743e9c05349008f94136764916
Signed-off-by: Peter Portante <peter.portante@redhat.com>
- When localhost:80 was binding the tests was trying to connect into it.
- To test you can simply run sudo python -m SimpleHTTPServer 80 which
should show :
1.0.0.127.in-addr.arpa - - [06/Aug/2013 14:10:42] code 501, message Unsupported method ('DELETE')
1.0.0.127.in-addr.arpa - - [06/Aug/2013 14:10:42] "DELETE /a/c/o HTTP/1.1" 501 -
(the test was passing since 501 would raise ClientException).
mock delete_object in the fourth test to fix that
- Refactor the code to use mock.patch as well.
Closes-Bug: 1208802
Change-Id: I5ddd4ac3a97879f51cf5883fcfc0fe0f0adaeff6
except x,y: was deprected and is removed in Python 3.x.
Use "except x as y:" instead which works in any Python
version >= 2.6.
Change-Id: I7008c74b807340f3457d3a0c8bd0b83f23169d14
Refactor the various auditors to rely on the audit_location_generator
yielding tuples containing paths with the expected suffix.
We also fix the exception handling for container_sync to not expect a
broker object (since the act of creating a broker object can raise an
exception).
For the object auditor we removed an unneeded check for disk_file
since get_data_file_size() will raise DiskFileNotExist under the same
condition (raises code coverage slightly).
Change-Id: I11d405e629063177ef21543b75e9076da1a03b61
A really simple version of this was in container sync already, and I
needed a more complete version for work I'm doing, and I noticed
https://review.openstack.org/#/c/33405/ was also making use of it.
So, here's a more full version.
If https://review.openstack.org/#/c/33405/ lands before this, I'll
update it accordingly.
Change-Id: Iba66b6a97f65e312e04fdba273e8f4ad1d3e1594
A new configuration parameter is added to /etc/swift/swift.conf
[swift-hash]
swift_hash_path_prefix = 'random unique string'
New installations are advised to set this parameter to a random secret,
which would not be disclosed ouside the organization.
The same secret needs to be used by all swift servers of the same cluster.
Existing installations should set this parameter to an empty string
(the default)
DocImpact
Fixes: Bug #1157454
Change-Id: I63b10d0b7d6dd3f74e0f10bb41b5f240fa03578a
container-sync now skips faulty objects in the first and second rounds.
All replicas try in the second round.
No server will give up until the faulty object suceeds
Fixes: bug #1068423
Change-Id: I0defc174b2ce3796a6acf410a2d2eae138e8193d
If an object server is down, container-sync stops syncing the container
even if the it gets object copies from "up" obejct servers.
Bug 1069910
In case the git history gets mangled, this fix was done almost entirely
by Donagh McCabe <donagh.mccabe@hp.com>.
Change-Id: Ieeadcfeb4e880fe5f08e284d7c12492bf7a29460
- It has been to its own gerrit project.
- direct_client should follow next.
- Implements blueprint clientbindings.
Change-Id: I3bb50c95eba81302bfec71cb7ce5288b85a41dc0
Documentation, including a list of metrics reported and their semantics,
is in the Admin Guide in a new section, "Reporting Metrics to StatsD".
An optional "metric prefix" may be configured which will be prepended to
every metric name sent to StatsD.
Here is the rationale for doing a deep integration like this versus only
sending metrics to StatsD in middleware. It's the only way to report
some internal activities of Swift in a real-time manner. So to have one
way of reporting to StatsD and one place/style of configuration, even
some things (like, say, timing of PUT requests into the proxy-server)
which could be logged via middleware are consistently logged the same
way (deep integration via the logger delegate methods).
When log_statsd_host is configured, get_logger() injects a
swift.common.utils.StatsdClient object into the logger as
logger.statsd_client. Then a set of delegate methods on LogAdapter
either pass through to the StatsdClient object or become no-ops. This
allows StatsD logging to look like:
self.logger.increment('some.metric.here')
and do the right thing in all cases and with no messy conditional logic.
I wanted to use the pystatsd module for the StatsD client, but the
version on PyPi is lagging the git repo (and is missing both the prefix
functionality and timing_since() method). So I wrote my
swift.common.utils.StatsdClient. The interface is the same as
pystatsd.Client, but the code was written from scratch. It's pretty
simple, and the tests I added cover it. This also frees Swift from an
optional dependency on the pystatsd module, making this feature easier
to enable.
There's test coverage for the new code and all existing tests continue
to pass.
Refactored out _one_audit_pass() method in swift/account/auditor.py and
swift/container/auditor.py.
Fixed some misc. PEP8 violations.
Misc test cleanups and refactorings (particularly the way "fake logging"
is handled).
Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2
This commit introduces a new algorithm for assigning partition
replicas to devices. Basically, the ring builder organizes the devices
into tiers (first zone, then IP/port, then device ID). When placing a
replica, the ring builder looks for the emptiest device (biggest
parts_wanted) in the furthest-away tier.
In the case where zone-count >= replica-count, the new algorithm will
give the same results as the one it replaces. Thus, no migration is
needed.
In the case where zone-count < replica-count, the new algorithm
behaves differently from the old algorithm. The new algorithm will
distribute things evenly at each tier so that the replication is as
high-quality as possible, given the circumstances. The old algorithm
would just crash, so again, no migration is needed.
Handoffs have also been updated to use the new algorithm. When
generating handoff nodes, first the ring looks for nodes in other
zones, then other ips/ports, then any other drive. The first handoff
nodes (the ones in other zones) will be the same as before; this
commit just extends the list of handoff nodes.
The proxy server and replicators have been altered to avoid looking at
the ring's replica count directly. Previously, with a replica count of
C, RingData.get_nodes() and RingData.get_part_nodes() would return
lists of length C, so some other code used the replica count when it
needed the number of nodes. If two of a partition's replicas are on
the same device (e.g. with 3 replicas, 2 devices), then that
assumption is no longer true. Fortunately, all the proxy server and
replicators really needed was the number of nodes returned, which they
already had. (Bonus: now the only code that mentions replica_count
directly is in the ring and the ring builder.)
Change-Id: Iba2929edfc6ece89791890d0635d4763d821a3aa