In the ring builder, we place partitions with maximum possible
dispersion across tiers, where a "tier" is region, then zone, then
IP/port, then device. Now, instead of IP/port, we just use IP. The port
wasn't really getting us anything; two different object servers on two
different ports on one machine aren't separate failure
domains. However, if someone has only a few machines and is running
one object server per disk, each on its own port, then the ring
builder would end up with every disk in its own IP/port tier,
resulting in poor partition placement with respect to durability.
For example: assume 1 region, 1 zone, 4 machines, 48 total disks (12
per machine), and one object server (and hence one port) per
disk. With the old behavior, the replica placer will pick the one
region, then the one zone, then one of 48 IP/port pairs, then the one
disk therein. This gives the same result as randomly
picking 3 disks (without replacement) to store data on; it completely
ignores machine boundaries.
With the new behavior, the replica placer will pick the one region,
then the one zone, then one of 4 IPs, then one of 12 disks
therein. This gives the optimal placement with respect to durability.
The same applies to Ring.get_more_nodes().
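A minimal sketch of the tier idea, using simplified device dicts
(tiers_for_dev below is an illustrative stand-in, not Swift's exact
helper):

    def tiers_for_dev(dev, use_port=True):
        """Return the tier tuples a device belongs to, outermost first."""
        if use_port:
            machine = '%s:%s' % (dev['ip'], dev['port'])
        else:
            machine = dev['ip']
        return (
            (dev['region'],),
            (dev['region'], dev['zone']),
            (dev['region'], dev['zone'], machine),
            (dev['region'], dev['zone'], machine, dev['id']),
        )

    # One machine, two object servers on different ports:
    a = {'id': 0, 'region': 1, 'zone': 1, 'ip': '10.0.0.1', 'port': 6010}
    b = {'id': 1, 'region': 1, 'zone': 1, 'ip': '10.0.0.1', 'port': 6020}

    # With ports, the two disks look like separate failure domains...
    assert tiers_for_dev(a)[2] != tiers_for_dev(b)[2]
    # ...without ports, they correctly share the machine-level tier.
    assert tiers_for_dev(a, False)[2] == tiers_for_dev(b, False)[2]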
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Ibbd740c51296b7e360845b5309d276d7383a3742
As part of commit efb39a5, the account reaper grew a bind_port
attribute, but it wasn't being converted to int, so naturally "6002"
!= 6002, and it wouldn't reap anything.
The bind_port was only used for determining the local devices. Rather
than fix the code to call int(), this commit removes the need for
bind_port entirely by skipping the port check. If your rings have IPs,
this is the same behavior as pre-efb39a5, and if your rings have
hostnames, this still works.
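A toy illustration of the bug and of the fix taken here (names are
illustrative, not the reaper's actual code):

    conf = {'bind_port': '6002'}   # config values arrive as strings
    ring_dev = {'ip': '10.0.0.1', 'port': 6002, 'device': 'sda'}
    my_ips = ['10.0.0.1']

    # Buggy check: "6002" != 6002 is always True, so no device is ever
    # considered local and nothing gets reaped.
    is_local_buggy = (ring_dev['ip'] in my_ips and
                      ring_dev['port'] == conf['bind_port'])
    assert not is_local_buggy

    # The fix: skip the port check and match on IP alone.
    is_local_fixed = ring_dev['ip'] in my_ips
    assert is_local_fixed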
Change-Id: I7bd18e9952f7b9e0d7ce2dce230ee54c5e23709a
This change modifies swift-ring-builder and introduces a new format
for the sub-commands (search, list_parts, set_weight, set_info, and
remove), in addition to the add sub-command, so that hostnames can be
used in place of an IP address.
The account reaper, container synchronizer, and replicators were also
updated so that they still have a way to identify a particular device
as being "local".
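One plausible sketch of that "local" check once rings may hold
hostnames: resolve the device's address and intersect the result with
the node's own IPs (the helper names here are hypothetical):

    import socket

    def resolve(host):
        """Return every IP a hostname (or IP literal) resolves to."""
        try:
            return {info[4][0] for info in socket.getaddrinfo(host, None)}
        except socket.gaierror:
            return set()

    def is_local_device(my_ips, dev):
        return bool(resolve(dev['ip']) & set(my_ips))

    dev = {'ip': 'storage-node-1.example.com', 'device': 'sdb'}
    print(is_local_device(['10.0.0.1'], dev))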
Previously this was Change-Id:
Ie471902413002872fc6755bacd36af3b9c613b74
Change-Id: Ieff583ffb932133e3820744a3f8f9f491686b08d
Co-Authored-By: Alex Pecoraro <alex.pecoraro@emc.com>
Implements: blueprint allow-hostnames-for-nodes-in-rings
Output a dispersion report that shows how many parts have each replica count
at each tier, along with some additional context. Also, the
max_dispersion value is a good canary for what a reasonable overload
might be.
Also display a warning on rebalance if the ring's dispersion is sub-optimal.
The primitive form of the dispersion graph is cached on the builder, but the
dispersion command will build it on the fly if you have a ring that was last
rebalanced before the change.
Also add --force option to rebalance to make it write a ring even if less than
1% of parts moved.
Also try to clarify the discussion of dispersion and balance in the
ring section of the architectural overview.
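To make the report's idea concrete, here is a toy dispersion
calculation, not Swift's exact max_dispersion math: bucket each
partition's replicas by IP and flag partitions with more than one
replica behind a single IP:

    from collections import Counter

    def percent_undispersed(replica2part2dev, devs):
        parts = len(replica2part2dev[0])
        undispersed = 0
        for part in range(parts):
            ips = Counter(devs[row[part]]['ip'] for row in replica2part2dev)
            if any(n > 1 for n in ips.values()):
                undispersed += 1
        return 100.0 * undispersed / parts

    # 4 machines x 12 disks
    devs = [{'ip': '10.0.0.%d' % (i // 12)} for i in range(48)]
    # one row per replica, mapping partition -> device id
    replica2part2dev = [[0, 1], [12, 2], [24, 13]]
    print('%.2f%% of parts have two replicas on one IP'
          % percent_undispersed(replica2part2dev, devs))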
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Darrell Bishop <darrell@swiftstack.com>
Change-Id: I7696df25d092fac56588080722e0a4167ed2c824
The number of replicas shown in the partition list might differ from
the actual number of replicas (as shown in the bug report).
This code simply iterates over builder._replica2part2dev and
remembers the number of replicas for each partition.
The code to find the partitions was moved to swift/common/ring/utils.py
to make it easier to test, and a test to ensure the correct number of
replicas is returned was added.
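A minimal sketch of that counting approach (simplified; rows of
_replica2part2dev can be shorter than the partition count when the
ring's replica count is fractional):

    def replica_counts(replica2part2dev, part_count):
        counts = [0] * part_count
        for part2dev in replica2part2dev:
            for part in range(len(part2dev)):
                counts[part] += 1
        return counts

    # Three replica rows; the last covers only half the partitions
    # (e.g. a replica count of 2.5), so parts 2 and 3 have 2 replicas.
    r2p2d = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    print(replica_counts(r2p2d, 4))   # [3, 3, 2, 2]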
Closes-Bug: 1370070
Change-Id: Id6a3ed437bb86df2f43f8b0b79aa8ccb50bbe13e
The part of RingBuilder.search_devs that parses the complex
search-device string format was moved to the swift-ring-builder
script. Instead, search_devs now has a simple interface for searching
devices.
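A rough sketch of the resulting division of labor, with illustrative
rather than exact signatures: the script parses the search string into
criteria, and search_devs just filters devices against them:

    def search_devs(devs, **criteria):
        return [dev for dev in devs
                if all(dev.get(k) == v for k, v in criteria.items())]

    devs = [
        {'id': 0, 'region': 1, 'zone': 1, 'ip': '10.0.0.1', 'device': 'sda'},
        {'id': 1, 'region': 1, 'zone': 2, 'ip': '10.0.0.2', 'device': 'sdb'},
    ]
    print(search_devs(devs, zone=2))   # string parsing stays in the script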
blueprint argparse-in-swift-ring-builder
Change-Id: If3dd77b297b474fb9a058e4693fef2dfb11fca3d
The region is one level above the zone; it is intended to represent a
chunk of machines that is distant from others with respect to
bandwidth and latency.
Old rings will default to having all their devices in region 1. Since
everything is in the same region by default, the ring builder will
simply distribute across zones as it did before, so your partition
assignment won't move because of this change. If you start adding
devices in other regions, of course, the assignment will change to
take that into account.
swift-ring-builder still accepts the same syntax as before, but will
default added devices to region 1 if no region is specified.
Examples:
$ swift-ring-builder foo.builder add r2z1-1.2.3.4:555/sda
$ swift-ring-builder foo.builder add r1z3-1.2.3.4:555/sda
$ swift-ring-builder foo.builder add z3-1.2.3.4:555/sda
Also, some updates to the ring-overview doc.
Change-Id: Ifefbb839cdcf033e6c9201fadca95224c7303a29
Fix the docstring and add a unit test for it. There were some
mistakes in the original docstring of that method, and there were no
unit tests for two methods in swift.common.ring.utils.
Fixes: bug #1070621
Change-Id: I6f4f211ea67d7fb8ccfe659f30bb0f5d394aca6b