57 Commits

Author SHA1 Message Date
Samuel Merritt
aa82d2cba8 Save ring builder if dispersion changes
There are cases where a rebalance improves dispersion, but doesn't
improve balance. This is because the balance of a ring builder is
taken to be the balance of its least-balanced device, so if there's a
device that has no partitions, wants some, but can't get them, then
we'll never save the ring builder even if every other device in the
ring got better.

We can detect this situation by looking at the dispersion number; if it
changes, then the rebalance needs to be saved in order to continue to
make progress.
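
A minimal sketch of the save decision described above. This is not the actual swift-ring-builder code; the function name `should_save` and the `min_balance_improvement` threshold are assumptions made for illustration.

```python
def should_save(old_balance, new_balance, old_dispersion, new_dispersion,
                parts_moved, min_balance_improvement=1.0):
    """Return True if a rebalanced builder is worth saving.

    Hypothetical helper: the name, arguments and threshold are
    illustrative only and are not taken from the patch.
    """
    if not parts_moved:
        return False
    # A change in dispersion means progress was made, even if the
    # least-balanced device kept the overall balance number unchanged.
    if new_dispersion != old_dispersion:
        return True
    return abs(old_balance - new_balance) >= min_balance_improvement
```

With this shape, a rebalance that leaves balance untouched but changes dispersion still gets written out.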

Partial-Bug: #1697543

Change-Id: Ie239b958fc7e0547ffda2bebf61546bd4ef3d829
2017-12-15 15:04:00 -08:00
Christopher Bartz
84ea58b8c8 Ringbuilder: Forbid writing empty rings
Swift definitely can't make any use of empty rings, so it should
not be allowed to write them.

Replace warning with an error message & error exit.

Change-Id: I3a1b86368d363e67d1f91d7d8af4b391a0a53fff
Closes-Bug: #1396841
2017-12-07 14:50:11 +01:00
Samuel Merritt
728b4ba140 Add checksum to object extended attributes
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.

This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.

Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.
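
The idea can be sketched roughly as below: store a digest of the serialized metadata next to it, and refuse to trust metadata whose digest no longer matches. The function names, the use of MD5 and the pickle protocol are assumptions for this sketch, not details taken from the patch.

```python
import hashlib
import pickle


def dump_metadata(metadata):
    """Serialize metadata and return (blob, checksum).

    Hypothetical helper; in a real object server both values would be
    written to extended attributes.
    """
    blob = pickle.dumps(metadata, protocol=2)
    return blob, hashlib.md5(blob).hexdigest()


def load_metadata(blob, checksum):
    """Verify the checksum before unpickling; raise on mismatch."""
    if hashlib.md5(blob).hexdigest() != checksum:
        # A caller could quarantine the object at this point.
        raise ValueError('metadata checksum mismatch')
    return pickle.loads(blob)
```

The check runs before unpickling, so a blob that is corrupted but still unpicklable is caught as well.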

As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.

The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?

Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.

Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (e.g. tmpfs or more than 4k of xattrs on ext4).

References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.

THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS

With this patch, every test environment will require TMPDIR to be
using a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tmpfs supports this. XFS is recommended.

So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.

`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.

The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.

Co-Authored-By: John Dickinson <me@not.mn>

Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
2017-11-03 13:30:05 -04:00
Jan Zerebecki
747b9d9286 Fix swift-ring-builder set_weight with >1 device
When iterating over the (device, weight) tuples do not carry over the
device from the previous iteration.
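
A sketch of the intended loop shape, assuming devices are plain dicts and each pair selects a device by id. The function name and data layout are hypothetical and much simpler than the real search-value handling in swift-ring-builder.

```python
def set_weights(devices, id_weight_pairs):
    """Apply each (device id, weight) pair independently.

    Hypothetical helper: `devices` is a list of dicts with 'id' and
    'weight' keys.
    """
    changed = []
    for dev_id, weight in id_weight_pairs:
        # Build the match list afresh for every pair, so devices matched
        # by an earlier pair are not carried over into this iteration.
        matched = [dev for dev in devices if dev['id'] == dev_id]
        for dev in matched:
            dev['weight'] = weight
            changed.append(dev['id'])
    return changed
```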

Closes-Bug: 1454433
Change-Id: Iba82519b0b2bc80e2c1abbed308b651c4da4b06a
2017-10-06 12:53:59 +02:00
Alistair Coles
079be1d5cd Mock RingBuilder.rebalance when testing ringbuilder cli warnings
Use mock to force explicit RingBuilder rebalance results so that the
test is focussed on the ringbuilder.py rebalance command behavior
when balance is (or is not) achieved. Avoids assumptions about
RingBuilder behavior.

Change-Id: I242ffc2f1a4f7b69a679832a65790223642dcea8
Closes-Bug: #1499015
2017-09-11 11:08:37 -06:00
Jenkins
e94b383655 Merge "Add support to increase object ring partition power" 2017-07-05 14:40:42 +00:00
Clay Gerrard
58d7812596 fix flakey time for test_default_sorted_output
Change-Id: Ib7f0c22336e8354d4f46e2343149495bef382f9c
2017-06-30 16:45:43 -07:00
Mingyu Li
a1134e4aa2 Order devices in the output of swift-ring-builder
After the change to reuse device id's [1], the order of devices in the
output of swift-ring-builder is confusing. This patch lists the devices
in order of (region, zone, ip, device).
The effect of this patch is as illustrated in [2].
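
The ordering can be sketched as a tuple sort key. The function name and the dict layout are assumptions for illustration; note that this simple version compares IP addresses as strings.

```python
def sort_devices(devices):
    """Return devices ordered by (region, zone, ip, device).

    Hypothetical helper operating on plain dicts.
    """
    return sorted(
        devices,
        key=lambda dev: (dev['region'], dev['zone'], dev['ip'], dev['device']))
```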

This patch also partially fixes Bug 1545016.

1. https://review.openstack.org/#/c/265461/
2. https://github.com/MicrowiseOnGitHub/tempfiles/blob/master/reorder_ring_output

Change-Id: I564ed1b8d0cd4a6250649689b1bce7ad3574fe57
Partial-Bug: 1545016
Closes-Bug: 1536743
2017-06-22 16:06:48 -07:00
Christian Schwede
e1140666d6 Add support to increase object ring partition power
This patch adds methods to increase the partition power of an existing
object ring without downtime for the users using a 3-step process. Data
won't be moved to other nodes; objects using the new increased partition
power will be located on the same device and are hardlinked to avoid
data movement.

1. A new setting "next_part_power" will be added to the rings, and once
the proxy server has reloaded the rings it will send this value to the
object servers on any write operation. Object servers will now create a
hard-link in the new location to the original DiskFile object. Already
existing data will be relinked in the new locations by a new tool,
using hardlinks.

2. The actual partition power itself will be increased. Servers will now
use the new partition power to read from and write to. No longer
required hard links in the old object location have to be removed now by
the relinker tool; the relinker tool reads the next_part_power setting
to find object locations that need to be cleaned up.

3. The "next_part_power" flag will be removed.
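
Why hardlinking on the same device is enough can be illustrated with a simplified partition calculation. The sketch assumes a partition is the top `part_power` bits of the first four bytes of the object's hash; the helper names are hypothetical and the hash path prefix/suffix is left out.

```python
import hashlib
import struct


def get_partition(hashed_path, part_power):
    """Return the top `part_power` bits of the first 4 hash bytes."""
    part_shift = 32 - part_power
    return struct.unpack_from('>I', hashed_path)[0] >> part_shift


def new_partition_candidates(old_partition):
    """Partitions an object can map to after part_power grows by one."""
    return (2 * old_partition, 2 * old_partition + 1)


# Illustrative object path; any bytes would do.
hashed = hashlib.md5(b'/account/container/object').digest()
old = get_partition(hashed, 10)
new = get_partition(hashed, 11)
print(old, new, new in new_partition_candidates(old))
```

Because the new partition is always one of two values derived from the old one, a ring with the increased power can keep both on the device that held the old partition, so only a new directory entry is needed.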

This mostly implements the spec in [1]; however it's not using an
"epoch" as described there. The idea of the epoch was to store data
using different partition powers in their own namespace to avoid
conflicts with auditors and replicators as well as being able to abort
such an operation and just remove the new tree.  This would require some
heavy change of the on-disk data layout, and other object-server
implementations would be required to adopt this scheme too.

Instead the object-replicator is now aware that there is a partition
power increase in progress and will skip replication of data in that
storage policy; the relinker tool should be simply run and afterwards
the partition power will be increased. This shouldn't take that much
time (it's only walking the filesystem and hardlinking); impact should
be low therefore. The relinker should be run on all storage nodes at the
same time in parallel to decrease the required time (though this is not
mandatory). Failures during relinking should not affect cluster
operations - relinking can be even aborted manually and restarted later.

Auditors are not quarantining objects written to a path with a different
partition power and therefore working as before (though they are reading
each object twice in the worst case before the no longer needed hard
links are removed).

Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>

[1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/increasing_partition_power.html

Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb
2017-06-15 15:08:48 -07:00
Kota Tsuyuzaki
d40031b46f Add Composite Ring Functionality
* Adds a composite_builder module which provides the functionality to
  build a composite ring from a number of component ring builders.

* Add id to RingBuilder to differentiate rings in composite.
  A RingBuilder now gets a UUID when it is saved to file if
  it does not already have one. A RingBuilder loaded from
  file does NOT get a UUID assigned unless it was previously persisted in
  the file. This forces users to explicitly assign an id to
  existing ring builders by saving the state back to file.

  The UUID is included in first line of the output from:

    swift-ring-builder <builder-file>
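
The id behaviour described above can be sketched like this. The class name, the dict-based persistence and the hex UUID format are assumptions for illustration only.

```python
import uuid


class IdentifiedBuilder(object):
    """Hypothetical stand-in for a builder that carries an id."""

    def __init__(self, builder_id=None):
        self.id = builder_id

    def save(self):
        # Assign an id on save only if the builder does not have one yet.
        if self.id is None:
            self.id = uuid.uuid4().hex
        return {'id': self.id}

    @classmethod
    def load(cls, data):
        # Loading never invents an id; an older file without one yields
        # a builder whose id stays None until it is saved again.
        return cls(builder_id=data.get('id'))
```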

Background:

This is another implementation for Composite Ring [1]
to enable better dispersion for global erasure coded cluster.

The most significant difference from the related-change [1] is that this
solution attempts to solve the problem as an offline tool rather than
dynamic compositing on the running servers. Due to the change, we gain
advantages such as:

- Less code and being simple
- No complex state validation on the running server
- Easy deployments with an offline tool

This patch does not provide a command line utility for managing
composite rings. The interface for such a tool is still under
discussion; this patch provides the enabling functionality first.

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>

[1] Related-Change: I80ef36d3ac4d4b7c97a1d034b7fc8e0dc2214d16
Change-Id: I0d8928b55020592f8e75321d1f7678688301d797
2017-05-15 16:42:00 -07:00
Christian Schwede
67d545e90f Fix timing test error when rebalancing
The reported timing can be 00:59:59 sometimes, but is still valid. This
will fail in the tests, as seen in [1].

This patch fixes this by mocking the current time, ensuring that the
first two rebalances happen at the same time.
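
A generic sketch of the technique, using `unittest.mock` to freeze `time.time()`. The helper `seconds_since` is hypothetical and only stands in for code that reports elapsed time.

```python
import time
from unittest import mock


def seconds_since(last_change):
    """Hypothetical helper: whole seconds elapsed since `last_change`."""
    return int(time.time() - last_change)


def demo():
    # With the clock frozen, two consecutive calls see the same "now",
    # so the reported value cannot drift between them.
    with mock.patch('time.time', return_value=1000.0):
        first = seconds_since(1000.0)
        second = seconds_since(1000.0)
    return first, second
```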

[1] http://logs.openstack.org/97/337297/32/check/gate-swift-python27-ubuntu-xenial/46203f1/console.html#_2017-02-08_07_28_42_589176

Change-Id: I0fd43d5bb13d0e88126f4f6ba14fb87faab6df9c
2017-02-04 05:59:15 +00:00
Christian Schwede
47901ea4d7 Rebalance with min_part_seconds_left > 0
As described in bug #1558754 a rebalance after removing a device fails
if min_part_seconds_left > 0, despite the note that a rebalance should
remove partitions from removed devices on the next run.

This patch skips the early exit, and lets the builder itself handle
the rebalance. Partitions that shouldn't move now are still not moved
(except for removed devices), and there is still a warning if no
partition has been moved due to the fact that min_part_hours did not yet
pass.

A small test has been added to ensure rebalancing after removing a
device works without using the --force option (test fails on current
master). Another test ensures that a rebalance after a recent change
(for example increasing a device's weight) does not move partitions and
still reports the former warning message.

Closes-Bug: 1558754
Change-Id: I083022d066338cbe6234bab491c7a8e8e0a7b517
2017-01-17 11:36:57 +00:00
kellerbr
aa17ae1b04 Allow Hacking H401, H403 check
Currently swift ignores a lot of the Hacking style guide. This patch
enables the H401 and H403 checks and fixes any violations. With this
we can get a little closer to following the OpenStack style guidelines.

Change-Id: I5109a052f2ceb2e6a9a174cded62f4231156d39b
2017-01-04 17:23:46 +00:00
Jenkins
95322d9830 Merge "Use more specific asserts in test/unit/cli tests" 2016-08-30 00:44:02 +00:00
Jenkins
6d472f8a7a Merge "swift-ring-builder output corrected for ipv6" 2016-08-11 17:35:13 +00:00
Gábor Antal
488f88e30a Use more specific asserts in test/unit/cli tests
I replaced generic asserts with more specific assert methods,
e.g. assertTrue(sth == None) becomes assertIsNone(sth),
assertTrue(isinstance(inst, type)) becomes assertIsInstance(inst, type),
and assertTrue(not sth) becomes assertFalse(sth).

The code gets more readable, and a better description will be shown on failure.
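
For illustration, the three replacements look like this in a small stand-alone test case; the class and variable names are made up for the example.

```python
import unittest


class SpecificAssertsExample(unittest.TestCase):
    def test_specific_asserts(self):
        sth = None
        inst = []
        # Instead of assertTrue(sth == None)
        self.assertIsNone(sth)
        # Instead of assertTrue(isinstance(inst, list))
        self.assertIsInstance(inst, list)
        # Instead of assertTrue(not inst)
        self.assertFalse(inst)
```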

Change-Id: I39305808ad2349dc11a42261b41dbb347ac0618a
2016-08-03 12:19:40 +00:00
Nandini Tata
99a6f915ff swift-ring-builder output corrected for ipv6
Adjusted width of ip and port columns in swift-ring-builder command
output to dynamically span to the longest ip or the longest port in
the devices list. Also combined the port and ip address columns for
better visual clarity. IPv6 addresses are shown in the [ipv6]:port format.
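
A rough sketch of the combined column, assuming devices are dicts with 'ip' and 'port' keys. The function names are hypothetical and the real output code may differ.

```python
def format_endpoint(ip, port):
    """Return 'ip:port', bracketing IPv6 addresses as '[ipv6]:port'."""
    if ':' in ip:
        return '[%s]:%d' % (ip, port)
    return '%s:%d' % (ip, port)


def endpoint_column_width(devices):
    """Width needed to fit the longest formatted endpoint."""
    return max(len(format_endpoint(dev['ip'], dev['port']))
               for dev in devices)
```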

Modified the corresponding test case with expected output.

Change-Id: I65837f8fed70be60b53d5a817a4ce529ad0f070e
Closes-Bug: #1567105
2016-07-30 15:28:11 +00:00
cheng
c1c18da82c check _last_part_moves when pretend_min_part_hours_passed
pretend_min_part_hours_passed does things like this:
self._last_part_moves[part] = 0xff

This throws an exception if self._last_part_moves is None.

This patch checks self._last_part_moves to prevent the exception.
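
A minimal sketch of the guard, using a hypothetical stand-in class rather than the real RingBuilder:

```python
from array import array


class BuilderSketch(object):
    """Hypothetical stand-in holding only the attribute in question."""

    def __init__(self, parts, initialized=True):
        self.parts = parts
        self._last_part_moves = (
            array('B', [0] * parts) if initialized else None)

    def pretend_min_part_hours_passed(self):
        # Nothing to update if the array was never created.
        if self._last_part_moves is None:
            return
        for part in range(self.parts):
            self._last_part_moves[part] = 0xff
```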
Closes-bug: #1578835

Change-Id: Ic83c7a338b45bfcf61f5ab6100e6db335c3fa81a
2016-07-26 00:58:06 +00:00
Cheng Li
f337421273 Change assertTrue to assertEqual
In test_ringbuilder.py, there is one assertTrue that should be
replaced with assertEqual.

Change-Id: I9a0e4a7363a5e16cc9b6df045953dfbb4f9dbd07
Closes-bug: #1604320
2016-07-19 17:29:28 +08:00
Christian Schwede
e5a6d45882 Add ringbuilder tests for --yes option
Also added a Timeout class to test.unit to wrap possible long-running
functions. For example, if there is some regression and the "--yes"
argument is no longer evaluated correctly and the test expects some
keyboard input, it will be terminated after a few seconds to ensure
there is no long-running blocker on the gate.

Change-Id: I07b17d21d5af7fcc594ce5319ae2b6f7f58df4bb
2016-07-07 14:49:43 +02:00
Christian Schwede
f6b0b75a25 Make test_ringbuilder less brittle
If one has an object.builder file in the current directory and runs
test_ringbuilder, it will fail with an irritating error. That's because
test_use_ringfile_as_builderfile doesn't use self.tmpfile, but
object.builder - and that one might exist in the local directory.

This patch changes this, using self.tmpfile as argument name.

Closes-Bug: 1590356
Change-Id: I4b3287a36e8a5e469eb037128427dc7867910e53
2016-06-08 10:07:44 +00:00
Ondřej Nový
a54095e562 swift-ring-builder --yes option
This option assumes a yes response to all questions. It is useful for
scripts.

Change-Id: I28ca1a44507e0f1265afd36e6ac1e7c6c176428f
2016-05-31 17:11:13 +02:00
Shashirekha Gundur
cf48e75c25 change default ports for servers
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202;
so they do not conflict with X-Windows or other services.

Updated SAIO docs.

DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2016-04-29 14:47:38 -04:00
Jenkins
9282a9bae4 Merge "Fix ringbuilder tests" 2016-03-21 09:33:58 +00:00
Kota Tsuyuzaki
6d8be59fce Fix ringbuilder tests
Some of the tests in test/unit/cli/test_ringbuilder don't assert
the exit code, and unfortunately some of these passed even when the
tested statement actually failed.

This patch asserts the exit code from ringbuilder and
fixes some code/test bugs I noticed.

Change-Id: I18fa675ba8a90678e2b5ccb5f90eafab01d22787
2016-02-29 08:03:45 -08:00
Takashi Kajinami
bd93d44bb4 Make sure all temp files get deleted in test_ringbuilder
This patch makes test_ringbuilder create a temporary directory,
run ring builder commands under it, and delete it after each
test case, to fix temp file leaking.

Change-Id: I6f59fe095ea6485af0e60b5a8e8fc3892e0a0f90
2016-02-29 18:21:46 +09:00
Ondřej Nový
02c06585e6 Renamed variable for better code readability
Change-Id: I22d8db0dd9edc39672fc9997895a24f669975e15
2016-01-29 22:49:16 +01:00
Peter Lisák
f56f29ef7a Add info about state of ring file to default command.
Try to find ring file, load and compare it with builder file, then show result state.
Examples:
Ring file object.ring.gz not found, probably it hasn't been written yet
Ring file object.ring.gz is up-to-date
Ring file object.ring.gz is obsolete
Ring file object.ring.gz is invalid: ValueError('string length not a multiple of item size',)

Change-Id: I4d769aa5fe1c2b1167ec088aa372874f7d13ae48
2016-01-18 16:54:14 +00:00
Ben Martin
1f3304c515 Print min_part_hours lockout time remaining
swift-ring-builder currently only displays min_part_hours and
not the amount of time remaining before a rebalance can occur.
This information is readily available and is now displayed
as a quality of life improvement.

Additionally, this fixes a bug where the time since the last rebalance
was always updated when rebalance was called, regardless of
whether any partitions were reassigned. This could lead to partitions
being unable to be reassigned, as they never age according to
the time since the last rebalance.

Change-Id: Ie0e2b5e25140cbac7465f31a26a4998beb3892e9
Closes-Bug: #1526017
2016-01-11 10:58:38 -06:00
Paul Dardeau
7f4139bc26 Added unit tests for ringbuilder command-line utility
Added new unit tests:

test_add_device_old_missing_region
test_create_ring_number_of_arguments
test_add_duplicate_devices
test_rebalance_with_seed
test_set_overload_number_of_arguments
test_main_no_arguments
test_main_single_argument
test_main_with_safe

Modified existing unit tests to create sample ring at start of test.
This change was needed to have unit tests run correctly and demonstrate
code coverage.

test_unknown
test_search_device_number_of_arguments
test_list_parts_number_of_arguments
test_set_weight_number_of_arguments
test_set_info_number_of_arguments
test_remove_device_number_of_arguments
test_set_min_part_hours_number_of_arguments
test_set_replicas_number_of_arguments
test_set_replicas_invalid_value

Updates to handle nested mocks.
Updates to handle no exception case when SystemExit is expected.
PEP8 corrections.

Moved new tests from try blocks to use of assertRaises or call to
run_srb using exp_results with specified exit codes.

Updated run_srb to accept a dictionary of expected results. Specifically,
look for 'valid_exit_codes' to test, default to (0,1).

Change-Id: I4cf3f5f055a9babba140c68a9c7ff90b9c50ea62
2015-11-11 15:09:40 +00:00
John Dickinson
d755f5b520 suppress warning output in a unit test
test_write_builder_after_device_removal() wasn't setting a
default min_part_hours, so a warning was printed. Explicitly
adding a min_part_hours suppresses the warning.

Change-Id: I6f234b72c34e066abb91f28e6eacf50e29be8842
2015-11-09 22:03:37 -08:00
Jenkins
12d08d6a6c Merge "Device marked to be removed in info about the ring." 2015-11-06 03:00:40 +00:00
Peter Lisák
febdd6c1b4 Device marked to be removed in info about the ring.
Show devices with a 'DEL' mark if they will be removed on the next rebalance.

Change-Id: I171aa8658b1c4ac1689ab9532fe65d114567baa7
2015-10-26 13:26:21 +01:00
Lisak, Peter
71993d84e8 swift-ring-builder can't remove a device with zero weight
If one tries to remove a device with 0 weight, the following rebalance
does not write the changes into the builder file.

Scenario:
$ swift-ring-builder object.builder set_weight --id 1 0.00
$ swift-ring-builder object.builder rebalance
Wait for moving files out of the device id=1.
$ swift-ring-builder object.builder remove --id 1
$ swift-ring-builder object.builder rebalance
In fact, the device id=1 is not removed after rebalance (--force must be used).

Change-Id: Iad5a444023eae9882a3addd7f119ff4d18559ddd
2015-10-20 16:13:28 +02:00
Jenkins
fdc8828e85 Merge "py3: Replace basestring with six.string_types" 2015-10-13 10:56:05 +00:00
Emile Snyder
92767f28d6 Fix 'swift-ring-builder write_builder' after you remove a device
clayg already posted the code fix in the bug, but noted it needs a test.

Closes-Bug: #1487280
Change-Id: I07317754afac7165baac4e696f07daeba2e72adc
2015-10-12 08:17:49 -07:00
Victor Stinner
84f0a54445 py3: Replace basestring with six.string_types
The builtin basestring type was removed in Python 3. Replace it with
six.string_types which works on Python 2 and Python 3.

Change-Id: Ib92a729682322cc65b41050ae169167be2899e2c
2015-10-09 22:20:03 +02:00
Clay Gerrard
5070869ac0 Validate against duplicate device part replica assignment
We should never assign multiple replicas of the same partition to the
same device - our on-disk layout can only support a single replica of a
given part on a single device.  We should not do this, so we validate
against it and raise a loud warning if this terrible state is ever
observed after a rebalance.

Unfortunately, there are currently a couple of not necessarily uncommon
scenarios which will trigger this observed state:

 1. If we have fewer devices than replicas
 2. If a server's or zone's aggregate device weight makes it the most
    appropriate candidate for multiple replicas and you're a bit unlucky

Fixing #1 would be easy, we should just not allow that state anymore.
Really we never did - if you have a 3 replica ring with one device - you
have one replica.  Everything that iter_nodes'd would de-dupe.  We
should just be insisting that you explicitly acknowledge your replica
count with set_replicas.

I have been lost in the abyss for days searching for a general solution
to #2.  I'm sure it exists, but I will not have wrestled it to
submission by RC1.  In the meantime we can eliminate a great deal of the
luck required simply by refusing to place more than one replica of a
part on a device in assign_parts.

The meat of the change is a small update to the .validate method in
RingBuilder.  It basically unrolls a pre-existing (part, replica) loop
so that all the replicas of the part come out in order so that we can
build up the set of dev_id's for which all the replicas of a given part
are assigned part-by-part.

If we observe any duplicates - we raise a warning.
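
The part-by-part check can be sketched as follows. This is a simplified illustration, not the actual RingBuilder.validate code; the function name is hypothetical, and the input is assumed to be a list of per-replica rows mapping part number to device id.

```python
import warnings


def find_duplicate_assignments(replica2part2dev):
    """Return {part: [dev_id, ...]} for devices holding >1 replica.

    Hypothetical helper; replica2part2dev[replica][part] is a device id.
    """
    duplicates = {}
    if not replica2part2dev:
        return duplicates
    for part in range(len(replica2part2dev[0])):
        seen = set()
        for part2dev in replica2part2dev:
            # A fractional last replica row may be shorter than the first.
            if part >= len(part2dev):
                continue
            dev_id = part2dev[part]
            if dev_id in seen:
                duplicates.setdefault(part, []).append(dev_id)
            seen.add(dev_id)
    for part, dev_ids in sorted(duplicates.items()):
        warnings.warn('part %d has multiple replicas on device(s) %r'
                      % (part, dev_ids))
    return duplicates
```

Walking the replicas of one part together is what makes the per-part set of device ids cheap to build.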

To clean the cobwebs out of the rest of the corner cases we're going to
delay get_required_overload from kicking in until we achieve dispersion,
and a small check was added when selecting a device subtier to validate
if it's already being used - picking any other device in the tier works
out much better.  If no other devices are available in the tier - we
raise a warning.  A more elegant or optimized solution may exist.

Many unittests did not meet criterion #1, but the fix was
straightforward after being identified by the pigeonhole check.

However, many more tests were affected by #2 - but again the fix came to
be simply adding more devices.  The fantasy that all failure domains
contain at least replica count devices is prevalent in both our ring
placement algorithm and its tests.  These tests were trying to
demonstrate some complex characteristics of our ring placement algorithm
and I believe we just got a bit too carried away trying to find the
simplest possible example to demonstrate the desirable trait.  I think
a better example looks more like a real ring - with many devices in each
server and many servers in each zone - I think more devices makes the
tests better.  As much as possible I've tried to maintain the original
intent of the tests - when adding devices I've either spread the weight
out amongst them or added proportional weights to the other tiers.

I added an example straw man test to validate that three devices with
different weights in three different zones won't blow up.  Once we can
do that without raising warnings and assigning duplicate device part
replicas - we can add more.  And more importantly change the warnings to
errors - because we would much prefer to not do that #$%^ anymore.

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Related-Bug: #1452431
Change-Id: I592d5b611188670ae842fe3d030aa3b340ac36f9
2015-10-02 16:42:25 -07:00
Jenkins
6444f9b16b Merge "Make swift-ring-builder filename usage more consistent" 2015-08-19 01:20:57 +00:00
Christian Schwede
eeb0fa40a1 Make swift-ring-builder filename usage more consistent
Sometimes the given argument is internally altered and another filename is used
without a note to the operator. Even worse, a given .ring.gz filename is
sometimes written out as builder file, without updating the corresponding
.builder file.

There is already a method to parse the given argv and return the name of the
builder and ring file. However, it's rarely used and no warning is given to the
user if it is altered. This patch uses the already parsed builder and ring file
name instead of argv[1], and also adds a note to the user if the used filename
differs from the one given as argument.

Closes-Bug: 1482096
Change-Id: I2f8ef23aeab8b07caaa799f7dcd57e684b4b2ad2
2015-08-17 16:30:49 -05:00
janonymous
f449e91472 pep8 fix: assertEquals -> assertEqual
assertEquals is deprecated in py3, fixing in:
dir: test/unit/cli/*

Change-Id: I9a2fc1f717beafd5fa8408942046e310e8de0318
2015-08-05 22:32:02 +05:30
Clay Gerrard
e2f69138bf Fix string formatting in dispersion cli command
... and add a basic test that would have prevented the regression

Change-Id: I4c5f643ee291dcc1397ca951450459d8b8ad0bbd
2015-07-24 21:24:55 -07:00
Victor Stinner
6e70f3fa32 Get StringIO and cStringIO from six.moves
* replace "from cStringIO import StringIO"
  with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
  with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
  with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
  with "import six" and "six.StringIO()"

This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
2015-07-15 16:56:33 +02:00
Samuel Merritt
ccf0758ef1 Add ring-builder analyzer.
This is a tool to help developers quantify changes to the ring
builder. It takes a scenario (JSON file) describing the builder's
basic parameters (part_power, replicas, etc.) and a number of
"rounds", where each round is a set of operations to perform on the
builder. For each round, the operations are applied, and then the
builder is rebalanced until it reaches a steady state.

The idea is that a developer observes the ring builder behaving
suboptimally, writes a scenario to reproduce the behavior, modifies
the ring builder to fix it, and references the scenario with the
commit so that others can see that things have improved.

I decided to write this after writing my fourth or fifth hacky one-off
script to reproduce some bad behavior in the ring builder.

Change-Id: I114242748368f142304aab90a6d99c1337bced4c
2015-07-02 08:16:03 -07:00
Christian Schwede
d3213fb1fe Check if device name is valid when adding to the ring
Currently device names can be empty or start and/or end with spaces.
This can create unexpected results, for example these three commands
are all valid:

swift-ring-builder account.builder add "r1z1-127.0.0.1:6000/" 1
swift-ring-builder account.builder add "r1z1-127.0.0.1:6000/sda " 1
swift-ring-builder account.builder add "r1z1-127.0.0.1:6000/ meta" 1

This patch validates device names and prevents empty names or names
starting and/or ending with spaces.
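
A minimal sketch of such a check. The function name and the choice of ValueError are assumptions, not taken from the patch.

```python
def validate_device_name(name):
    """Reject empty names and names with leading/trailing spaces.

    Hypothetical helper; returns the name unchanged when it is valid.
    """
    if not name:
        raise ValueError('device name must not be empty')
    if name.startswith(' ') or name.endswith(' '):
        raise ValueError('device name must not start or end with a space')
    return name
```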

Also fixed the test "test_warn_at_risk" - the test passed if the
exception was not raised.

Closes-Bug: 1438579

Change-Id: I811b0eae7db503279e6429d985275bbab8b29c9f
2015-04-14 13:15:15 -07:00
Samuel Merritt
8d3b3b2ee0 Add some debug output to the ring builder
Sometimes, I get handed a builder file in a support ticket and a
question of the form "why is the balance [not] doing $thing?". When
that happens, I add a bunch of print statements to my local
swift/common/ring/builder.py, figure things out, and then delete the
print statements. This time, instead of deleting the print statements,
I turned them into debug() calls and added a "--debug" flag to the
rebalance command in hopes that someone else will find it useful.

Change-Id: I697af90984fa5b314ddf570280b4585ba0ba363c
2015-03-30 17:47:28 -07:00
Jenkins
bc7c496f71 Merge "Allow hostnames for nodes in Rings" 2015-02-10 04:32:38 +00:00
Hisashi Osanai
efb39a5665 Allow hostnames for nodes in Rings
This change modifies swift-ring-builder and introduces a new format
for the sub-commands search, list_parts, set_weight, set_info and remove,
in addition to the add sub-command, so that hostnames can be used in place
of an ip-address for these sub-commands.
The account reaper, container synchronizer, and replicators were also
updated so that they still have a way to identify a particular device
as being "local".

Previously this was Change-Id:
Ie471902413002872fc6755bacd36af3b9c613b74

Change-Id: Ieff583ffb932133e3820744a3f8f9f491686b08d
Co-Authored-By: Alex Pecoraro <alex.pecoraro@emc.com>
Implements: blueprint allow-hostnames-for-nodes-in-rings
2015-02-02 05:06:03 +09:00
sarvesh-ranjan
d8fdbc2b2d Typos fixed
Change-Id: I2c216a870ce299039dec9948dcdef3de0721b4da
2015-01-29 18:20:31 -08:00
Jenkins
df529a225f Merge "Allow set_overload to take value as percent" 2015-01-29 06:19:22 +00:00