Fix intermittent problem in part_swapping test

There is an intermittent failure in the test_part_swapping_problem
test found in test/unit/common/ring/test_builder.py.

The test does a rebalance, then changes the ring to test a specific
problem, does some housekeeping and then rebalances again. The problem
is the ringbuilder keeps track of where in the ring it started the
last ring rebalance, saved in `_last_part_gather_start`.

On a rebalance, or more specifically in `_gather_parts_for_balance` we
then we will start somewhere on the other side of the ring with:

  quarter_turn = (self.parts // 4)
  random_half = random.randint(0, self.parts / 2)
  start = (self._last_part_gather_start + quarter_turn +
                 random_half) % self.parts

Because we don't reset `_last_part_gather_start` when we change the ring
in the test, there is edge case where if we are unlucky during both
rebalances whereby both calls to randint returns a relatively large
number pushes the start of the second rebalance to the wrong side
of the ring. Actually it's more problematic, and only 1 large random
and a one in the middle will cause it, maybe pictures help:

  rabalance 1 (r1): quarter_turn = 4, random_half = 5
  rebalance 2 (r2): quarter_turn = 4, random_half = 3

                                     r1                   r2
                                      |                    |
                                      v                    v
  array('H', [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
  array('H', [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 3, 3, 3]),
  array('H', [2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4])]

Now when gathering for rebalance 2 it'll pick:

  array('H', [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, X]),
  array('H', [X, X, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 3, 3, 3]),
  array('H', [2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4])]

Which can cause the 3 attempts to gather and rebalance to be used up.
This causes the intermittent failure seen in the bug.

This patch solves this by resetting `_gather_parts_for_balance` to 0
while we tidy up the ring change. Meaning we'll always start on the
correct side of the ring.

Change-Id: I0d3a69620d4734091dfa516efd0d6b2ed87e196b
Closes-Bug: #1724356
This commit is contained in:
Matthew Oliver 2017-12-19 05:47:20 +00:00
parent b86bf15a64
commit a7da223262

View File

@ -1072,6 +1072,11 @@ class TestRingBuilder(unittest.TestCase):
new_dev_parts[dev_id] += 1
for dev in rb._iter_devs():
dev['parts'] = new_dev_parts[dev['id']]
# reset the _last_part_gather_start otherwise
# there is a chance it'll unluckly wrap and try and
# move one of the device 1's from replica 2
# causing the intermitant failure in bug 1724356
rb._last_part_gather_start = 0
rb.pretend_min_part_hours_passed()
rb.rebalance()