commit 7035639dfd
Author: Clay Gerrard
Date:   2015-12-07 16:06:42 -08:00

Put part-replicas where they go

It's harder than it sounds.  There were really three challenges.

Challenge #1 Initial Assignment
===============================

Before starting to assign parts on this shiny new ring you've
constructed, maybe we'll pause for a moment up front and consider the
lay of the land.  This process is called the replica_plan.

The replica_plan approach separates part assignment failures into two
modes (see the sketch after this list):

 1) we considered the cluster topology and its weights and came up with
    the wrong plan

 2) we failed to execute on the plan
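
For illustration only - the names below are hypothetical stand-ins, not
the actual helper methods - a replica_plan can be pictured as per-tier
bounds on how many replicas of a part may land in each tier, which makes
the two modes separable: a bad plan is mode 1, a plan violation is mode 2.

    # Hypothetical sketch: a replica_plan as per-tier replica bounds.
    # tiers are tuples like (region,), (region, zone), ... down to devs.
    replica_plan = {
        (0,): {'min': 3, 'max': 3},    # region 0 holds all 3 replicas
        (0, 0): {'min': 1, 'max': 2},  # zone 0 should hold 1 or 2
        (0, 1): {'min': 1, 'max': 2},  # zone 1 should hold 1 or 2
    }

    def violations(assignment, plan):
        # mode 2 check: given a sound plan, report any tier whose
        # assigned replica count falls outside its planned bounds
        placed = {}
        for tiers_of_replica in assignment:  # tiers covering one replica
            for tier in tiers_of_replica:
                placed[tier] = placed.get(tier, 0) + 1
        return [tier for tier, bounds in plan.items()
                if not bounds['min'] <= placed.get(tier, 0) <= bounds['max']]

    # one replica in zone 0, two in zone 1 - satisfies the plan above
    assignment = [{(0,), (0, 0)}, {(0,), (0, 1)}, {(0,), (0, 1)}]
    assert violations(assignment, replica_plan) == []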

I failed at both parts plenty of times before I got it this close.  I'm
sure a counterexample still exists, but when we find it the new helper
methods will let us reason about where things went wrong.

Challenge #2 Fixing Placement
=============================

With a sound plan in hand, it's much easier to fail to execute the less
material you have to execute with - so we gather up as many parts as we
can, as long as we think we can find them a better home.

Picking the right parts to gather is a black art - when you notice a
rebalance is slow, it's because it's spending so much time iterating
over replica2part2dev trying to decide just the right parts to gather.

The replica plan can help, at least in the gross dispersion collection,
to gather up the worst offenders first before considering balance.  I
think trying to avoid picking up parts that are stuck to the tier,
before falling into a forced grab on anything over parts_wanted, helps
with stability generally - but depending on where the parts_wanted are
in relation to the full devices, it's pretty easy to pick up something
that'll end up really close to where it started.
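
A minimal sketch of that gather ordering, under the assumption that
dispersion offenders are collected before any forced grab; 'target' here
is a hypothetical stand-in for a device's fair share (what the builder
tracks via parts_wanted), not the real gather methods:

    # Hypothetical sketch of gather ordering: dispersion offenders first,
    # then a forced grab of anything a device holds over its fair share.
    def gather_parts(dispersion_offenders, devices):
        gathered = list(dispersion_offenders)  # worst offenders first
        for dev in devices:
            excess = len(dev['parts']) - dev['target']
            if excess > 0:
                # grabbed parts may land right back near where they started
                gathered.extend(dev['parts'][:excess])
        return gathered

    # toy usage: the device holds 3 parts but its fair share is 2
    print(gather_parts([7], [{'parts': [1, 2, 3], 'target': 2}]))  # [7, 1]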

I tried to break the gather methods into smaller pieces so it looked
like I knew what I was doing.

Going with a MAXIMUM gather iteration instead of balance (which doesn't
reflect the replica_plan) doesn't seem to be costing me anything - most
of the time the exit condition is either solved or all the parts are
aggressively locked up on min_part_hours.  So far, it mostly seems that
if the thing is going to balance this round it'll get it in the first
couple of shakes.
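
Roughly, that exit condition sketches out like this - MAX_GATHER and the
helper callables are hypothetical stand-ins for the real machinery:

    # Sketch of a bounded gather iteration: shake a fixed maximum number
    # of times rather than chasing a balance target, stopping early when
    # the plan is satisfied or min_part_hours has locked what's left.
    MAX_GATHER = 2

    def rebalance_shakes(gather_and_reassign, solved, all_locked):
        for shake in range(MAX_GATHER):
            gather_and_reassign()
            if solved() or all_locked():
                return shake + 1  # usually done in a couple of shakes
        return MAX_GATHER

    # toy usage: pretend the first shake solves it
    assert rebalance_shakes(lambda: None, lambda: True, lambda: False) == 1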

Challenge #3 Crazy replica2part2dev tables
==========================================

I think there are lots of ways "scars" can build up in a ring, which can
result in very particular replica2part2dev tables that are physically
difficult to dig out of.  It's repairing these scars that will take
multiple rebalances to resolve.

... but at this point ...

... lacking a counterexample ...

I've been able to close up all the edge cases I was able to find.  It
may not be quick, but progress will be made.

Basically my strategy just required a better understanding of how
previous algorithms were able to *mostly* keep things moving by brute
forcing the whole mess with a bunch of randomness.  Then, when we detect
that our "elegant" careful part selection isn't making progress, we can
fall back to the same old tricks.
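
A sketch of that fallback, with hypothetical placement callables standing
in for the real methods:

    import random

    # Sketch: try the careful, plan-driven placement first; if a whole
    # pass moves nothing, shuffle and brute-force like the old algorithm.
    def place_parts(parts, careful_place, brute_force_place):
        moved = sum(1 for part in parts if careful_place(part))
        if not moved:
            random.shuffle(parts)  # the same old tricks
            for part in parts:
                brute_force_place(part)

    # toy usage: careful placement that never succeeds triggers fallback
    placed = []
    place_parts([1, 2, 3], lambda p: False, placed.append)
    assert sorted(placed) == [1, 2, 3]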

Validation
==========

We validate against duplicate part replica assignment after rebalance
and raise an ERROR if we detect more than one replica of a part assigned
to the same device.

In order to meet that requirement we have to have at least as many
devices as replicas, so attempting to rebalance with too few devices
without changing your replica_count is also an ERROR, not a warning.
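
A standalone sketch of that duplicate check (not the builder's actual
validation code): count which devices each part's replicas landed on and
flag any device that appears twice.

    from collections import defaultdict

    # Sketch: replica2part2dev is one row per replica; column p of each
    # row names the device holding that replica of part p.
    def find_duplicate_assignments(replica2part2dev):
        part2devs = defaultdict(list)
        for part2dev in replica2part2dev:
            for part, dev_id in enumerate(part2dev):
                part2devs[part].append(dev_id)
        return {part: devs for part, devs in part2devs.items()
                if len(devs) != len(set(devs))}

    # replicas 0 and 1 of part 1 both landed on device 1 - an ERROR
    assert find_duplicate_assignments([[0, 1], [2, 1]]) == {1: [1, 1]}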

Random Thoughts
===============

As usual with rings, the test diff can be hard to reason about -
hopefully I've added enough comments to assure future me that these
assertions make sense.

Despite this being a large rewrite of a lot of important code, the
existing code is known to have failed us.  This change fixes a critical
bug that's trivial to reproduce in a critical component of the system.

There's probably a bunch of error messages and exit status handling
that's not as helpful as it could be, considering the new behaviors.

Change-Id: I1bbe7be38806fc1c8b9181a722933c18a6c76e05
Closes-Bug: #1452431

# Copyright (c) 2010-2012 OpenStack Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import array
import six.moves.cPickle as pickle
import os
import sys
import unittest
import stat
from contextlib import closing
from gzip import GzipFile
from tempfile import mkdtemp
from shutil import rmtree
from time import sleep, time
from six.moves import range
from swift.common import ring, utils


class TestRingBase(unittest.TestCase):

    def setUp(self):
        self._orig_hash_suffix = utils.HASH_PATH_SUFFIX
        self._orig_hash_prefix = utils.HASH_PATH_PREFIX
        utils.HASH_PATH_SUFFIX = 'endcap'
        utils.HASH_PATH_PREFIX = ''

    def tearDown(self):
        utils.HASH_PATH_SUFFIX = self._orig_hash_suffix
        utils.HASH_PATH_PREFIX = self._orig_hash_prefix


class TestRingData(unittest.TestCase):

    def setUp(self):
        self.testdir = os.path.join(os.path.dirname(__file__), 'ring_data')
        rmtree(self.testdir, ignore_errors=1)
        os.mkdir(self.testdir)

    def tearDown(self):
        rmtree(self.testdir, ignore_errors=1)

    def assert_ring_data_equal(self, rd_expected, rd_got):
        self.assertEqual(rd_expected._replica2part2dev_id,
                         rd_got._replica2part2dev_id)
        self.assertEqual(rd_expected.devs, rd_got.devs)
        self.assertEqual(rd_expected._part_shift, rd_got._part_shift)

    def test_attrs(self):
        r2p2d = [[0, 1, 0, 1], [0, 1, 0, 1]]
        d = [{'id': 0, 'zone': 0, 'region': 0, 'ip': '10.1.1.0',
              'port': 7000},
             {'id': 1, 'zone': 1, 'region': 1, 'ip': '10.1.1.1',
              'port': 7000}]
        s = 30
        rd = ring.RingData(r2p2d, d, s)
        self.assertEqual(rd._replica2part2dev_id, r2p2d)
        self.assertEqual(rd.devs, d)
        self.assertEqual(rd._part_shift, s)

    def test_can_load_pickled_ring_data(self):
        rd = ring.RingData(
            [[0, 1, 0, 1], [0, 1, 0, 1]],
            [{'id': 0, 'zone': 0, 'ip': '10.1.1.0', 'port': 7000},
             {'id': 1, 'zone': 1, 'ip': '10.1.1.1', 'port': 7000}],
            30)
        ring_fname = os.path.join(self.testdir, 'foo.ring.gz')
        for p in range(pickle.HIGHEST_PROTOCOL):
            with closing(GzipFile(ring_fname, 'wb')) as f:
                pickle.dump(rd, f, protocol=p)
            meta_only = ring.RingData.load(ring_fname, metadata_only=True)
            self.assertEqual([
                {'id': 0, 'zone': 0, 'region': 1, 'ip': '10.1.1.0',
                 'port': 7000},
                {'id': 1, 'zone': 1, 'region': 1, 'ip': '10.1.1.1',
                 'port': 7000},
            ], meta_only.devs)
            # Pickled rings can't load only metadata, so you get it all
            self.assert_ring_data_equal(rd, meta_only)
            ring_data = ring.RingData.load(ring_fname)
            self.assert_ring_data_equal(rd, ring_data)

    def test_roundtrip_serialization(self):
        ring_fname = os.path.join(self.testdir, 'foo.ring.gz')
        rd = ring.RingData(
            [array.array('H', [0, 1, 0, 1]), array.array('H', [0, 1, 0, 1])],
            [{'id': 0, 'zone': 0}, {'id': 1, 'zone': 1}], 30)
        rd.save(ring_fname)
        meta_only = ring.RingData.load(ring_fname, metadata_only=True)
        self.assertEqual([
            {'id': 0, 'zone': 0, 'region': 1},
            {'id': 1, 'zone': 1, 'region': 1},
        ], meta_only.devs)
        self.assertEqual([], meta_only._replica2part2dev_id)
        rd2 = ring.RingData.load(ring_fname)
        self.assert_ring_data_equal(rd, rd2)

    def test_deterministic_serialization(self):
        """
        Two identical rings should produce identical .gz files on disk.

        Only true on Python 2.7 or greater.
        """
        if sys.version_info[0] == 2 and sys.version_info[1] < 7:
            return
        os.mkdir(os.path.join(self.testdir, '1'))
        os.mkdir(os.path.join(self.testdir, '2'))
        # These have to have the same filename (not full path,
        # obviously) since the filename gets encoded in the gzip data.
        ring_fname1 = os.path.join(self.testdir, '1', 'the.ring.gz')
        ring_fname2 = os.path.join(self.testdir, '2', 'the.ring.gz')
        rd = ring.RingData(
            [array.array('H', [0, 1, 0, 1]), array.array('H', [0, 1, 0, 1])],
            [{'id': 0, 'zone': 0}, {'id': 1, 'zone': 1}], 30)
        rd.save(ring_fname1)
        rd.save(ring_fname2)
        with open(ring_fname1) as ring1:
            with open(ring_fname2) as ring2:
                self.assertEqual(ring1.read(), ring2.read())

    def test_permissions(self):
        ring_fname = os.path.join(self.testdir, 'stat.ring.gz')
        rd = ring.RingData(
            [array.array('H', [0, 1, 0, 1]), array.array('H', [0, 1, 0, 1])],
            [{'id': 0, 'zone': 0}, {'id': 1, 'zone': 1}], 30)
        rd.save(ring_fname)
        self.assertEqual(oct(stat.S_IMODE(os.stat(ring_fname).st_mode)),
                         '0644')


class TestRing(TestRingBase):

    def setUp(self):
        super(TestRing, self).setUp()
        self.testdir = mkdtemp()
        self.testgz = os.path.join(self.testdir, 'whatever.ring.gz')
        self.intended_replica2part2dev_id = [
            array.array('H', [0, 1, 0, 1]),
            array.array('H', [0, 1, 0, 1]),
            array.array('H', [3, 4, 3, 4])]
        self.intended_devs = [{'id': 0, 'region': 0, 'zone': 0,
                               'weight': 1.0, 'ip': '10.1.1.1',
                               'port': 6000,
                               'replication_ip': '10.1.0.1',
                               'replication_port': 6066},
                              {'id': 1, 'region': 0, 'zone': 0,
                               'weight': 1.0, 'ip': '10.1.1.1',
                               'port': 6000,
                               'replication_ip': '10.1.0.2',
                               'replication_port': 6066},
                              None,
                              {'id': 3, 'region': 0, 'zone': 2,
                               'weight': 1.0, 'ip': '10.1.2.1',
                               'port': 6000,
                               'replication_ip': '10.2.0.1',
                               'replication_port': 6066},
                              {'id': 4, 'region': 0, 'zone': 2,
                               'weight': 1.0, 'ip': '10.1.2.2',
                               'port': 6000,
                               'replication_ip': '10.2.0.1',
                               'replication_port': 6066}]
        self.intended_part_shift = 30
        self.intended_reload_time = 15
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        self.ring = ring.Ring(
            self.testdir,
            reload_time=self.intended_reload_time, ring_name='whatever')

    def tearDown(self):
        super(TestRing, self).tearDown()
        rmtree(self.testdir, ignore_errors=1)

    def test_creation(self):
        self.assertEqual(self.ring._replica2part2dev_id,
                         self.intended_replica2part2dev_id)
        self.assertEqual(self.ring._part_shift, self.intended_part_shift)
        self.assertEqual(self.ring.devs, self.intended_devs)
        self.assertEqual(self.ring.reload_time, self.intended_reload_time)
        self.assertEqual(self.ring.serialized_path, self.testgz)
        # test invalid endcap
        _orig_hash_path_suffix = utils.HASH_PATH_SUFFIX
        _orig_hash_path_prefix = utils.HASH_PATH_PREFIX
        _orig_swift_conf_file = utils.SWIFT_CONF_FILE
        try:
            utils.HASH_PATH_SUFFIX = ''
            utils.HASH_PATH_PREFIX = ''
            utils.SWIFT_CONF_FILE = ''
            self.assertRaises(SystemExit, ring.Ring, self.testdir, 'whatever')
        finally:
            utils.HASH_PATH_SUFFIX = _orig_hash_path_suffix
            utils.HASH_PATH_PREFIX = _orig_hash_path_prefix
            utils.SWIFT_CONF_FILE = _orig_swift_conf_file

    def test_has_changed(self):
        self.assertEqual(self.ring.has_changed(), False)
        os.utime(self.testgz, (time() + 60, time() + 60))
        self.assertEqual(self.ring.has_changed(), True)

    def test_reload(self):
        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        self.assertEqual(len(self.ring.devs), 5)
        self.intended_devs.append(
            {'id': 3, 'region': 0, 'zone': 3, 'weight': 1.0,
             'ip': '10.1.1.1', 'port': 9876})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        self.ring.get_nodes('a')
        self.assertEqual(len(self.ring.devs), 6)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        self.assertEqual(len(self.ring.devs), 6)
        self.intended_devs.append(
            {'id': 5, 'region': 0, 'zone': 4, 'weight': 1.0,
             'ip': '10.5.5.5', 'port': 9876})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        self.ring.get_part_nodes(0)
        self.assertEqual(len(self.ring.devs), 7)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        part, nodes = self.ring.get_nodes('a')
        self.assertEqual(len(self.ring.devs), 7)
        self.intended_devs.append(
            {'id': 6, 'region': 0, 'zone': 5, 'weight': 1.0,
             'ip': '10.6.6.6', 'port': 6000})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        next(self.ring.get_more_nodes(part))
        self.assertEqual(len(self.ring.devs), 8)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        self.assertEqual(len(self.ring.devs), 8)
        self.intended_devs.append(
            {'id': 5, 'region': 0, 'zone': 4, 'weight': 1.0,
             'ip': '10.5.5.5', 'port': 6000})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        self.assertEqual(len(self.ring.devs), 9)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

    def test_reload_without_replication(self):
        replication_less_devs = [{'id': 0, 'region': 0, 'zone': 0,
                                  'weight': 1.0, 'ip': '10.1.1.1',
                                  'port': 6000},
                                 {'id': 1, 'region': 0, 'zone': 0,
                                  'weight': 1.0, 'ip': '10.1.1.1',
                                  'port': 6000},
                                 None,
                                 {'id': 3, 'region': 0, 'zone': 2,
                                  'weight': 1.0, 'ip': '10.1.2.1',
                                  'port': 6000},
                                 {'id': 4, 'region': 0, 'zone': 2,
                                  'weight': 1.0, 'ip': '10.1.2.2',
                                  'port': 6000}]
        intended_devs = [{'id': 0, 'region': 0, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         {'id': 1, 'region': 0, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         None,
                         {'id': 3, 'region': 0, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.1', 'port': 6000,
                          'replication_ip': '10.1.2.1',
                          'replication_port': 6000},
                         {'id': 4, 'region': 0, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.2', 'port': 6000,
                          'replication_ip': '10.1.2.2',
                          'replication_port': 6000}]
        testgz = os.path.join(self.testdir, 'without_replication.ring.gz')
        ring.RingData(
            self.intended_replica2part2dev_id,
            replication_less_devs, self.intended_part_shift).save(testgz)
        self.ring = ring.Ring(
            self.testdir,
            reload_time=self.intended_reload_time,
            ring_name='without_replication')
        self.assertEqual(self.ring.devs, intended_devs)

    def test_reload_old_style_pickled_ring(self):
        devs = [{'id': 0, 'zone': 0,
                 'weight': 1.0, 'ip': '10.1.1.1',
                 'port': 6000},
                {'id': 1, 'zone': 0,
                 'weight': 1.0, 'ip': '10.1.1.1',
                 'port': 6000},
                None,
                {'id': 3, 'zone': 2,
                 'weight': 1.0, 'ip': '10.1.2.1',
                 'port': 6000},
                {'id': 4, 'zone': 2,
                 'weight': 1.0, 'ip': '10.1.2.2',
                 'port': 6000}]
        intended_devs = [{'id': 0, 'region': 1, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         {'id': 1, 'region': 1, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         None,
                         {'id': 3, 'region': 1, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.1', 'port': 6000,
                          'replication_ip': '10.1.2.1',
                          'replication_port': 6000},
                         {'id': 4, 'region': 1, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.2', 'port': 6000,
                          'replication_ip': '10.1.2.2',
                          'replication_port': 6000}]

        # simulate an old-style pickled ring
        testgz = os.path.join(self.testdir,
                              'without_replication_or_region.ring.gz')
        ring_data = ring.RingData(self.intended_replica2part2dev_id,
                                  devs,
                                  self.intended_part_shift)

        # an old-style pickled ring won't have region data
        for dev in ring_data.devs:
            if dev:
                del dev["region"]
        gz_file = GzipFile(testgz, 'wb')
        pickle.dump(ring_data, gz_file, protocol=2)
        gz_file.close()
        self.ring = ring.Ring(
            self.testdir,
            reload_time=self.intended_reload_time,
            ring_name='without_replication_or_region')
        self.assertEqual(self.ring.devs, intended_devs)

    def test_get_part(self):
        part1 = self.ring.get_part('a')
        nodes1 = self.ring.get_part_nodes(part1)
        part2, nodes2 = self.ring.get_nodes('a')
        self.assertEqual(part1, part2)
        self.assertEqual(nodes1, nodes2)

    def test_get_part_nodes(self):
        part, nodes = self.ring.get_nodes('a')
        self.assertEqual(nodes, self.ring.get_part_nodes(part))

    def test_get_nodes(self):
        # Yes, these tests are deliberately very fragile. We want to make sure
        # that if someone changes the results the ring produces, they know it.
        self.assertRaises(TypeError, self.ring.get_nodes)
        part, nodes = self.ring.get_nodes('a')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a1')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a4')
        self.assertEqual(part, 1)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('aa')
        self.assertEqual(part, 1)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('a', 'c1')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c0')
        self.assertEqual(part, 3)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('a', 'c3')
        self.assertEqual(part, 2)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c2')
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o1')
        self.assertEqual(part, 1)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o5')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o0')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o2')
        self.assertEqual(part, 2)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

    def add_dev_to_ring(self, new_dev):
        self.ring.devs.append(new_dev)
        self.ring._rebuild_tier_data()

    def test_get_more_nodes(self):
        # Yes, these tests are deliberately very fragile. We want to make sure
        # that if someone changes the results the ring produces, they know it.
        exp_part = 6
        exp_devs = [71, 77, 30]
        exp_zones = set([6, 3, 7])

        exp_handoffs = [99, 43, 94, 13, 1, 49, 60, 72, 27, 68, 78, 26, 21, 9,
                        51, 105, 47, 89, 65, 82, 34, 98, 38, 85, 16, 4, 59,
                        102, 40, 90, 20, 8, 54, 66, 80, 25, 14, 2, 50, 12, 0,
                        48, 70, 76, 32, 107, 45, 87, 101, 44, 93, 100, 42, 95,
                        106, 46, 88, 97, 37, 86, 96, 36, 84, 17, 5, 57, 63,
                        81, 33, 67, 79, 24, 15, 3, 58, 69, 75, 31, 61, 74, 29,
                        23, 10, 52, 22, 11, 53, 64, 83, 35, 62, 73, 28, 18, 6,
                        56, 104, 39, 91, 103, 41, 92, 19, 7, 55]
        exp_first_handoffs = [23, 64, 105, 102, 67, 17, 99, 65, 69, 97, 15,
                              17, 24, 98, 66, 65, 69, 18, 104, 105, 16, 107,
                              100, 15, 14, 19, 102, 105, 63, 104, 99, 12, 107,
                              99, 16, 105, 71, 15, 15, 63, 63, 99, 21, 68, 20,
                              64, 96, 21, 98, 19, 68, 99, 15, 69, 62, 100, 96,
                              102, 17, 62, 13, 61, 102, 105, 22, 16, 21, 18,
                              21, 100, 20, 16, 21, 106, 66, 106, 16, 99, 16,
                              22, 62, 60, 99, 69, 18, 23, 104, 98, 106, 61,
                              21, 23, 23, 16, 67, 71, 101, 16, 64, 66, 70, 15,
                              102, 63, 19, 98, 18, 106, 101, 100, 62, 63, 98,
                              18, 13, 97, 23, 22, 100, 13, 14, 67, 96, 14,
                              105, 97, 71, 64, 96, 22, 65, 66, 98, 19, 105,
                              98, 97, 21, 15, 69, 100, 98, 106, 65, 66, 97,
                              62, 22, 68, 63, 61, 67, 67, 20, 105, 106, 105,
                              18, 71, 100, 17, 62, 60, 13, 103, 99, 101, 96,
                              97, 16, 60, 21, 14, 20, 12, 60, 69, 104, 65, 65,
                              17, 16, 67, 13, 64, 15, 16, 68, 96, 21, 104, 66,
                              96, 105, 58, 105, 103, 21, 96, 60, 16, 96, 21,
                              71, 16, 99, 101, 63, 62, 103, 18, 102, 60, 17,
                              19, 106, 97, 14, 99, 68, 102, 13, 70, 103, 21,
                              22, 19, 61, 103, 23, 104, 65, 62, 68, 16, 65,
                              15, 102, 102, 71, 99, 63, 67, 19, 23, 15, 69,
                              107, 14, 13, 64, 13, 105, 15, 98, 69]

        rb = ring.RingBuilder(8, 3, 1)
        next_dev_id = 0
        for zone in range(1, 10):
            for server in range(1, 5):
                for device in range(1, 4):
                    rb.add_dev({'id': next_dev_id,
                                'ip': '1.2.%d.%d' % (zone, server),
                                'port': 1234 + device,
                                'zone': zone, 'region': 0,
                                'weight': 1.0})
                    next_dev_id += 1
        rb.rebalance(seed=2)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # every part has the same number of handoffs
        part_handoff_counts = set()
        for part in range(r.partition_count):
            part_handoff_counts.add(len(list(r.get_more_nodes(part))))
        self.assertEqual(part_handoff_counts, {105})
        # which less the primaries - is every device in the ring
        self.assertEqual(len(list(rb._iter_devs())) - rb.replicas, 105)

        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        self.assertEqual(len(devs), len(exp_handoffs))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(dev_ids, exp_handoffs)

        # The first 6 replicas plus the 3 primary nodes should cover all 9
        # zones in this test
        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        # The first handoff nodes for each partition in the ring
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        self.assertEqual(devs, exp_first_handoffs)

        # Add a new device we can handoff to.
        zone = 5
        server = 0
        rb.add_dev({'id': next_dev_id,
                    'ip': '1.2.%d.%d' % (zone, server),
                    'port': 1234, 'zone': zone, 'region': 0, 'weight': 1.0})
        next_dev_id += 1
        rb.pretend_min_part_hours_passed()
        num_parts_changed, _balance, _removed_dev = rb.rebalance(seed=2)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # so now we expect the device list to be longer by one device
        part_handoff_counts = set()
        for part in range(r.partition_count):
            part_handoff_counts.add(len(list(r.get_more_nodes(part))))
        self.assertEqual(part_handoff_counts, {106})
        self.assertEqual(len(list(rb._iter_devs())) - rb.replicas, 106)
        # I don't think there's any special reason this dev goes at this index
        exp_handoffs.insert(27, rb.devs[-1]['id'])

        # We would change expectations here, but in this part only the added
        # device changed at all.
        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev in enumerate(dev_ids):
            self.assertEqual(
                dev, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))

        # The handoffs still cover all the non-primary zones first
        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        # Change expectations for the rest of the parts
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        changed_first_handoff = 0
        for part in range(r.partition_count):
            if devs[part] != exp_first_handoffs[part]:
                changed_first_handoff += 1
                exp_first_handoffs[part] = devs[part]
        self.assertEqual(devs, exp_first_handoffs)
        self.assertEqual(changed_first_handoff, num_parts_changed)

        # Remove a device - no need to fluff min_part_hours.
        rb.remove_dev(0)
        num_parts_changed, _balance, _removed_dev = rb.rebalance(seed=1)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # so now we expect the device list to be shorter by one device
        part_handoff_counts = set()
        for part in range(r.partition_count):
            part_handoff_counts.add(len(list(r.get_more_nodes(part))))
        self.assertEqual(part_handoff_counts, {105})
        self.assertEqual(len(list(rb._iter_devs())) - rb.replicas, 105)

        # Change expectations for our part
        exp_handoffs.remove(0)
        first_matches = 0
        total_changed = 0
        devs = list(d['id'] for d in r.get_more_nodes(exp_part))
        for i, part in enumerate(devs):
            if exp_handoffs[i] != devs[i]:
                total_changed += 1
                exp_handoffs[i] = devs[i]
            if not total_changed:
                first_matches += 1
        self.assertEqual(devs, exp_handoffs)
        # the first 21 handoffs were the same across the rebalance
        self.assertEqual(first_matches, 21)
        # but as you dig deeper some of the differences show up
        self.assertEqual(total_changed, 41)

        # Change expectations for the rest of the parts
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        changed_first_handoff = 0
        for part in range(r.partition_count):
            if devs[part] != exp_first_handoffs[part]:
                changed_first_handoff += 1
                exp_first_handoffs[part] = devs[part]
        self.assertEqual(devs, exp_first_handoffs)
        self.assertEqual(changed_first_handoff, num_parts_changed)

        # Test
        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev in enumerate(dev_ids):
            self.assertEqual(
                dev, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))

        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        for part in range(r.partition_count):
            self.assertEqual(
                devs[part], exp_first_handoffs[part],
                'handoff for partition %d is now device id %d' % (
                    part, devs[part]))

        # Add a partial replica
        rb.set_replicas(3.5)
        num_parts_changed, _balance, _removed_dev = rb.rebalance(seed=164)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # Change expectations
        # We have another replica now
        exp_devs.append(90)
        exp_zones.add(8)
        # and therefore one less handoff
        exp_handoffs = exp_handoffs[:-1]
        # Caused some major changes in the sequence of handoffs for our test
        # partition, but at least the first stayed the same.
        devs = list(d['id'] for d in r.get_more_nodes(exp_part))
        first_matches = 0
        total_changed = 0
        for i, part in enumerate(devs):
            if exp_handoffs[i] != devs[i]:
                total_changed += 1
                exp_handoffs[i] = devs[i]
            if not total_changed:
                first_matches += 1
        # most seeds seem to throw out first handoff stabilization with
        # replica_count change
        self.assertEqual(first_matches, 2)
        # and lots of other handoff changes...
        self.assertEqual(total_changed, 95)

        self.assertEqual(devs, exp_handoffs)

        # Change expectations for the rest of the parts
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        changed_first_handoff = 0
        for part in range(r.partition_count):
            if devs[part] != exp_first_handoffs[part]:
                changed_first_handoff += 1
                exp_first_handoffs[part] = devs[part]
        self.assertEqual(devs, exp_first_handoffs)
        self.assertLessEqual(changed_first_handoff, num_parts_changed)

        # Test
        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev in enumerate(dev_ids):
            self.assertEqual(
                dev, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))

        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        for part in range(r.partition_count):
            self.assertEqual(
                devs[part], exp_first_handoffs[part],
                'handoff for partition %d is now device id %d' % (
                    part, devs[part]))

        # One last test of a partial replica partition
        exp_part2 = 136
        exp_devs2 = [70, 76, 32]
        exp_zones2 = set([3, 6, 7])
        exp_handoffs2 = [89, 97, 37, 53, 20, 1, 86, 64, 102, 40, 90, 60, 72,
                         27, 99, 68, 78, 26, 105, 45, 42, 95, 22, 13, 49, 55,
                         11, 8, 83, 16, 4, 59, 33, 108, 61, 74, 29, 88, 66,
                         80, 25, 100, 39, 67, 79, 24, 65, 96, 36, 84, 54, 21,
                         63, 81, 56, 71, 77, 30, 48, 23, 10, 52, 82, 34, 17,
                         107, 87, 104, 5, 35, 2, 50, 43, 62, 73, 28, 18, 14,
                         98, 38, 85, 15, 57, 9, 51, 12, 6, 91, 3, 103, 41, 92,
                         47, 75, 44, 69, 101, 93, 106, 46, 94, 31, 19, 7, 58]

        part2, devs2 = r.get_nodes('a', 'c', 'o2')
        primary_zones2 = set([d['zone'] for d in devs2])
        self.assertEqual(part2, exp_part2)
        self.assertEqual([d['id'] for d in devs2], exp_devs2)
        self.assertEqual(primary_zones2, exp_zones2)
        devs2 = list(r.get_more_nodes(part2))
        dev_ids2 = [d['id'] for d in devs2]

        self.assertEqual(len(dev_ids2), len(exp_handoffs2))
        for index, dev in enumerate(dev_ids2):
            self.assertEqual(
                dev, exp_handoffs2[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids2[index:], exp_handoffs2[index:]))

        seen_zones = set(primary_zones2)
        seen_zones.update([d['zone'] for d in devs2[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        # Test distribution across regions
        rb.set_replicas(3)
        for region in range(1, 5):
            rb.add_dev({'id': next_dev_id,
                        'ip': '1.%d.1.%d' % (region, server), 'port': 1234,
                        # 108.0 is the weight of all devices created prior to
                        # this test in region 0; this way all regions have
                        # equal combined weight
                        'zone': 1, 'region': region, 'weight': 108.0})
            next_dev_id += 1
        rb.pretend_min_part_hours_passed()
        rb.rebalance(seed=1)
        rb.pretend_min_part_hours_passed()
        rb.rebalance(seed=1)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # There's 5 regions now, so the primary nodes + first 2 handoffs
        # should span all 5 regions
        part, devs = r.get_nodes('a1', 'c1', 'o1')
        primary_regions = set([d['region'] for d in devs])
        primary_zones = set([(d['region'], d['zone']) for d in devs])
        more_devs = list(r.get_more_nodes(part))

        seen_regions = set(primary_regions)
        seen_regions.update([d['region'] for d in more_devs[:2]])
        self.assertEqual(seen_regions, set(range(0, 5)))

        # There are 13 zones now, so the first 13 nodes should all have
        # distinct zones (that's r0z0, r0z1, ..., r0z8, r1z1, r2z1, r3z1, and
        # r4z1).
        seen_zones = set(primary_zones)
        seen_zones.update([(d['region'], d['zone']) for d in more_devs[:10]])
        self.assertEqual(13, len(seen_zones))

        # Here's a brittle canary-in-the-coalmine test to make sure the region
        # handoff computation didn't change accidentally
        exp_handoffs = [111, 112, 35, 58, 62, 74, 20, 105, 41, 90, 53, 6, 3,
                        67, 55, 76, 108, 32, 12, 80, 38, 85, 94, 42, 27, 99,
                        50, 47, 70, 87, 26, 9, 15, 97, 102, 81, 23, 65, 33,
                        77, 34, 4, 75, 8, 5, 30, 13, 73, 36, 92, 54, 51, 72,
                        78, 66, 1, 48, 14, 93, 95, 88, 86, 84, 106, 60, 101,
                        57, 43, 89, 59, 79, 46, 61, 52, 44, 45, 37, 68, 25,
                        100, 49, 24, 16, 71, 96, 21, 107, 98, 64, 39, 18, 29,
                        103, 91, 22, 63, 69, 28, 56, 11, 82, 10, 17, 19, 7,
                        40, 83, 104, 31]
        dev_ids = [d['id'] for d in more_devs]

        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev_id in enumerate(dev_ids):
            self.assertEqual(
                dev_id, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))


if __name__ == '__main__':
    unittest.main()