
It's harder than it sounds. There were really three challenges.

Challenge #1 Initial Assignment
===============================

Before starting to assign parts on this new shiny ring you've
constructed, maybe we'll pause for a moment up front and consider the
lay of the land. This process is called building the replica_plan.

The replica_plan approach separates part assignment failures into two
modes:

 1) we considered the cluster topology and its weights and came up with
    the wrong plan

 2) we failed to execute on the plan

I failed at both parts plenty of times before I got it this close. I'm
sure a counter example still exists, but when we find it the new helper
methods will let us reason about where things went wrong.

Challenge #2 Fixing Placement
=============================

With a sound plan in hand, it's much easier to fail to execute on it the
less material you have to execute with - so we gather up as many parts
as we can - as long as we think we can find them a better home.

Picking the right parts to gather is a black art - when you notice a
rebalance is slow it's because it's spending so much time iterating over
replica2part2dev trying to decide just the right parts to gather.

The replica plan can help at least in the gross dispersion collection to
gather up the worst offenders first before considering balance. I think
trying to avoid picking up parts that are stuck to the tier before
falling into a forced grab on anything over parts_wanted helps with
stability generally - but depending on where the parts_wanted are in
relation to the full devices it's pretty easy to pick up something that
will end up really close to where it started.

I tried to break the gather methods into smaller pieces so it looked
like I knew what I was doing.

Going with a MAXIMUM gather iteration instead of balance (which doesn't
reflect the replica_plan) doesn't seem to be costing me anything - most
of the time the exit condition is either solved or all the parts are
overly aggressively locked up on min_part_hours. So far it mostly seems
that if the thing is going to balance this round it'll get it in the
first couple of shakes.

Challenge #3 Crazy replica2part2dev tables
==========================================

I think there are lots of ways "scars" can build up in a ring which can
result in very particular replica2part2dev tables that are physically
difficult to dig out of. It's repairing these scars that will take
multiple rebalances to resolve.

... but at this point ...

... lacking a counter example ...

I've been able to close up all the edge cases I was able to find. It
may not be quick, but progress will be made.

Basically my strategy just required a better understanding of how
previous algorithms were able to *mostly* keep things moving by brute
forcing the whole mess with a bunch of randomness. Then when we detect
that our "elegant" careful part selection isn't making progress - we
can fall back to the same old tricks.

Validation
==========

We validate against duplicate part replica assignment after rebalance
and raise an ERROR if we detect more than one replica of a part
assigned to the same device.

In order to meet that requirement we have to have as many devices as
replicas, so attempting to rebalance with too few devices without
changing your replica_count is also an ERROR, not a warning.

Random Thoughts
===============

As usual with rings, the test diff can be hard to reason about -
hopefully I've added enough comments to assure future me that these
assertions make sense.
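To make the Validation rule above concrete, here is a minimal sketch
(an illustration only, not the actual RingBuilder validation code) of
the invariant it enforces: no device may hold more than one replica of
any part in replica2part2dev.

    from collections import defaultdict


    def find_duplicate_part_replicas(replica2part2dev):
        # replica2part2dev[r][p] is the device id holding replica r of
        # part p; the last row may be shorter for a fractional replica.
        duplicates = {}
        num_parts = len(replica2part2dev[0]) if replica2part2dev else 0
        for part in range(num_parts):
            seen = defaultdict(int)
            for part2dev in replica2part2dev:
                if part < len(part2dev):
                    seen[part2dev[part]] += 1
            dupes = [dev_id for dev_id, count in seen.items() if count > 1]
            if dupes:
                duplicates[part] = dupes
        return duplicates


    # two replicas of every part on the same device -> every part flagged
    assert find_duplicate_part_replicas([[0, 1], [0, 1]]) == {0: [0], 1: [1]}

A rebalance that produces any entry here is what the new validation
treats as an ERROR rather than a warning.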
Despite being a large rewrite of a lot of important code, the existing
code is known to have failed us. This change fixes a critical bug that's
trivial to reproduce in a critical component of the system.

There's probably a bunch of error messages and exit status stuff that's
not as helpful as it could be considering the new behaviors.

Change-Id: I1bbe7be38806fc1c8b9181a722933c18a6c76e05
Closes-Bug: #1452431
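For reviewers who want to poke at the new behavior by hand before
reading the tests below, here is a minimal sketch of the
build/rebalance/validate flow. It uses the same RingBuilder calls the
tests exercise; the device dict values are made up for illustration and
this is an assumption-laden example, not part of this change.

    from swift.common import ring

    # 2**6 partitions, 3 replicas, min_part_hours=1 - small enough to eyeball
    rb = ring.RingBuilder(6, 3, 1)
    for dev_id in range(4):
        rb.add_dev({'id': dev_id, 'region': 0, 'zone': dev_id % 2,
                    'ip': '10.0.0.%d' % dev_id, 'port': 6000,
                    'device': 'sda', 'weight': 100.0})
    rb.rebalance(seed=2)
    # validate() is expected to complain if any device ends up holding
    # more than one replica of a part
    rb.validate()
    print(rb.get_balance())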
# Copyright (c) 2010-2012 OpenStack Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import array
import six.moves.cPickle as pickle
import os
import sys
import unittest
import stat
from contextlib import closing
from gzip import GzipFile
from tempfile import mkdtemp
from shutil import rmtree
from time import sleep, time

from six.moves import range

from swift.common import ring, utils

class TestRingBase(unittest.TestCase):

    def setUp(self):
        self._orig_hash_suffix = utils.HASH_PATH_SUFFIX
        self._orig_hash_prefix = utils.HASH_PATH_PREFIX
        utils.HASH_PATH_SUFFIX = 'endcap'
        utils.HASH_PATH_PREFIX = ''

    def tearDown(self):
        utils.HASH_PATH_SUFFIX = self._orig_hash_suffix
        utils.HASH_PATH_PREFIX = self._orig_hash_prefix


class TestRingData(unittest.TestCase):

    def setUp(self):
        self.testdir = os.path.join(os.path.dirname(__file__), 'ring_data')
        rmtree(self.testdir, ignore_errors=1)
        os.mkdir(self.testdir)

    def tearDown(self):
        rmtree(self.testdir, ignore_errors=1)

    def assert_ring_data_equal(self, rd_expected, rd_got):
        self.assertEqual(rd_expected._replica2part2dev_id,
                         rd_got._replica2part2dev_id)
        self.assertEqual(rd_expected.devs, rd_got.devs)
        self.assertEqual(rd_expected._part_shift, rd_got._part_shift)

    def test_attrs(self):
        r2p2d = [[0, 1, 0, 1], [0, 1, 0, 1]]
        d = [{'id': 0, 'zone': 0, 'region': 0, 'ip': '10.1.1.0',
              'port': 7000},
             {'id': 1, 'zone': 1, 'region': 1, 'ip': '10.1.1.1',
              'port': 7000}]
        s = 30
        rd = ring.RingData(r2p2d, d, s)
        self.assertEqual(rd._replica2part2dev_id, r2p2d)
        self.assertEqual(rd.devs, d)
        self.assertEqual(rd._part_shift, s)

    def test_can_load_pickled_ring_data(self):
        rd = ring.RingData(
            [[0, 1, 0, 1], [0, 1, 0, 1]],
            [{'id': 0, 'zone': 0, 'ip': '10.1.1.0', 'port': 7000},
             {'id': 1, 'zone': 1, 'ip': '10.1.1.1', 'port': 7000}],
            30)
        ring_fname = os.path.join(self.testdir, 'foo.ring.gz')
        for p in range(pickle.HIGHEST_PROTOCOL):
            with closing(GzipFile(ring_fname, 'wb')) as f:
                pickle.dump(rd, f, protocol=p)
            meta_only = ring.RingData.load(ring_fname, metadata_only=True)
            self.assertEqual([
                {'id': 0, 'zone': 0, 'region': 1, 'ip': '10.1.1.0',
                 'port': 7000},
                {'id': 1, 'zone': 1, 'region': 1, 'ip': '10.1.1.1',
                 'port': 7000},
            ], meta_only.devs)
            # Pickled rings can't load only metadata, so you get it all
            self.assert_ring_data_equal(rd, meta_only)
            ring_data = ring.RingData.load(ring_fname)
            self.assert_ring_data_equal(rd, ring_data)

    def test_roundtrip_serialization(self):
        ring_fname = os.path.join(self.testdir, 'foo.ring.gz')
        rd = ring.RingData(
            [array.array('H', [0, 1, 0, 1]), array.array('H', [0, 1, 0, 1])],
            [{'id': 0, 'zone': 0}, {'id': 1, 'zone': 1}], 30)
        rd.save(ring_fname)
        meta_only = ring.RingData.load(ring_fname, metadata_only=True)
        self.assertEqual([
            {'id': 0, 'zone': 0, 'region': 1},
            {'id': 1, 'zone': 1, 'region': 1},
        ], meta_only.devs)
        self.assertEqual([], meta_only._replica2part2dev_id)
        rd2 = ring.RingData.load(ring_fname)
        self.assert_ring_data_equal(rd, rd2)

    def test_deterministic_serialization(self):
        """
        Two identical rings should produce identical .gz files on disk.

        Only true on Python 2.7 or greater.
        """
        if sys.version_info[0] == 2 and sys.version_info[1] < 7:
            return
        os.mkdir(os.path.join(self.testdir, '1'))
        os.mkdir(os.path.join(self.testdir, '2'))
        # These have to have the same filename (not full path,
        # obviously) since the filename gets encoded in the gzip data.
        ring_fname1 = os.path.join(self.testdir, '1', 'the.ring.gz')
        ring_fname2 = os.path.join(self.testdir, '2', 'the.ring.gz')
        rd = ring.RingData(
            [array.array('H', [0, 1, 0, 1]), array.array('H', [0, 1, 0, 1])],
            [{'id': 0, 'zone': 0}, {'id': 1, 'zone': 1}], 30)
        rd.save(ring_fname1)
        rd.save(ring_fname2)
        with open(ring_fname1) as ring1:
            with open(ring_fname2) as ring2:
                self.assertEqual(ring1.read(), ring2.read())

    def test_permissions(self):
        ring_fname = os.path.join(self.testdir, 'stat.ring.gz')
        rd = ring.RingData(
            [array.array('H', [0, 1, 0, 1]), array.array('H', [0, 1, 0, 1])],
            [{'id': 0, 'zone': 0}, {'id': 1, 'zone': 1}], 30)
        rd.save(ring_fname)
        self.assertEqual(oct(stat.S_IMODE(os.stat(ring_fname).st_mode)),
                         '0644')


class TestRing(TestRingBase):

    def setUp(self):
        super(TestRing, self).setUp()
        self.testdir = mkdtemp()
        self.testgz = os.path.join(self.testdir, 'whatever.ring.gz')
        self.intended_replica2part2dev_id = [
            array.array('H', [0, 1, 0, 1]),
            array.array('H', [0, 1, 0, 1]),
            array.array('H', [3, 4, 3, 4])]
        self.intended_devs = [{'id': 0, 'region': 0, 'zone': 0, 'weight': 1.0,
                               'ip': '10.1.1.1', 'port': 6000,
                               'replication_ip': '10.1.0.1',
                               'replication_port': 6066},
                              {'id': 1, 'region': 0, 'zone': 0, 'weight': 1.0,
                               'ip': '10.1.1.1', 'port': 6000,
                               'replication_ip': '10.1.0.2',
                               'replication_port': 6066},
                              None,
                              {'id': 3, 'region': 0, 'zone': 2, 'weight': 1.0,
                               'ip': '10.1.2.1', 'port': 6000,
                               'replication_ip': '10.2.0.1',
                               'replication_port': 6066},
                              {'id': 4, 'region': 0, 'zone': 2, 'weight': 1.0,
                               'ip': '10.1.2.2', 'port': 6000,
                               'replication_ip': '10.2.0.1',
                               'replication_port': 6066}]
        self.intended_part_shift = 30
        self.intended_reload_time = 15
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        self.ring = ring.Ring(
            self.testdir,
            reload_time=self.intended_reload_time, ring_name='whatever')

    def tearDown(self):
        super(TestRing, self).tearDown()
        rmtree(self.testdir, ignore_errors=1)

    def test_creation(self):
        self.assertEqual(self.ring._replica2part2dev_id,
                         self.intended_replica2part2dev_id)
        self.assertEqual(self.ring._part_shift, self.intended_part_shift)
        self.assertEqual(self.ring.devs, self.intended_devs)
        self.assertEqual(self.ring.reload_time, self.intended_reload_time)
        self.assertEqual(self.ring.serialized_path, self.testgz)
        # test invalid endcap
        _orig_hash_path_suffix = utils.HASH_PATH_SUFFIX
        _orig_hash_path_prefix = utils.HASH_PATH_PREFIX
        _orig_swift_conf_file = utils.SWIFT_CONF_FILE
        try:
            utils.HASH_PATH_SUFFIX = ''
            utils.HASH_PATH_PREFIX = ''
            utils.SWIFT_CONF_FILE = ''
            self.assertRaises(SystemExit, ring.Ring, self.testdir, 'whatever')
        finally:
            utils.HASH_PATH_SUFFIX = _orig_hash_path_suffix
            utils.HASH_PATH_PREFIX = _orig_hash_path_prefix
            utils.SWIFT_CONF_FILE = _orig_swift_conf_file

    def test_has_changed(self):
        self.assertEqual(self.ring.has_changed(), False)
        os.utime(self.testgz, (time() + 60, time() + 60))
        self.assertEqual(self.ring.has_changed(), True)

    def test_reload(self):
        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        self.assertEqual(len(self.ring.devs), 5)
        self.intended_devs.append(
            {'id': 3, 'region': 0, 'zone': 3, 'weight': 1.0,
             'ip': '10.1.1.1', 'port': 9876})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        self.ring.get_nodes('a')
        self.assertEqual(len(self.ring.devs), 6)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        self.assertEqual(len(self.ring.devs), 6)
        self.intended_devs.append(
            {'id': 5, 'region': 0, 'zone': 4, 'weight': 1.0,
             'ip': '10.5.5.5', 'port': 9876})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        self.ring.get_part_nodes(0)
        self.assertEqual(len(self.ring.devs), 7)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        part, nodes = self.ring.get_nodes('a')
        self.assertEqual(len(self.ring.devs), 7)
        self.intended_devs.append(
            {'id': 6, 'region': 0, 'zone': 5, 'weight': 1.0,
             'ip': '10.6.6.6', 'port': 6000})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        next(self.ring.get_more_nodes(part))
        self.assertEqual(len(self.ring.devs), 8)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

        os.utime(self.testgz, (time() - 300, time() - 300))
        self.ring = ring.Ring(self.testdir, reload_time=0.001,
                              ring_name='whatever')
        orig_mtime = self.ring._mtime
        self.assertEqual(len(self.ring.devs), 8)
        self.intended_devs.append(
            {'id': 5, 'region': 0, 'zone': 4, 'weight': 1.0,
             'ip': '10.5.5.5', 'port': 6000})
        ring.RingData(
            self.intended_replica2part2dev_id,
            self.intended_devs, self.intended_part_shift).save(self.testgz)
        sleep(0.1)
        self.assertEqual(len(self.ring.devs), 9)
        self.assertNotEqual(self.ring._mtime, orig_mtime)

    def test_reload_without_replication(self):
        replication_less_devs = [{'id': 0, 'region': 0, 'zone': 0,
                                  'weight': 1.0, 'ip': '10.1.1.1',
                                  'port': 6000},
                                 {'id': 1, 'region': 0, 'zone': 0,
                                  'weight': 1.0, 'ip': '10.1.1.1',
                                  'port': 6000},
                                 None,
                                 {'id': 3, 'region': 0, 'zone': 2,
                                  'weight': 1.0, 'ip': '10.1.2.1',
                                  'port': 6000},
                                 {'id': 4, 'region': 0, 'zone': 2,
                                  'weight': 1.0, 'ip': '10.1.2.2',
                                  'port': 6000}]
        intended_devs = [{'id': 0, 'region': 0, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         {'id': 1, 'region': 0, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         None,
                         {'id': 3, 'region': 0, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.1', 'port': 6000,
                          'replication_ip': '10.1.2.1',
                          'replication_port': 6000},
                         {'id': 4, 'region': 0, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.2', 'port': 6000,
                          'replication_ip': '10.1.2.2',
                          'replication_port': 6000}]
        testgz = os.path.join(self.testdir, 'without_replication.ring.gz')
        ring.RingData(
            self.intended_replica2part2dev_id,
            replication_less_devs, self.intended_part_shift).save(testgz)
        self.ring = ring.Ring(
            self.testdir,
            reload_time=self.intended_reload_time,
            ring_name='without_replication')
        self.assertEqual(self.ring.devs, intended_devs)

    def test_reload_old_style_pickled_ring(self):
        devs = [{'id': 0, 'zone': 0,
                 'weight': 1.0, 'ip': '10.1.1.1',
                 'port': 6000},
                {'id': 1, 'zone': 0,
                 'weight': 1.0, 'ip': '10.1.1.1',
                 'port': 6000},
                None,
                {'id': 3, 'zone': 2,
                 'weight': 1.0, 'ip': '10.1.2.1',
                 'port': 6000},
                {'id': 4, 'zone': 2,
                 'weight': 1.0, 'ip': '10.1.2.2',
                 'port': 6000}]
        intended_devs = [{'id': 0, 'region': 1, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         {'id': 1, 'region': 1, 'zone': 0, 'weight': 1.0,
                          'ip': '10.1.1.1', 'port': 6000,
                          'replication_ip': '10.1.1.1',
                          'replication_port': 6000},
                         None,
                         {'id': 3, 'region': 1, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.1', 'port': 6000,
                          'replication_ip': '10.1.2.1',
                          'replication_port': 6000},
                         {'id': 4, 'region': 1, 'zone': 2, 'weight': 1.0,
                          'ip': '10.1.2.2', 'port': 6000,
                          'replication_ip': '10.1.2.2',
                          'replication_port': 6000}]

        # simulate an old-style pickled ring
        testgz = os.path.join(self.testdir,
                              'without_replication_or_region.ring.gz')
        ring_data = ring.RingData(self.intended_replica2part2dev_id,
                                  devs,
                                  self.intended_part_shift)
        # an old-style pickled ring won't have region data
        for dev in ring_data.devs:
            if dev:
                del dev["region"]
        gz_file = GzipFile(testgz, 'wb')
        pickle.dump(ring_data, gz_file, protocol=2)
        gz_file.close()

        self.ring = ring.Ring(
            self.testdir,
            reload_time=self.intended_reload_time,
            ring_name='without_replication_or_region')
        self.assertEqual(self.ring.devs, intended_devs)

    def test_get_part(self):
        part1 = self.ring.get_part('a')
        nodes1 = self.ring.get_part_nodes(part1)
        part2, nodes2 = self.ring.get_nodes('a')
        self.assertEqual(part1, part2)
        self.assertEqual(nodes1, nodes2)

    def test_get_part_nodes(self):
        part, nodes = self.ring.get_nodes('a')
        self.assertEqual(nodes, self.ring.get_part_nodes(part))

    def test_get_nodes(self):
        # Yes, these tests are deliberately very fragile. We want to make sure
        # that if someone changes the results the ring produces, they know it.
        self.assertRaises(TypeError, self.ring.get_nodes)
        part, nodes = self.ring.get_nodes('a')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a1')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a4')
        self.assertEqual(part, 1)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('aa')
        self.assertEqual(part, 1)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('a', 'c1')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c0')
        self.assertEqual(part, 3)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('a', 'c3')
        self.assertEqual(part, 2)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c2')
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o1')
        self.assertEqual(part, 1)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[1],
                                            self.intended_devs[4]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o5')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o0')
        self.assertEqual(part, 0)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

        part, nodes = self.ring.get_nodes('a', 'c', 'o2')
        self.assertEqual(part, 2)
        self.assertEqual(nodes, [dict(node, index=i) for i, node in
                                 enumerate([self.intended_devs[0],
                                            self.intended_devs[3]])])

    def add_dev_to_ring(self, new_dev):
        self.ring.devs.append(new_dev)
        self.ring._rebuild_tier_data()

    def test_get_more_nodes(self):
        # Yes, these tests are deliberately very fragile. We want to make sure
        # that if someone changes the results the ring produces, they know it.
        exp_part = 6
        exp_devs = [71, 77, 30]
        exp_zones = set([6, 3, 7])

        exp_handoffs = [99, 43, 94, 13, 1, 49, 60, 72, 27, 68, 78, 26, 21, 9,
                        51, 105, 47, 89, 65, 82, 34, 98, 38, 85, 16, 4, 59,
                        102, 40, 90, 20, 8, 54, 66, 80, 25, 14, 2, 50, 12, 0,
                        48, 70, 76, 32, 107, 45, 87, 101, 44, 93, 100, 42, 95,
                        106, 46, 88, 97, 37, 86, 96, 36, 84, 17, 5, 57, 63,
                        81, 33, 67, 79, 24, 15, 3, 58, 69, 75, 31, 61, 74, 29,
                        23, 10, 52, 22, 11, 53, 64, 83, 35, 62, 73, 28, 18, 6,
                        56, 104, 39, 91, 103, 41, 92, 19, 7, 55]

        exp_first_handoffs = [23, 64, 105, 102, 67, 17, 99, 65, 69, 97, 15,
                              17, 24, 98, 66, 65, 69, 18, 104, 105, 16, 107,
                              100, 15, 14, 19, 102, 105, 63, 104, 99, 12, 107,
                              99, 16, 105, 71, 15, 15, 63, 63, 99, 21, 68, 20,
                              64, 96, 21, 98, 19, 68, 99, 15, 69, 62, 100, 96,
                              102, 17, 62, 13, 61, 102, 105, 22, 16, 21, 18,
                              21, 100, 20, 16, 21, 106, 66, 106, 16, 99, 16,
                              22, 62, 60, 99, 69, 18, 23, 104, 98, 106, 61,
                              21, 23, 23, 16, 67, 71, 101, 16, 64, 66, 70, 15,
                              102, 63, 19, 98, 18, 106, 101, 100, 62, 63, 98,
                              18, 13, 97, 23, 22, 100, 13, 14, 67, 96, 14,
                              105, 97, 71, 64, 96, 22, 65, 66, 98, 19, 105,
                              98, 97, 21, 15, 69, 100, 98, 106, 65, 66, 97,
                              62, 22, 68, 63, 61, 67, 67, 20, 105, 106, 105,
                              18, 71, 100, 17, 62, 60, 13, 103, 99, 101, 96,
                              97, 16, 60, 21, 14, 20, 12, 60, 69, 104, 65, 65,
                              17, 16, 67, 13, 64, 15, 16, 68, 96, 21, 104, 66,
                              96, 105, 58, 105, 103, 21, 96, 60, 16, 96, 21,
                              71, 16, 99, 101, 63, 62, 103, 18, 102, 60, 17,
                              19, 106, 97, 14, 99, 68, 102, 13, 70, 103, 21,
                              22, 19, 61, 103, 23, 104, 65, 62, 68, 16, 65,
                              15, 102, 102, 71, 99, 63, 67, 19, 23, 15, 69,
                              107, 14, 13, 64, 13, 105, 15, 98, 69]

        rb = ring.RingBuilder(8, 3, 1)
        next_dev_id = 0
        for zone in range(1, 10):
            for server in range(1, 5):
                for device in range(1, 4):
                    rb.add_dev({'id': next_dev_id,
                                'ip': '1.2.%d.%d' % (zone, server),
                                'port': 1234 + device,
                                'zone': zone, 'region': 0,
                                'weight': 1.0})
                    next_dev_id += 1
        rb.rebalance(seed=2)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # every part has the same number of handoffs
        part_handoff_counts = set()
        for part in range(r.partition_count):
            part_handoff_counts.add(len(list(r.get_more_nodes(part))))
        self.assertEqual(part_handoff_counts, {105})
        # which less the primaries - is every device in the ring
        self.assertEqual(len(list(rb._iter_devs())) - rb.replicas, 105)

        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        self.assertEqual(len(devs), len(exp_handoffs))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(dev_ids, exp_handoffs)

        # The first 6 replicas plus the 3 primary nodes should cover all 9
        # zones in this test
        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        # The first handoff nodes for each partition in the ring
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        self.assertEqual(devs, exp_first_handoffs)

        # Add a new device we can handoff to.
        zone = 5
        server = 0
        rb.add_dev({'id': next_dev_id,
                    'ip': '1.2.%d.%d' % (zone, server),
                    'port': 1234, 'zone': zone, 'region': 0, 'weight': 1.0})
        next_dev_id += 1
        rb.pretend_min_part_hours_passed()
        num_parts_changed, _balance, _removed_dev = rb.rebalance(seed=2)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # so now we expect the device list to be longer by one device
        part_handoff_counts = set()
        for part in range(r.partition_count):
            part_handoff_counts.add(len(list(r.get_more_nodes(part))))
        self.assertEqual(part_handoff_counts, {106})
        self.assertEqual(len(list(rb._iter_devs())) - rb.replicas, 106)
        # I don't think there's any special reason this dev goes at this index
        exp_handoffs.insert(27, rb.devs[-1]['id'])

        # We would change expectations here, but in this part only the added
        # device changed at all.
        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev in enumerate(dev_ids):
            self.assertEqual(
                dev, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))

        # The handoffs still cover all the non-primary zones first
        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        # Change expectations for the rest of the parts
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        changed_first_handoff = 0
        for part in range(r.partition_count):
            if devs[part] != exp_first_handoffs[part]:
                changed_first_handoff += 1
                exp_first_handoffs[part] = devs[part]
        self.assertEqual(devs, exp_first_handoffs)
        self.assertEqual(changed_first_handoff, num_parts_changed)

        # Remove a device - no need to fluff min_part_hours.
        rb.remove_dev(0)
        num_parts_changed, _balance, _removed_dev = rb.rebalance(seed=1)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # so now we expect the device list to be shorter by one device
        part_handoff_counts = set()
        for part in range(r.partition_count):
            part_handoff_counts.add(len(list(r.get_more_nodes(part))))
        self.assertEqual(part_handoff_counts, {105})
        self.assertEqual(len(list(rb._iter_devs())) - rb.replicas, 105)

        # Change expectations for our part
        exp_handoffs.remove(0)
        first_matches = 0
        total_changed = 0
        devs = list(d['id'] for d in r.get_more_nodes(exp_part))
        for i, part in enumerate(devs):
            if exp_handoffs[i] != devs[i]:
                total_changed += 1
                exp_handoffs[i] = devs[i]
            if not total_changed:
                first_matches += 1
        self.assertEqual(devs, exp_handoffs)
        # the first 21 handoffs were the same across the rebalance
        self.assertEqual(first_matches, 21)
        # but as you dig deeper some of the differences show up
        self.assertEqual(total_changed, 41)

        # Change expectations for the rest of the parts
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        changed_first_handoff = 0
        for part in range(r.partition_count):
            if devs[part] != exp_first_handoffs[part]:
                changed_first_handoff += 1
                exp_first_handoffs[part] = devs[part]
        self.assertEqual(devs, exp_first_handoffs)
        self.assertEqual(changed_first_handoff, num_parts_changed)

        # Test
        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev in enumerate(dev_ids):
            self.assertEqual(
                dev, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))

        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        for part in range(r.partition_count):
            self.assertEqual(
                devs[part], exp_first_handoffs[part],
                'handoff for partition %d is now device id %d' % (
                    part, devs[part]))

        # Add a partial replica
        rb.set_replicas(3.5)
        num_parts_changed, _balance, _removed_dev = rb.rebalance(seed=164)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # Change expectations

        # We have another replica now
        exp_devs.append(90)
        exp_zones.add(8)
        # and therefore one less handoff
        exp_handoffs = exp_handoffs[:-1]
        # Caused some major changes in the sequence of handoffs for our test
        # partition, but at least the first stayed the same.
        devs = list(d['id'] for d in r.get_more_nodes(exp_part))
        first_matches = 0
        total_changed = 0
        for i, part in enumerate(devs):
            if exp_handoffs[i] != devs[i]:
                total_changed += 1
                exp_handoffs[i] = devs[i]
            if not total_changed:
                first_matches += 1
        # most seeds seem to throw out first handoff stabilization with
        # replica_count change
        self.assertEqual(first_matches, 2)
        # and lots of other handoff changes...
        self.assertEqual(total_changed, 95)

        self.assertEqual(devs, exp_handoffs)

        # Change expectations for the rest of the parts
        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        changed_first_handoff = 0
        for part in range(r.partition_count):
            if devs[part] != exp_first_handoffs[part]:
                changed_first_handoff += 1
                exp_first_handoffs[part] = devs[part]
        self.assertEqual(devs, exp_first_handoffs)
        self.assertLessEqual(changed_first_handoff, num_parts_changed)

        # Test
        part, devs = r.get_nodes('a', 'c', 'o')
        primary_zones = set([d['zone'] for d in devs])
        self.assertEqual(part, exp_part)
        self.assertEqual([d['id'] for d in devs], exp_devs)
        self.assertEqual(primary_zones, exp_zones)
        devs = list(r.get_more_nodes(part))
        dev_ids = [d['id'] for d in devs]
        self.assertEqual(len(dev_ids), len(exp_handoffs))

        for index, dev in enumerate(dev_ids):
            self.assertEqual(
                dev, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))

        seen_zones = set(primary_zones)
        seen_zones.update([d['zone'] for d in devs[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        devs = []
        for part in range(r.partition_count):
            devs.append(next(r.get_more_nodes(part))['id'])
        for part in range(r.partition_count):
            self.assertEqual(
                devs[part], exp_first_handoffs[part],
                'handoff for partition %d is now device id %d' % (
                    part, devs[part]))

        # One last test of a partial replica partition
        exp_part2 = 136
        exp_devs2 = [70, 76, 32]
        exp_zones2 = set([3, 6, 7])
        exp_handoffs2 = [89, 97, 37, 53, 20, 1, 86, 64, 102, 40, 90, 60, 72,
                         27, 99, 68, 78, 26, 105, 45, 42, 95, 22, 13, 49, 55,
                         11, 8, 83, 16, 4, 59, 33, 108, 61, 74, 29, 88, 66,
                         80, 25, 100, 39, 67, 79, 24, 65, 96, 36, 84, 54, 21,
                         63, 81, 56, 71, 77, 30, 48, 23, 10, 52, 82, 34, 17,
                         107, 87, 104, 5, 35, 2, 50, 43, 62, 73, 28, 18, 14,
                         98, 38, 85, 15, 57, 9, 51, 12, 6, 91, 3, 103, 41, 92,
                         47, 75, 44, 69, 101, 93, 106, 46, 94, 31, 19, 7, 58]

        part2, devs2 = r.get_nodes('a', 'c', 'o2')
        primary_zones2 = set([d['zone'] for d in devs2])
        self.assertEqual(part2, exp_part2)
        self.assertEqual([d['id'] for d in devs2], exp_devs2)
        self.assertEqual(primary_zones2, exp_zones2)
        devs2 = list(r.get_more_nodes(part2))
        dev_ids2 = [d['id'] for d in devs2]

        self.assertEqual(len(dev_ids2), len(exp_handoffs2))
        for index, dev in enumerate(dev_ids2):
            self.assertEqual(
                dev, exp_handoffs2[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids2[index:], exp_handoffs2[index:]))

        seen_zones = set(primary_zones2)
        seen_zones.update([d['zone'] for d in devs2[:6]])
        self.assertEqual(seen_zones, set(range(1, 10)))

        # Test distribution across regions
        rb.set_replicas(3)
        for region in range(1, 5):
            rb.add_dev({'id': next_dev_id,
                        'ip': '1.%d.1.%d' % (region, server), 'port': 1234,
                        # 108.0 is the weight of all devices created prior to
                        # this test in region 0; this way all regions have
                        # equal combined weight
                        'zone': 1, 'region': region, 'weight': 108.0})
            next_dev_id += 1
        rb.pretend_min_part_hours_passed()
        rb.rebalance(seed=1)
        rb.pretend_min_part_hours_passed()
        rb.rebalance(seed=1)
        rb.get_ring().save(self.testgz)
        r = ring.Ring(self.testdir, ring_name='whatever')

        # There's 5 regions now, so the primary nodes + first 2 handoffs
        # should span all 5 regions
        part, devs = r.get_nodes('a1', 'c1', 'o1')
        primary_regions = set([d['region'] for d in devs])
        primary_zones = set([(d['region'], d['zone']) for d in devs])
        more_devs = list(r.get_more_nodes(part))

        seen_regions = set(primary_regions)
        seen_regions.update([d['region'] for d in more_devs[:2]])
        self.assertEqual(seen_regions, set(range(0, 5)))

        # There are 13 zones now, so the first 13 nodes should all have
        # distinct zones (that's r0z0, r0z1, ..., r0z8, r1z1, r2z1, r3z1, and
        # r4z1).
        seen_zones = set(primary_zones)
        seen_zones.update([(d['region'], d['zone']) for d in more_devs[:10]])
        self.assertEqual(13, len(seen_zones))

        # Here's a brittle canary-in-the-coalmine test to make sure the region
        # handoff computation didn't change accidentally
        exp_handoffs = [111, 112, 35, 58, 62, 74, 20, 105, 41, 90, 53, 6, 3,
                        67, 55, 76, 108, 32, 12, 80, 38, 85, 94, 42, 27, 99,
                        50, 47, 70, 87, 26, 9, 15, 97, 102, 81, 23, 65, 33,
                        77, 34, 4, 75, 8, 5, 30, 13, 73, 36, 92, 54, 51, 72,
                        78, 66, 1, 48, 14, 93, 95, 88, 86, 84, 106, 60, 101,
                        57, 43, 89, 59, 79, 46, 61, 52, 44, 45, 37, 68, 25,
                        100, 49, 24, 16, 71, 96, 21, 107, 98, 64, 39, 18, 29,
                        103, 91, 22, 63, 69, 28, 56, 11, 82, 10, 17, 19, 7,
                        40, 83, 104, 31]
        dev_ids = [d['id'] for d in more_devs]

        self.assertEqual(len(dev_ids), len(exp_handoffs))
        for index, dev_id in enumerate(dev_ids):
            self.assertEqual(
                dev_id, exp_handoffs[index],
                'handoff differs at position %d\n%s\n%s' % (
                    index, dev_ids[index:], exp_handoffs[index:]))


if __name__ == '__main__':
    unittest.main()