Add round-robin candidate generation strategy

The previous patch introduced the [placement]max_allocation_candidates
config option to limit the number of candidates generated for a single
query.

If the number of generated allocation candidates is limited by that
config option, then it is possible to get candidates from a limited set of
root providers (computes, anchoring providers), as placement uses a
depth-first strategy, generating all candidates from the first root
before considering the next one.

To avoid unbalanced results this patch introduces a new config option
[placement]allocation_candidates_generation_strategy with the possible
values:
* depth-first, the original strategy that generates all candidates from
  the first root before moving to the next. This will be the default
  strategy for backward compatibility
* breadth-first, a new possible strategy that generates candidates from
  available roots in a round-robin fashion, one candidate from each
  root before taking the second candidate from the first root.

Closes-Bug: #2070257
Change-Id: Ib7a140374bc91cc9ab597d0923b0623f618ec32c
Balazs Gibizer 2024-12-06 16:55:46 +01:00
parent 93674ecfa5
commit f20e13f0b2
7 changed files with 261 additions and 5 deletions

View File

@@ -84,6 +84,38 @@ under the same root having inventory from the same resource class
to tune this config option based on the memory available for the
placement service and the client timeout setting on the client side. A good
initial value could be around 100000.
In a deployment with wide and symmetric provider trees we also recommend
changing the [placement]allocation_candidates_generation_strategy to
breadth-first.
"""),
cfg.StrOpt(
'allocation_candidates_generation_strategy',
default="depth-first",
choices=("depth-first", "breadth-first"),
help="""
Defines the order in which placement visits viable root providers during
allocation candidate generation:
* depth-first, generates all candidates from the first viable root provider
before moving to the next.
* breadth-first, generates candidates from viable roots in a round-robin
fashion, creating one candidate from each viable root before creating the
second candidate from the first root.
If the deployment has wide and symmetric provider trees, i.e. there are
multiple child providers under the same root having inventory from the same
resource class (e.g. in case of nova's mdev GPU or PCI in Placement features),
then the depth-first strategy with a max_allocation_candidates
limit might produce candidates from a limited set of root providers. On the
other hand, the breadth-first strategy ensures that candidates are returned
from all viable roots in a balanced way.
Both strategies produce the candidates in the API response in an undefined but
deterministic order. That is, all things being equal, two requests for
allocation candidates will return the same results in the same order; but no
guarantees are made as to how that order is determined.
"""),
]
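As a usage sketch, pairing the two options as recommended in the help text
above would look like this in placement.conf (100000 is the suggested
starting value from the docs above, not a default):

    [placement]
    max_allocation_candidates = 100000
    allocation_candidates_generation_strategy = breadth-first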

View File

@@ -24,6 +24,7 @@ from placement import exception
from placement.objects import research_context as res_ctx
from placement.objects import resource_provider as rp_obj
from placement.objects import trait as trait_obj
from placement import util
_ALLOC_TBL = models.Allocation.__table__
@@ -718,9 +719,21 @@ def _get_areq_list_generators(areq_lists_by_anchor, all_suffixes):
]
def _generate_areq_lists(areq_lists_by_anchor, all_suffixes):
def _generate_areq_lists(rw_ctx, areq_lists_by_anchor, all_suffixes):
strategy = (
rw_ctx.config.placement.allocation_candidates_generation_strategy)
generators = _get_areq_list_generators(areq_lists_by_anchor, all_suffixes)
return itertools.chain(*generators)
if strategy == "depth-first":
# Generates all solutions from the first anchor before moving to the
# next
return itertools.chain(*generators)
if strategy == "breadth-first":
# Generates solutions from anchors in a round-robin manner, so the
# number of solutions generated is balanced across the viable
# anchors.
return util.roundrobin(*generators)
raise ValueError("Strategy '%s' not recognized" % strategy)
# TODO(efried): Move _merge_candidates to rw_ctx?
@@ -769,7 +782,9 @@ def _merge_candidates(candidates, rw_ctx):
all_suffixes = set(candidates)
num_granular_groups = len(all_suffixes - set(['']))
max_a_c = rw_ctx.config.placement.max_allocation_candidates
for areq_list in _generate_areq_lists(areq_lists_by_anchor, all_suffixes):
for areq_list in _generate_areq_lists(
rw_ctx, areq_lists_by_anchor, all_suffixes
):
# At this point, each AllocationRequest in areq_list is still
# marked as use_same_provider. This is necessary to filter by group
# policy, which enforces how these interact with each other.
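To see the two strategies side by side, here is a minimal, self-contained
sketch. The itertools.product generators are a hypothetical stand-in for what
_get_areq_list_generators() yields per anchor (the real shape is more
involved), and roundrobin is the same recipe added to placement.util in this
patch:

    import itertools

    def roundrobin(*iterables):
        # Same recipe as the placement.util helper introduced below.
        iterators = map(iter, iterables)
        for num_active in range(len(iterables), 0, -1):
            iterators = itertools.cycle(
                itertools.islice(iterators, num_active))
            yield from map(next, iterators)

    def generators():
        # One generator per anchor (root), each yielding candidate tuples.
        return [
            itertools.product(["r1A", "r1B"], ["r1g1A", "r1g1B"]),  # root1
            itertools.product(["r2A"], ["r2g1A", "r2g1B"]),         # root2
        ]

    # depth-first exhausts root1 before touching root2:
    # [('r1A', 'r1g1A'), ('r1A', 'r1g1B'), ('r1B', 'r1g1A'),
    #  ('r1B', 'r1g1B'), ('r2A', 'r2g1A'), ('r2A', 'r2g1B')]
    print(list(itertools.chain(*generators())))

    # breadth-first alternates between the roots until each is exhausted:
    # [('r1A', 'r1g1A'), ('r2A', 'r2g1A'), ('r1A', 'r1g1B'),
    #  ('r2A', 'r2g1B'), ('r1B', 'r1g1A'), ('r1B', 'r1g1B')]
    print(list(roundrobin(*generators())))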

View File

@@ -32,6 +32,9 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 100000, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
def create_tree(self, num_roots, num_child, num_res_per_child):
self.roots = {}
@@ -108,11 +111,14 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
expected_candidates=1000, expected_computes_with_candidates=2)
def test_too_many_candidates_global_limit_is_hit_result_unbalanced(self):
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "depth-first",
group="placement")
# With max_allocation_candidates set to the 100k limit this test now
# runs in reasonable time (10 sec on my machine); without that it would
# time out.
# However, with the global limit in place only the first compute gets
# candidates.
# However, with the depth-first strategy and the global limit in place
# only the first compute gets candidates.
# 524288 valid candidates, the generation stops at 100k candidates,
# only 1000 are returned, and the result is unbalanced as the first
# 100k candidates are always from the first compute.
@@ -121,6 +127,21 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=1)
def test_too_many_candidates_global_limit_is_hit_breadth_first_balanced(
self
):
# With max_allocation_candidates set to the 100k limit this test now
# runs in reasonable time (10 sec on my machine); without that it would
# time out.
# With the round-robin candidate generator in place the 100k generated
# candidates now spread across both computes.
# 524288 valid candidates, the generation stops at 100k candidates,
# only 1000 are returned, and the result is balanced between the computes
self._test_num_candidates_and_computes(
computes=2, pfs=8, vfs_per_pf=8, req_groups=6, req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=2)
def test_global_limit_hit(self):
# 8192 possible candidates, global limit is set to 8000, higher request
# limit so the number of candidates is limited by the global limit
@@ -140,3 +161,30 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
computes=2, pfs=8, vfs_per_pf=8, req_groups=4, req_res_per_group=1,
req_limit=9000,
expected_candidates=8192, expected_computes_with_candidates=2)
def test_breadth_first_strategy_generates_stable_ordering(self):
"""Run the same query twice against the same two tree and assert that
response text is exactly the same proving that even with breadth-first
strategy the candidate ordering is stable.
"""
self.create_tree(num_roots=2, num_child=8, num_res_per_child=8)
def query():
return client.get(
self.get_candidate_query(
num_groups=2, num_res=1,
limit=1000),
headers=self.headers)
conf = self.conf_fixture.conf
with direct.PlacementDirect(conf) as client:
resp = query()
self.assertEqual(200, resp.status_code)
body1 = resp.text
resp = query()
self.assertEqual(200, resp.status_code)
body2 = resp.text
self.assertEqual(body1, body2)

View File

@@ -97,3 +97,60 @@ class TestAllocationCandidatesNoDB(base.TestCase):
for group in different_subtree:
self.assertFalse(
ac_obj._check_same_subtree(group, parent_by_rp))
@mock.patch('placement.objects.research_context._has_provider_trees',
new=mock.Mock(return_value=True))
def _test_generate_areq_list(self, strategy, expected_candidates):
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", strategy,
group="placement")
rw_ctx = res_ctx.RequestWideSearchContext(
self.context, placement_lib.RequestWideParams(), True)
areq_lists_by_anchor = {
"root1": {
"": ["r1A", "r1B",],
"group1": ["r1g1A", "r1g1B",],
},
"root2": {
"": ["r2A"],
"group1": ["r2g1A", "r2g1B"],
},
"root3": {
"": ["r3A"],
},
}
generator = ac_obj._generate_areq_lists(
rw_ctx, areq_lists_by_anchor, {"", "group1"})
self.assertEqual(expected_candidates, list(generator))
def test_generate_areq_lists_depth_first(self):
# Depth-first will generate all root1 candidates first, then root2's;
# root3 is ignored as it has no candidates for group1.
expected_candidates = [
('r1A', 'r1g1A'),
('r1A', 'r1g1B'),
('r1B', 'r1g1A'),
('r1B', 'r1g1B'),
('r2A', 'r2g1A'),
('r2A', 'r2g1B'),
]
self._test_generate_areq_list("depth-first", expected_candidates)
@mock.patch('placement.objects.research_context._has_provider_trees',
new=mock.Mock(return_value=True))
def test_generate_areq_lists_breadth_first(self):
# Breadth-first will take one candidate from root1, then root2, then go
# back to root1, etc. Root2 runs out of candidates earlier than root1 so
# the last two candidates are both from root1. Root3 is still ignored as
# it has no candidates for group1.
expected_candidates = [
('r1A', 'r1g1A'),
('r2A', 'r2g1A'),
('r1A', 'r1g1B'),
('r2A', 'r2g1B'),
('r1B', 'r1g1A'),
('r1B', 'r1g1B')
]
self._test_generate_areq_list("breadth-first", expected_candidates)

View File

@@ -32,6 +32,7 @@ from placement.objects import resource_class as rc_obj
from placement.objects import resource_provider as rp_obj
from placement.tests.unit import base
from placement import util
from placement.util import roundrobin
class TestCheckAccept(testtools.TestCase):
@@ -1450,3 +1451,30 @@ class RunOnceTests(testtools.TestCase):
self.assertRaises(ValueError, f.reset)
self.assertFalse(f.called)
mock_clean.assert_called_once_with()
class RoundRobinTests(testtools.TestCase):
def test_no_input(self):
self.assertEqual([], list(roundrobin()))
def test_single_input(self):
self.assertEqual([1, 2], list(roundrobin(iter([1, 2]))))
def test_balanced_inputs(self):
self.assertEqual(
[1, "x", 2, "y"],
list(roundrobin(
iter([1, 2]),
iter(["x", "y"]))
)
)
def test_unbalanced_inputs(self):
self.assertEqual(
["A", "D", "E", "B", "F", "C"],
list(roundrobin(
iter("ABC"),
iter("D"),
iter("EF"))
)
)

View File

@@ -614,3 +614,16 @@ def run_once(message, logger, cleanup=None):
wrapper.reset = functools.partial(reset, wrapper)
return wrapper
return outer_wrapper
def roundrobin(*iterables):
"""roundrobin(iter('ABC'), iter('D'), iter('EF')) --> A D E B F C
Returns a new generator consuming items from the passed-in iterators in a
round-robin fashion.
It is adapted from
https://docs.python.org/3/library/itertools.html#itertools-recipes
"""
iterators = map(iter, iterables)
for num_active in range(len(iterables), 0, -1):
iterators = itertools.cycle(itertools.islice(iterators, num_active))
yield from map(next, iterators)
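The recipe advances past exhausted inputs because map(next, iterators) stops
as soon as one of the cycled iterators raises StopIteration; the for loop then
rebuilds the cycle with one fewer active iterator, resuming just after the
exhausted one. A minimal check of the docstring example:

    # First pass yields A, D, E, B, then iter('D') is exhausted; the cycle
    # is rebuilt around the two survivors, yielding F and C.
    assert list(roundrobin(iter("ABC"), iter("D"), iter("EF"))) == list("ADEBFC")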

View File

@@ -0,0 +1,63 @@
---
fixes:
- |
In a deployment with wide and symmetric provider trees, i.e. where there
are multiple child providers under the same root having inventory from
the same resource class (e.g. in case of nova's mdev GPU or PCI in
Placement features), if the allocation candidate request asks for resources
from those child RPs in multiple request groups, the number of possible
allocation candidates grows rapidly.
E.g.:
* 1 root, 8 child RPs with 1 unit of resource each
  a_c requests 6 groups with 1 unit of resource each
  => 8*7*6*5*4*3=20160 possible candidates
* 1 root, 8 child RPs with 6 units of resources each
  a_c requests 6 groups with 1 unit of resource each
  => 8^6=262144 possible candidates
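These counts can be sanity-checked with the Python stdlib (an illustrative
sketch, not part of the patch); in the first case no single-unit child can
host two groups, while in the second any child can host all six groups::

    import math

    math.perm(8, 6)  # 8*7*6*5*4*3 = 20160; no child can host two groups
    8 ** 6           # 262144; any child can host all six groups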
Placement generates these candidates fully before applying the limit
parameter provided in the allocation candidate query, to be able to do
random sampling if ``[placement]randomize_allocation_candidates`` is True.
Placement takes excessive time and memory to generate this number of
allocation candidates, and the client might time out waiting for the
response, or the Placement API service might run out of memory and crash.
To avoid request timeouts or out-of-memory events a new
``[placement]max_allocation_candidates`` config option is implemented.
Unlike the request's limit parameter, this limit is applied not after but
*during* the candidate generation process, so this new option can be used
to bound the runtime and memory consumption of the Placement API service.
The new config option defaults to ``-1``, meaning no limit, to keep the
legacy behavior. We suggest tuning this option in the affected
deployments based on the memory available for the Placement service and the
timeout setting of the clients. A good initial value could be around
``100000``.
If the number of generated allocation candidates is limited by the
``[placement]max_allocation_candidates`` config option then it is possible
to get candidates from a limited set of root providers (e.g. compute
nodes), as placement uses a depth-first strategy, i.e. it generates all
candidates from the first root before considering the next one. To avoid
this issue, a new config option
``[placement]allocation_candidates_generation_strategy`` is introduced
with two possible values:
* ``depth-first``, generates all candidates from the first viable root
  provider before moving to the next. This is the default, and it
  preserves the old behavior.
* ``breadth-first``, generates candidates from viable roots in a
  round-robin fashion, creating one candidate from each viable root
  before creating the second candidate from the first root. This is the
  newly introduced behavior.
In a deployment where ``[placement]max_allocation_candidates`` is
configured to a positive number we recommend setting
``[placement]allocation_candidates_generation_strategy`` to
``breadth-first``.
.. _Bug#2070257: https://bugs.launchpad.net/nova/+bug/2070257