Add round-robin candidate generation strategy

The previous patch introduced the [placement]max_allocation_candidates
config option to limit the number of candidates generated for a single
query.

If the number of generated allocation candidates is limited by that
config option, then it is possible to get candidates from a limited set of
root providers (computes, anchoring providers), as placement uses a
depth-first strategy, generating all candidates from the first root
before considering the next one.

To avoid unbalanced results this patch introduces a new config option
[placement]allocation_candidates_generation_strategy with the possible
values:
* depth-first, the original strategy that generates all candidates from
  the first root before moving to the next. This will be the default
  strategy for backward compatibility
* breadth-first, a new possible strategy that generates candidates from
  available roots in a round-robin fashion, one candidate from each
  root before taking the second candidate from the first root.

Closes-Bug: #2070257
Change-Id: Ib7a140374bc91cc9ab597d0923b0623f618ec32c
Balazs Gibizer 2024-12-06 16:55:46 +01:00
parent 93674ecfa5
commit f20e13f0b2
7 changed files with 261 additions and 5 deletions

View File

@@ -84,6 +84,38 @@ under the same root having inventory from the same resource class
to tune this config option based on the memory available for the
placement service and the client timeout setting on the client side. A good
initial value could be around 100000.
In a deployment with wide and symmetric provider trees we also recommend
changing the [placement]allocation_candidates_generation_strategy to
breadth-first.
"""),
cfg.StrOpt(
'allocation_candidates_generation_strategy',
default="depth-first",
choices=("depth-first", "breadth-first"),
help="""
Defines the order in which placement visits viable root providers during
allocation candidate generation:
* depth-first, generates all candidates from the first viable root provider
before moving to the next.
* breadth-first, generates candidates from viable roots in a round-robin
fashion, creating one candidate from each viable root before creating the
second candidate from the first root.
If the deployment has wide and symmetric provider trees, i.e. there are
multiple child providers under the same root having inventory from the same
resource class (e.g. in case of nova's mdev GPU or PCI in Placement features),
then the depth-first strategy with a max_allocation_candidates
limit might produce candidates from a limited set of root providers. On the
other hand, the breadth-first strategy ensures that candidates are returned
from all viable roots in a balanced way.
Both strategies produce the candidates in the API response in an undefined but
deterministic order. That is, all things being equal, two requests for
allocation candidates will return the same results in the same order; but no
guarantees are made as to how that order is determined.
"""),
]
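As a usage sketch, pairing the two options as recommended in the help text
above would look like this in placement.conf (100000 is the suggested
starting value from the docs above, not a default):

    [placement]
    max_allocation_candidates = 100000
    allocation_candidates_generation_strategy = breadth-first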

View File

@@ -24,6 +24,7 @@ from placement import exception
from placement.objects import research_context as res_ctx
from placement.objects import resource_provider as rp_obj
from placement.objects import trait as trait_obj
from placement import util
_ALLOC_TBL = models.Allocation.__table__
@@ -718,9 +719,21 @@ def _get_areq_list_generators(areq_lists_by_anchor, all_suffixes):
]
def _generate_areq_lists(areq_lists_by_anchor, all_suffixes):
def _generate_areq_lists(rw_ctx, areq_lists_by_anchor, all_suffixes):
strategy = (
rw_ctx.config.placement.allocation_candidates_generation_strategy)
generators = _get_areq_list_generators(areq_lists_by_anchor, all_suffixes)
return itertools.chain(*generators)
if strategy == "depth-first":
# Generates all solutions from the first anchor before moving to the
# next
return itertools.chain(*generators)
if strategy == "breadth-first":
# Generates solutions from anchors in a round-robin manner, so the
# number of solutions generated is balanced across the viable
# anchors.
return util.roundrobin(*generators)
raise ValueError("Strategy '%s' not recognized" % strategy)
# TODO(efried): Move _merge_candidates to rw_ctx?
@@ -769,7 +782,9 @@ def _merge_candidates(candidates, rw_ctx):
all_suffixes = set(candidates)
num_granular_groups = len(all_suffixes - set(['']))
max_a_c = rw_ctx.config.placement.max_allocation_candidates
for areq_list in _generate_areq_lists(areq_lists_by_anchor, all_suffixes):
for areq_list in _generate_areq_lists(
rw_ctx, areq_lists_by_anchor, all_suffixes
):
# At this point, each AllocationRequest in areq_list is still
# marked as use_same_provider. This is necessary to filter by group
# policy, which enforces how these interact with each other.
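To see the two strategies side by side, here is a minimal, self-contained
sketch. The itertools.product generators are a hypothetical stand-in for what
_get_areq_list_generators() yields per anchor (the real shape is more
involved), and roundrobin is the same recipe added to placement.util in this
patch:

    import itertools

    def roundrobin(*iterables):
        # Same recipe as the placement.util helper introduced below.
        iterators = map(iter, iterables)
        for num_active in range(len(iterables), 0, -1):
            iterators = itertools.cycle(
                itertools.islice(iterators, num_active))
            yield from map(next, iterators)

    def generators():
        # One generator per anchor (root), each yielding candidate tuples.
        return [
            itertools.product(["r1A", "r1B"], ["r1g1A", "r1g1B"]),  # root1
            itertools.product(["r2A"], ["r2g1A", "r2g1B"]),         # root2
        ]

    # depth-first exhausts root1 before touching root2:
    # [('r1A', 'r1g1A'), ('r1A', 'r1g1B'), ('r1B', 'r1g1A'),
    #  ('r1B', 'r1g1B'), ('r2A', 'r2g1A'), ('r2A', 'r2g1B')]
    print(list(itertools.chain(*generators())))

    # breadth-first alternates between the roots until each is exhausted:
    # [('r1A', 'r1g1A'), ('r2A', 'r2g1A'), ('r1A', 'r1g1B'),
    #  ('r2A', 'r2g1B'), ('r1B', 'r1g1A'), ('r1B', 'r1g1B')]
    print(list(roundrobin(*generators())))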

View File

@@ -32,6 +32,9 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 100000, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
def create_tree(self, num_roots, num_child, num_res_per_child):
self.roots = {}
@@ -108,11 +111,14 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
expected_candidates=1000, expected_computes_with_candidates=2)
def test_too_many_candidates_global_limit_is_hit_result_unbalanced(self):
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "depth-first",
group="placement")
# With max_allocation_candidates set to the 100k limit this test now
# runs in reasonable time (10 sec on my machine); without that it would
# time out.
# However, with the global limit in place only the first compute gets
# candidates.
# However, with the depth-first strategy and the global limit in place
# only the first compute gets candidates.
# 524288 valid candidates, the generation stops at 100k candidates,
# only 1000 are returned, and the result is unbalanced as the first
# 100k candidates are always from the first compute.
@@ -121,6 +127,21 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=1)
def test_too_many_candidates_global_limit_is_hit_breadth_first_balanced(
self
):
# With max_allocation_candidates set to the 100k limit this test now
# runs in reasonable time (10 sec on my machine); without that it would
# time out.
# With the round-robin candidate generator in place the 100k generated
# candidates now spread across both computes.
# 524288 valid candidates, the generation stops at 100k candidates,
# only 1000 are returned, and the result is balanced between the computes
self._test_num_candidates_and_computes(
computes=2, pfs=8, vfs_per_pf=8, req_groups=6, req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=2)
def test_global_limit_hit(self):
# 8192 possible candidates, global limit is set to 8000, higher request
# limit so the number of candidates is limited by the global limit
@@ -140,3 +161,30 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
computes=2, pfs=8, vfs_per_pf=8, req_groups=4, req_res_per_group=1,
req_limit=9000,
expected_candidates=8192, expected_computes_with_candidates=2)
def test_breadth_first_strategy_generates_stable_ordering(self):
"""Run the same query twice against the same two tree and assert that
response text is exactly the same proving that even with breadth-first
strategy the candidate ordering is stable.
"""
self.create_tree(num_roots=2, num_child=8, num_res_per_child=8)
def query():
return client.get(
self.get_candidate_query(
num_groups=2, num_res=1,
limit=1000),
headers=self.headers)
conf = self.conf_fixture.conf
with direct.PlacementDirect(conf) as client:
resp = query()
self.assertEqual(200, resp.status_code)
body1 = resp.text
resp = query()
self.assertEqual(200, resp.status_code)
body2 = resp.text
self.assertEqual(body1, body2)

View File

@@ -97,3 +97,60 @@ class TestAllocationCandidatesNoDB(base.TestCase):
for group in different_subtree:
self.assertFalse(
ac_obj._check_same_subtree(group, parent_by_rp))
@mock.patch('placement.objects.research_context._has_provider_trees',
new=mock.Mock(return_value=True))
def _test_generate_areq_list(self, strategy, expected_candidates):
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", strategy,
group="placement")
rw_ctx = res_ctx.RequestWideSearchContext(
self.context, placement_lib.RequestWideParams(), True)
areq_lists_by_anchor = {
"root1": {
"": ["r1A", "r1B",],
"group1": ["r1g1A", "r1g1B",],
},
"root2": {
"": ["r2A"],
"group1": ["r2g1A", "r2g1B"],
},
"root3": {
"": ["r3A"],
},
}
generator = ac_obj._generate_areq_lists(
rw_ctx, areq_lists_by_anchor, {"", "group1"})
self.assertEqual(expected_candidates, list(generator))
def test_generate_areq_lists_depth_first(self):
# Depth-first will generate all root1 candidates first, then root2's;
# root3 is ignored as it has no candidates for group1.
expected_candidates = [
('r1A', 'r1g1A'),
('r1A', 'r1g1B'),
('r1B', 'r1g1A'),
('r1B', 'r1g1B'),
('r2A', 'r2g1A'),
('r2A', 'r2g1B'),
]
self._test_generate_areq_list("depth-first", expected_candidates)
@mock.patch('placement.objects.research_context._has_provider_trees',
new=mock.Mock(return_value=True))
def test_generate_areq_lists_breadth_first(self):
# Breadth-first will take one candidate from root1, then root2, then go
# back to root1, etc. Root2 runs out of candidates earlier than root1 so
# the last two candidates are both from root1. Root3 is still ignored as
# it has no candidates for group1.
expected_candidates = [
('r1A', 'r1g1A'),
('r2A', 'r2g1A'),
('r1A', 'r1g1B'),
('r2A', 'r2g1B'),
('r1B', 'r1g1A'),
('r1B', 'r1g1B')
]
self._test_generate_areq_list("breadth-first", expected_candidates)

View File

@@ -32,6 +32,7 @@ from placement.objects import resource_class as rc_obj
from placement.objects import resource_provider as rp_obj
from placement.tests.unit import base
from placement import util
from placement.util import roundrobin
class TestCheckAccept(testtools.TestCase):
@@ -1450,3 +1451,30 @@ class RunOnceTests(testtools.TestCase):
self.assertRaises(ValueError, f.reset)
self.assertFalse(f.called)
mock_clean.assert_called_once_with()
class RoundRobinTests(testtools.TestCase):
def test_no_input(self):
self.assertEqual([], list(roundrobin()))
def test_single_input(self):
self.assertEqual([1, 2], list(roundrobin(iter([1, 2]))))
def test_balanced_inputs(self):
self.assertEqual(
[1, "x", 2, "y"],
list(roundrobin(
iter([1, 2]),
iter(["x", "y"]))
)
)
def test_unbalanced_inputs(self):
self.assertEqual(
["A", "D", "E", "B", "F", "C"],
list(roundrobin(
iter("ABC"),
iter("D"),
iter("EF"))
)
)

View File

@@ -614,3 +614,16 @@ def run_once(message, logger, cleanup=None):
wrapper.reset = functools.partial(reset, wrapper)
return wrapper
return outer_wrapper
def roundrobin(*iterables):
"""roundrobin(iter('ABC'), iter('D'), iter('EF')) --> A D E B F C
Returns a new generator consuming items from the passed-in iterators in a
round-robin fashion.
It is adapted from
https://docs.python.org/3/library/itertools.html#itertools-recipes
"""
iterators = map(iter, iterables)
for num_active in range(len(iterables), 0, -1):
iterators = itertools.cycle(itertools.islice(iterators, num_active))
yield from map(next, iterators)
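The recipe advances past exhausted inputs because map(next, iterators) stops
as soon as one of the cycled iterators raises StopIteration; the for loop then
rebuilds the cycle with one fewer active iterator, resuming just after the
exhausted one. A minimal check of the docstring example:

    # First pass yields A, D, E, B, then iter('D') is exhausted; the cycle
    # is rebuilt around the two survivors, yielding F and C.
    assert list(roundrobin(iter("ABC"), iter("D"), iter("EF"))) == list("ADEBFC")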

View File

@@ -0,0 +1,63 @@
---
fixes:
- |
In a deployment with wide and symmetric provider trees, i.e. where there
are multiple child providers under the same root having inventory from
the same resource class (e.g. in case of nova's mdev GPU or PCI in
Placement features), if the allocation candidate request asks for resources
from those child RPs in multiple request groups, the number of possible
allocation candidates grows rapidly.
E.g.:
* 1 root, 8 child RPs with 1 unit of resource each
  a_c requests 6 groups with 1 unit of resource each
  => 8*7*6*5*4*3=20160 possible candidates
* 1 root, 8 child RPs with 6 units of resources each
  a_c requests 6 groups with 1 unit of resource each
  => 8^6=262144 possible candidates
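These counts can be sanity-checked with the Python stdlib (an illustrative
sketch, not part of the patch); in the first case no single-unit child can
host two groups, while in the second any child can host all six groups::

    import math

    math.perm(8, 6)  # 8*7*6*5*4*3 = 20160; no child can host two groups
    8 ** 6           # 262144; any child can host all six groups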
Placement generates these candidates fully before applying the limit
parameter provided in the allocation candidate query, to be able to do
random sampling if ``[placement]randomize_allocation_candidates`` is True.
Placement takes excessive time and memory to generate this number of
allocation candidates, and the client might time out waiting for the
response, or the Placement API service might run out of memory and crash.
To avoid request timeouts or out-of-memory events a new
``[placement]max_allocation_candidates`` config option is implemented.
Unlike the request's limit parameter, this limit is applied not after but
*during* the candidate generation process, so this new option can be used
to bound the runtime and memory consumption of the Placement API service.
The new config option defaults to ``-1``, meaning no limit, to keep the
legacy behavior. We suggest tuning this option in the affected
deployments based on the memory available for the Placement service and the
timeout setting of the clients. A good initial value could be around
``100000``.
If the number of generated allocation candidates is limited by the
``[placement]max_allocation_candidates`` config option then it is possible
to get candidates from a limited set of root providers (e.g. compute
nodes), as placement uses a depth-first strategy, i.e. it generates all
candidates from the first root before considering the next one. To avoid
this issue, a new config option
``[placement]allocation_candidates_generation_strategy`` is introduced
with two possible values:
* ``depth-first``, generates all candidates from the first viable root
  provider before moving to the next. This is the default, and it
  preserves the old behavior.
* ``breadth-first``, generates candidates from viable roots in a
  round-robin fashion, creating one candidate from each viable root
  before creating the second candidate from the first root. This is the
  newly introduced behavior.
In a deployment where ``[placement]max_allocation_candidates`` is
configured to a positive number we recommend setting
``[placement]allocation_candidates_generation_strategy`` to
``breadth-first``.
.. _Bug#2070257: https://bugs.launchpad.net/nova/+bug/2070257