Prune a_c search space by invalid prefixes

Assume we have 8 RPs with 1 resource each, and we request 8 groups
with 1 resource each.
The full candidate matrix for a single provider tree (compute node),
built by satisfying each group independently, is (G is request group,
R is RP):

  G0: [R0, R1,..., R7] # G0 can be fulfilled from R0, R1, ..., R7
  G1: [R0, R1,..., R7]
  ...
  G7: [R0, R1,..., R7]

Placement needs to satisfy every group in the request, so it
creates all the possible combinations (a Cartesian product) of the
individual group fulfilments and checks if they are valid, i.e. if
no two groups try to use the same single piece of resource.
The product looks like
(C is candidate, G0-R0 means G0 group satisfied from R0 RP):

  C0: [G0-R0, G1-R0, ..., G7-R0] # invalid, R0 has 1 res but C0 needs 8
  C1: [G0-R0, G1-R0, ..., G7-R1] # invalid, R0 has 1 res but C1 needs 7
  ...
  Cx: [G0-R0, G1-R1, ..., G7-R7] # valid, each Rx has 1 res and
                                 # Cx asks for 1 res from each

From this picture it is clear that:

* There are a lot more invalid candidates than valid ones. In this
  specific scenario the total number of candidates is 8^8 ~ 16M, while
  the number of valid candidates is 8! ~ 40K. Finding the valid ones by
  blindly searching all possible ones scales very badly, as the
  exponential grows faster than the factorial, i.e. valid candidates
  get farther apart from each other in the search space.

* There is a structure within and across the candidates. E.g. if we
  already know that C0 is invalid because G0-R0 and G1-R0 try to
  consume the same single resource from R0 then:

  * We don't need to check how G2 is mapped in C0 as that mapping cannot
    change the fact the whole candidate is invalid.

  * We know that every candidate that starts with G0-R0, G1-R0 is
    invalid for the same reason, so we don't need to generate and
    test them.

The latter means that C1...Cy (y < x - 1) can be pruned from the
search space after C0 is tested. This is a lot of candidates. In the
above natural ordering of the product generation algorithm it is
actually more than 40K candidates that we can skip after just testing
C0 alone. By the time we reach Cx, the first valid candidate, the
algorithm has already pruned more than 300k candidates.
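
The counts quoted above can be reproduced with a few lines of Python.
This is only an illustrative calculation, not part of the patch. The
variable names are made up for the example, and it assumes the natural
ordering in which the mapping of the last group varies fastest:

```python
import math

GROUPS = 8     # request groups G0..G7
PROVIDERS = 8  # resource providers R0..R7 with one resource each

# Every group can independently pick any provider.
total_candidates = PROVIDERS ** GROUPS           # 16777216 (~16M)

# A valid candidate maps each group to a distinct provider.
valid_candidates = math.factorial(GROUPS)        # 40320 (~40K)

# Candidates sharing the invalid prefix G0-R0, G1-R0 with C0:
# the remaining 6 groups are free to pick any provider.
same_prefix_as_c0 = PROVIDERS ** (GROUPS - 2)    # 262144

# Position of the first valid candidate [G0-R0, G1-R1, ..., G7-R7]:
# read the provider indices 0, 1, ..., 7 as the digits of a base-8 number.
# Every candidate before this position is invalid.
first_valid_index = sum(
    rp * PROVIDERS ** (GROUPS - 1 - group)
    for group, rp in enumerate(range(PROVIDERS)))  # 342391

print(total_candidates, valid_candidates,
      same_prefix_as_c0, first_valid_index)
```

The last number equals the count of invalid candidates that precede the
first valid one, which is where the "more than 300k" figure comes from.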

After this patch the above pruning logic is not turned on automatically
but can be enabled via the config option:

  [workarounds]
  optimize_for_wide_provider_trees = true

The implementation consists of the following pieces:

A recursive product generator algorithm that calls a function on each
partial candidate. If that function signals that the partial candidate
is invalid then the algorithm skips every candidate that has the same
partial candidate as its prefix.

The recursion does a tree traversal to find all partial prefixes.
With the above G0-G7, RP0-RP7 example the start of the traversal
looks like:
1. partial product G0-RP0, this does not exceed capacity so recurse
2. partial product G0-RP0, G1-RP0, this exceeds capacity so do not
   recurse but try another RP on this level.
3. partial product G0-RP0, G1-RP1, this does not exceed capacity so
   recurse.
4. partial product G0-RP0, G1-RP1, G2-RP0, this exceeds capacity so
   do not recurse but try another RP on this level.
...
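
The traversal above can be sketched as a small recursive generator.
This is a minimal illustration written for this description, not the
code added by the patch: the name pruned_product and the is_invalid
callback are assumptions, and the toy check only models "each RP has a
single resource":

```python
import itertools


def pruned_product(is_invalid, *iterables):
    """Yield Cartesian products, skipping every product whose prefix
    is reported invalid by is_invalid(partial_product).
    """
    pools = [tuple(it) for it in iterables]

    def _extend(prefix):
        if len(prefix) == len(pools):
            # Every prefix, including the full product, passed the check.
            yield prefix
            return
        for item in pools[len(prefix)]:
            candidate = prefix + (item,)
            if is_invalid(candidate):
                # Prune: do not recurse, try the next item on this level.
                continue
            yield from _extend(candidate)

    yield from _extend(())


# Toy check: a partial product is invalid if two groups use the same RP.
checked = []


def reuses_rp(partial):
    checked.append(partial)
    return len(set(partial)) != len(partial)


rps = ["RP0", "RP1", "RP2"]
result = list(pruned_product(reuses_rp, rps, rps, rps))
print(len(result), len(checked))
```

For three groups over three RPs this yields the 3! = 6 valid products
after 30 checks, the same numbers the unit test in this change asserts.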

Without the optimization Placement uses the logic

        areq = _consolidate_allocation_requests(areq_list, rw_ctx)
        if rw_ctx.exceeds_capacity(areq):
            continue

on every product after it was generated.
_consolidate_allocation_requests folds the individual
AllocationRequestResource (ARR) objects in the product into a single
allocation. This has a side effect on some of the ARRs, so the logic
copies the affected ARRs. This is expensive, especially if we want to
call it on every partial product as well. However, if we only want to
check the capacity then we don't need to fold the ARRs, we just need to
sum the amount each ARR holds and then check that against the capacity.
So _exceeds_capacity() was added to do this optimized, side effect and
copy free, capacity check when the optimization is enabled.
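
The idea of summing amounts instead of folding objects can be shown
with plain tuples. This is a simplified sketch under stated
assumptions: Inventory and the (rp, resource_class, amount) tuples are
hypothetical stand-ins for Placement's real objects, and the inventory
values mirror the ones used in the unit tests of this change:

```python
import collections

# Hypothetical stand-in for the per (RP, resource class) inventory summary.
Inventory = collections.namedtuple("Inventory", ["capacity", "max_unit"])


def exceeds_capacity(inventory_by_rp_rc, requests):
    """Return True if the summed requests exceed capacity or max_unit.

    requests is an iterable of (rp, resource_class, amount) tuples.
    Nothing is copied or mutated, only running totals are kept.
    """
    totals = collections.defaultdict(int)
    for rp, rc, amount in requests:
        key = (rp, rc)
        totals[key] += amount
        inv = inventory_by_rp_rc[key]
        if totals[key] > inv.capacity or totals[key] > inv.max_unit:
            return True
    return False


inventory = {
    ("RP1", "SRIOV_VF"): Inventory(capacity=3, max_unit=1),
    ("RP2", "SRIOV_VF"): Inventory(capacity=2, max_unit=2),
    ("RP3", "SRIOV_VF"): Inventory(capacity=1, max_unit=1),
}

# Two groups on RP2 fit, two groups on RP3 do not.
print(exceeds_capacity(
    inventory, [("RP2", "SRIOV_VF", 1), ("RP2", "SRIOV_VF", 1)]))
print(exceeds_capacity(
    inventory, [("RP3", "SRIOV_VF", 1), ("RP3", "SRIOV_VF", 1)]))
```

Because the check returns as soon as a running total goes over the
limit, it is cheap enough to call on every partial product.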

When a valid product is generated, _consolidate_allocation_requests
still needs to be called to get the folded AllocationRequest in any
case, as the caller of _merge_candidates expects such a structure. But
the final rw_ctx.exceeds_capacity call can be skipped if the
optimization is enabled.

The depth of the recursion is equal to the number of iterables passed to
the product call. This follows from the fact that each level of
recursion appends a new item to the partial product, and when the length
of the partial product equals the number of iterables we have a
full product and the algorithm yields. The default Python recursion
limit is 1000, so we are not really limited by that: we could handle
~ 990 iterables, meaning an allocation candidate query with 990 request
groups. The limiting factor of this algorithm is not recursion depth but
execution time.

Gemini 2.5 pro was used to put together the generic Cartesian product
algorithm.

Co-Authored-by: Sean Mooney <work@seanmooney.info>
Assisted-By: gemini-2.5-pro
Closes-Bug: #2126751
Change-Id: I13ab83a165c229ae57876df4570e8af25221a45e
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
Balazs Gibizer
2025-10-02 11:40:31 +02:00
parent 0b783c7e16
commit 5b73b980d0
8 changed files with 699 additions and 94 deletions
+2
@@ -23,6 +23,7 @@ from placement.conf import base
from placement.conf import database
from placement.conf import paths
from placement.conf import placement
from placement.conf import workarounds
# To avoid global config, we require an existing ConfigOpts to be passed
@@ -36,6 +37,7 @@ def register_opts(conf):
placement.register_opts(conf)
logging.register_options(conf)
policy_opts.set_defaults(conf)
workarounds.register_opts(conf)
# The oslo.middleware does not present a register_opts method, instead
# it shares a list of available opts.
conf.register_opts(cors.CORS_OPTS, 'cors')
+90
@@ -0,0 +1,90 @@
# Copyright 2015 OpenStack Foundation
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from oslo_config import cfg
workarounds_group = cfg.OptGroup(
'workarounds',
title='Workaround Options',
help="""
A collection of workarounds used to mitigate bugs or issues found under
certain conditions. These should only be enabled in exceptional circumstances.
All options are linked against bug IDs, where more information on the issue can
be found.
""")
workaround_opts = [
cfg.BoolOpt(
"optimize_for_wide_provider_trees",
default=False,
help="""
Enable optimization of allocation candidate generation for wide provider trees.
As reported in `bug #2126751`_, in the situation where many similar child
providers are defined under the same root provider, placement's allocation
candidate generation algorithm scales poorly. This config option enables
certain optimizations that help decrease the time it takes to generate the
GET /allocation_candidates response for queries requesting multiple resources
from those child providers.
For example if a compute has 8 or more child resource providers providing one
resource each (e.g. 8 individual PGPU) and a VM requests 8 or more such
resources each in independent request groups then without this optimization
enabled the GET /allocation_candidates query takes too long to compute and
the scheduling will fail.
Setting the ``[placement]max_allocation_candidates``
config option to a small number (e.g. 100) can help to a certain degree but
alone cannot solve the problem when the number of devices available or the
number of requested devices increases.
**When to enable:** If you have at least 8 child resource providers within a
tree providing inventory of the same resource class. And you are trying
to support VMs with more than 4 such resources.
E.g.:
* Nova's PCI in Placement feature is enabled and you have at least 8 PCI
devices with the same product_id in a single compute and you are using
flavors requesting more than 4 such devices.
* Nova's GPU support is enabled and you have at least 8 GPUs per compute node
while requesting more than 4 per VM.
**When not to enable:** If you have a flat resource provider tree, i.e. all
resources reported on the root provider. Or if your flavors are not requesting
more than 4 PCI or GPU resources of the same type.
Related options:
* ``[placement]max_allocation_candidates``: If you need to enable
this optimization then you are also in a situation where you want to set
``max_allocation_candidates`` to a number not more than 1000.
* ``[placement]allocation_candidates_generation_strategy``: If you use
``max_allocation_candidates`` then it is suggested to configure
``allocation_candidates_generation_strategy`` to ``breadth-first`` which will
return candidates balanced across available compute nodes.
.. _bug #2126751: https://bugs.launchpad.net/placement/+bug/2126751
"""),
]
def register_opts(conf):
    conf.register_group(workarounds_group)
    conf.register_opts(workaround_opts, group=workarounds_group)


def list_opts():
    return {workarounds_group: workaround_opts}
+74 -18
@@ -12,8 +12,8 @@
import collections
import copy
import functools
import itertools
import time
import os_traits
from oslo_log import log as logging
@@ -210,9 +210,10 @@ class AllocationRequest(object):
usp = (self.use_same_provider
if self.use_same_provider is not None else '<?>')
repr_str = ('%s(anchor=...%s, same_provider=%s, '
'resource_requests=[%s])' %
'resource_requests=[%s], mappings=%s)' %
(self.__class__.__name__, anchor, usp,
', '.join([str(arr) for arr in self.resource_requests])))
', '.join([str(arr) for arr in self.resource_requests]),
self.mappings))
return repr_str
def __eq__(self, other):
@@ -267,6 +268,10 @@ class AllocationRequestResource(object):
resource_class=self.resource_class,
amount=self.amount)
def __repr__(self):
return str(
(self.resource_provider.uuid, self.resource_class, self.amount))
class ProviderSummary(object):
@@ -689,12 +694,63 @@ def _consolidate_allocation_requests(areqs, rw_ctx):
mappings=mappings)
def _get_areq_list_generators(areq_lists_by_anchor, all_suffixes):
def _exceeds_capacity(rw_ctx, areq_list):
    """Checks if the list of allocation requests, combined into a single
    allocation, goes over the available capacity.

    Note that this is an optimized, side effect and
    AllocationRequestResource copy free, version of the following logic:

        areq = _consolidate_allocation_requests(areq_list, rw_ctx)
        return rw_ctx.exceeds_capacity(areq)

    This optimized call can only be used if the consolidated areq is not
    needed after the call, just the result of the exceeds_capacity check.
    In return this is a lot more performant than the former.

    :return: True if the consolidated allocation requests more resources than
        available, False otherwise.
    """
    amount_by_rp_rc = collections.defaultdict(int)
    for areq in areq_list:
        for arr in areq.resource_requests:
            key = (arr.resource_provider.id, arr.resource_class)
            amount_by_rp_rc[key] += arr.amount
            psum = rw_ctx.psum_res_by_rp_rc[key]
            if amount_by_rp_rc[key] > psum.capacity:
                return True
            if amount_by_rp_rc[key] > psum.max_unit:
                return True
    return False


def _get_product_generator(rw_ctx):
    """Returns a generator that produces a cartesian product of N iterables.

    If the optimize_for_wide_provider_trees config is enabled then the
    returned generator will test the product and only return products that
    are valid allocation candidates from a resource consumption perspective.
    Otherwise, it returns the generic itertools.product that does not
    do such testing.
    """
    if rw_ctx.config.workarounds.optimize_for_wide_provider_trees:
        exceeds_capacity = functools.partial(_exceeds_capacity, rw_ctx)
        return functools.partial(util.filtered_product, exceeds_capacity)
    else:
        return itertools.product


def _get_areq_list_generators(rw_ctx, areq_lists_by_anchor, all_suffixes):
"""Returns a generator for each anchor provider that generates viable
candidates (areq_lists) for the given anchor
"""
return [
# We're using itertools.product to go from this:
# We're using itertools.product to go from this if optimization
# is not enabled:
# areq_lists_by_suffix = {
# '': [areq__A, areq__B, ...],
# '1': [areq_1_A, areq_1_B, ...],
@@ -712,7 +768,10 @@ def _get_areq_list_generators(areq_lists_by_anchor, all_suffixes):
# [areq__B, areq_1_B, ..., areq_42_B], return.
# ...,
# ]
itertools.product(*list(areq_lists_by_suffix.values()))
# When optimization is enabled we use a custom product
# implementation that can do capacity checks on each partial product
# and prune products with invalid prefixes speeding up the generation.
_get_product_generator(rw_ctx)(*list(areq_lists_by_suffix.values()))
for areq_lists_by_suffix in areq_lists_by_anchor.values()
# Filter out any entries that don't have allocation requests for
# *all* suffixes (i.e. all RequestGroups)
@@ -723,7 +782,8 @@ def _get_areq_list_generators(areq_lists_by_anchor, all_suffixes):
def _generate_areq_lists(rw_ctx, areq_lists_by_anchor, all_suffixes):
strategy = (
rw_ctx.config.placement.allocation_candidates_generation_strategy)
generators = _get_areq_list_generators(areq_lists_by_anchor, all_suffixes)
generators = _get_areq_list_generators(
rw_ctx, areq_lists_by_anchor, all_suffixes)
if strategy == "depth-first":
# Generates all solutions from the first anchor before moving to the
# next
@@ -783,9 +843,7 @@ def _merge_candidates(candidates, rw_ctx):
all_suffixes = set(candidates)
num_granular_groups = len(all_suffixes - set(['']))
max_a_c = rw_ctx.config.placement.max_allocation_candidates
dropped = 0
start = time.monotonic()
optimize = rw_ctx.config.workarounds.optimize_for_wide_provider_trees
for areq_list in _generate_areq_lists(
rw_ctx, areq_lists_by_anchor, all_suffixes
):
@@ -819,16 +877,14 @@ def _merge_candidates(candidates, rw_ctx):
# *independent* queries, it's possible that the combined result
# now exceeds capacity where amounts of the same RP+RC were
# folded together. So do a final capacity check/filter.
if rw_ctx.exceeds_capacity(areq):
dropped += 1
#
# If optimization is enabled then we know that the areq returned from
# the product generator has already been tested for capacity so we
# don't need to re-test it.
if not optimize and rw_ctx.exceeds_capacity(areq):
continue
areqs.add(areq)
if len(areqs) == 1:
LOG.warn(
"Found the first valid candidate in %.2f secs and "
"dropped %d invalid ones", time.monotonic() - start, dropped)
start = time.monotonic()
dropped = 0
if max_a_c >= 0 and len(areqs) >= max_a_c:
break
@@ -194,20 +194,24 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
# one VF, but it could be on PF resource it does not matter). We have
# many RPs and we request many groups of one resource. This creates a
# situation where even if the number of candidates are limited by
# max_allocation_candidates the algorithm generate a lot of invalid
# candidates that needs to be filtered out which takes excessive time.
# max_allocation_candidates the possible number of candidates generated
# by satisfying each group independently and then generating all
# possible combinations results in an exponential number of possible
# candidates, most of which are invalid due to two groups
# independently satisfied by the same single resource.
# Filtering this list for valid candidates takes too much time.
#
# We have 8 RPs with 1 resource, and we request 8 groups with
# 1 resource.
# Placement will generate an initial candidate matrix by satisfying
# each group independently (G is request group, R is RP):
# The full candidate matrix by satisfying each group independently
# (G is request group, R is RP):
#
# G1: [R1, R2,..., R8]
# G2: [R1, R2,..., R8]
# ...
# G8: [R1, R2,..., R8]
#
# Then creates all the possible combinations and check if they are
# Creating all the possible combinations and checking if they are
# valid (C is candidate, G1-R1 means G1 group satisfied from R1 RP):
# C1: [G1-R1, G2-R1, ..., G8-R1] # invalid R1 has 1 res but C1 needs 8
# C2: [G1-R1, G2-R1, ..., G8-R2] # invalid R1 has 1 res but C2 needs 7
@@ -215,82 +219,133 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
# Cx: [G1-R1, G2-R2, ..., G8-R8] # valid each Rx has 1 res and
# # Cx ask form 1 res each
#
# So placement generates an exessive amount of invalid (and therefore
# later filtered) candidates before it finds the first valid one.
# The max_allocation_candidates check only applies to valid candidates
# so it cannot prevent the excessive runtime of generating candidates
# that turns out to be invalid.
# After bugfix #2126751 placement is changed not to generate all these
# candidates. Instead, if it finds that a candidate is invalid
# because a prefix of the groups (G1-R1 and G2-R1) causes an
# overallocation then all possible candidates that start with
# the same prefix are removed from the search space. This moves the
# algorithm from exponential to factorial.
#
# With the extra logging we see that the first valid Cx is:
# WARNING [placement.objects.allocation_candidate] Found the first
# valid candidate in 1.73 secs and dropped 342391 invalid ones
#
# If you bump this from 1000 to 10k max candidates then you will see a
# very long runtime.
#
# This runs in 12 seconds.
# This runs in 1.2 seconds. If you bump this from 1000 to 10k maximum
# candidates then it will run in 106 seconds.
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self._test_num_candidates_and_computes(
computes=1, pfs=8, vfs_per_pf=1, req_groups=8, req_res_per_group=1,
req_limit=10000,
expected_candidates=1000, expected_computes_with_candidates=1)
def test_many_non_viable_candidates_21_8(self):
# This runs in 0.14 seconds
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self._test_num_candidates_and_computes(
computes=1, pfs=21, vfs_per_pf=1, req_groups=8,
req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=1)
# This is bug https://bugs.launchpad.net/placement/+bug/2126751 the below
# case should run in reasonable time
#
# def test_many_non_viable_candidates_21_8(self):
# # This is runs for more than 120 seconds
# self.conf_fixture.conf.set_override(
# "max_allocation_candidates", 1000, group="placement")
# self._test_num_candidates_and_computes(
# computes=1, pfs=21, vfs_per_pf=1, req_groups=8,
# req_res_per_group=1,
# req_limit=1000,
# expected_candidates=1000, expected_computes_with_candidates=1)
#
# def test_many_non_viable_candidates_21_16(self):
# # This is runs for more than 120 seconds
# self.conf_fixture.conf.set_override(
# "max_allocation_candidates", 1000, group="placement")
# self._test_num_candidates_and_computes(
# computes=1, pfs=21, vfs_per_pf=1, req_groups=16,
# req_res_per_group=1,
# req_limit=1000,
# expected_candidates=1000, expected_computes_with_candidates=1)
#
# def test_many_non_viable_candidates_21_21(self):
# # This is runs for more than 120 seconds
# self.conf_fixture.conf.set_override(
# "max_allocation_candidates", 1000, group="placement")
# self._test_num_candidates_and_computes(
# computes=1, pfs=21, vfs_per_pf=1, req_groups=21,
# req_res_per_group=1,
# req_limit=1000,
# expected_candidates=1000, expected_computes_with_candidates=1)
#
# def test_many_non_viable_candidates_21_8_two_computes(self):
# # This is runs for more than 120 seconds
# self.conf_fixture.conf.set_override(
# "max_allocation_candidates", 1000, group="placement")
# self.conf_fixture.conf.set_override(
# "allocation_candidates_generation_strategy", "breadth-first",
# group="placement")
# self._test_num_candidates_and_computes(
# computes=2, pfs=21, vfs_per_pf=1, req_groups=8,
# req_res_per_group=1,
# req_limit=1000,
# expected_candidates=1000, expected_computes_with_candidates=2)
#
# def test_many_non_viable_candidates_21_21_two_computes(self):
# # This is runs for more than 120 seconds
# self.conf_fixture.conf.set_override(
# "max_allocation_candidates", 1000, group="placement")
# self.conf_fixture.conf.set_override(
# "allocation_candidates_generation_strategy", "breadth-first",
# group="placement")
# self._test_num_candidates_and_computes(
# computes=2, pfs=21, vfs_per_pf=1, req_groups=21,
# req_res_per_group=1,
# req_limit=1000,
# expected_candidates=1000, expected_computes_with_candidates=2)
def test_many_non_viable_candidates_21_16(self):
# This runs in 0.21 seconds
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self._test_num_candidates_and_computes(
computes=1, pfs=21, vfs_per_pf=1, req_groups=16,
req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=1)
def test_many_non_viable_candidates_21_21(self):
# This runs in 3 seconds
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self._test_num_candidates_and_computes(
computes=1, pfs=21, vfs_per_pf=1, req_groups=21,
req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=1)
def test_many_non_viable_candidates_21_8_two_computes(self):
# This runs in 0.17 seconds
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
self._test_num_candidates_and_computes(
computes=2, pfs=21, vfs_per_pf=1, req_groups=8,
req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=2)
def test_many_non_viable_candidates_21_21_two_computes(self):
# This runs in 1.6 seconds
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
self._test_num_candidates_and_computes(
computes=2, pfs=21, vfs_per_pf=1, req_groups=21,
req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=2)
def test_many_non_viable_candidates_32_32_two_computes(self):
# This runs in 2.45 seconds
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 1000, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
self._test_num_candidates_and_computes(
computes=2, pfs=32, vfs_per_pf=1, req_groups=32,
req_res_per_group=1,
req_limit=1000,
expected_candidates=1000, expected_computes_with_candidates=2)
def test_many_non_viable_candidates_48_48_two_computes(self):
# This runs in 0.36 seconds with 100 max candidates and runs in
# 3.6 seconds for 1000.
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 100, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
self._test_num_candidates_and_computes(
computes=2, pfs=48, vfs_per_pf=1, req_groups=48,
req_res_per_group=1,
req_limit=100,
expected_candidates=100, expected_computes_with_candidates=2)
def test_many_non_viable_candidates_64_64_two_computes(self):
# This runs in 0.35 seconds for 10 max candidates and runs in
# 0.5 seconds with 100 and runs in 5.1 seconds with 1000.
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
self.conf_fixture.conf.set_override(
"max_allocation_candidates", 10, group="placement")
self.conf_fixture.conf.set_override(
"allocation_candidates_generation_strategy", "breadth-first",
group="placement")
self._test_num_candidates_and_computes(
computes=2, pfs=64, vfs_per_pf=1, req_groups=64,
req_res_per_group=1,
req_limit=10,
expected_candidates=10, expected_computes_with_candidates=2)
@@ -10,11 +10,14 @@
# License for the specific language governing permissions and limitations
# under the License.
import copy
from oslo_utils.fixture import uuidsentinel as uuids
from unittest import mock
from placement import lib as placement_lib
from placement.objects import allocation_candidate as ac_obj
from placement.objects import research_context as res_ctx
from placement.objects import resource_provider as rp_obj
from placement.tests.unit.objects import base
@@ -154,3 +157,250 @@ class TestAllocationCandidatesNoDB(base.TestCase):
('r1B', 'r1g1B')
]
self._test_generate_areq_list("breadth-first", expected_candidates)
def _rp(rp_id, capacity=1, max_unit=None):
return {(rp_id, "SRIOV_VF"):
ac_obj.ProviderSummaryResource(
resource_class="SRIOV_VF", capacity=capacity,
used=0, max_unit=max_unit or capacity)}
def _alloc_req(group, rp_id, amount):
return ac_obj.AllocationRequest(
anchor_root_provider_uuid=uuids.root,
use_same_provider=True,
resource_requests=[
ac_obj.AllocationRequestResource(
resource_provider=rp_obj.ResourceProvider(
context=None, id=rp_id, uuid=rp_id),
resource_class="SRIOV_VF",
amount=amount)
],
mappings={group: [rp_id]})
class TestOptimizedAllocationCandidatesNoDB(base.TestCase):
def setUp(self):
super().setUp()
self.conf_fixture.conf.set_override(
"optimize_for_wide_provider_trees", True, group="workarounds")
@mock.patch('placement.objects.research_context._has_provider_trees',
new=mock.Mock(return_value=True))
def test_multiple_groups_usage_overlap_3_3(self):
rw_ctx = res_ctx.RequestWideSearchContext(
self.context, placement_lib.RequestWideParams(), True)
# We have 3 child RPs each having a capacity of one resource
rw_ctx.psum_res_by_rp_rc.update(_rp("RP1", capacity=1))
rw_ctx.psum_res_by_rp_rc.update(_rp("RP2", capacity=1))
rw_ctx.psum_res_by_rp_rc.update(_rp("RP3", capacity=1))
G1_RP1 = _alloc_req("G1", rp_id="RP1", amount=1)
G1_RP2 = _alloc_req("G1", rp_id="RP2", amount=1)
G1_RP3 = _alloc_req("G1", rp_id="RP3", amount=1)
G2_RP1 = _alloc_req("G2", rp_id="RP1", amount=1)
G2_RP2 = _alloc_req("G2", rp_id="RP2", amount=1)
G2_RP3 = _alloc_req("G2", rp_id="RP3", amount=1)
G3_RP1 = _alloc_req("G3", rp_id="RP1", amount=1)
G3_RP2 = _alloc_req("G3", rp_id="RP2", amount=1)
G3_RP3 = _alloc_req("G3", rp_id="RP3", amount=1)
# This algorithm starts with the possible solutions for each group
# independent of the other groups in the same request. So here G1
# can be fulfilled from RP1, RP2, or RP3. As well as G2, and G3.
areq_lists_by_anchor = {
"root": {
"G1": [G1_RP1, G1_RP2, G1_RP3],
"G2": [G2_RP1, G2_RP2, G2_RP3],
"G3": [G3_RP1, G3_RP2, G3_RP3],
},
}
orig_consolidate = ac_obj._consolidate_allocation_requests
consolidate_calls = []
def wrap_consolidate_allocation_requests(areq_list, rw_ctx):
# we need to deep copy the call args as they are mutated during
# the test run
consolidate_calls.append(copy.deepcopy(areq_list))
return orig_consolidate(areq_list, rw_ctx)
orig_exceeds_capacity = ac_obj._exceeds_capacity
# pairs of areq_list from the call arg and the return value of the
# wrapped call
exceeds_capacity_calls = []
def warp_exceeds_capacity(rw_ctx, areq_list):
result = orig_exceeds_capacity(rw_ctx, areq_list)
exceeds_capacity_calls.append((areq_list, result))
return result
with (
# Consolidate is not called during product generation as we
# re-implemented exceeds_capacity to work on non consolidated
# areq_lists
mock.patch.object(
ac_obj, "_consolidate_allocation_requests",
new=mock.NonCallableMock()
),
# the rw_ctx exceeds_capacity working on consolidated areqs
# are not called at all, the local _exceeds_capacity called instead
# on all partial products, mocked below.
mock.patch.object(
rw_ctx, "exceeds_capacity", new=mock.NonCallableMock()
),
mock.patch.object(
ac_obj, "_exceeds_capacity",
side_effect=warp_exceeds_capacity
) as mock_exceeds_capacity,
):
generator = ac_obj._generate_areq_lists(
rw_ctx, areq_lists_by_anchor, {"G1", "G2", "G3"})
areq_lists = list(generator)
# We don't have 27 (3^3) valid solutions just 6 (3!) as if one of
# the Group is fulfilled from an RP then the other Groups cannot be
# fulfilled from the same RP as each RP has 1 resource only.
# Without the optimize_for_wide_provider_trees = true the
# _generate_areq_lists call would generate all the 27 possible
# product and then later processing would filter out the invalid,
# overlapping ones.
self.assertEqual(6, len(areq_lists))
self.assertEqual([
(G1_RP1, G2_RP2, G3_RP3),
(G1_RP1, G2_RP3, G3_RP2),
(G1_RP2, G2_RP1, G3_RP3),
(G1_RP2, G2_RP3, G3_RP1),
(G1_RP3, G2_RP1, G3_RP2),
(G1_RP3, G2_RP2, G3_RP1),],
areq_lists)
# areq_list, result pairs where areq_list is a partial product
# and the result is the expected return value of _exceeds_capacity
expected_exceeds_capacity_calls = [
((G1_RP1,), False),
((G1_RP1, G2_RP1), True),
# This is the pruning. The algo did not try to generate
# G1_RP1, G2_RP1, G3_RPx for all three possible x RPs as it knows
# that if the prefix is invalid then all the product with that
# prefix is also invalid.
((G1_RP1, G2_RP2), False),
((G1_RP1, G2_RP2, G3_RP1), True),
# This is another optimization (compared to an index odometer
# based product algo) that the recursive call reuses the already
# calculated and checked valid prefix of G1_RP1, G2_RP2 and don't
# need to re-create it to try G3_RP2 with it.
((G1_RP1, G2_RP2, G3_RP2), True),
((G1_RP1, G2_RP2, G3_RP3), False), # this is a valid product
# simple backtrack as we run out of possibilities on level 3
((G1_RP1, G2_RP3), False),
((G1_RP1, G2_RP3, G3_RP1), True),
((G1_RP1, G2_RP3, G3_RP2), False), # this is a valid product
((G1_RP1, G2_RP3, G3_RP3), True),
# double backtrack as we run out the possibilities on level 2 and
# level 3
((G1_RP2,), False),
((G1_RP2, G2_RP1), False),
((G1_RP2, G2_RP1, G3_RP1), True),
((G1_RP2, G2_RP1, G3_RP2), True),
((G1_RP2, G2_RP1, G3_RP3), False), # this is a valid product
((G1_RP2, G2_RP2), True), # suffixes are pruned
((G1_RP2, G2_RP3), False),
((G1_RP2, G2_RP3, G3_RP1), False), # this is a valid product
((G1_RP2, G2_RP3, G3_RP2), True),
((G1_RP2, G2_RP3, G3_RP3), True), # double backtrack
((G1_RP3,), False),
((G1_RP3, G2_RP1), False),
((G1_RP3, G2_RP1, G3_RP1), True),
((G1_RP3, G2_RP1, G3_RP2), False), # this is a valid product
((G1_RP3, G2_RP1, G3_RP3), True), # backtrack
((G1_RP3, G2_RP2), False),
((G1_RP3, G2_RP2, G3_RP1), False), # this is a valid product
((G1_RP3, G2_RP2, G3_RP2), True),
((G1_RP3, G2_RP2, G3_RP3), True), # backtrack
((G1_RP3, G2_RP3), True), # suffixes are pruned
# backtrack all the way as we finished with the level 1
# possibilities as well
]
for i, p in enumerate(
zip(expected_exceeds_capacity_calls, exceeds_capacity_calls)
):
expected, actual = p
self.assertEqual(
expected, actual, "Call index %d does not match" % i)
# Why is 30 the correct answer?
# I don't have an easy way to prove that mathematically. It was
# easier to list them all out above so they can be reviewed
#
# With this 3 by 3 example the number of checks are higher when
# optimization is enabled than without it. But if you increase the
# size of the product then the number of pruned space grows fast
# as well as the size of the product space. On an 8 by 8 example
# this is already a must-have optimization to run the checks in
# less than a minute. See the included functional tests for the
# scale results.
self.assertEqual(30, len(mock_exceeds_capacity.mock_calls))
self.assertEqual(30, len(expected_exceeds_capacity_calls))
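The call count asserted above can be cross-checked with a small closed-form sketch. It assumes the scenario from the commit message, where every provider holds a single unit, so a prefix stays valid only while all of its providers are distinct. The helper names `checks_with_pruning` and `checks_without_pruning` are made up for illustration and are not part of placement.

```python
import math


def checks_with_pruning(n):
    """Capacity checks for n groups over n single-unit providers.

    There are P(n, j) valid prefixes of length j (all providers distinct).
    Each of them is extended with every one of the n providers, and every
    extension costs exactly one check.
    """
    return n * sum(math.perm(n, j) for j in range(n))


def checks_without_pruning(n):
    """One check per complete candidate of the full Cartesian product."""
    return n ** n


for n in (3, 8):
    print(n, checks_with_pruning(n), checks_without_pruning(n))
```

For the 3 by 3 case the sum is 3 + 9 + 18, which is why pruning costs slightly more checks there than checking the 27 complete candidates. The balance flips quickly as the product grows.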


class TestExceedsCapacityNoDB(base.TestCase):
    def setUp(self):
        super().setUp()
        patcher = mock.patch(
            'placement.objects.research_context._has_provider_trees',
            new=mock.Mock(return_value=True))
        self.addCleanup(patcher.stop)
        patcher.start()

        self.rw_ctx = res_ctx.RequestWideSearchContext(
            self.context, placement_lib.RequestWideParams(), True)
        self.rw_ctx.psum_res_by_rp_rc.update(
            _rp("RP1", capacity=3, max_unit=1))
        self.rw_ctx.psum_res_by_rp_rc.update(
            _rp("RP2", capacity=2, max_unit=2))
        self.rw_ctx.psum_res_by_rp_rc.update(
            _rp("RP3", capacity=1, max_unit=1))

    def test_not_exceeds(self):
        self.assertFalse(
            ac_obj._exceeds_capacity(
                self.rw_ctx, (_alloc_req("G1", rp_id="RP1", amount=1),)))
        self.assertFalse(
            ac_obj._exceeds_capacity(
                self.rw_ctx, (
                    _alloc_req("G1", rp_id="RP2", amount=1),
                    _alloc_req("G2", rp_id="RP2", amount=1),
                )))
        self.assertFalse(
            ac_obj._exceeds_capacity(
                self.rw_ctx, (
                    _alloc_req("G1", rp_id="RP3", amount=1),
                    _alloc_req("G2", rp_id="RP1", amount=1),
                )))

    def test_exceeds_capacity(self):
        self.assertTrue(
            ac_obj._exceeds_capacity(
                self.rw_ctx, (
                    _alloc_req("G1", rp_id="RP3", amount=1),
                    _alloc_req("G2", rp_id="RP3", amount=1),
                )))

    def test_exceeds_max_unit(self):
        self.assertTrue(
            ac_obj._exceeds_capacity(self.rw_ctx, (
                _alloc_req("G1", rp_id="RP1", amount=1),
                _alloc_req("G2", rp_id="RP1", amount=1)
            )))
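The expectations in these tests can be read as a simple rule: sum the requested amounts per provider, then compare the sum against both the capacity and the max_unit of that provider. The following is a minimal standalone model of that rule, inferred only from the test cases above. It is not the code of `_exceeds_capacity`; the `INVENTORY` mapping, the tuple layout of a request and the name `exceeds_capacity_sketch` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical inventory model mirroring the setUp() values above:
# provider name -> (capacity, max_unit).
INVENTORY = {"RP1": (3, 1), "RP2": (2, 2), "RP3": (1, 1)}


def exceeds_capacity_sketch(inventory, requests):
    """Return True if the combined requests overuse any provider.

    requests is an iterable of (group, provider, amount) tuples.
    """
    total = defaultdict(int)
    for _group, provider, amount in requests:
        total[provider] += amount
    for provider, amount in total.items():
        capacity, max_unit = inventory[provider]
        # Assumption: both limits apply to the summed amount.
        if amount > capacity or amount > max_unit:
            return True
    return False


print(exceeds_capacity_sketch(
    INVENTORY, [("G1", "RP3", 1), ("G2", "RP3", 1)]))
```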
@@ -14,6 +14,7 @@
import datetime
import itertools
from unittest import mock
import fixtures
@@ -1478,3 +1479,72 @@ class RoundRobinTests(testtools.TestCase):
iter("EF"))
)
)


class ProductGeneratorTest(testtools.TestCase):
    @staticmethod
    def _no_skip(*args, **kwargs):
        return False

    def test_no_input(self):
        self.assertEqual(
            [()], list(util.filtered_product(should_skip=self._no_skip)))

    def test_full_product_no_pruning(self):
        product = list(util.filtered_product(self._no_skip, [0, 1], [0, 1]))
        self.assertEqual(4, len(product))
        self.assertEqual([(0, 0), (0, 1), (1, 0), (1, 1)], product)

    def test_product_with_filtering(self):
        """This test shows that the filtering removes invalid products from
        the results.
        """
        def reject_if_non_unique_items(partial_product):
            # A simple check that rejects a partial candidate if any of the
            # items in the candidate are non-unique
            return len(set(partial_product)) != len(partial_product)

        product = list(util.filtered_product(
            reject_if_non_unique_items, [0, 1], [0, 1]))
        self.assertEqual(2, len(product))
        self.assertEqual([(0, 1), (1, 0)], product)

    def test_product_with_pruning(self):
        """This test shows that if the filter rejects partial candidates then
        the product generation will skip generating any product with that
        partial prefix and therefore takes fewer steps than the same
        itertools.product() call.
        """
        mock_filter = mock.Mock(return_value=False)
        product = list(
            util.filtered_product(mock_filter, *([range(3)] * 3)))
        self.assertEqual(27, len(product))
        self.assertEqual(product, list(itertools.product(*([range(3)] * 3))))

        nr_of_check_calls = 0

        def reject_if_non_unique_items(partial_product):
            # A simple check that rejects a partial candidate if any of the
            # items in the candidate are non-unique, while counting the
            # number of times the check is called.
            nonlocal nr_of_check_calls
            nr_of_check_calls += 1
            return len(set(partial_product)) != len(partial_product)

        product = list(util.filtered_product(
            reject_if_non_unique_items, *([range(3)] * 3)))
        self.assertEqual(6, len(product))
        self.assertEqual(
            [(0, 1, 2),
             (0, 2, 1),
             (1, 0, 2),
             (1, 2, 0),
             (2, 0, 1),
             (2, 1, 0)],
            product)
        # If the provided filter rejects partial candidates then the product
        # generation takes fewer actual steps. This is the pruning we want.
        self.assertLess(nr_of_check_calls, len(mock_filter.mock_calls))
@@ -626,3 +626,54 @@ def roundrobin(*iterables):
for num_active in range(len(iterables), 0, -1):
iterators = itertools.cycle(itertools.islice(iterators, num_active))
yield from map(next, iterators)


def filtered_product(should_skip, *iterables):
    """Recursively generates the Cartesian product of a list of iterables,
    allowing for parts of the product space to be skipped.

    :param should_skip: A function that takes a partial product (a tuple)
        and returns True if the rest of this product branch should be
        skipped (pruned), False otherwise.
    :param iterables: A list of iterables to find the product of.
    :yield: Tuples representing the elements of the Cartesian product. For each
        returned product the caller can assume that the function should_skip
        returned False.
    """
    # Convert iterables to tuples to ensure they can be iterated over multiple
    # times.
    frozen_iterables = tuple(map(tuple, iterables))
    num_iterables = len(frozen_iterables)

    def _generate(index: int, current_product):
        """A nested recursive helper function to generate the product."""
        # Base case. If we have processed all iterables, we have a complete
        # product.
        if index == num_iterables:
            yield current_product
            return

        # Iterate through items in the current iterable and extend the
        # current partial product to see if we should continue or backtrack.
        for item in frozen_iterables[index]:
            new_partial_product = current_product + (item,)

            # Check if we should skip this entire branch. This is the core of
            # the pruning logic.
            if should_skip(new_partial_product):
                # Move to the next item on the current level, pruning any
                # product with the same invalid prefix.
                continue

            # If not skipped, recurse to the next level to get a longer
            # partial product.
            yield from _generate(index + 1, new_partial_product)

    # If the input list is empty, the Cartesian product is one empty tuple.
    if not frozen_iterables:
        yield ()
        return

    # Start the recursion with an empty partial product, extending it with
    # the items of the first iterable.
    yield from _generate(0, ())
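To illustrate how a caller might combine the generator with a capacity-style predicate, here is a self-contained sketch. `filtered_product_sketch` is a condensed restatement of the function above so that the example runs on its own. The `CAPACITY` mapping, the `over_capacity` predicate and the use of plain provider names as product items are hypothetical and much simpler than what placement passes in.

```python
from collections import Counter


def filtered_product_sketch(should_skip, *iterables):
    """Condensed restatement of the recursive generator, for illustration."""
    pools = tuple(map(tuple, iterables))

    def _generate(index, prefix):
        if index == len(pools):
            yield prefix
            return
        for item in pools[index]:
            candidate = prefix + (item,)
            # A rejected prefix is never extended, which prunes every
            # product that would start with it.
            if not should_skip(candidate):
                yield from _generate(index + 1, candidate)

    yield from _generate(0, ())


# Hypothetical inventory: how many units each provider can hand out.
CAPACITY = {"RP1": 2, "RP2": 1, "RP3": 1}
calls = []  # records every partial product the predicate was asked about


def over_capacity(partial):
    calls.append(partial)
    used = Counter(partial)
    return any(used[rp] > CAPACITY[rp] for rp in used)


# Three request groups, each asking for one unit from any provider.
groups = [list(CAPACITY)] * 3
candidates = list(filtered_product_sketch(over_capacity, *groups))
print(len(candidates), len(calls))
```

Recording the predicate calls makes the pruning visible: once `("RP2", "RP2")` is rejected, no three-item product starting with that prefix is ever checked.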
@@ -0,0 +1,31 @@
---
fixes:
  - |
    Added a new config option
    ``[workarounds]optimize_for_wide_provider_trees``. It is disabled by
    default.

    As reported in `bug #2126751`_, in the situation where many similar child
    providers are defined under the same root provider, placement's
    allocation candidate generation algorithm scales poorly. This config
    option enables certain optimizations that help decrease the time it takes
    to generate the GET /allocation_candidates response for queries
    requesting multiple resources from those child providers.

    It should be enabled if you have at least 8 child resource
    providers within a tree providing inventory of the same resource class
    and you are trying to support VMs with more than 4 such resources.
    E.g.:

    * Nova's PCI in Placement feature is enabled and you have at least 8 PCI
      devices with the same product_id in a single compute and you are using
      flavors requesting more than 4 such devices.
    * Nova's GPU support is enabled and you have at least 8 GPUs per compute
      node while requesting more than 4 per VM.

    You can keep it disabled if you have a flat resource provider tree, i.e.
    all resources are reported on the root provider, or if your flavors are
    not requesting more than 4 PCI or GPU resources of the same type.

    .. _bug #2126751: https://bugs.launchpad.net/nova/+bug/2126751