[placement] Enable limiting GET /allocation_candidates

This adds a 'limit' query parameter to GET /allocation_candidates, for
example: GET /allocation_candidates?resources=VCPU:1&limit=5

A 'limit' filter is added to the AllocationCandidates. If set, after
the database query has been run to create the allocation requests and
provider summaries, a slice or sample of the allocation requests is
taken to limit the results. The summaries are then filtered to only
include those in the allocation requests.

This approach avoids changing the generated SQL, the creation of which
is fairly complex, or the database tables. The amount of data queried
is still high in the extreme case, but the amount of data sent over the
wire (as JSON) is reduced. This is a trade-off that was discussed in
the spec and during its review. If it turns out that server-side memory
use is an issue we can investigate changing the SQL.

A configuration setting, [placement]/randomize_allocation_candidates,
is added to allow deployers to declare whether they want the results
returned in whatever order the database chooses or in a random order.
The default is "False", which is expected to preserve existing behavior
and impose a packing placement strategy.

When the config setting is combined with the limit parameter: if
"True", the limited results are a random sampling from the full
results; if "False", they are a slice from the front.

This is done as a new microversion, 1.16, with updates to docs, a reno
and adjustments to the api history doc.

Change-Id: I5f3d4f49c34fd3cd6b9d2e12b3c3c4cdcb409bec
Implements: bp allocation-candidates-limit
Chris Dent 2017-11-19 12:09:38 +00:00
parent 609ddc2244
commit 4a97bbd824
12 changed files with 235 additions and 27 deletions


@@ -194,12 +194,21 @@ def list_allocation_candidates(req):
     """
     context = req.environ['placement.context']
     want_version = req.environ[microversion.MICROVERSION_ENVIRON]
-    util.validate_query_params(req, schema.GET_SCHEMA_1_10)
+    get_schema = schema.GET_SCHEMA_1_10
+    if want_version.matches((1, 16)):
+        get_schema = schema.GET_SCHEMA_1_16
+    util.validate_query_params(req, get_schema)
     requests = util.parse_qs_request_groups(req.GET)
+    limit = req.GET.getall('limit')
+    # JSONschema has already confirmed that limit has the form
+    # of an integer.
+    if limit:
+        limit = int(limit[0])

     try:
-        cands = rp_obj.AllocationCandidates.get_by_requests(context, requests)
+        cands = rp_obj.AllocationCandidates.get_by_requests(context, requests,
+                                                            limit)
     except exception.ResourceClassNotFound as exc:
         raise webob.exc.HTTPBadRequest(
             _('Invalid resource class in resources parameter: %(error)s') %


@@ -57,6 +57,7 @@ VERSIONS = [
     '1.14',  # Adds parent and root provider UUID on resource provider
              # representation and 'in_tree' filter on GET /resource_providers
     '1.15',  # Include last-modified and cache-control headers
+    '1.16',  # Add 'limit' query parameter to GET /allocation_candidates
 ]


@@ -207,3 +207,10 @@ actual last modified time of the most recently modified associated database
 entity or the current time if there is no direct mapping to the database. In
 addition, 'cache-control: no-cache' headers are added where the 'last-modified'
 header has been added to prevent inadvertent caching of resources.
+
+1.16 Limit allocation candidates
+--------------------------------
+
+Add support for a ``limit`` query parameter when making a
+``GET /allocation_candidates`` request. The parameter accepts an integer
+value, `N`, which limits the maximum number of candidates returned.
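
As a hypothetical illustration of calling this API (not part of the
change), the sketch below issues a limited request with the Python
requests library. The endpoint URL and token are placeholder values; a
real deployment would obtain both from keystone.

import requests

PLACEMENT = 'http://placement.example.com/placement'  # assumed endpoint
HEADERS = {
    'x-auth-token': 'ADMIN_TOKEN',  # assumed credential
    'openstack-api-version': 'placement 1.16',
    'accept': 'application/json',
}

resp = requests.get(
    PLACEMENT + '/allocation_candidates',
    params={'resources': 'VCPU:1,MEMORY_MB:1024,DISK_GB:100', 'limit': 5},
    headers=HEADERS)
body = resp.json()
# At most 5 allocation requests come back, and provider_summaries only
# describes providers that appear in those requests.
print(len(body['allocation_requests']), len(body['provider_summaries']))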


@@ -11,6 +11,9 @@
 # under the License.
 """Placement API schemas for getting allocation candidates."""

+import copy
+
+
 # Represents the allowed query string parameters to the GET
 # /allocation_candidates API call
 GET_SCHEMA_1_10 = {
@@ -25,3 +28,15 @@ GET_SCHEMA_1_10 = {
     ],
     "additionalProperties": False,
 }
+
+
+# Add limit query parameter.
+GET_SCHEMA_1_16 = copy.deepcopy(GET_SCHEMA_1_10)
+GET_SCHEMA_1_16['properties']['limit'] = {
+    # A query parameter is always a string in webOb, but
+    # we'll handle integer here as well.
+    "type": ["integer", "string"],
+    "pattern": "^[1-9][0-9]*$",
+    "minimum": 1,
+    "minLength": 1
+}
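
The effect of that property definition can be checked on its own with
the jsonschema library (assumed to be installed). Query string values
always arrive as strings via WebOb, so the pattern/minLength branch is
the one normally exercised; the schema fragment below is copied from
the new GET_SCHEMA_1_16 entry above.

import jsonschema

limit_schema = {
    "type": ["integer", "string"],
    "pattern": "^[1-9][0-9]*$",
    "minimum": 1,
    "minLength": 1,
}

for value in ("5", "0", "-99", "cow"):
    try:
        jsonschema.validate(value, limit_schema)
        print(value, "-> accepted")
    except jsonschema.ValidationError:
        print(value, "-> rejected")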


@@ -49,7 +49,17 @@ Possible values:
                help="""
 Endpoint interface for this node. This is used when picking the URL in the
 service catalog.
-""")
+"""),
+    cfg.BoolOpt('randomize_allocation_candidates',
+                default=False,
+                help="""
+If True, when limiting allocation candidate results, the results will be
+a random sampling of the full result set. If False, allocation candidates
+are returned in a deterministic but undefined order. That is, all things
+being equal, two requests for allocation candidates will return the same
+results in the same order; but no guarantees are made as to how that order
+is determined.
+"""),
 ]

 deprecated_opts = {


@@ -13,6 +13,7 @@
 import collections
 import copy
 import itertools
+import random

 # NOTE(cdent): The resource provider objects are designed to never be
 # used over RPC. Remote manipulation is done with the placement HTTP
@@ -22,6 +23,7 @@ import itertools
 import os_traits
 from oslo_concurrency import lockutils
+from oslo_config import cfg
 from oslo_db import api as oslo_db_api
 from oslo_db import exception as db_exc
 from oslo_log import log as logging
@@ -56,6 +58,7 @@ _RC_CACHE = None
 _TRAIT_LOCK = 'trait_sync'
 _TRAITS_SYNCED = False

+CONF = cfg.CONF
 LOG = logging.getLogger(__name__)
@@ -3266,18 +3269,29 @@ class AllocationCandidates(base.NovaObject):
     }

     @classmethod
-    def get_by_requests(cls, context, requests):
+    def get_by_requests(cls, context, requests, limit=None):
         """Returns an AllocationCandidates object containing all resource
         providers matching a set of supplied resource constraints, with a set
         of allocation requests constructed from that list of resource
-        providers.
+        providers. If CONF.placement.randomize_allocation_candidates is True
+        (default is False) then the order of the allocation requests will
+        be randomized.

         :param requests: List of nova.api.openstack.placement.util.RequestGroup
+        :param limit: An integer, N, representing the maximum number of
+                      allocation candidates to return. If
+                      CONF.placement.randomize_allocation_candidates is True
+                      this will be a random sampling of N of the available
+                      results. If False then the first N results, in whatever
+                      order the database picked them, will be returned. In
+                      either case if there are fewer than N total results,
+                      all the results will be returned.
         """
         _ensure_rc_cache(context)
         _ensure_trait_sync(context)
         alloc_reqs, provider_summaries = cls._get_by_requests(context,
-                                                              requests)
+                                                              requests,
+                                                              limit)
         return cls(
             context,
             allocation_requests=alloc_reqs,
@@ -3286,7 +3300,7 @@ class AllocationCandidates(base.NovaObject):

     @staticmethod
     @db_api.api_context_manager.reader
-    def _get_by_requests(context, requests):
+    def _get_by_requests(context, requests, limit=None):
         # We first get the list of "root providers" that either have the
         # requested resources or are associated with the providers that
         # share one or more of the requested resource(s)
@@ -3339,22 +3353,58 @@
             # IDs.
             rp_ids = _get_provider_ids_matching_all(context, resources,
                                                     trait_map)
-            return _alloc_candidates_no_shared(context, resources, rp_ids)
+            alloc_request_objs, summary_objs = _alloc_candidates_no_shared(
+                context, resources, rp_ids)
+        else:
+            if trait_map:
+                trait_rps = _get_provider_ids_having_any_trait(context,
+                                                               trait_map)
+                if not trait_rps:
+                    # If there aren't any providers that have any of the
+                    # required traits, just exit early...
+                    return [], []
-        if trait_map:
-            trait_rps = _get_provider_ids_having_any_trait(context, trait_map)
-            if not trait_rps:
-                # If there aren't any providers that have any of the required
-                # traits, just exit early...
-                return [], []
+            # rp_ids contains a list of resource provider IDs that EITHER have
+            # all the requested resources themselves OR have some resources
+            # and are related to a provider that is sharing some resources
+            # with it. In other words, this is the list of resource provider
+            # IDs that are NOT sharing resources.
+            rps = _get_all_with_shared(context, resources)
+            rp_ids = set([r[0] for r in rps])
+            alloc_request_objs, summary_objs = _alloc_candidates_with_shared(
+                context, resources, trait_map, rp_ids, sharing_providers)
-        # rp_ids contains a list of resource provider IDs that EITHER have all
-        # the requested resources themselves OR have some resources and are
-        # related to a provider that is sharing some resources with it. In
-        # other words, this is the list of resource provider IDs that are NOT
-        # sharing resources.
-        rps = _get_all_with_shared(context, resources)
-        rp_ids = set([r[0] for r in rps])
+        # Limit the number of allocation request objects. We do this after
+        # creating all of them so that we can do a random slice without
+        # needing to mess with the complex sql above or add additional
+        # columns to the DB.
-        return _alloc_candidates_with_shared(context, resources, trait_map,
-                                             rp_ids, sharing_providers)
+        # Track the resource provider uuids that we have chosen so that
+        # we can pull out their summaries below.
+        alloc_req_rp_uuids = set()
+        if limit and limit <= len(alloc_request_objs):
+            if CONF.placement.randomize_allocation_candidates:
+                alloc_request_objs = random.sample(alloc_request_objs, limit)
+            else:
+                alloc_request_objs = alloc_request_objs[:limit]
+            # Extract resource provider uuids from the resource requests.
+            for aro in alloc_request_objs:
+                for arr in aro.resource_requests:
+                    alloc_req_rp_uuids.add(arr.resource_provider.uuid)
+        elif CONF.placement.randomize_allocation_candidates:
+            random.shuffle(alloc_request_objs)
+
+        # Limit summaries to only those mentioned in the allocation requests.
+        if limit and limit <= len(alloc_request_objs):
+            kept_summary_objs = []
+            for summary in summary_objs:
+                rp_uuid = summary.resource_provider.uuid
+                # Skip a summary if we are limiting and haven't selected an
+                # allocation request that uses the resource provider.
+                if rp_uuid not in alloc_req_rp_uuids:
+                    continue
+                kept_summary_objs.append(summary)
+        else:
+            kept_summary_objs = summary_objs
+
+        return alloc_request_objs, kept_summary_objs


@@ -43,6 +43,42 @@ tests:
   response_strings:
       - Invalid resource class in resources parameter

+- name: get bad limit microversion
+  GET: /allocation_candidates?resources=VCPU:1&limit=5
+  request_headers:
+      openstack-api-version: placement 1.15
+  status: 400
+  response_strings:
+      - Invalid query string parameters
+      - "'limit' was unexpected"
+
+- name: get bad limit type
+  GET: /allocation_candidates?resources=VCPU:1&limit=cow
+  request_headers:
+      openstack-api-version: placement 1.16
+  status: 400
+  response_strings:
+      - Invalid query string parameters
+      - "Failed validating 'pattern'"
+
+- name: get bad limit value negative
+  GET: /allocation_candidates?resources=VCPU:1&limit=-99
+  request_headers:
+      openstack-api-version: placement 1.16
+  status: 400
+  response_strings:
+      - Invalid query string parameters
+      - "Failed validating 'pattern'"
+
+- name: get bad limit value zero
+  GET: /allocation_candidates?resources=VCPU:1&limit=0
+  request_headers:
+      openstack-api-version: placement 1.16
+  status: 400
+  response_strings:
+      - Invalid query string parameters
+      - "Failed validating 'pattern'"
+
 - name: get allocation candidates no allocations yet
   GET: /allocation_candidates?resources=VCPU:1,MEMORY_MB:1024,DISK_GB:100
   status: 200
@@ -126,3 +162,11 @@ tests:
       cache-control: no-cache
       # Does last-modified look like a legit timestamp?
      last-modified: /^\w+, \d+ \w+ \d{4} [\d:]+ GMT$/
+
+- name: get allocation candidates with limit
+  GET: /allocation_candidates?resources=VCPU:1,MEMORY_MB:1024,DISK_GB:100&limit=1
+  status: 200
+  request_headers:
+      openstack-api-version: placement 1.16
+  response_json_paths:
+      $.allocation_requests.`len`: 1


@@ -39,13 +39,13 @@ tests:
   response_json_paths:
       $.errors[0].title: Not Acceptable

-- name: latest microversion is 1.15
+- name: latest microversion is 1.16
   GET: /
   request_headers:
       openstack-api-version: placement latest
   response_headers:
       vary: /OpenStack-API-Version/
-      openstack-api-version: placement 1.15
+      openstack-api-version: placement 1.16

 - name: other accept header bad version
   GET: /


@@ -335,12 +335,13 @@ class AllocationCandidatesTestCase(ProviderDBBase):
         # _validate_allocation_requests to make failure results more readable.
         self.rp_uuid_to_name = {}

-    def _get_allocation_candidates(self, requests=None):
+    def _get_allocation_candidates(self, requests=None, limit=None):
         if requests is None:
             requests = [placement_lib.RequestGroup(
                 use_same_provider=False,
                 resources=self.requested_resources)]
-        return rp_obj.AllocationCandidates.get_by_requests(self.ctx, requests)
+        return rp_obj.AllocationCandidates.get_by_requests(self.ctx, requests,
+                                                           limit)

     def _validate_allocation_requests(self, expected, candidates):
         """Assert correctness of allocation requests in allocation candidates.
@@ -508,6 +509,57 @@ class AllocationCandidatesTestCase(ProviderDBBase):
         self.assertEqual(1, len(cn2_p_sum.traits))
         self.assertEqual(os_traits.HW_CPU_X86_AVX2, cn2_p_sum.traits[0].name)

+    def test_all_local_limit(self):
+        """Create some resource providers that can satisfy the request for
+        resources with local (non-shared) resources, limit them, and verify
+        that the allocation requests returned by AllocationCandidates
+        correspond with each of these resource providers.
+        """
+        # Create three compute node providers with VCPU, RAM and local disk
+        for name in ('cn1', 'cn2', 'cn3'):
+            cn = self._create_provider(name)
+            _add_inventory(cn, fields.ResourceClass.VCPU, 24,
+                           allocation_ratio=16.0)
+            _add_inventory(cn, fields.ResourceClass.MEMORY_MB, 32768,
+                           min_unit=64, step_size=64, allocation_ratio=1.5)
+            total_gb = 1000 if name == 'cn3' else 2000
+            _add_inventory(cn, fields.ResourceClass.DISK_GB, total_gb,
+                           reserved=100, min_unit=10, step_size=10,
+                           allocation_ratio=1.0)
+
+        # Ask for just one candidate.
+        limit = 1
+        alloc_cands = self._get_allocation_candidates(limit=limit)
+        allocation_requests = alloc_cands.allocation_requests
+        self.assertEqual(limit, len(allocation_requests))
+
+        # provider summaries should have only one rp
+        self.assertEqual(limit, len(alloc_cands.provider_summaries))
+
+        # Do it again, with conf set to randomize. We can't confirm the
+        # random-ness but we can be sure the code path doesn't explode.
+        self.flags(randomize_allocation_candidates=True, group='placement')
+
+        # Ask for two candidates.
+        limit = 2
+        alloc_cands = self._get_allocation_candidates(limit=limit)
+        allocation_requests = alloc_cands.allocation_requests
+        self.assertEqual(limit, len(allocation_requests))
+
+        # provider summaries should have two rps
+        self.assertEqual(limit, len(alloc_cands.provider_summaries))
+
+        # Do it again, asking for more than are available.
+        limit = 5
+        # We still only expect 2 because cn3 does not match default requests.
+        expected_length = 2
+        alloc_cands = self._get_allocation_candidates(limit=limit)
+        allocation_requests = alloc_cands.allocation_requests
+        self.assertEqual(expected_length, len(allocation_requests))
+
+        # provider summaries should have two rps
+        self.assertEqual(expected_length, len(alloc_cands.provider_summaries))
+
     def test_local_with_shared_disk(self):
         """Create some resource providers that can satisfy the request for
         resources with local VCPU and MEMORY_MB but rely on a shared storage


@@ -30,6 +30,7 @@ Request
 .. rest_parameters:: parameters.yaml

   - resources: resources_query_required
+  - limit: allocation_candidates_limit

 Response (microversions 1.12 - )
 --------------------------------


@@ -42,6 +42,14 @@ trait_name:
     The name of a trait.

 # variables in query
+allocation_candidates_limit:
+  type: integer
+  in: query
+  required: false
+  min_version: 1.16
+  description: >
+    A positive integer used to limit the maximum number of allocation
+    candidates returned in the response.
 member_of:
   type: string
   in: query


@@ -0,0 +1,11 @@
+---
+features:
+  - |
+    Add support, in new placement microversion 1.16, for a ``limit`` query
+    parameter when making a ``GET /allocation_candidates`` request. The
+    parameter accepts an integer value, `N`, which limits the number of
+    candidates returned. A new configuration item
+    ``[placement]/randomize_allocation_candidates``, defaulting to `False`,
+    controls how the limited results are chosen. If `True`, a random sampling
+    of the entire result set is taken, otherwise the first N results are
+    returned.