Merge "Return alternate allocation requests to scheduler"
This commit is contained in:
309
specs/pike/approved/placement-allocation-requests.rst
Normal file
309
specs/pike/approved/placement-allocation-requests.rst
Normal file
@@ -0,0 +1,309 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Placement Allocation Requests
=============================

https://blueprints.launchpad.net/nova/+spec/placement-allocation-requests

We propose to have the placement API return to the scheduler a set of
alternative allocation choices that the scheduler may then use both to make a
fitness decision and to attempt a claim of resources on multiple complex
resource providers.

Problem description
===================

Nova's scheduler will soon be claiming resources by sending a `POST
/allocations/{consumer_uuid}` request to the Placement API after selecting a
target compute host. At the moment, the Nova scheduler constructs the claim
request for only a single resource provider: the provider representing the
target compute host that it selected. Claiming against only a single resource
provider is problematic; as we move to representing more and more complex
resource provider relationships (nested providers and providers of shared
resources), we want the Nova scheduler to be able to claim resources against
these nested or sharing resource providers.

In order for this to happen, we propose creating a new REST API endpoint in
the Placement API called `GET /allocation_requests` that will return a
collection of opaque (to the Nova compute node and conductor) HTTP request
bodies that can be provided to a `POST /allocations/{consumer_uuid}` request,
along with a set of information the Nova scheduler can use to make fitness
choices for the launch requests.

Use Cases
---------

This is an internal blueprint/spec, not intended to implement any particular
use case but rather to simplify and structure the communication between the
Nova scheduler and the Placement API.

Proposed change
===============

We propose adding a new `GET /allocation_requests` REST API endpoint that will
return both a collection of opaque request bodies that can be sent to the
`POST /allocations/{consumer_uuid}` endpoint and a collection of information
that the scheduler can use to determine the best fit for an instance launch
request.

.. note:: At this time, we make no suggestion as to **how** the scheduler will
          use the information returned back from the placement API in its
          fitness decision. It may choose to replace the information that it
          currently uses from the cell databases with information from the
          placement API, or it could choose to merge the information somehow.
          That piece is left for future discussion.

The scheduler shall then proceed to choose an appropriate destination host for
a build request (or more than one destination host if
`RequestSpec.num_instances` is greater than 1). However, instead of
immediately returning this destination host, the scheduler will now work with
the placement API to claim resources on the chosen host **before** sending its
decision back to the conductor.

The scheduler will claim resources against the destination host by choosing an
allocation request that contains the UUID of the destination host and calling
the placement API's `POST /allocations/{consumer_uuid}` call, passing in the
allocation request as the body of the HTTP request along with the user and
project ID of the instance.
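
Sketched below, as a non-normative illustration, is what that claim call might
look like. The `placement_post` callable stands in for a thin helper over an
authenticated keystoneauth1 session, and the 204/409 status handling is an
assumption about how placement signals success and a concurrent-update
conflict::

    def claim_resources(placement_post, consumer_uuid, alloc_request,
                        project_id, user_id):
        """Attempt to claim one opaque allocation request for an instance.

        Returns True if placement accepted the claim, False on a conflict
        caused by a concurrent update; raises on any other error.
        """
        payload = {
            'allocations': alloc_request['allocations'],
            'project_id': project_id,
            'user_id': user_id,
        }
        resp = placement_post('/allocations/%s' % consumer_uuid, payload)
        if resp.status_code == 204:
            return True
        if resp.status_code == 409:
            # Inventory changed between GET and POST; the caller may retry.
            return False
        resp.raise_for_status()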

If the attempt to claim resources fails due to a concurrent update (a
condition that is normal and expected in environments with heavy load), the
scheduler will retry the claim request several times and then, if still unable
to claim resources against the initially-selected destination host, will move
on to the next host in its list of weighed hosts for the request.
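
Building on the `claim_resources` sketch above, a hypothetical
retry-and-fall-through loop might look like the following; the helper names
and retry count are assumptions for illustration, not existing Nova code or
configuration options::

    MAX_CLAIM_RETRIES = 3  # assumed tunable

    def select_and_claim(placement_post, consumer_uuid, weighed_host_uuids,
                         alloc_requests, project_id, user_id):
        # Index the opaque allocation requests by the provider UUIDs they
        # touch so they can be matched against the weighed hosts.
        by_provider = {}
        for ar in alloc_requests:
            for alloc in ar['allocations']:
                uuid = alloc['resource_provider']['uuid']
                by_provider.setdefault(uuid, []).append(ar)

        for host_uuid in weighed_host_uuids:
            for ar in by_provider.get(host_uuid, []):
                for _ in range(MAX_CLAIM_RETRIES):
                    if claim_resources(placement_post, consumer_uuid, ar,
                                       project_id, user_id):
                        return host_uuid
        return None  # no claimable host; the caller raises NoValidHost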

Alternatives
------------

A number of alternative approaches were considered by the team.

Alternative 1 was to have the Placement API transparently claim resources on
more than one provider. The scheduler would pick the primary resource provider
(the compute node) and attempt to `POST /allocations/{consumer_uuid}` to claim
resources against that compute node, and the placement API would write
allocation records for resources against *that* compute node resource provider
as well as against sharing resource providers (e.g. in the case of a shared
storage pool) and child providers (e.g. consuming SRIOV_NET_VF resources from
a particular SRIOV physical function child provider). While this alternative
would shield the Nova scheduler from implementation details about sharing
providers and nested provider hierarchies, the Placement API is not well
suited to make decisions about things like packing/spreading strategies or
picking a particular SRIOV PF for a target network function workload. Instead,
the Nova scheduler is responsible for sorting the list of providers it
receives from the Placement API that meet resource and trait requirements and
choosing which providers to allocate against.

Alternative 2 was to modify the existing `GET /resource_providers` Placement
REST API endpoint to return information about sharing providers and child
providers and have the scheduler reporting client contain the necessary logic
to build provider hierarchies, determine which sharing provider is associated
with which providers, and essentially rebuild a representation of usage and
inventory records in memory. This alternative kept the Placement API free of
much complex logic but came at the cost of dramatically changing the response
returned from an established REST API endpoint and making the usage of that
REST API endpoint inconsistent depending on the caller.

Data model impact
-----------------

None.

REST API impact
---------------

The new `GET /allocation_requests` Placement REST API endpoint shall accept
requests with the following query parameters:

* `resources`: A comma-delimited string of `RESOURCE_CLASS:AMOUNT` pairs, one
  for each class of resource requested; a sketch of building this string
  follows the list. Example:
  `?resources=VCPU:1,MEMORY_MB:1024,DISK_GB:100`
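
The following is a minimal sketch, not existing Nova code, of how a client
might serialize requested amounts into the `resources` query parameter (the
`resources_query` helper name is an assumption)::

    from urllib.parse import urlencode

    def resources_query(amounts):
        """amounts: dict mapping resource class name -> integer amount."""
        value = ','.join(
            '%s:%d' % (rc, amount) for rc, amount in sorted(amounts.items()))
        # Keep ':' and ',' readable in the query string.
        return urlencode({'resources': value}, safe=':,')

    # resources_query({'VCPU': 1, 'MEMORY_MB': 1024, 'DISK_GB': 100})
    # -> 'resources=DISK_GB:100,MEMORY_MB:1024,VCPU:1'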

Given an HTTP request of:

`GET /allocation_requests?resources=$RESOURCES`

where `$RESOURCES` = "VCPU:4,MEMORY_MB:16384,DISK_GB:100", and given two empty
compute nodes, each attached via an aggregate to a resource provider sharing
`DISK_GB` resources, the following would be the HTTP response returned by the
placement API::

  {
    "allocation_requests": [
      {
        "allocations": [
          {
            "resource_provider": {
              "uuid": $COMPUTE_NODE1_UUID
            },
            "resources": {
              "VCPU": $AMOUNT_REQUESTED_VCPU,
              "MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
            }
          },
          {
            "resource_provider": {
              "uuid": $SHARED_STORAGE_UUID
            },
            "resources": {
              "DISK_GB": $AMOUNT_REQUESTED_DISK_GB
            }
          }
        ]
      },
      {
        "allocations": [
          {
            "resource_provider": {
              "uuid": $COMPUTE_NODE2_UUID
            },
            "resources": {
              "VCPU": $AMOUNT_REQUESTED_VCPU,
              "MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
            }
          },
          {
            "resource_provider": {
              "uuid": $SHARED_STORAGE_UUID
            },
            "resources": {
              "DISK_GB": $AMOUNT_REQUESTED_DISK_GB
            }
          }
        ]
      }
    ],
    "provider_summaries": {
      $COMPUTE_NODE1_UUID: {
        "resources": {
          "VCPU": {
            "capacity": 120,  # NOTE: capacity = (total - reserved) * allocation_ratio
            "used": 4
          },
          "MEMORY_MB": {
            "capacity": 1024,
            "used": 48
          }
        }
      },
      $COMPUTE_NODE2_UUID: {
        "resources": {
          "VCPU": {
            "capacity": 120,
            "used": 4
          },
          "MEMORY_MB": {
            "capacity": 1024,
            "used": 48
          }
        }
      },
      $SHARED_STORAGE_UUID: {
        "resources": {
          "DISK_GB": {
            "capacity": 2000,
            "used": 100
          }
        }
      }
    }
  }

Note that we are not dealing with either nested resource providers or traits
in the above. Those concepts will be added to the response in future patches.
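
As a non-normative illustration of how a scheduler might consume such a
response, the sketch below computes free capacity from `provider_summaries`
and sorts the opaque `allocation_requests` accordingly (the
`sort_allocation_requests` helper and the VCPU-only weighing are assumptions
for illustration)::

    def sort_allocation_requests(body):
        """Order allocation requests by free VCPU, most free first."""
        summaries = body['provider_summaries']

        def free_vcpu(alloc_request):
            # Smallest free VCPU across providers named in this request; a
            # real weigher would consider all classes and configured weights.
            frees = []
            for alloc in alloc_request['allocations']:
                uuid = alloc['resource_provider']['uuid']
                inv = summaries[uuid]['resources']
                if 'VCPU' in inv:
                    frees.append(
                        inv['VCPU']['capacity'] - inv['VCPU']['used'])
            return min(frees) if frees else 0

        # Spread behavior: prefer hosts with the most free VCPU.
        return sorted(
            body['allocation_requests'], key=free_vcpu, reverse=True)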

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

Returning a list of allocation requests that all meet the Nova scheduler's
request for resources/traits, and allowing the Nova scheduler to iterate over
these allocation requests, retrying them if a concurrent claim happens, should
actually increase the throughput of the Nova scheduler by reducing the amount
of time between resource constraint retries.

Other deployer impact
---------------------

The Placement service will need to be upgraded before the nova-scheduler
service.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

jaypipes

Work Items
----------

#. Implement the API logic in the Placement service with a new microversion.
#. Update the FilterScheduler driver to use the new Placement API.

Dependencies
============

* https://blueprints.launchpad.net/nova/+spec/shared-resources-pike

  Partially completed in Pike.

Testing
=======

Unit and in-tree functional tests. Integration testing will be covered by
existing Tempest tests.

Documentation Impact
====================

Good devref documentation should be written that describes in more explicit
detail what the placement service is responsible for, what the Nova scheduler
is responsible for, and how this new API call will be used to share
information between placement and the Nova scheduler.

References
==========

* Original straw-man proposal, developed on etherpad:

  http://etherpad.openstack.org/p/placement-allocations-straw-man

* Spec for claiming resources in the scheduler:

  https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced