Merge "Return alternate allocation requests to scheduler"

2017-08-28 21:53:41 +00:00
parent 2c3c027526 9432447a0c
commit ddc83f2230
1 changed files with 309 additions and 0 deletions
--- a/specs/pike/approved/placement-allocation-requests.rst
+++ b/specs/pike/approved/placement-allocation-requests.rst
@@ -0,0 +1,309 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================
+Placement Allocation Requests
+=============================
+
+https://blueprints.launchpad.net/nova/+spec/placement-allocation-requests
+
+We propose to have the placement API return to the scheduler a set of
+alternative allocation choices that the scheduler may then use both to make a
+fitness decision and to attempt a claim of resources on multiple complex
+resource providers.
+
+Problem description
+===================
+
+Nova's scheduler will soon be claiming resources by sending a `POST
+/allocations/{consumer_uuid}` request to the Placement API after selecting a
+target compute host. The Nova scheduler constructs the claim request for only a
+single resource provider at the moment: the provider representing the target
+compute host that it selected. Only claiming against a single resource provider
+is problematic; as we move to representing more and more complex resource
+provider relationships (nested providers and providers of shared resources), we
+want the Nova scheduler to be able to claim resources against these nested or
+sharing resource providers.
+
+In order for this to happen, we propose creating a new REST API endpoint in the
+Placement API called `GET /allocation_requests` that will return a collection
+of opaque (to the Nova compute node and conductor) HTTP request bodies that can
+be provided to a `POST /allocations/{consumer_uuid}` request along with a set
+of information the Nova scheduler can use to make fitness choices for the
+launch requests.
+
+Use Cases
+---------
+
+This is an internal blueprint/spec. It is not intended to implement any
+particular use case but rather to simplify and structure the communication
+between the Nova scheduler and the Placement API.
+
+Proposed change
+===============
+
+We propose adding a new `GET /allocation_requests` REST API endpoint that will
+return both a collection of opaque request bodies that can be sent to the `POST
+/allocations/{consumer_uuid}` endpoint and a collection of information that
+the scheduler can use to determine best fit for an instance launch request.
+
+.. note:: At this time, we make no suggestion as to **how** the scheduler will
+          use the information returned from the placement API in its
+          fitness decision. It may choose to replace the information that it
+          currently uses from the cell databases with information from the
+          placement API, or it could choose to merge the information somehow.
+          That piece is left for future discussion.
+
+The scheduler shall then proceed to choose an appropriate destination host for
+a build request (or more than one destination host if the
+`RequestSpec.num_instances` is greater than 1). However, instead of immediately
+returning this destination host, the scheduler will now work with the placement
+API to claim resources on the chosen host **before** sending its decision back
+to the conductor.
+
+The scheduler will claim resources against the destination host by choosing an
+allocation request that contains the UUID of the destination host and calling
+the placement API's `POST /allocations/{consumer_uuid}` call, passing in the
+allocation request as the body of the HTTP request along with the user and
+project ID of the instance.
+
+If the attempt to claim resources fails due to a concurrent update (a condition
+that is normal and expected in environments with heavy load), the scheduler
+will retry the claim request several times and then, if still unable to claim
+resources against the initially-selected destination host, will move to the
+next host in its list of weighed hosts for the request.
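
The claim-and-retry flow described above could be sketched roughly as follows. This is an illustration only, not the scheduler's actual code: the names `find_request_for_host`, `claim_with_fallback`, `try_claim` and `max_retries` are hypothetical, and `try_claim` stands in for whatever issues the `POST /allocations/{consumer_uuid}` call and reports whether the claim succeeded.

```python
def find_request_for_host(allocation_requests, host_uuid):
    """Return the first allocation request that references host_uuid, or None."""
    for request in allocation_requests:
        for allocation in request["allocations"]:
            if allocation["resource_provider"]["uuid"] == host_uuid:
                return request
    return None


def claim_with_fallback(weighed_hosts, allocation_requests, try_claim,
                        max_retries=3):
    """Try to claim on each weighed host in order.

    try_claim(request) is a hypothetical callable that returns True when the
    claim succeeds and False when it fails (for example, because of a
    concurrent update). Returns the UUID of the host that was claimed, or
    None if every host was exhausted.
    """
    for host_uuid in weighed_hosts:
        request = find_request_for_host(allocation_requests, host_uuid)
        if request is None:
            continue  # placement returned no allocation request for this host
        for _ in range(max_retries):
            if try_claim(request):
                return host_uuid
        # All retries failed; fall through to the next weighed host.
    return None


# Illustrative data: two compute nodes, each paired with shared storage.
SAMPLE_REQUESTS = [
    {"allocations": [
        {"resource_provider": {"uuid": "node1"}, "resources": {"VCPU": 4}},
        {"resource_provider": {"uuid": "shared"}, "resources": {"DISK_GB": 100}},
    ]},
    {"allocations": [
        {"resource_provider": {"uuid": "node2"}, "resources": {"VCPU": 4}},
        {"resource_provider": {"uuid": "shared"}, "resources": {"DISK_GB": 100}},
    ]},
]
```

The number of retries per host is an assumption here; the spec only says the claim is retried "several times" before moving on.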
+
+Alternatives
+------------
+
+There were a number of alternative approaches considered by the team.
+
+Alternative 1 was to have the Placement API transparently claim resources on
+more than one provider. The scheduler would pick the primary resource provider
+(compute node), attempt to `POST /allocations/{consumer_uuid}` to claim
+resources against that compute node, and the placement API would write
+allocation records for resources against *that* compute node resource provider
+as well as sharing resource providers (e.g. in the case of a shared storage
+pool) and child providers (e.g. consuming SRIOV_NET_VF resources from a
+particular SRIOV physical function child provider). While this alternative
+would shield the Nova scheduler from implementation details about sharing
+providers and nested provider hierarchies, the Placement API is not well-suited
+to make decisions about things like packing/spreading strategies or picking a
+particular SRIOV PF for a target network function workload. Instead, the Nova
+scheduler is responsible for sorting the list of providers it receives from the
+Placement API that meet resource and trait requirements and choosing which
+providers to allocate against.
+
+Alternative 2 was to modify the existing `GET /resource_providers` Placement
+REST API endpoint to return information about sharing providers and child
+providers and have the scheduler reporting client contain the necessary logic
+to build provider hierarchies, determine which sharing provider is associated
+with which providers, and essentially re-build a representation of usage and
+inventory records in memory. This alternative kept the Placement API free of
+much complex logic but came at the cost of dramatically changing the returned
+response from an established REST API endpoint and making the usage of that
+REST API endpoint inconsistent depending on the caller.
+
+Data model impact
+-----------------
+
+None.
+
+REST API impact
+---------------
+
+The new `GET /allocation_requests` Placement REST API endpoint shall accept
+requests with the following query parameters:
+
+* `resources`: A comma-delimited string of `RESOURCE_CLASS:AMOUNT` pairs, one
+  for each class of resource requested. Example:
+  `?resources=VCPU:1,MEMORY_MB:1024,DISK_GB:100`
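
As a rough illustration of the `resources` parameter format, the sketch below builds and parses the comma-delimited `RESOURCE_CLASS:AMOUNT` string. The helper names (`format_resources`, `parse_resources`, `allocation_requests_url`) are hypothetical and are not part of the Placement API or any client library. Leaving `:` and `,` unescaped is an assumption made only to match the example in this spec.

```python
from urllib.parse import urlencode


def format_resources(resources):
    """Turn {"VCPU": 1, "MEMORY_MB": 1024} into "VCPU:1,MEMORY_MB:1024"."""
    return ",".join(f"{rc}:{amount}" for rc, amount in resources.items())


def parse_resources(value):
    """Parse "VCPU:1,MEMORY_MB:1024" back into a dict of integer amounts."""
    parsed = {}
    for pair in value.split(","):
        resource_class, _, amount = pair.partition(":")
        if not resource_class or not amount.isdigit():
            raise ValueError(f"malformed resource pair: {pair!r}")
        parsed[resource_class] = int(amount)
    return parsed


def allocation_requests_url(resources):
    """Build the request path; ':' and ',' are kept readable via safe=."""
    query = urlencode({"resources": format_resources(resources)}, safe=":,")
    return f"/allocation_requests?{query}"
```

For example, `allocation_requests_url({"VCPU": 1, "MEMORY_MB": 1024, "DISK_GB": 100})` yields the path shown in the bullet above.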
+
+Given an HTTP request of:
+
+`GET /allocation_requests?resources=$RESOURCES`
+
+where `$RESOURCES` = "VCPU:4,MEMORY_MB:16384,DISK_GB:100" and given two empty
+compute nodes each attached via an aggregate to a resource provider sharing
+`DISK_GB` resources, the following would be the HTTP response returned by the
+placement API::
+
+    {
+        "allocation_requests": [
+            {
+                "allocations": [
+                    {
+                        "resource_provider": {
+                            "uuid": $COMPUTE_NODE1_UUID
+                        },
+                        "resources": {
+                            "VCPU": $AMOUNT_REQUESTED_VCPU,
+                            "MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
+                        }
+                    },
+                    {
+                        "resource_provider": {
+                            "uuid": $SHARED_STORAGE_UUID
+                        },
+                        "resources": {
+                            "DISK_GB": $AMOUNT_REQUESTED_DISK_GB
+                        }
+                    },
+                ],
+            },
+            {
+                "allocations": [
+                    {
+                        "resource_provider": {
+                            "uuid": $COMPUTE_NODE2_UUID
+                        },
+                        "resources": {
+                            "VCPU": $AMOUNT_REQUESTED_VCPU,
+                            "MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
+                        }
+                    },
+                    {
+                        "resource_provider": {
+                            "uuid": $SHARED_STORAGE_UUID
+                        },
+                        "resources": {
+                            "DISK_GB": $AMOUNT_REQUESTED_DISK_GB
+                        }
+                    },
+                ],
+            },
+        ],
+        "provider_summaries": {
+            $COMPUTE_NODE1_UUID: {
+                "resources": {
+                    "VCPU": {
+                        "capacity": 120,   # NOTE: this represents (total - reserved) * allocation_ratio
+                        "used": 4,
+                    },
+                    "MEMORY_MB": {
+                        "capacity": 1024,
+                        "used": 48,
+                    }
+                }
+            },
+            $COMPUTE_NODE2_UUID: {
+                "resources": {
+                    "VCPU": {
+                        "capacity": 120,
+                        "used": 4,
+                    },
+                    "MEMORY_MB": {
+                        "capacity": 1024,
+                        "used": 48,
+                    }
+                }
+            },
+            $SHARED_STORAGE_UUID: {
+                "resources": {
+                    "DISK_GB": {
+                        "capacity": 2000,
+                        "used": 100,
+                    }
+                }
+            }
+        }
+    }
+
+Note that we are not dealing with either nested resource providers or traits in
+the above. Those concepts will be added to the response in future patches.
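
This spec deliberately does not say how the scheduler will use `provider_summaries`. Purely as one possible illustration, the sketch below computes how much of each resource would remain on each provider if a given allocation request were claimed. The function name `remaining_after_claim` and the sample values are hypothetical.

```python
def remaining_after_claim(allocation_request, provider_summaries):
    """Map (provider_uuid, resource_class) to free capacity after the claim.

    Free capacity is taken as capacity - used from the provider summary,
    minus the amount this allocation request would consume. A negative
    value means the request would not fit.
    """
    remaining = {}
    for allocation in allocation_request["allocations"]:
        uuid = allocation["resource_provider"]["uuid"]
        summary = provider_summaries[uuid]["resources"]
        for resource_class, amount in allocation["resources"].items():
            usage = summary[resource_class]
            free = usage["capacity"] - usage["used"]
            remaining[(uuid, resource_class)] = free - amount
    return remaining


# Illustrative values only, shaped like the response body above.
SAMPLE_REQUEST = {"allocations": [
    {"resource_provider": {"uuid": "node1"}, "resources": {"VCPU": 4}},
    {"resource_provider": {"uuid": "shared"}, "resources": {"DISK_GB": 100}},
]}
SAMPLE_SUMMARIES = {
    "node1": {"resources": {"VCPU": {"capacity": 120, "used": 4}}},
    "shared": {"resources": {"DISK_GB": {"capacity": 2000, "used": 100}}},
}
```

A weigher could, for instance, prefer requests that leave the most or the least free capacity, depending on whether a spreading or packing strategy is wanted. That choice stays with the scheduler.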
+
+Security impact
+---------------
+
+None.
+
+Notifications impact
+--------------------
+
+None.
+
+Other end user impact
+---------------------
+
+None.
+
+Performance Impact
+------------------
+
+Returning a list of allocation requests that all meet the Nova scheduler's
+request for resources/traits and allowing the Nova scheduler to iterate over
+these allocation requests, retrying them if a concurrent claim happens, should
+actually increase the throughput of the Nova scheduler by reducing the amount
+of time between resource constraint retries.
+
+Other deployer impact
+---------------------
+
+The Placement service will need to be upgraded before the nova-scheduler
+service.
+
+Developer impact
+----------------
+
+None.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+jaypipes
+
+Work Items
+----------
+
+#. Implement the API logic in the Placement service with a new microversion.
+#. Update the FilterScheduler driver to use the new Placement API.
+
+Dependencies
+============
+
+* https://blueprints.launchpad.net/nova/+spec/shared-resources-pike
+
+  Partially completed in Pike.
+
+Testing
+=======
+
+Unit and in-tree functional tests. Integration testing will be covered by
+existing Tempest testing.
+
+Documentation Impact
+====================
+
+There should be good devref documentation written that describes in more
+explicit detail what the placement service is responsible for and what the Nova
+scheduler is responsible for, and how this new API call will be used to share
+information between placement and Nova scheduler.
+
+References
+==========
+
+* Original straw-man proposal was developed on etherpad:
+
+  http://etherpad.openstack.org/p/placement-allocations-straw-man
+
+* Spec for claiming resources in the scheduler:
+
+  https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Pike
+     - Introduced