Merge "Return alternate allocation requests to scheduler"
specs/pike/approved/placement-allocation-requests.rst

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Placement Allocation Requests
=============================

https://blueprints.launchpad.net/nova/+spec/placement-allocation-requests

We propose to have the placement API return to the scheduler a set of
alternative allocation choices that the scheduler may then use both to make a
fitness decision and to attempt a claim of resources on multiple complex
resource providers.

Problem description
===================

Nova's scheduler will soon be claiming resources by sending a `POST
/allocations/{consumer_uuid}` request to the Placement API after selecting a
target compute host. The Nova scheduler constructs the claim request for only a
single resource provider at the moment: the provider representing the target
compute host that it selected. Only claiming against a single resource provider
is problematic; as we move to representing more and more complex resource
provider relationships (nested providers and providers of shared resources), we
want the Nova scheduler to be able to claim resources against these nested or
sharing resource providers.

In order for this to happen, we propose creating a new REST API endpoint in the
Placement API called `GET /allocation_requests` that will return a collection
of opaque (to the Nova compute node and conductor) HTTP request bodies that can
be provided to a `POST /allocations/{consumer_uuid}` request along with a set
of information the Nova scheduler can use to make fitness choices for the
launch requests.

Use Cases
---------

This is an internal blueprint/spec. It is not intended to implement any
particular use case but rather to simplify and structure the communication
between the Nova scheduler and the Placement API.

Proposed change
===============

We propose adding a new `GET /allocation_requests` REST API endpoint that will
return both a collection of opaque request bodies that can be sent to the `POST
/allocations/{consumer_uuid}` endpoint and a collection of information that the
scheduler can use to determine best fit for an instance launch request.

.. note:: At this time, we make no suggestion as to **how** the scheduler will
          use the information returned from the placement API in its
          fitness decision. It may choose to replace the information that it
          currently uses from the cell databases with information from the
          placement API, or it could choose to merge the information somehow.
          That piece is left for future discussion.

The scheduler shall then proceed to choose an appropriate destination host for
a build request (or more than one destination host if the
`RequestSpec.num_instances` is greater than 1). However, instead of immediately
returning this destination host, the scheduler will now work with the placement
API to claim resources on the chosen host **before** sending its decision back
to the conductor.

The scheduler will claim resources against the destination host by choosing an
allocation request that contains the UUID of the destination host and calling
the placement API's `POST /allocations/{consumer_uuid}` call, passing in the
allocation request as the body of the HTTP request along with the user and
project ID of the instance.

If the attempt to claim resources fails due to a concurrent update (a condition
that is normal and expected in environments with heavy load), the scheduler
will retry the claim request several times and then, if still unable to claim
resources against the initially-selected destination host, will move to the
next host in its list of weighed hosts for the request.

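The claim-and-retry flow described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the scheduler's
implementation: `ClaimConflict`, `provider_uuids`, `claim_first_available` and
the `claim` callable are hypothetical names, and `claim` stands in for the
`POST /allocations/{consumer_uuid}` call (including the user and project ID).
Allocation requests are assumed to be shaped like the entries in the example
response in the REST API impact section.

```python
class ClaimConflict(Exception):
    """Hypothetical error: the claim failed due to a concurrent update."""


def provider_uuids(allocation_request):
    """Return the set of provider UUIDs an allocation request touches."""
    return {
        allocation["resource_provider"]["uuid"]
        for allocation in allocation_request["allocations"]
    }


def claim_first_available(weighed_hosts, allocation_requests, claim,
                          max_attempts=3):
    """Claim resources on the best host that can be claimed.

    weighed_hosts: compute node UUIDs, best candidate first.
    allocation_requests: opaque bodies returned by the placement API.
    claim: callable taking one allocation request; assumed to raise
        ClaimConflict when a concurrent update is detected.
    Returns (host_uuid, allocation_request), or None if every host failed.
    """
    for host_uuid in weighed_hosts:
        # Choose an allocation request that contains the destination host.
        candidates = [request for request in allocation_requests
                      if host_uuid in provider_uuids(request)]
        if not candidates:
            continue
        request = candidates[0]
        # Retry the same claim a few times before giving up on this host.
        for _ in range(max_attempts):
            try:
                claim(request)
                return host_uuid, request
            except ClaimConflict:
                continue
    return None
```

The number of retries per host (`max_attempts`) is an arbitrary choice here;
the text above only says "several times".
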
Alternatives
------------

There were a number of alternative approaches considered by the team.

Alternative 1 was to have the Placement API transparently claim resources on
more than one provider. The scheduler would pick the primary resource provider
(compute node), attempt to `POST /allocations/{consumer_uuid}` to claim
resources against that compute node, and the placement API would write
allocation records for resources against *that* compute node resource provider
as well as sharing resource providers (e.g. in the case of a shared storage
pool) and child providers (e.g. consuming SRIOV_NET_VF resources from a
particular SRIOV physical function child provider). While this alternative
would shield the Nova scheduler from implementation details about sharing
providers and nested provider hierarchies, the Placement API is not well-suited
to make decisions about things like packing/spreading strategies or picking a
particular SRIOV PF for a target network function workload. Instead, the Nova
scheduler is responsible for sorting the list of providers it receives from the
Placement API that meet resource and trait requirements and choosing which
providers to allocate against.

Alternative 2 was to modify the existing `GET /resource_providers` Placement
REST API endpoint to return information about sharing providers and child
providers and have the scheduler reporting client contain the necessary logic
to build provider hierarchies, determine which sharing provider is associated
with which providers, and essentially re-build a representation of usage and
inventory records in memory. This alternative kept the Placement API free of
much complex logic but came at the cost of dramatically changing the returned
response from an established REST API endpoint and making the usage of that
REST API endpoint inconsistent depending on the caller.

Data model impact
-----------------

None.

REST API impact
---------------

The new `GET /allocation_requests` Placement REST API endpoint shall accept
requests with the following query parameters:

* `resources`: A comma-delimited string of `RESOURCE_CLASS:AMOUNT` pairs, one
  for each class of resource requested. Example:
  `?resources=VCPU:1,MEMORY_MB:1024,DISK_GB:100`

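As a small sketch of the `resources` format described above, the helpers below
build and parse the comma-delimited `RESOURCE_CLASS:AMOUNT` string. The
function names are hypothetical, and sorting the pairs is an assumption made
only to get a deterministic string; nothing above requires a particular order.

```python
from urllib.parse import urlencode


def build_allocation_requests_url(resources):
    """Return the path and query string for GET /allocation_requests.

    resources: dict mapping resource class names to integer amounts.
    """
    pairs = ",".join(
        "%s:%d" % (resource_class, amount)
        for resource_class, amount in sorted(resources.items())
    )
    # safe=":," leaves the two delimiters unescaped so the query stays
    # readable; percent-encoding them would be equally valid HTTP.
    return "/allocation_requests?" + urlencode({"resources": pairs},
                                               safe=":,")


def parse_resources(value):
    """Parse a string such as 'VCPU:1,MEMORY_MB:1024' into a dict."""
    parsed = {}
    for pair in value.split(","):
        resource_class, _, amount = pair.partition(":")
        parsed[resource_class] = int(amount)
    return parsed
```
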
Given an HTTP request of:

`GET /allocation_requests?resources=$RESOURCES`

where `$RESOURCES` = "VCPU:4,MEMORY_MB:16384,DISK_GB:100" and given two empty
compute nodes each attached via an aggregate to a resource provider sharing
`DISK_GB` resources, the following would be the HTTP response returned by the
placement API::

    {
        "allocation_requests": [
            {
                "allocations": [
                    {
                        "resource_provider": {
                            "uuid": $COMPUTE_NODE1_UUID
                        },
                        "resources": {
                            "VCPU": $AMOUNT_REQUESTED_VCPU,
                            "MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
                        }
                    },
                    {
                        "resource_provider": {
                            "uuid": $SHARED_STORAGE_UUID
                        },
                        "resources": {
                            "DISK_GB": $AMOUNT_REQUESTED_DISK_GB
                        }
                    }
                ]
            },
            {
                "allocations": [
                    {
                        "resource_provider": {
                            "uuid": $COMPUTE_NODE2_UUID
                        },
                        "resources": {
                            "VCPU": $AMOUNT_REQUESTED_VCPU,
                            "MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
                        }
                    },
                    {
                        "resource_provider": {
                            "uuid": $SHARED_STORAGE_UUID
                        },
                        "resources": {
                            "DISK_GB": $AMOUNT_REQUESTED_DISK_GB
                        }
                    }
                ]
            }
        ],
        "provider_summaries": {
            $COMPUTE_NODE1_UUID: {
                "resources": {
                    "VCPU": {
                        "capacity": 120,  # NOTE: capacity is (total - reserved) * allocation_ratio
                        "used": 4
                    },
                    "MEMORY_MB": {
                        "capacity": 1024,
                        "used": 48
                    }
                }
            },
            $COMPUTE_NODE2_UUID: {
                "resources": {
                    "VCPU": {
                        "capacity": 120,
                        "used": 4
                    },
                    "MEMORY_MB": {
                        "capacity": 1024,
                        "used": 48
                    }
                }
            },
            $SHARED_STORAGE_UUID: {
                "resources": {
                    "DISK_GB": {
                        "capacity": 2000,
                        "used": 100
                    }
                }
            }
        }
    }

Note that we are not dealing with either nested resource providers or traits in
the above. Those concepts will be added to the response in future patches.

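To illustrate how a caller might consume a response of this shape, here is a
minimal sketch. It assumes capacity is computed as `(total - reserved) *
allocation_ratio`. The helper names are hypothetical, and the short strings
`cn1`, `cn2` and `ss` stand in for the `$..._UUID` placeholders. As noted
earlier, how the scheduler actually uses this information is left for future
discussion.

```python
def capacity(total, reserved, allocation_ratio):
    """Capacity of one inventory: (total - reserved) * allocation_ratio."""
    return int((total - reserved) * allocation_ratio)


def free_resources(provider_summaries):
    """Map provider UUID -> {resource class: capacity - used}."""
    return {
        uuid: {
            resource_class: usage["capacity"] - usage["used"]
            for resource_class, usage in summary["resources"].items()
        }
        for uuid, summary in provider_summaries.items()
    }


def requests_by_provider(allocation_requests):
    """Index allocation requests by every provider UUID they allocate from."""
    index = {}
    for request in allocation_requests:
        for allocation in request["allocations"]:
            uuid = allocation["resource_provider"]["uuid"]
            index.setdefault(uuid, []).append(request)
    return index


def _allocation_request(compute_uuid):
    # One compute node plus the shared storage provider, as in the example.
    return {"allocations": [
        {"resource_provider": {"uuid": compute_uuid},
         "resources": {"VCPU": 4, "MEMORY_MB": 16384}},
        {"resource_provider": {"uuid": "ss"},
         "resources": {"DISK_GB": 100}},
    ]}


def _compute_summary():
    return {"resources": {"VCPU": {"capacity": 120, "used": 4},
                          "MEMORY_MB": {"capacity": 1024, "used": 48}}}


# Stand-in for the parsed JSON body of the example response.
response = {
    "allocation_requests": [_allocation_request("cn1"),
                            _allocation_request("cn2")],
    "provider_summaries": {
        "cn1": _compute_summary(),
        "cn2": _compute_summary(),
        "ss": {"resources": {"DISK_GB": {"capacity": 2000, "used": 100}}},
    },
}

free = free_resources(response["provider_summaries"])
index = requests_by_provider(response["allocation_requests"])
```

With this data, `index["ss"]` holds both allocation requests because each one
draws `DISK_GB` from the shared storage provider, while `index["cn1"]` holds
only the first.
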
Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

Returning a list of allocation requests that all meet the Nova scheduler's
request for resources/traits and allowing the Nova scheduler to iterate over
these allocation requests, retrying them if a concurrent claim happens, should
actually increase the throughput of the Nova scheduler by reducing the amount
of time between resource constraint retries.

Other deployer impact
---------------------

The Placement service will need to be upgraded before the nova-scheduler
service.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

jaypipes

Work Items
----------

#. Implement the API logic in the Placement service with a new microversion.
#. Update the FilterScheduler driver to use the new Placement API.

Dependencies
============

* https://blueprints.launchpad.net/nova/+spec/shared-resources-pike

  Partially completed in Pike.

Testing
=======

Unit and in-tree functional tests. Integration testing will be covered by
existing Tempest testing.

Documentation Impact
====================

There should be good devref documentation written that describes in more
explicit detail what the placement service is responsible for, what the Nova
scheduler is responsible for, and how this new API call will be used to share
information between placement and the Nova scheduler.

References
==========

* Original straw-man proposal was developed on etherpad:

  http://etherpad.openstack.org/p/placement-allocations-straw-man

* Spec for claiming resources in the scheduler:

  https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced