Return Selection Objects

In Queens, we are going to be returning a lot more information from the
Scheduler to the Conductor. If we return it as a series of nested
dictionaries and lists, it will be fragile and confusing. It would be
far better to create an object to hold this data and make it accessible
in a simpler and documented way.

Blueprint: return-selection-objects

Change-Id: I7e94525ba9ee22885ea18a752a416024c36b4a40
This commit is contained in:
Ed Leafe
2017-08-29 14:50:05 +00:00
parent f49397be06
commit 5753850889

View File

@@ -0,0 +1,232 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
========================
Return Selection Objects
========================
https://blueprints.launchpad.net/nova/+spec/return-selection-objects
In Queens, we will be changing what we return from select_destinations() in
order to both provide additional 'alternate' hosts for each requested instance
and also the allocation_request for building on each host. Returning this as an
unstructured chunk of data will be fragile and potentially confusing. It would
be far better to create an object to hold this data and make it accessible in a
simpler and documented way.
Problem description
===================
Before Queens, the scheduler's select_destinations() method returned a list,
containing a dictionary representing the selected host for each requested
instance. In Queens, we need to return much more information to the caller of
select_destinations(). We could attempt to return a list of HostDicts, which
represents the selected host as in the past, along with zero or more
'alternate' hosts that are in the same cell and also meet the requested
resources. Additionally, each of these will also be accompanied by a dictionary
for the allocation_request required to claim that host. The end result will be
a list, with one item per requested instance. Each item in that list will be a
list of 2-tuples of (HostState, allocation_request). The HostState is a simple
dict, but the allocation_request is itself a complex nested dict.
The result of these changes would mean that the data returned would be a
complex nested combination of dictionaries, lists, and tuples. This data
structure would be both difficult to understand how to use correctly, and
confusing to developers looking at the code for the first time (or even after a
period of being away from it). It is also unversioned, meaning it is impossible
to track and respond to future changes in a reliable manner.
Use Cases
---------
As an experienced Nova developer, I would like to be able to write code that
uses the information returned from select_destinations() without having to
decipher a complex data structure.
As a newcomer to the Nova codebase, I would like to be able to read code that
is clear so that I can work with it quicker and with more confidence that my
changes won't break something.
Proposed change
===============
We propose to create a new Selection object that would contain the data that
represents a single destination: both the host information as well as the
corresponding allocation_request needed for claiming. The host information,
which is currently in a dictionary containing hostname, nodename, and limits
keys, will now be stored as simple object fields along with the
allocation_request. Additionally, the compute_node_uuid field will be added, as
it would be useful to have this available in some of our allocation cleanup
tasks.
There is no need for a corresponding SelectionList object, as there is no need
for DB creation or retrieval. The select_destinations() method will return
simple Python lists of Selection objects. The Scheduler will return one list
of Selection objects for each requested instance, representing the selected
host as well as any alternates.
Alternatives
------------
We could cache the allocation_request data in placement, and simply return a
key along with the resource providers. When a claim needs to be made, the key
would be POSTed instead of the full allocation_request data, and Placement
would use the cached data to carry out the claim. This has the advantage that
nothing on the Nova side of things ever uses the data in the
allocation_request; to Nova, it's an opaque blob. The downside, of course, is
that Placement would have to handle the cache.
We could return the full allocation_request data to the scheduler, and then
handle the caching and key management on the Nova side. When a claim/unclaim is
needed, the allocation_request would be retrieved from this cache and POSTed to
placement. This alternative doesn't require any changes to placement, but
requires that both the API-level cell and all local cells have access to some
form of 'global ram' cache that is accessible across cells.
We could just return an unstructured bunch of Python data, and add a ton of
comments everywhere it is used in the hope that anyone looking at the code
would understand what each bit represents, and that every future change to the
data required would be able to be handled without versioning.
Data model impact
-----------------
There will be no changes to any database schemas, but this will introduce a new
versioned object. This object will contain the following fields, along with
their types::
* compute_node_uuid: fields.UUIDField
* service_host: fields.StringField
* nodename: fields.StringField
* cell_uuid: fields.UUIDField
* numa_limits: fields.ObjectField("NUMATopologyLimits")
* allocation_request: fields.StringField
There isn't a good field type for the allocation_request value, as it is a
complex nested structure, so instead we'll store it as its JSON respresentation
in a StringField. The structure of an allocation_request, as described in this
spec[2], looks like::
"allocations": [
{
"resource_provider": {
"uuid": $COMPUTE_NODE1_UUID
},
"resources": {
"VCPU": $AMOUNT_REQUESTED_VCPU,
"MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
}
},
{
"resource_provider": {
"uuid": $SHARED_STORAGE_UUID
},
"resources": {
"DISK_GB": $AMOUNT_REQUESTED_DISK_GB
}
},
]
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
It will make life a little easier for anyone working with the Nova codebase by
not making them decipher complex data structures, but other than that, none.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
ed-leafe
Other contributors:
None
Work Items
----------
* Create the Selection object.
* Modify the scheduler's select_destinations() method to populate these objects
with the selected host info and return them.
Dependencies
============
None
Testing
=======
This is one part of the overall sweeping changes being made in Queens, and all
of it will have to be tested. The Selection object will need some basic tests,
but the bulk of the testing will be in the conductor to verify that it is
working with Selection objects for host selection, resource claiming, and
retries on build failures.
Documentation Impact
====================
The developer reference docs will need to be updated to document this new
object. The docs for the scheduler workflow will also need to be updated to
reflect these changes.
References
==========
The initial problem was documented in a blog post[0], and was then discussed at
the Nova Scheduler subteam meeting[1], where this approach was agreed upon.
[0] https://blog.leafe.com/handling-unstructured-data/
[1] http://eavesdrop.openstack.org/meetings/nova_scheduler/2017/nova_scheduler.2017-08-28-14.00.log.html#l-140
[2] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-allocation-requests.html
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Queens
- Introduced