Merge "Granular Resource Request Syntax (Rocky)"

2018-02-12 17:57:19 +00:00 · 2018-02-12 17:57:19 +00:00 · 32b00cceb7
parent f350472e62 0c4cc9f3d2
commit 32b00cceb7
1 changed files with 520 additions and 0 deletions
--- a/specs/rocky/approved/granular-resource-requests.rst
+++ b/specs/rocky/approved/granular-resource-requests.rst
@ -0,0 +1,520 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==================================
+ Granular Resource Request Syntax
+==================================
+
+https://blueprints.launchpad.net/nova/+spec/granular-resource-requests
+
+As `Generic`_ and `Nested Resource Providers`_ begin to crystallize and be
+exercised, it becomes necessary to be able to express:
+
+* _`Requirement 1`: Requesting an allocation of a particular resource class
+  with a particular set of traits, and requesting a *different* allocation of
+  the *same* resource class with a *different* set of traits.
+
+* _`Requirement 2`: Ensuring that requests of certain resources are allocated
+  from the same resource provider.
+
+* _`Requirement 3`: The ability to spread allocations of effectively-identical
+  resources across multiple resource providers in situations of high
+  saturation.
+
+This specification attempts to address these requirements by way of a numbered
+syntax on resource and trait keys in flavor extra_specs and the ``GET
+/allocation_candidates`` `Placement API`_.
+
+.. note:: This document uses "RP" as an abbreviation for "Resource Provider"
+          throughout.
+
+Problem description
+===================
+
+Up to this point with generic and nested resource providers and traits, it is
+only possible to request a single blob of resources with a single blob of
+traits.  More specifically:
+
+* The resources can only be expressed as an integer count of a single
+  resource class.  There is no way to express a second *resource_class*:*count*
+  with the same resource class.
+* All specified traits apply to all requested resources.  There is no way to
+  apply certain traits to certain resources.
+* All resources of a given resource class are allocated from the same RP.
+
+The `Use Cases`_ below exemplify scenarios that cannot be expressed within
+these restrictions.
+
+Use Cases
+---------
+
+Consider the following hardware representation ("wiring diagram"):
+
+.. code::
+
+    +-----------------------------------+
+    |                CN1                |
+    +-+--------------+-+--------------+-+
+      |     NIC1     | |     NIC2     |
+      +-+---+--+---+-+ +-+---+--+---+-+
+        |PF1|  |PF2|     |PF3|  |PF4|
+        +-+-+  +-+-+     +-+-+  +-+-+
+           \      \__   __/      /
+            \        \ /        /
+            |         X         |
+            |    ____/ \____    |
+            |   /           \   |
+          +-+--+-+         +-+--+-+
+          | NET1 |         | NET2 |
+          +------+         +------+
+
+Assume this is modeled in Placement as:
+
+.. code::
+
+    RP1 (represents PF1):
+    {
+        SRIOV_NET_VF=16,
+        NET_EGRESS_BYTES_SEC=1250000000,  # 10Gbps
+        traits: [CUSTOM_NET1, HW_NIC_ACCEL_SSL]
+    }
+    RP2 (represents PF2):
+    {
+        SRIOV_NET_VF=16,
+        NET_EGRESS_BYTES_SEC=1250000000,  # 10Gbps
+        traits: [CUSTOM_NET2, HW_NIC_ACCEL_SSL]
+    }
+    RP3 (represents PF3):
+    {
+        SRIOV_NET_VF=16,
+        NET_EGRESS_BYTES_SEC=125000000,  # 1Gbps
+        traits: [CUSTOM_NET1]
+    }
+    RP4 (represents PF4):
+    {
+        SRIOV_NET_VF=16,
+        NET_EGRESS_BYTES_SEC=125000000,  # 1Gbps
+        traits: [CUSTOM_NET2]
+    }
+
+
+Use Case 1
+~~~~~~~~~~
+As an Operator, I need to be able to express a boot request for an instance
+with **one SR-IOV VF on physical network NET1 and a second SR-IOV VF on
+physical network NET2**.
+
+I expect the scheduler to receive the following allocation candidates:
+
+* ``[RP1(SRIOV_NET_VF:1), RP2(SRIOV_NET_VF:1)]``
+* ``[RP1(SRIOV_NET_VF:1), RP4(SRIOV_NET_VF:1)]``
+* ``[RP3(SRIOV_NET_VF:1), RP2(SRIOV_NET_VF:1)]``
+* ``[RP3(SRIOV_NET_VF:1), RP4(SRIOV_NET_VF:1)]``
+
+This demonstrates the ability to get *different* allocations of the *same*
+resource class from *different* providers in a single request (`Requirement
+1`_).
+
+Use Case 2
+~~~~~~~~~~
+Request: **one VF with egress bandwidth of 10000 bytes/sec**. (No, it doesn't
+make sense that I don't care which physnet I'm on -- mentally replace NET with
+SWITCH if that bothers you.)
+
+Expect:
+
+* ``[RP1(SRIOV_NET_VF:1), RP1(NET_EGRESS_BYTES_SEC:10000)]``
+* ``[RP2(SRIOV_NET_VF:1), RP2(NET_EGRESS_BYTES_SEC:10000)]``
+* ``[RP3(SRIOV_NET_VF:1), RP3(NET_EGRESS_BYTES_SEC:10000)]``
+* ``[RP4(SRIOV_NET_VF:1), RP4(NET_EGRESS_BYTES_SEC:10000)]``
+
+This demonstrates the ability to ensure that allocations of *different*
+resource classes can be made to come from the *same* resource provider
+(`Requirement 2`_).
+
+Use Case 3
+~~~~~~~~~~
+Request:
+
+* **One VF on NET1 with bandwidth 10000 bytes/sec**
+* **One VF on NET2 with bandwidth 20000 bytes/sec on a NIC with SSL
+  acceleration**  (This one should always land on RP2.)
+
+Expect:
+
+| * ``[RP1(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:10000),``
+|   ``RP2(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:20000)]``
+| * ``[RP3(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:10000),``
+|   ``RP2(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:20000)]``
+
+This demonstrates *both* `Requirement 1`_ and `Requirement 2`_.
+
+Use Case 4
+~~~~~~~~~~
+As an Operator, I need to be able to express a request for more than one VF and
+have the request succeed even if my PFs are nearly saturated.  For this use
+case, assume that **each PF resource provider has only two VFs unallocated**.
+I need to be able to express a request for **four VFs on NET1**.
+
+Expect: ``[RP1(SRIOV_NET_VF:2), RP3(SRIOV_NET_VF:2)]``
+
+This demonstrates `Requirement 3`_.
+
+Proposed change
+===============
+
+Numbered Request Groups
+-----------------------
+With the existing syntax (once `Dependencies`_ land), a resource request can be
+logically expressed as:
+
+.. code-block:: python
+
+    resources = { resource_classA: rcA_count,
+                  resource_classB: rcB_count,
+                  ... },
+    required = [ TRAIT_C, TRAIT_D, ... ]
+
+Semantically, each resulting allocation candidate will consist of
+``resource_class``\ *N*: ``rc``\ *N*\ ``_count`` resources spread arbitrarily
+across resource providers within the same tree (i.e. all resource providers in
+a single allocation candidate will have the same ``root_provider_uuid``).
+*Each* resource provider in *each* resulting allocation candidate will possess
+*all* of the listed ``required`` traits.
+
+.. note:: When shared resource providers are fully implemented, the above will
+          read, "...spread arbitrarily across resource providers within the
+          same tree *or aggregate*".
+
+Also, it is unsupported for resource classes or traits to be repeated.
+
+The proposed change is to augment the above to include numbered resource
+groupings as follows:
+
+Logical Representation
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    resources = { resource_classA: rcA_count,
+                  resource_classB: rcB_count,
+                  ... },
+    required = [ TRAIT_C, TRAIT_D, ... ],
+
+    resources1 = { resource_class1A: rc1A_count,
+                   resource_class1B: rc1B_count,
+                   ... },
+    required1 = [ TRAIT_1C, TRAIT_1D, ... ],
+
+    resources2 = { resource_class2A: rc2A_count,
+                   resource_class2B: rc2B_count,
+                   ... },
+    required2 = [ TRAIT_2C, TRAIT_2D, ... ],
+
+    ...,
+
+    resourcesX = { resource_classXA: rcXA_count,
+                   resource_classXB: rcXB_count,
+                   ... },
+    requiredX = [ TRAIT_XC, TRAIT_XD, ... ],
+
+Semantics
+~~~~~~~~~
+The term "results" is used below to refer to the contents of one item in the
+``allocation_requests`` list within the ``GET /allocation_candidates``
+response.
+
+* The semantic for the (single) un-numbered grouping is unchanged.  That is, it
+  may still return results from different RPs in the same tree (or, when
+  "shared" is fully implemented, the same aggregate).
+* However, a numbered group will always return results from the *same* RP.
+  This is to satisfy `Requirement 2`_.
+* Separate groups (numbered or un-numbered) may return results from the same
+  RP.  That is, you are not guaranteeing RP exclusivity by separating groups.
+  (If you want to guarantee such exclusivity, you need to do it with traits.)
+* It is still not supported to repeat a resource class within a given (numbered
+  or un-numbered) ``resources`` grouping, but there is no restriction on
+  repeating a resource class from one grouping to the next.  The same applies
+  to traits.  This is to satisfy `Requirement 1`_.
+* A given ``required``\ *N* list applies *only* to its matching ``resources``\
+  *N* list.  This goes for the un-numbered ``required``/``resources`` as well.
+* The numeric suffixes are arbitrary.  Other than binding ``resources``\ *N* to
+  ``required``\ *N*, they have no implied meaning.  In particular, they are not
+  required to be sequential; and there is no semantic significance to their
+  order.
+* For both numbered and un-numbered ``resources``, a single
+  *resource_class*:*count* will never be split across multiple RPs.
+  While such a split could be seen to be sane for e.g. VFs, it is clearly not
+  valid for e.g. DISK_GB.  If you want to be able to split, use separate
+  numbered groups.  This satisfies `Requirement 3`_.
+* Specifying a ``resources`` (numbered or un-numbered) without a corresponding
+  ``required`` returns results unfiltered by traits.
+* It is an error to specify a ``required`` (numbered or un-numbered) without a
+  corresponding ``resources``.
+
+Syntax In Flavors
+~~~~~~~~~~~~~~~~~
+In reference to the `Logical Representation`_, the existing (once
+`Dependencies`_ have landed) implementation is to specify ``resources`` and
+``required`` traits in the flavor extra_specs as follows:
+
+* Each member of ``resources`` is specified as a separate extra_specs entry of
+  the form:
+
+.. parsed-literal::
+
+    resources:*resource_classA*\ =\ *rcA_count*
+
+* Each member of ``required``  is specified as a separate extra_specs entry of
+  the form:
+
+.. parsed-literal::
+
+    trait:*TRAIT_B*\ =required
+
+For example::
+
+    resources:VCPU=2
+    resources:MEMORY_MB=2048
+    trait:HW_CPU_X86_AVX=required
+    trait:CUSTOM_MAGIC=required
+
+**Proposed:** Allow the same syntax for numbered resource and trait groupings
+via the number being appended to the ``resources`` and ``trait`` keyword:
+
+.. parsed-literal::
+
+    resources\ *N*:*resource_classC*\ =\ *rcC_count*
+    trait\ *N*:*TRAIT_D*\ =required
+
+A given numbered ``resources`` or ``trait`` key may be repeated to specify
+multiple resources/traits in the same grouping, just as with the un-numbered
+syntax.
+
+For example::
+
+    resources:VCPU=2
+    resources:MEMORY_MB=2048
+    trait:HW_CPU_X86_AVX=required
+    trait:CUSTOM_MAGIC=required
+    resources1:SRIOV_NET_VF=1
+    resources1:NET_EGRESS_BYTES_SEC=10000
+    trait1:CUSTOM_PHYSNET_NET1=required
+    resources2:SRIOV_NET_VF=1
+    resources2:NET_EGRESS_BYTES_SEC:20000
+    trait2:CUSTOM_PHYSNET_NET2=required
+    trait2:HW_NIC_ACCEL_SSL=required
+
+Syntax In the Placement API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In reference to the `Logical Representation`_, the existing (once
+`Dependencies`_ have landed) `Placement API`_ implementation is via the ``GET
+/allocation_candidates`` querystring as follows:
+
+* The ``resources`` are grouped together under a single key called
+  ``resources`` whose value is a comma-separated list of
+  ``resource_class``\ *N*:``rc``\ *N*\ ``_count``.
+* The traits are grouped together under a single key called ``required`` whose
+  value is a comma-separated list of *TRAIT_Y*.
+
+For example::
+
+    GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
+        &required=HW_CPU_X86_AVX,CUSTOM_MAGIC
+
+**Proposed:** Allow the same syntax for numbered resource and trait groupings
+via the number being appended to the ``resources`` and ``required`` keywords.
+In the following example, groups 1 and 2 represent `Use Case 3`_::
+
+    GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
+        &required=HW_CPU_X86_AVX,CUSTOM_MAGIC
+        &resources1=SRIOV_NET_VF:1,NET_EGRESS_BYTES_SEC:10000
+        &required1=CUSTOM_PHYSNET_NET1
+        &resources2=SRIOV_NET_VF:1,NET_EGRESS_BYTES_SEC:20000
+        &required2=CUSTOM_PHYSNET_NET2,HW_NIC_ACCEL_SSL
+
+There is no change to the response payload syntax.
+
+Alternatives
+------------
+
+* `Requirement 2`_ could also be expressed via aggregates by associating each
+  RP with a unique aggregate, once shared resource providers are fully
+  implemented.
+
+* We could allow the "number" suffixes to be any arbitrary string.  However,
+  using integers is easy to understand and validate, and obviates worries about
+  escaping/encoding special characters, etc.
+
+* There has been discussion over time about the need for a JSON payload-based
+  API to enable richer expression to request allocation candidates.  While this
+  is still a possibility for the future, it was considered unnecessary in this
+  case, as the current requirements can be met via the proposed (relatively
+  simple) enhancements to the querystring syntax of the existing ``GET
+  /allocation_candidates`` API.
+
+* It has been suggested to include (or at least keep the way open for) syntax
+  that would allow the user to express (anti-)affinity of resources.  The
+  change proposed by this spec leaves a small niche of affinity-related use
+  cases unsatisfied.  The scope and exact form of, and real-world need for,
+  these use cases is poorly understood at this time, and is therefore not
+  addressed by this specification.
+
+Data model impact
+-----------------
+None.
+
+REST API impact
+---------------
+
+See `Syntax In the Placement API`_.  To summarize, the ``GET
+/allocation_candidates`` `Placement API`_ is modified to accept arbitrary query
+parameter keys of the format ``resources``\ *N* and ``required``\ *N*, where
+*N* can be any integer.  The format of the values to these query parameters is
+identical to that of ``resources`` and ``required``, respectively.
+
+Otherwise, there is no REST API impact.
+
+Security impact
+---------------
+None
+
+Notifications impact
+--------------------
+None
+
+Other end user impact
+---------------------
+Operators will need to understand the `Syntax In Flavors`_ and the `Semantics`_
+of the changes in order to create flavors exploiting the new functionality.
+See `Documentation Impact`_.
+
+There is no impact on the nova or openstack CLIs.  The existing CLI syntax is
+adequate for expressing the newly-supported extra_specs keys.
+
+Performance Impact
+------------------
+
+Use of the new syntax results in the ``GET /allocation_candidates`` `Placement
+API`_ effectively doing multiple lookups per request.  This has the potential
+to impact performance in the database by a factor of N+1, where N is the number
+of numbered resource groupings specified in a given request.  Clever SQL
+expression may reduce or eliminate this impact.
+
+There should be no impact outside of the database, as this feature should not
+result in a significant increase in the number of records returned by the ``GET
+/allocation_candidates`` API (if anything, the increased specificity will
+*decrease* the number of results).
+
+Other deployer impact
+---------------------
+None
+
+Developer impact
+----------------
+
+Developers of modules supplying Resource Provider representations (e.g. virt
+drivers) will need to be aware of this feature in order to model their RPs
+appropriately.
+
+Upgrade impact
+--------------
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+* efried
+
+Work Items
+----------
+
+Implementation work was begun in Queens.  Several patches were merged; the
+remaining patches have been started but are waiting on dependencies.
+
+https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/granular-resource-requests
+
+Scheduler
+~~~~~~~~~
+
+* Negotiate microversion capabilities with the `Placement API`_.
+* Recognize and parse the new `Syntax In Flavors`_.
+* If the new flavor extra_specs syntax is recognized and the `Placement API`_
+  is not capable of the appropriate microversion, error.
+* Construct the ``GET /allocation_candidates`` querystring according to the
+  flavor extra_specs.
+* Send the ``GET /allocation_candidates`` request to Placement, specifying the
+  appropriate microversion if the new syntax is in play.
+
+Placement
+~~~~~~~~~
+
+* Publish a new microversion.
+* Recognize and parse the new ``GET /allocation_candidates`` querystring key
+  formats if invoked at the new microversion.
+* Construct the appropriate database query/ies.
+* Everything else is unchanged.
+
+Dependencies
+============
+This work builds on reapproval and completion of the `Nested Resource
+Providers`_ effort.
+
+Testing
+=======
+Functional tests, including gabbits, will be added to exercise the new syntax.
+New fixtures may be required to express some of the more complicated
+configurations, particularly involving nested resource providers.  Test cases
+will be designed to prove various combinations and permutations of the items
+listed in `Semantics`_.  For example, a ``GET /allocation_candidates`` request
+using both numbered and un-numbered groupings against a placement service
+containing multiple nested resource provider trees with three or more levels
+and involving trait propagation.  Migration scenarios will also be tested.
+
+Documentation Impact
+====================
+
+* The `Placement API`_ reference will be updated to describe the new syntax to
+  the ``GET /allocation_candidates`` API.
+* The `Placement Devref`_ will be updated to describe the new microversion.
+* Admin documentation (presumably the same as introduced/enhanced via the
+  `Traits in Flavors`_ effort) will be updated to describe the new `Syntax In
+  Flavors`_.
+
+References
+==========
+
+* `Traits in Flavors`_ spec
+* `Traits in the GET /allocation_candidates API`_ spec
+* `Generic`_ Resource Providers original spec
+* `Nested Resource Providers`_ spec
+* `Placement API`_ reference
+* `Placement Devref`_
+* `<https://etherpad.openstack.org/p/nova-multi-alloc-request-syntax-brainstorm>`_
+* `<https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/granular-resource-requests>`_
+
+.. _`Traits in Flavors`: https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/request-traits-in-nova.html
+.. _`Traits in the GET /allocation_candidates API`: https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/add-trait-support-in-allocation-candidates.html
+.. _`Generic`: https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/resource-providers.html
+.. _`Nested Resource Providers`: https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/nested-resource-providers.html
+.. _`Placement API`: https://developer.openstack.org/api-ref/placement/#list-allocation-candidates
+.. _`Placement Devref`: https://docs.openstack.org/nova/latest/user/placement.html
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Queens
+     - Introduced, approved, implementation started
+   * - Rocky
+     - Reproposed