Merge "re-propose numa with placement"

2020-05-22 09:24:44 +00:00 · 2020-05-22 09:24:44 +00:00 · ad3192d320
parent 216f5941ee e290c1fcac
commit ad3192d320
1 changed files with 652 additions and 0 deletions
--- a/specs/victoria/approved/numa-topology-with-rps.rst
+++ b/specs/victoria/approved/numa-topology-with-rps.rst
@@ -0,0 +1,652 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=====================================
+NUMA Topology with Resource Providers
+=====================================
+
+https://blueprints.launchpad.net/nova/+spec/numa-topology-with-rps
+
+Now that `Nested Resource Providers`_ are supported by both the Placement API
+and Nova compute nodes, we can use the Resource Provider tree to describe the
+relationship between a root Resource Provider (root RP), ie. a compute node,
+and one or more Non-Uniform Memory Access (NUMA) nodes (aka. cells), each of
+which has its own resources, such as memory or PCI devices.
+
+.. note::
+
+  This spec only aims to model resource capabilities for NUMA nodes in a
+  general and fairly abstract manner. It does not address how we should model
+  NUMA-affinitized hardware like PCI devices or GPUs; those relationships will
+  be discussed in a later spec.
+
+
+Problem description
+===================
+
+The NUMATopologyFilter checks a number of resources, including emulator thread
+policies, CPU-pinned instances and memory page sizes. Additionally, it performs
+two different verifications:
+
+- *whether* some host can fit the query because it has enough capacity
+
+- *which* resource(s) should be used for this query (eg. which pCPUs or NUMA
+  node)
+
+
+With NUMA topologies modeled as Placement resources, those two questions could
+be answered by the Placement service in the form of allocation candidates. The
+filter would then *only* be responsible for choosing between those candidates
+in some very specific cases (eg. PCI device NUMA affinity, CPU pinning and NUMA
+anti-affinity).
+
+Accordingly, we could model the host memory and the CPU topologies as a set of
+resource providers arranged in a tree, and just directly allocate resources for
+a specific instance from a resource provider subtree representing a NUMA node
+and its resources.
+
+That said, non resource-related features (like `choosing a specific CPU pin
+within a NUMA node for a vCPU`_) would still be handled only by the virt
+driver, and are not covered by this spec.
+
+Use Cases
+---------
+
+Consider the following NUMA topology for a "2-NUMA nodes, 4 cores" host with no
+Hyper-Threading:
+
+.. code::
+
+    +--------------------------------------+
+    |                  CN1                 |
+    +-+---------------+--+---------------+-+
+      |     NUMA1     |  |     NUMA2     |
+      +-+----+-+----+-+  +-+----+-+----+-+
+        |CPU1| |CPU2|      |CPU3| |CPU4|
+        +----+ +----+      +----+ +----+
+
+Here, CPU1 and CPU2 would share the same memory through a common memory
+controller, while CPU3 and CPU4 would share their own memory.
+
+Ideally, applications that require low-latency memory access from multiple
+vCPUs on the same instance (for parallel computing reasons) would like to
+ensure that those CPU resources are provided by the same NUMA node; otherwise
+some performance penalties would occur (if your application is CPU-bound or
+I/O-bound, of course). For the moment, if you're an operator, you can use
+flavor extra specs to indicate a desired guest NUMA topology for your instance,
+like:
+
+.. code::
+
+  $ openstack flavor set FLAVOR-NAME \
+      --property hw:numa_nodes=FLAVOR-NODES \
+      --property hw:numa_cpus.N=FLAVOR-CORES \
+      --property hw:numa_mem.N=FLAVOR-MEMORY
+
+See all the `NUMA possible extra specs`_ for a flavor.
+
+.. note::
+
+  The example above is only needed when you do not want to divide your
+  virtual CPUs and memory evenly between NUMA nodes, of course.
+
+
+Proposed change
+===============
+
+Given there are a lot of NUMA concerns, let's take an iterative approach to
+the model we agree on.
+
+NUMA nodes being nested Resource Providers
+------------------------------------------
+
+Given virt drivers can amend the provider tree given by the compute node
+ResourceTracker, the libvirt driver could create a child provider for each
+of the 2 sockets, each representing a separate NUMA node.
+
+Since CPU resources are tied to a specific NUMA node, it makes sense to model
+the corresponding resource classes as part of the child NUMA Resource
+Providers. In order to facilitate querying NUMA resources, we propose to
+decorate the NUMA child resource providers with a specific trait named
+``HW_NUMA_ROOT`` that would be on each NUMA *node*. That would help to know
+which hosts would be *NUMA-aware* and which others are not.
+
+Memory is a bit tougher to represent. Modeling a NUMA node as having a given
+amount of attached memory is a reasonable first approach, but it misses the
+point that the smallest allocatable unit of memory you can assign with Nova is
+really a page. Accordingly, we should rather model our NUMA subtree with child
+Resource Providers that each represent memory of a given page size. Since a
+page size is not a *consumable* amount but rather *qualitative* information
+that helps us allocate ``MEMORY_MB`` resources, we propose three traits:
+
+- ``MEMORY_PAGE_SIZE_SMALL`` and ``MEMORY_PAGE_SIZE_LARGE`` would allow us to
+  know whether the memory page size is default or optionally configured.
+
+- ``CUSTOM_MEMORY_PAGE_SIZE_<X>`` where <X> is an integer would allow us to
+  know the size of the page in KB. To make it clear, even if the trait is a
+  custom one, it's important to have a naming convention for it so the
+  scheduler could ask about page sizes without knowing all the traits.
+
+
+.. code::
+
+                                   +-------------------------------+
+                                   |  <CN_NAME>                    |
+                                   |  DISK_GB: 5                   |
+                                   +-------------------------------+
+                                   |  (no specific traits)         |
+                                   +--+---------------------------++
+                                      |                           |
+                                      |                           |
+               +-------------------------+                   +--------------------------+
+               | <NUMA_NODE_0>           |                   | <NUMA_NODE_1>            |
+               | VCPU: 8                 |                   | VCPU: 8                  |
+               | PCPU: 16                |                   | PCPU: 8                  |
+               +-------------------------+                   +--------------------------+
+               | HW_NUMA_ROOT            |                   | HW_NUMA_ROOT             |
+               +-------------------+-----+                   +--------------------------+
+                 /                 |    \                                          /+\
+                 +                 |     \_____________________________          .......
+                 |                 |                                   \
+   +-------------+-----------+   +-+--------------------------+   +-------------------------------+
+   | <RP_UUID>               |   | <RP_UUID>                  |   | <RP_UUID>                     |
+   | MEMORY_MB: 1024         |   | MEMORY_MB: 1024            |   |MEMORY_MB: 10240               |
+   | step_size=1             |   | step_size=2                |   |step_size=1024                 |
+   +-------------------------+   +----------------------------+   +-------------------------------+
+   |MEMORY_PAGE_SIZE_SMALL   |   |MEMORY_PAGE_SIZE_LARGE      |   |MEMORY_PAGE_SIZE_LARGE         |
+   |CUSTOM_MEMORY_PAGE_SIZE_4|   |CUSTOM_MEMORY_PAGE_SIZE_2048|   |CUSTOM_MEMORY_PAGE_SIZE_1048576|
+   +-------------------------+   +----------------------------+   +-------------------------------+
+
+
+.. note::
+
+    As we said above, we don't want to support child PCI devices for Ussuri
+    at the moment. Other existing child RPs of a root compute node, like
+    ones for VGPU resources or bandwidth resources, would still have the
+    compute node as their parent.
+
+NUMA RP
+-------
+
+Resource Provider names for NUMA nodes shall follow the convention
+``nodename_NUMA#``, where ``nodename`` is the hypervisor hostname (given by
+the virt driver) and ``NUMA#`` is literally the string 'NUMA' suffixed with
+the NUMA cell ID, which is provided by the virt driver.
+
+Each NUMA node would then be a child Resource Provider with two resource
+classes:
+
+* ``VCPU``: the number of virtual cores (which cannot be pinned) the NUMA node
+  has.
+* ``PCPU``: the number of cores available for pinning the NUMA node has.
+
+As explained above, it should be decorated with a specific trait:
+``HW_NUMA_ROOT``.
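+
+As a minimal sketch of the naming convention and inventory layout described
+in this section, a NUMA child provider could be described as follows. The
+``numa_provider`` helper and the plain-dict layout are hypothetical and only
+serve as an illustration; they are not part of the Nova or Placement APIs:
+
```python
def numa_provider(nodename, cell_id, vcpus, pcpus):
    """Describe one NUMA child provider as a plain dict (illustrative only).

    nodename: hypervisor hostname reported by the virt driver.
    cell_id:  NUMA cell ID reported by the virt driver.
    """
    return {
        # Naming convention from the spec: <nodename>_NUMA<cell id>
        "name": "%s_NUMA%d" % (nodename, cell_id),
        # Only CPU resource classes live on the NUMA provider itself;
        # memory is held by its page size child providers.
        "inventories": {"VCPU": vcpus, "PCPU": pcpus},
        "traits": ["HW_NUMA_ROOT"],
    }


if __name__ == "__main__":
    print(numa_provider("compute1", 0, vcpus=8, pcpus=16))
```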
+
+Memory pagesize RP
+------------------
+
+Each `NUMA RP`_ should have one child RP for each memory page size available
+on the host, each having a single resource class:
+
+* ``MEMORY_MB``: the amount of memory the NUMA node has in that specific page
+  size.
+
+This RP would be decorated with two traits:
+
+ - either ``MEMORY_PAGE_SIZE_SMALL`` (default if not configured) or
+   ``MEMORY_PAGE_SIZE_LARGE`` (if large pages are configured)
+
+ - the page size: ``CUSTOM_MEMORY_PAGE_SIZE_#`` (where # is the size in KB,
+   defaulting to 4 as the kernel defaults to 4KB pages)
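+
+The trait pair above could be derived from a page size as in the following
+sketch. The ``page_size_traits`` helper is hypothetical, and it assumes that
+only the kernel default page size (4KB here) counts as "small" while every
+other size counts as "large":
+
```python
def page_size_traits(page_size_kb, default_page_size_kb=4):
    """Return the two traits decorating a memory page size provider.

    Assumption: a page size equal to the kernel default is "small",
    anything else is "large".
    """
    if page_size_kb <= 0:
        raise ValueError("page size must be a positive number of KB")
    if page_size_kb == default_page_size_kb:
        qualifier = "MEMORY_PAGE_SIZE_SMALL"
    else:
        qualifier = "MEMORY_PAGE_SIZE_LARGE"
    # Naming convention from the spec: the suffix is the page size in KB.
    return [qualifier, "CUSTOM_MEMORY_PAGE_SIZE_%d" % page_size_kb]


if __name__ == "__main__":
    for size_kb in (4, 2048, 1048576):
        print(size_kb, page_size_traits(size_kb))
```
+
+The three sizes used in the usage example match the three memory providers
+drawn in the tree above.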
+
+
+Compute node RP
+---------------
+
+The root Resource Provider (ie. the compute node) would only provide resources
+for classes that are not NUMA-related. Existing child RPs for vGPUs or
+bandwidth-aware resources should still have this parent (until we discuss
+NUMA affinity for PCI devices).
+
+
+Optionally configured NUMA resources
+------------------------------------
+
+Given there are NUMA workloads but also non-NUMA workloads, it's also important
+for operators to be able to have compute nodes that just accept the latter.
+That said, having the compute node resources split between multiple NUMA nodes
+could be a problem for those non-NUMA workloads if they want to keep the
+existing behaviour.
+
+For example, consider an instance with 2 vCPUs and a host having 2 NUMA nodes,
+each of which only accepts one VCPU. The Placement API wouldn't accept that
+host (given each nested RP only accepts one VCPU). For that reason, we need a
+configuration option saying which resources should be nested.
+To reinforce the above, this means a host would be either NUMA or non-NUMA,
+hence non-NUMA workloads would be set on a specific NUMA node if the host is
+configured so.
+The proposal we make here is:
+
+.. code::
+
+  [compute]
+  enable_numa_reporting_to_placement = <bool> (default None for Ussuri)
+
+
+Below, we call hosts that have this option set to ``True`` "NUMA-aware".
+Hosts that have this option set to ``False`` are explicitly asked to keep the
+legacy behaviour and are called "non-NUMA-aware".
+
+Depending on the value of the option, Placement would or would not accept a
+host for a given request. The resulting matrix is::
+
+  +----------------------------------------+----------+-----------+----------+
+  | ``enable_numa_reporting_to_placement`` | ``None`` | ``False`` | ``True`` |
+  +========================================+==========+===========+==========+
+  | NUMA-aware flavors                     | Yes      | No        | Yes      |
+  +----------------------------------------+----------+-----------+----------+
+  | NUMA-agnostic flavors                  | Yes      | Yes       | No       |
+  +----------------------------------------+----------+-----------+----------+
+
+where ``Yes`` means that there could be allocation candidates from this host,
+while ``No`` means that no allocation candidates will be returned.
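+
+The matrix above can be restated as a small predicate. This is only a sketch
+of the table; ``may_return_candidates`` is a hypothetical name and does not
+exist in Nova:
+
```python
def may_return_candidates(enable_numa_reporting, flavor_is_numa_aware):
    """Tell whether a host may yield allocation candidates for a flavor.

    enable_numa_reporting: None, False or True, i.e. the host's value of
        ``enable_numa_reporting_to_placement``.
    flavor_is_numa_aware: whether the flavor asks for a NUMA topology.
    """
    if enable_numa_reporting is None:
        # Transition period: the host accepts both kinds of flavors.
        return True
    # NUMA-aware hosts only take NUMA-aware flavors, and
    # non-NUMA-aware hosts only take NUMA-agnostic flavors.
    return enable_numa_reporting == flavor_is_numa_aware


if __name__ == "__main__":
    for option in (None, False, True):
        for numa_flavor in (True, False):
            print(option, numa_flavor,
                  may_return_candidates(option, numa_flavor))
```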
+
+In order to distinguish compute nodes that have the ``False`` value from those
+that have ``None``, we will decorate the former with a specific trait named
+``HW_NON_NUMA``. Accordingly, we will query Placement with this trait as a
+forbidden trait so as *not* to get nodes that operators explicitly don't want
+to support NUMA-aware flavors.
+
+.. note::
+   By default, the value for that configuration option will be ``None`` for
+   upgrade reasons. By the Ussuri timeframe, operators will have to decide
+   which hosts they want to support NUMA-aware instances and which should be
+   dedicated to 'non-NUMA-aware' instances. A ``nova-status pre-upgrade check``
+   command will be provided that will warn them to decide before upgrading to
+   Victoria, if the default value is about to change, as we could decide later
+   in this cycle. Once we stop supporting ``None`` (in Victoria or later), the
+   ``HW_NON_NUMA`` trait would no longer be needed so we could stop querying
+   it.
+
+.. note::
+   Since we allow a transition period to help operators decide, we
+   will also make clear that this is a one-way change and that we won't
+   support turning a NUMA-aware host back into a non-NUMA-aware host.
+
+See the `Upgrade impact`_ section for further details.
+
+.. note:: Since the discovery of a NUMA topology is done by virt drivers, the
+          population of those nested Resource Providers necessarily has to
+          be done by each virt driver. Consequently, while the above
+          configuration option is said to be generic, the use of this option
+          for populating the Resource Providers tree will only be done by
+          the virt drivers. Of course, a shared module could be imagined for
+          the sake of consistency between drivers, but this is an
+          implementation detail.
+
+
+The very simple case: I don't care about a NUMA-aware instance
+--------------------------------------------------------------
+
+For flavors just asking for, say, vCPUs and memory without asking them to be
+NUMA-aware, we will make a single Placement call asking to *not* land
+them on a NUMA-aware host::
+
+    resources=VCPU:<X>,MEMORY_MB:<Y>
+    &required=!HW_NUMA_ROOT
+
+In this case, even if NUMA-aware hosts have enough resources for this query,
+the Placement API won't return them but only non-NUMA-aware ones (given the
+forbidden ``HW_NUMA_ROOT`` trait).
+We give operators the possibility to shard their clouds between
+NUMA-aware hosts and non-NUMA-aware hosts, but that does not really change the
+current behaviour, where operators create aggregates to make sure
+non-NUMA-aware instances can't land on NUMA-aware hosts.
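+
+As a sketch, the query string for such a NUMA-agnostic flavor could be built
+as follows, using the ``CLASS:AMOUNT`` syntax of the Placement ``resources``
+parameter. ``non_numa_query`` is a hypothetical helper name:
+
```python
def non_numa_query(vcpus, memory_mb):
    """Build the single-call query for a flavor that is not NUMA-aware.

    The forbidden HW_NUMA_ROOT trait keeps NUMA-aware hosts out of the
    returned allocation candidates.
    """
    return ("resources=VCPU:%d,MEMORY_MB:%d"
            "&required=!HW_NUMA_ROOT" % (vcpus, memory_mb))


if __name__ == "__main__":
    print(non_numa_query(2, 4096))
```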
+
+See the `Upgrade impact`_ section for rolling upgrade situations where clouds
+are partially upgraded to Ussuri and where only a very few nodes are reshaped.
+
+
+Asking for NUMA-aware vCPUs
+---------------------------
+
+As NUMA-aware hosts have a specific topology with memory being in a grand-child
+RP, we basically need to ensure we can translate the existing expressiveness in
+the flavor extra specs into a Placement allocation candidates query that asks
+for parenting between the NUMA RP containing the ``VCPU`` resources and the
+memory pagesize RP containing the ``MEMORY_MB`` resources.
+
+Accordingly, here are some examples:
+
+* for a flavor of 8 VCPUs, 8GB of RAM and ``hw:numa_nodes=2``::
+
+    resources_MEM1=MEMORY_MB:4096
+    &required_MEM1=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC1=VCPU:4
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_MEM1,_PROC1,_NUMA1
+    &resources_MEM2=MEMORY_MB:4096
+    &required_MEM2=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC2=VCPU:4
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_MEM2,_PROC2,_NUMA2
+    &group_policy=none
+
+
+.. note::
+   We use ``none`` as the value for ``group_policy``, which means that in this
+   example, the allocations for the ``PROC1`` and ``PROC2`` groups can come
+   from the same provider, which defeats the purpose of having the resources
+   separated into different NUMA nodes (which is the purpose of
+   ``hw:numa_nodes=2``). This is OK as we will also modify the
+   ``NUMATopologyFilter`` to only accept allocation candidates for a host that
+   are in different NUMA nodes.
+   It will probably be implemented in the ``nova.virt.hardware`` module but
+   that's an implementation detail.
+
+* for a flavor of 8 VCPUs, 8GB of RAM and ``hw:numa_nodes=1``::
+
+    resources_MEM1=MEMORY_MB:8192
+    &required_MEM1=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC1=VCPU:8
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_MEM1,_PROC1,_NUMA1
+
+* for a flavor of 8 VCPUs, 8GB of RAM and
+  ``hw:numa_nodes=2&hw:numa_cpus.0=0,1&hw:numa_cpus.1=2,3,4,5,6,7``::
+
+    resources_MEM1=MEMORY_MB:4096
+    &required_MEM1=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC1=VCPU:2
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_MEM1,_PROC1,_NUMA1
+    &resources_MEM2=MEMORY_MB:4096
+    &required_MEM2=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC2=VCPU:6
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_MEM2,_PROC2,_NUMA2
+    &group_policy=none
+
+* for a flavor of 8 VCPUs, 8GB of RAM and
+  ``hw:numa_nodes=2&hw:numa_cpus.0=0,1&hw:numa_mem.0=1024
+  &hw:numa_cpus.1=2,3,4,5,6,7&hw:numa_mem.1=7168``::
+
+    resources_MEM1=MEMORY_MB:1024
+    &required_MEM1=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC1=VCPU:2
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_MEM1,_PROC1,_NUMA1
+    &resources_MEM2=MEMORY_MB:7168
+    &required_MEM2=MEMORY_PAGE_SIZE_SMALL
+    &resources_PROC2=VCPU:6
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_MEM2,_PROC2,_NUMA2
+    &group_policy=none
+
+As you can see, unless the flavor specifies an explicit per-node split, the
+``VCPU`` and ``MEMORY_MB`` values are the result of dividing the flavor's
+vCPUs and memory, respectively, by the value of ``hw:numa_nodes`` (which is
+actually already calculated and provided as NUMATopology object information in
+the RequestSpec object).
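+
+The translation illustrated by the examples above could be sketched as
+follows. Both ``even_cells`` and ``numa_query`` are hypothetical helpers
+written for this illustration; the real translation would work from the
+NUMATopology information in the RequestSpec, and the even split below
+assumes values that divide evenly:
+
```python
def even_cells(vcpus, memory_mb, numa_nodes):
    """Split a flavor evenly into (vcpus, memory_mb) guest NUMA cells."""
    if vcpus % numa_nodes or memory_mb % numa_nodes:
        raise ValueError("this sketch only handles evenly divisible flavors")
    return [(vcpus // numa_nodes, memory_mb // numa_nodes)] * numa_nodes


def numa_query(cells, page_trait="MEMORY_PAGE_SIZE_SMALL"):
    """Build the allocation candidates query for a list of guest cells.

    cells: list of (vcpus, memory_mb) tuples, one per guest NUMA node.
    page_trait: trait required on the memory provider, or None for none.
    """
    params = []
    for i, (vcpus, memory_mb) in enumerate(cells, start=1):
        params.append("resources_MEM%d=MEMORY_MB:%d" % (i, memory_mb))
        if page_trait is not None:
            params.append("required_MEM%d=%s" % (i, page_trait))
        params.append("resources_PROC%d=VCPU:%d" % (i, vcpus))
        params.append("required_NUMA%d=HW_NUMA_ROOT" % i)
        # Tie memory and CPU groups to the same NUMA provider subtree.
        params.append("same_subtree=_MEM%d,_PROC%d,_NUMA%d" % (i, i, i))
    if len(cells) > 1:
        params.append("group_policy=none")
    return "&".join(params)


if __name__ == "__main__":
    # hw:numa_nodes=2 with 8 vCPUs and 8GB of RAM
    print(numa_query(even_cells(8, 8192, 2)).replace("&", "\n&"))
    # explicit per-node split: 2 vCPUs/1GB and 6 vCPUs/7GB
    print(numa_query([(2, 1024), (6, 7168)]).replace("&", "\n&"))
```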
+
+.. note::
+   The translation mechanism from a flavor-based request into Placement query
+   will be handled by the scheduler service.
+
+.. note::
+   Since memory is provided as grand-child, we need to always ask for a
+   ``MEMORY_PAGE_SIZE_SMALL`` which is the default.
+
+
+Asking for specific memory page sizes
+-------------------------------------
+
+
+Operators defining a flavor of 2 vCPUs, 4GB of RAM and
+``hw:mem_page_size=2MB,hw:numa_nodes=2`` will see that the Placement query will
+become::
+
+    resources_PROC1=VCPU:1
+    &resources_MEM1=MEMORY_MB:2048
+    &required_MEM1=CUSTOM_MEMORY_PAGE_SIZE_2048
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_PROC1,_MEM1,_NUMA1
+    &resources_PROC2=VCPU:1
+    &resources_MEM2=MEMORY_MB:2048
+    &required_MEM2=CUSTOM_MEMORY_PAGE_SIZE_2048
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_PROC2,_MEM2,_NUMA2
+    &group_policy=none
+
+If you only want large page size support without really specifying which size
+(eg. by specifying ``hw:mem_page_size=large`` instead of, say, ``2MB``), then
+the same request for large pages would translate into::
+
+    resources_PROC1=VCPU:1
+    &resources_MEM1=MEMORY_MB:2048
+    &required_MEM1=MEMORY_PAGE_SIZE_LARGE
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_PROC1,_MEM1,_NUMA1
+    &resources_PROC2=VCPU:1
+    &resources_MEM2=MEMORY_MB:2048
+    &required_MEM2=MEMORY_PAGE_SIZE_LARGE
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_PROC2,_MEM2,_NUMA2
+    &group_policy=none
+
+Asking the same with ``hw:mem_page_size=small`` would translate into::
+
+    resources_PROC1=VCPU:1
+    &resources_MEM1=MEMORY_MB:2048
+    &required_MEM1=MEMORY_PAGE_SIZE_SMALL
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_PROC1,_MEM1,_NUMA1
+    &resources_PROC2=VCPU:1
+    &resources_MEM2=MEMORY_MB:2048
+    &required_MEM2=MEMORY_PAGE_SIZE_SMALL
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_PROC2,_MEM2,_NUMA2
+    &group_policy=none
+
+And finally, asking with ``hw:mem_page_size=any`` would mean::
+
+    resources_PROC1=VCPU:1
+    &resources_MEM1=MEMORY_MB:2048
+    &required_NUMA1=HW_NUMA_ROOT
+    &same_subtree=_PROC1,_MEM1,_NUMA1
+    &resources_PROC2=VCPU:1
+    &resources_MEM2=MEMORY_MB:2048
+    &required_NUMA2=HW_NUMA_ROOT
+    &same_subtree=_PROC2,_MEM2,_NUMA2
+    &group_policy=none
+
+
+.. note:: As we said for vCPUs, given we query with ``group_policy=none``,
+   allocation candidates could be within the same NUMA node, but that's fine
+   since we also said that the scheduler filter would then not accept
+   them if there is a ``hw:numa_nodes=X`` there.
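+
+The mapping from ``hw:mem_page_size`` values to the trait required on the
+memory group, as used in the four queries above, could be sketched like this.
+``required_page_trait`` is a hypothetical helper, and its unit parsing is a
+simplification that assumes only ``KB``, ``MB`` and ``GB`` suffixes or a bare
+number of KB:
+
```python
_UNITS_IN_KB = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}


def required_page_trait(mem_page_size):
    """Map an hw:mem_page_size value to the trait to require, if any."""
    value = str(mem_page_size).strip()
    keyword = value.lower()
    if keyword == "any":
        return None  # no trait is required on the memory group
    if keyword == "small":
        return "MEMORY_PAGE_SIZE_SMALL"
    if keyword == "large":
        return "MEMORY_PAGE_SIZE_LARGE"
    # Explicit size: convert to KB to match the custom trait convention.
    upper = value.upper()
    for unit, factor in _UNITS_IN_KB.items():
        if upper.endswith(unit):
            size_kb = int(upper[:-len(unit)]) * factor
            return "CUSTOM_MEMORY_PAGE_SIZE_%d" % size_kb
    return "CUSTOM_MEMORY_PAGE_SIZE_%d" % int(upper)


if __name__ == "__main__":
    for value in ("2MB", "large", "small", "any"):
        print(value, required_page_trait(value))
```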
+
+The fallback case for NUMA-aware flavors
+----------------------------------------
+
+In the `Optionally configured NUMA resources`_ section, we said that we would
+want to allow NUMA-aware flavors to land on hosts that have the
+``enable_numa_reporting_to_placement`` option set to ``None``. Since we can't
+yet build an ``OR`` query for allocation candidates, we propose to make another
+call to Placement.
+In this specific call (which we name a fallback call), we want to get all
+non-reshaped nodes that are *not* explicitly said to not support NUMA.
+In this case, the request is fairly trivial since we decorated the nodes that
+explicitly don't support NUMA with the ``HW_NON_NUMA`` trait::
+
+  resources=VCPU:<X>,MEMORY_MB:<Y>
+  &required=!HW_NON_NUMA,!HW_NUMA_ROOT
+
+Then we would get all compute nodes that have the ``None`` value (including
+nodes that are still running the Train release in a rolling upgrade fashion).
+
+Of course, we would get nodes that could potentially *not* accept the
+NUMA-aware flavor, but we rely on the ``NUMATopologyFilter`` to not select
+them, exactly like what we do in Train.
+
+There is an open question about whether we should do the fallback call only
+if the NUMA-specific call returns no candidates, or whether we should issue
+the two calls either way and merge the results.
+The former is better for performance reasons since we avoid a potentially
+unnecessary call, but it could generate some spread/pack affinity issues.
+We all agree that we can leave the question unresolved for now and
+defer the resolution to the implementation phase.
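+
+To make the two options concrete, here is a sketch of the fallback query and
+of both call strategies. All names are hypothetical; ``get_candidates``
+stands for whatever callable issues the allocation candidates request and
+returns a list, and neither strategy is the chosen one:
+
```python
def fallback_query(vcpus, memory_mb):
    """Build the fallback query targeting hosts left at ``None``."""
    return ("resources=VCPU:%d,MEMORY_MB:%d"
            "&required=!HW_NON_NUMA,!HW_NUMA_ROOT" % (vcpus, memory_mb))


def fallback_if_empty(get_candidates, numa_query, fallback):
    """Option 1: only issue the fallback call if the first one is empty."""
    candidates = get_candidates(numa_query)
    if candidates:
        return candidates
    return get_candidates(fallback)


def always_merge(get_candidates, numa_query, fallback):
    """Option 2: always issue both calls and merge the results."""
    return get_candidates(numa_query) + get_candidates(fallback)


if __name__ == "__main__":
    # Stand-in for the Placement call, for illustration only.
    def fake_get_candidates(query):
        return ["reshaped-host"] if "same_subtree" in query else ["old-host"]

    numa = "resources_PROC1=VCPU:2&same_subtree=_PROC1,_NUMA1"
    print(fallback_if_empty(fake_get_candidates, numa, fallback_query(2, 4096)))
    print(always_merge(fake_get_candidates, numa, fallback_query(2, 4096)))
```
+
+Option 1 saves a call whenever reshaped hosts have capacity, while option 2
+lets the scheduler weigh reshaped and non-reshaped hosts together.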
+
+Alternatives
+------------
+
+Modeling of NUMA resources could be done by using specific NUMA resource
+classes, like ``NUMA_VCPU`` or ``NUMA_MEMORY_MB`` that would only be set for
+children NUMA resource providers, and where ``VCPU`` and ``MEMORY_MB`` resource
+classes would only be set on the root Resource Provider (here the compute
+node).
+
+If the Placement allocations candidates API was also able to provide a way to
+say 'you can split the resources between resource providers', we wouldn't need
+to carry a specific configuration option for a long time. All hosts would then
+be reshaped to be NUMA-aware but then non-NUMA-aware instances could
+potentially land on those hosts. That wouldn't change the fact that for
+optimal capacity, operators need to shard their clouds between NUMA workloads
+and non-NUMA ones, but from a Placement perspective, all hosts would be equal.
+This alternative proposal has largely already been discussed in the
+`Placement can_split`_ spec, but the consensus was that it was very
+difficult to implement and potentially not worth the difficulty.
+
+Data model impact
+-----------------
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+None
+
+Notifications impact
+--------------------
+None
+
+Other end user impact
+---------------------
+
+None, flavors won't need to be modified since we will provide a translation
+mechanism. That said, we will explicitly explain in the documentation that
+we won't support any placement-like extra specs in flavors.
+
+Performance Impact
+------------------
+
+A reshape is done only when the configuration option is changed to ``True``.
+
+Other deployer impact
+---------------------
+
+Operators may want to migrate some instances from some hosts to others before
+explicitly enabling or disabling NUMA awareness on their nodes, since they will
+have to shard their cloud and account for capacity usage accordingly. That
+said, this would only be necessary for clouds that were not already dividing
+NUMA-aware and non-NUMA-aware workloads between hosts through aggregates.
+
+Developer impact
+----------------
+
+None, except virt driver maintainers.
+
+Upgrade impact
+--------------
+
+As described above, in order to prevent a flavor update during upgrade, we will
+provide a translation mechanism that will take the existing
+flavor extra spec properties and transform them into Placement numbered groups
+query.
+
+Since there will be a configuration option for making a host NUMA-aware, the
+corresponding allocations have to change accordingly. Hence, the virt drivers
+will be responsible for providing a reshape mechanism that will eventually call
+the `Placement API /reshaper endpoint`_ when starting the compute service. This
+reshape implementation will absolutely need to consider the Fast Forward
+Upgrade (FFU) strategy, where the whole control plane is down, and should
+possibly document any extra step required for FFU, with an eventual removal in
+a couple of releases once all deployers no longer need this support.
+
+Last but not least, we will provide a transition period (at least during
+the Ussuri timeframe) where operators can decide which hosts to dedicate to
+NUMA-aware workloads. A specific ``nova-status pre-upgrade check`` command
+will warn them to do so before upgrading to Victoria.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+* sean-k-mooney
+* bauzas
+
+
+Feature Liaison
+---------------
+bauzas
+
+Work Items
+----------
+
+* libvirt driver passing NUMA topology through ``update_provider_tree()`` API
+* Hyper-V driver passing NUMA topology through ``update_provider_tree()`` API
+* Possible work on the NUMATopologyFilter to look at the candidates
+* Scheduler translating flavor extra specs for NUMA properties into Placement
+  queries
+* ``nova-status pre-upgrade check`` command
+
+
+Dependencies
+============
+
+None.
+
+
+Testing
+=======
+
+Functional tests and unittests.
+
+Documentation Impact
+====================
+
+None.
+
+References
+==========
+
+* _`Nested Resource Providers`: https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/nested-resource-providers.html
+* _`choosing a specific CPU pin within a NUMA node for a vCPU`: https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#customizing-instance-cpu-pinning-policies
+* _`NUMA possible extra specs`: https://docs.openstack.org/nova/latest/admin/flavors.html#extra-specs-numa-topology
+* _`Huge pages`: https://docs.openstack.org/nova/latest/admin/huge-pages.html
+* _`Placement API /reshaper endpoint`: https://developer.openstack.org/api-ref/placement/?expanded=id84-detail#reshaper
+* _`Placement can_split`: https://review.opendev.org/#/c/658510/
+* _`physical CPU resources`: https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Ussuri
+     - Introduced
+   * - Victoria
+     - Re-proposed