Merge "Additional upgrade clarifications for cpu-resources"
commit 08c926e6a5
@@ -163,13 +163,6 @@ whether hosts have hyperthreading or not. To this end, we will add the new
 ``HW_CPU_HYPERTHREADING`` trait, which will be reported for hosts where
 hyperthreading is detected.
 
-.. note::
-
-   The ``HW_CPU_HYPERTHREADING`` trait will need to be among the traits that
-   the virt driver cannot always override, since the operator may want to
-   indicate that a single NUMA node on a multi-NUMA-node host is meant for
-   guests that tolerate hyperthread siblings as dedicated CPUs.
-
 .. note::
 
    This has significant implications for the existing CPU thread policies
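The trait named in this hunk is defined in the ``os-traits`` library. As a
rough sketch of how a virt driver might report it (the ``sibling_sets`` input
is an illustrative stand-in, not Nova's actual driver interface):

.. code-block:: python

    import os_traits

    def cpu_traits(sibling_sets):
        """Report HW_CPU_HYPERTHREADING when any core has thread siblings.

        :param sibling_sets: iterable of thread-sibling groups, e.g.
            [{0, 8}, {1, 9}] on a host with hyperthreading enabled.
        """
        has_threads = any(len(group) > 1 for group in sibling_sets)
        return {os_traits.HW_CPU_HYPERTHREADING: has_threads}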
@@ -329,9 +322,10 @@ confusing.
 Data model impact
 -----------------
 
-The ``NUMATopology`` object will need to be updated to include
-``cpu_shared_set`` and ``cpu_dedicated_set`` fields and to deprecate the
-``cpu_set`` field.
+The ``NUMATopology`` object will need to be updated to include a new
+``pcpuset`` field, which complements the existing ``cpuset`` field. In the
+future, we may wish to rename these to e.g. ``cpu_shared_set`` and
+``cpu_dedicated_set``.
 
 REST API impact
 ---------------
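To make the shape of that change concrete, here is a minimal sketch using a
plain dataclass. Nova's real object uses its versioned-object machinery, and
the exact field types here are assumptions:

.. code-block:: python

    from dataclasses import dataclass, field

    @dataclass
    class NUMATopologySketch:
        """Stand-in for the updated NUMA object described above."""
        cpuset: set = field(default_factory=set)   # host CPUs backing VCPU
        pcpuset: set = field(default_factory=set)  # host CPUs backing PCPU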
@@ -400,13 +394,22 @@ situations:
 * `NUMA, CPU Pinning and 'vcpu_pin_set'
   <https://that.guru/blog/cpu-resources/>`__
 
+A key point here is that the new behavior must be opt-in during Train. We
+recognize that operators may need time to upgrade a critical number of compute
+nodes so that they are reporting ``PCPU`` classes. This is reflected at
+numerous points below.
+
 Configuration options
 ~~~~~~~~~~~~~~~~~~~~~
 
+:Summary: A user must unset the ``vcpu_pin_set`` and ``reserved_host_cpus``
+          config options and set one or both of the existing ``[compute]
+          cpu_shared_set`` and new ``[compute] cpu_dedicated_set`` options.
+
 We will deprecate the ``vcpu_pin_set`` config option in Train. If both the
 ``[compute] cpu_dedicated_set`` and ``[compute] cpu_shared_set`` config options
-are set in Train, this option will be ignored entirely and ``[compute]
-cpu_shared_set`` will be used in place of ``vcpu_pin_set`` to calculate the
+are set in Train, the ``vcpu_pin_set`` option will be ignored entirely and
+``[compute] cpu_shared_set`` will be used instead to calculate the
 amount of ``VCPU`` resources to report for each compute node. If the
 ``[compute] cpu_dedicated_set`` option is not set in Train, we will issue a
 warning and fall back to using ``vcpu_pin_set`` as the set of host logical
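All of the options discussed here (``vcpu_pin_set``, ``[compute]
cpu_shared_set``, ``[compute] cpu_dedicated_set``) share Nova's range-based
CPU set syntax. A simplified parser, for illustration only; Nova ships its
own, more thorough implementation:

.. code-block:: python

    def parse_cpu_set(spec):
        """Parse a Nova-style CPU set string such as '0-7,^2,^3'."""
        include, exclude = set(), set()
        for part in spec.split(','):
            part = part.strip()
            target = include
            if part.startswith('^'):  # caret marks an excluded CPU
                target, part = exclude, part[1:]
            if '-' in part:
                lo, hi = map(int, part.split('-'))
                target.update(range(lo, hi + 1))
            else:
                target.add(int(part))
        return include - exclude

    assert parse_cpu_set('0-7,^2,^3') == {0, 1, 4, 5, 6, 7}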
@@ -426,22 +429,42 @@ float across the cores that are supposed to be "dedicated" to the pinned
 instances.
 
 We will also deprecate the ``reserved_host_cpus`` config option in Train. If
-both the ``[compute] cpu_dedicated_set`` and ``[compute] cpu_shared_set``
+either the ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set``
 config options are set in Train, the value of the ``reserved_host_cpus`` config
 option will be ignored and neither the ``VCPU`` nor ``PCPU`` inventories will
 have a reserved value unless explicitly set via the placement API.
 
-If the ``[compute] cpu_dedicated_set`` config option is not set, a warning will
-be logged stating that ``reserved_host_cpus`` is deprecated and that the
-operator should set both ``[compute] cpu_shared_set`` and ``[compute]
-cpu_dedicated_set``.
+If neither the ``[compute] cpu_dedicated_set`` nor ``[compute] cpu_shared_set``
+config options are set, a warning will be logged stating that
+``reserved_host_cpus`` is deprecated and that the operator should set either
+``[compute] cpu_shared_set`` or ``[compute] cpu_dedicated_set``.
 
 The meaning of ``[compute] cpu_shared_set`` will change with this feature, from
 being a list of host CPUs used for emulator threads to a list of host CPUs used
 for both emulator threads and ``VCPU`` resources. Note that because this option
 already exists, we can't rely on its presence to do things like ignore
 ``vcpu_pin_set``, as outlined previously, and must rely on ``[compute]
-cpu_dedicated_set`` instead.
+cpu_dedicated_set`` instead. For this same reason, we will only use ``[compute]
+cpu_shared_set`` to determine the number of ``VCPU`` resources if
+``vcpu_pin_set`` is unset. If ``vcpu_pin_set`` is set, a warning will be logged
+and ``vcpu_pin_set`` will continue to be used to calculate the number of
+``VCPU`` resources available while ``[compute] cpu_shared_set`` will continue
+to be used only for emulator threads.
+
+.. note::
+
+   It is possible that there are already hosts in the wild that have
+   ``[compute] cpu_shared_set`` set but do not have ``vcpu_pin_set`` set.
+   We consider this to be exceptionally unlikely and purposefully ignore
+   this combination. The only reason to define ``[compute] cpu_shared_set`` in
+   Stein or before is to use emulator thread offloading, which is used to
+   isolate the additional work the emulator needs to do from the work the
+   guest OS is doing. It is mainly required for real-time use cases. The use
+   of ``[compute] cpu_shared_set`` without ``vcpu_pin_set`` could result in
+   instance vCPUs being pinned to any host core, including those listed in
+   ``cpu_shared_set``. This would defeat the whole purpose of the feature and
+   is very unlikely to be configured by the performance-conscious users of
+   this feature, hence the reason for the scenario being ignored.
 
 Finally, we will change documentation for the ``cpu_allocation_ratio`` config
 option to make it abundantly clear that this option ONLY applies to ``VCPU``
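Pulled together, the precedence rules in the last two hunks amount to roughly
the following. The function and its inputs are hypothetical; the real handling
lives in Nova's config and resource-tracker code:

.. code-block:: python

    import logging

    LOG = logging.getLogger(__name__)

    def pick_cpu_sets(vcpu_pin_set, shared_set, dedicated_set):
        """Return (CPUs backing VCPU, CPUs backing PCPU) per the rules above."""
        if dedicated_set:
            # New-style configuration: vcpu_pin_set and reserved_host_cpus
            # are ignored entirely.
            return shared_set or set(), dedicated_set
        if vcpu_pin_set:
            # Legacy configuration: vcpu_pin_set still backs VCPU, and
            # shared_set, if set, keeps its emulator-thread-only meaning.
            LOG.warning('vcpu_pin_set is deprecated; set [compute] '
                        'cpu_dedicated_set and/or [compute] cpu_shared_set')
            return vcpu_pin_set, set()
        # No vcpu_pin_set: shared_set, if set, now also sizes VCPU inventory.
        return shared_set or set(), set()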
@@ -450,26 +473,44 @@ and not ``PCPU`` resources
 Flavor extra specs and image metadata properties
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We will alias the ``hw:cpu_policy`` flavor extra spec and ``hw_cpu_policy``
-image metadata option to ``resources=(V|P)CPU:${flavor.vcpus}`` using a
-scheduler prefilter. For flavors/images using the ``shared`` policy, we will
-replace this with the ``resources=VCPU:${flavor.vcpus}`` extra spec, and for
-flavors/images using the ``dedicated`` policy, we will replace this with the
+:Summary: We will attempt to rewrite legacy flavor extra specs and image
+          metadata properties to the new resource types and traits, falling
+          back if no matches are found.
+
+We will alias the legacy ``hw:cpu_policy`` and ``hw:cpu_thread_policy`` flavor
+extra specs and their ``hw_cpu_policy`` and ``hw_cpu_thread_policy`` image
+metadata counterparts to placement requests.
+
+The ``hw:cpu_policy`` flavor extra spec and ``hw_cpu_policy`` image metadata
+option will be aliased to ``resources=(V|P)CPU:${flavor.vcpus}``. For
+flavors/images using the ``shared`` policy, the scheduler will replace this
+with the ``resources=VCPU:${flavor.vcpus}`` extra spec, and for flavors/images
+using the ``dedicated`` policy, we will replace this with the
 ``resources=PCPU:${flavor.vcpus}`` extra spec. Note that this is similar,
 though not identical, to how we currently translate ``Flavor.vcpus`` into a
 placement request for ``VCPU`` resources during scheduling.
 
-In addition, we will alias the ``hw:cpu_thread_policy`` flavor extra spec and
-``hw_cpu_thread_policy`` image metadata option to
-``trait:HW_CPU_HYPERTHREADING`` using a scheduler prefilter. For flavors/images
-using the ``isolate`` policy, we will replace this with
+The ``hw:cpu_thread_policy`` flavor extra spec and ``hw_cpu_thread_policy``
+image metadata option will be aliased to ``trait:HW_CPU_HYPERTHREADING``. For
+flavors/images using the ``isolate`` policy, we will replace this with
 ``trait:HW_CPU_HYPERTHREADING=forbidden``, and for flavors/images using the
 ``require`` policy, we will replace this with the
 ``trait:HW_CPU_HYPERTHREADING=required`` extra spec.
 
+If the request for placement inventory matching these requests fails, we will
+revert to the legacy behavior and query placement once more. This second
+request may return hosts that have been upgraded, but these requests will fail
+once the instance reaches the compute node, as the libvirt driver will reject
+it.
+
 Placement inventory
 ~~~~~~~~~~~~~~~~~~~
 
+:Summary: We will automatically reshape inventory of existing instances using
+          pinned CPUs to use inventory of the ``PCPU`` resource class instead
+          of ``VCPU``. This will happen once the ``[compute]
+          cpu_dedicated_set`` config option is set.
+
 For existing compute nodes that have guests which use dedicated CPUs, the virt
 driver will need to move inventory of existing ``VCPU`` resources (which are
 actually using dedicated host CPUs) to the new ``PCPU`` resource class.
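The aliasing rules above reduce to a few lines. The helper below is an
illustrative sketch, not Nova's actual prefilter API:

.. code-block:: python

    def alias_cpu_policy_specs(extra_specs, vcpus):
        """Rewrite legacy CPU policy specs as placement resources/traits."""
        resources, traits = {}, {}
        policy = extra_specs.get('hw:cpu_policy', 'shared')
        resources['PCPU' if policy == 'dedicated' else 'VCPU'] = vcpus
        thread_policy = extra_specs.get('hw:cpu_thread_policy')
        if thread_policy == 'isolate':
            traits['HW_CPU_HYPERTHREADING'] = 'forbidden'
        elif thread_policy == 'require':
            traits['HW_CPU_HYPERTHREADING'] = 'required'
        return resources, traits

    # A 4-vCPU flavor with hw:cpu_policy=dedicated and
    # hw:cpu_thread_policy=isolate yields:
    #   ({'PCPU': 4}, {'HW_CPU_HYPERTHREADING': 'forbidden'})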
@@ -486,6 +527,32 @@ for the instance itself and N ``PCPU`` allocated to avoid another instance
 using them). This will be considered legacy behavior and won't be supported for
 new instances.
 
+Summary
+~~~~~~~
+
+The final upgrade process will look similar to standard upgrades, though some
+slight changes are necessary:
+
+- Upgrade controllers
+
+- Update compute nodes in batches
+
+  For compute nodes hosting pinned instances:
+
+  - If set, unset ``vcpu_pin_set`` and set ``[compute] cpu_dedicated_set``. If
+    unset, set ``[compute] cpu_dedicated_set`` to the entire range of host
+    CPUs.
+
+  For compute nodes hosting unpinned instances:
+
+  - If set, unset ``vcpu_pin_set`` and set ``[compute] cpu_shared_set``. If
+    unset, no action is necessary unless:
+
+    - ``reserved_host_cpus`` is set, in which case unset it and set
+      ``[compute] cpu_shared_set`` to the entire range of host cores minus
+      the number of host cores you wish to reserve.
+
 
 Implementation
 ==============
 
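The reshape described under ``Placement inventory`` above, reduced to a sketch
for the simple case of a host carrying only pinned instances. Names and shapes
are illustrative, not the real ``update_provider_tree`` flow:

.. code-block:: python

    def reshape(inventories, allocations, cpu_dedicated_set):
        """Move a pinned host's VCPU inventory and allocations to PCPU."""
        if 'VCPU' in inventories and cpu_dedicated_set:
            del inventories['VCPU']
            inventories['PCPU'] = {'total': len(cpu_dedicated_set),
                                   'allocation_ratio': 1.0,
                                   'reserved': 0}
            for alloc in allocations.values():
                vcpus = alloc['resources'].pop('VCPU', 0)
                if vcpus:
                    # Pinned instances consumed VCPU 1:1; convert in place.
                    alloc['resources']['PCPU'] = vcpus
        return inventories, allocations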
@@ -571,3 +638,5 @@ History
      - Proposed again, not accepted
    * - Train
      - Proposed again
+   * - Ussuri
+     - Updated, based on final implementation