From 4b506d0e9f75200940aa41de419cc626b7dfbbbf Mon Sep 17 00:00:00 2001
From: Stephen Finucane
Date: Tue, 18 Jun 2019 16:52:09 +0100
Subject: [PATCH] Additional upgrade clarifications for cpu-resources

Based on feedback from the mailing list [1] and changes during
implementation. Key changes:

- We don't require an operator to set both 'cpu_shared_set' and
  'cpu_dedicated_set'. Clarify why this is the case.

- Add an upgrade summary and summaries to each upgrade step, since this
  is by far the hairiest part of the whole exercise.

- Replace references to an optional prefilter with the scheduler
  "aliasing with optional fallback" functionality that is actually used.

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/thread.html#7084

Change-Id: I468abe984d81c264a588f23d4b3804106339a597
Signed-off-by: Stephen Finucane
Blueprint: cpu-resources
---
 specs/train/approved/cpu-resources.rst | 123 +++++++++++++++++++------
 1 file changed, 96 insertions(+), 27 deletions(-)

diff --git a/specs/train/approved/cpu-resources.rst b/specs/train/approved/cpu-resources.rst
index 6c7dab67f..55d77d192 100644
--- a/specs/train/approved/cpu-resources.rst
+++ b/specs/train/approved/cpu-resources.rst
@@ -163,13 +163,6 @@ whether hosts have hyperthreading or not.
 To this end, we will add the new ``HW_CPU_HYPERTHREADING`` trait, which will
 be reported for hosts where hyperthreading is detected.
 
-.. note::
-
-   The ``HW_CPU_HYPERTHREADING`` trait will need to be among the traits that
-   the virt driver cannot always override, since the operator may want to
-   indicate that a single NUMA node on a multi-NUMA-node host is meant for
-   guests that tolerate hyperthread siblings as dedicated CPUs.
-
 .. note::
 
    This has significant implications for the existing CPU thread policies
@@ -329,9 +322,10 @@ confusing.
 Data model impact
 -----------------
 
-The ``NUMATopology`` object will need to be updated to include
-``cpu_shared_set`` and ``cpu_dedicated_set`` fields and to deprecate the
-``cpu_set`` field.
+The ``NUMATopology`` object will need to be updated to include a new
+``pcpuset`` field, which complements the existing ``cpuset`` field. In the
+future, we may wish to rename these to e.g. ``cpu_shared_set`` and
+``cpu_dedicated_set``.
 
 REST API impact
 ---------------
@@ -400,13 +394,22 @@ situations:
 
 * `NUMA, CPU Pinning and 'vcpu_pin_set' `__
 
+A key point here is that the new behavior must be opt-in during Train. We
+recognize that operators may need time to upgrade a critical number of compute
+nodes so that they are reporting ``PCPU`` classes. This is reflected at
+numerous points below.
+
 Configuration options
 ~~~~~~~~~~~~~~~~~~~~~
 
+:Summary: A user must unset the ``vcpu_pin_set`` and ``reserved_host_cpus``
+          config options and set one or both of the existing ``[compute]
+          cpu_shared_set`` and new ``[compute] cpu_dedicated_set`` options.
+
 We will deprecate the ``vcpu_pin_set`` config option in Train. If both the
 ``[compute] cpu_dedicated_set`` and ``[compute] cpu_shared_set`` config options
-are set in Train, this option will be ignored entirely and ``[compute]
-cpu_shared_set`` will be used in place of ``vcpu_pin_set`` to calculate the
+are set in Train, the ``vcpu_pin_set`` option will be ignored entirely and
+``[compute] cpu_shared_set`` will be used instead to calculate the
 amount of ``VCPU`` resources to report for each compute node.
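+
+For example, a compute node with 16 host CPUs that is dedicated to pinned
+guests and currently keeps CPUs 0-3 free for host processes might be
+reconfigured as follows (the CPU ranges here are purely illustrative)::
+
+    # Stein and earlier (deprecated)
+    [DEFAULT]
+    vcpu_pin_set = 4-15
+
+    # Train and later
+    [compute]
+    cpu_dedicated_set = 4-15
+
+A compute node intended for unpinned guests would instead set ``[compute]
+cpu_shared_set`` to the host CPUs that ``VCPU``-backed guests and emulator
+threads may float across, e.g. ``cpu_shared_set = 4-15``.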
 
 If the ``[compute] cpu_dedicated_set`` option is not set in Train, we will
 issue a warning and fall back to using ``vcpu_pin_set`` as the set of host logical
@@ -426,22 +429,42 @@ float across the cores that are supposed to be "dedicated" to the pinned
 instances.
 
 We will also deprecate the ``reserved_host_cpus`` config option in Train. If
-both the ``[compute] cpu_dedicated_set`` and ``[compute] cpu_shared_set``
+either the ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set``
 config options are set in Train, the value of the ``reserved_host_cpus`` config
 option will be ignored and neither the ``VCPU`` nor ``PCPU`` inventories will
 have a reserved value unless explicitly set via the placement API.
 
-If the ``[compute] cpu_dedicated_set`` config option is not set, a warning will
-be logged stating that ``reserved_host_cpus`` is deprecated and that the
-operator should set both ``[compute] cpu_shared_set`` and ``[compute]
-cpu_dedicated_set``.
+If neither the ``[compute] cpu_dedicated_set`` nor ``[compute] cpu_shared_set``
+config option is set, a warning will be logged stating that
+``reserved_host_cpus`` is deprecated and that the operator should set either
+``[compute] cpu_shared_set`` or ``[compute] cpu_dedicated_set``.
 
 The meaning of ``[compute] cpu_shared_set`` will change with this feature, from
 being a list of host CPUs used for emulator threads to a list of host CPUs used
 for both emulator threads and ``VCPU`` resources. Note that because this option
 already exists, we can't rely on its presence to do things like ignore
 ``vcpu_pin_set``, as outlined previously, and must rely on ``[compute]
-cpu_dedicated_set`` instead.
+cpu_dedicated_set`` instead. For this same reason, we will only use ``[compute]
+cpu_shared_set`` to determine the number of ``VCPU`` resources if
+``vcpu_pin_set`` is unset. If ``vcpu_pin_set`` is set, a warning will be logged
+and ``vcpu_pin_set`` will continue to be used to calculate the number of
+``VCPU`` resources available, while ``[compute] cpu_shared_set`` will continue
+to be used only for emulator threads.
+
+.. note::
+
+   It is possible that there are already hosts in the wild that have
+   ``[compute] cpu_shared_set`` set but do not have ``vcpu_pin_set`` set.
+   We consider this to be exceptionally unlikely and purposefully ignore
+   this combination. The only reason to define ``[compute] cpu_shared_set`` in
+   Stein or before is to use emulator thread offloading, which is used to
+   isolate the additional work the emulator needs to do from the work the
+   guest OS is doing. It is mainly required for real-time use cases. The use
+   of ``[compute] cpu_shared_set`` without ``vcpu_pin_set`` could result in
+   instance vCPUs being pinned to any host core, including those listed in
+   ``cpu_shared_set``. This would defeat the whole purpose of the feature and
+   is very unlikely to be configured by the performance-conscious users of
+   this feature, hence the reason for the scenario being ignored.
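+
+To illustrate the precedence described above, consider a host (hypothetical
+values) that configured emulator thread offloading in Stein and is initially
+left untouched during the upgrade to Train::
+
+    [DEFAULT]
+    vcpu_pin_set = 2-15
+
+    [compute]
+    cpu_shared_set = 0,1
+
+Because ``vcpu_pin_set`` is still set, a warning is logged and ``vcpu_pin_set``
+continues to determine the CPU inventory reported for the host (14 CPUs here),
+while ``[compute] cpu_shared_set`` continues to be used only for emulator
+threads, exactly as in Stein.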
 
 Finally, we will change documentation for the ``cpu_allocation_ratio`` config
 option to make it abundantly clear that this option ONLY applies to ``VCPU``
@@ -450,26 +473,44 @@ and not ``PCPU`` resources
 
 Flavor extra specs and image metadata properties
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We will alias the ``hw:cpu_policy`` flavor extra spec and ``hw_cpu_policy``
-image metadata option to ``resources=(V|P)CPU:${flavor.vcpus}`` using a
-scheduler prefilter. For flavors/images using the ``shared`` policy, we will
-replace this with the ``resources=VCPU:${flavor.vcpus}`` extra spec, and for
-flavors/images using the ``dedicated`` policy, we will replace this with the
+:Summary: We will attempt to rewrite legacy flavor extra specs and image
+          metadata properties to the new resource types and traits, falling
+          back if no matches are found.
+
+We will alias the legacy ``hw:cpu_policy`` and ``hw:cpu_thread_policy`` flavor
+extra specs and their ``hw_cpu_policy`` and ``hw_cpu_thread_policy`` image
+metadata counterparts to placement requests.
+
+The ``hw:cpu_policy`` flavor extra spec and ``hw_cpu_policy`` image metadata
+option will be aliased to ``resources=(V|P)CPU:${flavor.vcpus}``. For
+flavors/images using the ``shared`` policy, the scheduler will replace this
+with the ``resources=VCPU:${flavor.vcpus}`` extra spec, and for flavors/images
+using the ``dedicated`` policy, we will replace this with the
 ``resources=PCPU:${flavor.vcpus}`` extra spec. Note that this is similar,
 though not identical, to how we currently translate ``Flavour.vcpus`` into a
 placement request for ``VCPU`` resources during scheduling.
 
-In addition, we will alias the ``hw:cpu_thread_policy`` flavor extra spec and
-``hw_cpu_thread_policy`` image metadata option to
-``trait:HW_CPU_HYPERTHREADING`` using a scheduler prefilter. For flavors/images
-using the ``isolate`` policy, we will replace this with
+The ``hw:cpu_thread_policy`` flavor extra spec and ``hw_cpu_thread_policy``
+image metadata option will be aliased to ``trait:HW_CPU_HYPERTHREADING``. For
+flavors/images using the ``isolate`` policy, we will replace this with
 ``trait:HW_CPU_HYPERTHREADING=forbidden``, and for flavors/images using the
 ``require`` policy, we will replace this with the
 ``trait:HW_CPU_HYPERTHREADING=required`` extra spec.
 
+If the request for placement inventory matching these extra specs fails, we
+will revert to the legacy behavior and query placement once more. This second
+request may return hosts that have already been upgraded, but such requests
+will fail once the instance reaches the compute node, as the libvirt driver
+will reject them.
+
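+As an illustration, for a hypothetical four-vCPU flavor the aliasing described
+above would behave as follows::
+
+    # flavor extra specs, as set by the operator today (unchanged)
+    hw:cpu_policy=dedicated
+    hw:cpu_thread_policy=require
+
+    # placement request generated by the scheduler for this flavor
+    resources=PCPU:4
+    trait:HW_CPU_HYPERTHREADING=required
+
+If no allocation candidates are found for this request, the scheduler falls
+back to the legacy behavior (in this example, a ``resources=VCPU:4`` request),
+as described above.
+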
 Placement inventory
 ~~~~~~~~~~~~~~~~~~~
 
+:Summary: We will automatically reshape inventory used by existing instances
+          with pinned CPUs from the ``VCPU`` resource class to the new
+          ``PCPU`` resource class. This will happen once the ``[compute]
+          cpu_dedicated_set`` config option is set.
+
 For existing compute nodes that have guests which use dedicated CPUs, the virt
 driver will need to move inventory of existing ``VCPU`` resources (which are
 actually using dedicated host CPUs) to the new ``PCPU`` resource class.
@@ -486,6 +527,32 @@ for the instance itself and N ``PCPU`` allocated to avoid another instance
 using them). This will be considered legacy behavior and won't be supported
 for new instances.
 
+Summary
+~~~~~~~
+
+The final upgrade process will look similar to a standard upgrade, though
+there are some slight changes necessary:
+
+- Upgrade controllers
+
+- Update compute nodes in batches
+
+  For compute nodes hosting pinned instances:
+
+  - If set, unset ``vcpu_pin_set`` and set ``[compute] cpu_dedicated_set``. If
+    unset, set ``[compute] cpu_dedicated_set`` to the entire range of host
+    CPUs.
+
+  For compute nodes hosting unpinned instances:
+
+  - If set, unset ``vcpu_pin_set`` and set ``[compute] cpu_shared_set``. If
+    unset, no action is necessary unless ``reserved_host_cpus`` is set:
+
+    - Unset ``reserved_host_cpus`` and set ``[compute] cpu_shared_set`` to the
+      entire range of host cores minus the number of host cores you wish to
+      reserve.
+
+
 Implementation
 ==============
@@ -571,3 +638,5 @@ History
      - Proposed again, not accepted
    * - Train
      - Proposed again
+   * - Ussuri
+     - Updated, based on final implementation