..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================================
Virt driver pinning guest vCPUs to host pCPUs
=============================================

https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-pinning

This feature aims to improve the libvirt driver so that it is able to strictly
pin guest vCPUs to host pCPUs. This provides the concept of "dedicated CPU"
guest instances.

Problem description
===================

If a host permits overcommit of CPUs, there can be prolonged periods during
which a guest vCPU is not scheduled by the host because another guest is
competing for the CPU time. This means that workloads executing in a guest can
have unpredictable latency, which may be unacceptable for the type of
application being run.

Use Cases
---------

Depending on the workload being executed, the end user or cloud admin may
wish to have control over which physical CPUs (pCPUs) are utilized by the
virtual CPUs (vCPUs) of any given instance.

Project Priority
----------------

None

Proposed change
===============

The flavor extra specs will be enhanced to support one new parameter:

* hw:cpu_policy=shared|dedicated

If the policy is set to 'shared', no change will be made compared to the
current default guest CPU placement policy: the guest vCPUs will be allowed
to float freely across host pCPUs, albeit potentially constrained by NUMA
policy. If the policy is set to 'dedicated', the guest vCPUs will be strictly
pinned to a set of host pCPUs. In the absence of an explicit vCPU topology
request, the virt drivers typically expose all vCPUs as sockets with 1 core
and 1 thread. When strict CPU pinning is in effect, the guest CPU topology
will be set up to match the topology of the CPUs to which it is pinned, i.e.
if a 2 vCPU guest is pinned to a single host core with 2 threads, then the
guest will get a topology of 1 socket, 1 core, 2 threads.
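
The following is a minimal, self-contained sketch (not the actual Nova or
libvirt driver code) of the topology-matching rule just described. The
function name and data shapes are illustrative only; the 'siblings' input
mirrors the 'sib' lists shown later in the data model section.

::

  def guest_topology_for_pinning(pinned_pcpus, siblings):
      # pinned_pcpus: set of host CPU ids the guest vCPUs are pinned to.
      # siblings: list of sets, each holding the thread ids of one host core.
      # Returns (sockets, cores, threads); a single socket/NUMA cell is
      # assumed for simplicity.
      used_per_core = [len(pinned_pcpus & core) for core in siblings
                       if pinned_pcpus & core]
      cores = len(used_per_core)
      threads = max(used_per_core) if used_per_core else 1
      return (1, cores, threads)

  # The example from the text: a 2 vCPU guest pinned to one host core
  # that has 2 threads gets a 1 socket, 1 core, 2 thread topology.
  print(guest_topology_for_pinning({0, 1}, [{0, 1}, {2, 3}]))  # (1, 1, 2)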

The image metadata properties will also allow specification of the pinning
policy:

* hw_cpu_policy=shared|dedicated
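
As a purely illustrative sketch (not Nova code), the effective policy could
be read from the two sources as follows; the function, the example values and
the flavor-over-image precedence are assumptions of the sketch, not something
defined by this spec.

::

  def cpu_policy(flavor_extra_specs, image_properties):
      # Prefer the flavor extra spec, fall back to the image property,
      # and default to the current 'shared' behaviour.
      policy = (flavor_extra_specs.get('hw:cpu_policy')
                or image_properties.get('hw_cpu_policy')
                or 'shared')
      if policy not in ('shared', 'dedicated'):
          raise ValueError('invalid CPU policy: %s' % policy)
      return policy

  print(cpu_policy({'hw:cpu_policy': 'dedicated'}, {}))  # dedicated
  print(cpu_policy({}, {'hw_cpu_policy': 'shared'}))     # shared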

.. NOTE::
    The original definition of this specification included support for
    configurable CPU thread policies. However, this part of the spec was not
    implemented in OpenStack "Kilo" and has since been extracted into a
    separate proposal attached to
    https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-thread-pinning.

The scheduler will have to be enhanced so that it considers the usage of CPUs
by existing guests. Use of a dedicated CPU policy will have to be accompanied
by the setup of aggregates to split the hosts into two groups, one allowing
overcommit of shared pCPUs and the other only allowing dedicated CPU guests,
i.e. we do not want a situation with dedicated CPU and shared CPU guests on
the same host. It is likely that the administrator will already need to set
up host aggregates for the purpose of using huge pages for guest RAM. The
same grouping will be usable for both dedicated RAM (via huge pages) and
dedicated CPUs (via pinning).
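
To make the grouping concrete, here is a minimal, self-contained sketch (not
the Nova scheduler) of the host split described above; the aggregate names,
host names and metadata key are placeholders invented for the example.

::

  # Hosts are partitioned via aggregate metadata into a dedicated-CPU group
  # and a shared/overcommit group; a request is only considered against
  # hosts whose group matches its CPU policy.
  AGGREGATES = {
      'dedicated-cpus': {'metadata': {'pinned': 'true'},
                         'hosts': {'compute1', 'compute2'}},
      'shared-cpus': {'metadata': {'pinned': 'false'},
                      'hosts': {'compute3', 'compute4'}},
  }

  def candidate_hosts(cpu_policy):
      wanted = 'true' if cpu_policy == 'dedicated' else 'false'
      return set().union(*(agg['hosts'] for agg in AGGREGATES.values()
                           if agg['metadata'].get('pinned') == wanted))

  print(sorted(candidate_hosts('dedicated')))  # ['compute1', 'compute2']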

The compute host already has a notion of CPU sockets which are reserved for
execution of base operating system services. This facility will be preserved
unchanged, i.e. dedicated CPU guests will only be placed on CPUs which are not
marked as reserved for the base OS.

Alternatives
------------

There is no alternative way, short of CPU pinning, to ensure that a guest has
predictable execution latency, free of cache effects from other guests
running on the host.

The proposed solution is to use host aggregates for grouping compute hosts
into those for dedicated vs. overcommit CPU policy. An alternative would be to
allow compute hosts to have both dedicated and overcommit guests, splitting
them onto separate sockets, i.e. if there were four sockets, two sockets could
be used for dedicated CPU guests while two sockets could be used for
overcommit guests, with usage determined on a first-come, first-served basis.
A problem with this approach is that there is no strict workload isolation
even if separate sockets are used. Cache effects can still be observed, and
the guests will also contend for memory access, so the overcommit guests can
negatively impact performance of the dedicated CPU guests even when on
separate sockets. So while this would be simpler from an administrative POV,
it would not give the same performance guarantees that are important for NFV
use cases. It would nonetheless be possible to enhance the design in the
future, so that overcommit and dedicated CPU guests could co-exist on the same
host for those use cases where administrative simplicity is more important
than perfect performance isolation. It is believed that it is better to start
off with the simpler-to-implement design, based on host aggregates, for the
first iteration of this feature.

Data model impact
-----------------

The 'compute_node' table will gain a new field to record information about
which host CPUs are available and which are in use by guest instances with
dedicated CPU resources assigned. Similar to the 'numa_topology' field, this
will be a structured data field containing something like

::

  {'cells': [
      {
          'cpuset': '0,1,2,3',
          'sib': ['0,1', '2,3'],
          'pin': '0,2',
          'id': 0
      },
      {
          'cpuset': '4,5,6,7',
          'sib': ['4,5', '6,7'],
          'pin': '4',
          'id': 1
      }
  ]}
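
As an illustration of how this data could be consumed (a sketch only, not the
actual scheduler or resource tracker code), the free dedicated pCPUs of a
host can be derived per cell by subtracting the pinned CPUs from the
available set:

::

  def free_pcpus(cell):
      # 'cpuset' lists the available host CPUs, 'pin' the ones already
      # claimed by dedicated-CPU guests; both use the comma-separated
      # string form shown above.
      def parse(csv):
          return {int(c) for c in csv.split(',') if c != ''}
      return parse(cell['cpuset']) - parse(cell['pin'])

  cell = {'cpuset': '0,1,2,3', 'sib': ['0,1', '2,3'], 'pin': '0,2', 'id': 0}
  print(sorted(free_pcpus(cell)))  # [1, 3]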

The 'instance_extra' table will gain a new field to record information
about which host CPU each guest vCPU is being pinned to, which will also
contain structured data similar to that used in the 'numa_topology' field
of the same table.

::

  {'cells': [
      {
          'id': 0,
          'pin': {0: 0, 1: 3},
          'topo': {'sock': 1, 'core': 1, 'th': 2}
      },
      {
          'id': 1,
          'pin': {2: 1, 3: 2},
          'topo': {'sock': 1, 'core': 1, 'th': 2}
      }
  ]}
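
In the libvirt driver, the per-instance 'pin' mapping would ultimately be
expressed as <vcpupin> elements inside the guest's <cputune> configuration.
The following is a hedged sketch of that translation, not the actual driver
code; the function name and output formatting are illustrative only.

::

  def cputune_xml(cells):
      # Each cell's 'pin' dict maps guest vCPU id -> host pCPU id; emit one
      # <vcpupin> element per guest vCPU.
      lines = ['<cputune>']
      for cell in cells:
          for vcpu, pcpu in sorted(cell['pin'].items()):
              lines.append("  <vcpupin vcpu='%d' cpuset='%d'/>" % (vcpu, pcpu))
      lines.append('</cputune>')
      return '\n'.join(lines)

  instance_pinning = [
      {'id': 0, 'pin': {0: 0, 1: 3}, 'topo': {'sock': 1, 'core': 1, 'th': 2}},
      {'id': 1, 'pin': {2: 1, 3: 2}, 'topo': {'sock': 1, 'core': 1, 'th': 2}},
  ]
  print(cputune_xml(instance_pinning))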

REST API impact
---------------

No impact.

The existing APIs already support arbitrary data in the flavor extra specs.

Security impact
---------------

No impact.

Notifications impact
--------------------

No impact.

The notifications system is not used by this change.

Other end user impact
---------------------

There are no changes that directly impact the end user, other than the fact
that their guest should have more predictable CPU execution latency.

Performance Impact
------------------

No impact.

Other deployer impact
---------------------

The cloud administrator will gain the ability to define flavors which offer
dedicated CPU resources. The administrator will have to place hosts into
groups using aggregates such that the scheduler can separate placement of
guests with dedicated vs shared CPUs. Although not required by this design, it
is expected that the administrator will commonly use the same host aggregates
to group hosts for both CPU pinning and large page usage, since these concepts
are complementary and expected to be used together. This will minimise the
administrative burden of configuring host aggregates.

Developer impact
----------------

It is expected that most hypervisors will have the ability to set up dedicated
pCPUs for guests vs shared pCPUs. The flavor parameter is simple enough that
any Nova driver would be able to support it.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ndipanov

Other contributors:
  berrange
  vladik

Work Items
----------

* Enhance libvirt to support setup of strict CPU pinning for guests when the
  appropriate policy is set in the flavor

Dependencies
============

* Virt driver guest NUMA node placement & topology

  https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement

Testing
=======

It is not practical to test this feature using the gate and tempest at this
time, since effective testing will require that the guests running the tests
be provided with multiple NUMA nodes, each in turn with multiple CPUs.

The Nova docs/source/devref documentation will be updated to include a
detailed set of instructions for manually testing the feature. This will
include testing of the previously developed NUMA and huge pages features
too. This doc will serve as the basis for later writing further automated
tests, as well as a useful basis for writing end user documentation on
the feature.

Documentation Impact
====================

The new flavor parameter available to the cloud administrator needs to be
documented along with recommendations about effective usage. The docs will
also need to mention the compute host deployment pre-requisites, such as the
need to set up aggregates. The testing guide mentioned in the previous section
will provide useful material for updating the docs.

References
==========

Current "big picture" research and design for the topic of CPU and memory
resource utilization and placement. vCPU topology is a subset of this
work:

* https://wiki.openstack.org/wiki/VirtDriverGuestCPUMemoryPlacement

Previously approved for Juno but implementation not completed:

* https://review.openstack.org/93652

Blueprint for pinning guest vCPU threads to host pCPU threads:

* https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-thread-pinning