I/O (PCIe) based NUMA scheduling

This feature will add support for intelligent NUMA node placement for
guests that have been assigned a host's PCI device, avoiding unnecessary
memory transactions.

Implements: blueprint input-output-based-numa-scheduling
Change-Id: I0bb5ac4ccf4a2defd9fd2a186f0a13dc3717d83b

specs/juno/input-output-based-numa-scheduling.rst (new file, 183 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
I/O (PCIe) based NUMA scheduling
================================

https://blueprints.launchpad.net/nova/+spec/input-output-based-numa-scheduling

I/O based NUMA scheduling will add support for intelligent NUMA node placement
for guests that have been assigned a host PCI device, avoiding unnecessary
memory transactions.

Problem description
===================

Currently it is common for virtualisation host platforms to have multiple
NUMA nodes.

An optimal configuration is one where the guest's assigned PCI device and RAM
allocation are associated with the same NUMA node, ensuring that there is no
cross NUMA node memory traffic.

To reach a remote NUMA node, a memory request must traverse the inter-CPU
link and use the remote NUMA node's associated memory controller. This incurs
a latency penalty on remote memory access which is not desirable for NFV
workloads.

OpenStack needs to offer more fine-grained control of NUMA configuration in
order to deliver higher-performing, lower-latency guest applications. The
default guest placement policy is to use any available pCPU or NUMA node.

Proposed change
===============

Libvirt now reports the NUMA node with which a PCI device is associated; we
will use this information to populate the nova DB. For versions of libvirt
that do not provide this information, we will add a fallback mechanism that
queries the host directly.
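
Libvirt reports this association in its node device XML (a
``<numa node='N'/>`` element under the PCI capability). A minimal sketch of
the sysfs fallback, assuming the standard Linux layout
(``/sys/bus/pci/devices/<address>/numa_node``); the helper name is
illustrative only, not the final interface::

    def get_pci_numa_node(pci_address):
        """Read a PCI device's NUMA node from sysfs (fallback path)."""
        path = "/sys/bus/pci/devices/%s/numa_node" % pci_address
        try:
            with open(path) as f:
                return int(f.read().strip())
        except (IOError, ValueError):
            # The kernel reports -1 when no association is known; use the
            # same sentinel when the file is missing or unreadable.
            return -1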

Logic will be added to the nova scheduler to allow it to decide which host is
best able to satisfy a guest's PCI NUMA node requirements.

Logic similar to that added to the nova scheduler will be added to the
libvirt driver to allow it to decide on which NUMA node to place the guest.
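
As a rough sketch of that decision (the function and its inputs are
illustrative assumptions, not the final design), the driver would prefer a
NUMA node that both fits the guest's memory and is local to its assigned PCI
devices::

    def pick_numa_node(guest_mem_mb, free_mem_mb_by_node, pci_numa_nodes):
        """Prefer a PCI-local NUMA node with enough free memory.

        free_mem_mb_by_node: dict of NUMA node id -> free memory in MB
        pci_numa_nodes: set of node ids hosting the assigned PCI devices
        """
        fits = sorted(n for n, free in free_mem_mb_by_node.items()
                      if free >= guest_mem_mb)
        local = [n for n in fits if n in pci_numa_nodes]
        if local:
            return local[0]
        # No PCI-local node fits; fall back to any node with enough memory.
        return fits[0] if fits else None

For example, ``pick_numa_node(4096, {0: 2048, 1: 8192}, {1})`` returns node
1, keeping the guest's RAM local to its device.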

Alternatives
------------

Libvirt supports integration with a NUMA daemon (numad) that monitors NUMA
topology and usage. It attempts to locate guests for optimum NUMA locality,
dynamically adjusting to changing system conditions.

This is insufficient because we need this intelligence in nova for host
selection and node deployment.

Data model impact
-----------------

The PciDevice model will be extended with a field identifying the NUMA node
the PCI device is associated with::

    numa_node = Column(Integer, nullable=False, default=-1)

A DB migration script will use ALTER TABLE to add the new column to the
pci_devices table in the nova DB.
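
A minimal sketch of that migration's ``upgrade`` step, using a plain ALTER
TABLE statement as described above (script name and version numbering in
nova's migrate repository are left unspecified here)::

    def upgrade(migrate_engine):
        # -1 matches the model default and means "NUMA node unknown".
        migrate_engine.execute(
            "ALTER TABLE pci_devices "
            "ADD COLUMN numa_node INTEGER NOT NULL DEFAULT -1")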

REST API impact
---------------

There will be no change to the REST API.

Security impact
---------------

This blueprint does not introduce any new security issues.

Notifications impact
--------------------

This blueprint does not introduce new notifications.

Other end user impact
---------------------

This blueprint adds no other end user impact.

Performance Impact
------------------

Associating a guest's PCI device and RAM allocation with the same NUMA node
provides an optimal configuration, giving improved I/O throughput and reduced
memory latency compared with the default libvirt guest placement policy.

This feature will add some scheduling overhead, but that overhead will be
repaid by improved performance on the host.

The optimisation described here depends on the guest CPU and RAM allocation
being associated with the same NUMA node. That capability is described in the
"Virt driver guest NUMA node placement & topology" blueprint referenced in
the Dependencies section.

Other deployer impact
---------------------

To use this feature the deployer must use hardware that is capable of
reporting NUMA-related information to the operating system.

Developer impact
----------------

This blueprint will have no developer impact.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  James Chapman

Other contributors:
  Przemyslaw Czesnowicz
  Sean Mooney
  Adrian Hoban

Work Items
----------

* Add a NUMA node attribute to the pci_device object
* Use libvirt to discover a host PCI device's NUMA node association
* Enable nova compute to synchronise PCI device NUMA node associations with
  the nova DB
* Enable the libvirt driver to configure guests with the requested PCI device
  NUMA node associations
* Enable the nova scheduler to decide which host is best able to support a
  guest
* Enable the libvirt driver to decide on which NUMA node to place a guest

Dependencies
============

The blueprint listed below will define a policy used by the scheduler to
decide on which host to place a guest. We plan to respect this policy while
extending it to add support for a PCI device's NUMA node association.

Virt driver guest NUMA node placement & topology

* https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement

The blueprint listed below will support use cases requiring SR-IOV NICs to
participate in neutron-managed networks.

Enable a nova instance to be booted up with neutron SRIOV ports

* https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov

Testing
=======

Scenario tests will be added to validate these modifications.

Documentation Impact
====================

This feature will not add a new scheduling filter, but as it depends on the
blueprint mentioned in the Dependencies section we will need to extend that
blueprint's filter. We will add documentation as required.

References
==========

Support for NUMA and VCPU topology configuration

* https://blueprints.launchpad.net/nova/+spec/virt-driver-guest-cpu-memory-placement

Virt driver guest NUMA node placement & topology

* https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement

Enable a nova instance to be booted up with neutron SRIOV ports

* https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov