PCI device tracking in Placement

This change introduces a spec to describe
modeling of PCI devices using Placement.

Blueprint: pci-device-tracking-in-placement

Co-Authored-By: Sean Mooney <smooney@redhat.com>

Change-Id: I6c822427031b6162b3ead8562256d2e80f9070b4
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
================================
PCI Device Tracking In Placement
================================
https://blueprints.launchpad.net/nova/+spec/pci-device-tracking-in-placement
The OpenStack Placement service was designed to provide tracking
of quantitative resources via resource class inventories and qualitative
characteristics via traits. Over the last few cycles, nova has utilized
Placement to track basic resources such as CPUs, RAM, and disk, and more
complex resources such as Virtual GPUs. This spec describes how Nova can
utilize Placement to track generic PCI devices without going into the details
of the NUMA awareness of such devices.
Problem description
===================
Nova has supported generic stateless PCI passthrough for many releases using a
dedicated PCI tracker in conjunction with a ``PciPassthroughFilter`` scheduler
post filter.
The PCI tracker is responsible for tracking which PCI devices are available,
claimed, and allocated; the capabilities of each device; its consumer when
claimed or allocated; as well as the type and location of the PCI device.
The ``PciPassthroughFilter`` is responsible for ensuring that the devices
requested by the VM exist on a host during scheduling. These PCI requests come
from two sources: flavor-based PCI requests that are generated using the
``pci_passthrough:alias`` `flavor extra specs`_ and Neutron-based PCI requests
generated from SR-IOV backed Neutron ports.
.. _`flavor extra specs`: https://docs.openstack.org/nova/latest/configuration/extra-specs.html#pci_passthrough:alias
While the current approach to PCI tracking works, there are some limitations
in the design and there is room for optimization.
.. rubric:: Limitations
* During server creation, PCI devices are not claimed until the
``instance_claim`` runs on the compute node. As a result, it is possible for
two concurrent server create requests to race for the last device on a host,
resulting in re-schedules.
* While Nova today tracks the capabilities of network interfaces in the
``extra_info`` field of the ``pci_devices`` table, and the
``PciPassthroughFilter`` could match on those capabilities, there is no
user-facing way to express a request for an SR-IOV Neutron port with a
specific network capability, e.g. TSO.
* There is no admin-facing interface to check the available and allocated PCI
resources in the cloud. The only way is to look into the Nova database.
.. rubric:: Optimizations
* Today, when the virt driver assigns a PCI device on a compute host,
it needs to look at all available PCI devices on the host and select one that
fulfills the PCI and NUMA requirements. If we model PCI devices in Placement,
we only need to consider the devices associated with the PCI resource
allocation in Placement.
* Today, during scheduling, we filter viable hosts based on PCI devices in
Python. By utilizing Placement we can move that filtering to SQL.
Use Cases
---------
- As an operator, I want instance creation to claim resources atomically to
decrease the chance of retries.
- As an operator, I want to shorten the time it takes to select a host by
running fewer filters.
- As an operator, I want to utilize traits and resource classes to model
PCI aliases and requests for more expressive device management.
- As an operator, I want to be able to associate quotas with PCI device usage.
- As an operator, I want to be able to use the Placement API to query available
and allocated PCI devices in my cloud.
.. note::
Device quotas would require unified limits to be implemented. Implementing
quotas is out of the scope of this spec beyond enabling the use case by
modeling PCI devices in Placement.
This spec also focuses only on flavor-based PCI passthrough. Neutron
SR-IOV ports will be addressed in a follow-up spec to limit the scope.
Proposed change
===============
Opt-in reporting of PCI devices in Placement
--------------------------------------------
To support upgrade of existing deployments with PCI passthrough configured
and to be able to deprecate and eventually remove some of the functionality of
the current PCI device tracker the new Placement based PCI device tracking will
be disabled by default in the first release. The new
``[pci]report_in_placement`` config option can be used to enable the
functionality. It will default to ``False`` at first, and once it has been set
to ``True``, nova-compute will refuse to start if it is disabled again. In a future
release, after the PCI tracking in Placement is feature complete, the default
will be changed to ``True``.
PCI device_spec configuration
-----------------------------
Below we propose a change to the ``[pci]passthrough_whitelist`` configuration
option. While making this change, we will take the opportunity to rename the
configuration option. The old name of the
``[pci]passthrough_whitelist`` config option will be deprecated for eventual
removal and a new name ``[pci]device_spec`` will be added. Both the
old and the new name will support the newly proposed ``resource_class`` and
``traits`` tags.
The syntax of the `PCI passthrough device list`_ configuration option will be
extended to support two additional standard tags ``resource_class`` and
``traits``. These new tags will only take effect if the
``[pci]report_in_placement`` config option is set to ``True``.
.. code-block:: js

    device_spec = {
        "vendor_id": "1002",
        "product_id": "67FF",
        "resource_class": "CUSTOM_GPU",
        "traits": "CUSTOM_RADEON_RX_560,CUSTOM_GDDR5"
    }
    device_spec = {
        "address": "0000:82:00.0",
        "resource_class": "CUSTOM_FPGA",
        "traits": "CUSTOM_XILINX_XC7VX690T"
    }
The ``resource_class`` tag will be accepted only when the ``physical_network``
tag is not defined and will enable a PCI device to be associated with a custom
resource class. Each PCI whitelist entry may have at most one resource class
associated with it. Devices that have a ``physical_network`` tag will not be
reported in Placement at this time as Neutron based SR-IOV is out of the
scope of the current spec.
Where a PCI device has neither a ``physical_network`` nor a ``resource_class``
tag present, it will be reported with a generated custom resource class.
The resource class will be ``CUSTOM_PCI_<vendor_id>_<product_id>``.
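For example, the following entry, with hypothetical vendor and product IDs,
would be reported with the generated ``CUSTOM_PCI_10DE_1EB8`` resource class:

.. code-block:: js

    // No resource_class and no physical_network tag is present, so the
    // resource class is generated from the vendor and product IDs.
    device_spec = {
        "vendor_id": "10de",
        "product_id": "1eb8"
    }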
The ``traits`` tag will be a comma-separated list of standard or custom trait
names that will be reported for the device RP in Placement.
Nova will normalize the resource class and trait names and prefix them with
``CUSTOM_``, if not already prefixed, before creating them in Placement.
Nova will first check the provided trait name against os_traits, and if it
exists as a standard trait then that trait will be used instead of creating a
custom one.
.. note::
Initially, traits will only be additive. In the future, if needed, we can
allow traits to be removed using a +/- syntax, but this is not included
in the scope of this spec.
As detailed in the `Modeling PCI devices in Placement`_ section, each
physical device (PF) will be its own resource provider with inventories of
the relevant PF and VF resource classes. As such, traits cannot vary per VF
device under the same parent PF. If VFs are individually matched by different
``device_spec`` entries, then defining different ``traits`` for different
VFs under the same PF is a configuration error and will be rejected.
While it would be possible to support defining different ``resource_class``
names for different VFs under the same parent PF, this is considered bad
practice and unnecessary complexity. Such a configuration will be rejected.
If different traits need to be supported on a PF than on its children VFs,
then it is suggested to match the PF and its VFs in two separate
``device_spec`` entries and differentiate the PF and VF traits by namespacing
them, e.g.: ``CUSTOM_PCI_PF_XXX`` and ``CUSTOM_PCI_VF_YYY``.
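As an illustration, with hypothetical addresses and trait names, one PF
consumed as a whole device and the VFs of a different PF could be matched as:

.. code-block:: js

    // A PF consumed as a whole device.
    device_spec = {
        "address": "0000:81:00.0",
        "traits": "CUSTOM_PCI_PF_10G"
    }
    // A VF of a different PF (one entry per VF, abbreviated here).
    device_spec = {
        "address": "0000:82:00.2",
        "traits": "CUSTOM_PCI_VF_10G"
    }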
.. note::
Nova will detect if the ``resource_class`` or ``traits`` configuration of
an already reported device is changed at a nova-compute service restart. If
the affected device is free, Nova will apply the change in Placement, but
if the device is already allocated, then the nova-compute service will refuse
to start.
.. note::
In the future, when PCI tracking in Placement is extended to ``device_spec``
entries with a ``physical_network`` tag, these entries will not allow
specifying a ``resource_class``; instead nova will use the standard
``SRIOV_NET_VF``, ``PCI_NETDEV`` and ``VDPA_NETDEV`` classes. This will
not prevent type-VF and type-PF devices from being consumed via a PCI alias,
as the alias can request these standard resource classes too.
.. _`PCI passthrough device list`: https://docs.openstack.org/nova/latest/configuration/config.html#pci.passthrough_whitelist
The new Placement based PCI tracking feature won't support the ``devname`` tag
in the ``[pci]device_spec`` configuration. Usage of this tag is already limited
as not all PCI devices have a device name. Also, ``devname`` only works
properly if the names are kept stable across hypervisor reboots and upgrades.
If ``[pci]report_in_placement`` is set to ``True`` and the
``[pci]device_spec`` has any entry with a ``devname`` tag, then the
nova-compute service will refuse to start.
Modeling PCI devices in Placement
----------------------------------
PCI device modeling in Placement will closely mirror that of vGPUs.
Each PCI device of type ``type-PCI`` and ``type-PF`` will be modeled as a
Placement resource provider (RP) with the name
``<hypervisor_hostname>_<pci_address>``. The hypervisor_hostname prefix will be
the same string as the name of the root RP. The pci_address part of the
name will be the full PCI address in the ``DDDD:BB:AA.FF`` format.
.. note::
The pGPU RPs use the libvirt nodedev name, but this spec does not try to
follow that naming scheme as the libvirt nodedev names are not considered
stable. Also, nova always uses the RP UUID to identify an RP instead of its
name, so these names are only for troubleshooting purposes.
Each PCI device RP will have a resource class inventory and traits based
on the ``[pci]device_spec`` entry matching the given device. If the device
has children devices (VFs) matching any ``device_spec`` entry, then the
resource inventory and traits of the children will be reported on the parent
PF RP too.
If a PCI device matches a ``device_spec`` entry without a
``physical_network`` tag, then an inventory of 1 is reported for the
``resource_class`` specified in the matching ``device_spec`` entry, or, if
``resource_class`` is not specified there, for the generated
``CUSTOM_PCI_<vendor_id>_<product_id>`` resource class.
If a ``type-VF`` device matches a ``device_spec`` entry, then the related
resource inventory will be reported on the RP representing its parent PF
device. The PF RP will be created even if the ``type-PF`` device does not
match any ``device_spec`` entry, but in that case only VF inventory will
exist on the RP.
If multiple VFs from the same parent PF match the ``device_spec``, then the
total VF resource inventory will be the number of matching VF devices.
Each PCI device RP will have traits reported according to the ``traits`` tag
of the matching ``device_spec`` entry. Nova might report additional traits on
the device RP automatically for scheduling purposes.
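As an illustration, reusing the ``CUSTOM_FPGA`` example entry above and
assuming a second, hypothetical PF at ``0000:85:00.0`` with four matching VFs,
the resulting provider tree fragment could look like:

.. code-block:: none

    compute-1 (root RP)
    ├── compute-1_0000:82:00.0
    │     inventory: CUSTOM_FPGA=1
    │     traits: CUSTOM_XILINX_XC7VX690T
    └── compute-1_0000:85:00.0
          inventory: CUSTOM_PCI_<vendor_id>_<product_id>=4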
Listing both the parent PF device and any of its children VF devices at the
same time will not be supported if ``[pci]report_in_placement`` is enabled.
See the `Dependent device handling`_ section for more details.
.. note::
Even though Neutron-requested PCI devices are out of the scope of this spec,
the handling of ``type-PF`` and ``type-VF`` devices cannot be ignored, as
those device types can also be requested via PCI alias by setting the
``device_type`` tag accordingly.
.. note::
The PCI alias can only request devices based on ``vendor_id`` and
``product_id`` today and that information will be automatically included in
the Placement inventory as the resource class.
.. note::
In the future Nova can be extended to automatically report PCI device
capabilities as custom traits in placement. However this is out of scope of
the current spec. If needed the deployer can add these traits via the
``[pci]device_spec`` configuration manually.
Reporting inventories from libvirt to Placement
-----------------------------------------------
The resource tracker's ``update_available_resource`` periodic task calls the
``update_provider_tree`` method on the libvirt virt driver. The
implementation of ``update_provider_tree`` in the libvirt virt driver will be
extended to create the relevant PF RPs, and report inventories and traits via
the ``provider_tree`` interface to Placement.
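A minimal sketch of that extension, assuming a simplified ``devices`` input
that is precomputed from the ``[pci]device_spec`` matching (the helper and
attribute names are assumptions, not the actual implementation):

.. code-block:: python

    def update_pci_providers(provider_tree, nodename, devices):
        """Sketch: report one RP per matched type-PCI/type-PF device.

        Each item in ``devices`` is assumed to carry the PCI address, the
        configured or generated resource class, the traits, and the
        inventory total (1 for whole devices, the number of matching VFs
        for VF inventories reported on the parent PF RP).
        """
        for dev in devices:
            rp_name = '%s_%s' % (nodename, dev.address)
            if not provider_tree.exists(rp_name):
                # PF RPs are direct children of the compute node root RP.
                provider_tree.new_child(rp_name, nodename)
            provider_tree.update_inventory(
                rp_name,
                {dev.resource_class: {
                    'total': dev.total, 'reserved': 0, 'min_unit': 1,
                    'max_unit': dev.total, 'step_size': 1,
                    'allocation_ratio': 1.0}})
            provider_tree.update_traits(rp_name, dev.traits)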
When ``update_provider_tree`` is called during compute service startup (via
``init_host``), the virt driver will do a reshape of the provider tree to make
sure
that existing VMs with PCI allocation will have the corresponding resource
allocation in Placement as well.
.. note::
The compute restart logic needs to handle the case when a device is not
present any more, either due to changes in the ``[pci]device_spec`` config
option or due to a physical device removal from the hypervisor. The driver
needs to modify the VF resource inventory on the PF RP (when a VF is removed)
or delete the PF RP (if the PF is removed and no children VFs matched). Nova
cannot prevent the removal of a PCI device from the hypervisor while the
device is allocated to a VM. Still, Nova will emit a warning in such a case.
PCI alias configuration
-----------------------
The `PCI alias definition`_ in the ``[pci]alias`` configuration option will be
extended to support two new tags: ``resource_class`` and ``traits``. The
``resource_class`` tag can hold exactly one resource class name, while the
``traits`` tag can hold a comma-separated list of trait names. Trait names
in ``traits`` can be prefixed with ``!`` to express a forbidden trait.
When the ``resource_class`` is specified, the ``vendor_id`` and ``product_id``
tags will no longer be required.
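For example, an alias requesting a device by resource class and traits
(hypothetical names) could look like:

.. code-block:: js

    alias = {
        "name": "fast-fpga",
        "resource_class": "CUSTOM_FPGA",
        "traits": "CUSTOM_XILINX_XC7VX690T,!CUSTOM_PREPRODUCTION"
    }

As ``resource_class`` is specified, ``vendor_id`` and ``product_id`` are
omitted.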
.. note::
If ``resource_class`` as well as the ``vendor_id`` and ``product_id`` fields
are provided in the alias, then Nova will use the ``resource_class`` for the
Placement query while the ``vendor_id`` and ``product_id`` filtering will
happen in the ``PciPassthroughFilter``.
.. note::
Later, if more complex trait requirements are needed, we can add support for
multiple ``traits`` tags by adding a free postfix. Also, later we can add
support for an ``in:`` prefix in the value of the ``traits`` tag to express
an OR relationship. E.g.:

.. code-block:: js

    {
        "traits1": "T1,!T2",
        "traits2": "in:T3,T4"
    }
.. _`PCI alias definition`: https://docs.openstack.org/nova/latest/configuration/config.html#pci.alias
Requesting PCI devices
----------------------
The syntax and handling of the ``pci_passthrough:alias`` `flavor extra specs`_
will not change. Also, Nova will continue using the ``InstancePCIRequest`` to
track the requested PCI devices for a VM.
Scheduling
----------
A new prefilter will be added to convert ``InstancePCIRequest`` requests into
Placement resource requests. Each PCI request will be its own Placement named
request group. If a PCI request comes from a PCI alias and the alias does not
have a ``resource_class`` associated with it, the resource class will be
computed from the ``vendor_id`` and ``product_id`` as
``CUSTOM_PCI_<vendor_id>_<product_id>``.
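A rough sketch of that translation (the function shape and the returned query
parameters are simplifications; the real implementation would build nova's
internal request group objects instead):

.. code-block:: python

    def pci_request_to_group(pci_request, suffix):
        """Sketch: turn one InstancePCIRequest into a named request group."""
        spec = pci_request.spec[0]
        rc = spec.get('resource_class')
        if rc is None:
            # Fall back to the resource class generated from the alias'
            # vendor and product IDs.
            rc = 'CUSTOM_PCI_%s_%s' % (
                spec['vendor_id'].upper(), spec['product_id'].upper())
        group = {'resources_pci%s' % suffix: '%s:1' % rc}
        if spec.get('traits'):
            # Required traits; '!'-prefixed names express forbidden traits.
            group['required_pci%s' % suffix] = spec['traits']
        return group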
The prefilter will be disabled by default to enable rolling
upgrades. There will be a new config option ``[scheduler]pci_prefilter`` that
can be used to enable the prefilter. Enabling that prefilter has a list of
prerequisites. See the `Upgrade impact`_ section for the full upgrade
procedure.
.. note::
For now the prefilter will only create request groups from PCI requests
coming from the flavor. PCI requests coming from Neutron ports will be
ignored by the prefilter and will continue to be scheduled by the
``PciPassthroughFilter``.
Dependent device handling
-------------------------
Today nova allows matching both a parent PF and its children VFs in the
configuration and these devices are tracked as separate resources. However,
they cannot be consumed independently. When the PF is consumed, its children
VFs become unavailable, and when a VF is consumed, its parent PF becomes
unavailable. This dynamic device type selection will be deprecated and the new
Placement based PCI tracking will only allow configuring either the PF device
or its children VF devices. The old PCI tracker will continue to support this
functionality, but as soon as ``[pci]report_in_placement`` is set to ``True``
on a compute, that compute will reject configurations that enable both the PF
and its children VFs.
PCI NUMA affinity
-----------------
The PCI NUMA affinity code (mostly in ``hardware.py``) will need to be modified
to limit the PCI devices considered to just those included in the allocation
candidate summary. At the same time, this code should provide information
to the scheduler about which allocation candidates are valid from an affinity
perspective.
To enable this the allocation candidates will be added to the ``HostState``
object of the filter scheduler. The ``NUMATopologyFilter`` will then need to
pass the allocation candidates to the ``hardware.py`` functions, which will
remove any allocation candidates from that list that do not fulfill the NUMA
requirements. The filter should then pop the invalid allocation candidates
scheduler will have to reconstruct the host allocation candidate set from the
``HostState`` object.
By extending the ``HostState`` object with the allocation candidates we will
enable the filters to filter not just by the host but optionally by the
allocation candidates of the host, without altering the filter API, therefore
maintaining compatibility with external filters.
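A condensed sketch of this interaction, where ``fits_numa`` stands in for the
extended ``hardware.py`` logic (hypothetical names throughout):

.. code-block:: python

    def host_passes(host_state, instance_numa_topology, fits_numa):
        """Sketch: drop allocation candidates violating PCI NUMA affinity."""
        host_state.allocation_candidates = [
            candidate for candidate in host_state.allocation_candidates
            if fits_numa(candidate, instance_numa_topology)
        ]
        # The host passes only if at least one viable candidate remains.
        return bool(host_state.allocation_candidates)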
PCI tracker
-----------
The PCI tracker will have to be enhanced to support allocation-aware claims.
To do this we will need to extend the ``PciDevicePool`` object to have a unique
identifier that can be correlated by the resource tracker to the RP in
Placement. The suggested value is the PCI address. In the case of ``type-PF``
and ``type-PCI`` devices, the PCI address for the allocation will be the PCI
address of the device to claim. In the case of ``type-VF``, it will be the
address of the parent device.
Both the instance claim and the move claim need to be extended similarly.
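A small sketch of the intended correlation, assuming the
``<hypervisor_hostname>_<pci_address>`` RP naming described earlier and the
new per-pool address field:

.. code-block:: python

    def pool_for_allocation(pools, rp_name):
        """Sketch: find the PCI device pool backing a Placement allocation."""
        # Take the address after the last underscore so underscores in the
        # hostname part do not confuse the parsing.
        address = rp_name.rsplit('_', 1)[-1]
        for pool in pools:
            # For type-VF pools the stored address is the parent PF's.
            if pool.address == address:
                return pool
        raise ValueError('no PCI device pool for RP %s' % rp_name)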
VM lifecycle operations
-----------------------
The initial scheduling is very similar to the later scheduling done for
move operations, so the existing implementation can be reused. Also, the
current logic that switches the source node Placement allocation to be held by
the migration UUID can be reused.
Live migration is not supported with PCI alias-based PCI devices and this will
not be changed by the current spec.
Attaching and detaching PCI devices is only supported via Neutron SR-IOV
ports, and that is out of the scope of this spec.
Neutron SR-IOV ports (out of scope)
-----------------------------------
This is out of scope for the current spec, but certain aspects of the problem
are already known and therefore listed here.
There is a list of Neutron port ``vnic_type`` values (e.g. ``direct``,
``direct-physical``, etc.) where the port needs to be backed by a VF or PF PCI
device.
In the simpler case, when a port only requires a PCI device but does
not require any other resources (e.g. bandwidth), Nova needs to create
Placement request groups for each Neutron port with the already proposed
prefilter. See `Scheduling`_ for more details. In this case, neither the
name of the resource class nor the vendor ID / product ID pair is known at
scheduling time (in contrast to the PCI alias case); therefore the prefilter
does not know what resource class needs to be requested in the Placement
request group.
To resolve this, PCI devices that are intended to be used for Neutron-based
SR-IOV should not use the ``resource_class`` tag in the
``[pci]device_spec``. Instead Nova will use standard resource classes to
model these resources.
Today nova allows consuming ``type-PCI`` or ``type-VF`` devices for
``direct`` ports. This is mostly due to historical reasons and should be
cleaned up. A better device categorization is suggested:
* A device in the ``device_spec`` will be consumable only via a PCI alias
if it does not have a ``physical_network`` tag attached.
* A device that has a ``physical_network`` tag attached will be considered a
network device and it will be modelled as a ``PCI_NETDEV`` resource.
* A device that has a ``physical_network`` tag and also has the capability to
provide VFs will have the trait ``HW_NIC_SRIOV`` but still use the
``PCI_NETDEV`` resource class.
* A device that has a ``physical_network`` tag and is a VF will be modelled
as a ``SRIOV_NET_VF`` resource.
This way every Neutron ``vnic_type`` can be mapped to a single resource
class by Nova. The following ``vnic_type`` -> resource class mapping is
suggested:
* ``direct``, ``macvtap``, ``virtio-forwarder``, ``remote-managed`` ->
``SRIOV_NET_VF``
* ``direct-physical`` -> ``PCI_NETDEV``
* ``vdpa`` -> ``VDPA_NETDEV``
Nova will use these resource classes to report device inventories to
Placement. Then the prefilter can translate the ``vnic_type`` of the ports to
request the specific resource class during scheduling.
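The suggested mapping, expressed as a simple lookup table (illustrative only):

.. code-block:: python

    # Suggested vnic_type -> Placement resource class mapping.
    VNIC_TYPE_TO_RESOURCE_CLASS = {
        'direct': 'SRIOV_NET_VF',
        'macvtap': 'SRIOV_NET_VF',
        'virtio-forwarder': 'SRIOV_NET_VF',
        'remote-managed': 'SRIOV_NET_VF',
        'direct-physical': 'PCI_NETDEV',
        'vdpa': 'VDPA_NETDEV',
    }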
Another specialty of Neutron-based SR-IOV is that the devices listed in the
``device_spec`` always have a ``physical_network`` tag.
This information needs to be reported as a trait on the PF RP in Placement.
Also, the port's requested physnet needs to be included in the Placement
request group by the prefilter.
There is a more complex case when the Neutron port not only requests a PCI
device but also requests additional resources (e.g. bandwidth) via the port
``resource_request`` attribute. In this case, Nova already generates Placement
request groups from the ``resource_request`` and as in the simple case will
generate a request group from the PCI request. These request groups
of a Neutron port need to be correlated to ensure that the port
gets the PCI device and the bandwidth from the same physical device. However
today the bandwidth is modeled under the Neutron RP subtree while PCI devices
will be modeled right under the root RP. So the two RPs to allocate from are
not within the same subtree. (Note that Placement always fulfills a named
request group from a single RP but allows correlating such request groups
within the same subtree.) We have multiple options here:
* Create a scheduler filter that removes allocation candidates where these
request groups are fulfilled from different physical devices.
* Report the bandwidth and the PCI device resource on the same RP. This breaks
the clear ownership of a single RP as the bandwidth is reported by the
neutron agent while the PCI device is reported by Nova.
* Move the two RPs (bandwidth and PCI dev) into the same subtree. This
needs an agreement between Nova and Neutron devs where to move the RPs and
needs an extra reshape to implement the move.
* Enhance Placement to allow sharing of resources between RPs within the same
RP tree. That way, we could make the bandwidth RP a sharing RP that shares
resources with the PCI device RP representing the physical device.
Based on the selected solution, either:
* Neutron requests the specific resource class for the SR-IOV
port via the port ``resource_request`` field, or
* Nova includes these resources in the request when the
``InstancePCIRequest`` objects are created based on the requested ports.
Alternatives
------------
* We could keep using the legacy tracking with all its good and bad properties.
* We could track each PCI device record as a separate RP.
This would result in each VF having its own RP, allowing each VF to have
different traits. This is not proposed as it would significantly increase the
number of possible allocation candidate permutations per host.
* We could keep supporting the dynamic PF or VF consumption in Placement but
it is deemed more complex than useful. We will keep supporting it via the
legacy code path but the new code path will not support it.
* We could model each PCI device under a NUMA node.
This can be done in the future by moving the RP under a NUMA node RP instead
of the compute node RP but it is declared out of the scope of this initial
spec.
Data model impact
-----------------
The ``InstancePCIRequest`` object will be extended to include the required and
forbidden traits and the resource class requested via the PCI alias in the
flavor and defined in the PCI alias configuration.
The ``PciDevicePool`` object will be extended to store PCI address information
so that the PCI device allocated in Placement can be correlated to the PCI
device to be claimed by the PCI tracker.
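A sketch of the proposed additions using oslo.versionedobjects field types
(the exact field names are assumptions):

.. code-block:: python

    from oslo_versionedobjects import fields

    # Proposed new fields on InstancePCIRequest: the resource class and
    # the required/forbidden traits coming from the PCI alias.
    INSTANCE_PCI_REQUEST_NEW_FIELDS = {
        'resource_class': fields.StringField(nullable=True),
        'required_traits': fields.ListOfStringsField(),
        'forbidden_traits': fields.ListOfStringsField(),
    }

    # Proposed new field on PciDevicePool: the PCI address used to
    # correlate the pool with the device RP in Placement.
    PCI_DEVICE_POOL_NEW_FIELDS = {
        'address': fields.StringField(nullable=True),
    }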
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
In general, this is expected to improve the scheduling performance but
should have no runtime performance impact on guests.
The introduction of a new prefilter will make the computation of the Placement
query slightly longer, and the resulting execution time may increase for
instances with PCI requests, but it should have no effect on instances without
PCI requests. This added complexity is expected to be offset by the removal of
the requirement to enable the ``PciPassthroughFilter`` scheduler filter
eventually. As a result of the offloading of the filtering to Placement and the
removal of reschedules due to racing for the last PCI device on a host, the
overall performance is expected to improve.
Other deployer impact
---------------------
To utilize the new feature the operator will have to set two new config
options: one to enable the Placement prefilter and a second to enable the
reporting of PCI devices to Placement.
Developer impact
----------------
Virt drivers other than the libvirt driver need to be adapted to use the new
Placement based PCI device tracking.
Upgrade impact
--------------
The new Placement based PCI tracking will be disabled by default. Deployments
already using PCI devices can freely upgrade to the new Nova version without
any impact. At this stage, PCI device management will be done by the
``PciPassthroughFilter`` in the scheduler and by the PCI claim in the PCI
device tracker in the compute service, the same as in the previous version of
Nova.
Then after the upgrade the new PCI device tracking can be enabled in two
phases.
First, the PCI inventory reporting needs to be enabled via
``[pci]report_in_placement`` on each compute host. During the startup of the
nova-compute service with ``[pci]report_in_placement = True`` config the
service will do the reshape of the provider tree and start reporting PCI device
inventory to Placement. Nova compute will also heal the PCI allocations of the
existing instances in Placement. This healing will be done for new
instances with PCI requests until a future release where the prefilter is
enabled by default. This is needed to keep the resource usage in sync in
Placement
even if the instance scheduling is done without the prefilter requesting
PCI allocations in Placement.
.. note::
Even after the prefilter is enabled by default in a future release, it
still needs to be kept configurable, as we don't know when the hyperv
virt driver will implement PCI tracking in Placement.
.. note::
Operators are encouraged to take the opportunity to rename the
``[pci]passthrough_whitelist`` config option to the new ``[pci]device_spec``
option. The syntax of the two options is the same.
.. note::
The ``devname`` tag is not supported in the ``[pci]device_spec`` or the
``[pci]passthrough_whitelist`` any more if ``[pci]report_in_placement`` is
enabled. We suggest using the ``address`` tag instead.
.. note::
If the deployment is configured to rely on the dynamic dependent device
behavior, i.e. both the PF and its children VFs match the
``device_spec``, then reconfiguration will be needed, as the new code path
will not support this and the nova-compute service will refuse to start with
such a configuration. To do the reconfiguration, the deployer needs to look at
the current allocations of the PCI devices on each compute node:
* if neither the PF nor any of its children VFs are allocated, then the
deployer can decide which device(s) to keep in the ``device_spec``.
* if the PF is already allocated, then the PF needs to be kept in the
``device_spec`` but all children VFs have to be removed.
* if any of the children VF devices is allocated, then the parent PF needs to
be removed from the ``device_spec`` and at least the currently allocated
VFs need to be kept in the config, while other non-allocated children VFs
can be kept or removed from the ``device_spec`` at will.
.. note::
Once ``[pci]report_in_placement`` is enabled for a compute host it cannot be
disabled any more.
Second, after every compute has been configured to report PCI inventories to
Placement, the scheduling prefilter needs to be enabled in the nova-scheduler
configuration via the ``[scheduler]pci_prefilter`` configuration option.
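In configuration terms, the two phases (using the option names proposed in
this spec) look like:

.. code-block:: ini

    # Phase 1: on every nova-compute host
    [pci]
    report_in_placement = True

    # Phase 2: on the nova-scheduler hosts, once all computes report
    [scheduler]
    pci_prefilter = True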
Implementation
==============
Assignee(s)
-----------
Primary assignee:
balazs-gibizer
Feature Liaison
---------------
Feature liaison:
sean-k-mooney
Work Items
----------
* rename PCI ``passthrough_whitelist`` to ``device_spec``
* support for adding a resource class and traits to ``device_spec``
* introduce ``[pci]report_in_placement``
* reject dependent devices config and ``devname`` config
* PCI reshape and allocation healing for existing instances
* support for adding resource class and required and forbidden traits to PCI
alias
* prefilter
* extension of ``HostState`` object with an allocations candidate list
* extension of scheduler to populate ``HostState`` object with candidates and
then reconstruct the candidate list after filtering.
* extension of ``hardware.py`` to consider allocation candidates when filtering
for NUMA affinity.
* extension of PCI manager claiming to be allocation aware.
Dependencies
============
The unified limits feature exists in an opt-in, experimental state and will
allow defining limits for the new PCI resources if enabled.
Testing
=======
As this is a PCI passthrough related feature it cannot be tested in upstream
tempest. Testing will be primarily done via the extensive unit and functional
test suites that exist for instances with PCI devices and NUMA topology in the
libvirt functional tests.
Documentation Impact
====================
The PCI passthrough doc will have to be rewritten to document the new
``resource_class`` and ``traits`` tags for the PCI ``device_spec`` and the
PCI alias.
References
==========
* _`CPU resource tracking spec`: https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/cpu-resources.html
* _`Unified Limits Integration in Nova`: https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/unified-limits-nova.html
* _`Support virtual GPU resources`: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Xena
     - Introduced
   * - Zed
     - Extended and re-proposed