Files
nova-specs/specs/mitaka/implemented/sriov-physical-function-passthrough.rst
Dirk Mueller 77b2ae9c5f Fix citation references (Sphinx 1.6.x compatibility)
Sphinx 1.6.x gained a cross-reference check warning that caused
the build to fail in "tox -e docs" environment. Removing underscores
from the citation reference identifier ensures that the cross references
are properly detected. Similarly missing references to footnotes are
now a warning (which is upgraded to an error in the docs tox
environment) so adjust references where it makes sense and is needed.

Closes-Bug: #1695127
Change-Id: I7e55dcf910e0ba6dd85b565db8cb1ecbdd39634a
2017-06-05 10:39:37 +02:00

287 lines
12 KiB
ReStructuredText

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
============================================================
Enable passthrough of SR-IOV physical functions to instances
============================================================
https://blueprints.launchpad.net/nova/+spec/sriov-physical-function-passthrough
Nova has supported passthrough of PCI devices with its libvirt driver for a
few releases already, during which time the code has seen some stabilization
and a few minor feature additions.
In the case of SR-IOV enabled cards, it is possible to treat any port on the
card either as a number of virtual devices (called VFs - virtual functions) or
as a full device (PF - physical function).
Nova's current handling exposes only virtual functions as resources that can
be requested by instances - and this is the most common use case by far.
However with the rise of the requirements to virtualize network applications,
it can be necessary to give instances full control over the port and not just a
single virtual function.
OpenStack is seen as one of the central bits of technology for the NFV
use-cases, and a lot of the work has already gone into making OpenStack and
Nova NFV enabled, so we want to make sure that we close these small remaining
gaps.
Problem description
===================
Currently it is not possible to pass through a physical function to an
OpenStack instance, but some NFV applications need to have full control of the
port, while others are happy with using a VF of an SR-IOV enabled card. It is
beneficial to be able to do so with the same set of cards, as pre-provisioning
resources on the granularity smaller than compute hosts is cumbersome
to manage and goes against the goal of Nova to provide on demand
resources. We want to be able to give certain instances unlimited access to the
port by assigning the PF to it, but revert back to using VFs when the PF is not
being used, so as to ensure on-demand provisioning of available resources. This
may not be possible with every SR-IOV card and their respective Linux drivers,
in which case certain ports will need to be pre-provisioned as either PFs or
VFs by administratior ahead of time.
This in turn means that Nova would have to keep track of which VFs belong to
particular PFs and make sure that this is reflected in the way resources are
tracked (so even a single VF being used means the related PF is unavailable and
vice versa, if a PF is being used, all of it's VFs are marked as used).
PCI device management code in Nova currently filters out any
device that is a physical function (this is currently hard-coded). In
addition, modeling of PCI device resources in Nova currently assumes flat
hierarchy and resource tracking logic does not understand the relationship
between different PCI devices that can be exposed to Nova.
Use Cases
----------
Certain NFV workloads may need to have the full control of the physical device,
in order to use some of the functionality not available to VFs, to bypass some
limitations certian cards impose on VFs, or to exclusively use the full
bandwidth of the port. However, due to the dynamic nature of the elastic cloud,
and the promise of Nova to deliver resources on demand, we do not wish to have
to pre-provision certain SR-IOV cards to be used as PFs as this defeats the
promise of the infrastructure management tool that allows for quick
re-purposing of resources that Nova brings.
Modern SR-IOV enabled cards along with their drivers usually allow for such
reconfiguration to be done on the fly, so once the passthrough of the PF is no
longer needed on a specific host (either the instance using it got moved or
deleted), the PF is bound back to it's Linux driver, thus enabling the use of
VFs provided that initialization steps (if any are needed) are done upon
handing the device back. It is not possible to
guarantee that this always works however, due to the vast range of equipment
and drivers available on the market, so we want to make sure that there is a
way to tell Nova that a card is in certain configuration and cannot be assumed
to be reconfigurable.
Additional use cases (that will require further work) will be enabled by having
the Nova data model usefully express the relationship between PF and its VFs.
Some of them have been proposed as separate specs (see [1]_ and [2]_).
Proposed change
===============
Two problems we need to solve are:
1) How to enable requesting a full physical device. This means extending the
InstancePCIRequest data model to be able to hold this information. Since
the whitelist parsing logic that builds up the Spec objects probes the
system and has the information about whether a device is a PF or not, it is
enough to add a physical_function field to the PCI alias schema and the
PCIRequest object.
2) Enable scheduling and resource tracking based on the request that can now
be for the whole device. This means extending the data model for PCIDevices
to hold information about relationship between physical and virtual
functions (this relationship is already recorded but not in a suitable
format), and also extending the
way we expose the aggregate data about PCI devices to the resource tracker
(a.k.a. the PCIDeviceStats class) to be able to present PFs and their
counts, and to make sure to track the corresponding VFs that become
unavailable once the PF is claimed/used.
In addition to the above, we will want to make sure that whitelist syntax can
support passing throught PFs. This will require very few changes it turns out.
Currently if a whitelist entry
specifies an address or a devname of a PF, the matching code will make sure
any of the VFs match. This behavior, combined with allowing a device that is a
PF to be tracked by nova (by removing the hard-coded check that skips any PFs)
should be sufficient to allow most of the flexibility administrators need.
As it is not sufficient for a device to be whitelisted to be requestable by
users (it needs to either have an alias that is specified on the flavor),
simply defaulting to whitelisting PFs along with all of their VFs if a PF
address is whitelisted gives us the flexibility we need, while keeping
backwards compatibility.
As is the case with the current implementation, there is some initial
configuration that will be needed on hosts that have PCI devices that can be
passed through. In addition to the standard setup needed to enable SR-IOV and
configure the cards, and
whitelist configuration setup that Nova requires, administrators may also need
to add an automated way (such as udev rules) to re-enable VFs, since
depending on the driver and the card used, any existing
configuration may be lost once a VM is given full control of the port, and the
device is unbound from the host driver.
In order for PFs to work as Neutron ports, some additional work that is outside
of scope of this blueprint will be needed. We aim to make internal Nova changes
that are needed the focus here and defer on the integration work to a future
(possibly cross-project) blueprint. For the libvirt driver, this means that,
since there will be no Neutron support
at first, the only way to assign such a device would be using the <hostdev>
element, and no support for <interface> is in scope for this blueprint.
Alternatives
------------
There are no real alternatives that cover all of the use cases. An alternative
that would cover only the requirement for bandwidth would be to allow for
reserving of all VFs of a single PF by a single instance while using only a
single VF, effectively reserving the bandwidth. In addition to not being a
solution for all the applications, it also does not reduce the complexity of
the change much as the relationship between VFs still needs to be modeled in
Nova.
Data model impact
-----------------
Even though there is a way currently to figure out the PF a single VF belongs
to (through the use of `extra_info` free-form field) it may be necessary to add
a more "query friendly" relationship, that will allow us to answer the question
"given a PCI device record that is a PF, which VF records does it contain".
It is likely to be implemented as a foreign key relationship to the same table,
and objects support will be added, but the actual implementation discussion is
better suited for the actual code proposal review.
It will also be necessary to be able to know relations between individual PFs
and VFs in the aggregate view of the PCI device data used in scheduling, so
changes to the way PciDeviceStats holds aggregate
data. This will also result in changes to the filtering/claiming logic, the
extent of which may impact decisions about the data model so this is
best discussed on actual implementation changes.
REST API impact
---------------
There are no API changes required. PCI devices are requested through flavor
extra-specs by specifying an alias of a device specification. Currently,
device specifications and their aliases are part of the Nova deployment
configuration, and thus are deployment specific.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None - non-admin users will continue to use only things exposed to them via
flavor extra-specs, which they cannot modify in any way.
Performance Impact
------------------
Scheduling of instances requiring PCI passthrough devices will be doing more
work and on a bit more data than currently in the case of PF requests. It is
unlikely that this will have any noticeable performance impact however.
Other deployer impact
---------------------
PCI alias syntax for enabling the PCI devices will become more feature-full, in
order to account for specifically requesting a PF.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Nikola Đipanov <ndipanov@redhat.com>
Other contributors:
Vladik Romanovsky <vromanso@redhat.com>
Work Items
----------
* Re-work the DB models and corresponding objects to have explicit relationship
between the PF entry and it's corresponding VFs. Update the claiming
logic inside the PCI manager class so that claiming/assigning the PF claims
all of it's VFs and vice versa.
* Change the PCIDeviceStats class to expose PFs in it's pools, and change the
claiming/consuming logic to claim appropriate amounts of VFs when a PF is
consumed or claimed. Once this work item is complete, all of the scheduling
and resource tracking logic will be aware of the PF constraint.
* Add support for specifying the PF requirement through the pci_alias
configuration options, so that it can be requested through flavor
extra-specs.
Dependencies
============
None
Testing
=======
Changes proposed here only extend existing functionality, so they will require
updating the current test suite to make sure new functionality is covered.
It is expected that the tests currently in place are to prevent any regression
to the existing functionality. No new test suites are required to be added for
this functionality, only new test cases.
Documentation Impact
====================
Documentation for the PCI passthrough features in Nova will need to be updated
to reflect the above changes - that is to say - no impact out of the ordinary.
References
==========
.. [1] https://review.openstack.org/#/c/182242/
.. [2] https://review.openstack.org/#/c/142094/
History
=======
Optional section for Mitaka intended to be used each time the spec
is updated to describe new design, API or any database schema
updated. Useful to let reader understand what's happened along the
time.
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced