Merge "trivial: Fix spelling, formatting of vDPA spec"
commit
8411c164db
|
@ -5,86 +5,88 @@
|
||||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
======================================
|
======================================
|
||||||
Libvirt: support vdpa based networking
|
Libvirt: support vDPA based networking
|
||||||
======================================
|
======================================
|
||||||
|
|
||||||
https://blueprints.launchpad.net/nova/+spec/libvirt-vdpa-support
|
https://blueprints.launchpad.net/nova/+spec/libvirt-vdpa-support
|
||||||
|
|
||||||
Over the years a number of different technologies have been developed
|
Over the years a number of different technologies have been developed to
|
||||||
to offload networking and other function to external process to accelerate
|
offload networking and other functions to external processes in order to
|
||||||
QEMU instance performance. In kernel 5.7+ a new virtual bus know as the
|
accelerate QEMU instance performance. In kernel 5.7+ a new virtual bus known as
|
||||||
``vDPA`` (vHost data path acceleration) was introduced to provide a vendor
|
*vDPA* (vHost data path acceleration) was introduced to provide a vendor
|
||||||
neutral way to accelerate standard virtio device using software or hardware
|
neutral way to accelerate standard virtio device using software or hardware
|
||||||
accelerator implementations. In Libvirt 6.9.0 vDPA support was introduced
|
accelerator implementations. vDPA support was introduced in Libvirt 6.9.0 to
|
||||||
to leverage the VDPA capabilities introduced in QEMU 5.1+.
|
leverage the vDPA capabilities introduced in QEMU 5.1+.
|
||||||
This blueprint tracks enhancing nova to leverage these new capabilities
|
This blueprint tracks enhancing nova to leverage these new capabilities
|
||||||
for offloading to hardware-based smart NICs via hardware offloaded ovs.
|
for offloading to hardware-based smart NICs via hardware offloaded OVS.
|
||||||
|
|
||||||
Problem description
|
Problem description
|
||||||
===================
|
===================
|
||||||
|
|
||||||
Current hardware offloaded networking solutions require vendor specific drivers
|
Current hardware offloaded networking solutions require vendor specific drivers
|
||||||
in the guest to function. The vdpa bus allows an abstraction layer to exist
|
in the guest to function. The vDPA bus allows an abstraction layer to exist
|
||||||
between the accelerator and the vm without the cpu overhead of traditional
|
between the accelerator and the VM without the CPU overhead of traditional
|
||||||
software vHost implementations. VDPA enabled vswitch offloads allow
|
software vHost implementations. vDPA-enabled vSwitch offloads allow
|
||||||
the guest to use standard virtio drivers instead of a vendor specific driver.
|
the guest to use standard virtio drivers instead of a vendor specific driver.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
While VDPA technically can support live migration in the future QEMU
|
While vDPA is expected to support live migration in the future, QEMU does
|
||||||
currently does not support live migration with VDPA devices. One of the main
|
not currently support live migration with vDPA devices. One of the main
|
||||||
advantages of VDPA based networking over sriov is the ability to abstract the
|
advantages of vDPA based networking over SR-IOV is the ability to abstract
|
||||||
device state from the VM allowing transparent live migration via a software
|
the device state from the VM, allowing transparent live migration via a
|
||||||
fallback. Until that fallback is implemented in QEMU, live migration will be
|
software fallback. Until that fallback is implemented in QEMU, live
|
||||||
blocked at the api via a HTTP 409 (Conflict) error response so that we can
|
migration will be blocked at the API layer via a HTTP 409 (Conflict) error
|
||||||
enable it without a new micro-version.
|
response so that we can enable it without a new micro-version.
|
||||||
|
|
||||||
As Open vSiwtch is currently the only control plane capable of managing VDPA
|
As Open vSwitch is currently the only control plane capable of managing vDPA
|
||||||
devices and since that requires hardware offloads to function this spec
|
devices, and since that requires hardware offloads to function, this spec
|
||||||
will focus on enabling VDPA networking exclusively with hardware offloaded
|
will focus on enabling vDPA networking exclusively with hardware-offloaded
|
||||||
OVS. In a future release this functionality can be extended to other vswitch
|
OVS. In a future release this functionality can be extended to other vSwitch
|
||||||
implementations such as VPP or linux bridge if they become VDPA enabled.
|
implementations such as VPP or Linux bridge if they become vDPA enabled.
|
||||||
|
|
||||||
Use Cases
|
Use Cases
|
||||||
---------
|
---------
|
||||||
|
|
||||||
As an operator, I want to offer hardware accelerated networking without
|
As an operator, I want to offer hardware-accelerated networking without
|
||||||
requiring tenants to install vendor-specific drivers in the guest.
|
requiring tenants to install vendor-specific drivers in their guests.
|
||||||
|
|
||||||
As an operator, I want to leverage hardware accelerated networking while
|
As an operator, I want to leverage hardware-accelerated networking while
|
||||||
maintaining the ability to have transparent live migration.
|
maintaining the ability to have transparent live migration.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
Transparent live migration will not initially be supported and will be
|
|
||||||
enabled only after it is supported officially in a future QEMU release.
|
Transparent live migration will not initially be supported and will be
|
||||||
|
enabled only after it is supported officially in a future QEMU release.
|
||||||
|
|
||||||
Proposed change
|
Proposed change
|
||||||
===============
|
===============
|
||||||
|
|
||||||
* A new vnic-type vdpa has been introduced to neutron to request vdpa offloaded
|
* A new vnic-type ``vdpa`` has been introduced to neutron to request vDPA
|
||||||
networking. https://github.com/openstack/neutron-lib/commit/8c6ab5e
|
offloaded networking. https://github.com/openstack/neutron-lib/commit/8c6ab5e
|
||||||
* The ``nova.network.model`` class will be extended to define the new vDPA
|
* The ``nova.network.model`` class will be extended to define the new ``vdpa``
|
||||||
vnic-type constant.
|
vnic-type constant.
|
||||||
* The libvirt driver will be extended to generate the vDPA interface XML
|
* The libvirt driver will be extended to generate the vDPA interface XML
|
||||||
* The PCI tracker will be extended with a new device-type ``type-VDPA``.
|
* The PCI tracker will be extended with a new device-type, ``VDPA``.
|
||||||
While the existing whitelist mechanism is unchanged, if a device is
|
While the existing whitelist mechanism is unchanged, if a device is
|
||||||
whitelisted and is bound to a vdpa driver, it will be inventoried as
|
whitelisted and is bound to a vDPA driver, it will be inventoried as
|
||||||
type-VDPA. In the libvirt driver this will be done by extending the private
|
``VDPA``. In the libvirt driver this will be done by extending the private
|
||||||
_get_pci_passthrough_devices and _get_device_type functions to detect
|
``_get_pci_passthrough_devices`` and ``_get_device_type`` functions to detect
|
||||||
if a VF is a parent of a VDPA nodedev. these function are
|
if a VF is a parent of a vDPA nodedev. These functions are called by
|
||||||
called via get_available_resources in the resource tracker to generate the
|
``get_available_resources`` in the resource tracker to generate the
|
||||||
resources dictionary consumed by _setup_pci_tracker at the start up of
|
resources dictionary consumed by ``_setup_pci_tracker`` at the startup of
|
||||||
the compute agent in _init_compute_node.
|
the compute agent in ``_init_compute_node``.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
The vdpa device type is required to ensure that the VF associated with vDPA
|
|
||||||
devices cannot be allocated to VMs via PCI alias or standard neutron SR-IOV
|
The vDPA device type is required to ensure that the VF associated with vDPA
|
||||||
support. VFs associated with VDPA devices cannot be managed using standard
|
devices cannot be allocated to VMs via PCI alias or standard neutron SR-IOV
|
||||||
kernel control plane command such as ip tools, as a result allocating them
|
support. VFs associated with vDPA devices cannot be managed using standard
|
||||||
to an interface managed by the ``sriov-nic-agent`` or via alias based PCI
|
kernel control plane command such as ``ip``. As a result, allocating them
|
||||||
pass-through is not valid. This will also provide automatic numa affinity and
|
to an interface managed by the ``sriov-nic-agent`` or via alias-based PCI
|
||||||
a path to eventurally report vdpa devices in placement as part of generic pci
|
passthrough is not valid. This will also provide automatic NUMA affinity
|
||||||
device tracking in placement in the future.
|
and a path to eventually report vDPA devices in placement as part of
|
||||||
|
generic PCI device tracking in placement in the future.
|
||||||
|
|
||||||
|
|
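The device-type detection described in the proposed change can be sketched roughly as follows. This is a minimal illustration and not nova's implementation: the ``NodeDev`` tuple, the ``classify_device_types`` helper, the nodedev names and the ``type-VF``/``type-PCI`` strings are assumptions made for the example, and it assumes each vDPA nodedev reports the VF it is backed by as its parent.

```python
from typing import Dict, List, NamedTuple, Optional


class NodeDev(NamedTuple):
    """Hypothetical, simplified view of a libvirt node device."""
    name: str               # e.g. "pci_0000_06_00_2" or "vdpa_vdpa0"
    parent: Optional[str]   # name of the parent node device, if any
    capability: str         # "pci" or "vdpa" in this sketch
    is_vf: bool = False     # True if the PCI device is an SR-IOV VF


def classify_device_types(devs: List[NodeDev]) -> Dict[str, str]:
    """Map each PCI nodedev name to an illustrative device-type string."""
    # Names of the PCI devices that are the parent of a vDPA nodedev.
    vdpa_parents = {d.parent for d in devs if d.capability == "vdpa"}

    types: Dict[str, str] = {}
    for dev in devs:
        if dev.capability != "pci":
            continue  # only PCI devices are inventoried here
        if dev.name in vdpa_parents:
            # Bound to a vDPA driver: must not be handed out as a plain VF.
            types[dev.name] = "VDPA"
        elif dev.is_vf:
            types[dev.name] = "type-VF"
        else:
            types[dev.name] = "type-PCI"
    return types
```

Keeping the vDPA check ahead of the VF check is the point of the sketch: a VF that backs a vDPA device is reported only as ``VDPA``, so it cannot also be matched by a request for an ordinary VF.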
||||||
Alternatives
|
Alternatives
|
||||||
|
@ -92,27 +94,27 @@ Alternatives
|
||||||
|
|
||||||
We could delegate vDPA support to cyborg.
|
We could delegate vDPA support to cyborg.
|
||||||
This would still require the libvirt changes and neutron changes while
|
This would still require the libvirt changes and neutron changes while
|
||||||
also complicating the deployment. Since vDPA based NICs are fixed function
|
also complicating the deployment. Since vDPA-based NICs are fixed function
|
||||||
NICs there is not really any advantage to this approach that justifies
|
NICs there is not really any advantage to this approach that justifies
|
||||||
the added complexity of inter service interaction.
|
the added complexity of inter-service interaction.
|
||||||
|
|
||||||
We could use the resources table added for vPMEM devices to track the devices
|
We could use the resources table added for vPMEM devices to track the devices
|
||||||
in the DB instead of the PCI tracker. This would complicate the code paths as
|
in the DB instead of the PCI tracker. This would complicate the code paths as
|
||||||
we would not be able to share any of the PCI numa affinity code that already
|
we would not be able to share any of the PCI NUMA affinity code that already
|
||||||
exists.
|
exists.
|
||||||
|
|
||||||
we could support live migration by treating vDPA devices as if they are
|
We could support live migration by treating vDPA devices as if they are
|
||||||
direct mode SR-IOV interface, nova would hot unplug and plug
|
direct mode SR-IOV interface, meaning nova would hot unplug and plug
|
||||||
the interface during the migration. In the future this could be replaced with
|
the interface during the migration. In the future this could be replaced with
|
||||||
transparent live migration if the QEMU version on both hosts is new enough.
|
transparent live migration if the QEMU version on both hosts is new enough.
|
||||||
since we don't know when that will be this option is deferred until a future
|
Since we don't know when that will be, this option is deferred until a future
|
||||||
release to reduce complexity.
|
release to reduce complexity.
|
||||||
|
|
||||||
A new workaround config option could be added
|
A new workaround config option could be added
|
||||||
enable_virtual_vdpa_devices=True|False (default: False). When set to True it
|
``enable_virtual_vdpa_devices=True|False`` (default: ``False``). When set to
|
||||||
would allow a virtual vdpa devices such as the ``vdpa_sim``
|
``True`` it would allow virtual vDPA devices such as the ``vdpa_sim``
|
||||||
devices to be tracked and used. Virtual PCI devices do not have a VF or PCI
|
devices to be tracked and used. Virtual PCI devices do not have a VF or PCI
|
||||||
device associated with them, setting this value would result in the no-op
|
device associated with them, so setting this value would result in the no-op
|
||||||
os-vif driver being used and a sentinel value being used to track the device
|
os-vif driver being used and a sentinel value being used to track the device
|
||||||
in the PCI tracker. This would allow testing without real vDPA hardware in CI
|
in the PCI tracker. This would allow testing without real vDPA hardware in CI
|
||||||
and is not intended for production use. This was declared out of scope to
|
and is not intended for production use. This was declared out of scope to
|
||||||
|
@ -121,30 +123,30 @@ Functional tests will be used instead to ensure adequate code coverage.
|
||||||
|
|
||||||
A new standard trait ``HW_NIC_VDPA`` could be reported by the
|
A new standard trait ``HW_NIC_VDPA`` could be reported by the
|
||||||
libvirt driver on hosts with vDPA devices, and the required version of QEMU
|
libvirt driver on hosts with vDPA devices, and the required version of QEMU
|
||||||
and libvirt. This would be used in combination with a new placement pre filter
|
and libvirt. This would be used in combination with a new placement prefilter
|
||||||
to append a required trait request to the unnamed group if any VM interface is
|
to append a required trait request to the unnamed group if any VM interface has
|
||||||
vnic-type ``vdpa``. This will not be done as it will not be required when
|
vnic-type ``vdpa``. This will not be done as it will not be required when
|
||||||
PCI devices are tracked in placement. Since standard traits cannot be removed
|
PCI devices are tracked in placement. Since standard traits cannot be removed,
|
||||||
no new trait will be added and the PCI passthrough filter will instead be used
|
no new trait will be added and the PCI passthrough filter will instead be used
|
||||||
to filter host on device type.
|
to filter host on device type.
|
||||||
|
|
||||||
Data model impact
|
Data model impact
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
The allowed values of the vnic-type in the nova.network.model.VIF class will
|
The allowed values of the vnic-type in the ``nova.network.model.VIF`` class
|
||||||
be extended. The allowed values of the pci_deivces table device_type will be
|
will be extended. The allowed values of the ``device_type`` column in the
|
||||||
extended.
|
``pci_devices`` table will be extended.
|
||||||
|
|
||||||
optionally the VIFMigrateData or LibvirtLiveMigrateData object can be extended
|
Optionally, the ``VIFMigrateData`` or ``LibvirtLiveMigrateData`` object can be
|
||||||
to denote the destination host support transparent vdpa live migration. This is
|
extended to denote that the destination host supports transparent vDPA live
|
||||||
optional as currently it would always be false until a QEMU and libvirt version
|
migration. This is optional as currently it would always be false until a QEMU
|
||||||
are released that support this feature.
|
and libvirt version are released that support this feature.
|
||||||
|
|
||||||
REST API impact
|
REST API impact
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
The neutron port-binding extension vnic-types has been extended
|
The neutron port-binding extension vnic-types has been extended
|
||||||
to add vnic-type ``vdpa``, no nova api changes are required.
|
to add vnic-type ``vdpa``. No nova API changes are required.
|
||||||
|
|
||||||
Security impact
|
Security impact
|
||||||
---------------
|
---------------
|
||||||
|
@ -159,29 +161,29 @@ None
|
||||||
Other end user impact
|
Other end user impact
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
vDPA ports will work like sriov ports from an enduser perspective
|
vDPA ports will work like SR-IOV ports from an end user perspective,
|
||||||
however the device model presented to the guest will be a VirtIO NIC
|
however the device model presented to the guest will be a virtio NIC
|
||||||
and live migration will initially be blocked until supported by QEMU.
|
and live migration will initially be blocked until supported by QEMU.
|
||||||
|
|
||||||
Performance Impact
|
Performance Impact
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
The performance will be the same as SR-IOV
|
The performance will be the same as SR-IOV
|
||||||
in terms of dataplane performance and nova scheduling or vm creation
|
in terms of dataplane performance and nova scheduling or VM creation
|
||||||
a.k.a. None.
|
a.k.a. None.
|
||||||
|
|
||||||
Other deployer impact
|
Other deployer impact
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
vDPA requires a very new kernel to use.
|
vDPA requires a very new kernel and very new versions of QEMU and libvirt to
|
||||||
initial support for vdpa was added in kernel 5.7
|
use. Initial support for vDPA was added in kernel 5.7, QEMU 5.1 and
|
||||||
requiring qemu 5.1 and libvirt 6.9.0 to function.
|
libvirt 6.9.0.
|
||||||
|
|
||||||
The operator will need to ensure all dependencies are present to use
|
The operator will need to ensure all dependencies are present to use
|
||||||
this feature. Intel NIC support is present in 5.7 but a the time of
|
this feature. Intel NIC support is present in kernel 5.7, but at the time of writing
|
||||||
this spec no NIC that support vDPA is available on the market from intel.
|
no NIC that supports vDPA is available on the market from Intel.
|
||||||
That means the first publicly available nics for vdpa are the mellanox/nvidia
|
The first publicly available NICs with vDPA capabilities are the
|
||||||
connectx-6 dx/lx which are only enabled in kernel 5.9
|
Mellanox/Nvidia ConnectX-6 DX/LX NICs, which are only enabled in kernel 5.9.
|
||||||
|
|
||||||
Developer impact
|
Developer impact
|
||||||
----------------
|
----------------
|
||||||
|
@ -213,42 +215,43 @@ sean-k-mooney
|
||||||
Work Items
|
Work Items
|
||||||
----------
|
----------
|
||||||
|
|
||||||
- update libvirt driver
|
- Update libvirt driver
|
||||||
- add prefilter
|
- Add prefilter
|
||||||
- add docs
|
- Add docs
|
||||||
- update tests
|
- Update tests
|
||||||
|
|
||||||
Dependencies
|
Dependencies
|
||||||
============
|
============
|
||||||
|
|
||||||
libvirt 6.9.0+
|
- libvirt 6.9.0+
|
||||||
qemu 5.1+
|
- QEMU 5.1+
|
||||||
linux 5.7+
|
- Linux 5.7+
|
||||||
|
|
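An operator could verify these minimums with a check along the following lines. This is only a sketch: the ``missing_vdpa_dependencies`` helper and its input format are hypothetical, how the installed versions are discovered is left out, and only plain dotted version strings are handled.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]

# Minimum versions listed in the Dependencies section.
MIN_VERSIONS: Dict[str, Version] = {
    "libvirt": (6, 9, 0),
    "qemu": (5, 1, 0),
    "linux": (5, 7, 0),
}


def parse_version(text: str) -> Version:
    """Turn "6.9" or "6.9.0" into (6, 9, 0); dotted integers only."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so "5.1" compares equal to "5.1.0"
    return (parts[0], parts[1], parts[2])


def missing_vdpa_dependencies(found: Dict[str, str]) -> Dict[str, str]:
    """Return the components that are absent or older than required."""
    problems: Dict[str, str] = {}
    for name, minimum in MIN_VERSIONS.items():
        if name not in found:
            problems[name] = "not found"
        elif parse_version(found[name]) < minimum:
            needed = ".".join(str(n) for n in minimum)
            problems[name] = "found %s, need %s" % (found[name], needed)
    return problems
```

An empty result means all three minimums are met. Note that a specific NIC may need a newer kernel than the 5.7 baseline, so passing this check is necessary but not sufficient.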
||||||
Testing
|
Testing
|
||||||
=======
|
=======
|
||||||
|
|
||||||
This will be tested primarly via unit and functional
|
This will be tested primarily via unit and functional tests,
|
||||||
tests however a tempest job using the vdpa sim module may be created
|
however a Tempest job using the vDPA sim module may be created
|
||||||
if it proves practical to do so. The main challenges to this are
|
if it proves practical to do so. The main challenges to this are
|
||||||
creating a stable testing environment with the required dependencies.
|
creating a stable testing environment with the required dependencies.
|
||||||
fedora rawhide has all the required depencies but ship with python 3.9
|
Fedora rawhide has all the required dependencies but ships with Python 3.9.
|
||||||
openstack currently does not work properly under python 3.9
|
OpenStack currently does not work properly under Python 3.9.
|
||||||
|
|
||||||
Alternative test environments such as ubuntu 20.04 do not provide new enough
|
Alternative test environments such as Ubuntu 20.04 do not provide new enough
|
||||||
kernel by default or ship the require libvirt. compilation from source
|
kernel by default or do not ship the required libvirt and QEMU versions.
|
||||||
is an option but we may or may not want to do that in the upstream ci.
|
Compilation from source is an option but we may or may not want to do that in
|
||||||
|
the upstream CI.
|
||||||
|
|
||||||
Documentation Impact
|
Documentation Impact
|
||||||
====================
|
====================
|
||||||
|
|
||||||
The existing admin networking document will be extended to introduce vdpa
|
The existing admin networking document will be extended to introduce vDPA
|
||||||
and describe the requirement for use.
|
and describe the requirement for use.
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
|
||||||
The nova neutron ptg discussion on this topic can be found on line 186
|
The nova-neutron PTG discussion on this topic can be found on line 186
|
||||||
here: https://etherpad.opendev.org/p/r.321f34cf3eb9caa9d87a9ec8349c3d29
|
here: https://etherpad.opendev.org/p/r.321f34cf3eb9caa9d87a9ec8349c3d29
|
||||||
|
|
||||||
An introduction to this topic is available as a blog at
|
An introduction to this topic is available as a blog at
|
||||||
|
|