From c6a96a17db76b8ab00af3f68cbc71eb6028a2801 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ren=C3=A9=20Ribaud?=
Date: Wed, 12 Mar 2025 21:01:24 +0100
Subject: [PATCH] FUP Update pci-passthrough and virtual-gpu documentation

This patch adds the following documentation:

- pci-passthrough: Explain live migration and known issues.
- virtual-gpu: Update the caveats section to clarify what to do when
  VF devices are available instead of `mdev` devices.

The goal of this patch series is to enable migration of VFIO devices
with kernel variant drivers.

Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I41271a8af5687fb1d18f9d0852492756e096720d
---
 doc/source/admin/pci-passthrough.rst | 79 ++++++++++++++++++++++++++--
 doc/source/admin/virtual-gpu.rst     | 13 +++++
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/doc/source/admin/pci-passthrough.rst b/doc/source/admin/pci-passthrough.rst
index 0f82a227a8a7..d0c35610961e 100644
--- a/doc/source/admin/pci-passthrough.rst
+++ b/doc/source/admin/pci-passthrough.rst
@@ -70,9 +70,14 @@ capabilities.
    based PCI requests. This support is disable by default.
 
 .. versionchanged:: 31.0.0 (2025.1 Epoxy):
-   Add managed tag to define if the PCI device is managed by libvirt.
-   This is required to support SR-IOV devices using the new kernel variant
-   driver interface.
+
+   * Add managed tag to define if the PCI device is managed (attached/detached
+     from the host) by libvirt. This is required to support SR-IOV devices
+     using the new kernel variant driver interface.
+   * Add a live_migratable tag to define whether a PCI device supports live
+     migration.
+   * Add a live_migratable tag to alias definitions to allow requesting either
+     a live-migratable or non-live-migratable device.
 
 Enabling PCI passthrough
 ------------------------
@@ -527,6 +532,53 @@ Examples:
    device_spec = { "vendor_id": "10de", "product_id": "25b6", "address": "0000:25:00.5", "resource_class": "CUSTOM_A16_8A", "managed": "no" }
    alias = { "device_type": "type-VF", resource_class: "CUSTOM_A16_16A", "name": "A16_16A" }
 
+
+Configuring Live Migration for PCI devices
+------------------------------------------
+
+Live migration of instances with PCI devices requires specific configuration
+at both the device and alias levels to ensure that the migration can succeed.
+This section explains how to configure PCI passthrough to support live
+migration.
+
+Configuring PCI Device Specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Administrators must explicitly define whether a PCI device supports live
+migration.
+This is done by adding the ``live_migratable`` attribute to the device
+specification in the :oslo.config:option:`pci.device_spec` configuration.
+
+.. note::
+
+   This requires hardware support, as well as proper system
+   and hypervisor configuration.
+
+Example Configuration:
+
+.. code-block:: ini
+
+   [pci]
+   device_spec = {'vendor_id': '8086', 'product_id': '1515', 'live_migratable': 'yes'}
+   device_spec = {'vendor_id': '8086', 'product_id': '1516', 'live_migratable': 'no'}
+
+Configuring PCI Aliases for Users
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PCI devices can be requested through flavor extra specs. To request a
+live-migratable PCI device, the PCI alias definition in
+the :oslo.config:option:`pci.alias` configuration must include
+the ``live_migratable`` key.
+
+Example Configuration:
+
+.. code-block:: ini
+
+   [pci]
+   alias = {'name': 'vf_live', 'vendor_id': '8086', 'product_id': '1515', 'device_type': 'type-VF', 'live_migratable': 'yes'}
+   alias = {'name': 'vf_no_migrate', 'vendor_id': '8086', 'product_id': '1516', 'device_type': 'type-VF', 'live_migratable': 'no'}
+
+
 Virtual IOMMU support
 ---------------------
 
@@ -583,3 +635,24 @@ For the viommu attributes:
   ``aw_bits`` is driver attribute defined in `Libvirt IOMMU Domain`_.
 
 .. _`Libvirt IOMMU Domain`: https://libvirt.org/formatdomain.html#iommu-devices
+
+Known Issues
+------------
+
+A known issue exists where the ``live_migratable`` flag is ignored for
+devices that include the ``physical_network`` tag.
+As a result, instances using such devices are not treated as
+non-live-migratable; instead, they continue to migrate using the legacy
+VIF unplug/live migrate/VIF plug procedure.
+
+Example configuration where the ``live_migratable`` flag is ignored:
+
+.. code-block:: ini
+
+   [pci]
+   device_spec = { "vendor_id":"8086", "product_id":"10ca", "address": "0000:06:", "physical_network": "physnet2", "live_migratable": false}
+
+A fix for this issue is planned in a follow-up for the **Epoxy** release.
+The upstream bug report is `here`__.
+
+.. __: https://bugs.launchpad.net/nova/+bug/2102161
diff --git a/doc/source/admin/virtual-gpu.rst b/doc/source/admin/virtual-gpu.rst
index 2acc2dcc7e6c..0124795ff865 100644
--- a/doc/source/admin/virtual-gpu.rst
+++ b/doc/source/admin/virtual-gpu.rst
@@ -353,6 +353,18 @@ Caveats
    This information is correct as of the 17.0.0 Queens release. Where
    improvements have been made or issues fixed, they are noted per item.
 
+* After installing the NVIDIA driver on compute nodes, if ``mdev`` devices
+  are not visible but VF devices are present under a path like
+  ``/sys/bus/pci/devices/0000:25:00.4/nvidia``, this indicates that the
+  **kernel variant driver** is in use.
+
+  This most likely occurs on **Ubuntu Noble** or **RHEL 10**.
+
+  .. versionchanged:: 31.0.0
+
+  Please refer to the `PCI passthrough documentation`_ for proper
+  configuration.
+
 * When live-migrating an instance using vGPUs, the libvirt guest domain XML
   isn't updated with the new mediated device UUID to use for the target.
 
@@ -451,3 +463,4 @@ For nested vGPUs:
 .. _Intel GVT-g: https://01.org/igvt-g
 .. _NVIDIA GRID vGPU: http://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf
 .. _osc-placement plugin: https://docs.openstack.org/osc-placement/latest/index.html
+.. _PCI passthrough documentation: https://docs.openstack.org/nova/latest/admin/pci-passthrough.html