nova/releasenotes/notes/add-support-for-vgpu-libvir...

---
features:
- |
The libvirt driver now supports booting instances that request virtual
GPUs (vGPUs).
To enable this, operators must specify the enabled vGPU types in the
nova-compute configuration file with the configuration option
``[devices]/enabled_vgpu_types``. Only the enabled vGPU types can be
used by instances.
To find out which types the physical GPU driver supports for libvirt,
operators can inspect sysfs::
ls /sys/class/mdev_bus/<device>/mdev_supported_types
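The listing above can also be collected programmatically. The following is a minimal sketch, not part of Nova: the function name ``list_vgpu_types`` is hypothetical, and it assumes the standard kernel layout in which each type directory under ``mdev_supported_types`` exposes an ``available_instances`` file and, optionally, a ``name`` file.

```python
from pathlib import Path


def list_vgpu_types(device_dir):
    """Return {type_id: attributes} for one physical GPU's sysfs directory.

    device_dir is the directory of a single device, for example
    /sys/class/mdev_bus/<device>. This helper is illustrative only.
    """
    types = {}
    types_dir = Path(device_dir) / "mdev_supported_types"
    if not types_dir.is_dir():
        # The device (or its driver) does not expose mediated device types.
        return types
    for type_dir in sorted(types_dir.iterdir()):
        info = {}
        # "name" is optional in the kernel ABI, so read only what exists.
        for attr in ("name", "available_instances"):
            attr_file = type_dir / attr
            if attr_file.is_file():
                info[attr] = attr_file.read_text().strip()
        types[type_dir.name] = info
    return types
```

The keys of the returned dictionary are the type identifiers that could be listed in ``[devices]/enabled_vgpu_types``.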
Operators can request a VGPU resource in a flavor by adding it to the
flavor's extra specs::
nova flavor-key <flavor-id> set resources:VGPU=1
That said, Nova currently has the following caveats when using vGPUs.
* For the moment, only a single vGPU type is supported per compute node,
which means that libvirt creates vGPUs of that specific type only. Two
compute nodes can be configured with different types, but a flavor
cannot yet specify which type an instance should use.
* Suspending a guest that has vGPUs does not work yet because of a libvirt
limitation (it cannot hot-unplug mediated devices from a guest). Until
libvirt supports this, use other instance actions instead, such as
snapshotting or shelving the instance. If a user asks to suspend the
instance, Nova gets an exception that sets the instance state back to
``ACTIVE``, and the suspend action is reported as Error in the
``os-instance-action`` API.
* Resizing an instance with a new flavor that has vGPU resources does not
allocate those vGPUs to the instance (the instance is created without
vGPU resources). To work around this problem, rebuild the instance once
it has been resized so that vGPUs are allocated to it.
* Migrating an instance to another host has the same problem as resizing.
If you migrate an instance, make sure to rebuild it.
* Rescuing an instance that has vGPUs means that the rescue image does not
use the existing vGPUs. When the instance is unrescued, it uses the
vGPUs that were allocated to it again. However, because Nova looks at
all the allocated vGPUs when trying to find unallocated ones, there
could be a race condition if an instance is rescued at the moment a new
instance asking for vGPUs is created: both instances could end up using
the same vGPUs. If you want to rescue an instance, make sure to disable
the host until this is fixed in Nova.
* Mediated devices created by the libvirt driver do not persist across
reboots. Consequently, a guest would fail to start because its virtual
device would no longer exist. To prevent that issue, when the compute
service restarts, the libvirt driver now checks every guest XML for
mediated devices and, if a mediated device no longer exists, recreates
it with the same UUID.
* If you use NVIDIA GRID cards, be aware of a limitation in the NVIDIA
driver that prevents a guest from having more than one virtual GPU from
the same physical card. A guest can have two or more virtual GPUs, but
each vGPU must then be hosted by a separate physical card. Until that
limitation is removed, please avoid creating flavors that ask for more
than one vGPU.
We are actively working to remove or work around these caveats, but please
understand that, given all of the above, this feature is experimental for
the moment.
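The check described in the caveat about mediated devices not surviving a reboot can be illustrated as follows. This is a rough sketch and not Nova's actual implementation: the function names are hypothetical, and it assumes the guest XML uses the libvirt ``<hostdev type='mdev'>`` element and that existing mediated devices appear under ``/sys/bus/mdev/devices/<uuid>``. Recreating a missing device is left out of the sketch. In the kernel interface that is done by writing the UUID to the ``create`` file of the chosen type.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def mdev_uuids_from_domain_xml(domain_xml):
    """Return the UUIDs of the mediated devices referenced by a guest XML."""
    root = ET.fromstring(domain_xml)
    uuids = []
    # Only <hostdev> elements of type "mdev" describe mediated devices.
    for hostdev in root.findall("./devices/hostdev[@type='mdev']"):
        address = hostdev.find("./source/address")
        if address is not None and address.get("uuid"):
            uuids.append(address.get("uuid"))
    return uuids


def missing_mdevs(domain_xml, mdev_devices_dir="/sys/bus/mdev/devices"):
    """Return the UUIDs used by the guest that no longer exist on the host."""
    devices = Path(mdev_devices_dir)
    return [
        uuid
        for uuid in mdev_uuids_from_domain_xml(domain_xml)
        if not (devices / uuid).exists()
    ]
```

Any UUID returned by ``missing_mdevs`` corresponds to a mediated device that would have to be recreated before the guest can start.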