bd3da5d763
Change-Id: I989fa12f115075c27b29b4863cbb5240abfb5978
72 lines
3.7 KiB
YAML
---
features:
  - |
    The libvirt driver now supports booting instances that request virtual
    GPUs.
    To support this, operators must specify the enabled vGPU types in the
    nova-compute configuration file by using the
    ``[devices]/enabled_vgpu_types`` configuration option. Only the enabled
    vGPU types can be used by instances.
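
    As a minimal illustration, a fragment of the nova-compute configuration
    file (typically ``nova.conf``) enabling one type could look like the
    following. The type name ``nvidia-35`` is only an example assumption; use
    one of the types that your physical GPU driver actually exposes and, given
    the single-type limitation listed in the caveats, enable only one type::

      [devices]
      # Example type name only; replace it with a type supported by your GPU.
      enabled_vgpu_types = nvidia-35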

    To find out which types the physical GPU driver supports for libvirt,
    operators can look in sysfs::

      ls /sys/class/mdev_bus/<device>/mdev_supported_types
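
    Each entry in ``mdev_supported_types`` is a directory that, per the
    kernel's mediated device sysfs interface, exposes an
    ``available_instances`` attribute and, optionally, a human-readable
    ``name``. The shell function below is a hypothetical helper (not part of
    Nova) that prints both for one physical GPU; it is a sketch assuming a
    POSIX shell::

      # Hypothetical helper, not provided by Nova.
      # Usage: list_vgpu_types /sys/class/mdev_bus/<device>
      list_vgpu_types() {
          for type_dir in "$1"/mdev_supported_types/*; do
              # Skip the unexpanded glob if the device exposes no types.
              [ -d "$type_dir" ] || continue
              # "name" is optional in the kernel interface, so fall back to "n/a".
              name=$(cat "$type_dir/name" 2>/dev/null || echo "n/a")
              avail=$(cat "$type_dir/available_instances")
              echo "$(basename "$type_dir") name=$name available_instances=$avail"
          done
      }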

    Operators can request a VGPU resource in a flavor by adding it to the
    flavor's extra specs::

      nova flavor-key <flavor-id> set resources:VGPU=1
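
    With the unified ``openstack`` client, the same flavor setup followed by a
    boot request can be sketched as below. The helper name, flavor sizes and
    arguments are illustrative assumptions, not values required by Nova::

      # Hypothetical helper, not provided by Nova.
      # Usage: create_vgpu_flavor_and_boot <flavor> <image> <server>
      create_vgpu_flavor_and_boot() {
          flavor=$1 image=$2 server=$3
          # Sizes below are arbitrary example values.
          openstack flavor create --ram 4096 --disk 20 --vcpus 2 "$flavor"
          # Same extra spec as "nova flavor-key <flavor-id> set resources:VGPU=1".
          openstack flavor set "$flavor" --property "resources:VGPU=1"
          # Boot an instance that requests one vGPU through the flavor.
          openstack server create --flavor "$flavor" --image "$image" --wait "$server"
      }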

    That said, using vGPUs with Nova currently has some caveats:

    * For the moment, only a single vGPU type is supported per compute node,
      which means that libvirt creates vGPUs using that specific type only.
      Two compute nodes can have different types, but there is not yet a way
      to specify in the flavor which type an instance should use.

    * Suspending a guest that has vGPUs does not work yet because of a libvirt
      limitation (libvirt cannot hot-unplug mediated devices from a guest).
      Until libvirt supports that, other instance actions (such as
      snapshotting or shelving the instance) are recommended as workarounds.
      If a user asks to suspend the instance, Nova gets an exception that
      sets the instance state back to ``ACTIVE``, and the suspend action
      reported by the ``os-instance-actions`` API is in the Error state.

    * Resizing an instance to a new flavor that has vGPU resources does not
      allocate those vGPUs to the instance (the resized instance is created
      without vGPU resources). As a workaround, rebuild the instance once it
      has been resized so that it then gets vGPUs allocated.

    * Migrating an instance to another host has the same problem as resizing.
      If you migrate an instance, make sure to rebuild it afterwards.

    * Rescuing an instance that has vGPUs means that the rescue image does not
      use the existing vGPUs. When the instance is unrescued, it uses its
      previously allocated vGPUs again. However, since Nova looks at all the
      allocated vGPUs when trying to find unallocated ones, there could be a
      race condition: if an instance is rescued at the moment a new instance
      asking for vGPUs is created, both instances could end up using the same
      vGPUs. If you need to rescue an instance, make sure to disable the host
      until this is fixed in Nova.

    * Mediated devices created by the libvirt driver are not persisted across
      host reboots. Consequently, a guest would fail to start since its
      virtual device would no longer exist. To prevent that issue, when the
      compute service is restarted, the libvirt driver now looks at all the
      guest XMLs to check whether they have mediated devices, and if a
      mediated device no longer exists, Nova recreates it with the same UUID.

    * If you use NVIDIA GRID cards, be aware that a limitation in the NVIDIA
      driver prevents one guest from having more than one virtual GPU from
      the same physical card. A guest can have two or more virtual GPUs, but
      each vGPU must then be hosted by a separate physical card. Until that
      limitation is removed, please avoid creating flavors that ask for more
      than one vGPU.
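
    The resize and rescue workarounds above can be sketched with the
    ``openstack`` client as follows. The helper names are hypothetical, the
    rescue helper assumes that disabling the host means disabling its
    ``nova-compute`` service, and ``openstack server resize confirm`` assumes
    a python-openstackclient release that provides that subcommand (older
    releases use ``openstack server resize --confirm``)::

      # Hypothetical helper: resize, confirm, then rebuild so that the new
      # flavor's vGPUs actually get allocated to the instance.
      resize_with_vgpus() {
          server=$1 flavor=$2 image=$3
          openstack server resize --flavor "$flavor" --wait "$server"
          openstack server resize confirm "$server"
          openstack server rebuild --image "$image" --wait "$server"
      }

      # Hypothetical helper: disable the compute service on the instance's
      # host before rescuing, so no new instance is scheduled there meanwhile.
      # Re-enable it later with:
      #   openstack compute service set --enable <host> nova-compute
      rescue_on_disabled_host() {
          server=$1 host=$2
          openstack compute service set --disable \
              --disable-reason "vGPU rescue" "$host" nova-compute
          openstack server rescue "$server"
      }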

    We are actively working to remove or work around these caveats, but given
    all of the above, please consider this feature experimental for now.