openstack/charm-nova-compute-nvidia-vgpu

Juju Charm - Nova Compute - plugin for NVidia vGPU

Go to file

Alex Kavanagh 9c4f136b9d Improve platform mocking Patch out charmhelpers.osplatform.get_platform() and charmhelpers.core.host.lsb_release() globally in the unit tests to insulate the unit tests from the platform that the unit tests are being run on. Change-Id: I5336677dd9e7fbfa50cc6507b0d5ac5ef52b0070		2023-10-24 18:10:48 +01:00
src	Filter mediated devices for 3D controllers	2022-05-19 14:29:54 +01:00
templates	Disable nouveau driver	2022-03-15 13:56:13 +01:00
tests	Add 2023.2 Bobcat support	2023-08-02 14:23:00 -04:00
unit_tests	Improve platform mocking	2023-10-24 18:10:48 +01:00
.gitignore	Refactor unit status output	2022-05-17 11:22:38 +01:00
.gitreview	Fix the .gitreview file project key	2023-04-19 19:20:55 +00:00
.stestr.conf	Initial Operator framework boilerplate	2021-11-12 11:05:28 +01:00
.zuul.yaml	Add Antelope support	2023-03-08 17:12:09 +00:00
actions.yaml	Add list-vgpu-types action	2022-03-15 13:56:13 +01:00
build-requirements.txt	Add charmcraft.yaml for publishing to Charmhub	2022-01-21 10:38:04 +01:00
charmcraft.yaml	Add 2023.2 Bobcat support	2023-08-02 14:23:00 -04:00
config.yaml	Install NVIDIA vGPU software	2021-11-25 17:24:58 +01:00
metadata.yaml	Add 2023.2 Bobcat support	2023-08-02 14:23:00 -04:00
osci.yaml	Add 2023.2 Bobcat support	2023-08-02 14:23:00 -04:00
pip.sh	Initial Operator framework boilerplate	2021-11-12 11:05:28 +01:00
README.md	Add documentation	2022-03-15 16:05:51 +01:00
rename.sh	Migrate test bundles to charmhub	2022-02-20 19:40:27 -07:00
requirements.txt	Add Antelope support	2023-03-08 17:12:09 +00:00
test-requirements.txt	Merge "Add Kinetic and Zed support"	2022-08-30 03:38:19 +00:00
tox.ini	Improve platform mocking	2023-10-24 18:10:48 +01:00

README.md

Overview

This subordinate charm provides the Nvidia vGPU support to the OpenStack Nova Compute service.

Usage

Deployment

We are assuming a pre-existing OpenStack deployment (Queens or newer).

Deploy nova-compute-nvidia-vgpu as a subordinate to the nova-compute charm:

juju deploy ch:nova-compute-nvidia-vgpu --channel=yoga/edge
juju add-relation nova-compute-nvidia-vgpu:nova-vgpu nova-compute:nova-vgpu

Pass the proprietary NVIDIA software package (510.47.03 or newer) as a resource to the charm:

juju attach nova-compute-nvidia-vgpu \
    nvidia-vgpu-software=./nvidia-vgpu-ubuntu-510_510.47.03_amd64.deb

Once the model settles, reboot the corresponding compute nodes:

juju run -a nova-compute-nvidia-vgpu -- sudo reboot

vGPU type definition

Each compute node has one or several physical GPUs. Each physical GPU can then be divided into one or several virtual GPUs of a given type. Virtual GPUs will later be claimed by and exposed to guests upon guest creation. Start by listing the available vGPU types for each physical GPU:

juju run-action nova-compute-nvidia-vgpu/0 list-vgpu-types --wait
[...]
    nvidia-256, 0000:41:00.0, GRID RTX6000-1Q, num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
    nvidia-257, 0000:41:00.0, GRID RTX6000-2Q, num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
    nvidia-258, 0000:41:00.0, GRID RTX6000-3Q, num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
    nvidia-259, 0000:41:00.0, GRID RTX6000-4Q, num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
[...]
    nvidia-105, 0000:c1:00.0, GRID V100-1Q, num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
    nvidia-106, 0000:c1:00.0, GRID V100-2Q, num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=8
    nvidia-107, 0000:c1:00.0, GRID V100-4Q, num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=4
    nvidia-108, 0000:c1:00.0, GRID V100-8Q, num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=2
[...]

As we can see, nova-compute-nvidia-vgpu/0 has two physical GPUs: 0000:41:00.0 and 0000:c1:00.0. By selecting the vGPU type nvidia-108 on 0000:c1:00.0, two vGPUs will be available for future guests:

juju config nova-compute-nvidia-vgpu vgpu-device-mappings="{'nvidia-108': ['0000:c1:00.0']}"

Note

: on releases older than Stein, only one vGPU type can be selected accross all available physical GPUs. Starting from Stein each physical GPU can be assigned a different vGPU type.

On OpenStack Stein and newer, once the model has settled, these vGPUs can be listed via the OpenStack CLI:

openstack resource provider list
+--------------------------------------+-----------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid                                 | name                              | generation | root_provider_uuid                   | parent_provider_uuid                 |
+--------------------------------------+-----------------------------------+------------+--------------------------------------+--------------------------------------+
| 0883c2b5-bad2-4abc-a179-e33344361475 | node-sparky.maas                  |          2 | 0883c2b5-bad2-4abc-a179-e33344361475 | None                                 |
| 4b0dbc58-0c85-4a80-8dd6-d43d1bd6ec53 | node-sparky.maas_pci_0000_c1_00_0 |          1 | 0883c2b5-bad2-4abc-a179-e33344361475 | 0883c2b5-bad2-4abc-a179-e33344361475 |
+--------------------------------------+-----------------------------------+------------+--------------------------------------+--------------------------------------+

openstack resource provider inventory list 4b0dbc58-0c85-4a80-8dd6-d43d1bd6ec53
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| VGPU           |              1.0 |        1 |        2 |        0 |         1 |     2 |    0 |
+----------------+------------------+----------+----------+----------+-----------+-------+------+

Nova flavor definition

In order to expose a vGPU of the type defined earlier to any guest created with the m1.small flavor, create a new trait and assign it to the flavor:

openstack --os-placement-api-version 1.6 trait create CUSTOM_NVIDIA_108
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_NVIDIA_108 4b0dbc58-0c85-4a80-8dd6-d43d1bd6ec53
openstack flavor set m1.small --property resources:VGPU=1 --property trait:CUSTOM_NVIDIA_108=required

Note

: on releases older than Stein, since there is only one vGPU type and it doesn't show up in the resource provider list, no trait can be created. The flavor can simply be modified with openstack flavor set m1.small --property resources:VGPU=1

After creating an instance of this flavor, the resource provider inventory list will show one vGPU being used:

openstack server create --flavor m1.small ...
openstack resource provider inventory list 4b0dbc58-0c85-4a80-8dd6-d43d1bd6ec53
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| VGPU           |              1.0 |        1 |        2 |        0 |         1 |     2 |    1 |
+----------------+------------------+----------+----------+----------+-----------+-------+------+

Bugs

Please report bugs on Launchpad.

For general questions please refer to the OpenStack Charm Guide.