From eb7650805057fc045a3ff8c42c1915f0da5e3db1 Mon Sep 17 00:00:00 2001 From: songwenping Date: Sat, 11 Mar 2023 16:49:03 +0800 Subject: [PATCH] Maintain specs of approved and completed of Antelop version Change-Id: I821474c6e8c19765b68cc35d5a4822c8c89e9919 --- .../2023.1/approved/attribute-api-support.rst | 301 ++++++++++++++++ .../2023.1/approved/disable-enable-device.rst | 221 ++++++++++++ .../approved/pmem-namespace-support.rst | 195 ++++++++++ .../implemented/vgpu-driver-proposal.rst | 337 ++++++++++++++++++ 4 files changed, 1054 insertions(+) create mode 100644 specs/2023.1/approved/attribute-api-support.rst create mode 100644 specs/2023.1/approved/disable-enable-device.rst create mode 100644 specs/2023.1/approved/pmem-namespace-support.rst create mode 100644 specs/2023.1/implemented/vgpu-driver-proposal.rst diff --git a/specs/2023.1/approved/attribute-api-support.rst b/specs/2023.1/approved/attribute-api-support.rst new file mode 100644 index 0000000..001dedf --- /dev/null +++ b/specs/2023.1/approved/attribute-api-support.rst @@ -0,0 +1,301 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +===================== +Support attribute API +===================== + +This spec adds a new group of APIs to manage the lifecycle of an accelerator's +attributes. + +Problem description +=================== + +Attributes describe customized information about an accelerator. +Currently they are generated only by drivers, and users cannot add, delete, or +update them, which does not fit our scenarios. + +Use Cases +--------- + +An admin or operator needs a group of APIs to manage their accelerators' +attributes. +Here are some useful scenarios: + +* For a NIC accelerator, we need to add a phys_net attribute, which should be + created by the deployer or other components. +* For some Function Volatile Accelerators, we can create the Function name as + an attribute.
+* Some information, such as Function_UUID, is also machine readable. + +Proposed change +=============== +None + +Alternatives +------------ + +None + +Data model impact +----------------- + +* Add attribute object to deployable object. + +REST API impact +--------------- + +URL: ``/v2/deployable/{uuid}/attribute`` + +METHOD: ``GET`` + + List all attributes of the specified deployable. + +Normal response code (200) and body:: + + { + "attributes":[{ + "key":"key1", + "value":"value1", + "uuid":"uuid1" + } + ] + } + +Error response code and body: + +* 401 (Unauthorized): Unauthorized + +* 403 (Forbidden): RBAC check failed + +* No response body + + +URL: ``/v2/deployable/{uuid}/attribute/{uuid_or_key}`` + +METHOD: ``GET`` + + Get the specified attribute of the specified deployable. + +Query Parameters: None + +Normal response code (200) and body:: + + { + "attribute": + { + "key":"key1", + "value":"value1", + "uuid":"uuid1", + "created_at":"2020-05-28T03:03:20", + "updated_at":"2020-05-28T03:03:20" + } + } + +Error response code and body: + +* 401 (Unauthorized): Unauthorized + +* 403 (Forbidden): RBAC check failed + +* 404 (NotFound): No deployable of that UUID or no attribute of that UUID + exists + +* No response body + + +URL: ``/v2/deployable/{uuid}/attribute`` + +METHOD: ``POST`` + + Create one or more deployable attribute(s). + +Request body:: + + [ + { + "key": "key1", + "value": "value1" + }, + { + "key": "key2", + "value": "value2" + }, + ... + ] + +Normal response code and body: + +* 204 (No content) + +* No response body + +Error response code: + +* 401 (Unauthorized): Unauthorized + +* 403 (Forbidden): RBAC check failed + +* 409 (Conflict): Bad input or key is not unique + +Error response body:: + + {"error": "error-string"} + + +URL: ``/v2/deployable/{uuid}/attribute/{uuid_or_key}`` + +METHOD: ``DELETE`` + + Delete an existing deployable attribute.
+ +Query Parameters: None + +Normal response code and body: + +* 204 (No content) + +* No response body + +Error response code: + +* 401 (Unauthorized): Unauthorized + +* 403 (Forbidden): RBAC check failed + +* 404 (NotFound): No deployable of that UUID or no attribute of that UUID + exists + +Error response body:: + + {"error": "error-string"} + + +URL: ``/v2/deployable/{uuid}/attribute`` + +METHOD: ``DELETE`` + + Delete all attributes of a deployable. + +Query Parameters: None + +Normal response code and body: + +* 204 (No content) + +* No response body + +Error response code: + +* 401 (Unauthorized): Unauthorized + +* 403 (Forbidden): RBAC check failed + +Error response body:: + + {"error": "error-string"} + + +URL: ``/v2/deployable/{uuid}/attribute/{uuid_or_key}`` + +METHOD: ``PUT`` + + Update an existing deployable attribute. + +Query Parameters: None + +Request body (Value of deployable attribute):: + + {"value": "value1"} + +Normal response code and body: + +* 204 (No content) + +* No response body + +Error response code and body: + +* 401 (Unauthorized): Unauthorized + +* 403 (Forbidden): RBAC check failed + +* 404 (NotFound): No deployable of that UUID or no attribute of that UUID + exists + +Error response body:: + + {"error": "error-string"} + +Security impact +--------------- +None + +Notifications impact +-------------------- +None + +Other end user impact +--------------------- +* Change Cyborg Attribute table. + + +Performance Impact +------------------ +None + +Other deployer impact +--------------------- +None + +Developer impact +---------------- +* Users who want to use this feature should upgrade their Cyborg + project to the latest version, which supports these changes. + +Implementation +============== + +Assignee(s) +----------- +Primary assignee: + hejunli + +Work Items +---------- + +* Change Cyborg REST APIs. +* Change Cyborg Attribute table. +* Change Cyborg deployable object. +* Change cyborgclient to support Attribute management action.
+* Add related tests. + +Dependencies +============ +None + +Testing +======= +Appropriate unit and functional tests should be added. + +Documentation Impact +==================== +* Documentation is needed to record the microversion history. +* Documentation is needed to explain API usage. + +References +========== +None + +History +======= +.. list-table:: Revisions + :header-rows: 1 + + * - Release Name + - Description + * - Antelope + - Introduced diff --git a/specs/2023.1/approved/disable-enable-device.rst b/specs/2023.1/approved/disable-enable-device.rst new file mode 100644 index 0000000..8027dd7 --- /dev/null +++ b/specs/2023.1/approved/disable-enable-device.rst @@ -0,0 +1,221 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +============================= +Add disable/enable device API +============================= + +https://blueprints.launchpad.net/openstack-cyborg/+spec/disable-enable-device + +Currently, Cyborg discovers the devices on a compute node through each driver. +All devices matching a driver's spec are discovered and reported to the +Placement service as accelerator resources. +This spec proposes a set of new APIs which allow admin users to +disable/enable a device. + + +Problem description +=================== + +Cyborg maintains a configuration file to configure the enabled drivers. Once +a driver is enabled, the agent will discover all devices whose vendor ID and +device ID match the driver's requirements. If an admin user does not want all +devices to be used by virtual machines, there is currently no way to disable a +device. + + +Use Cases +--------- +* Alice is an admin user who wants some FPGAs to be reserved for her own use + and not allocated to a VM for the time being. For example, she + wants to program the FPGA device and use it as the OVS agent running on + the host.
+ +Proposed change +=============== +We propose to add a new API in order to enable/disable a device. If the device +is disabled, Cyborg will report this device as a reserved resource to Placement, +so that Nova cannot schedule to this device. Conversely, if the device is +enabled, the device should become available and the 'reserved' field in +Placement should be set to 0. +* Since the API layer is modified, a new microversion should be introduced. +* A new field "is_maintaining" is also needed in the Device object and data +model to indicate whether the device is disabled. If a device is disabled, the +"is_maintaining" field should be set to "True", and if the device is enabled, +the field should be set to "False". The default value should be "False". +* Cyborg needs to call the Placement API to update the "reserved" field for the +device in this API. +* Add a check of the "is_maintaining" field's value during the conductor's periodic report. + +Alternatives +------------ +None + +Data model impact +----------------- +A new column `is_maintaining` should be added in Device's data model. + + +REST API impact +--------------- +A microversion needs to be introduced since the Device API changes. + +List Device API +^^^^^^^^^^^^^^^ +* Return a device list + URL: ``/devices`` + METHOD: ``GET`` + Return: 200 + +.. code-block:: + + { + "devices": [ + { + "uuid": "d2446439-0142-40b7-9eee-82d855f453d9", + "type": "FPGA", + "vendor": "0xABCD", + "model": "miss model info", + "std_board_info": "{\"device_id\": \"0xabcd\", \"class\": \"Fake class\"}", + "vendor_board_info": "fake_vendor_info", + "hostname": "devstack01", + "links": [ + { + "href": "http://172.23.97.140/accelerator/v2/devices/d2446439-0142-40b7-9eee-82d855f453d9", + "rel": "self" + } + ], + "created_at": "2021-11-03T08:48:43+00:00", + "updated_at": null + } + ] + } + +Get Device API +^^^^^^^^^^^^^^ +* Get a device by uuid and return the details + URL: ``/devices/{uuid}`` + METHOD: ``GET`` + Return: 200 + +..
code-block:: + + { + "uuid": "d2446439-0142-40b7-9eee-82d855f453d9", + "type": "FPGA", + "vendor": "0xABCD", + "model": "miss model info", + "std_board_info": "{\"device_id\": \"0xabcd\", \"class\": \"Fake class\"}", + "vendor_board_info": "fake_vendor_info", + "hostname": "devstack01", + "links": [ + { + "href": "http://172.23.97.140/accelerator/v2/devices/d2446439-0142-40b7-9eee-82d855f453d9", + "rel": "self" + } + ], + "created_at": "2021-11-03T08:48:43+00:00", + "updated_at": null + } + + +Disable Device API +^^^^^^^^^^^^^^^^^^ +* Disable a device + URL: ``/devices/disable/{device_uuid}`` + METHOD: ``POST`` + Return: 200 + Error Code: 404 (the device is not found), 403 (the role is not admin) + +Enable Device API +^^^^^^^^^^^^^^^^^ +* Enable a device + URL: ``/devices/enable/{device_uuid}`` + METHOD: ``POST`` + Return: 200 + Error Code: 404 (the device is not found), 403 (the role is not admin) + +Security impact +--------------- +None + +Notifications impact +-------------------- +None + +Other end user impact +--------------------- +None + +Performance Impact +------------------ +None + +Other deployer impact +--------------------- +The deployer needs to update Cyborg to the microversion which supports the +disable/enable API. Otherwise disable/enable API requests will be rejected. + +Developer impact +---------------- +None + + +Implementation +============== + +Assignee(s) +----------- +Primary assignee: + Xinran Wang(xin-ran.wang@intel.com) + +Work Items +---------- +* Add new column `is_maintaining` for device table. +* Add disable/enable API in DeviceController. +* Update the RP `reserved` field according to the operation. For the `disable` + operation, the `reserved` field needs to be set to the same value as the + `total` field, and for the `enable` operation, the `reserved` field will be + set to zero. +* Update GET/LIST device API with `is_maintaining` field added in returned + value. +* Add disable/enable operation in cyborgclient. +* Add unit tests.
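The `reserved` rule in the work items above can be sketched as follows. This is a minimal illustration only: the ``Inventory`` class and ``apply_maintenance`` helper are hypothetical names introduced here, not part of Cyborg or the Placement client, and the sketch assumes the inventory is described by just `total` and `reserved`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventory:
    """Hypothetical stand-in for a resource provider inventory record."""
    total: int
    reserved: int = 0


def apply_maintenance(inventory: Inventory, is_maintaining: bool) -> Inventory:
    """Return the inventory that would be reported for a device.

    Disabled device (is_maintaining=True): reserved equals total, so nothing
    is left to allocate. Enabled device: reserved is reset to zero.
    """
    reserved = inventory.total if is_maintaining else 0
    return Inventory(total=inventory.total, reserved=reserved)


# Disabling a device with 4 units reserves all 4; enabling releases them.
disabled = apply_maintenance(Inventory(total=4), is_maintaining=True)
enabled = apply_maintenance(disabled, is_maintaining=False)
```

Returning a new record instead of mutating the old one keeps the rule easy to unit test; how the value is actually pushed to Placement is left out on purpose.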
+ +Dependencies +============ +None + + +Testing +======= +Unit tests need to be added, and Tempest tests if needed. + + +Documentation Impact +==================== +Related docs need to be added. + +References +========== +None + + +History +======= + +Optional section intended to be used each time the spec is updated to describe +new design, API or any database schema updated. Useful to let reader understand +what's happened along the time. + +.. list-table:: Revisions + :header-rows: 1 + + * - Release Name + - Description + * - Xena + - Introduced + * - Yoga + - Reproposed diff --git a/specs/2023.1/approved/pmem-namespace-support.rst b/specs/2023.1/approved/pmem-namespace-support.rst new file mode 100644 index 0000000..6839132 --- /dev/null +++ b/specs/2023.1/approved/pmem-namespace-support.rst @@ -0,0 +1,195 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +================================= +Cyborg Intel PMEM Driver Proposal +================================= + +https://blueprints.launchpad.net/openstack-cyborg/+spec/add-pmem-driver + +This spec proposes to provide the initial design for Cyborg's Intel PMEM +driver. + +Problem description +=================== + +This spec will add an Intel PMEM driver for Cyborg to manage specific Intel +PMEM devices. + +PMEM devices can be used as a large pool of low-latency, high-bandwidth memory +in which data for computation can be stored. This can improve the performance +of the instance. + +PMEM must be partitioned into PMEM namespaces [1]_ for applications to use. +This vPMEM feature only uses PMEM namespaces in devdax mode as QEMU vPMEM +backends [2]_. For background on the related concepts, the NVDIMM +Linux kernel document [3]_ is recommended.
+ +Starting in the 20.0.0 (Train) release, the virtual persistent memory (vPMEM) +feature in Nova allows a deployment using the libvirt compute driver to provide +vPMEMs for instances using physical persistent memory (PMEM) that can provide +virtual devices [4]_. + +Use Cases +--------- +* As an operator, I would like the Cyborg agent to manage PMEM resources + and check them periodically. The Cyborg Intel PMEM driver should provide a + ``discover()`` function to enumerate the list of the Intel PMEM devices, + and report the details of all available Intel PMEM accelerators on the + host, such as PID (Product ID), VID (Vendor ID), and Device ID. + +* As a user, I would like to boot up a VM with an Intel PMEM Device attached in + order to accelerate compute ability. Cyborg should be able to manage this + kind of acceleration resource and assign it to the VM (binding). + +Proposed change +=============== +1. In general, the goal is to develop an Intel PMEM Device driver that supports +discover interfaces for the Intel PMEM accelerator framework. The driver should +include the ``discover()`` function. This function works by executing the +"ndctl list" command, which reports the devices' raw info, for example:: + + [ + { + "vendor": "8086", + "product": "ns200_0", + "device": "dax0.0" + } + ] + +2. Generate Cyborg specific driver objects and resource provider modeling +for the PMEM device. Below are the objects describing a PMEM device, which +comply with the Cyborg database model and the Placement data model. + +:: + + Hardware Driver objects Placement data model + | | | + 1 PMEM 1 device | + | | | + | 1 deployable ---> resource_provider + | | ---> parent resource_provider: compute node + | | | + n Namespace n attach_handle ---> inventories(total:n) + +3. Add "enable_driver=intel_pmem_driver" to the Cyborg Agent + configuration file. + +4.
Add "pmem_namespaces=$LABEL:$NSNAME|$NSNAME,$LABEL:$NSNAME|$NSNAME" + to the Cyborg Agent configuration file, for example: + "pmem_namespaces = 6GB:ns0|ns1|ns2,LARGE:ns3" + +5. The resource class follows the Placement custom resource class format: + "CUSTOM_PMEM_NAMESPACE_$LABEL" + +6. Traits follow the Placement custom trait format. The Cyborg driver + will report two traits for a PMEM accelerator using the format below: + trait1:"CUSTOM_PMEM_NAMESPACE_$LABEL1" + trait2:"CUSTOM_PMEM_NAMESPACE_$LABEL2" + + +7. The namespaces must be created before Cyborg discovers them. See [5]_ + and [6]_ for how to create a namespace. + +Alternatives +------------ + +None + +Data model impact +----------------- + +A new type, such as PMEM, needs to be added to the devices and attach_handle +tables. + +REST API impact +--------------- + +None. + +Security impact +--------------- + +None + +Notifications impact +-------------------- + +None + +Other end user impact +--------------------- + +Users can manage Intel PMEM devices through the Cyborg Intel PMEM driver, for +example listing the Intel PMEM devices, reporting the details of all available +Intel PMEM accelerators on the host, and binding Intel PMEM devices. + +Performance Impact +------------------ + +None + +Other deployer impact +--------------------- + +None. + +Developer impact +---------------- + +None + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + qiujunting(qiujunting@inspur.com) + +Work Items +---------- + +* Implement Intel PMEM driver in Cyborg +* Add related test cases. + + +Dependencies +============ + +None + +Testing +======== + +* Unit tests will be added to test this driver. + +Documentation Impact +==================== + +Document the Intel PMEM driver in the Cyborg project. +Add a test report to the Cyborg wiki. + +References +========== +.. [1] https://pmem.io/ndctl/ndctl-create-namespace.html +.. [2] https://github.com/qemu/qemu/blob/19b599f7664b2ebfd0f405fb79c14dd241557452/docs/nvdimm.txt#L145 +..
[3] https://www.kernel.org/doc/Documentation/nvdimm/nvdimm.txt +.. [4] https://docs.openstack.org/nova/latest/admin/virtual-persistent-memory.html +.. [5] https://docs.openstack.org/nova/latest/admin/virtual-persistent-memory.html#configure-pmem-namespaces-compute +.. [6] https://pmem.io/ndctl/ndctl-create-namespace.html + +History +======= + +.. list-table:: Revisions + :header-rows: 1 + + * - Release + - Description + * - Yoga + - Introduced + diff --git a/specs/2023.1/implemented/vgpu-driver-proposal.rst b/specs/2023.1/implemented/vgpu-driver-proposal.rst new file mode 100644 index 0000000..87eadb2 --- /dev/null +++ b/specs/2023.1/implemented/vgpu-driver-proposal.rst @@ -0,0 +1,337 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +================================================ +Cyborg NVIDIA GPU Driver support vGPU management +================================================ + +The Cyborg NVIDIA GPU Driver implemented pGPU management in the Train +release. This spec proposes supporting vGPU management +in the same driver. + +Problem description +=================== + +GPU devices can provide supercomputing capabilities, and can replace the CPU +to provide users with more efficient computing power at a lower cost. GPU cloud +servers have great value in the following application scenarios: +video encoding and decoding, scientific research, and artificial intelligence +(deep learning, machine learning). + +In the OpenStack ecosystem, users can now use Nova to pass GPU resources to +a guest by two methods: + +* Pass the GPU hardware to the guest (PCI pass-through). + +* Pass the Mediated Device(vGPU) to the guest.
+ +With the long-term goal that Cyborg will manage heterogeneous accelerators +including GPUs, Cyborg needs to support GPU management and integrate with Nova +to provide users with GPU resource allocation by the aforementioned methods. +The existing Cyborg GPU driver, NVIDIA GPU Driver, supports the first +method (PCI pass-through), while the second method is not yet supported. +Please see ref [1]_ for the Nova-Cyborg vGPU integration spec. + +Use Cases +--------- + +* When using Cyborg to manage GPU devices, a user wants to boot + up a VM with an Nvidia GPU (pGPU or vGPU) attached in order to accelerate + video coding and decoding. Cyborg should be able to manage this kind of + acceleration resource and assign it to the VM (binding). + +Proposed changes +================ + +To be clear, the following describes the whole process of how +the NVIDIA GPU Driver discovers vGPU devices, generates Cyborg specific driver +objects for them (complying with the Cyborg Database Model), and reports them +to cyborg-db and Placement through cyborg-conductor. Features that are already +supported in the current branch are marked as DONE; new changes are marked as +NEW CHANGE. + +1. Collect raw info of GPU devices from the compute node by "lspci" and grep +nvidia related keyword.(DONE) + +2. Parse details from each record including ``vendor_id``, ``product_id`` +and ``pci_address``.(DONE) + +3. Generate Cyborg specific driver objects and resource provider modeling +for the GPU device as well as its mediated devices. Below are the objects +describing a vGPU device, which comply with the Cyborg database model [4]_ +and placement data model [5]_.(NEW CHANGE) + +:: + + Hardware Driver objects Placement data model + | | | + 1 GPU 1 device | + | | | + | 1 deployable ---> resource_provider + | | ---> parent resource_provider: compute node + | | | + 4 vGPUs 4 attach_handles ---> inventories(total:4) + +4. Support setting the vGPU type for a specific GPU device in cyborg.conf.
The +implementation is similar to that in Nova [9]_.(NEW CHANGE) + +* Firstly, we propose [gpu_devices]/enabled_vgpu_types to define which vGPU + types the Cyborg driver can use: + + :: + + [gpu_devices] + enabled_vgpu_types = [str_vgpu_type_1, str_vgpu_type_2, ...] + +* And also, we propose that the Cyborg driver will accept configuration sections + that are related to the [gpu_devices]/enabled_vgpu_types and specify which + exact pGPUs are related to the enabled vGPU types and will have a + device_addresses option defined like this: + + :: + + cfg.ListOpt('device_addresses', + default=[], + help=""" + List of physical PCI addresses to associate with a specific vGPU type. + + The particular physical GPU device address needs to be mapped to the vendor + vGPU type which that physical GPU is configured to accept. In order to + provide this mapping, there will be a CONF section with a name + corresponding to the following template: "vgpu_%(vgpu_type_name)s" + + The vGPU type to associate with the PCI devices has to be the section name + prefixed by ``vgpu_``. For example, for 'nvidia-11', you would declare + ``[vgpu_nvidia-11]/device_addresses``. + + Each vGPU type also has to be declared in ``[gpu_devices]/enabled_vgpu_types``. + + Related options: + + * ``[gpu_devices]/enabled_vgpu_types`` + """), + + For example, it would be set in cyborg.conf + + :: + + [gpu_devices] + enabled_vgpu_types = nvidia-223,nvidia-224 + [vgpu_nvidia-223] + device_addresses = 0000:af:00.0,0000:86:00.0 + [vgpu_nvidia-224] + device_addresses = 0000:87:00.0 + +5. Generate resource_class and traits for device, which later will also be +reported to Placement, and used by nova-scheduler to filter appropriate +accelerators.(NEW CHANGE) + +* ``resource class`` follows standard resources classes used by OpenStack [6]_. + Pass-through GPU device will report 'PGPU' as its resource class, + Virtualized GPU device will report 'VGPU' as its resource class.
+ +* ``traits`` follows the placement custom trait format [7]_. The Cyborg + driver will report two traits for a vGPU accelerator using the format + below: + + trait1: **OWNER_CYBORG**. + + trait2: **CUSTOM_<VENDOR_NAME>_<PRODUCT_ID>_<Virtual_GPU_Type>**. + + Meaning of each parameter is listed below. + + * OWNER_CYBORG: a new namespace in os-traits to mark that a device is + reported by Cyborg when the inventory is reported to placement. It is used + to distinguish it from GPU devices reported by Nova. + + * VENDOR_NAME: vendor name of the GPU device. + + * PRODUCT_ID: product ID of the GPU device. + + * Virtual_GPU_Type: this parameter is actually another format of the + enabled_vgpu_types for a specific device set by admin in cyborg.conf. + In order to generate this param, the driver will first retrieve + ``enabled_vgpu_type`` and then map it to Virtual_GPU_Type as + shown below. The name is exactly the Virtual_GPU_Type that will be + reported in traits. For more details about the valid Virtual GPU Types + for supported GPUs, please refer to [8]_. + + :: + + # find mapping relation between Virtual_GPU_Type and enabled_vgpu_type. + # The value in "name" file contains its corresponding Virtual_GPU_Type. + cat /sys/class/mdev_bus/{device_address}/mdev_supported_types/{enabled_vgpu_type}/name + +* Here is an example to show the traits of a GPU device in the real world. + + * A Nvidia Tesla T4 device has been successfully installed on host, + device address is 0000:af:00.0. In addition, the vendor’s vGPU driver + software must be installed and configured on the host at the same time. + + :: + + [vtu@ubuntudbs ~]# lspci -nnn -D|grep 1eb8 + 0000:af:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1) + + * Enable GPU types (Accelerator) + + 1. Specify which specific GPU type(s) the instances would get from this + specific device.
+ + Edit ``[gpu_devices]/enabled_vgpu_types`` and ``device_addresses`` in cyborg.conf: + + :: + + [gpu_devices] + enabled_vgpu_types=nvidia-223 + [vgpu_nvidia-223] + device_addresses = 0000:af:00.0 + + 2. Restart the cyborg-agent service. + + * Finally, traits reported for this device(RP) will be: + + **OWNER_CYBORG** and **CUSTOM_NVIDIA_1EB8_T4_2B** + +.. NOTE:: + + For the last parameter "T4_2B" (<Virtual_GPU_Type>), we can validate the + mapping relation between "nvidia-223" and "T4_2B" by checking the mdev + sys path: + + :: + + [vtu@ubuntudbs mdev_supported_types]$ pwd + /sys/class/mdev_bus/0000:af:00.0/mdev_supported_types + [vtu@ubuntudbs mdev_supported_types]$ ls + nvidia-222 nvidia-225 nvidia-228 nvidia-231 nvidia-234 nvidia-320 + nvidia-223 nvidia-226 nvidia-229 nvidia-232 nvidia-252 nvidia-321 + nvidia-224 nvidia-227 nvidia-230 nvidia-233 nvidia-319 + [vtu@ubuntudbs mdev_supported_types]$ cat nvidia-223/name + GRID T4-2B + +6. Generate ``controlpath_id``, ``deployable``, ``attach_handle``, +``attribute`` for vGPU.(NEW CHANGE) + +7. Create an mdev device in sysfs by echoing its UUID (which is actually the +attach_handle UUID) to the create file when a vGPU is bound to a VM.(NEW CHANGE) + +create_file_path= +/sys/class/mdev_bus/{pci_address}/mdev_supported_types/{type-id}/create + +8. Delete an mdev device from sysfs by echoing "1" to the remove file when a +vGPU is unbound from a VM.(NEW CHANGE) + +remove_file_path= +/sys/class/mdev_bus/{pci_address}/mdev_supported_types/{type-id}/UUID/remove + +Alternatives +------------ + +Using Nova to manage vGPU device [10]_. + +Data model impact +----------------- + +None + + +REST API impact +--------------- + +None + + +Security impact +--------------- + +None + +Notifications impact +-------------------- + +None + +Other end user impact +--------------------- + +None + +Performance Impact +------------------ + +None + +Other deployer impact +--------------------- + +This feature is highly dependent on the version of libvirt and the physical +devices present on the host.
+ +For vGPU management, deployers need to make sure that the GPU device has been +successfully virtualized. Otherwise, Cyborg will report it as a pGPU device. + +Please see ref [2]_ and [3]_ for how to install the Virtual GPU Manager package +to virtualize your GPU devices. + +Developer impact +---------------- + +None + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + + +Work Items +---------- + +* Implement NVIDIA GPU Driver enhancement in Cyborg +* Add related test cases. +* Add test report to wiki and update the supported driver doc page + +Dependencies +============ + +None + +Testing +======== + +* Unit tests will be added to test this driver. + +Documentation Impact +==================== + +Document Nvidia GPU driver in Cyborg project. + +References +========== +.. [1] https://review.opendev.org/#/c/750116/ +.. [2] https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html +.. [3] https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html#install-vgpu-package-generic-linux-kvm +.. [4] https://specs.openstack.org/openstack/cyborg-specs/specs/stein/implemented/cyborg-database-model-proposal.html +.. [5] https://docs.openstack.org/nova/rocky/user/placement.html#references +.. [6] https://github.com/openstack/os-resource-classes/blob/master/os_resource_classes/__init__.py#L41 +.. [7] https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/resource-provider-traits.html +.. [8] https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#virtual-gpu-types-grid-reference +.. [9] https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/vgpu-multiple-types.html +.. [10] https://docs.openstack.org/nova/latest/admin/virtual-gpu.html + +History +======= + +.. list-table:: Revisions + :header-rows: 1 + + * - Release + - Description + * - Wallaby + - Introduced