Intel QAT and GPU support

Add sections to configure GPU devices, QAT devices and the Intel device plugins operator.
Add section to uninstall device plugins.
Delete old QAT device section.

Story: 2010604
Task: 48177

Change-Id: I0f40bb50abc50889adb1c63316e9857ca9a371bc
Signed-off-by: Elisamara Aoki Goncalves <elisamaraaoki.goncalves@windriver.com>
Elisamara Aoki Goncalves 2024-07-18 13:44:02 +00:00
parent bc9ac348d5
commit a3feea0cdb
8 changed files with 409 additions and 201 deletions

@ -0,0 +1,95 @@
.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _gpu-device-plugin-configuration-615e2f6edfba:
===============================
GPU Device Plugin Configuration
===============================

The Intel |GPU| device plugin enables Kubernetes clusters to use Intel GPUs for
hardware acceleration of various workloads.

This section describes how to enable and use the Intel |GPU| device plugin in
|prod|.


.. _prerequisites-1:

.. rubric:: |prereq|

- The host must have Intel |GPU| hardware. For the list of supported |GPU|
  devices, see the `Intel GPU device plugin for Kubernetes
  <https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md>`__
  documentation.

- The Node Feature Discovery application must be installed using the
  following commands:

  .. code-block::

     ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/node-feature-discovery*.tgz
     ~(keystone_admin)]$ system application-apply node-feature-discovery

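
Optionally, after the Node Feature Discovery application is applied, you can
check that it has labeled the node. The following is a minimal sketch, not
part of the documented procedure: the ``list_nfd_labels`` function name is
hypothetical, and it assumes that ``kubectl`` is configured on the controller
and that the labels of interest contain ``feature.node.kubernetes.io/`` in
their key.

.. code-block:: bash

   # Sketch: list the Node Feature Discovery labels on one node.
   # Assumption: NFD label keys contain "feature.node.kubernetes.io/"
   # (this also matches prefixed keys such as intel.feature.node.kubernetes.io/...).
   list_nfd_labels() {
       local node="$1"
       if [[ -z "$node" ]]; then
           echo "usage: list_nfd_labels <node-name>" >&2
           return 2
       fi
       # Print every label as key=value, one per line, and keep the NFD ones.
       # The function returns grep's status, so it fails if no label matches.
       kubectl get node "$node" \
           -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}' \
           | grep 'feature\.node\.kubernetes\.io/'
   }

For example, ``list_nfd_labels controller-0`` prints the matching labels of
``controller-0`` and returns a non-zero status if there are none.
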

Enable Intel GPU Device Plugin
------------------------------

#. Locate the application tarball in the ``/usr/local/share/applications/helm``
   directory. For example:

   ``/usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz``

#. Upload the application using the following command:

   .. code-block::

      ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz

   Replace ``<version>`` with the latest version number.

#. Verify that the application has been uploaded successfully:

   .. code-block::

      ~(keystone_admin)]$ system application-list

#. Check the Helm chart status using the following command:

   .. code-block::

      ~(keystone_admin)]$ system helm-override-list intel-device-plugins-operator --long

#. Enable the |GPU| Helm chart using the following command:

   .. code-block::

      ~(keystone_admin)]$ system helm-chart-attribute-modify --enabled true intel-device-plugins-operator intel-device-plugins-gpu intel-device-plugins-operator

#. Apply the application using the following command:

   .. code-block::

      ~(keystone_admin)]$ system application-apply intel-device-plugins-operator

#. Monitor the status of the application using one of the following commands:

   .. code-block::

      ~(keystone_admin)]$ watch -n 5 system application-list

   or

   .. code-block::

      ~(keystone_admin)]$ watch kubectl get pods -n intel-device-plugins-operator

#. Check the pods using the following command:

   .. code-block::

      $ kubectl get pods -n intel-device-plugins-operator
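
The upload, enable, and apply steps above can also be run as one script. The
following is a minimal sketch only: the function names are hypothetical, and
it assumes that ``system application-show`` prints the application status as
a ``| status | <value> |`` table row and that a failed operation reports a
status ending in ``-failed`` (for example, ``apply-failed``).

.. code-block:: bash

   # Sketch: upload, enable the GPU chart, apply, and wait for "applied".
   # Assumption: "system application-show <app>" prints a "| status | <value> |" row.
   APP=intel-device-plugins-operator

   # Print the value of the "status" row reported by "system application-show".
   app_status() {
       system application-show "$1" |
           awk -F'|' '{ gsub(/ /, "", $2); gsub(/ /, "", $3) } $2 == "status" { print $3 }'
   }

   # wait_for_app_status <app> <wanted-status> [tries] [interval-seconds]
   wait_for_app_status() {
       local app="$1" wanted="$2" tries="${3:-60}" interval="${4:-5}" status="" i
       for ((i = 0; i < tries; i++)); do
           status="$(app_status "$app")"
           [[ "$status" == "$wanted" ]] && return 0
           # A failed operation (for example "apply-failed") does not recover by itself.
           if [[ "$status" == *-failed ]]; then
               echo "$app: $status" >&2
               return 1
           fi
           sleep "$interval"
       done
       echo "$app: timed out waiting for '$wanted' (last status: '$status')" >&2
       return 1
   }

   # enable_gpu_plugin <path-to-intel-device-plugins-operator-tarball>
   enable_gpu_plugin() {
       local tarball="$1"
       system application-upload "$tarball" || return 1
       wait_for_app_status "$APP" uploaded || return 1
       system helm-chart-attribute-modify --enabled true \
           "$APP" intel-device-plugins-gpu intel-device-plugins-operator || return 1
       system application-apply "$APP" || return 1
       # application-apply returns before the apply finishes, so poll the status.
       wait_for_app_status "$APP" applied
   }

For example:
``enable_gpu_plugin /usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz``.
Polling the status, rather than stopping right after
``system application-apply``, mirrors the monitoring step in the procedure.
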

Use Intel GPU Device Plugin
---------------------------

For information about using the |GPU| device plugin, see `Testing and Demos
<https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md#testing-and-demos>`__.
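
As a quick check, a pod can request a |GPU| through the ``gpu.intel.com/i915``
extended resource. The following sketch writes a minimal pod specification for
that purpose. The ``write_gpu_test_pod`` function, the pod name, the
``busybox`` image, and the ``ls -l /dev/dri`` command are illustrative
assumptions; confirm the resource name against the ``kubectl describe node``
output on your system.

.. code-block:: bash

   # Sketch: write a minimal pod spec that requests one Intel GPU.
   # Assumptions: pod name, busybox image and "ls -l /dev/dri" are illustrative;
   # gpu.intel.com/i915 is the resource name used in earlier StarlingX docs.
   write_gpu_test_pod() {
       local file="${1:-gpu-test-pod.yaml}"
       printf '%s\n' \
           'apiVersion: v1' \
           'kind: Pod' \
           'metadata:' \
           '  name: intel-gpu-test' \
           'spec:' \
           '  restartPolicy: Never' \
           '  containers:' \
           '  - name: gpu-test' \
           '    image: busybox' \
           '    # List the GPU device nodes made available to the container.' \
           '    command: ["ls", "-l", "/dev/dri"]' \
           '    resources:' \
           '      limits:' \
           '        gpu.intel.com/i915: 1' \
           > "$file"
   }

For example, run ``write_gpu_test_pod``, then
``kubectl apply -f gpu-test-pod.yaml`` and ``kubectl logs intel-gpu-test``. If
a |GPU| was allocated, the log lists the device nodes under ``/dev/dri``.
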

@ -48,17 +48,18 @@ Optimize application performance
isolating-cpu-cores-to-enhance-application-performance
kubernetes-topology-manager-policies
.. only:: starlingx
-----------------
QAT Device Plugin
-----------------
---------------------------------
QAT Device and GPU Device Plugins
---------------------------------
.. toctree::
:maxdepth: 1
k8s_qat_device_plugin
.. toctree::
:maxdepth: 1
intel-device-plugins-operator-application-overview-c5de2a6212ae
qat-device-plugin-configuration-616551306371
gpu-device-plugin-configuration-615e2f6edfba
uninstall-intel-device-plugins-operator-application-e712eabc1e49
--------------
Metrics Server

@ -0,0 +1,32 @@
.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _intel-device-plugins-operator-application-overview-c5de2a6212ae:
==================================================
Intel Device Plugins Operator Application Overview
==================================================

This application provides a set of device plugins developed by Intel that
enable Kubernetes workloads to use Intel-specific hardware capabilities.

The following plugins are supported:

* Intel |QAT| device plugin 0.26.0
* Intel |GPU| device plugin 0.26.0

Install Intel Device Plugins Operator Application
-------------------------------------------------

The Intel Device Plugins Operator application must be installed to configure
the Intel |QAT| device plugin and the Intel |GPU| device plugin. For the
installation steps, see the respective device plugin configuration sections:

* :ref:`qat-device-plugin-configuration-616551306371`
* :ref:`gpu-device-plugin-configuration-615e2f6edfba`

@ -1,115 +0,0 @@
.. _k8s_qat_device_plugin:
.. only:: starlingx
==========================================
Kubernetes QAT Device Plugin Configuration
==========================================
Intel® QuickAssist Technology (Intel® QAT) accelerates cryptographic workloads
by offloading the data to hardware capable of optimizing those functions. This
guide describes how to enable and consume the Intel QAT device plugin in
StarlingX.
.. contents::
:local:
:depth: 1
-------------
Prerequisites
-------------
- Install Intel QuickAssist device on host.
- Install StarlingX on bare metal with DPDK enabled. Refer to the |_link-inst-book|
for details.
------------------------------
Enable Intel QAT device plugin
------------------------------
The Intel QAT device plugin daemonset is pre-installed in StarlingX. This
section describes the steps to enable the Intel QAT device plugin for
discovering and advertising QAT VF resources to Kubernetes host.
#. Verify QuickAssist SR-IOV virtual functions are configured on a specified
node after StarlingX is installed. This example uses the worker-0 node.
::
$ ssh worker-0
$ for i in 0442 0443 37c9 19e3; do lspci -d 8086:$i; done
.. note::
The Intel QAT device plugin only supports QAT VF resources in the current
release.
#. Assign the ``intelqat`` label to the node (worker-0 in this example).
::
$ NODE=worker-0
$ system host-lock $NODE
$ system host-label-assign $NODE intelqat=enabled
$ system host-unlock $NODE
#. After the node becomes available, verify the Intel QAT device plugin is
registered.
::
$ kubectl describe node $NODE | grep qat.intel.com/generic
qat.intel.com/generic: 10
qat.intel.com/generic: 10
-------------------------------
Consume Intel QAT device plugin
-------------------------------
#. Build the DPDK image.
::
$ git clone https://github.com/intel/intel-device-plugins-for-kubernetes.git
$ cd demo
$ ./build-image.sh crypto-perf
This command produces a Docker image named ``crypto-perf``.
#. Deploy a pod to run an example DPDK application named
``dpdk-test-crypto-perf``.
In the pod specification file, add the container resource request and
limit.
For example, use ``qat.intel.com/generic: <number of devices>`` for a
container requesting Intel QAT devices.
For a DPDK-based workload, you may need to add a hugepage request and limit.
::
$ kubectl apply -k deployments/qat_dpdk_app/base/
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
qat-dpdk 1/1 Running 0 27m
intel-qat-plugin-5zgvb 1/1 Running 0 3h
.. Note::
The deployment example above uses kustomize, which is a tool supported by
kubectl since the Kubernetes v1.14 release.
#. Manually execute the ``dpdk-test-crypto-perf`` application to review the
logs.
::
$ kubectl exec -it qat-dpdk bash
$ ./dpdk-test-crypto-perf -l 6-7 -w $QAT1 -- --ptest throughput --\
devtype crypto_qat --optype cipher-only --cipher-algo aes-cbc --cipher-op \
encrypt --cipher-key-sz 16 --total-ops 10000000 --burst-sz 32 --buffer-sz 64

@ -0,0 +1,249 @@
.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _qat-device-plugin-configuration-616551306371:
===============================
QAT Device Plugin Configuration
===============================

Intel® QuickAssist Technology (Intel® QAT) accelerates cryptographic workloads
by offloading the data to hardware that is capable of optimizing those
functions.

This section describes how to enable and use the Intel |QAT| device plugin in
|prod|.

.. rubric:: |prereq|

- The host must have Intel |QAT| hardware. The supported |QAT| devices are
  4940 and 4942. After |prod| is installed, perform the following checks to
  ensure that the |QAT| devices are configured:

  - Verify that the |QAT| |SRIOV| physical functions are configured.

    .. code-block::

       $ for i in 4942 4940; do lspci -d 8086:$i; done

  - Verify that the |QAT| |SRIOV| virtual functions are configured.

    .. code-block::

       $ for i in 4943 4941; do lspci -d 8086:$i; done
       $ sudo /etc/init.d/qat_service status # Must list all the virtual functions
       Checking status of all devices.
       There is 34 QAT acceleration device(s) in the system:
       qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:f3:00.0,
       #accel: 1 #engines: 9 state: up
       qat_dev1 - type: 4xxx, inst_id: 1, node_id: 0, bsf: 0000:f7:00.0,
       #accel: 1 #engines: 9 state: up
       qat_dev2 - type: 4xxxvf, inst_id: 0, node_id: 0, bsf: 0000:f3:00.1,
       #accel: 1 #engines: 1 state: up
       qat_dev3 - type: 4xxxvf, inst_id: 1, node_id: 0, bsf: 0000:f3:00.2,
       #accel: 1 #engines: 1 state: up

  - Verify that the ``vfio_pci`` driver is loaded.

    .. code-block::

       $ lsmod | grep vfio_pci
       vfio_pci 69632 0
       vfio_virqfd 16384 1 vfio_pci
       vfio 45056 4 intel_qat,vfio_mdev,vfio_iommu_type1,vfio_pci
       irqbypass 16384 3 intel_qat,vfio_pci,kvm

- The Node Feature Discovery application must be installed using the
  following commands:

  .. code-block::

     ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/node-feature-discovery*.tgz
     ~(keystone_admin)]$ system application-apply node-feature-discovery

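
The |SRIOV| checks above can be combined into a single count. The following is
a minimal sketch with hypothetical function names. It uses the device IDs
listed above (4940 and 4942 for physical functions, 4941 and 4943 for virtual
functions) and assumes 16 virtual functions per physical function, the figure
given for these devices in the node resource check of this section.

.. code-block:: bash

   # Sketch: count QAT PFs and VFs by PCI device ID (Intel vendor ID 8086).
   # Device IDs as listed above: PFs 4940/4942, VFs 4941/4943.
   # Assumption: 16 VFs per PF, as stated for the 4940 and 4942 devices.
   count_pci_devices() {
       # lspci prints one line per matching PCI function; "|| true" keeps a
       # zero count from being reported as a failure.
       lspci -d "8086:$1" | grep -c . || true
   }

   check_qat_vfs() {
       local pfs=0 vfs=0 id expected
       for id in 4940 4942; do
           pfs=$(( pfs + $(count_pci_devices "$id") ))
       done
       for id in 4941 4943; do
           vfs=$(( vfs + $(count_pci_devices "$id") ))
       done
       expected=$(( pfs * 16 ))
       echo "QAT PFs: $pfs, VFs: $vfs (expected $expected)"
       # Succeed only if at least one PF exists and the VF total is 16 per PF.
       [[ "$pfs" -gt 0 && "$vfs" -eq "$expected" ]]
   }

On a host with one 4940 and one 4942 device, each exposing 16 virtual
functions, ``check_qat_vfs`` prints ``QAT PFs: 2, VFs: 32 (expected 32)`` and
returns zero.
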

Enable Intel QAT Device Plugin
------------------------------

Perform the following steps to enable the Intel |QAT| device plugin to
discover and advertise |QAT| virtual function (VF) resources to Kubernetes.

#. Locate the application tarball in the ``/usr/local/share/applications/helm``
   directory. For example:

   ``/usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz``

#. Upload the application using the following command:

   .. code-block::

      ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz

   Replace ``<version>`` with the latest version number.

#. Verify that the application has been uploaded successfully:

   .. code-block::

      ~(keystone_admin)]$ system application-list

#. Check the Helm chart status:

   .. code-block::

      ~(keystone_admin)]$ system helm-override-list intel-device-plugins-operator --long

#. Enable the |QAT| Helm chart:

   .. code-block::

      ~(keystone_admin)]$ system helm-chart-attribute-modify --enabled true intel-device-plugins-operator intel-device-plugins-qat intel-device-plugins-operator

#. Apply the application:

   .. code-block::

      ~(keystone_admin)]$ system application-apply intel-device-plugins-operator

#. Monitor the status of the application using one of the following commands:

   .. code-block::

      ~(keystone_admin)]$ watch -n 5 system application-list

   or

   .. code-block:: none

      ~(keystone_admin)]$ watch kubectl get pods -n intel-device-plugins-operator

#. Check the pods:

   .. code-block::

      $ kubectl get pods -n intel-device-plugins-operator
      NAME                                            READY   STATUS    RESTARTS   AGE
      intel-qat-plugin-qatdeviceplugin-sample-g8n45   1/1     Running   0          34s
      inteldeviceplugins-controller-manager-74f4c     2/2     Running   0          64s

#. Verify the |QAT| devices by checking the node's resource allocations. The
   |QAT| 4940 device and the |QAT| 4942 device each have 16 virtual
   functions. If both devices are present, the output shows a total of 32
   virtual functions:

   .. code-block::

      $ kubectl describe node <node-name>
      Capacity:
      ...
      qat.intel.com/asym-dc: 32
      ...
      Allocatable:
      ...
      qat.intel.com/asym-dc: 32
      ...
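
Instead of scanning the ``kubectl describe node`` output, the capacity and
allocatable values can be read directly from the node object. The following is
a minimal sketch with hypothetical function names; note that the dots in the
resource name must be escaped in the JSONPath expression.

.. code-block:: bash

   # Sketch: read qat.intel.com/asym-dc from the node object via JSONPath.
   # The dots inside the resource name are escaped with backslashes.
   qat_resource() {
       # qat_resource <node-name> <capacity|allocatable>
       local node="$1" field="$2"
       kubectl get node "$node" \
           -o jsonpath="{.status.${field}.qat\.intel\.com/asym-dc}"
   }

   check_qat_allocatable() {
       # check_qat_allocatable <node-name> <expected-VF-count>
       local node="$1" expected="$2" cap alloc
       cap="$(qat_resource "$node" capacity)"
       alloc="$(qat_resource "$node" allocatable)"
       echo "$node: capacity=${cap:-0} allocatable=${alloc:-0} expected=$expected"
       # A missing resource is treated as zero allocatable VFs.
       [[ "${alloc:-0}" -eq "$expected" ]]
   }

For example, ``check_qat_allocatable controller-0 32`` returns zero when 32
virtual functions are allocatable on ``controller-0``.
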

Use Intel QAT Device Plugin
---------------------------

This section describes how to use the |QAT| device plugin.

#. Deploy a pod using the following sample pod specification file. You can
   modify the resource requests and limits in the file as required. The
   ``qat.intel.com/asym-dc: <number of devices>`` field sets the number of
   requested |QAT| virtual functions.

   For a |DPDK|-based workload, you may need to add a hugepage request and
   limit.

   ``qat-dpdk.yaml``

   .. code-block:: yaml

      kind: Pod
      apiVersion: v1
      metadata:
        name: dpdk-test-crypto-perf
      spec:
        containers:
        - name: crypto-perf
          image: intel/crypto-perf:devel
          imagePullPolicy: IfNotPresent
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 300000; done;" ]
          volumeMounts:
          - mountPath: /dev/hugepages
            name: hugepage
          - mountPath: /var/run/dpdk
            name: dpdk-runtime
          resources:
            requests:
              cpu: "3"
              memory: "128Mi"
              qat.intel.com/asym-dc: '4'
              hugepages-2Mi: "128Mi"
            limits:
              cpu: "3"
              memory: "128Mi"
              qat.intel.com/asym-dc: '4'
              hugepages-2Mi: "128Mi"
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              add:
                ["IPC_LOCK"]
        restartPolicy: Never
        volumes:
        - name: dpdk-runtime
          emptyDir:
            medium: Memory
        - name: hugepage
          emptyDir:
            medium: HugePages

   Apply the pod specification file to create the ``dpdk-test-crypto-perf``
   pod:

   .. code-block::

      $ kubectl apply -f qat-dpdk.yaml

#. Verify the pod status and the allocated |QAT| virtual functions.

   .. code-block::

      $ kubectl get pods
      NAME                    READY   STATUS    RESTARTS   AGE
      dpdk-test-crypto-perf   1/1     Running   0          27m

      $ kubectl describe pod dpdk-test-crypto-perf
      Requests:
      ...
      qat.intel.com/asym-dc: 4
      ...

      $ kubectl describe node <node-name>
      Allocated resources:
      ...
      qat.intel.com/asym-dc: 4
      ...

For more information, see `Demos and Testing
<https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/qat_plugin/README.md#demos-and-testing>`__.
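
The pod verification step can also be scripted. The following is a minimal
sketch with a hypothetical ``check_qat_pod`` function; the default pod name
comes from the sample ``qat-dpdk.yaml`` file, and the 120-second timeout is an
assumption.

.. code-block:: bash

   # Sketch: wait for the sample pod to become Ready, then print how many
   # QAT VFs its first container requests. The 120s timeout is an assumption.
   check_qat_pod() {
       local pod="${1:-dpdk-test-crypto-perf}" requested
       kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s >/dev/null || return 1
       requested="$(kubectl get pod "$pod" \
           -o jsonpath='{.spec.containers[0].resources.requests.qat\.intel\.com/asym-dc}')"
       echo "$pod requests ${requested:-0} QAT VF(s)"
       # Fail if the container does not request any QAT VFs.
       [[ "${requested:-0}" -gt 0 ]]
   }

With the sample specification applied, ``check_qat_pod`` reports the value set
in the ``qat.intel.com/asym-dc`` request.
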

@ -0,0 +1,23 @@
.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _uninstall-intel-device-plugins-operator-application-e712eabc1e49:
===================================================
Uninstall Intel Device Plugins Operator Application
===================================================

Use the following steps to uninstall the Intel Device Plugins Operator
application:

#. Remove the application using the following command:

   .. code-block::

      ~(keystone_admin)]$ system application-remove intel-device-plugins-operator

#. Delete the application using the following command:

   .. code-block::

      ~(keystone_admin)]$ system application-delete intel-device-plugins-operator
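
Because the removal runs in the background, you may need to wait for the
application to return to the ``uploaded`` state before deleting it. The
following minimal sketch chains the two steps; the function names are
hypothetical, and it assumes that ``system application-show`` prints the
status as a ``| status | <value> |`` table row.

.. code-block:: bash

   # Sketch: remove the app, wait for "uploaded", then delete it.
   # Assumption: "system application-show <app>" prints a "| status | <value> |" row.
   APP=intel-device-plugins-operator

   app_status() {
       system application-show "$1" |
           awk -F'|' '{ gsub(/ /, "", $2); gsub(/ /, "", $3) } $2 == "status" { print $3 }'
   }

   # uninstall_app [app] [tries] [interval-seconds]
   uninstall_app() {
       local app="${1:-$APP}" tries="${2:-60}" interval="${3:-5}" status="" i
       system application-remove "$app" || return 1
       for ((i = 0; i < tries; i++)); do
           status="$(app_status "$app")"
           case "$status" in
               # Removal finished: the app is back to "uploaded" and can be deleted.
               uploaded) system application-delete "$app"; return ;;
               remove-failed) echo "$app: remove-failed" >&2; return 1 ;;
           esac
           sleep "$interval"
       done
       echo "$app: timed out waiting for 'uploaded' (last status: '$status')" >&2
       return 1
   }

For example, ``uninstall_app`` removes and then deletes
``intel-device-plugins-operator``.
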

@ -34,7 +34,7 @@ Kubernetes Operation
k8s_nodeport_usage
k8s_persistent_vol_claims
k8s_sriov_config
k8s_gpu_device_plugin
-------------------
OpenStack Operation

@ -1,77 +0,0 @@
================================================
Kubernetes Intel GPU Device Plugin Configuration
================================================
This document describes how to enable the Intel GPU device plugin in StarlingX
and schedule pods on nodes with an Intel GPU.
------------------------------
Enable Intel GPU device plugin
------------------------------
You can pre-install the ``intel-gpu-plugin`` daemonset as follows:
#. Launch the ``intel-gpu-plugin`` daemonset.
Add the following lines to the ``localhost.yaml`` file before playing the
Ansible bootstrap playbook to configure the system.
::
k8s_plugins:
intel-gpu-plugin: intelgpu=enabled
#. Assign the ``intelgpu`` label to each node that should have the Intel GPU
plugin enabled. This will make any GPU devices on a given node available for
scheduling to containers. The following example assigns the ``intelgpu``
label to the worker-0 node.
::
$ NODE=worker-0
$ system host-lock $NODE
$ system host-label-assign $NODE intelgpu=enabled
$ system host-unlock $NODE
#. After the node becomes available, verify the GPU device plugin is registered
and that the available GPU devices on the node have been discovered and reported.
::
$ kubectl describe node $NODE | grep gpu.intel.com
gpu.intel.com/i915: 1
gpu.intel.com/i915: 1
-------------------------------------
Schedule pods on nodes with Intel GPU
-------------------------------------
Add a ``resources.limits.gpu.intel.com`` to your container specification in order
to request an available GPU device for your container.
::
...
spec:
containers:
- name: ...
...
resources:
limits:
gpu.intel.com/i915: 1
The pods will be scheduled to the nodes with available Intel GPU devices. A GPU
device will be allocated to the container and the available GPU devices will be
updated.
::
$ kubectl describe node $NODE | grep gpu.intel.com
gpu.intel.com/i915: 1
gpu.intel.com/i915: 0
For more details, refer to the following examples:
* `Kubernetes manifest file example <https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/demo/intelgpu-job.yaml>`_
* `Scheduling pods on nodes with Intel GPU example <https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/cmd/gpu_plugin/README.md#test-gpu-device-plugin>`_