Intel QAT and GPU support

Add sections to configure gpu devices, qat devices and intel device plugin operator.
Add section to uninstall device plugins.
Delete old qat device section.

Story: 2010604
Task: 48177

Change-Id: I0f40bb50abc50889adb1c63316e9857ca9a371bc
Signed-off-by: Elisamara Aoki Goncalves <elisamaraaoki.goncalves@windriver.com>
2024-07-18 13:44:02 +00:00 · a3feea0cdb
commit a3feea0cdb
parent bc9ac348d5
8 changed files with 409 additions and 201 deletions
--- a/doc/source/admintasks/kubernetes/gpu-device-plugin-configuration-615e2f6edfba.rst
+++ b/doc/source/admintasks/kubernetes/gpu-device-plugin-configuration-615e2f6edfba.rst
@@ -0,0 +1,95 @@
+.. WARNING: Add no lines of text between the label immediately following
+.. and the title.
+
+.. _gpu-device-plugin-configuration-615e2f6edfba:
+
+===============================
+GPU Device Plugin Configuration
+===============================
+
+The Intel |GPU| device plugin enables Kubernetes clusters to use Intel GPUs
+for hardware acceleration of various workloads.
+
+This section describes how to enable and use the Intel |GPU| device plugin
+in |prod|.
+
+.. _prerequisites-1:
+
+.. rubric:: |prereq|
+
+-  The host must have Intel |GPU| hardware. For the list of supported |GPU|
+   devices, refer to the Intel |GPU| plugin documentation: `Intel GPU
+   device plugin for Kubernetes
+   <https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md>`__.
+
+-  The Node Feature Discovery application must be installed using the
+   following commands:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/node-feature-discovery*.tgz
+      ~(keystone_admin)]$ system application-apply node-feature-discovery
+
+Enable Intel GPU Device Plugin
+------------------------------
+
+#. Locate the application tarball in the ``/usr/local/share/applications/helm``
+   directory. For example:
+
+   ``/usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz``
+
+#. Upload the application using the following command.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-upload intel-device-plugins-operator-<version>.tgz
+
+   Replace ``<version>`` with the latest version number.
+
+#. Verify that the application has been uploaded successfully.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-list
+
+#. Check the Helm chart status using the following command:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system helm-override-list intel-device-plugins-operator --long
+
+#. Enable the |GPU| Helm chart using the following command:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system helm-chart-attribute-modify --enabled true intel-device-plugins-operator intel-device-plugins-gpu intel-device-plugins-operator
+
+#. Apply the application using the following command:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-apply intel-device-plugins-operator
+
+#. Monitor the status of the application using one of the following commands.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ watch -n 5 system application-list
+
+   OR
+
+   .. code-block::
+
+      ~(keystone_admin)]$ watch kubectl get pods -n intel-device-plugins-operator
+
+#. Check the pods using the following command:
+
+   .. code-block::
+
+      $ kubectl get pods -n intel-device-plugins-operator
+
+Use Intel GPU Device Plugin
+---------------------------
+
+For information about using the |GPU| device plugin, see `Testing and Demos
+<https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md#testing-and-demos>`__.
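Review note: the upload, enable, and apply steps in the procedure above could be wrapped in a small script. The following is a minimal sketch and not part of this change. The ``run`` and ``enable_plugin`` helper names and the ``DRY_RUN`` variable are hypothetical; the ``system`` subcommands, the application name, and the chart name are taken from the procedure. By default the sketch only prints the commands so that they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions,
# not part of the documented procedure.
APP=intel-device-plugins-operator

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# enable_plugin <chart> <tarball>
# Upload the application, enable the given chart, then apply the application.
enable_plugin() {
    chart=$1
    tarball=$2
    run system application-upload "$tarball" &&
    run system helm-chart-attribute-modify --enabled true "$APP" "$chart" "$APP" &&
    run system application-apply "$APP"
}
```

Calling ``enable_plugin intel-device-plugins-gpu <path to tarball>`` prints the three commands in order. Setting ``DRY_RUN=0`` in a ``keystone_admin`` shell would execute them instead. The same function could take ``intel-device-plugins-qat`` as the chart name for the QAT procedure.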
--- a/doc/source/admintasks/kubernetes/index-admintasks-kub-ebc55fefc368.rst
+++ b/doc/source/admintasks/kubernetes/index-admintasks-kub-ebc55fefc368.rst
@@ -48,17 +48,18 @@ Optimize application performance
   isolating-cpu-cores-to-enhance-application-performance
   kubernetes-topology-manager-policies

-.. only:: starlingx

-   -----------------
-   QAT Device Plugin
-   -----------------
+---------------------------------
+QAT Device and GPU Device Plugins
+---------------------------------

-   .. toctree::
-      :maxdepth: 1
-
-      k8s_qat_device_plugin
+.. toctree::
+   :maxdepth: 1

+   intel-device-plugins-operator-application-overview-c5de2a6212ae
+   qat-device-plugin-configuration-616551306371
+   gpu-device-plugin-configuration-615e2f6edfba
+   uninstall-intel-device-plugins-operator-application-e712eabc1e49

 --------------
 Metrics Server
--- a/doc/source/admintasks/kubernetes/intel-device-plugins-operator-application-overview-c5de2a6212ae.rst
+++ b/doc/source/admintasks/kubernetes/intel-device-plugins-operator-application-overview-c5de2a6212ae.rst
@@ -0,0 +1,32 @@
+.. WARNING: Add no lines of text between the label immediately following
+.. and the title.
+
+.. _intel-device-plugins-operator-application-overview-c5de2a6212ae:
+
+==================================================
+Intel Device Plugins Operator Application Overview
+==================================================
+
+This application provides a set of plugins developed by Intel to facilitate the
+use of Intel hardware features in Kubernetes clusters. These plugins are
+designed to enable and optimize the use of Intel-specific hardware capabilities
+in a Kubernetes environment.
+
+The following plugins are supported:
+
+* Intel |QAT| device plugin 0.26.0
+
+* Intel |GPU| device plugin 0.26.0
+
+
+Install Intel Device Plugins Operator Application
+-------------------------------------------------
+
+The Intel Device Plugins Operator application must be installed before you
+configure the Intel |QAT| device plugin or the Intel |GPU| device plugin.
+The installation steps are described in the respective device plugin
+configuration sections:
+
+:ref:`qat-device-plugin-configuration-616551306371`
+
+:ref:`gpu-device-plugin-configuration-615e2f6edfba`
--- a/doc/source/admintasks/kubernetes/k8s_qat_device_plugin.rst
+++ b/doc/source/admintasks/kubernetes/k8s_qat_device_plugin.rst
@@ -1,115 +0,0 @@
-.. _k8s_qat_device_plugin:
-
-.. only:: starlingx
-
-   ==========================================
-   Kubernetes QAT Device Plugin Configuration
-   ==========================================
-
-   Intel® QuickAssist Technology (Intel® QAT) accelerates cryptographic workloads
-   by offloading the data to hardware capable of optimizing those functions. This
-   guide describes how to enable and consume the Intel QAT device plugin in
-   StarlingX.
-
-   .. contents::
-      :local:
-      :depth: 1
-
-   -------------
-   Prerequisites
-   -------------
-
-   - Install Intel QuickAssist device on host.
-   - Install StarlingX on bare metal with DPDK enabled. Refer to the  |_link-inst-book|
-     for details.
-
-   ------------------------------
-   Enable Intel QAT device plugin
-   ------------------------------
-
-   The Intel QAT device plugin daemonset is pre-installed in StarlingX. This
-   section describes the steps to enable the Intel QAT device plugin for
-   discovering and advertising QAT VF resources to Kubernetes host.
-
-   #. Verify QuickAssist SR-IOV virtual functions are configured on a specified
-      node after StarlingX is installed. This example uses the worker-0 node.
-
-      ::
-
-         $ ssh worker-0
-         $ for i in 0442 0443 37c9 19e3; do lspci -d 8086:$i; done
-
-      .. note::
-
-         The Intel QAT device plugin only supports QAT VF resources in the current
-         release.
-
-   #. Assign the ``intelqat`` label to the node (worker-0 in this example).
-
-      ::
-
-         $ NODE=worker-0
-         $ system host-lock $NODE
-         $ system host-label-assign $NODE intelqat=enabled
-         $ system host-unlock $NODE
-
-   #. After the node becomes available, verify the Intel QAT device plugin is
-      registered.
-
-      ::
-
-         $ kubectl describe node $NODE | grep qat.intel.com/generic
-         qat.intel.com/generic: 10
-         qat.intel.com/generic: 10
-
-   -------------------------------
-   Consume Intel QAT device plugin
-   -------------------------------
-
-   #. Build the DPDK image.
-
-      ::
-
-         $ git clone https://github.com/intel/intel-device-plugins-for-kubernetes.git
-         $ cd demo
-         $ ./build-image.sh crypto-perf
-
-      This command produces a Docker image named ``crypto-perf``.
-
-   #. Deploy a pod to run an example DPDK application named
-      ``dpdk-test-crypto-perf``.
-
-      In the pod specification file, add the container resource request and
-      limit.
-
-      For example, use ``qat.intel.com/generic: <number of devices>`` for a
-      container requesting Intel QAT devices.
-
-
-      For a DPDK-based workload, you may need to add a hugepage request and limit.
-
-      ::
-
-         $ kubectl apply -k deployments/qat_dpdk_app/base/
-         $ kubectl get pods
-           NAME                     READY     STATUS    RESTARTS   AGE
-           qat-dpdk                 1/1       Running   0          27m
-           intel-qat-plugin-5zgvb   1/1       Running   0          3h
-
-      .. Note::
-
-         The deployment example above uses kustomize, which is a tool supported by
-         kubectl since the Kubernetes v1.14 release.
-
-
-   #. Manually execute the ``dpdk-test-crypto-perf`` application to review the
-      logs.
-
-      ::
-
-         $ kubectl exec -it qat-dpdk bash
-
-         $ ./dpdk-test-crypto-perf -l 6-7 -w $QAT1 -- --ptest throughput --\
-          devtype crypto_qat --optype cipher-only --cipher-algo aes-cbc --cipher-op \
-          encrypt --cipher-key-sz 16 --total-ops 10000000 --burst-sz 32 --buffer-sz 64
-
--- a/doc/source/admintasks/kubernetes/qat-device-plugin-configuration-616551306371.rst
+++ b/doc/source/admintasks/kubernetes/qat-device-plugin-configuration-616551306371.rst
@@ -0,0 +1,249 @@
+.. WARNING: Add no lines of text between the label immediately following
+.. and the title.
+
+.. _qat-device-plugin-configuration-616551306371:
+
+===============================
+QAT Device Plugin Configuration
+===============================
+
+Intel® QuickAssist Technology (Intel® QAT) accelerates cryptographic workloads
+by offloading the data to hardware that is capable of optimizing those
+functions.
+
+This section describes how to enable and consume the Intel |QAT| device plugin
+in |prod|.
+
+.. rubric:: |prereq|
+
+-  The host must have Intel |QAT| hardware. Supported |QAT| devices are 4940
+   and 4942. After |prod| is installed, perform the following checks to ensure
+   that the |QAT| devices are configured.
+
+   -  Verify |QAT| |SRIOV| physical functions are configured.
+
+      .. code-block::
+
+         $ for i in 4942 4940; do lspci -d 8086:$i; done
+
+   -  Verify |QAT| |SRIOV| virtual functions are configured.
+
+      .. code-block::
+
+         $ for i in 4943 4941; do lspci -d 8086:$i; done
+
+         $ sudo /etc/init.d/qat_service status # Must list all the virtual functions
+
+         Checking status of all devices.
+
+         There is 34 QAT acceleration device(s) in the system:
+
+         qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:f3:00.0,
+         #accel: 1 #engines: 9 state: up
+
+         qat_dev1 - type: 4xxx, inst_id: 1, node_id: 0, bsf: 0000:f7:00.0,
+         #accel: 1 #engines: 9 state: up
+
+         qat_dev2 - type: 4xxxvf, inst_id: 0, node_id: 0, bsf: 0000:f3:00.1,
+         #accel: 1 #engines: 1 state: up
+
+         qat_dev3 - type: 4xxxvf, inst_id: 1, node_id: 0, bsf: 0000:f3:00.2,
+         #accel: 1 #engines: 1 state: up
+
+   -  Verify the |QAT| driver ``vfio_pci`` is installed.
+
+      .. code-block::
+
+         $ lsmod | grep vfio_pci
+
+         vfio_pci 69632 0
+         vfio_virqfd 16384 1 vfio_pci
+         vfio 45056 4 intel_qat,vfio_mdev,vfio_iommu_type1,vfio_pci
+         irqbypass 16384 3 intel_qat,vfio_pci,kvm
+
+-  The Node Feature Discovery application must be installed using the
+   following commands:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/node-feature-discovery*.tgz
+      ~(keystone_admin)]$ system application-apply node-feature-discovery
+
+
+Enable Intel QAT Device Plugin
+------------------------------
+
+Perform the following steps to enable the Intel |QAT| device plugin, which
+discovers |QAT| virtual function (VF) resources and advertises them to the
+Kubernetes host.
+
+#. Locate the application tarball in the ``/usr/local/share/applications/helm``
+   directory. For example:
+
+   ``/usr/local/share/applications/helm/intel-device-plugins-operator-<version>.tgz``
+
+#. Upload the application using the following command.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-upload intel-device-plugins-operator-<version>.tgz
+
+   Replace ``<version>`` with the latest version number.
+
+#. Verify that the application has been uploaded successfully.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-list
+
+#. Check the Helm chart status.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system helm-override-list intel-device-plugins-operator --long
+
+#. Enable the |QAT| Helm chart.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system helm-chart-attribute-modify --enabled true intel-device-plugins-operator intel-device-plugins-qat intel-device-plugins-operator
+
+#. Apply the application.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-apply intel-device-plugins-operator
+
+#. Monitor the status of the application.
+
+   .. code-block::
+
+      ~(keystone_admin)]$ watch -n 5 system application-list
+
+   OR
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ watch kubectl get pods -n intel-device-plugins-operator
+
+#. Check the pods.
+
+   .. code-block::
+
+      $ kubectl get pods -n intel-device-plugins-operator
+
+      NAME                                          READY STATUS  RESTARTS AGE
+
+      intel-qat-plugin-qatdeviceplugin-sample-g8n45 1/1   Running 0        34s
+      inteldeviceplugins-controller-manager-74f4c   2/2   Running 0        64s
+
+#. Verify |QAT| devices by checking the node's resource allocations. The |QAT|
+   4940 device and the |QAT| 4942 device each have 16 virtual functions. If
+   both devices are present, the following command will display a total of 32
+   virtual functions:
+
+   .. code-block::
+
+      $ kubectl describe node <node name> | grep qat.intel.com/asym-dc
+
+      Capacity:
+      ---
+      qat.intel.com/asym-dc: 32
+      ---
+      Allocatable:
+      ---
+      qat.intel.com/asym-dc: 32
+      ---
+
+Use Intel QAT Device Plugin
+---------------------------
+
+This section describes the steps for using the Intel |QAT| device plugin.
+
+#. Deploy a pod using the following sample pod specification file. Modify the
+   resource requests and limits in the file as required.
+
+   The ``qat.intel.com/asym-dc: <number of devices>`` field is used to
+   configure the requested |QAT| virtual functions.
+
+   For a |DPDK|-based workload, you may need to add a hugepage request and
+   limit.
+
+   ``qat-dpdk.yaml``
+
+   .. code-block:: yaml
+
+      kind: Pod
+      apiVersion: v1
+      metadata:
+        name: dpdk-test-crypto-perf
+      spec:
+        containers:
+        - name: crypto-perf
+          image: intel/crypto-perf:devel
+          imagePullPolicy: IfNotPresent
+          command: [ "/bin/bash", "-c", "--" ]
+          args: [ "while true; do sleep 300000; done;" ]
+          volumeMounts:
+          - mountPath: /dev/hugepages
+            name: hugepage
+          - mountPath: /var/run/dpdk
+            name: dpdk-runtime
+          resources:
+            requests:
+              cpu: "3"
+              memory: "128Mi"
+              qat.intel.com/asym-dc: '4'
+              hugepages-2Mi: "128Mi"
+            limits:
+              cpu: "3"
+              memory: "128Mi"
+              qat.intel.com/asym-dc: '4'
+              hugepages-2Mi: "128Mi"
+          securityContext:
+            readOnlyRootFilesystem: true
+            allowPrivilegeEscalation: false
+            capabilities:
+              add:
+                ["IPC_LOCK"]
+        restartPolicy: Never
+        volumes:
+        - name: dpdk-runtime
+          emptyDir:
+            medium: Memory
+        - name: hugepage
+          emptyDir:
+            medium: HugePages
+
+
+   Apply the pod specification file to create the ``dpdk-test-crypto-perf`` pod.
+
+   .. code-block::
+
+      $ kubectl apply -f qat-dpdk.yaml
+
+#. Verify the pod status and the allocated |QAT| virtual functions.
+
+   .. code-block::
+
+      $ kubectl get pods
+
+      NAME                  READY STATUS  RESTARTS AGE
+      dpdk-test-crypto-perf 1/1   Running 0        27m
+
+      $ kubectl describe pod dpdk-test-crypto-perf
+
+      Requests:
+      ---
+      qat.intel.com/asym-dc: 4
+      ---
+
+      $ kubectl describe node <controller-name>
+
+      Allocated resources:
+      ---
+      qat.intel.com/asym-dc: 4
+      ---
+
+For more information, see `Demos and Testing
+<https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/qat_plugin/README.md#demos-and-testing>`__.
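Review note: the verification step in the QAT procedure above relies on reading the ``qat.intel.com/asym-dc`` values by eye and comparing them with 16 virtual functions per device. The following is a minimal sketch of that check and not part of this change. The ``qat_vf_counts`` and ``expected_qat_vfs`` function names are hypothetical; the resource name and the figure of 16 virtual functions per device come from the procedure. The sketch only matches lines of the form ``qat.intel.com/asym-dc: <value>``, as shown under Capacity and Allocatable.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions,
# not part of the documented procedure.

# Print each value reported for qat.intel.com/asym-dc in the text on stdin.
qat_vf_counts() {
    awk -F': *' '/qat\.intel\.com\/asym-dc:/ { print $2 }'
}

# expected_qat_vfs <number of QAT devices>
# Each supported QAT device exposes 16 virtual functions.
expected_qat_vfs() {
    echo $(( $1 * 16 ))
}
```

A possible use is ``kubectl describe node <node name> | qat_vf_counts``, with each printed value then compared against ``expected_qat_vfs 2`` on a host that has both devices.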
--- a/doc/source/admintasks/kubernetes/uninstall-intel-device-plugins-operator-application-e712eabc1e49.rst
+++ b/doc/source/admintasks/kubernetes/uninstall-intel-device-plugins-operator-application-e712eabc1e49.rst
@@ -0,0 +1,23 @@
+.. WARNING: Add no lines of text between the label immediately following
+.. and the title.
+
+.. _uninstall-intel-device-plugins-operator-application-e712eabc1e49:
+
+===================================================
+Uninstall Intel Device Plugins Operator Application
+===================================================
+
+Use the following steps to uninstall the Intel Device Plugins Operator
+application:
+
+#. Remove the application using the following command:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-remove intel-device-plugins-operator
+
+#. Delete the application using the following command:
+
+   .. code-block::
+
+      ~(keystone_admin)]$ system application-delete intel-device-plugins-operator
--- a/doc/source/operations/index.rst
+++ b/doc/source/operations/index.rst
@@ -34,7 +34,7 @@ Kubernetes Operation
   k8s_nodeport_usage
   k8s_persistent_vol_claims
   k8s_sriov_config
-   k8s_gpu_device_plugin
+

 -------------------
 OpenStack Operation
--- a/doc/source/operations/k8s_gpu_device_plugin.rst
+++ b/doc/source/operations/k8s_gpu_device_plugin.rst
@@ -1,77 +0,0 @@
-================================================
-Kubernetes Intel GPU Device Plugin Configuration
-================================================
-
-This document describes how to enable the Intel GPU device plugin in StarlingX
-and schedule pods on nodes with an Intel GPU.
-
------------------------------
-Enable Intel GPU device plugin
------------------------------
-
-You can pre-install the ``intel-gpu-plugin`` daemonset as follows:
-
-#. Launch the ``intel-gpu-plugin`` daemonset.
-
-   Add the following lines to the ``localhost.yaml`` file before playing the
-   Ansible bootstrap playbook to configure the system.
-
-   ::
-
-     k8s_plugins:
-       intel-gpu-plugin: intelgpu=enabled
-
-#. Assign the ``intelgpu`` label to each node that should have the Intel GPU
-   plugin enabled. This will make any GPU devices on a given node available for
-   scheduling to containers. The following example assigns the ``intelgpu``
-   label to the worker-0 node.
-
-   ::
-
-      $ NODE=worker-0
-      $ system host-lock $NODE
-      $ system host-label-assign $NODE intelgpu=enabled
-      $ system host-unlock $NODE
-
-#. After the node becomes available, verify the GPU device plugin is registered
-   and that the available GPU devices on the node have been discovered and reported.
-
-   ::
-
-      $ kubectl describe node $NODE | grep gpu.intel.com
-      gpu.intel.com/i915:  1
-      gpu.intel.com/i915:  1
-
-------------------------------------
-Schedule pods on nodes with Intel GPU
-------------------------------------
-
-Add a ``resources.limits.gpu.intel.com`` to your container specification in order
-to request an available GPU device for your container.
-
-::
-
-  ...
-  spec:
-    containers:
-      - name: ...
-        ...
-        resources:
-          limits:
-            gpu.intel.com/i915: 1
-
-
-The pods will be scheduled to the nodes with available Intel GPU devices. A GPU
-device will be allocated to the container and the available GPU devices will be
-updated.
-
-::
-
-      $ kubectl describe node $NODE | grep gpu.intel.com
-      gpu.intel.com/i915:  1
-      gpu.intel.com/i915:  0
-
-For more details, refer to the following examples:
-
-* `Kubernetes manifest file example <https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/demo/intelgpu-job.yaml>`_
-* `Scheduling pods on nodes with Intel GPU example <https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/cmd/gpu_plugin/README.md#test-gpu-device-plugin>`_