Deploying with vDPA Support
===========================

TripleO can deploy Overcloud nodes with vDPA support. A new role ``ComputeVdpa``
has been added to create a custom ``roles_data.yaml`` with the composable vDPA
role.

vDPA is very similar to SR-IOV and leverages the same OpenStack components. It's
important to note that vDPA can't function without OVS Hardware Offload.
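
Since OVS Hardware Offload is a hard requirement, it has to be enabled at deploy
time as well. A minimal sketch, assuming the ``ovs-hw-offload.yaml`` environment
file shipped with tripleo-heat-templates (check the exact path in your release)::

  openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml \
    ...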

Mellanox is the only NIC vendor currently supported with vDPA.
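
To confirm that the NICs are a supported Mellanox model before deploying, the
PCI vendor and device IDs can be inspected on the baremetal node. A quick check,
assuming the ``15b3:101d`` (ConnectX-6 Dx) devices used throughout this guide;
your IDs may differ::

  [root@computevdpa-0 ~]# lspci -nn | grep -i mellanox
  06:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
  06:00.1 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]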

Execute the following command to create the ``roles_data.yaml``::

  openstack overcloud roles generate -o roles_data.yaml Controller ComputeVdpa

Once a roles file is created, the following changes are required:

- Deploy Command
- Parameters
- Network Config
- Network and Port Creation

Deploy Command
--------------

The deploy command should include the generated roles data file from the above
command.

It should also include the SR-IOV environment file, which enables the
``neutron-sriov-agent`` service. All the required parameters are also specified
in this environment file. The parameters have to be configured according to the
baremetal nodes on which vDPA needs to be enabled.

vDPA also requires mandatory kernel parameters to be set, such as
``intel_iommu=on iommu=pt`` on Intel machines. In order to enable the
configuration of kernel parameters on the host, the ``KernelArgs`` role
parameter has to be defined accordingly.
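
For example, a minimal environment snippet scoping the kernel arguments to the
vDPA role only (the exact arguments depend on the hardware)::

  parameter_defaults:
    ComputeVdpaParameters:
      KernelArgs: "intel_iommu=on iommu=pt"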

Adding the following arguments to the ``openstack overcloud deploy`` command
will do the trick::

  openstack overcloud deploy --templates \
    -r roles_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
    ...

Parameters
----------

Unlike SR-IOV, vDPA devices shouldn't be added to ``NeutronPhysicalDevMappings``
but to the ``NovaPCIPassthrough``. The vDPA bridge should also be added to the
``NeutronBridgeMappings`` and the ``physical_network`` to the
``NeutronNetworkVLANRanges``.

The parameter ``KernelArgs`` should be provided in the deployment environment
file, with the set of kernel boot parameters to be applied on the
``ComputeVdpa`` role where vDPA is enabled.

The ``PciPassthroughFilter`` is required for vDPA. The ``NUMATopologyFilter``
will become optional once ``libvirt`` supports the locking of guest memory. At
this time, it is mandatory to have it::

  parameter_defaults:
    NeutronTunnelTypes: ''
    NeutronNetworkType: 'vlan'
    NeutronNetworkVLANRanges:
      - tenant:1300:1399
    NovaSchedulerDefaultFilters:
      - ...
      - PciPassthroughFilter
      - NUMATopologyFilter
    ComputeVdpaParameters:
      NovaPCIPassthrough:
        - vendor_id: "15b3"
          product_id: "101d"
          address: "06:00.0"
          physical_network: "tenant"
        - vendor_id: "15b3"
          product_id: "101d"
          address: "06:00.1"
          physical_network: "tenant"
      KernelArgs: "[...] iommu=pt intel_iommu=on"
      NeutronBridgeMappings:
        - tenant:br-tenant

Network Config
--------------

vDPA supported network interfaces should be specified in the network config
templates as the ``sriov_pf`` type. They should also be under an OVS bridge,
with ``link_mode`` set to ``switchdev``.

Example::

  - type: ovs_bridge
    name: br-tenant
    members:
      - type: sriov_pf
        name: enp6s0f0
        numvfs: 8
        use_dhcp: false
        vdpa: true
        link_mode: switchdev
      - type: sriov_pf
        name: enp6s0f1
        numvfs: 8
        use_dhcp: false
        vdpa: true
        link_mode: switchdev

Network and Port Creation
-------------------------

When creating the network, it has to be mapped to the physical network::

  $ openstack network create \
      --provider-physical-network tenant \
      --provider-network-type vlan \
      --provider-segment 1337 \
      vdpa_net1

  $ openstack subnet create \
      --network vdpa_net1 \
      --subnet-range 192.0.2.0/24 \
      --dhcp \
      vdpa_subnet1

To allocate a port from a vDPA-enabled NIC, create a neutron port and set the
``--vnic-type`` to ``vdpa``::

  $ openstack port create --network vdpa_net1 \
      --vnic-type=vdpa \
      vdpa_direct_port1
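
As a sanity check, the VNIC type of the new port should report ``vdpa``::

  $ openstack port show vdpa_direct_port1 -c binding_vnic_type -f value
  vdpa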

Scheduling instances
--------------------

Normally, the ``PciPassthroughFilter`` is sufficient to ensure that a vDPA
instance will land on a vDPA host. If we want to prevent other instances from
using a vDPA host, we need to set up the `isolate-aggregates feature
<https://docs.openstack.org/nova/latest/reference/isolate-aggregates.html>`_.

Example::

  $ openstack --os-placement-api-version 1.6 trait create CUSTOM_VDPA
  $ openstack aggregate create \
      --zone vdpa-az1 \
      vdpa_ag1
  $ openstack hypervisor list -c ID -c "Hypervisor Hostname" -f value | grep vdpa | \
      while read l; do
        UUID=$(echo $l | cut -f 1 -d " ")
        H_NAME=$(echo $l | cut -f 2 -d " ")
        echo $H_NAME $UUID
        openstack aggregate add host vdpa_ag1 $H_NAME
        traits=$(openstack --os-placement-api-version 1.6 resource provider trait list \
                 -f value $UUID | sed 's/^/--trait /')
        openstack --os-placement-api-version 1.6 resource provider trait set \
          $traits --trait CUSTOM_VDPA $UUID
      done
  $ openstack --os-compute-api-version 2.53 aggregate set \
      --property trait:CUSTOM_VDPA=required \
      vdpa_ag1
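
To spot-check a single host afterwards, list its resource provider traits and
look for ``CUSTOM_VDPA`` (``$UUID`` below is the resource provider UUID of one
of the vDPA hypervisors)::

  $ openstack --os-placement-api-version 1.6 resource provider trait list -f value $UUID | grep CUSTOM_VDPA
  CUSTOM_VDPA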

The flavor will map to that new aggregate with the ``traits:CUSTOM_VDPA``
property::

  $ openstack --os-compute-api-version 2.86 flavor create \
      --ram 4096 \
      --disk 10 \
      --vcpus 2 \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=True \
      --property hw:cpu_realtime_mask=^0 \
      --property traits:CUSTOM_VDPA=required \
      vdpa_pinned

.. note::
   It's also important to have the ``hw:cpu_realtime*`` properties here since
   ``libvirt`` doesn't currently support the locking of guest memory.

This should launch an instance on one of the vDPA hosts::

  $ openstack server create \
      --image cirros \
      --flavor vdpa_pinned \
      --nic port-id=vdpa_direct_port1 \
      vdpa_test_1
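
Once the instance is ``ACTIVE``, it can be confirmed that it landed on one of
the vDPA hypervisors (this field requires admin credentials; the hostname below
is illustrative)::

  $ openstack server show vdpa_test_1 -c OS-EXT-SRV-ATTR:hypervisor_hostname -f value
  computevdpa-0.localdomain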

Validations
-----------

Confirm that the PCI devices are in ``switchdev`` mode::

  [root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.0
  pci/0000:06:00.0: mode switchdev inline-mode none encap enable
  [root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.1
  pci/0000:06:00.1: mode switchdev inline-mode none encap enable

Verify that offload is enabled in OVS::

  [root@computevdpa-0 ~]# ovs-vsctl get Open_vSwitch . other_config:hw-offload
  "true"

Validate that the interfaces are added to the tenant bridge::

  [root@computevdpa-0 ~]# ovs-vsctl show
  be82eb5b-94c3-449d-98c8-0961b6b6b4c4
      Manager "ptcp:6640:127.0.0.1"
          is_connected: true
      [...]
      Bridge br-tenant
          Controller "tcp:127.0.0.1:6633"
              is_connected: true
          fail_mode: secure
          datapath_type: system
          Port br-tenant
              Interface br-tenant
                  type: internal
          Port enp6s0f0
              Interface enp6s0f0
          Port phy-br-tenant
              Interface phy-br-tenant
                  type: patch
                  options: {peer=int-br-tenant}
          Port enp6s0f1
              Interface enp6s0f1
      [...]

Verify that the NICs have ``hw-tc-offload`` enabled::

  [root@computevdpa-0 ~]# for i in {0..1}; do ethtool -k enp6s0f$i | grep tc-offload; done
  hw-tc-offload: on
  hw-tc-offload: on

Verify that the udev rules have been created::

  [root@computevdpa-0 ~]# cat /etc/udev/rules.d/80-persistent-os-net-config.rules
  # This file is autogenerated by os-net-config
  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}!="", ATTR{phys_port_name}=="pf*vf*", ENV{NM_UNMANAGED}="1"
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.0", NAME="enp6s0f0"
  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf0vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f0_$env{NUMBER}"
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.1", NAME="enp6s0f1"
  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf1vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f1_$env{NUMBER}"

Validate that the ``numvfs`` are correctly defined::

  [root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f0/device/sriov_numvfs
  8
  [root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f1/device/sriov_numvfs
  8

Validate that the ``pci/passthrough_whitelist`` contains all the PFs::

  [root@computevdpa-0 ~]# grep ^passthrough_whitelist /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
  passthrough_whitelist={"address":"06:00.0","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}
  passthrough_whitelist={"address":"06:00.1","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}

Verify the ``nodedev-list`` from ``libvirt``::

  [root@computevdpa-0 ~]# podman exec -u0 nova_libvirt virsh nodedev-list | grep -P "pci_0000_06|enp6|vdpa"
  net_enp6s0f0_04_3f_72_ee_ec_80
  net_enp6s0f0_0_5a_86_bd_4b_06_d9
  net_enp6s0f0_1_72_b9_6b_12_33_57
  net_enp6s0f0_2_f6_f2_db_7c_52_90
  net_enp6s0f0_3_66_e5_9e_b8_79_7f
  net_enp6s0f0_4_32_04_6f_ef_ef_c3
  net_enp6s0f0_5_a2_fe_8d_4a_95_64
  net_enp6s0f0_6_8e_23_fa_bb_95_41
  net_enp6s0f0_7_8a_9f_0f_53_f6_19
  net_enp6s0f0v0_ee_a1_e2_4e_80_8d
  net_enp6s0f0v1_ce_b7_e1_33_33_56
  net_enp6s0f0v2_fe_91_a8_ee_2e_79
  net_enp6s0f0v3_2a_34_e0_a0_e6_ff
  net_enp6s0f0v4_26_59_82_da_65_4e
  net_enp6s0f0v5_a6_fd_db_97_c6_8a
  net_enp6s0f0v6_36_5d_5c_ff_e8_00
  net_enp6s0f0v7_4e_23_6c_95_b6_a4
  net_enp6s0f1_04_3f_72_ee_ec_81
  net_enp6s0f1_0_0e_0c_86_b5_43_c1
  net_enp6s0f1_1_be_f5_75_f4_da_b1
  net_enp6s0f1_2_ea_6a_21_37_91_24
  net_enp6s0f1_3_06_95_51_55_de_80
  net_enp6s0f1_4_86_a4_d5_83_bd_56
  net_enp6s0f1_5_86_d1_a9_ba_b7_f0
  net_enp6s0f1_6_82_ae_32_56_07_84
  net_enp6s0f1_7_62_b7_93_7e_5c_30
  net_enp6s0f1v0_b2_b3_0d_bd_6f_5d
  net_enp6s0f1v1_4a_24_a1_24_ae_39
  net_enp6s0f1v2_8e_19_b2_aa_ae_d7
  net_enp6s0f1v3_b6_e2_4b_fa_d8_f0
  net_enp6s0f1v4_5e_31_7f_17_ee_4d
  net_enp6s0f1v5_5e_77_99_09_1a_89
  net_enp6s0f1v6_96_68_4b_70_c5_1b
  net_enp6s0f1v7_c2_bb_14_95_81_29
  pci_0000_06_00_0
  pci_0000_06_00_1
  pci_0000_06_00_2
  pci_0000_06_00_3
  pci_0000_06_00_4
  pci_0000_06_00_5
  pci_0000_06_00_6
  pci_0000_06_00_7
  pci_0000_06_01_0
  pci_0000_06_01_1
  pci_0000_06_01_2
  pci_0000_06_01_3
  pci_0000_06_01_4
  pci_0000_06_01_5
  pci_0000_06_01_6
  pci_0000_06_01_7
  pci_0000_06_02_0
  pci_0000_06_02_1
  vdpa_vdpa0
  vdpa_vdpa1
  vdpa_vdpa10
  vdpa_vdpa11
  vdpa_vdpa12
  vdpa_vdpa13
  vdpa_vdpa14
  vdpa_vdpa15
  vdpa_vdpa2
  vdpa_vdpa3
  vdpa_vdpa4
  vdpa_vdpa5
  vdpa_vdpa6
  vdpa_vdpa7
  vdpa_vdpa8
  vdpa_vdpa9

Validate that the vDPA devices have been created; these should match the vdpa
devices from ``virsh nodedev-list``::

  [root@computevdpa-0 ~]# ls -tlra /dev/vhost-vdpa-*
  crw-------. 1 root root 241,  0 Jun 30 12:52 /dev/vhost-vdpa-0
  crw-------. 1 root root 241,  1 Jun 30 12:52 /dev/vhost-vdpa-1
  crw-------. 1 root root 241,  2 Jun 30 12:52 /dev/vhost-vdpa-2
  crw-------. 1 root root 241,  3 Jun 30 12:52 /dev/vhost-vdpa-3
  crw-------. 1 root root 241,  4 Jun 30 12:52 /dev/vhost-vdpa-4
  crw-------. 1 root root 241,  5 Jun 30 12:53 /dev/vhost-vdpa-5
  crw-------. 1 root root 241,  6 Jun 30 12:53 /dev/vhost-vdpa-6
  crw-------. 1 root root 241,  7 Jun 30 12:53 /dev/vhost-vdpa-7
  crw-------. 1 root root 241,  8 Jun 30 12:53 /dev/vhost-vdpa-8
  crw-------. 1 root root 241,  9 Jun 30 12:53 /dev/vhost-vdpa-9
  crw-------. 1 root root 241, 10 Jun 30 12:53 /dev/vhost-vdpa-10
  crw-------. 1 root root 241, 11 Jun 30 12:53 /dev/vhost-vdpa-11
  crw-------. 1 root root 241, 12 Jun 30 12:53 /dev/vhost-vdpa-12
  crw-------. 1 root root 241, 13 Jun 30 12:53 /dev/vhost-vdpa-13
  crw-------. 1 root root 241, 14 Jun 30 12:53 /dev/vhost-vdpa-14
  crw-------. 1 root root 241, 15 Jun 30 12:53 /dev/vhost-vdpa-15
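
If the ``vdpa`` utility from iproute2 is available on the host, the devices can
also be cross-checked at the kernel level (the output below is a sketch; format
and attributes vary with the iproute2 version)::

  [root@computevdpa-0 ~]# vdpa dev show
  vdpa0: type network mgmtdev pci/0000:06:00.2 vendor_id 5555 max_vqs 16 max_vq_size 256
  [...]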

Validate the ``pci_devices`` table in the database from one of the controllers::

  [root@controller-0 ~]# podman exec -u0 $(podman ps -q -f name=galera) mysql -t -D nova -e "select address,product_id,vendor_id,dev_type,dev_id from pci_devices where address like '0000:06:%';"
  +--------------+------------+-----------+----------+------------------+
  | address      | product_id | vendor_id | dev_type | dev_id           |
  +--------------+------------+-----------+----------+------------------+
  | 0000:06:00.0 | 101d       | 15b3      | vdpa     | pci_0000_06_00_0 |
  | 0000:06:00.1 | 101d       | 15b3      | vdpa     | pci_0000_06_00_1 |
  +--------------+------------+-----------+----------+------------------+

Other useful commands for troubleshooting::

  [root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m type=offloaded
  [root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m
  [root@computevdpa-0 ~]# tc filter show dev enp6s0f1_1 ingress
  [root@computevdpa-0 ~]# tc -s filter show dev enp6s0f1_1 ingress
  [root@computevdpa-0 ~]# tc monitor