Deploying with vDPA Support
TripleO can deploy Overcloud nodes with vDPA support. A new role, ComputeVdpa, has been added so that a custom roles_data.yaml can be created with a composable vDPA role.
vDPA is very similar to SR-IOV and leverages the same OpenStack components. Note that vDPA cannot function without OVS Hardware Offload.
Mellanox is the only NIC vendor currently supported with vDPA.
CentOS 9/RHEL 9 with kernel 5.14 or higher is required.
Run the following command to create roles_data.yaml:
openstack overcloud roles generate -o roles_data.yaml Controller ComputeVdpa
Once a roles file is created, the following changes are required:
- Deploy Command
- Parameters
- Network Config
- Network and Port Creation
Deploy Command
The deploy command should include the roles data file generated by the command above.
The deploy command should also include the SR-IOV environment file so that the neutron-sriov-agent service is included. All the required parameters are also specified in this environment file. The parameters have to be configured according to the baremetal node on which vDPA needs to be enabled.
vDPA also requires mandatory kernel parameters, such as intel_iommu=on iommu=pt on Intel machines. To have these kernel parameters applied to the host, the KernelArgs role parameter has to be defined accordingly.
Adding the following arguments to the openstack overcloud deploy command will do the trick:
openstack overcloud deploy --templates \
-r roles_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
...
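After deployment, one way to confirm that the kernel arguments reached the host is to look for them in /proc/cmdline. The helper below is a minimal sketch: the function name has_kernel_args is hypothetical, the sample command line is made up for illustration, and the argument list is the Intel example from this section.

```shell
#!/usr/bin/env bash
# has_kernel_args CMDLINE ARG...
# Return 0 if every ARG appears as a whole word in CMDLINE, 1 otherwise.
has_kernel_args() {
    local cmdline="$1" arg
    shift
    for arg in "$@"; do
        case " $cmdline " in
            *" $arg "*) ;;  # argument found, check the next one
            *) echo "missing kernel argument: $arg" >&2; return 1 ;;
        esac
    done
}

# Illustrative command line (not taken from a real node).
sample="BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro intel_iommu=on iommu=pt"
has_kernel_args "$sample" intel_iommu=on iommu=pt && echo "IOMMU arguments present"

# On a deployed ComputeVdpa node the real command line would be used instead:
# has_kernel_args "$(cat /proc/cmdline)" intel_iommu=on iommu=pt
```

Matching whole words avoids a false positive when one argument is a prefix of another.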
Parameters
Unlike SR-IOV, vDPA devices shouldn't be added to NeutronPhysicalDevMappings but to NovaPCIPassthrough. The vDPA bridge should also be added to NeutronBridgeMappings, and the physical_network to NeutronNetworkVLANRanges.
The KernelArgs parameter should be provided in the deployment environment file, with the set of kernel boot parameters to be applied on the ComputeVdpa role where vDPA is enabled.
The PciPassthroughFilter is required for vDPA. The NUMATopologyFilter will become optional once libvirt supports locking of the guest memory. At this time, it is mandatory:
parameter_defaults:
  NeutronTunnelTypes: ''
  NeutronNetworkType: 'vlan'
  NeutronNetworkVLANRanges:
    - tenant:1300:1399
  NovaSchedulerDefaultFilters:
    - PciPassthroughFilter
    - NUMATopologyFilter
    - ...
  ComputeVdpaParameters:
    NovaPCIPassthrough:
      - vendor_id: "15b3"
        product_id: "101e"
        address: "06:00.0"
        physical_network: "tenant"
      - vendor_id: "15b3"
        product_id: "101e"
        address: "06:00.1"
        physical_network: "tenant"
    KernelArgs: "[...] iommu=pt intel_iommu=on"
    NeutronBridgeMappings:
      - tenant:br-tenant
Note
It's important to use the product_id of a VF device and not of a PF:
06:00.1 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
06:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
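To pick out the VF identifiers without reading the listing by eye, the numeric IDs can be extracted from lspci -nn style output. This is a sketch only: the function name vf_ids is hypothetical, and it assumes VF lines contain the text "Virtual Function" as in the two lines above.

```shell
#!/usr/bin/env bash
# Read `lspci -nn` style lines on stdin and print "vendor_id product_id"
# once for each distinct Virtual Function ID pair.
vf_ids() {
    grep 'Virtual Function' \
        | sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)].*/\1 \2/p' \
        | sort -u
}

# The two sample lines from the note above.
vf_ids <<'EOF'
06:00.1 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
06:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
EOF

# On a node with VFs already created, the live listing could be piped in:
# lspci -nn | vf_ids
```

With the sample lines, only the VF pair (15b3 101e) is printed; the PF line with 101d is filtered out.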
Network Config
vDPA-capable network interfaces should be specified in the network config templates with the sriov_pf type. Each interface should also be placed under an OVS bridge and have its link_mode set to switchdev.
Example:
- type: ovs_bridge
  name: br-tenant
  members:
    - type: sriov_pf
      name: enp6s0f0
      numvfs: 8
      use_dhcp: false
      vdpa: true
      link_mode: switchdev
    - type: sriov_pf
      name: enp6s0f1
      numvfs: 8
      use_dhcp: false
      vdpa: true
      link_mode: switchdev
Network and Port Creation
When creating the network, it has to be mapped to the physical network:
$ openstack network create \
--provider-physical-network tenant \
--provider-network-type vlan \
--provider-segment 1337 \
vdpa_net1
$ openstack subnet create \
--network vdpa_net1 \
--subnet-range 192.0.2.0/24 \
--dhcp \
vdpa_subnet1
To allocate a port from a vDPA-enabled NIC, create a neutron port and set the --vnic-type to vdpa:
$ openstack port create --network vdpa_net1 \
--vnic-type=vdpa \
vdpa_direct_port1
Scheduling instances
Normally, the PciPassthroughFilter is sufficient to ensure that a vDPA instance will land on a vDPA host. To prevent other instances from using a vDPA host, the isolate-aggregate feature needs to be set up.
Example:
$ openstack --os-placement-api-version 1.6 trait create CUSTOM_VDPA
$ openstack aggregate create \
--zone vdpa-az1 \
vdpa_ag1
$ openstack hypervisor list -c ID -c "Hypervisor Hostname" -f value | grep vdpa | \
  while read -r UUID H_NAME
  do
    echo "$H_NAME $UUID"
    openstack aggregate add host vdpa_ag1 "$H_NAME"
    # Collect the traits already set on the resource provider
    traits=$(openstack --os-placement-api-version 1.6 resource provider trait list \
      -f value "$UUID" | sed 's/^/--trait /')
    # $traits is left unquoted on purpose so each "--trait NAME" pair becomes two arguments
    openstack --os-placement-api-version 1.6 resource provider trait set \
      $traits --trait CUSTOM_VDPA "$UUID"
  done
$ openstack --os-compute-api-version 2.53 aggregate set \
--property trait:CUSTOM_VDPA=required \
vdpa_ag1
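The loop above lists the traits already present on each resource provider and passes them again together with CUSTOM_VDPA, since resource provider trait set overwrites the provider's existing traits. The sketch below isolates that argument-building step so it can be tried without a cloud; the function name build_trait_args is hypothetical and the trait names in the sample are only examples.

```shell
#!/usr/bin/env bash
# build_trait_args EXTRA_TRAIT
# Read existing trait names on stdin (one per line) and print a single line of
# "--trait NAME" arguments that ends with EXTRA_TRAIT.
build_trait_args() {
    local extra="$1" name
    while read -r name; do
        if [ -n "$name" ]; then
            printf -- '--trait %s ' "$name"
        fi
    done
    printf -- '--trait %s\n' "$extra"
}

# Sample trait list standing in for `resource provider trait list -f value <uuid>`.
printf 'COMPUTE_NODE\nHW_CPU_X86_AVX2\n' | build_trait_args CUSTOM_VDPA
```

For the sample input this prints the two existing traits followed by CUSTOM_VDPA, which is the argument list the loop hands to resource provider trait set.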
The flavor will map to that new aggregate with the trait:CUSTOM_VDPA property:
$ openstack --os-compute-api-version 2.86 flavor create \
--ram 4096 \
--disk 10 \
--vcpus 2 \
--property hw:cpu_policy=dedicated \
--property hw:cpu_realtime=True \
--property hw:cpu_realtime_mask=^0 \
--property trait:CUSTOM_VDPA=required \
vdpa_pinned
Note
It's also important to have the hw:cpu_realtime* properties here since libvirt doesn't currently support the locking of guest memory.
This should launch an instance on one of the vDPA hosts:
$ openstack server create \
--image cirros \
--flavor vdpa_pinned \
--nic port-id=vdpa_direct_port1 \
vdpa_test_1
Validations
Confirm that a PCI device is in switchdev mode:
[root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.0
pci/0000:06:00.0: mode switchdev inline-mode none encap-mode basic
[root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.1
pci/0000:06:00.1: mode switchdev inline-mode none encap-mode basic
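When this check has to run across several devices or from a script, the mode can be extracted from the devlink output instead of being read by eye. A minimal sketch, assuming the output format shown above; the function name eswitch_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Read `devlink dev eswitch show` output on stdin and print the word that
# follows the "mode" keyword ("inline-mode" and "encap-mode" are not matched).
eswitch_mode() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }'
}

# Sample line copied from the output above.
echo "pci/0000:06:00.0: mode switchdev inline-mode none encap-mode basic" | eswitch_mode

# On a ComputeVdpa node (PCI addresses from this guide):
# for dev in pci/0000:06:00.0 pci/0000:06:00.1; do
#     [ "$(devlink dev eswitch show "$dev" | eswitch_mode)" = "switchdev" ] \
#         || echo "$dev is not in switchdev mode"
# done
```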
Verify that offload is enabled in OVS:
[root@computevdpa-0 ~]# ovs-vsctl get Open_vSwitch . other_config:hw-offload
"true"
Validate the interfaces are added to the tenant bridge:
[root@computevdpa-0 ~]# ovs-vsctl show
be82eb5b-94c3-449d-98c8-0961b6b6b4c4
Manager "ptcp:6640:127.0.0.1"
is_connected: true
[...]
Bridge br-tenant
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
datapath_type: system
Port br-tenant
Interface br-tenant
type: internal
Port enp6s0f0
Interface enp6s0f0
Port phy-br-tenant
Interface phy-br-tenant
type: patch
options: {peer=int-br-tenant}
Port enp6s0f1
Interface enp6s0f1
[...]
Verify that the NICs have hw-tc-offload enabled:
[root@computevdpa-0 ~]# for i in {0..1};do ethtool -k enp6s0f$i | grep tc-offload;done
hw-tc-offload: on
hw-tc-offload: on
Verify that the udev rules have been created:
[root@computevdpa-0 ~]# cat /etc/udev/rules.d/80-persistent-os-net-config.rules
# This file is autogenerated by os-net-config
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}!="", ATTR{phys_port_name}=="pf*vf*", ENV{NM_UNMANAGED}="1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.0", NAME="enp6s0f0"
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf0vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f0_$env{NUMBER}"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.1", NAME="enp6s0f1"
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf1vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f1_$env{NUMBER}"
Validate that numvfs is correctly defined:
[root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f0/device/sriov_numvfs
8
[root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f1/device/sriov_numvfs
8
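The same comparison can be scripted for all PFs at once. The sketch below is illustrative: check_numvfs is a hypothetical helper, and the sysfs directory is passed as a parameter (normally /sys/class/net) only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# check_numvfs EXPECTED SYSFS_NET_DIR IFACE...
# Return 0 if every IFACE reports EXPECTED in device/sriov_numvfs, 1 otherwise.
check_numvfs() {
    local expected="$1" root="$2" iface actual rc=0
    shift 2
    for iface in "$@"; do
        # An unreadable file leaves "actual" empty and is reported as a mismatch.
        actual=$(cat "$root/$iface/device/sriov_numvfs" 2>/dev/null || true)
        if [ "$actual" != "$expected" ]; then
            echo "$iface: expected $expected VFs, found ${actual:-none}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# On a ComputeVdpa node, with the interface names and numvfs used in this guide:
# check_numvfs 8 /sys/class/net enp6s0f0 enp6s0f1 && echo "numvfs OK"
```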
Validate that the pci/passthrough_whitelist contains all the PFs:
[root@computevdpa-0 ~]# grep ^passthrough_whitelist /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
passthrough_whitelist={"address":"06:00.0","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}
passthrough_whitelist={"address":"06:00.1","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}
Verify the nodedev-list from libvirt:
[root@computevdpa-0 ~]# podman exec -u0 nova_virtqemud virsh -c qemu:///system nodedev-list | grep -P "pci_0000_06|enp6|vdpa"
net_enp6s0f0np0_04_3f_72_ee_ec_84
net_enp6s0f0np0_0_1a_c1_a5_25_94_ef
net_enp6s0f0np0_1_3a_dc_1d_36_85_af
net_enp6s0f0np0_2_6a_95_0c_e9_8f_1a
net_enp6s0f0np0_3_ba_c8_5b_f5_70_cc
net_enp6s0f0np0_4_9e_03_86_23_cd_65
net_enp6s0f0np0_5_0a_5c_8b_c4_00_7a
net_enp6s0f0np0_6_2e_f6_bc_e6_6f_cd
net_enp6s0f0np0_7_ce_1e_b2_20_5e_15
net_enp6s0f1np1_04_3f_72_ee_ec_85
net_enp6s0f1np1_0_a6_04_9e_5a_cd_3b
net_enp6s0f1np1_1_56_5d_59_b0_df_17
net_enp6s0f1np1_2_de_ac_7c_3f_19_b1
net_enp6s0f1np1_3_16_0c_8c_47_40_5c
net_enp6s0f1np1_4_0e_a6_15_f5_68_77
net_enp6s0f1np1_5_e2_73_dc_f9_c2_46
net_enp6s0f1np1_6_e6_13_57_c9_cf_0f
net_enp6s0f1np1_7_62_10_4f_2b_1b_ae
net_vdpa06p00vf2_42_11_c8_97_aa_43
net_vdpa06p00vf3_2a_59_5e_32_3e_b7
net_vdpa06p00vf4_9a_5c_3f_c9_cc_42
net_vdpa06p00vf5_26_73_2a_e3_db_f9
net_vdpa06p00vf6_9a_bf_a9_e9_6b_06
net_vdpa06p00vf7_d2_1f_cc_00_a9_95
net_vdpa06p01vf0_ba_81_cb_7e_01_1d
net_vdpa06p01vf1_56_95_fa_5e_4a_51
net_vdpa06p01vf2_72_53_64_8d_12_98
net_vdpa06p01vf3_9e_ff_1d_6d_c1_4e
net_vdpa06p01vf4_96_20_f3_b1_69_ef
net_vdpa06p01vf5_ea_0c_8b_0b_3f_ff
net_vdpa06p01vf6_0a_53_4e_94_e0_8b
net_vdpa06p01vf7_16_84_48_e6_74_59
net_vdpa06p02vf0_b2_cc_fa_16_f0_52
net_vdpa06p02vf1_0a_12_1b_a2_1a_d3
pci_0000_06_00_0
pci_0000_06_00_1
pci_0000_06_00_2
pci_0000_06_00_3
pci_0000_06_00_4
pci_0000_06_00_5
pci_0000_06_00_6
pci_0000_06_00_7
pci_0000_06_01_0
pci_0000_06_01_1
pci_0000_06_01_2
pci_0000_06_01_3
pci_0000_06_01_4
pci_0000_06_01_5
pci_0000_06_01_6
pci_0000_06_01_7
pci_0000_06_02_0
pci_0000_06_02_1
vdpa_0000_06_00_2
vdpa_0000_06_00_3
vdpa_0000_06_00_4
vdpa_0000_06_00_5
vdpa_0000_06_00_6
vdpa_0000_06_00_7
vdpa_0000_06_01_0
vdpa_0000_06_01_1
vdpa_0000_06_01_2
vdpa_0000_06_01_3
vdpa_0000_06_01_4
vdpa_0000_06_01_5
vdpa_0000_06_01_6
vdpa_0000_06_01_7
vdpa_0000_06_02_0
vdpa_0000_06_02_1
Validate that the vDPA devices have been created. This should match the vdpa devices from virsh nodedev-list:
[root@computevdpa-0 ~]# ls -tlra /dev/vhost-vdpa-*
crw-------. 1 root root 241, 0 Jun 30 12:52 /dev/vhost-vdpa-0
crw-------. 1 root root 241, 1 Jun 30 12:52 /dev/vhost-vdpa-1
crw-------. 1 root root 241, 2 Jun 30 12:52 /dev/vhost-vdpa-2
crw-------. 1 root root 241, 3 Jun 30 12:52 /dev/vhost-vdpa-3
crw-------. 1 root root 241, 4 Jun 30 12:52 /dev/vhost-vdpa-4
crw-------. 1 root root 241, 5 Jun 30 12:53 /dev/vhost-vdpa-5
crw-------. 1 root root 241, 6 Jun 30 12:53 /dev/vhost-vdpa-6
crw-------. 1 root root 241, 7 Jun 30 12:53 /dev/vhost-vdpa-7
crw-------. 1 root root 241, 8 Jun 30 12:53 /dev/vhost-vdpa-8
crw-------. 1 root root 241, 9 Jun 30 12:53 /dev/vhost-vdpa-9
crw-------. 1 root root 241, 10 Jun 30 12:53 /dev/vhost-vdpa-10
crw-------. 1 root root 241, 11 Jun 30 12:53 /dev/vhost-vdpa-11
crw-------. 1 root root 241, 12 Jun 30 12:53 /dev/vhost-vdpa-12
crw-------. 1 root root 241, 13 Jun 30 12:53 /dev/vhost-vdpa-13
crw-------. 1 root root 241, 14 Jun 30 12:53 /dev/vhost-vdpa-14
crw-------. 1 root root 241, 15 Jun 30 12:53 /dev/vhost-vdpa-15
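The match between the two listings can also be checked by count. This is a sketch under the assumption that libvirt names vDPA node devices with a vdpa_ prefix, as in the nodedev-list output above; count_vdpa_nodedevs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count the node devices whose name starts with "vdpa_" in
# `virsh nodedev-list` output read from stdin. Prints 0 when there are none.
count_vdpa_nodedevs() {
    grep -c '^vdpa_' || true
}

# A few names taken from the nodedev-list output above; the net_vdpa* entry
# is not counted because it does not start with "vdpa_".
printf '%s\n' pci_0000_06_00_2 net_vdpa06p00vf2_42_11_c8_97_aa_43 \
    vdpa_0000_06_00_2 vdpa_0000_06_00_3 | count_vdpa_nodedevs

# On a ComputeVdpa node the two counts could be compared like this:
# nodedevs=$(podman exec -u0 nova_virtqemud virsh -c qemu:///system nodedev-list | count_vdpa_nodedevs)
# chardevs=$(ls /dev/vhost-vdpa-* 2>/dev/null | wc -l)
# [ "$nodedevs" -eq "$chardevs" ] || echo "mismatch: $nodedevs node devices, $chardevs vhost-vdpa devices"
```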
Validate the pci_devices table in the database from one of the controllers:
[root@controller-2 neutron]# podman exec -u0 $(podman ps -q -f name=galera) mysql -t -D nova -e "select address,product_id,vendor_id,dev_type,dev_id from pci_devices where address like '0000:06:%' and deleted=0;"
+--------------+------------+-----------+----------+------------------+
| address | product_id | vendor_id | dev_type | dev_id |
+--------------+------------+-----------+----------+------------------+
| 0000:06:01.1 | 101e | 15b3 | vdpa | pci_0000_06_01_1 |
| 0000:06:00.2 | 101e | 15b3 | vdpa | pci_0000_06_00_2 |
| 0000:06:00.3 | 101e | 15b3 | vdpa | pci_0000_06_00_3 |
| 0000:06:00.4 | 101e | 15b3 | vdpa | pci_0000_06_00_4 |
| 0000:06:00.5 | 101e | 15b3 | vdpa | pci_0000_06_00_5 |
| 0000:06:00.6 | 101e | 15b3 | vdpa | pci_0000_06_00_6 |
| 0000:06:00.7 | 101e | 15b3 | vdpa | pci_0000_06_00_7 |
| 0000:06:01.0 | 101e | 15b3 | vdpa | pci_0000_06_01_0 |
| 0000:06:01.2 | 101e | 15b3 | vdpa | pci_0000_06_01_2 |
| 0000:06:01.3 | 101e | 15b3 | vdpa | pci_0000_06_01_3 |
| 0000:06:01.4 | 101e | 15b3 | vdpa | pci_0000_06_01_4 |
| 0000:06:01.5 | 101e | 15b3 | vdpa | pci_0000_06_01_5 |
| 0000:06:01.6 | 101e | 15b3 | vdpa | pci_0000_06_01_6 |
| 0000:06:01.7 | 101e | 15b3 | vdpa | pci_0000_06_01_7 |
| 0000:06:02.0 | 101e | 15b3 | vdpa | pci_0000_06_02_0 |
| 0000:06:02.1 | 101e | 15b3 | vdpa | pci_0000_06_02_1 |
| 0000:06:00.2 | 101e | 15b3 | vdpa | pci_0000_06_00_2 |
| 0000:06:00.3 | 101e | 15b3 | vdpa | pci_0000_06_00_3 |
| 0000:06:00.4 | 101e | 15b3 | vdpa | pci_0000_06_00_4 |
| 0000:06:00.5 | 101e | 15b3 | vdpa | pci_0000_06_00_5 |
| 0000:06:00.6 | 101e | 15b3 | vdpa | pci_0000_06_00_6 |
| 0000:06:00.7 | 101e | 15b3 | vdpa | pci_0000_06_00_7 |
| 0000:06:01.0 | 101e | 15b3 | vdpa | pci_0000_06_01_0 |
| 0000:06:01.1 | 101e | 15b3 | vdpa | pci_0000_06_01_1 |
| 0000:06:01.2 | 101e | 15b3 | vdpa | pci_0000_06_01_2 |
| 0000:06:01.3 | 101e | 15b3 | vdpa | pci_0000_06_01_3 |
| 0000:06:01.4 | 101e | 15b3 | vdpa | pci_0000_06_01_4 |
| 0000:06:01.5 | 101e | 15b3 | vdpa | pci_0000_06_01_5 |
| 0000:06:01.6 | 101e | 15b3 | vdpa | pci_0000_06_01_6 |
| 0000:06:01.7 | 101e | 15b3 | vdpa | pci_0000_06_01_7 |
| 0000:06:02.0 | 101e | 15b3 | vdpa | pci_0000_06_02_0 |
| 0000:06:02.1 | 101e | 15b3 | vdpa | pci_0000_06_02_1 |
+--------------+------------+-----------+----------+------------------+
List the vDPA devices with the vdpa command:
[root@computevdpa-0 ~]# vdpa dev
0000:06:01.0: type network mgmtdev pci/0000:06:01.0 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:00.6: type network mgmtdev pci/0000:06:00.6 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:00.4: type network mgmtdev pci/0000:06:00.4 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:00.2: type network mgmtdev pci/0000:06:00.2 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.1: type network mgmtdev pci/0000:06:01.1 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:00.7: type network mgmtdev pci/0000:06:00.7 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:00.5: type network mgmtdev pci/0000:06:00.5 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:00.3: type network mgmtdev pci/0000:06:00.3 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:02.0: type network mgmtdev pci/0000:06:02.0 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.6: type network mgmtdev pci/0000:06:01.6 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.4: type network mgmtdev pci/0000:06:01.4 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.2: type network mgmtdev pci/0000:06:01.2 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:02.1: type network mgmtdev pci/0000:06:02.1 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.7: type network mgmtdev pci/0000:06:01.7 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.5: type network mgmtdev pci/0000:06:01.5 vendor_id 5555 max_vqs 16 max_vq_size 256
0000:06:01.3: type network mgmtdev pci/0000:06:01.3 vendor_id 5555 max_vqs 16 max_vq_size 256
Validate the OVN agents:
(overcloud) [stack@undercloud-0 ~]$ openstack network agent list --host computevdpa-0.home.arpa
+--------------------------------------+----------------------+-------------------------+-------------------+-------+-------+----------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+----------------------+-------------------------+-------------------+-------+-------+----------------------------+
| ef2e6ced-e723-449c-bbf8-7513709f33ea | OVN Controller agent | computevdpa-0.home.arpa | | :-) | UP | ovn-controller |
| 7be39049-db5b-54fc-add1-4a0687160542 | OVN Metadata agent | computevdpa-0.home.arpa | | :-) | UP | neutron-ovn-metadata-agent |
+--------------------------------------+----------------------+-------------------------+-------------------+-------+-------+----------------------------+
Other useful commands for troubleshooting:
[root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m type=offloaded
[root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m
[root@computevdpa-0 ~]# tc filter show dev enp6s0f1_1 ingress
[root@computevdpa-0 ~]# tc -s filter show dev enp6s0f1_1 ingress
[root@computevdpa-0 ~]# tc monitor