Deploying with vDPA Support
===========================

TripleO can deploy Overcloud nodes with vDPA support. A new role
``ComputeVdpa`` has been added to create a custom ``roles_data.yaml`` with
a composable vDPA role.

vDPA is very similar to SR-IOV and leverages the same OpenStack components.
It's important to note that vDPA can't function without OVS Hardware Offload.

Mellanox is the only NIC vendor currently supported with vDPA.

CentOS9/RHEL9 with a kernel of 5.14 or higher is required.

Execute the following command to create the ``roles_data.yaml``::

  openstack overcloud roles generate -o roles_data.yaml Controller ComputeVdpa

Once the roles file is created, the following changes are required:

- Deploy Command
- Parameters
- Network Config
- Network and Port creation

Deploy Command
--------------

The deploy command should include the generated roles data file from the
above command.

The deploy command should also include the SR-IOV environment file to include
the ``neutron-sriov-agent`` service. All the required parameters are also
specified in this environment file. The parameters have to be configured
according to the baremetal on which vDPA needs to be enabled.

Also, vDPA requires mandatory kernel parameters to be set, like
``intel_iommu=on iommu=pt`` on Intel machines. In order to enable the
configuration of kernel parameters on the host, the ``KernelArgs`` role
parameter has to be defined accordingly.

Adding the following arguments to the ``openstack overcloud deploy`` command
will do the trick::

  openstack overcloud deploy --templates \
    -r roles_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
    ...

Parameters
----------

Unlike SR-IOV, vDPA devices shouldn't be added to
``NeutronPhysicalDevMappings`` but to ``NovaPCIPassthrough``. The vDPA bridge
should also be added to ``NeutronBridgeMappings`` and the ``physical_network``
to ``NeutronNetworkVLANRanges``.

The ``KernelArgs`` parameter should be provided in the deployment environment
file, with the set of kernel boot parameters to be applied on the
``ComputeVdpa`` role where vDPA is enabled.

The ``PciPassthroughFilter`` is required for vDPA. The ``NUMATopologyFilter``
will become optional once ``libvirt`` supports locking of the guest memory.
At this time, it is mandatory to have it::

  parameter_defaults:
    NeutronTunnelTypes: ''
    NeutronNetworkType: 'vlan'
    NeutronNetworkVLANRanges:
      - tenant:1300:1399
    NovaSchedulerDefaultFilters:
      - PciPassthroughFilter
      - NUMATopologyFilter
      - ...
    ComputeVdpaParameters:
      NovaPCIPassthrough:
        - vendor_id: "15b3"
          product_id: "101e"
          address: "06:00.0"
          physical_network: "tenant"
        - vendor_id: "15b3"
          product_id: "101e"
          address: "06:00.1"
          physical_network: "tenant"
      KernelArgs: "[...] iommu=pt intel_iommu=on"
      NeutronBridgeMappings:
        - tenant:br-tenant

.. note::
   It's important to use the ``product_id`` of a VF device and not a PF::

     06:00.1 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
     06:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
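A quick way to confirm the PF and VF ``product_id`` values on a host is
``lspci``; the sketch below is an assumption about the hardware layout shown
above, so adjust the ``grep`` pattern to the NIC actually present::

  # Numeric [vendor_id:product_id] pairs are printed in brackets; use the
  # product_id of the "Virtual Function" entries (101e above) in
  # NovaPCIPassthrough, not the PF product_id (101d).
  $ lspci -nn | grep -i mellanox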
Network Config
--------------

vDPA-supported network interfaces should be specified in the network config
templates as ``sriov_pf`` type. They should also be under an OVS bridge with
``link_mode`` set to ``switchdev``.

Example::

  - type: ovs_bridge
    name: br-tenant
    members:
      - type: sriov_pf
        name: enp6s0f0
        numvfs: 8
        use_dhcp: false
        vdpa: true
        link_mode: switchdev
      - type: sriov_pf
        name: enp6s0f1
        numvfs: 8
        use_dhcp: false
        vdpa: true
        link_mode: switchdev

Network and Port Creation
-------------------------

When creating the network, it has to be mapped to the physical network::

  $ openstack network create \
      --provider-physical-network tenant \
      --provider-network-type vlan \
      --provider-segment 1337 \
      vdpa_net1

  $ openstack subnet create \
      --network vdpa_net1 \
      --subnet-range 192.0.2.0/24 \
      --dhcp \
      vdpa_subnet1

To allocate a port from a vDPA-enabled NIC, create a neutron port and set the
``--vnic-type`` to ``vdpa``::

  $ openstack port create --network vdpa_net1 \
      --vnic-type=vdpa \
      vdpa_direct_port1

Scheduling instances
--------------------

Normally, the ``PciPassthroughFilter`` is sufficient to ensure that a vDPA
instance will land on a vDPA host. To prevent other instances from using a
vDPA host, set up the Nova isolate-aggregates feature.

Example::

  $ openstack --os-placement-api-version 1.6 trait create CUSTOM_VDPA
  $ openstack aggregate create \
      --zone vdpa-az1 \
      vdpa_ag1
  $ openstack hypervisor list -c ID -c "Hypervisor Hostname" -f value | grep vdpa | \
    while read l
    do
      UUID=$(echo $l | cut -f 1 -d " ")
      H_NAME=$(echo $l | cut -f 2 -d " ")
      echo $H_NAME $UUID
      openstack aggregate add host vdpa_ag1 $H_NAME
      traits=$(openstack --os-placement-api-version 1.6 resource provider trait list \
               -f value $UUID | sed 's/^/--trait /')
      openstack --os-placement-api-version 1.6 resource provider trait set \
        $traits --trait CUSTOM_VDPA $UUID
    done
  $ openstack --os-compute-api-version 2.53 aggregate set \
      --property trait:CUSTOM_VDPA=required \
      vdpa_ag1

The flavor will map to that new aggregate with the ``trait:CUSTOM_VDPA``
property::

  $ openstack --os-compute-api-version 2.86 flavor create \
      --ram 4096 \
      --disk 10 \
      --vcpus 2 \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=True \
      --property hw:cpu_realtime_mask=^0 \
      --property trait:CUSTOM_VDPA=required \
      vdpa_pinned

.. note::
   It's also important to have the ``hw:cpu_realtime*`` properties here since
   ``libvirt`` doesn't currently support the locking of guest memory.

This should launch an instance on one of the vDPA hosts::

  $ openstack server create \
      --image cirros \
      --flavor vdpa_pinned \
      --nic port-id=vdpa_direct_port1 \
      vdpa_test_1
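Once the server is active, the port binding can be checked to confirm that
Neutron kept the ``vdpa`` VNIC type and bound the port. This is a quick
sanity check, not part of the deployment itself; column names may vary
slightly between releases::

  # The port should report vnic_type "vdpa" and become ACTIVE once the
  # instance is up.
  $ openstack port show vdpa_direct_port1 \
      -c binding_vnic_type -c binding_vif_type -c status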
Validations
-----------

Confirm that a PCI device is in switchdev mode::

  [root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.0
  pci/0000:06:00.0: mode switchdev inline-mode none encap-mode basic
  [root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.1
  pci/0000:06:00.1: mode switchdev inline-mode none encap-mode basic

Verify if offload is enabled in OVS::

  [root@computevdpa-0 ~]# ovs-vsctl get Open_vSwitch . other_config:hw-offload
  "true"

Validate the interfaces are added to the tenant bridge::

  [root@computevdpa-0 ~]# ovs-vsctl show
  be82eb5b-94c3-449d-98c8-0961b6b6b4c4
      Manager "ptcp:6640:127.0.0.1"
          is_connected: true
  [...]
      Bridge br-tenant
          Controller "tcp:127.0.0.1:6633"
              is_connected: true
          fail_mode: secure
          datapath_type: system
          Port br-tenant
              Interface br-tenant
                  type: internal
          Port enp6s0f0
              Interface enp6s0f0
          Port phy-br-tenant
              Interface phy-br-tenant
                  type: patch
                  options: {peer=int-br-tenant}
          Port enp6s0f1
              Interface enp6s0f1
  [...]

Verify if the NICs have ``hw-tc-offload`` enabled::

  [root@computevdpa-0 ~]# for i in {0..1};do ethtool -k enp6s0f$i | grep tc-offload;done
  hw-tc-offload: on
  hw-tc-offload: on

Verify that the udev rules have been created::

  [root@computevdpa-0 ~]# cat /etc/udev/rules.d/80-persistent-os-net-config.rules
  # This file is autogenerated by os-net-config
  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}!="", ATTR{phys_port_name}=="pf*vf*", ENV{NM_UNMANAGED}="1"
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.0", NAME="enp6s0f0"
  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf0vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f0_$env{NUMBER}"
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.1", NAME="enp6s0f1"
  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf1vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f1_$env{NUMBER}"

Validate that the ``numvfs`` are correctly defined::

  [root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f0/device/sriov_numvfs
  8
  [root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f1/device/sriov_numvfs
  8

Validate that the ``pci/passthrough_whitelist`` contains all the PFs::

  [root@computevdpa-0 ~]# grep ^passthrough_whitelist /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
  passthrough_whitelist={"address":"06:00.0","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}
  passthrough_whitelist={"address":"06:00.1","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}

Verify the ``nodedev-list`` from ``libvirt``::

  [root@computevdpa-0 ~]# podman exec -u0 nova_virtqemud virsh -c qemu:///system nodedev-list | grep -P "pci_0000_06|enp6|vdpa"
  net_enp6s0f0np0_04_3f_72_ee_ec_84
  net_enp6s0f0np0_0_1a_c1_a5_25_94_ef
  net_enp6s0f0np0_1_3a_dc_1d_36_85_af
  net_enp6s0f0np0_2_6a_95_0c_e9_8f_1a
  net_enp6s0f0np0_3_ba_c8_5b_f5_70_cc
  net_enp6s0f0np0_4_9e_03_86_23_cd_65
  net_enp6s0f0np0_5_0a_5c_8b_c4_00_7a
  net_enp6s0f0np0_6_2e_f6_bc_e6_6f_cd
  net_enp6s0f0np0_7_ce_1e_b2_20_5e_15
  net_enp6s0f1np1_04_3f_72_ee_ec_85
  net_enp6s0f1np1_0_a6_04_9e_5a_cd_3b
  net_enp6s0f1np1_1_56_5d_59_b0_df_17
  net_enp6s0f1np1_2_de_ac_7c_3f_19_b1
  net_enp6s0f1np1_3_16_0c_8c_47_40_5c
  net_enp6s0f1np1_4_0e_a6_15_f5_68_77
  net_enp6s0f1np1_5_e2_73_dc_f9_c2_46
  net_enp6s0f1np1_6_e6_13_57_c9_cf_0f
  net_enp6s0f1np1_7_62_10_4f_2b_1b_ae
  net_vdpa06p00vf2_42_11_c8_97_aa_43
  net_vdpa06p00vf3_2a_59_5e_32_3e_b7
  net_vdpa06p00vf4_9a_5c_3f_c9_cc_42
  net_vdpa06p00vf5_26_73_2a_e3_db_f9
  net_vdpa06p00vf6_9a_bf_a9_e9_6b_06
  net_vdpa06p00vf7_d2_1f_cc_00_a9_95
  net_vdpa06p01vf0_ba_81_cb_7e_01_1d
  net_vdpa06p01vf1_56_95_fa_5e_4a_51
  net_vdpa06p01vf2_72_53_64_8d_12_98
  net_vdpa06p01vf3_9e_ff_1d_6d_c1_4e
  net_vdpa06p01vf4_96_20_f3_b1_69_ef
  net_vdpa06p01vf5_ea_0c_8b_0b_3f_ff
  net_vdpa06p01vf6_0a_53_4e_94_e0_8b
  net_vdpa06p01vf7_16_84_48_e6_74_59
  net_vdpa06p02vf0_b2_cc_fa_16_f0_52
  net_vdpa06p02vf1_0a_12_1b_a2_1a_d3
  pci_0000_06_00_0
  pci_0000_06_00_1
  pci_0000_06_00_2
  pci_0000_06_00_3
  pci_0000_06_00_4
  pci_0000_06_00_5
  pci_0000_06_00_6
  pci_0000_06_00_7
  pci_0000_06_01_0
  pci_0000_06_01_1
  pci_0000_06_01_2
  pci_0000_06_01_3
  pci_0000_06_01_4
  pci_0000_06_01_5
  pci_0000_06_01_6
  pci_0000_06_01_7
  pci_0000_06_02_0
  pci_0000_06_02_1
  vdpa_0000_06_00_2
  vdpa_0000_06_00_3
  vdpa_0000_06_00_4
  vdpa_0000_06_00_5
  vdpa_0000_06_00_6
  vdpa_0000_06_00_7
  vdpa_0000_06_01_0
  vdpa_0000_06_01_1
  vdpa_0000_06_01_2
  vdpa_0000_06_01_3
  vdpa_0000_06_01_4
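An individual entry from that list can be inspected with ``nodedev-dumpxml``,
which should show the ``/dev/vhost-vdpa-*`` character device backing it. A
minimal sketch, assuming ``vdpa_0000_06_00_2`` from the output above::

  # Dump the libvirt node device XML for one vDPA device; the vdpa
  # capability in the XML references the corresponding /dev/vhost-vdpa-*
  # node validated in the next step.
  [root@computevdpa-0 ~]# podman exec -u0 nova_virtqemud virsh -c qemu:///system \
      nodedev-dumpxml vdpa_0000_06_00_2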
  vdpa_0000_06_01_5
  vdpa_0000_06_01_6
  vdpa_0000_06_01_7
  vdpa_0000_06_02_0
  vdpa_0000_06_02_1

Validate that the vDPA devices have been created; this should match the vdpa
devices from ``virsh nodedev-list``::

  [root@computevdpa-0 ~]# ls -tlra /dev/vhost-vdpa-*
  crw-------. 1 root root 241,  0 Jun 30 12:52 /dev/vhost-vdpa-0
  crw-------. 1 root root 241,  1 Jun 30 12:52 /dev/vhost-vdpa-1
  crw-------. 1 root root 241,  2 Jun 30 12:52 /dev/vhost-vdpa-2
  crw-------. 1 root root 241,  3 Jun 30 12:52 /dev/vhost-vdpa-3
  crw-------. 1 root root 241,  4 Jun 30 12:52 /dev/vhost-vdpa-4
  crw-------. 1 root root 241,  5 Jun 30 12:53 /dev/vhost-vdpa-5
  crw-------. 1 root root 241,  6 Jun 30 12:53 /dev/vhost-vdpa-6
  crw-------. 1 root root 241,  7 Jun 30 12:53 /dev/vhost-vdpa-7
  crw-------. 1 root root 241,  8 Jun 30 12:53 /dev/vhost-vdpa-8
  crw-------. 1 root root 241,  9 Jun 30 12:53 /dev/vhost-vdpa-9
  crw-------. 1 root root 241, 10 Jun 30 12:53 /dev/vhost-vdpa-10
  crw-------. 1 root root 241, 11 Jun 30 12:53 /dev/vhost-vdpa-11
  crw-------. 1 root root 241, 12 Jun 30 12:53 /dev/vhost-vdpa-12
  crw-------. 1 root root 241, 13 Jun 30 12:53 /dev/vhost-vdpa-13
  crw-------. 1 root root 241, 14 Jun 30 12:53 /dev/vhost-vdpa-14
  crw-------. 1 root root 241, 15 Jun 30 12:53 /dev/vhost-vdpa-15

Validate the ``pci_devices`` table in the database from one of the
controllers::

  [root@controller-2 neutron]# podman exec -u0 $(podman ps -q -f name=galera) mysql -t -D nova -e "select address,product_id,vendor_id,dev_type,dev_id from pci_devices where address like '0000:06:%' and deleted=0;"
  +--------------+------------+-----------+----------+------------------+
  | address      | product_id | vendor_id | dev_type | dev_id           |
  +--------------+------------+-----------+----------+------------------+
  | 0000:06:01.1 | 101e       | 15b3      | vdpa     | pci_0000_06_01_1 |
  | 0000:06:00.2 | 101e       | 15b3      | vdpa     | pci_0000_06_00_2 |
  | 0000:06:00.3 | 101e       | 15b3      | vdpa     | pci_0000_06_00_3 |
  | 0000:06:00.4 | 101e       | 15b3      | vdpa     | pci_0000_06_00_4 |
  | 0000:06:00.5 | 101e       | 15b3      | vdpa     | pci_0000_06_00_5 |
  | 0000:06:00.6 | 101e       | 15b3      | vdpa     | pci_0000_06_00_6 |
  | 0000:06:00.7 | 101e       | 15b3      | vdpa     | pci_0000_06_00_7 |
  | 0000:06:01.0 | 101e       | 15b3      | vdpa     | pci_0000_06_01_0 |
  | 0000:06:01.2 | 101e       | 15b3      | vdpa     | pci_0000_06_01_2 |
  | 0000:06:01.3 | 101e       | 15b3      | vdpa     | pci_0000_06_01_3 |
  | 0000:06:01.4 | 101e       | 15b3      | vdpa     | pci_0000_06_01_4 |
  | 0000:06:01.5 | 101e       | 15b3      | vdpa     | pci_0000_06_01_5 |
  | 0000:06:01.6 | 101e       | 15b3      | vdpa     | pci_0000_06_01_6 |
  | 0000:06:01.7 | 101e       | 15b3      | vdpa     | pci_0000_06_01_7 |
  | 0000:06:02.0 | 101e       | 15b3      | vdpa     | pci_0000_06_02_0 |
  | 0000:06:02.1 | 101e       | 15b3      | vdpa     | pci_0000_06_02_1 |
  | 0000:06:00.2 | 101e       | 15b3      | vdpa     | pci_0000_06_00_2 |
  | 0000:06:00.3 | 101e       | 15b3      | vdpa     | pci_0000_06_00_3 |
  | 0000:06:00.4 | 101e       | 15b3      | vdpa     | pci_0000_06_00_4 |
  | 0000:06:00.5 | 101e       | 15b3      | vdpa     | pci_0000_06_00_5 |
  | 0000:06:00.6 | 101e       | 15b3      | vdpa     | pci_0000_06_00_6 |
  | 0000:06:00.7 | 101e       | 15b3      | vdpa     | pci_0000_06_00_7 |
  | 0000:06:01.0 | 101e       | 15b3      | vdpa     | pci_0000_06_01_0 |
  | 0000:06:01.1 | 101e       | 15b3      | vdpa     | pci_0000_06_01_1 |
  | 0000:06:01.2 | 101e       | 15b3      | vdpa     | pci_0000_06_01_2 |
  | 0000:06:01.3 | 101e       | 15b3      | vdpa     | pci_0000_06_01_3 |
  | 0000:06:01.4 | 101e       | 15b3      | vdpa     | pci_0000_06_01_4 |
  | 0000:06:01.5 | 101e       | 15b3      | vdpa     | pci_0000_06_01_5 |
  | 0000:06:01.6 | 101e       | 15b3      | vdpa     | pci_0000_06_01_6 |
  | 0000:06:01.7 | 101e       | 15b3      | vdpa     | pci_0000_06_01_7 |
  | 0000:06:02.0 | 101e       | 15b3      | vdpa     | pci_0000_06_02_0 |
  | 0000:06:02.1 | 101e       | 15b3      | vdpa     | pci_0000_06_02_1 |
  +--------------+------------+-----------+----------+------------------+
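To check whether any of these devices are currently claimed by an instance,
the same query can be extended with the ``status`` and ``instance_uuid``
columns. This is a sketch based on the standard Nova ``pci_devices`` schema::

  # Shows whether each vDPA device is available or allocated, and to which
  # instance it is bound.
  [root@controller-2 neutron]# podman exec -u0 $(podman ps -q -f name=galera) mysql -t -D nova \
      -e "select address,status,instance_uuid from pci_devices where address like '0000:06:%' and deleted=0;"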
The ``vdpa`` command::

  [root@computevdpa-0 ~]# vdpa dev
  0000:06:01.0: type network mgmtdev pci/0000:06:01.0 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:00.6: type network mgmtdev pci/0000:06:00.6 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:00.4: type network mgmtdev pci/0000:06:00.4 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:00.2: type network mgmtdev pci/0000:06:00.2 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.1: type network mgmtdev pci/0000:06:01.1 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:00.7: type network mgmtdev pci/0000:06:00.7 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:00.5: type network mgmtdev pci/0000:06:00.5 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:00.3: type network mgmtdev pci/0000:06:00.3 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:02.0: type network mgmtdev pci/0000:06:02.0 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.6: type network mgmtdev pci/0000:06:01.6 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.4: type network mgmtdev pci/0000:06:01.4 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.2: type network mgmtdev pci/0000:06:01.2 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:02.1: type network mgmtdev pci/0000:06:02.1 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.7: type network mgmtdev pci/0000:06:01.7 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.5: type network mgmtdev pci/0000:06:01.5 vendor_id 5555 max_vqs 16 max_vq_size 256
  0000:06:01.3: type network mgmtdev pci/0000:06:01.3 vendor_id 5555 max_vqs 16 max_vq_size 256

Validating the OVN agents::

  (overcloud) [stack@undercloud-0 ~]$ openstack network agent list --host computevdpa-0.home.arpa
  +--------------------------------------+----------------------+-------------------------+-------------------+-------+-------+----------------------------+
  | ID                                   | Agent Type           | Host                    | Availability Zone | Alive | State | Binary                     |
  +--------------------------------------+----------------------+-------------------------+-------------------+-------+-------+----------------------------+
  | ef2e6ced-e723-449c-bbf8-7513709f33ea | OVN Controller agent | computevdpa-0.home.arpa |                   | :-)   | UP    | ovn-controller             |
  | 7be39049-db5b-54fc-add1-4a0687160542 | OVN Metadata agent   | computevdpa-0.home.arpa |                   | :-)   | UP    | neutron-ovn-metadata-agent |
  +--------------------------------------+----------------------+-------------------------+-------------------+-------+-------+----------------------------+

Other useful commands for troubleshooting::

  [root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m type=offloaded
  [root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m
  [root@computevdpa-0 ~]# tc filter show dev enp6s0f1_1 ingress
  [root@computevdpa-0 ~]# tc -s filter show dev enp6s0f1_1 ingress
  [root@computevdpa-0 ~]# tc monitor
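For a running guest, it can also be useful to confirm that the instance was
actually wired up with a vDPA interface. A minimal sketch, assuming a
hypothetical libvirt domain name ``instance-00000001`` (use the ``list``
output to find the real one)::

  # List the libvirt domains on the compute node, then look for the vDPA
  # interface definition in the domain XML; it should reference one of the
  # /dev/vhost-vdpa-* devices validated earlier.
  [root@computevdpa-0 ~]# podman exec -u0 nova_virtqemud virsh -c qemu:///system list --all
  [root@computevdpa-0 ~]# podman exec -u0 nova_virtqemud virsh -c qemu:///system \
      dumpxml instance-00000001 | grep -A3 "interface type='vdpa'"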