Merge "Adding the vDPA deployment guide"

This commit is contained in:
Zuul 2021-08-02 23:22:10 +00:00 committed by Gerrit Code Review
commit 5c64f155fc
2 changed files with 371 additions and 0 deletions


@ -48,3 +48,4 @@ Documentation on additional features for |project|.
tls-everywhere
tuned
undercloud_minion
vdpa_deployment


@ -0,0 +1,370 @@
Deploying with vDPA Support
===============================
TripleO can deploy Overcloud nodes with vDPA support. A new role ``ComputeVdpa``
has been added to create a custom ``roles_data.yaml`` with the composable vDPA role.

vDPA is very similar to SR-IOV and leverages the same OpenStack components. It's
important to note that vDPA can't function without OVS Hardware Offload.

Mellanox is the only NIC vendor currently supported with vDPA.

Execute the following command to create the ``roles_data.yaml``::

  openstack overcloud roles generate -o roles_data.yaml Controller ComputeVdpa
Once a roles file is created, the following changes are required:
- Deploy Command
- Parameters
- Network Config
- Network and Port creation
Deploy Command
----------------
The deploy command should include the generated roles data file from the above
command.

It should also include the SR-IOV environment file, which adds the
``neutron-sriov-agent`` service. All the required parameters are also specified
in this environment file. The parameters have to be configured according to the
baremetal node on which vDPA needs to be enabled.

vDPA also requires mandatory kernel parameters to be set, such as
``intel_iommu=on iommu=pt`` on Intel machines. To enable the configuration of
kernel parameters on the host, the ``KernelArgs`` role parameter has to be
defined accordingly.

Adding the following arguments to the ``openstack overcloud deploy`` command
will do the trick::

  openstack overcloud deploy --templates \
    -r roles_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
    ...
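
If the vDPA-specific parameters described in the next section are kept in a
separate environment file, that file should also be passed with ``-e``. The
file name ``vdpa-parameters.yaml`` below is only an example::

  openstack overcloud deploy --templates \
    -r roles_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
    -e vdpa-parameters.yaml \
    ...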
Parameters
----------
Unlike SR-IOV, vDPA devices shouldn't be added to ``NeutronPhysicalDevMappings`` but to
``NovaPCIPassthrough``. The vDPA bridge should also be added to the ``NeutronBridgeMappings``
and the ``physical_network`` to the ``NeutronNetworkVLANRanges``.

The parameter ``KernelArgs`` should be provided in the deployment environment
file, with the set of kernel boot parameters to be applied on the
``ComputeVdpa`` role where vDPA is enabled.

The ``PciPassthroughFilter`` is required for vDPA. The ``NUMATopologyFilter`` will become
optional once ``libvirt`` supports locking of the guest memory. For now, it is
mandatory to have it::

  parameter_defaults:
    NeutronTunnelTypes: ''
    NeutronNetworkType: 'vlan'
    NeutronNetworkVLANRanges:
      - tenant:1300:1399
    NovaSchedulerDefaultFilters:
      - ...
      - PciPassthroughFilter
      - NUMATopologyFilter
    ComputeVdpaParameters:
      NovaPCIPassthrough:
        - vendor_id: "15b3"
          product_id: "101d"
          address: "06:00.0"
          physical_network: "tenant"
        - vendor_id: "15b3"
          product_id: "101d"
          address: "06:00.1"
          physical_network: "tenant"
      KernelArgs: "[...] iommu=pt intel_iommu=on"
      NeutronBridgeMappings:
        - tenant:br-tenant
Network Config
--------------
vDPA-capable network interfaces should be specified in the network config
templates as ``sriov_pf`` type. They should also be placed under an OVS bridge
with ``link_mode`` set to ``switchdev``.

Example::

  - type: ovs_bridge
    name: br-tenant
    members:
      - type: sriov_pf
        name: enp6s0f0
        numvfs: 8
        use_dhcp: false
        vdpa: true
        link_mode: switchdev
      - type: sriov_pf
        name: enp6s0f1
        numvfs: 8
        use_dhcp: false
        vdpa: true
        link_mode: switchdev
Network and Port Creation
-------------------------
When creating the network, it has to be mapped to the physical network::

  $ openstack network create \
      --provider-physical-network tenant \
      --provider-network-type vlan \
      --provider-segment 1337 \
      vdpa_net1

  $ openstack subnet create \
      --network vdpa_net1 \
      --subnet-range 192.0.2.0/24 \
      --dhcp \
      vdpa_subnet1
To allocate a port from a vDPA-enabled NIC, create a neutron port and set the
``--vnic-type`` to ``vdpa``::

  $ openstack port create --network vdpa_net1 \
      --vnic-type=vdpa \
      vdpa_direct_port1
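
As a quick sanity check, the requested vNIC type can be read back from the port
created above (``binding_vnic_type`` is the field name used by the standard
``openstack port show`` output)::

  $ openstack port show vdpa_direct_port1 -c binding_vnic_type -f value
  vdpa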
Scheduling instances
--------------------
Normally, the ``PciPassthroughFilter`` is sufficient to ensure that a vDPA instance will
land on a vDPA host. If we want to prevent other instances from using a vDPA host, we need
to set up the `isolate-aggregates feature
<https://docs.openstack.org/nova/latest/reference/isolate-aggregates.html>`_.

Example::

  $ openstack --os-placement-api-version 1.6 trait create CUSTOM_VDPA
  $ openstack aggregate create \
      --zone vdpa-az1 \
      vdpa_ag1
  $ openstack hypervisor list -c ID -c "Hypervisor Hostname" -f value | grep vdpa | \
      while read l
      do UUID=$(echo $l | cut -f 1 -d " ")
         H_NAME=$(echo $l | cut -f 2 -d " ")
         echo $H_NAME $UUID
         openstack aggregate add host vdpa_ag1 $H_NAME
         traits=$(openstack --os-placement-api-version 1.6 resource provider trait list \
                  -f value $UUID | sed 's/^/--trait /')
         openstack --os-placement-api-version 1.6 resource provider trait set \
           $traits --trait CUSTOM_VDPA $UUID
      done
  $ openstack --os-compute-api-version 2.53 aggregate set \
      --property trait:CUSTOM_VDPA=required \
      vdpa_ag1
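
To double-check the result (``$UUID`` here stands for one of the hypervisor
UUIDs from the loop above), the aggregate and the traits of a resource provider
can be inspected::

  $ openstack aggregate show vdpa_ag1
  $ openstack --os-placement-api-version 1.6 resource provider trait list $UUID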
The flavor will map to that new aggregate with the ``traits:CUSTOM_VDPA`` property::

  $ openstack --os-compute-api-version 2.86 flavor create \
      --ram 4096 \
      --disk 10 \
      --vcpus 2 \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=True \
      --property hw:cpu_realtime_mask=^0 \
      --property traits:CUSTOM_VDPA=required \
      vdpa_pinned

.. note::

   It's also important to have the ``hw:cpu_realtime*`` properties here since
   ``libvirt`` doesn't currently support the locking of guest memory.

This should launch an instance on one of the vDPA hosts::

  $ openstack server create \
      --image cirros \
      --flavor vdpa_pinned \
      --nic port-id=vdpa_direct_port1 \
      vdpa_test_1
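
To confirm that the scheduler placed the instance on one of the vDPA hosts
(admin credentials are required; the host name shown below is only an example)::

  $ openstack server show vdpa_test_1 -c OS-EXT-SRV-ATTR:hypervisor_hostname -f value
  computevdpa-0.localdomain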
Validations
-----------
Confirm that a PCI device is in switchdev mode::
[root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.0
pci/0000:06:00.0: mode switchdev inline-mode none encap enable
[root@computevdpa-0 ~]# devlink dev eswitch show pci/0000:06:00.1
pci/0000:06:00.1: mode switchdev inline-mode none encap enable
Verify that hardware offload is enabled in OVS::
[root@computevdpa-0 ~]# ovs-vsctl get Open_vSwitch . other_config:hw-offload
"true"
Validate that the interfaces are added to the tenant bridge::

[root@computevdpa-0 ~]# ovs-vsctl show
be82eb5b-94c3-449d-98c8-0961b6b6b4c4
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    [...]
    Bridge br-tenant
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port br-tenant
            Interface br-tenant
                type: internal
        Port enp6s0f0
            Interface enp6s0f0
        Port phy-br-tenant
            Interface phy-br-tenant
                type: patch
                options: {peer=int-br-tenant}
        Port enp6s0f1
            Interface enp6s0f1
    [...]
Verify that the NICs have ``hw-tc-offload`` enabled::
[root@computevdpa-0 ~]# for i in {0..1};do ethtool -k enp6s0f$i | grep tc-offload;done
hw-tc-offload: on
hw-tc-offload: on
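
If ``hw-tc-offload`` shows ``off`` on a PF, it can be toggled manually with
``ethtool`` while troubleshooting; os-net-config normally takes care of this
when ``switchdev`` is configured::

  [root@computevdpa-0 ~]# ethtool -K enp6s0f0 hw-tc-offload on
  [root@computevdpa-0 ~]# ethtool -K enp6s0f1 hw-tc-offload on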
Verify that the udev rules have been created::
[root@computevdpa-0 ~]# cat /etc/udev/rules.d/80-persistent-os-net-config.rules
# This file is autogenerated by os-net-config
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}!="", ATTR{phys_port_name}=="pf*vf*", ENV{NM_UNMANAGED}="1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.0", NAME="enp6s0f0"
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf0vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f0_$env{NUMBER}"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:06:00.1", NAME="enp6s0f1"
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="80ecee0003723f04", ATTR{phys_port_name}=="pf1vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="enp6s0f1_$env{NUMBER}"
Validate that the ``numvfs`` are correctly defined::
[root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f0/device/sriov_numvfs
8
[root@computevdpa-0 ~]# cat /sys/class/net/enp6s0f1/device/sriov_numvfs
8
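
The VFs themselves should also be visible on each PF; counting the ``vf`` lines
reported by ``ip link`` is an additional check, not part of the original steps::

  [root@computevdpa-0 ~]# ip link show enp6s0f0 | grep -c "vf "
  8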
Validate that the ``pci/passthrough_whitelist`` contains all the PFs::
[root@computevdpa-0 ~]# grep ^passthrough_whitelist /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
passthrough_whitelist={"address":"06:00.0","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}
passthrough_whitelist={"address":"06:00.1","physical_network":"tenant","product_id":"101d","vendor_id":"15b3"}
Verify the ``nodedev-list`` from ``libvirt``::
[root@computevdpa-0 ~]# podman exec -u0 nova_libvirt virsh nodedev-list | grep -P "pci_0000_06|enp6|vdpa"
net_enp6s0f0_04_3f_72_ee_ec_80
net_enp6s0f0_0_5a_86_bd_4b_06_d9
net_enp6s0f0_1_72_b9_6b_12_33_57
net_enp6s0f0_2_f6_f2_db_7c_52_90
net_enp6s0f0_3_66_e5_9e_b8_79_7f
net_enp6s0f0_4_32_04_6f_ef_ef_c3
net_enp6s0f0_5_a2_fe_8d_4a_95_64
net_enp6s0f0_6_8e_23_fa_bb_95_41
net_enp6s0f0_7_8a_9f_0f_53_f6_19
net_enp6s0f0v0_ee_a1_e2_4e_80_8d
net_enp6s0f0v1_ce_b7_e1_33_33_56
net_enp6s0f0v2_fe_91_a8_ee_2e_79
net_enp6s0f0v3_2a_34_e0_a0_e6_ff
net_enp6s0f0v4_26_59_82_da_65_4e
net_enp6s0f0v5_a6_fd_db_97_c6_8a
net_enp6s0f0v6_36_5d_5c_ff_e8_00
net_enp6s0f0v7_4e_23_6c_95_b6_a4
net_enp6s0f1_04_3f_72_ee_ec_81
net_enp6s0f1_0_0e_0c_86_b5_43_c1
net_enp6s0f1_1_be_f5_75_f4_da_b1
net_enp6s0f1_2_ea_6a_21_37_91_24
net_enp6s0f1_3_06_95_51_55_de_80
net_enp6s0f1_4_86_a4_d5_83_bd_56
net_enp6s0f1_5_86_d1_a9_ba_b7_f0
net_enp6s0f1_6_82_ae_32_56_07_84
net_enp6s0f1_7_62_b7_93_7e_5c_30
net_enp6s0f1v0_b2_b3_0d_bd_6f_5d
net_enp6s0f1v1_4a_24_a1_24_ae_39
net_enp6s0f1v2_8e_19_b2_aa_ae_d7
net_enp6s0f1v3_b6_e2_4b_fa_d8_f0
net_enp6s0f1v4_5e_31_7f_17_ee_4d
net_enp6s0f1v5_5e_77_99_09_1a_89
net_enp6s0f1v6_96_68_4b_70_c5_1b
net_enp6s0f1v7_c2_bb_14_95_81_29
pci_0000_06_00_0
pci_0000_06_00_1
pci_0000_06_00_2
pci_0000_06_00_3
pci_0000_06_00_4
pci_0000_06_00_5
pci_0000_06_00_6
pci_0000_06_00_7
pci_0000_06_01_0
pci_0000_06_01_1
pci_0000_06_01_2
pci_0000_06_01_3
pci_0000_06_01_4
pci_0000_06_01_5
pci_0000_06_01_6
pci_0000_06_01_7
pci_0000_06_02_0
pci_0000_06_02_1
vdpa_vdpa0
vdpa_vdpa1
vdpa_vdpa10
vdpa_vdpa11
vdpa_vdpa12
vdpa_vdpa13
vdpa_vdpa14
vdpa_vdpa15
vdpa_vdpa2
vdpa_vdpa3
vdpa_vdpa4
vdpa_vdpa5
vdpa_vdpa6
vdpa_vdpa7
vdpa_vdpa8
vdpa_vdpa9
Validate that the vDPA devices have been created; they should match the vdpa
devices listed by ``virsh nodedev-list``::
[root@computevdpa-0 ~]# ls -tlra /dev/vhost-vdpa-*
crw-------. 1 root root 241, 0 Jun 30 12:52 /dev/vhost-vdpa-0
crw-------. 1 root root 241, 1 Jun 30 12:52 /dev/vhost-vdpa-1
crw-------. 1 root root 241, 2 Jun 30 12:52 /dev/vhost-vdpa-2
crw-------. 1 root root 241, 3 Jun 30 12:52 /dev/vhost-vdpa-3
crw-------. 1 root root 241, 4 Jun 30 12:52 /dev/vhost-vdpa-4
crw-------. 1 root root 241, 5 Jun 30 12:53 /dev/vhost-vdpa-5
crw-------. 1 root root 241, 6 Jun 30 12:53 /dev/vhost-vdpa-6
crw-------. 1 root root 241, 7 Jun 30 12:53 /dev/vhost-vdpa-7
crw-------. 1 root root 241, 8 Jun 30 12:53 /dev/vhost-vdpa-8
crw-------. 1 root root 241, 9 Jun 30 12:53 /dev/vhost-vdpa-9
crw-------. 1 root root 241, 10 Jun 30 12:53 /dev/vhost-vdpa-10
crw-------. 1 root root 241, 11 Jun 30 12:53 /dev/vhost-vdpa-11
crw-------. 1 root root 241, 12 Jun 30 12:53 /dev/vhost-vdpa-12
crw-------. 1 root root 241, 13 Jun 30 12:53 /dev/vhost-vdpa-13
crw-------. 1 root root 241, 14 Jun 30 12:53 /dev/vhost-vdpa-14
crw-------. 1 root root 241, 15 Jun 30 12:53 /dev/vhost-vdpa-15
Validate the ``pci_devices`` table in the database from one of the controllers::
[root@controller-0 ~]# podman exec -u0 $(podman ps -q -f name=galera) mysql -t -D nova -e "select address,product_id,vendor_id,dev_type,dev_id from pci_devices where address like '0000:06:%';"
+--------------+------------+-----------+----------+------------------+
| address | product_id | vendor_id | dev_type | dev_id |
+--------------+------------+-----------+----------+------------------+
| 0000:06:00.0 | 101d | 15b3 | vdpa | pci_0000_06_00_0 |
| 0000:06:00.1 | 101d | 15b3 | vdpa | pci_0000_06_00_1 |
+--------------+------------+-----------+----------+------------------+
Other useful commands for troubleshooting::
[root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m type=offloaded
[root@computevdpa-0 ~]# ovs-appctl dpctl/dump-flows -m
[root@computevdpa-0 ~]# tc filter show dev enp6s0f1_1 ingress
[root@computevdpa-0 ~]# tc -s filter show dev enp6s0f1_1 ingress
[root@computevdpa-0 ~]# tc monitor
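
The iproute2 ``vdpa`` and ``devlink`` utilities can also help while
troubleshooting; these are additional suggestions and may require a recent
iproute2 release::

  [root@computevdpa-0 ~]# vdpa mgmtdev show
  [root@computevdpa-0 ~]# vdpa dev show
  [root@computevdpa-0 ~]# devlink port show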