SR-IOV
The purpose of this page is to describe how to enable SR-IOV functionality available in OpenStack (using OpenStack Networking). This functionality was first introduced in the OpenStack Juno release. This page serves as a guide for configuring OpenStack Networking and OpenStack Compute to create SR-IOV ports.
The basics
PCI-SIG Single Root I/O Virtualization and Sharing (SR-IOV) functionality is available in OpenStack since the Juno release. The SR-IOV specification defines a standardized mechanism to virtualize PCIe devices. This mechanism can virtualize a single PCIe Ethernet controller to appear as multiple PCIe devices. Each device can be directly assigned to an instance, bypassing the hypervisor and virtual switch layer. As a result, users are able to achieve low latency and near line-rate throughput.
The following terms are used throughout this document:
| Term | Definition |
|---|---|
| PF | Physical Function. The physical Ethernet controller that supports SR-IOV. |
| VF | Virtual Function. The virtual PCIe device created from a physical Ethernet controller. |
SR-IOV agent
The SR-IOV agent allows you to set the admin state of ports, configure port security (enable and disable spoof checking), and configure QoS rate limiting and minimum bandwidth. You must include the SR-IOV agent on each compute node using SR-IOV ports.
Note
The SR-IOV agent was optional before Mitaka, and was not enabled by default before Liberty.
Note
The ability to control port security and QoS rate limit settings was added in Liberty.
Supported Ethernet controllers
The following manufacturers are known to work:
- Intel
- Mellanox
- QLogic
For information on Mellanox SR-IOV Ethernet ConnectX-3/ConnectX-3 Pro cards, see Mellanox: How To Configure SR-IOV VFs.
For information on QLogic SR-IOV Ethernet cards, see User's Guide OpenStack Deployment with SR-IOV Configuration.
Using SR-IOV interfaces
In order to enable SR-IOV, the following steps are required:
- Create Virtual Functions (Compute)
- Whitelist PCI devices in nova-compute (Compute)
- Configure neutron-server (Controller)
- Configure nova-scheduler (Controller)
- Enable neutron sriov-agent (Compute)
We recommend using VLAN provider networks for segregation. This way you can combine instances without SR-IOV ports and instances with SR-IOV ports on a single network.
Note
Throughout this guide, eth3 is used as the PF and physnet2 is used as the provider network configured as a VLAN range. These ports may vary in different environments.
Create Virtual Functions (Compute)
Create the VFs for the network interface that will be used for SR-IOV. We use eth3 as PF, which is also used as the interface for the VLAN provider network and has access to the private networks of all machines.
Note
The steps detail how to create VFs using Mellanox ConnectX-4 and newer/Intel SR-IOV Ethernet cards on an Intel system. Steps may differ for different hardware configurations.
Ensure SR-IOV and VT-d are enabled in BIOS.
Enable IOMMU in Linux by adding intel_iommu=on to the kernel parameters, for example, using GRUB.
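For example, on an Ubuntu system, append the parameter to the kernel command line in /etc/default/grub (the existing options shown are placeholders), then regenerate the GRUB configuration and reboot:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"

# update-grub
# reboot

On Red Hat-based systems, the configuration is typically regenerated with grub2-mkconfig -o /boot/grub2/grub.cfg instead.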
On each compute node, create the VFs via the PCI SYS interface:

# echo '8' > /sys/class/net/eth3/device/sriov_numvfs

Note
On some PCI devices, you may receive the error Device or resource busy when changing the number of VFs. In this case, you must first set sriov_numvfs to 0, then set it to your new value.

Note
A network interface could be used both for PCI passthrough, using the PF, and SR-IOV, using the VFs. If the PF is used, the VF number stored in the sriov_numvfs file is lost. If the PF is attached again to the operating system, the number of VFs assigned to this interface will be zero. To keep the number of VFs always assigned to this interface, modify the interfaces configuration file adding an ifup script command.

In Ubuntu, modify the /etc/network/interfaces file:

auto eth3
iface eth3 inet dhcp
pre-up echo '4' > /sys/class/net/eth3/device/sriov_numvfs

In Red Hat, modify the /sbin/ifup-local file:

#!/bin/sh
if [[ "$1" == "eth3" ]]
then
    echo '4' > /sys/class/net/eth3/device/sriov_numvfs
fi

Warning
Alternatively, you can create VFs by passing the max_vfs parameter to the kernel module of your network interface. However, the max_vfs parameter has been deprecated, so the PCI SYS interface is the preferred method.

You can determine the maximum number of VFs a PF can support:
# cat /sys/class/net/eth3/device/sriov_totalvfs
63

Verify that the VFs have been created and are in up state:

# lspci | grep Ethernet
82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:10.4 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:10.6 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:11.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:11.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:11.4 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:11.6 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

# ip link show eth3
8: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether a0:36:9f:8f:3f:b8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto

If the interfaces are down, set them to up before launching a guest, otherwise the instance will fail to spawn:

# ip link set eth3 up

Persist created VFs on reboot:

# echo "echo '7' > /sys/class/net/eth3/device/sriov_numvfs" >> /etc/rc.local

Note
The suggested way of making PCI SYS settings persistent is through the sysfsutils tool. However, this is not available by default on many major distributions.
Whitelist PCI devices in nova-compute (Compute)
Configure which PCI devices the nova-compute service may use. Edit the nova.conf file:

[DEFAULT]
pci_passthrough_whitelist = { "devname": "eth3", "physical_network": "physnet2" }

This tells the Compute service that all VFs belonging to eth3 are allowed to be passed through to instances and belong to the provider network physnet2.
Alternatively, the pci_passthrough_whitelist parameter also supports whitelisting by:

- PCI address: The address uses the same syntax as in lspci and an asterisk (*) can be used to match anything.

  pci_passthrough_whitelist = { "address": "[[[[<domain>]:]<bus>]:][<slot>][.[<function>]]", "physical_network": "physnet2" }

  For example, to match any domain, bus 0a, slot 00, and all functions:

  pci_passthrough_whitelist = { "address": "*:0a:00.*", "physical_network": "physnet2" }

- PCI vendor_id and product_id as displayed by the Linux utility lspci.

  pci_passthrough_whitelist = { "vendor_id": "<id>", "product_id": "<id>", "physical_network": "physnet2" }
If the device defined by the PCI address or devname corresponds to an SR-IOV PF, all VFs under the PF will match the entry. Multiple pci_passthrough_whitelist entries per host are supported.
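For example, a compute node with two PFs attached to different physical networks could repeat the option once per device; eth4 and physnet3 are hypothetical names used only for illustration:

pci_passthrough_whitelist = { "devname": "eth3", "physical_network": "physnet2" }
pci_passthrough_whitelist = { "devname": "eth4", "physical_network": "physnet3" }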
Restart the nova-compute service for the changes to go into effect.
Configure neutron-server (Controller)
Add sriovnicswitch as mechanism driver. Edit the ml2_conf.ini file on each controller:

mechanism_drivers = openvswitch,sriovnicswitch
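This option belongs to the [ml2] section of ml2_conf.ini. A fuller sketch, assuming the VLAN provider network setup recommended earlier (the type drivers and VLAN range below are example values), could look like:

[ml2]
type_drivers = flat,vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch,sriovnicswitch

[ml2_type_vlan]
network_vlan_ranges = physnet2:100:200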
Add the plugin.ini file as a parameter to the neutron-server service. Edit the appropriate initialization script to configure the neutron-server service to load the plugin configuration file:

--config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini

Restart the neutron-server service.
Configure nova-scheduler (Controller)
On every controller node running the nova-scheduler service, add PciPassthroughFilter to scheduler_default_filters to enable PciPassthroughFilter by default. Also ensure the scheduler_available_filters parameter under the [DEFAULT] section in nova.conf is set to all_filters to enable all filters provided by the Compute service.

[DEFAULT]
scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
scheduler_available_filters = nova.scheduler.filters.all_filters

Restart the nova-scheduler service.
Enable neutron sriov-agent (Compute)
Install the SR-IOV agent.
Edit the sriov_agent.ini file on each compute node. For example:

[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver

[sriov_nic]
physical_device_mappings = physnet2:eth3
exclude_devices =

Note
The physical_device_mappings parameter is not limited to be a 1-1 mapping between physical networks and NICs. This enables you to map the same physical network to more than one NIC. For example, if physnet2 is connected to eth3 and eth4, then physnet2:eth3,physnet2:eth4 is a valid option.
The exclude_devices parameter is empty, therefore, all the VFs associated with eth3 may be configured by the agent. To exclude specific VFs, add them to the exclude_devices parameter as follows:

exclude_devices = eth1:0000:07:00.2;0000:07:00.3,eth2:0000:05:00.1;0000:05:00.2

Ensure the neutron sriov-agent runs successfully:
# neutron-sriov-nic-agent \
  --config-file /etc/neutron/neutron.conf \
  --config-file /etc/neutron/plugins/ml2/sriov_agent.ini

Enable the neutron sriov-agent service.
If installing from source, you must configure a daemon file for the init system manually.
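For example, a minimal systemd unit for a source installation might look like the following; the executable path and service user are assumptions that depend on how Neutron was installed:

[Unit]
Description=OpenStack Neutron SR-IOV NIC agent
After=network.target

[Service]
User=neutron
ExecStart=/usr/local/bin/neutron-sriov-nic-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/sriov_agent.ini
Restart=on-failure

[Install]
WantedBy=multi-user.target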
(Optional) FDB L2 agent extension
Forwarding DataBase (FDB) population is an L2 agent extension to the OVS or Linux bridge agent. Its objective is to update the FDB table for existing instances that use normal ports. This enables communication between SR-IOV instances and normal instances. The use cases of the FDB population extension are:
- Direct port and normal port instances reside on the same compute node.
- Direct port instance that uses floating IP address and network node are located on the same host.
For additional information describing the problem, refer to: Virtual switching technologies and Linux bridge.
Edit the ovs_agent.ini or linuxbridge_agent.ini file on each compute node. For example:

[agent]
extensions = fdb

Add the FDB section and the shared_physical_device_mappings parameter. This parameter maps each physical port to its physical network name. Each physical network can be mapped to several ports:

[FDB]
shared_physical_device_mappings = physnet1:p1p1, physnet1:p1p2
Launching instances with SR-IOV ports
Once configuration is complete, you can launch instances with SR-IOV ports.
Get the id of the network where you want the SR-IOV port to be created:

$ net_id=`neutron net-show net04 | grep "\ id\ " | awk '{ print $4 }'`

Create the SR-IOV port. vnic_type=direct is used here, but other options include normal, direct-physical, and macvtap:

$ port_id=`neutron port-create $net_id --name sriov_port --binding:vnic_type direct | grep "\ id\ " | awk '{ print $4 }'`

Create the instance. Specify the SR-IOV port created in step two for the NIC:
$ openstack server create --flavor m1.large --image ubuntu_14.04 --nic port-id=$port_id test-sriov
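The same workflow can also be driven entirely with the unified openstack client; the following is an equivalent sketch using the example names from the steps above:

$ net_id=$(openstack network show net04 -f value -c id)
$ port_id=$(openstack port create --network $net_id --vnic-type direct sriov_port -f value -c id)
$ openstack server create --flavor m1.large --image ubuntu_14.04 --nic port-id=$port_id test-sriov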
Note

There are two ways to attach VFs to an instance. You can create an SR-IOV port or use the pci_alias in the Compute service. For more information about using pci_alias, refer to nova-api configuration.
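As a rough sketch of the pci_alias approach, nova.conf defines an alias; the vendor and product IDs below are placeholders that must match your VFs, and the alias name is arbitrary:

[DEFAULT]
pci_alias = { "vendor_id": "8086", "product_id": "10ed", "device_type": "type-VF", "name": "vf-example" }

A flavor then requests one VF through an extra spec:

$ openstack flavor set m1.large --property "pci_passthrough:alias"="vf-example:1"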
SR-IOV with InfiniBand
The support for SR-IOV with InfiniBand allows a Virtual PCI device (VF) to be directly mapped to the guest, allowing higher performance and advanced features such as RDMA (remote direct memory access). To use this feature, you must:
Use InfiniBand enabled network adapters.
Run InfiniBand subnet managers to enable InfiniBand fabric.
All InfiniBand networks must have a subnet manager running for the network to function. This is true even when doing a simple network of two machines with no switch and the cards are plugged in back-to-back. A subnet manager is required for the link on the cards to come up. It is possible to have more than one subnet manager. In this case, one of them will act as the master, and any other will act as a slave that will take over when the master subnet manager fails.
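For example, on distributions that package the OpenSM subnet manager, it can typically be installed and enabled as a service; package and unit names vary by distribution:

# yum install opensm
# systemctl enable --now opensm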
Install the ebrctl utility on the compute nodes.

Check that ebrctl is listed somewhere in /etc/nova/rootwrap.d/*:

$ grep 'ebrctl' /etc/nova/rootwrap.d/*

If ebrctl does not appear in any of the rootwrap files, add this to the /etc/nova/rootwrap.d/compute.filters file in the [Filters] section.

[Filters]
ebrctl: CommandFilter, ebrctl, root
Known limitations
- When using Quality of Service (QoS), max_burst_kbps (burst over max_kbps) is not supported. In addition, max_kbps is rounded to Mbps.
- Security groups are not supported when using SR-IOV, thus, the firewall driver must be disabled. This can be done in the neutron.conf file.

  [securitygroup]
  firewall_driver = neutron.agent.firewall.NoopFirewallDriver

- SR-IOV is not integrated into the OpenStack Dashboard (horizon). Users must use the CLI or API to configure SR-IOV interfaces.
- Live migration is not supported for instances with SR-IOV ports.
Note
SR-IOV features may require a specific NIC driver version, depending on the vendor. Intel NICs, for example, require ixgbe version 4.4.6 or greater, and ixgbevf version 3.2.2 or greater.
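The driver and firmware versions in use on the PF can be checked with ethtool, for example:

# ethtool -i eth3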
- Attaching SR-IOV ports to existing servers is not currently supported; see bug 1708433 for details.