..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
PCI SR-IOV passthrough to nova instance
==========================================

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov

Enable a nova instance to be booted with SR-IOV neutron ports.

Problem description
===================

Right now it is possible to boot a VM with general-purpose PCI device
passthrough by means of libvirt's managed hostdev device definition in the
domain XML. A guide to using it can be found in [GPP_WIKI]_. However, it is not
possible to request access to a virtual network via SR-IOV NICs. Nova
enhancements are required to support SR-IOV networking with Neutron.

Traditionally, a neutron port is a virtual port that is either attached to a
linux bridge or an openvswitch bridge on a compute node. With the introduction
of SR-IOV based NICs (called vNICs), the virtual bridge is no longer required.
Each SR-IOV port is associated with a virtual function (VF) that logically
resides on a vNIC. There exist two variants of SR-IOV networking. SR-IOV
ports may be provided by Hardware-based Virtual Ethernet Bridging (HW VEB), or
they may be extended to an upstream physical switch (IEEE 802.1br). In the
latter case, the port's configuration is enforced in the switch. There are also
two variants of connecting an SR-IOV port to its corresponding VF. An SR-IOV
port may be directly connected to its VF, or it may be connected via a macvtap
device that resides on the host, which is then connected to the corresponding
VF. Using a macvtap device makes live migration with SR-IOV possible.

In the Icehouse release, a couple of blueprints on the neutron side were
approved and their associated patches committed, enabling the interactions
between nova and neutron for SR-IOV networking. Refer to [VIF_DETA]_ and
[BIND_PRF]_ for details about them.

Another blueprint [VNIC_TYP]_ added support in the neutron port API to allow
users to specify a vnic-type when creating a neutron port. The currently
supported vnic-types are:

* normal: a traditional virtual port that is either attached to a linux bridge
  or an openvswitch bridge on a compute node.
* direct: an SR-IOV port that is directly attached to a VM.
* macvtap: an SR-IOV port that is attached to a VM via a macvtap device.

This specification builds on top of the above-mentioned neutron changes and
addresses the following functionalities in Nova so that SR-IOV networking in
openstack is fully functional end-to-end:

1. Generating libvirt domain XML and network XML that enables SR-IOV for
   networking.
2. Scheduling based on an SR-IOV port's network connectivity.

The initial use case that is targeted in this specification, and therefore for
Juno, is to boot a VM with one or more vNICs that may use different vnic-types.
In particular, a user would do the following to boot a VM with SR-IOV vNICs:

* create one or more neutron ports. For example:

  ::

    neutron port-create <net-id> --binding:vnic-type direct

* boot a VM with one or more neutron ports. For example:

  ::

    nova boot --flavor m1.large --image <image>
      --nic port-id=<port1> --nic port-id=<port2>

Note that in the nova boot API, users can specify either a port-id or a net-id.
In the latter case, it is assumed that the user is requesting a normal virtual
port (that is, not an SR-IOV port).

This specification will make use of the existing PCI passthrough
implementation, and make a few enhancements to enable the above use cases.
Therefore, the existing PCI passthrough support as documented in [GPP_WIKI]_
works as is for general-purpose PCI passthrough.

Proposed change
===============

To schedule an instance with SR-IOV ports based on their network connectivity,
the neutron ports' associated physical networks have to be used in making the
scheduling decision. A VF has to be selected for each of the neutron ports.
Therefore, the VF's associated physical network has to be known to the system,
and the selected VF's associated physical network has to match that of the
neutron port. To make the above happen, this specification proposes associating
an extra tag called *physical_network* with each networking VF. In addition,
nova currently has no knowledge of a neutron port's associated physical
network. Therefore, nova needs to make extra calls to neutron in order to
retrieve this information. The following sections describe the detailed changes
in nova to achieve that.

Note that this specification only supports the libvirt driver.

PCI Whitelist
-------------

This specification introduces a few enhancements to the existing PCI whitelist:

* allowing aggregated declaration of PCI devices by using '*' and '.'
* allowing tags to be associated with PCI devices.

Note that this remains compatible with the previous PCI whitelist definition,
and therefore the existing functionality associated with the PCI whitelist
works as is.

With '[' indicating zero or one occurrence, '{' zero or more occurrences, and
'|' a mutually exclusive choice, a whitelist entry is defined as:

::

  ["vendor_id": "<id>",] ["product_id": "<id>",]
  ["address": "[[[[<domain>]:]<bus>]:][<slot>][.[<function>]]" |
   "devname": "PCI Device Name",]
  {"tag":"<tag_value>",}

*<id>* can be a '*' or a valid *vendor/product id* as displayed by the linux
utility lspci. The *address* uses the same syntax as in lspci; refer to lspci's
manual for the description of its '-s' switch. The *devname* can be a valid PCI
device name. The only device names that are supported in this specification are
those that are displayed by the linux utility *ifconfig -a* and correspond to
either a PF or a VF on a vNIC. There may be zero or more tags associated with
an entry.

If the device defined by the *address* or *devname* corresponds to an SR-IOV
PF, all the VFs under the PF will match the entry.

For SR-IOV networking, a pre-defined tag "physical_network" is used to define
the physical network that the devices are attached to. A whitelist entry is
then defined as:

::

  ["vendor_id": "<id>",] ["product_id": "<id>",]
  ["address": "[[[[<domain>]:]<bus>]:][<slot>][.[<function>]]" |
   "devname": "Ethernet Interface Name",]
  "physical_network": "name string of the physical network"

Multiple whitelist entries per host are supported as they already are. The
fields *vendor_id*, *product_id*, and *address* or *devname* will be matched
against PCI devices that are returned as a result of querying libvirt.

Whitelist entries are defined in nova.conf in the format:

::

  pci_passthrough_whitelist = {<entry>}

where {<entry>} is a JSON dictionary as defined above.
*pci_passthrough_whitelist* is a multi-valued configuration option, and can
therefore appear multiple times in nova.conf.

Some examples are:

::

  pci_passthrough_whitelist = {"devname":"eth0",
                               "physical_network":"physnet"}

  pci_passthrough_whitelist = {"address":"*:0a:00.*",
                               "physical_network":"physnet1"}

  pci_passthrough_whitelist = {"address":":0a:00.",
                               "physical_network":"physnet1"}

  pci_passthrough_whitelist = {"vendor_id":"1137","product_id":"0071"}

  pci_passthrough_whitelist = {"vendor_id":"1137","product_id":"0071",
                               "address": "0000:0a:00.1",
                               "physical_network":"physnet1"}

PCI stats
---------

On the compute node, PCI devices are matched against the PCI whitelist entries
in the order in which they are defined in the nova.conf file. Once a match is
found, the device is placed in the corresponding PCI stats entry.

If a device matches a PCI whitelist entry, and the PCI whitelist entry is
tagged, the tags together with *product_id* and *vendor_id* will be used as
stats keys; otherwise, the existing predefined keys will be used.

A PCI whitelist entry for SR-IOV networking will be tagged with a physical
network name. Therefore, the physical network name is used as the stats key for
SR-IOV networking devices. Conceptually speaking, for SR-IOV networking a PCI
stats entry keeps track of the number of SR-IOV ports that are attached to a
physical network on a compute node. For scheduling purposes, it can be
considered as a tuple of

::

  <host_name> <physical_network_name> <count>

When a port is requested from a physical network, the compute nodes that host
the physical network can be found from the stats entries. The existing PCI
passthrough filter in the nova scheduler works without requiring any change in
support of SR-IOV networking.

There is no change in how the stats entries are updated and persisted into the
compute_nodes database table by the nova resource tracker. Currently, a column
called *pci_stats* in the compute_nodes database table is used to store the PCI
stats as a JSON document. The PCI stats JSON document is basically a list of
stats entries in the format *<key1> <key2> ... <keyn>* : *<count>*. This will
not be changed for SR-IOV networking. Specifically for SR-IOV networking,
however, PCI stats records are keyed off the tag *physical_network*, plus
*product_id* and *vendor_id*. A stats entry for SR-IOV networking will look
like:

::

  <physical_network_name>, <product_id>, <vendor_id> : <count>
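
As a purely illustrative sketch, one possible JSON rendering of such a stats
entry is shown below; the exact field names and layout of the stored document
are an implementation detail and not mandated by this specification:

::

  pci_stats = [{"physical_network": "physnet1",
                "vendor_id": "1137",
                "product_id": "0071",
                "count": 7}]
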
requested_networks (NICs)
-------------------------

Currently, each requested network is a tuple of

::

  <neutron-net-id> <v4-fixed-ip> <neutron-port-id>

Either neutron-net-id or neutron-port-id must have a valid value, and
v4-fixed-ip can be None. For each --nic option specified in the *nova boot*
command, a requested_network tuple is created. All the requested_network tuples
are passed to the compute node, and the compute service running on the node
uses the information to request neutron services. This specification proposes
one additional field in the tuple: *pci_request_id*.

Corresponding to each requested_network tuple, there is a neutron port with a
valid vnic-type. If the vnic-type is direct or macvtap, a valid
*pci_request_id* must be populated into the tuple (see below for details). The
*pci_request_id* is later used to locate the PCI device from the PCI manager
that is allocated for the requested_network tuple (and therefore the NIC).
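
For illustration, the extended tuple for an SR-IOV NIC created from a --nic
port-id option would then carry the new field alongside the existing ones;
for a normal vnic-type the *pci_request_id* field would presumably be left
unset:

::

  <neutron-net-id> <v4-fixed-ip> <neutron-port-id> <pci-request-id>
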
PCI Requests
------------

Currently, *pci_requests* as key and a JSON doc string as the associated value
are stored in the instance's system metadata. In addition, all the PCI devices
allocated for PCI passthrough are treated the same in terms of generating the
libvirt XML. However, for SR-IOV networking, special libvirt XML is required.
Further, we need a way to correlate the allocated device with the requested
network (NIC) later on during the instance boot process. In this specification,
we propose the use of *pci_request_id* for that purpose.

Each PCI request is associated with a *pci_request_id* that is generated while
creating/saving the PCI request to the instance's system metadata. The
*pci_request_id* is used on the compute node to retrieve the allocated PCI
device. Particularly for SR-IOV networking, a PCI request is expressed as

::

  "physical_network" : <name>
  "count" : 1
  "pci_request_id" : <request-uuid>

For each --nic specified in 'nova boot', nova-api creates a requested_network
tuple. For an SR-IOV NIC, it also creates a PCI request; as a result, a
*pci_request_id* is generated and saved in the PCI request spec. The same
*pci_request_id* is also saved in the requested_network tuple (refer to the
previous section).
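
As an illustrative sketch only (the concrete layout of the stored JSON document
is an assumption here, not defined by this specification), the *pci_requests*
entry in the system metadata of an instance booted with two SR-IOV NICs on
different physical networks might look like:

::

  pci_requests = [{"physical_network": "physnet1", "count": 1,
                   "pci_request_id": "<request-uuid-1>"},
                  {"physical_network": "physnet2", "count": 1,
                   "pci_request_id": "<request-uuid-2>"}]
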
nova neutronv2 and VIF
-------------------------------------

Note that nova-network will not be enhanced to support SR-IOV. However, the
nova modules that are responsible for interacting with neutron need to be
enhanced. [BIND_PRF]_, [VIF_DETA]_ and [VNIC_TYP]_ added the functionality
required to support SR-IOV ports in neutron. Accordingly, nova neutronv2 will
be enhanced to work with them in support of SR-IOV ports. In particular:

* When nova processes the --nic options, physical network names will be
  retrieved from neutron. This needs to be done by using the neutron provider
  extension with admin access. As a result, additional neutron calls will be
  made to retrieve the physical network name.

* When nova updates neutron ports, binding:profile needs to be populated with
  PCI information that includes pci_vendor_info, pci_slot and physical_network
  (see the illustrative sketch after this list).

* After nova successfully updates the neutron ports, it retrieves the ports'
  information from neutron, which is used to populate VIF objects. New
  properties will be added to the VIF class in support of binding:profile,
  binding:vif_details and binding:vnic_type.
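
As a minimal sketch of the binding:profile update described above, the profile
sent in the port update for a direct vNIC could look like the following; the
value formats are assumptions for illustration, only the key names are called
out by this specification:

::

  "binding:profile": {"pci_vendor_info": "1137:0071",
                      "pci_slot": "0000:0a:00.1",
                      "physical_network": "physnet1"}
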
nova VIF driver
---------------

Each neutron port is associated with a vif-type. The following VIF types are
related to SR-IOV support:

* VIF_TYPE_802_QBH: corresponds to IEEE 802.1BR (formerly IEEE 802.1Qbh).
* VIF_TYPE_HW_VEB: for vNIC adapters that support hardware virtual Ethernet
  bridging (HW VEB).
* VIF_TYPE_802_QBG: corresponds to IEEE 802.1Qbg. However, this existing vif
  type may not be useful now because the libvirt parameters for 802.1Qbg
  (managerid, typeidversion and instanceid) are not supported by the known
  neutron plugins that support SR-IOV.

The nova generic libvirt VIF driver will be enhanced to support the first two
VIF types. This includes populating the VIF config objects and generating the
interface XMLs.
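
For example, the interface XML generated for a direct (VIF_TYPE_HW_VEB) port
might resemble the following sketch; the MAC address, PCI address and VLAN tag
are made-up values for illustration:

::

  <interface type='hostdev' managed='yes'>
    <mac address='fa:16:3e:xx:xx:xx'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
    </source>
    <vlan>
      <tag id='1000'/>
    </vlan>
  </interface>
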
Alternatives
------------

N/A

Data model impact
-----------------

Currently, a nova object *PciDevice* is created for each PCI passthrough
device. The database table *pci_devices* is used to persist the *PciDevice*
nova objects. A new field *request_id* will be added to the *PciDevice* nova
object. Correspondingly, a new column *request_id* will be added to the
database table *pci_devices*. A database migration script will be provided
accordingly.

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

The physical network to which a port is connected needs to be retrieved from
neutron, which requires additional calls to neutron. In particular, nova will
call neutron *show_port* to check the port's *vnic_type*. If the *vnic_type* is
either *direct* or *macvtap*, it will call neutron *show_network* to retrieve
the associated physical network. As a consequence, the number of calls to
neutron will increase slightly when *port-id* is specified in the --nic option
of nova boot.
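
For reference, the extra lookups roughly correspond to the following neutron
CLI queries (shown only to illustrate which attributes are consulted; nova
itself uses the neutron client API):

::

  neutron port-show <port-id>   # exposes binding:vnic_type
  neutron net-show <net-id>     # exposes provider:physical_network (admin only)
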
Other deployer impact
---------------------

No known deployer impact other than configuring the PCI whitelist for SR-IOV
networking devices.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  baoli

Other contributors:
  TBD

Work Items
----------

* PCI whitelist
* PCI request
* PCI stats
* DB change and the required migration script, PCI device object change
* neutronv2
* VIF
* libvirt generic VIF driver and instance configuration
* nova compute api retrieving physical network, change of requested_networks

Dependencies
============

None

Testing
=======

Both unit and tempest tests need to be created to ensure proper functioning of
SR-IOV networking. For tempest testing, given that SR-IOV depends on specific
hardware, it may require vendor support and the use of proper neutron ML2
mechanism drivers. Cisco Neutron CI and Mellanox External Testing need to be
enhanced in support of SR-IOV tempest testing.

Documentation Impact
====================

* document new whitelist configuration changes
* a user guide/wiki on how to use SR-IOV networking in openstack

References
==========

.. [GPP_WIKI] `Generic PCI Passthrough WIKI <https://wiki.openstack.org/wiki/Pci_passthrough>`_
.. [VIF_DETA] `Extensible port attribute for plugin to provide details to VIF driver <https://blueprints.launchpad.net/neutron/+spec/vif-details>`_
.. [BIND_PRF] `Implement the binding:profile port attribute in ML2 <https://blueprints.launchpad.net/neutron/+spec/ml2-binding-profile>`_
.. [VNIC_TYP] `Add support for vnic type request to be managed by ML2 mechanism drivers <https://blueprints.launchpad.net/neutron/+spec/ml2-request-vnic-type>`_