nova/nova/scheduler/client
Balazs Gibizer c02e213d50 Ensure that bandwidth and VF are from the same PF
A neutron port can be created with the direct vnic type and it can also
have a bandwidth resource request at the same time. In this case
placement offers allocation candidates that could fulfill the bandwidth
request, then the nova scheduler's PciPassthroughFilter checks if a PCI
device, a VF, is available for such a request. This check is based on
the physnet of the neutron port and the physical_network tag in the
pci/passthrough_whitelist config. It does not consider the actual PF
providing the bandwidth.

The currently unsupported case is when a single compute node has
whitelisted VFs from more than one PF connected to the same physnet.
These PFs can have totally different bandwidth inventories in
placement. For example PF2 can have plenty of bandwidth available
while PF3 has no bandwidth configured at all.
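
For example, such a compute node could be configured roughly like the
snippet below (interface names and the physnet are made up for
illustration only):

    [pci]
    # PF2 (plenty of bandwidth configured in placement)
    passthrough_whitelist = {"devname": "ens2f0", "physical_network": "physnet2"}
    # PF3 (no bandwidth configured)
    passthrough_whitelist = {"devname": "ens2f1", "physical_network": "physnet2"}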

In this case the PciPassthroughFilter might accept the host simply
because PF3 still has available VFs, even though the bandwidth request
of the port is fulfilled from PF2, which in turn might not have any
available VFs any more.

Moreover, the PCI claim uses the same logic as the filter, so it will
claim the VF from PF3 while the bandwidth was allocated from PF2 in
placement.

This patch does not try to solve the issue in the PciPassthroughFilter,
but it does solve the issue in the PCI claim. This means that after
successful scheduling the PCI claim can still fail if bandwidth is
allocated from one PF but a VF is no longer available from that
specific PF. This will lead to a re-schedule.

Making the PciPassthroughFilter smart enough is complicated because:
* The filters do not know about placement allocation candidates at
  all
* The filters work per compute host, not per allocation
  candidate. If there are two allocation candidates for the same host
  then nova will only try to filter for the first one. [1][2]

This patch applies the following logic:

The compute manager checks the InstancePCIRequest OVOs in a given
boot request and maps each of them to the neutron port that requested
the PCI device. Then it maps the neutron port to the physical device
RP in the placement allocation made for this server. Then the spec in
the InstancePCIRequest is extended with the interface name of the PF
from which the bandwidth was allocated, based on the name of the
device RP. Then the PCI claim will enforce that the PF interface
name in the request matches the interface name of the PF from which
the VF is selected. The PCI claim code knows the PF interface name of
each available VF from the virt driver reporting the 'parent_ifname'
key as part of the return value of the get_available_resource()
driver call.
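
For illustration only, a minimal sketch of the spec update step; the
function name, the data shapes and the RP name format are assumptions
of this sketch, not necessarily the exact code of this patch:

    def update_pci_request_spec_with_pf_ifname(pci_request, device_rp_name):
        """Extend each spec of the PCI request with the PF interface name."""
        # Assumption of this sketch: the device RP name ends with the PF
        # interface name as its last colon separated segment, e.g.
        # "compute1:NIC Switch agent:ens2f0".
        pf_ifname = device_rp_name.rsplit(':', 1)[-1]
        for spec in pci_request['spec']:
            # With this extra field the PCI claim will only accept a VF
            # whose parent PF reports the same 'parent_ifname'.
            spec['parent_ifname'] = pf_ifname
        return pci_request

    # Example with made up values:
    request = {'spec': [{'physical_network': 'physnet2'}]}
    update_pci_request_spec_with_pf_ifname(
        request, 'compute1:NIC Switch agent:ens2f0')
    # request['spec'][0] now also contains 'parent_ifname': 'ens2f0'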

The PCI claim process itself is not changed as it already enforces
that every field of the request matches the fields of the selected
device pool.
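
Conceptually this matching is an "every requested field must match"
check; the following is only a simplified sketch of that idea, not the
actual pci stats code:

    def pool_matches_request_spec(pool, spec):
        # A device pool satisfies a request spec when every key of the
        # spec is present in the pool with the same value. Once
        # 'parent_ifname' is part of the spec, only pools built from VFs
        # of that specific PF can match.
        return all(pool.get(key) == value for key, value in spec.items())

    # Example with made up values:
    pool = {'physical_network': 'physnet2', 'parent_ifname': 'ens2f0'}
    assert pool_matches_request_spec(
        pool, {'physical_network': 'physnet2', 'parent_ifname': 'ens2f0'})
    assert not pool_matches_request_spec(
        pool, {'physical_network': 'physnet2', 'parent_ifname': 'ens2f1'})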

The current patch extends the libvirt driver to provide the PF
interface name information. Besides the libvirt driver, the xenapi
driver also supports SRIOV VF handling, but this patch does not extend
the xenapi driver. So for xenapi the above described configuration
remains unsupported.
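
As an illustration, a VF entry reported by the driver could then look
roughly like this (the values are made up and the field set is
simplified):

    {
        "address": "0000:81:10.1",
        "dev_type": "type-VF",
        "parent_addr": "0000:81:00.0",
        "parent_ifname": "ens2f0",
        "vendor_id": "8086",
        "product_id": "154c",
    }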

I know that this feels complicated but it is necessary because VFs are
not counted as resources in placement yet.

[1] f6996903d2/nova/scheduler/filter_scheduler.py (L239)
[2] f6996903d2/nova/scheduler/filter_scheduler.py (L426)

blueprint: bandwidth-resource-provider

Change-Id: I038867c4094d79ae4a20615ab9c9f9e38fcc2e0a
2019-03-05 17:48:29 +01:00
__init__.py Rip out the SchedulerClient 2019-01-16 18:35:26 +00:00
query.py Modify select_destinations() to return objects and alts 2017-12-07 15:01:13 +00:00
report.py Ensure that bandwidth and VF are from the same PF 2019-03-05 17:48:29 +01:00