QoS minimum bandwidth allocation in Placement API

This spec describes how to model, from Neutron, a new resource provider in
the Placement API to describe the bandwidth allocation. Based on a Rocky PTG
discussion this is a re-work of the spec.

Co-Authored-By: Rodolfo Alonso Hernandez <rodolfo.alonso.hernandez@intel.com>
Co-Authored-By: Bence Romsics <bence.romsics@ericsson.com>
Co-Authored-By: Balazs Gibizer <balazs.gibizer@ericsson.com>
Related-Bug: #1578989
Change-Id: Ib995837f6161bcceb09735a5601d8b79a25a7354
See-Also: Ie7be551f4f03957ade9beb64457736f400560486

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=================================================
QoS minimum bandwidth allocation in Placement API
=================================================

https://bugs.launchpad.net/neutron/+bug/1578989

This spec describes how to model, from Neutron, new resource providers
in Placement API to describe bandwidth allocation.

Problem Description
===================

Currently there are several parameters, quantitative and qualitative,
that define a Nova server and are used to select the correct host
and network backend devices to run it. Network bandwidth is not yet
among these parameters. This allows situations where a physical network
device could be oversubscribed.

This spec addresses managing the bandwidth on the first physical device,
i.e. the physical interface closest to the nova server. Managing bandwidth
further away, for example on the backplane of a Top-Of-Rack switch or
end-to-end, is out of scope here.

Guaranteeing bandwidth generally involves enforcement of constraints on
two levels.

* placement: Avoiding oversubscription when placing (scheduling) nova servers
  and their ports.

* data plane: Enforcing the guarantee on the physical network devices.

This spec addresses placement enforcement only. (Data plane enforcement
is covered by [4]_.) However, the design must take into account that users
are interested in using both kinds of enforcement together.

Since placement enforcement itself is a Nova-Neutron cross-project
feature, this spec is meant to be read, commented on and maintained together
with its Nova counterpart: `Network bandwidth resource provider` [2]_.

This spec is based on the approved Neutron spec `Add a spec for strict
minimum bandwidth support` [3]_. The aim of the current spec is not
to redefine what is already approved in [3]_, but to specify how it is
going to be implemented in Neutron.

Use Cases
---------

The most straightforward use case is when a user, who has paid for a
premium service that guarantees a minimum network bandwidth, wants to
spawn a Nova server. The scheduler needs to know how much bandwidth is
already in use on each physical network device of each compute host and
how much bandwidth the user is requesting.

Data-plane-only enforcement was merged in Newton for SR-IOV egress
(see `Newton Release Notes` [6]_).

Placement-only enforcement may be a viable feature for users able
to control all traffic (e.g. in a single-tenant private cloud). Such
placement-only enforcement can also be used together with the bandwidth
limit rule: the admin can set a minimum bandwidth rule and a bandwidth
limit rule in the same QoS policy, both with the same bandwidth values.
Then each server on the chosen compute host will be able to use at most
as much bandwidth as it has been guaranteed.

Proposed Change
===============

1. The user must be able to express the resource needs of a port.

   1. Extend ``qos_minimum_bandwidth_rule`` with ingress direction.

      Unlike enforcement in the data plane, Placement can handle both
      directions with the same effort.

   2. Mark ``qos_minimum_bandwidth_rule`` as a supported QoS
      policy rule for each existing QoS driver.

      Placement enforcement is orthogonal to backend mechanisms. A user
      can have placement enforcement for drivers that do not (yet) have
      data plane enforcement.

      Because we have exposed (and likely want to further expose)
      partial results of this development effort to end users, the meaning
      of a ``qos_minimum_bandwidth_rule`` depends on the OpenStack version,
      the Neutron backend driver and the rule's direction. A rule may be
      enforced by placement and/or on the data plane. Therefore, in addition
      to the already existing support matrix in the `QoS devref` [10]_, we
      must document which combinations of versions, drivers, rule directions
      and (placement and/or data plane) enforcements are supported.

      Since Neutron's choice of backend is hidden from the cloud user, the
      deployer must also clearly document which subset of the above support
      matrix is applicable for a cloud user in a particular deployment.

2. Neutron must convey the resource needs of a port to Nova.

   Extend the port with the attribute ``resource_request`` according to the
   section 'How required bandwidth for a Neutron port is modeled' below. This
   attribute is computed, read-only and admin-only.

   Information available at port create time (i.e. before the port
   is bound) must be sufficient to generate the ``resource_request``
   attribute.

   The port extension must be decoupled from ML2 and kept
   in the QoS service plugin. One way to do that is to use
   ``neutron.db._resource_extend`` the same way ``trunk_details`` uses it.

3. Neutron must populate the Placement DB with the available resources.

   Report information on available resources to the Placement service
   using the `Placement API` [1]_, that is, information about the physical
   network devices, their physnets, available bandwidth and supported
   VNIC types.

   The cloud admin must be able to control (by configuration) what is
   reported to Placement. To ease the configuration work, autodiscovery
   of networking devices may be employed, but the admin must be able to
   override its results.

Which devices and parameters will be tracked
--------------------------------------------

Even inside a compute host many networking topologies are possible.
For example:

1. OVS agent: physical network - OVS bridge - single physical NIC (or a bond):
   1-to-1 mapping between physical network and physical interface

2. SR-IOV agent: physical network - one or more PFs:
   1-to-n mapping between physical network and physical interface(s)
   (See `Networking Guide: SR-IOV` [7]_.)

Each Neutron agent (Open vSwitch, Linux Bridge, SR-IOV) has a
configuration parameter to map a physical network to one or more
provider interfaces (SR-IOV) or to a bridge connected to a provider
interface (Open vSwitch or Linux Bridge).

OVS agent configuration::

    [ovs]
    # bridge_mappings as it exists already.
    bridge_mappings = physnet0:br0,physnet1:br1

    # Each right hand side value in bridge_mappings:
    # * will have a corresponding resource provider created in Placement
    # * must be listed as a key in resource_provider_bandwidths

    resource_provider_bandwidths = br0:EGRESS:INGRESS,br1:EGRESS:INGRESS

    # Examples:

    # Resource provider created, no inventory reported.
    resource_provider_bandwidths = br0
    resource_provider_bandwidths = br0::

    # Report only egress inventory in kbps (same unit as in the QoS rule API).
    resource_provider_bandwidths = br0:1000000:

    # Report egress and ingress inventories in kbps.
    resource_provider_bandwidths = br0:1000000:1000000

    # Later we may introduce auto-discovery (for example via ethtool).
    # We reserve the option to make auto-discovery the default behavior
    # when it is implemented.
    resource_provider_bandwidths = br0:auto:auto

SR-IOV agent configuration::

    [sriov_nic]
    physical_device_mappings = physnet0:eth0,physnet0:eth1,physnet1:eth2

    resource_provider_bandwidths = eth0:EGRESS:INGRESS,eth1:EGRESS:INGRESS,eth2:EGRESS:INGRESS
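
As an illustration of the proposed ``resource_provider_bandwidths`` format,
here is a minimal parsing sketch. The names ``parse_rp_bandwidths`` and
``check_mapped_devices`` are hypothetical and not part of Neutron. The sketch
assumes that an empty field means "no inventory for that direction" and that
``auto`` is only carried through as a placeholder for future auto-discovery.

.. code-block:: python

    from typing import Dict, Optional, Union

    # Parsed value: kbps as int, the reserved keyword "auto", or None when the
    # field is empty (no inventory reported for that direction).
    Bandwidth = Optional[Union[int, str]]


    def _parse_kbps(raw: str) -> Bandwidth:
        raw = raw.strip()
        if not raw:
            return None
        if raw == "auto":
            return "auto"
        kbps = int(raw)  # raises ValueError on non-numeric input
        if kbps < 0:
            raise ValueError("negative bandwidth: %r" % raw)
        return kbps


    def parse_rp_bandwidths(option: str) -> Dict[str, Dict[str, Bandwidth]]:
        """Parse 'dev[:EGRESS[:INGRESS]],...' into {dev: {egress, ingress}}."""
        result: Dict[str, Dict[str, Bandwidth]] = {}
        for item in filter(None, (i.strip() for i in option.split(","))):
            fields = item.split(":")
            if len(fields) > 3 or not fields[0]:
                raise ValueError("malformed entry: %r" % item)
            fields += [""] * (3 - len(fields))  # 'br0' behaves like 'br0::'
            if fields[0] in result:
                raise ValueError("duplicate device: %r" % fields[0])
            result[fields[0]] = {"egress": _parse_kbps(fields[1]),
                                 "ingress": _parse_kbps(fields[2])}
        return result


    def check_mapped_devices(mappings: str, bandwidths: Dict[str, dict]) -> None:
        """Each right hand side of a mapping must be a key in bandwidths."""
        devices = {pair.split(":", 1)[1].strip()
                   for pair in mappings.split(",") if pair.strip()}
        missing = sorted(devices - set(bandwidths))
        if missing:
            raise ValueError("devices without resource_provider_bandwidths: %s"
                             % ", ".join(missing))

For example, ``parse_rp_bandwidths("br0:1000000:,br1")`` yields an
egress-only entry for ``br0`` and an entry without inventory for ``br1``.
``check_mapped_devices`` rejects a mapping whose devices are not all listed
in ``resource_provider_bandwidths``.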

How required bandwidth for a Neutron port is modeled
----------------------------------------------------

The required minimum network bandwidth for a port is modeled by
defining a QoS policy along with one or more QoS minimum bandwidth rules
[4]_. However, neither Nova nor Placement knows about any QoS policy
rule directly. Neutron translates the resource needs of a port into a
standard port attribute describing the needed resource classes, amounts
and traits.

In this spec we assume that a single port requests resources from a
single RP. Later we may allow a port to request resources from multiple RPs.

The resources needed by a port are expressed via the new attribute
``resource_request`` extending the port as follows.

Figure: resource_request in the port

.. code-block:: python

    {"port": {
        "status": "ACTIVE",
        "name": "port0",
        ...
        "device_id": "5e3898d7-11be-483e-9732-b2f5eccd2b2e",
        "resource_request": {
            "resources": {
                "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000,
                "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000 },
            "required": ["CUSTOM_PHYSNET_NET0", "CUSTOM_VNIC_TYPE_NORMAL"]}
    }}

The ``resource_request`` port attribute will be implemented by a new
API extension named ``port-resource-request``.
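
A minimal sketch of how the QoS service plugin might compute
``resource_request`` from the minimum bandwidth rules of a port's QoS policy
follows. It assumes each rule is a plain dict with ``direction`` and
``min_kbps`` keys (mirroring the ``qos_minimum_bandwidth_rule`` attributes),
at most one rule per direction, and traits that have already been derived.
``build_resource_request`` is a hypothetical name.

.. code-block:: python

    from typing import Dict, Iterable, List, Optional

    # Resource class used for each rule direction, as proposed in this spec.
    RC_BY_DIRECTION = {
        "egress": "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND",
        "ingress": "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND",
    }


    def build_resource_request(min_bw_rules: Iterable[dict],
                               required_traits: List[str]) -> Optional[dict]:
        """Translate minimum bandwidth rules into a resource_request value.

        Returns None when there is no minimum bandwidth rule, so the port
        gets no resource_request and Placement enforces nothing for it.
        """
        resources: Dict[str, int] = {}
        for rule in min_bw_rules:
            rc = RC_BY_DIRECTION[rule["direction"]]
            if rc in resources:
                raise ValueError("more than one rule for %s" % rule["direction"])
            resources[rc] = rule["min_kbps"]
        if not resources:
            return None
        return {"resources": resources, "required": sorted(required_traits)}

With an ingress and an egress rule of 1000 kbps each and the traits
``CUSTOM_PHYSNET_NET0`` and ``CUSTOM_VNIC_TYPE_NORMAL``, this produces the
``resource_request`` value shown in the port figure.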

If a nova server boot request has a port defined and this port has a
``resource_request`` attribute, the Placement service must
enforce the minimum bandwidth requirements.

A host satisfies the requirements if it has a physical network
interface RP with the following properties. First, the RP must have an
inventory of the new ``NET_BANDWIDTH_*`` resource classes with enough
bandwidth available, as shown in the 'Networking RP model' section. If a
host doesn't have an inventory of the requested network bandwidth resource
class(es), it won't be a candidate for the scheduler. Second, the physical
network interface RP must have all the traits listed in the
``required`` field of the ``resource_request`` attribute.
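
The two conditions above can be sketched as follows. This is a simplified,
hypothetical model of the check Placement performs for a single RP, not
Placement's own code: free capacity is taken as
``(total - reserved) * allocation_ratio`` minus the amount already used,
and ``min_unit``, ``max_unit`` and ``step_size`` are ignored.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Dict, List, Set


    @dataclass
    class Inventory:
        total: int
        reserved: int = 0
        allocation_ratio: float = 1.0
        used: int = 0

        def free(self) -> float:
            return (self.total - self.reserved) * self.allocation_ratio - self.used


    @dataclass
    class ResourceProvider:
        name: str
        traits: Set[str]
        inventories: Dict[str, Inventory] = field(default_factory=dict)


    def satisfies(rp: ResourceProvider, resource_request: dict) -> bool:
        """True if this single RP can fulfil the whole resource_request."""
        # Second condition: every required trait is on the RP.
        if not set(resource_request.get("required", [])) <= rp.traits:
            return False
        # First condition: inventory exists and has enough free capacity.
        for rc, amount in resource_request["resources"].items():
            inv = rp.inventories.get(rc)
            if inv is None or inv.free() < amount:
                return False
        return True


    def candidates(rps: List[ResourceProvider], resource_request: dict) -> List[str]:
        return [rp.name for rp in rps if satisfies(rp, resource_request)]

Because a port's request is served by a single RP in this spec, ``satisfies``
checks all requested resource classes and traits against the same provider.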

We propose two kinds of custom traits. The first expresses and requests
support for certain ``vnic_types``. This trait uses the prefix
``CUSTOM_VNIC_TYPE_``, followed by the ``vnic_type`` in all upper case.
For example:

* ``CUSTOM_VNIC_TYPE_NORMAL``
* ``CUSTOM_VNIC_TYPE_DIRECT``

The second kind is used to decide whether a segment of a network (identified
by its physnet name) is connected to the compute host considered in
scheduling. This trait uses the prefix ``CUSTOM_PHYSNET_``, followed by the
physnet name in all upper case; any characters prohibited in traits must
be replaced with underscores.

For example:

* ``CUSTOM_PHYSNET_PUBLIC``
* ``CUSTOM_PHYSNET_NET1``
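
A sketch of the trait naming rules above follows. It assumes, as a
hypothesis to be checked against the Placement documentation, that custom
trait names may only contain upper case ASCII letters, digits and
underscores. The helper names are hypothetical.

.. code-block:: python

    import re

    # Assumed allowed alphabet for custom trait names: A-Z, 0-9 and "_".
    _PROHIBITED = re.compile(r"[^A-Z0-9_]")


    def _to_trait_suffix(name: str) -> str:
        """Upper-case ``name`` and replace prohibited characters with "_"."""
        return _PROHIBITED.sub("_", name.upper())


    def vnic_type_trait(vnic_type: str) -> str:
        return "CUSTOM_VNIC_TYPE_" + _to_trait_suffix(vnic_type)


    def physnet_trait(physnet: str) -> str:
        return "CUSTOM_PHYSNET_" + _to_trait_suffix(physnet)

For instance, ``physnet_trait("data-net.1")`` gives
``CUSTOM_PHYSNET_DATA_NET_1`` and ``vnic_type_trait("direct-physical")``
gives ``CUSTOM_VNIC_TYPE_DIRECT_PHYSICAL``. Note that this mapping is not
injective: ``net-1`` and ``net_1`` map to the same trait, so physnet names
that differ only in prohibited characters cannot be told apart in Placement.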

If a nova server boot request has a network defined and this network has
a ``qos_minimum_bandwidth_rule``, that boot request is going to fail as
documented in the 'Scoping' section of [2]_ until Nova is refactored to
create the port earlier (that is, before scheduling). See also `SPEC:
Prep work for Network aware scheduling (Pike)` [11]_.

For multi-segment Neutron networks, each static segment's physnet trait
must be included in the ``resource_request`` attribute in a format that
we can only specify after Placement supports request matching logic
of ``any(traits)``. See `any-traits-in-allocation_candidates-query` [9]_.

Reporting Available Resources
-----------------------------

Some details of reporting are described in the following sections of [2]_:

* Neutron agent first start

* Neutron agent restart

* Finding the compute RP

Details internal to Neutron are the following:

Networking RP model
~~~~~~~~~~~~~~~~~~~

We made the following assumptions:

* Neutron supports the ``multi-provider`` extension, so a single
  logical network might map to more than one physnet. Physnets of
  non-dynamic segments are known before port binding. For the sake of
  simplicity, in this spec we assume that each segment directly connected
  to a physical interface with a minimum bandwidth guarantee is a
  non-dynamic segment. Therefore those physnets can be included in the
  port's ``resource_request`` as traits.

* Multiple SRIOV physical functions (PFs) can give access to the same
  physnet on a given compute host, but those PFs always implement the same
  ``vnic_type``. This means that using only physnet traits in Placement
  and in the port's resource request does not select one PF unambiguously,
  but this is not a problem because such PFs are equivalent from a resource
  allocation perspective.

* Two different backends (e.g. SRIOV and OVS) can give access to the same
  physnet on the same compute host. In this case Neutron selects the
  backend based on the ``vnic_type`` of the Neutron port specified by the
  end user during port create. Therefore physical device selection during
  scheduling should consider the ``vnic_type`` of the port as well. This
  can be done via the ``vnic_type`` based traits previously described.

* Two different backends (e.g. OVS and LinuxBridge) can give access to
  the same physnet on the same compute host while they also
  implement the same ``vnic_type`` (e.g. ``normal``). In this
  case the backend selection in Neutron is done according to
  the order of ``mechanism_drivers`` configured by the admin in
  ``neutron.conf``. Therefore physical device selection during scheduling
  should consider the same preference order. As the backend order is
  just a preference and not a hard rule, supporting this behavior is *out
  of scope* in this spec, but in theory it can be done by a new weigher
  in nova-scheduler.

Based on these assumptions, Neutron will construct an RP tree in Placement
as follows:

Figure: networking RP model

.. code::

    Compute RP (name=hostname)
     +
     |
     +-------+Network agent RP (for OVS agent), uuid = agent_uuid
     |        inventory: # later, model number of OVS ports here
     |       +
     |       |
     |       +------+Physical network interface RP,
     |       |       uuid = uuid5(hostname:br0)
     |       |       traits: CUSTOM_PHYSNET_1, CUSTOM_VNIC_TYPE_NORMAL
     |       |       inventory:
     |       |       {NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 10000,
     |       |        NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 10000}
     |       |
     |       +------+Physical network interface RP,
     |               uuid = uuid5(hostname:br1)
     |               traits: CUSTOM_PHYSNET_2, CUSTOM_VNIC_TYPE_NORMAL
     |               inventory:
     |               {NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 10000,
     |                NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 10000}
     |
     +-------+Network agent RP (for LinuxBridge agent), uuid = agent_uuid
     |       +
     |       |
     |       +------+Physical network interface RP,
     |               uuid = uuid5(hostname:virbr0)
     |               traits: CUSTOM_PHYSNET_1, CUSTOM_VNIC_TYPE_NORMAL
     |               inventory:
     |               {NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 10000,
     |                NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 10000}
     |
     +-------+Network agent RP (for SRIOV agent), uuid = agent_uuid
             +
             |
             +------+Physical network interface RP,
             |       uuid = uuid5(hostname:eth0)
             |       traits: CUSTOM_PHYSNET_2, CUSTOM_VNIC_TYPE_DIRECT
             |       inventory:
             |       {VF: 8, # VF resource is out of scope
             |        NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 10000,
             |        NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 10000}
             |
             +------+Physical network interface RP,
             |       uuid = uuid5(hostname:eth1)
             |       traits: CUSTOM_PHYSNET_2, CUSTOM_VNIC_TYPE_DIRECT
             |       inventory:
             |       {VF: 8, # VF resource is out of scope
             |        NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 10000,
             |        NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 10000}
             |
             +------+Physical network interface RP,
                     uuid = uuid5(hostname:eth2)
                     traits: CUSTOM_PHYSNET_3, CUSTOM_VNIC_TYPE_DIRECT
                     inventory:
                     {VF: 8, # VF resource is out of scope
                      NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 10000,
                      NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 10000}

Custom traits will be used to indicate which physical network a given
physical network interface RP is connected to, as previously described.

Custom traits will be used to indicate which ``vnic_type`` a backend
supports so different backend technologies can be distinguished, as previously
described.

The current purpose of agent RPs is to allow us to detect the deletion of an
RP. Later we may also start to model agent-level resources and capabilities.

Report directly or indirectly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Considering only agent-based MechanismDrivers, we have two options:

* direct: The agent reports resource providers, traits and inventories
  directly to the Placement API.

* indirect: The agent reports resource providers, traits and
  inventories to Neutron-server, which in turn reports the information
  to the Placement API.

Both have pros and cons. Direct reporting involves fewer components, so it
is more efficient and more reliable. On the other hand, the freshness of
the resource information may be important information in itself. Nova has
the compute heartbeat mechanism to ensure that the scheduler considers only
live Placement records. If freshness of Neutron resource information is
needed, the only practical way is to build on the Neutron-agent heartbeat
mechanism; otherwise the reporting and the heartbeat mechanisms would take
different paths. If resource information is reported through the agent
heartbeat mechanism, then the freshness of resource information is known by
Neutron-server, and other components (for example a nova scheduler filter)
could query it from Neutron-server.

When Placement and nova-scheduler choose to allocate the requested
bandwidth on a particular network resource provider (that represents
a physical network interface), that choice has implications for:

* Neutron-server's choice of a Neutron backend for a port.
  (vif_type, vif_details)
* Neutron-agent's choice of a physical network interface.
  (Only in some cases, like when multiple SR-IOV PFs back one physnet.)

The later choices (of neutron-server and neutron-agent) must respect the
first one (the allocation); otherwise resources could be used somewhere
other than where they were allocated.

The choice in the allocation can be easily communicated to Neutron
using the chosen network resource provider UUID if this UUID is known
to both Neutron-server and Neutron-agent. If available resources are
reported directly from Neutron-agent to Placement, then Neutron-server
may not know about resource provider UUIDs. Therefore indirect reporting
is recommended.

Even when reporting indirectly, we must keep the Neutron-reported part
of the Placement DB content under as direct control of the Neutron agents
as possible. It is best to keep Neutron-server in an essentially
proxy-like role.

Content and format of resource information reported (from agent to server)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We propose to extend the ``configurations`` field of the agent heartbeat
RPC message.

Beyond the agent's hardcoded set of supported ``vnic_types``, the following
agent configuration options are the input to extend the heartbeat message:

* ``bridge_mappings`` or ``physical_device_mappings``
* ``resource_provider_bandwidths``
* If needed, further options controlling inventory attributes like:
  ``allocation_ratio``, ``min_unit``, ``max_unit``,
  ``step_size``, ``reserved``

Based on the input above, the ``configurations`` dictionary of the
heartbeat message shall be extended with the following keys:

* (custom) ``traits``
* ``resource_providers``
* ``resource_provider_inventories``
* ``resource_provider_traits``

The values must be (re-)evaluated after the agent configuration is
(re-)read. Each heartbeat message shall contain all items known by
the agent at that time. The extension of the ``configurations`` field
intentionally mirrors the structure of the Placement API (and does not
directly mirror the agent configuration format, though it can be derived
from it). The values of these fields shall be formatted so they can be
readily pasted into requests sent to the Placement API.

Agent resource providers shall be identified by their already existing
Neutron agent UUIDs as shown in the 'Networking RP model' section above.

Neutron-agents shall generate UUIDs for physical network interface
resource providers. Version 5 (name-based) UUIDs should be used, by
hashing names like ``HOSTNAME:OVS-BRIDGE-NAME`` for ovs-agent and
``HOSTNAME:PF-NAME`` for sriov-agent, so that the UUIDs remain stable
across agent restarts.

Please note that the agent heartbeat message contains traits and their
associations with resource providers, but no traits are listed directly
in the agent configuration. This is possible because both the physnet
and the ``vnic_type`` traits we'll use can be inferred from already known
pieces of information.
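
The following sketch shows how an agent might derive the extra
``configurations`` keys from its mappings, its
``resource_provider_bandwidths`` and its hardcoded ``vnic_types``,
inferring the traits rather than reading them from configuration. The
function name, the exact value layout and the UUID namespace
(``uuid.NAMESPACE_DNS``) are assumptions made for illustration only; this
spec fixes none of them.

.. code-block:: python

    import re
    import uuid
    from typing import Dict, List, Optional

    # Assumption: the spec does not name a UUID namespace; NAMESPACE_DNS is
    # used here only so the example is deterministic.
    RP_NAMESPACE = uuid.NAMESPACE_DNS

    RC_BY_DIRECTION = {
        "egress": "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND",
        "ingress": "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND",
    }


    def _trait(prefix: str, name: str) -> str:
        # Upper-case and replace characters assumed to be prohibited in traits.
        return prefix + re.sub(r"[^A-Z0-9_]", "_", name.upper())


    def build_rp_configurations(
            hostname: str,
            device_mappings: Dict[str, List[str]],
            bandwidths: Dict[str, Dict[str, Optional[int]]],
            vnic_types: List[str]) -> dict:
        """Build the heartbeat keys proposed by this spec.

        device_mappings: {physnet: [bridge or PF name, ...]}
        bandwidths: {device: {"egress": kbps or None, "ingress": kbps or None}}
        """
        vnic_traits = [_trait("CUSTOM_VNIC_TYPE_", v) for v in vnic_types]
        all_traits = set(vnic_traits)
        providers, inventories, rp_traits = {}, {}, {}
        for physnet, devices in sorted(device_mappings.items()):
            physnet_trait = _trait("CUSTOM_PHYSNET_", physnet)
            all_traits.add(physnet_trait)
            for device in devices:
                # Name-based (version 5) UUID: stable across agent restarts.
                rp_uuid = str(uuid.uuid5(RP_NAMESPACE, "%s:%s" % (hostname, device)))
                providers[device] = rp_uuid
                rp_traits[rp_uuid] = sorted([physnet_trait] + vnic_traits)
                # Per resource class {"total": ...}, roughly as in Placement
                # inventory requests; directions without a value are skipped.
                inventories[rp_uuid] = {
                    RC_BY_DIRECTION[direction]: {"total": kbps}
                    for direction, kbps in bandwidths.get(device, {}).items()
                    if kbps is not None
                }
        return {
            "traits": sorted(all_traits),
            "resource_providers": providers,
            "resource_provider_inventories": inventories,
            "resource_provider_traits": rp_traits,
        }

Because the UUIDs depend only on the host and device names, heartbeats
produced before and after an agent restart carry identical provider UUIDs,
which keeps the heartbeat diffing described in the next section cheap.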

Synchronization of resource information reported
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ideally, Neutron-agent, Neutron-server and Placement have the same
view of resources. We propose the following synchronization mechanism
between Neutron-server and Placement:

Each time Neutron-server learns of a new agent, it diffs the heartbeat
message (for traits, providers, inventories and trait associations)
with all objects found in Placement under the agent RP. It creates
the objects missing from Placement, deletes those missing from the
heartbeat, and updates the objects whose attributes differ between
Placement and the heartbeat.

On each subsequent heartbeat, Neutron-server diffs the new heartbeat
against the previous one. If nothing changed, no Placement request
is sent. If a change is detected, Neutron sends the
appropriate Placement request based on the diff of heartbeats, using
the last seen Placement generation number. If the Placement request is
successful, Neutron stores the new generation number. If the request
fails with a generation conflict, Neutron falls back to diffing between
Placement and the heartbeat.
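
A minimal sketch of this synchronization logic follows. It flattens all
reported objects into one ``{key: value}`` map (for example keyed by RP UUID
and object kind). The Placement client interface (``read()`` and
``apply()``), ``GenerationConflict`` and ``FakePlacement`` are hypothetical
stand-ins, not an existing API.

.. code-block:: python

    from typing import Dict, List, Optional, Tuple

    Changes = Tuple[Dict[str, object], List[str], Dict[str, object]]


    def diff_state(current: Dict[str, object],
                   desired: Dict[str, object]) -> Changes:
        """Return (to_create, to_delete, to_update) turning current into desired."""
        create = {k: v for k, v in desired.items() if k not in current}
        delete = sorted(k for k in current if k not in desired)
        update = {k: v for k, v in desired.items()
                  if k in current and current[k] != v}
        return create, delete, update


    class GenerationConflict(Exception):
        """Stands in for Placement rejecting a stale generation (HTTP 409)."""


    class FakePlacement:
        """In-memory stand-in for Placement, only to exercise the sketch."""

        def __init__(self) -> None:
            self.state: Dict[str, object] = {}
            self.generation = 0
            self.requests = 0

        def read(self) -> Tuple[Dict[str, object], int]:
            return dict(self.state), self.generation

        def apply(self, changes: Changes, generation: int) -> int:
            if generation != self.generation:
                raise GenerationConflict()
            create, delete, update = changes
            self.requests += 1
            self.state.update(create)
            self.state.update(update)
            for key in delete:
                del self.state[key]
            self.generation += 1
            return self.generation


    class HeartbeatSynchronizer:
        def __init__(self, placement: FakePlacement) -> None:
            self.placement = placement
            self.last_heartbeat: Optional[Dict[str, object]] = None
            self.generation: Optional[int] = None

        def on_heartbeat(self, heartbeat: Dict[str, object]) -> None:
            if self.last_heartbeat is None:
                # New agent: diff the heartbeat against Placement itself.
                self._full_sync(heartbeat)
            elif heartbeat != self.last_heartbeat:
                # Known agent: diff against the previous heartbeat only.
                try:
                    self.generation = self.placement.apply(
                        diff_state(self.last_heartbeat, heartbeat),
                        self.generation)
                except GenerationConflict:
                    self._full_sync(heartbeat)
            # An unchanged heartbeat sends no Placement request at all.
            self.last_heartbeat = heartbeat

        def _full_sync(self, heartbeat: Dict[str, object]) -> None:
            current, self.generation = self.placement.read()
            changes = diff_state(current, heartbeat)
            if any(changes):
                self.generation = self.placement.apply(changes, self.generation)

Keeping ``last_heartbeat`` in memory is what makes unchanged heartbeats free;
the full diff against Placement is needed only for new agents and after a
generation conflict.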

Progress or block until the Compute host RP is created
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Neutron-server cannot report resource information until the relevant
Nova-compute host RP is created, because the Nova-compute host RP UUID is
unpredictable to Neutron. We believe that while waiting for the
Nova-compute host RP, Neutron-server can progress with its other functions.

Port binding changes
--------------------

The order of relevant operations is the following:

1. Placement DB is populated with both compute and network resource
   information.

2. Triggered by a nova server boot, Placement selects a list of candidates.

3. The scheduler chooses exactly one candidate and allocates it in a single
   transaction. (In some complex nova server move cases the conductor
   may allocate, but that's unimportant here.)

4. Neutron binds the port.

In steps (2) and (3) the selection includes the choice of RPs representing
network backend objects (beyond the obvious choice of compute host). This
naturally conflicts with Neutron's current port binding mechanism.

To solve the conflict we must make sure that:

* Placement produces candidates whose ports can later be bound by
  Neutron (at least with roughly the same probability as the scheduler
  achieves today).

* The choices made by Placement and by Neutron port binding are
  the same. Therefore the selection must be coordinated.

If more than one Neutron backend can satisfy the resource requirements
of a port on the same host, then it must not happen that Placement chooses
one backend but Neutron binds the port with another.

One way to do that is for Neutron-server to read (from Placement)
the allocation of the port currently being bound and let it influence
the binding. However, this introduces a slow remote call in the middle of
port binding, so it is not recommended.

Another way is to pass down part of the allocation record in the
call/message chain leading to the port binding PUT request. In the
port binding PUT request, the ``binding_profile`` attribute carries this
information. That way we would not need a new remote call, only an
additional argument/payload in already existing calls/messages.

The Nova spec ([2]_) proposes that the resources requested for a port
are included in a numbered request group (see `Granular Resource Request
Syntax` [8]_). A numbered request group is always satisfied by a single
resource provider. In general Neutron needs to know which resource provider
matched the numbered request group of the port.
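
To illustrate, the port's ``resource_request`` could be rendered as one
numbered request group of a ``GET /allocation_candidates`` query, as in the
following sketch. The helper names are hypothetical. The sketch covers only
the port's group (Nova would add the server's own unnumbered ``resources``
group and possibly groups for other ports), and the granular syntax needs a
Placement microversion that supports it (1.25 or later, to our knowledge).

.. code-block:: python

    from urllib.parse import urlencode


    def port_request_group(resource_request: dict, group: int) -> dict:
        """Query parameters for one numbered request group (resourcesN/requiredN)."""
        resources = ",".join(
            "%s:%d" % (rc, amount)
            for rc, amount in sorted(resource_request["resources"].items()))
        params = {"resources%d" % group: resources}
        if resource_request.get("required"):
            params["required%d" % group] = ",".join(
                sorted(resource_request["required"]))
        return params


    def allocation_candidates_path(params: dict) -> str:
        # Keep ":" and "," readable instead of percent-encoding them.
        return "/allocation_candidates?" + urlencode(params, safe=":,")

Because all resources and traits of the port sit in the same numbered group,
Placement must find them on one provider, which matches the single-RP
assumption made earlier in this spec.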

To express the choice made by Placement and nova-scheduler we propose
to add an ``allocation`` entry to ``binding_profile``::

    {
        "name": "port with minimum bw being bound",
        "id": ...,
        "network_id": ...,
        "binding_profile": { "allocation": RP_UUID }
    }

If a port has the ``resource_request`` attribute, then it must
be bound with ``binding_profile.allocation`` supplied. Otherwise
``binding_profile.allocation`` must not be present.

Usually ML2 port binding tries the mechanism drivers in their
configuration order until one succeeds in setting the binding. However,
if a port being bound has ``binding_profile.allocation``, then only a
single mechanism driver can be tried: the one implicitly identified by
``RP_UUID``.
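
The following sketch illustrates both rules: the presence check on
``binding_profile.allocation`` and the restriction of ML2 to a single
mechanism driver. Field names follow this spec's notation. ``rp_owner`` (a
map from RP UUID to the mechanism driver whose agent reported that RP) and
the function names are hypothetical, since the spec only says that the
driver is implicitly identified by ``RP_UUID``.

.. code-block:: python

    from typing import Dict, List, Optional


    class BindingError(Exception):
        pass


    def allocation_from_profile(port: dict) -> Optional[str]:
        """Return binding_profile.allocation, enforcing the presence rule."""
        allocation = (port.get("binding_profile") or {}).get("allocation")
        has_request = bool(port.get("resource_request"))
        if has_request and not allocation:
            raise BindingError("resource_request requires binding_profile.allocation")
        if allocation and not has_request:
            raise BindingError("binding_profile.allocation without resource_request")
        return allocation


    def drivers_to_try(port: dict, ordered_drivers: List[str],
                       rp_owner: Dict[str, str]) -> List[str]:
        """Mechanism drivers ML2 would attempt for this port, in order."""
        allocation = allocation_from_profile(port)
        if allocation is None:
            return list(ordered_drivers)   # usual behaviour: configured order
        driver = rp_owner.get(allocation)
        if driver not in ordered_drivers:
            raise BindingError("no configured driver owns RP %s" % allocation)
        return [driver]                    # only the driver behind the allocated RP

Keeping the allocation in ``binding_profile`` lets ML2 make this decision
locally, without the extra Placement round trip that was rejected above.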

In the case of hierarchical port binding, ``binding_profile.allocation``
is meant to drive the binding only at the binding level that represents
the physical interface closest to the nova server.

Out of Scope
------------

Minimum bandwidth rule update:
  When a minimum bandwidth rule is updated, the ML2 plugin will list the bound
  ports with this QoS policy and rule attached and will update the Allocation
  value. The ``consumer_id`` of each Allocation is the ``device_id`` of the
  port. This is out of scope in this spec and should be done during the work
  related to `os-vif migration tasks` [5]_.

Trunk port:
  Subports of a trunk port are unknown to Nova. Allocating resources for
  subports is a task for Neutron only. This is out of scope too.

Testing
-------

* Unit tests.

* Functional tests.

  * Agent-server interactions.

* Fullstack.

  * Handling agent failure cases.

* Tempest API tests.

  * Port API extended with ``resource_request``.
  * Extensions of ``binding_profile``.

* Tempest scenario tests.

  * End-to-end feature test.

In test frameworks where we cannot depend on Nova, we can mock it away by:

* Creating and binding the port as Nova would have done it, including:

  * Setting its ``binding_profile``.
  * Setting its ``binding_host_id`` as if Placement and the Scheduler had
    chosen the host.
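A test standing in for Nova could build the port update body as sketched below. It assumes that ``binding_profile`` and ``binding_host_id`` in this spec correspond to the ``binding:profile`` and ``binding:host_id`` attributes of the Neutron port binding extension, and that ``compute:nova`` is an acceptable ``device_owner``; the helper name is hypothetical.

```python
from typing import Any, Dict


def nova_like_port_update(server_uuid: str, host: str, rp_uuid: str) -> Dict[str, Any]:
    """Build the body a test could send with PUT /v2.0/ports/{port_id}.

    Hypothetical helper: mimics what Nova would set after Placement and
    the scheduler have chosen a host and allocated from rp_uuid.
    """
    return {
        "port": {
            # The server is the Placement consumer of the port's resources.
            "device_id": server_uuid,
            "device_owner": "compute:nova",
            # As if Placement and the scheduler had picked this host.
            "binding:host_id": host,
            # As if Nova had passed the allocated resource provider.
            "binding:profile": {"allocation": rp_uuid},
        }
    }
```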

Upgrade
-------

* When upgrading a system with ``minimum_bandwidth`` rules to support both
  data plane and placement enforcement, we see two options:

  1. It is the responsibility of the admin to create the allocations in
     Placement for all ports using ``minimum_bandwidth`` rules. Please note:
     this assumes that bandwidth is not overallocated at the time of upgrade.

  2. Add tooling for option 1 as described in the 'Upgrade impact' section
     of [2]_.

* The desired upgrade order of components is: Placement, Nova, Neutron.

  If for some reason the reverse Neutron-Nova order is desired, then the
  Neutron port API extension of ``resource_request`` must not be turned on
  until both components are upgraded.

* Neutron-server must be able to handle agent heartbeats both with and
  without resource information in the ``configurations``.
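The tooling for option 1 could derive the allocations from the existing bound ports and, at the same time, check the assumption that nothing is overallocated. In this sketch the input shape, the function name and the way a port is mapped to its resource provider are assumptions left to the caller; the capacity passed in is meant to be Placement's usable amount, ``(total - reserved) * allocation_ratio``.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One entry per bound port with a minimum_bandwidth rule:
# (device_id, rp_uuid, resource_class, kbps). Resolving each port's RP is
# assumed to have happened before this step.
PortUsage = Tuple[str, str, str, int]
# Placement-style allocations of one consumer: {rp_uuid: {"resources": {...}}}
ConsumerAllocations = Dict[str, Dict[str, Dict[str, int]]]


def plan_upgrade_allocations(
    ports: Iterable[PortUsage],
    capacity: Dict[Tuple[str, str], int],
) -> Tuple[Dict[str, ConsumerAllocations], List[Tuple[str, str]]]:
    """Group port bandwidth per consumer and report overallocated RPs.

    capacity maps (rp_uuid, resource_class) to the usable amount.
    """
    per_consumer: Dict[str, ConsumerAllocations] = defaultdict(dict)
    used: Dict[Tuple[str, str], int] = defaultdict(int)
    for device_id, rp_uuid, rc, kbps in ports:
        rp_alloc = per_consumer[device_id].setdefault(rp_uuid, {"resources": {}})
        # Ports of the same server on the same RP are summed, since the
        # consumer is the server rather than the individual port.
        rp_alloc["resources"][rc] = rp_alloc["resources"].get(rc, 0) + kbps
        used[(rp_uuid, rc)] += kbps
    overallocated = sorted(
        key for key, amount in used.items() if amount > capacity.get(key, 0)
    )
    return dict(per_consumer), overallocated
```

A non-empty second result means the "not overallocated" assumption of option 1 does not hold, and the admin has to resolve it before creating the allocations.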

Work Items
----------

These work items are designed so that the Neutron end-to-end behavior can be
prototyped and tested independently of the progress of related work in Nova.
However, part of it depends on Placement features that are already available.

* Extend the agent heartbeat configuration with resource provider information.
* (We already have it): Persist the extended agent configuration reported via
  heartbeat.
* (We already have it): Placement client in neutron-lib for the use of
  neutron-server.
* Neutron-server initially diffs the resource info reported by the agent
  against Placement.
* Neutron-server diffs subsequent agent heartbeat configurations.
* Neutron-server turns the diffs into Placement requests (with generation
  handling).
* Extend rule ``qos_minimum_bandwidth_rule`` with direction ``ingress``.
* Extend port with ``resource_request`` based on QoS rule
  minimum-bandwidth-placement.
* Make the reported agent configuration queryable so neutron-server can infer
  which backend is implied by the allocated RP (as in
  ``binding_profile.allocation``).
* When binding a port with ``binding_profile.allocation``, replace the list of
  tried mechanism drivers with a one-element list of the inferred backend.
* (We already have it): Send ``binding_profile`` to all agents.
* In sriov-agent, force the choice of PF as implied by the allocated RP.

For each of the above:

* Tests.
* Documentation: api-ref, devref, networking guide.
* Release notes.
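The diffing work items can be sketched as a pure function that turns two consecutive heartbeat reports into abstract Placement operations. It assumes resources are reported as ``{rp_name: {resource_class: total}}`` under a hypothetical ``configurations`` key; the real layout may differ. A heartbeat without that key (for example from a not yet upgraded agent) yields no operations rather than deleting every provider. Inventory updates use the ``PUT /resource_providers/{uuid}/inventories`` body shape, whose ``resource_provider_generation`` lets Placement reject a concurrent change with 409 Conflict; a provider created in the same sync is assumed to start at generation 0.

```python
from typing import Dict, List, Optional, Tuple

Inventory = Dict[str, int]        # resource class -> total
Resources = Dict[str, Inventory]  # resource provider name -> inventory
Operation = Tuple[str, str, Optional[dict]]

# Hypothetical configurations key; the real agent report may differ.
RESOURCES_KEY = "resource_provider_bandwidths"


def extract_resources(configurations: dict) -> Optional[Resources]:
    """Return reported resources, or None for agents that report none."""
    return configurations.get(RESOURCES_KEY)


def inventory_body(inventory: Inventory, generation: int) -> dict:
    """Build a Placement PUT .../inventories request body."""
    return {
        "resource_provider_generation": generation,
        "inventories": {rc: {"total": total} for rc, total in inventory.items()},
    }


def diff_heartbeats(
    previous: Optional[Resources],
    current: Optional[Resources],
    generations: Dict[str, int],
) -> List[Operation]:
    """Turn two consecutive reports into (operation, rp_name, body) tuples."""
    if current is None:
        # Heartbeat without resource info: nothing to sync.
        return []
    previous = previous or {}
    ops: List[Operation] = []
    for name in sorted(current.keys() - previous.keys()):
        ops.append(("create_provider", name, None))
    for name in sorted(current):
        if current[name] != previous.get(name):
            # New or changed inventory, guarded by the last seen generation.
            ops.append(("set_inventory", name,
                        inventory_body(current[name], generations.get(name, 0))))
    for name in sorted(previous.keys() - current.keys()):
        ops.append(("delete_provider", name, None))
    return ops
```

On a 409 Conflict, neutron-server would re-read the provider's generation and retry; the initial diff against Placement can reuse the same function with ``previous`` built from what Placement currently stores.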

References
==========

.. [1] `Placement API`:
   https://docs.openstack.org/nova/latest/user/placement.html

.. [2] `SPEC: Network bandwidth resource provider`:
   https://review.openstack.org/502306

.. [3] `Add a spec for strict minimum bandwidth support`:
   https://review.openstack.org/396297

.. [4] `[RFE] Minimum bandwidth support (egress)`:
   https://bugs.launchpad.net/neutron/+bug/1560963

.. [5] `BP: os-vif migration tasks`:
   https://blueprints.launchpad.net/neutron/+spec/os-vif-migration

.. [6] `Newton Release Notes`:
   https://docs.openstack.org/releasenotes/neutron/newton.html

.. [7] `Networking Guide: SR-IOV`:
   https://docs.openstack.org/neutron/queens/admin/config-sriov.html#enable-neutron-sriov-agent-compute

.. [8] `Granular Resource Request Syntax`:
   https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/granular-resource-requests.html

.. [9] `any-traits-in-allocation_candidates-query`:
   https://blueprints.launchpad.net/nova/+spec/any-traits-in-allocation-candidates-query

.. [10] `QoS devref`:
   https://docs.openstack.org/neutron/latest/contributor/internals/quality_of_service.html#agent-backends

.. [11] `SPEC: Prep work for Network aware scheduling (Pike)`:
   https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/prep-for-network-aware-scheduling-pike.html