Drive the choice of mechanism driver during binding, as inferred from
the resource provider allocated by nova and told to neutron via the
port's binding:profile.
As discussed at a neutron QoS IRC meeting some time ago,
this patch introduces a new assumption on bind_port() implementations.
That is, the bind_port() implementation of any mech driver supporting
Guaranteed Minimum Bandwidth must not have non-idempotent side effects,
because the last binding level will be redone a second time with a
narrowed-down list of mechanism drivers. If the second call does not
give the same result as the first, all kinds of weird things can
happen.
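As a rough illustration of the idea (the 'allocation' key and helper
name below are assumptions for illustration, not the exact code of this
patch), a mech driver can be filtered during binding by comparing its
resource providers against the allocation nova recorded in the port's
binding:profile:

    # Hypothetical sketch; key and helper names are illustrative only.
    def driver_owns_allocation(binding_profile, driver_rp_uuids):
        """Return True if the allocated resource provider (if any)
        belongs to this mechanism driver."""
        allocation = (binding_profile or {}).get('allocation')
        if not allocation:
            # No minimum bandwidth allocation: any driver may bind.
            return True
        return allocation in driver_rp_uuids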
Change-Id: I2b7573ec6795170ce45a13d5d0ad7844fb85182d
Depends-On: https://review.openstack.org/574781
Depends-On: https://review.openstack.org/635160
Partial-Bug: #1578989
See-Also: https://review.openstack.org/502306 (nova spec)
See-Also: https://review.openstack.org/508149 (neutron spec)
Sometimes, when the OVSDB is under heavy load (which can happen during
the functional tests), there is a delay between the end of an OVSDB
transaction and when the record (new or updated) can be read. Although
this is something that should not happen (considering the OVSDB is
transactional), tests should deal with this inconvenience and provide a
robust method to retrieve a value and check it at the same time. This
new method should re-read the value in case of a mismatch.
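A minimal sketch of such a retrieve-and-check helper (names and retry
parameters are illustrative, not the actual implementation):

    import time

    def get_and_assert(read_value, expected, retries=5, sleep=0.5):
        # Re-read the value a few times before failing, to tolerate the
        # delay between the OVSDB transaction and the readable record.
        value = read_value()
        for _ in range(retries):
            if value == expected:
                return value
            time.sleep(sleep)
            value = read_value()
        raise AssertionError('%s read from OVSDB, expected %s' %
                             (value, expected))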
In order to solve the gate problem ASAP, another bug is fixed in this
patch: the QoS removal is skipped when the OVS agent is initialized
during functional tests.
When executing functional tests, several OVS QoS policies specific to
minimum bandwidth rules are created [1]. Because during the functional
test execution several threads can create more than one minimum
bandwidth QoS policy (something that cannot happen in a production
environment), the OVS QoS driver must skip the execution of [2] to
avoid removing other QoS policies created in parallel by other tests.
This patch marks "test_min_bw_qos_policy_rule_lifecycle" and
"test_bw_limit_qos_port_removed" as unstable. Those tests will be
investigated once the CI gates are stable.
[1] Those QoS policies are created only to hold minimum bandwidth rules.
Those policies are marked with:
external_ids: {'_type'='minimum_bandwidth'}
[2] d6fba30781/neutron/plugins/ml2/drivers/openvswitch/agent/extension_drivers/qos_driver.py (L43)
Closes-Bug: #1818613
Closes-Bug: #1818859
Related-Bug: #1819125
Change-Id: Ia725cc1b36bc3630d2891f86f76b13c16f6cc37c
In the functional test environment, it seems the L3 agent cannot
handle 30+ routers in the test test_router_processing_pool_size;
it still hits timeouts in some processing procedures.
Router initialization/processing/deletion is not the purpose of this
test case, so we just mock them.
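For illustration only (the patched method names are assumptions about
the agent internals, not necessarily the ones used by this patch), the
mocking can look roughly like this:

    from unittest import mock

    def mock_router_processing(test_case, agent):
        # Skip the real router processing; only the processing pool
        # sizing is under test here.
        for method in ('_process_added_router',
                       '_process_updated_router',
                       '_safe_router_removed'):
            patcher = mock.patch.object(agent, method)
            patcher.start()
            test_case.addCleanup(patcher.stop)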
Closes-Bug: #1816239
Change-Id: I85dc6fd9d98a6a13bbf35ee2e67ce6f69be48dde
The currently used version is old and does not work on Bionic nodes.
But as Xenial kernels do not include the fix for local VXLAN tunnels
(bug/1684897), we still have to use a locally compiled version.
On Xenial nodes, the Queens UCA repository has openvswitch 2.9.0;
on Bionic nodes, we have 2.9.2.
So use the latest 2.9 release for fullstack testing.
Change-Id: Ifb61daa1f14969a1d09379599081e96053488f9f
Closes-Bug: #1818632
This patch adds the support for network segment range CRUD. Subsequent
patches will be added to use this network segment range on segment
allocation if this extension is loaded.
Changes include:
- an API extension which exposes the segment range to be administered;
- standard attributes with tagging support for the new resource;
- a new service plugin "network_segment_range" for the feature
enabling/disabling;
- a new network segment range DB table model along with operation
logic;
- Oslo Versioned Objects for network segment range data model;
- policy-in-code support for network segment range.
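As an illustration of the new resource (field names follow the
blueprint; treat this as an example, not the authoritative schema), a
range could be created with a request body like:

    # Example body for POST /v2.0/network-segment-ranges (illustrative)
    network_segment_range = {
        'network_segment_range': {
            'name': 'physnet0-vlan-range',
            'network_type': 'vlan',
            'physical_network': 'physnet0',
            'minimum': 100,
            'maximum': 199,
            'shared': False,
            'project_id': 'demo-project-id',
        }
    }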
Co-authored-by: Allain Legacy <Allain.legacy@windriver.com>
Partially-implements: blueprint network-segment-range-management
Change-Id: I75814e50b2c9402fe6776229d469745d7a72290b
It may help debug some issues related to keepalived and/or
dnsmasq, which log to the journal only.
Change-Id: I42c311f9111e0a0d1a6ea3a7aeab0fef8d77c549
While the initial version of this patch removed neutron.db.api, a
different duplicate patch [1] landed first.
This patch cleans up the remaining references to neutron.db.api,
including those in the docs and comments.
[1] https://review.openstack.org/#/c/635978/
Change-Id: I5f911f4c6a1fc582a9c1006ec5e2880853ff2909
In [1], a new init parameter was introduced in the class
OVSAgentExtensionAPI. This change in the extension API can break
backwards compatibility with other projects (networking_sfc and
bagpipe are affected).
Because this parameter is needed only in the qos_driver extension when
calling OVSAgentExtensionAPI.request_phy_brs() (to retrieve the
physical bridges list), we can make this new parameter optional so as
not to break other stadium projects. When the OVS agent is initialized
(in-tree agent), the extension is called with the three needed
parameters.
[1] https://review.openstack.org/#/c/406841/22/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_agent_extension_api.py@43
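Roughly (parameter names are approximations of the in-tree API, not a
verbatim copy), the backwards compatible signature looks like:

    # Sketch: the physical bridges argument defaults to None so that
    # out-of-tree callers using the old two-argument form keep working.
    class OVSAgentExtensionAPI(object):
        def __init__(self, int_br, tun_br, phys_brs=None):
            self.br_int = int_br
            self.br_tun = tun_br
            self.br_phys = phys_brs or {}

        def request_phy_brs(self):
            # Return the physical bridges known to the agent (may be
            # empty when the caller did not pass them in).
            return list(self.br_phys.values())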
Change-Id: I31d1a31a935fdcdd12e13e1bc58f7c5f640ca092
Closes-Bug: #1818693
When the L3 agent is running in dvr_snat mode on a compute node,
as is the case e.g. in some of the gate jobs, it may happen that the
same router is scheduled in standby mode on the compute node while an
instance connected to that router runs on the same node.
In such a case the metadata proxy needs to be spawned in the router
namespace even if the router is in standby mode.
Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
Closes-Bug: #1817956
Closes-Bug: #1606741
Unfortunately it still sometimes fails because the restart still
happened in the very short pause between the agents.
I will need to figure out some other possible solution for that issue.
This reverts commit bdd35405548c1d60072cd71ef648a724bf1d31d2.
Change-Id: Iaf9d1be3255e941c5fe227943535ab7c6905253c
Today, if live migration has failed after an inactive
binding was created on the destination node but before
the activation of the created binding, the port's binding level
for the destination host is not cleared during nova's API call
to neutron to delete the port binding.
This causes future attempts to perform live migration
of the instance to the same host to fail.
This change removes the port binding level objects during port binding
deletion.
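A minimal sketch of the cleanup, using the versioned objects API
(filter and helper names are assumed for illustration):

    from neutron.objects import ports as port_obj

    def clear_binding_levels(context, port_id, host):
        # Remove the stale binding levels left behind by the failed
        # live migration so that a future binding attempt on the same
        # host can succeed.
        if host:
            port_obj.PortBindingLevel.delete_objects(
                context, port_id=port_id, host=host)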
Closes-Bug: #1815345
Change-Id: Idd55f7d24a2062c08ac8a0dc2243625632d962a5
The netlink_lib functional tests module lists conntrack entries and
asserts them against an expected list.
It may happen that some additional entries from other tests are also
in the list, and that causes failures of the netlink_lib tests.
So this patch changes the way those assertions are done. Now it
checks that each of the expected entries is in the entries list and,
in the case of the delete-entries tests, it also checks that none of
the deleted entries is still in the list.
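Conceptually (a sketch, not the exact test code), the assertions move
from strict list equality to containment checks:

    def assert_entries(expected_entries, actual_entries,
                       deleted_entries=None):
        # Every expected entry must be present, but extra entries
        # created by other tests running in parallel are tolerated.
        for entry in expected_entries:
            assert entry in actual_entries
        # For the delete tests, none of the deleted entries may remain.
        for entry in (deleted_entries or []):
            assert entry not in actual_entries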
Change-Id: I30c18f141a8356b060902e6493ba0657b21619ad
Closes-Bug: #1817295
We spawn a lot of neutron-servers, on all but the smallest systems.
It's often hard to tell which are busy/overloaded or spinning.
Add an option to set the process names to their role.
This has a small chance of breaking existing scripting, depending on
how it parses ps output.
Sample output:
$ ps xw | grep neutron-server
1126 pts/2 S+ 0:00 grep --color=auto neutron-server
25355 ? Ss 0:26 /usr/bin/python /usr/local/bin/neutron-server \
--config-file /etc/neutron/neutron.conf \
--config-file /etc/neutron/plugins/ml2/ml2_conf.ini
25368 ? S 0:00 neutron-server: api worker
25369 ? S 0:00 neutron-server: api worker
25370 ? S 0:00 neutron-server: api worker
25371 ? S 0:00 neutron-server: api worker
25372 ? S 0:02 neutron-server: rpc worker
25373 ? S 0:02 neutron-server: rpc worker
25374 ? S 0:02 neutron-server: services worker
The "normal" looking ps output is the main parent.
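The renaming can be done with the setproctitle library; a rough sketch
(the exact title format here is illustrative):

    import setproctitle

    def set_worker_process_title(role):
        # role is e.g. 'api worker', 'rpc worker' or 'services worker';
        # the parent process keeps its original command line.
        setproctitle.setproctitle('neutron-server: %s' % role)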
Partial-Bug: #1816485
Depends-On: https://review.openstack.org/637119
Change-Id: I0e664a5f8e792d85b8f5483fb8c6f1cd59a677cd
In some cases with a DVR HA router it may happen that
RouterInfo.radvd.disable() is called even though the
radvd DaemonMonitor wasn't initialized earlier and is
None.
To prevent an exception in such a case, this patch adds a check
that the DaemonMonitor is not None before calling its disable()
method.
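The guard itself is trivial; roughly (a sketch, not the literal diff):

    def disable_radvd(router_info):
        # Only call disable() when the radvd DaemonMonitor was actually
        # initialized for this router.
        if router_info.radvd is not None:
            router_info.radvd.disable()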
Change-Id: Ib9b5f4eeae6e4cebcb958928e6521cf1d69b049c
Closes-Bug: #1817435
We recently exposed the privsep opts for config generator use, so
projects that depend on oslo.privsep should include them in their
sample configs.
Change-Id: Ibaef2e2848855cd8ef987ec58457220911ad7c69
In the functional tests of HA routers, the
L3AgentTestFramework._router_lifecycle method asserted that an HA
router does not have IPs configured in the router's namespace at the
beginning.
That could lead to test failure because sometimes the keepalived
process switched the router from standby to master before this
assertion was done and IPs were already configured.
There is almost no value in doing this assertion as it runs just after
the router was created, so it is "normal" that no IP addresses are
configured yet.
Because of that, this patch removes this assertion.
Change-Id: Ib509a7226eb94483a0aaf2d930f329e419b8e135
Closes-Bug: #1816489
This service plugin synchronizes ML2 mechanism driver agents' resource
information to Placement. To use this service an agent must add
'resource_provider_bandwidths' to the 'configurations' field of its
RPC heartbeat. It may also add 'resource_provider_inventory_defaults'
to fine-tune Placement inventory parameters. Also, to use this service
a mechanism driver must implement get_standard_device_mappings() and
allocate a UUID as the mechanism driver property
'resource_provider_uuid5_namespace'.
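For illustration, an agent heartbeat could carry something like the
following in 'configurations' (device names and numbers are made up;
the inventory defaults are assumptions, not the authoritative format):

    configurations = {
        'resource_provider_bandwidths': {
            # available bandwidth per physical device, per direction
            'eth0': {'egress': 10000000, 'ingress': 10000000},
        },
        'resource_provider_inventory_defaults': {
            'allocation_ratio': 1.0,
            'min_unit': 1,
            'step_size': 1,
            'reserved': 0,
        },
    }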
The synchronization is triggered by:
* any new agent object in the DB
* restart of an agent (via 'start_flag' in the RPC heartbeat)
* if an agent's 'resources_synced' attribute is not True (None/False)
The latter should autoheal transient errors of the synchronization
process. That is, if a sync attempt fails, then we store
resources_synced=False, which triggers a sync retry at each new
heartbeat message until a sync attempt finally succeeds and we can set
resources_synced=True.
Since this code functionally depends on ML2 we can also consider making
it part of ML2, but at the moment it is a service plugin for better
decoupling. Even if you load the service plugin the logic gracefully
degrades for heartbeat messages not containing resource provider info.
If needed the sync can be forced in multiple ways. First, if you restart
an agent then the RPs belonging to that agent will be re-synced. You may
also delete the agent by 'openstack network agent delete' and let the
next heartbeat message re-create the agent object. On re-creation the
RPs belonging to that agent will be re-synced. On the other hand, a
neutron-server restart does not trigger a re-sync in any way. Depending
on the trade-off between the admin's need to force re-syncs and the
performance cost of (not absolutely necessary) Placement updates, the
re-sync conditions may be further fine-tuned.
Example config for neutron-server:
neutron.conf:
[DEFAULT]
service_plugins = placement
Change-Id: Ia1ff6f7559ab77913ddb9c3b134420a401b8eb43
Co-Authored-By: Lajos Katona <lajos.katona@ericsson.com>
Depends-On: https://review.openstack.org/586567
Partial-Bug: #1578989
See-Also: https://review.openstack.org/502306 (nova spec)
See-Also: https://review.openstack.org/508149 (neutron spec)
In the fullstack test
test_l3_agent.test_ha_router_restart_agents_no_packet_lost
restarts of L3 agents were done in 2 steps:
1. restart of all standby agents,
2. restart of all active agents.
It was done like that because of bugs [1] and [2].
Now that those bugs are fixed, let's change this test to a
"more probable" scenario: agents will be restarted without checking
which one is master and which is standby.
However, agents will be restarted one by one instead of restarting
them all at (almost) exactly the same time.
Restarting all agents at the same time still caused some issues in my
local testing environment, but I suspect that it might be a problem
related to the nature of fullstack tests and to the fact that 2
different "nodes" are in fact simulated by namespaces only.
[1] https://bugs.launchpad.net/neutron/+bug/1776459
[2] https://bugs.launchpad.net/neutron/+bug/1798475
Change-Id: I731211b56a57d44636e741009721522f67c12368
I noticed in the functional logs that the l3-agent is constantly
logging this message, even when just adding or removing a single
router:
Resizing router processing queue green pool size to: 8
It's misleading as the pool is not being resized, it's still 8,
so let's only log when we're actually changing the pool size.
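A minimal sketch of the intended behaviour (function and parameter
names here are assumptions, not the agent's actual code):

    import logging

    LOG = logging.getLogger(__name__)

    def resize_process_pool(pool, current_size, new_size):
        # Only resize and log when the size actually changes.
        if new_size == current_size:
            return current_size
        LOG.info("Resizing router processing queue green pool size "
                 "to: %d", new_size)
        pool.resize(new_size)
        return new_size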
Change-Id: I5dc42fa4b4c1964b7d027681b61550cd82e83234