During the CI meeting we agreed that non-voting jobs in the branches
which are in the Extended Maintenance (EM) phase should be moved from
the check queue to the experimental queue.
This patch does exactly that.
Change-Id: Ie8a63eacf479ac6871af448a1741598584de8de8
We observe an excessive number of routers created on
compute nodes on which some virtual machines got a fixed
IP on the floating network.
RPC servers should filter out those unnecessary routers
during syncing.
Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
Partial-Bug: #1840579
(cherry picked from commit 480b04ce04)
When neutron-ovs-agent notices that any of the physical
bridges was "re-created", we should also ensure that stale OpenFlow
rules (with an old cookie id) are cleaned up.
This patch does exactly that.
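A minimal illustration of the cleanup idea, assuming stale rules are
identified purely by their cookie (this sketch shells out to ovs-ofctl
and is not the agent's actual helper):

    import subprocess

    def cleanup_stale_flows(bridge_name, current_cookie):
        # Illustrative helper: delete every flow on the bridge whose cookie
        # differs from the current one, using ovs-ofctl's exact cookie
        # match (cookie=<value>/-1).
        output = subprocess.check_output(
            ['ovs-ofctl', 'dump-flows', bridge_name], text=True)
        stale_cookies = set()
        for line in output.splitlines():
            if 'cookie=' in line:
                cookie = line.split('cookie=')[1].split(',')[0]
                if int(cookie, 16) != current_cookie:
                    stale_cookies.add(cookie)
        for cookie in stale_cookies:
            subprocess.check_call(
                ['ovs-ofctl', 'del-flows', bridge_name,
                 'cookie=%s/-1' % cookie])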
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I7c7c8a4c371d6f4afdaab51ed50950e2b20db30f
Related-Bug: #1864822
(cherry picked from commit 63c45b3766)
Blocking traffic between br-int and br-physical is overkill
and will at least:
1. interrupt VLAN traffic during startup, particularly so
   if DVR is enabled;
2. if, say, RabbitMQ is not stable, possibly affect the data
   plane so that VLAN traffic never works.
Running OpenStack on Kubernetes particularly amplifies the problem
because pods can be killed pretty easily by liveness
probes.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I51050c600ba7090fea71213687d94340bac0674a
Closes-Bug: #1869808
(cherry picked from commit 90212b12cd)
In some cases it may be useful to log the new VLAN tag which is found
on the port when it loses the old VLAN tag which is expected to
be there.
So this patch adds that value to the log message.
TrivialFix
Depends-On: https://review.opendev.org/735615
Change-Id: I231e624f460510decc6d2237040c8bef207e2e8e
(cherry picked from commit 3ac63422ea)
When a physical bridge is removed and created again, it
is initialized by neutron-ovs-agent.
But if the agent has distributed routing enabled, DVR-related
flows were not configured again, which led to connectivity issues
for DVR routers.
This patch fixes it by configuring the DVR-related flows again
if distributed routing is enabled in the agent's configuration.
It also resets the list of phys_brs in dvr_agent. Without that,
different objects were used in the ovs agent and dvr_agent classes,
thus e.g. two different cookie ids were set on flows in the physical
bridge.
The same issue occurred when openvswitch was restarted and
all bridges were reconfigured.
Now in such cases the new cookie_id is correctly configured for all
flows.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I710f00f0f542bcf7fa2fc60800797b90f9f77e14
Closes-Bug: #1864822
(cherry picked from commit 91f0bf3c85)
1. Make grenade jobs experimental for EM branches
As discussed in the ML thread [1], we are going to
make the grenade jobs non-voting for all EM stable branches and
the oldest stable branch. The grenade jobs are failing now and it
might take time to fix them, if we are able to fix them at all. Once
the jobs are working it is up to the project team to either bring
them back to voting or keep them non-voting.
If those jobs keep failing consistently and no one fixes them,
then removing those n-v jobs in the future is also fine.
Additionally, it was proposed in the neutron CI meeting [2] that non-voting
jobs be moved to experimental, so move the grenade jobs there instead
of keeping them non-voting.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015499.html
[2] http://eavesdrop.openstack.org/meetings/neutron_ci/2020/neutron_ci.2020-07-01-15.00.log.html#l-101
StableOnly
Conflicts:
.zuul.yaml
(cherry picked from commit 9313dce459)
2. Install pip2 for functional/fullstack/neutron-tempest-iptables_hybrid
Otherwise both jobs fail with "sudo: pip: command not found"
3. Add ensure-tox for functional/fullstack/neutron-tempest-iptables_hybrid
Tox fails with a similar error message
4. Disable OVS compilation for fullstack and move the job to experimental
Compilation fails similarly to recent master failures:
/opt/stack/new/ovs/datapath/linux/geneve.c:943:15: error: ‘const struct ipv6_stub’ has no member named ‘ipv6_dst_lookup’
But branch 2.9 is not updated anymore, so use the official package instead.
This triggers a few test failures, so move the job to experimental (instead
of marking it non-voting), same as the grenade jobs.
Change-Id: Ie846a8cb481da65999b12f5547b407cc7bdc3138
When a Port is deleted, the QoS extension will reset any rule (QoS
and Queue records) applied on this port or will reset the
related Interface policing parameters.
If the Port and the related Interface are deleted during the QoS
extension operation, those commands will fail. This patch makes those
operations more resilient by not checking the errors when writing to
the Port or the Interface record.
NOTE: this patch is squashed with [1]. That will fix the problem
with empty "vsctl" transactions when using this OVS DB implementation.
[1]https://review.opendev.org/#/c/738574/
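A hedged sketch of the resilient write, assuming neutron's ovs_lib
helpers (the columns shown are just the usual policing values; the real
extension touches more records than this):

    from neutron.agent.common import ovs_lib

    def reset_port_policing(bridge_name, port_name):
        # Sketch only: with check_error=False and log_errors=False a write
        # against an Interface record that has just been deleted does not
        # raise, so the cleanup simply continues.
        bridge = ovs_lib.OVSBridge(bridge_name)
        for column, value in (('ingress_policing_rate', 0),
                              ('ingress_policing_burst', 0)):
            bridge.set_db_attribute('Interface', port_name, column, value,
                                    check_error=False, log_errors=False)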
Change-Id: I2cc4cdf5be25fab6adbc64acabb3fffebb693fa6
Closes-Bug: #1884512
(cherry picked from commit e2d1c2869a)
(cherry picked from commit 84ac8cf9ff)
(cherry picked from commit 3785868bfb)
(cherry picked from commit 7edfb0ef4a)
Neutron-ovs-agent can now enable IGMP snooping on the integration bridge
if the config option "igmp_snooping_enable" in the OVS section of the
config is set to True.
It will also set mcast-snooping-disable-flood-unregistered=true
so that flooding of multicast packets to all unregistered ports is
disabled as well.
Both changes are applied to the integration bridge.
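For illustration, the equivalent of what ends up applied to br-int can be
expressed with plain ovs-vsctl calls (the agent itself goes through its
own OVSDB API; this is just a sketch):

    import subprocess

    def enable_igmp_snooping(bridge='br-int'):
        # Turn on IGMP snooping on the bridge ...
        subprocess.check_call(
            ['ovs-vsctl', 'set', 'Bridge', bridge,
             'mcast_snooping_enable=true'])
        # ... and stop flooding multicast to unregistered ports.
        subprocess.check_call(
            ['ovs-vsctl', 'set', 'Bridge', bridge,
             'other_config:mcast-snooping-disable-flood-unregistered=true'])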
Change-Id: I12f4030a35d10d1715d3b4bfb3ed5efb9aa28f2b
Closes-Bug: #1840136
(cherry picked from commit 5b341150e2)
There is a race condition between nova-compute booting an instance and
the l3-agent processing the DVR (local) router on the compute node. This
issue can be seen when a large number of instances are booted on one
host and the instances are under different DVR routers, so the
l3-agent has to process all these DVR routers on this host concurrently.
For now we have a green pool for the router ResourceProcessingQueue
with 8 greenlets, but some of these routers can still be left waiting.
Even worse, there are time-consuming actions during the router
processing procedure, for instance installing ARP entries, iptables
rules, route rules etc.
So when the VM is up, it will try to get metadata via the local proxy
hosted by the DVR router. But the router is not ready yet on that
host, and finally those instances will not be able to set up some
config in the guest OS.
This patch adds a new measurement based on the number of routers to
determine the size of the L3 router processing queue green pool. The
pool size will be limited to between 8 (the original value) and 32,
because we do not want the L3 agent to spend too many host resources
on processing routers on the compute node.
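The scaling rule can be pictured roughly like this (the exact formula
used by the patch may differ; the bounds 8 and 32 are the ones described
above):

    import eventlet

    ROUTER_POOL_MIN = 8
    ROUTER_POOL_MAX = 32

    def calculate_pool_size(num_routers):
        # Grow the pool with the number of routers hosted on this node,
        # but never below 8 or above 32.
        return max(ROUTER_POOL_MIN, min(ROUTER_POOL_MAX, num_routers))

    pool = eventlet.GreenPool(calculate_pool_size(20))  # pool of size 20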
Conflicts:
neutron/tests/functional/agent/l3/test_legacy_router.py
Related-Bug: #1813787
Change-Id: I62393864a103d666d5d9d379073f5fc23ac7d114
(cherry picked from commit 837c9283ab)
In patch [1] we changed the definition of the abstract method
"plug" in the LinuxInterfaceDriver class.
That broke e.g. 3rd-party drivers which don't yet accept this
new parameter called "link_up" in the plug_new method.
So this patch fixes this to keep such legacy drivers working
with the new base interface driver class.
This commit also marks such a definition of the plug_new method as
deprecated. The possibility of using it without accepting the link_up
parameter will be removed in the "W" release of OpenStack.
[1] https://review.opendev.org/#/c/707406/
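A minimal sketch of the compatibility idea, assuming only a driver
object exposing plug_new() (helper name and warning text are
illustrative, not the actual base class code):

    import inspect
    import warnings

    def call_plug_new(driver, *args, link_up=True, **kwargs):
        # Illustrative shim: only pass link_up when the driver's signature
        # accepts it.
        params = inspect.signature(driver.plug_new).parameters
        if 'link_up' in params:
            return driver.plug_new(*args, link_up=link_up, **kwargs)
        # Legacy 3rd-party driver: call the old signature but warn that it
        # is deprecated and will stop working in the "W" release.
        warnings.warn("plug_new() without the link_up parameter is "
                      "deprecated", DeprecationWarning)
        return driver.plug_new(*args, **kwargs)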
Change-Id: Icd555987a1a57ca0b31fa7e4e830583d6c69c861
Closes-Bug: #1879307
(cherry picked from commit 30d573d5ab)
(cherry picked from commit 9c242a0329)
(cherry picked from commit bc8c38bda8)
Although notify_nova_on_port_status_changes defaults to true, it
could be set to false, making the nova_notifier attribute unsafe to
use without checking.
This patch checks both the config option and that the attribute
exists, since the config could be changed after the plugin is
already initialized without the nova_notifier attribute being set.
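The guard boils down to something like the following sketch (the option
name comes from the message above; the surrounding plugin code is
assumed):

    from oslo_config import cfg

    def can_notify_nova(plugin):
        # Only touch nova_notifier when the option is enabled AND the
        # attribute actually exists, since the option may have been flipped
        # after the plugin was initialized without the notifier being set.
        return (cfg.CONF.notify_nova_on_port_status_changes
                and hasattr(plugin, 'nova_notifier'))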
Change-Id: Ide0f93275e60dffda10b7da59f6d81c5582c3849
Closes-bug: #1843269
(cherry picked from commit ab4320edb4)
The pycodestyle 2.6.0 version introduces some checks that cause failures
with the current code. To avoid that, cap pycodestyle to a
version that has been tested without errors.
In Rocky we already had tests with pycodestyle, but a proper listing in
test-requirements.txt was only added in
I33be4f5d4ae48c6bd48d80e3f1185ef8307a2a0c
Conflicts:
test-requirements.txt
Change-Id: I00a35884b14af3e2cf751c04312c847ecfe658c7
(cherry picked from commit 719cae183a)
[0] introduced the concept of connected routers: routers that are
connected to the same subnets. When an L3 agent is syncing a router
with connected routers, the data of the entire set should be returned
to the agent by the Neutron server.
However, if an agent tries to sync a router with
no connected routers while the same agent has other routers that are
connected among them, the Neutron server returns both the former and
the latter. For details of how this bug can manifest itself, please
see [1].
This change prevents this situation: only the synced router is
returned.
[0] https://review.opendev.org/#/c/597567
[1] https://bugs.launchpad.net/neutron/+bug/1838449/comments/15
Change-Id: Ibbf35d0f4a0bf9281f0bc8c411e8527eed75361d
Closes-Bug: #1838449
(cherry picked from commit 48ea7da6c5)
In order to reduce the number of elements retrieved from the DB, this
patch, before processing the VLAN allocations per physical network,
deletes those records belonging to any unconfigured physical network.
The VLAN records per physical network are deleted using a bulk delete
operation, to speed up the process.
The missing VLAN records per network are now created using a bulk
insert operation, available in the ORM. This bulk operation speeds up
the sync process.
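An illustrative SQLAlchemy sketch of the two bulk operations (model and
column names are assumptions, not the actual type_vlan code):

    def sync_vlan_allocations(session, VlanAllocation, configured_physnets,
                              missing_allocations):
        # Bulk-delete allocations that belong to unconfigured physnets.
        session.query(VlanAllocation).filter(
            ~VlanAllocation.physical_network.in_(configured_physnets)
        ).delete(synchronize_session=False)
        # Bulk-insert the missing (physnet, vlan) allocation rows in one go.
        session.bulk_insert_mappings(
            VlanAllocation,
            [{'physical_network': net, 'vlan_id': vlan, 'allocated': False}
             for net, vlan in missing_allocations])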
Conflicts:
neutron/plugins/ml2/drivers/type_vlan.py
Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
Partial-Bug: #1862178
(cherry picked from commit 016e7826f1)
(cherry picked from commit 651eb12bec)
(cherry picked from commit 4fff732b76)
Patch [1] introduced a new mechanism which only brings interfaces UP
on the master node of an HA router. It works fine with keepalived 1.x,
but it is broken when keepalived 2.x is used (e.g. on CentOS 8), as
in this new version of keepalived all interfaces of VIPs
and routes are tracked by default, and if one of them is DOWN,
keepalived goes into FAULT state. Because of that the router will
never be transitioned to MASTER on any node.
This patch fixes it by adding the "no_track" option to all VIPs
and routes in keepalived's config file.
This "no_track" option isn't added to the HA interface, so that one
is still tracked by keepalived.
[1] https://review.opendev.org/#/c/707406/
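The resulting keepalived entries look roughly like the lines rendered by
this sketch (the rendering helper and device name are made up for
illustration; the real agent builds this via its keepalived config
objects):

    def render_vip(ip_cidr, device, track=False):
        # VIPs and routes get "no_track" appended; entries on the HA
        # interface keep being tracked, so they would use track=True.
        line = '%s dev %s' % (ip_cidr, device)
        if not track:
            line += ' no_track'
        return line

    print(render_vip('10.0.0.1/24', 'qg-gateway-port', track=False))
    # -> 10.0.0.1/24 dev qg-gateway-port no_track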
Closes-bug: #1874211
Change-Id: Ic16cf83fe1d1576d91047adb2d4f9e07d57185b6
(cherry picked from commit dc9084a8ec)
Operators may want to see how long the port
processing procedure takes, since DEBUG logging is basically
not enabled in production environments.
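A minimal sketch of the measurement (names are illustrative): record the
elapsed time and emit it at INFO level so it is visible without DEBUG:

    import time
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def process_ports(port_info):
        start = time.time()
        # ... the actual port processing would happen here ...
        LOG.info("Processed ports %(ports)s in %(elapsed).3f seconds",
                 {'ports': port_info, 'elapsed': time.time() - start})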
Related-Bug: #1813703
Related-Bug: #1813707
Related-Bug: #1813706
Related-Bug: #1813709
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I43733546abf5421d0e3f4cd5a959d279e1b89d1e
(cherry picked from commit 8e73de8bc4)
Retrieving the SG rules currently uses the admin context. This allows
retrieving all possible rules, independently of the calling user. The
filters passed and the RBAC policies then filter those results,
returning only:
- The SG rules belonging to the user.
- The SG rules belonging to an SG owned by the user.
However, if the SG list is too long, the query can take a lot of time.
Instead of this, the filtering is now done in the DB query. If no filters
are passed to "get_security_group_rules" and the context is not the
admin context, only the rules listed above will be retrieved.
Because overriding the method "get_objects" is too complex, an
intermediate query is done to retrieve the SG rule IDs. Those IDs
are then used as a filter in the "get_objects" call.
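A hedged sketch of the intermediate query (model and attribute names are
assumptions, not the actual securitygroup object code):

    from sqlalchemy import or_

    def get_tenant_rule_ids(context, rule_model, sg_model):
        # Rules owned by the caller, plus rules of SGs owned by the caller.
        query = context.session.query(rule_model.id).join(
            sg_model, rule_model.security_group_id == sg_model.id).filter(
                or_(rule_model.project_id == context.project_id,
                    sg_model.project_id == context.project_id))
        return [row.id for row in query]

The returned IDs can then be passed as the "id" filter of the regular
get_objects() call.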
Conflicts:
neutron/objects/securitygroup.py
neutron/tests/unit/db/test_securitygroups_db.py
neutron/tests/unit/objects/test_securitygroup.py
Closes-Bug: #1863201
Change-Id: I25d3da929f8d0b6ee15d7b90ec59b9d58a4ae6a5
(cherry picked from commit d874c46bff)
(cherry picked from commit d3905264b7)
(cherry picked from commit 61dc621c1b)
Version 4.15 of iproute2 added support
for the chain index in tc_filter [1].
That version is available e.g. in Ubuntu 18.04, and it
has to be supported in the l3_tc_lib regex to properly match
the output of the "tc filter" command.
[1] https://lwn.net/Articles/745643/
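The regex change can be illustrated like this (the pattern and the
sample strings are synthetic, not the exact l3_tc_lib pattern or
verbatim tc output): the "chain <index>" token printed by newer iproute2
simply becomes optional:

    import re

    pattern = re.compile(
        r'u32\s+(?:chain\s+\d+\s+)?fh\s+(?P<filter_id>\S+)')

    assert pattern.search('u32 fh 800:')          # pre-4.15 style
    assert pattern.search('u32 chain 0 fh 800:')  # 4.15+ style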
Closes-bug: #1809497
Change-Id: Id4066b5cff933ccd0dd3c751bf67b5d58af662d1
(cherry picked from commit e788d29458)
This patch is the first one of a series of patches improving how the L3
agents update the router HA state to the Neutron server.
This patch partially reverts the previous patch [1]. When the batch
notifier sends events, it calls the callback method passed during the
initialization, in this case AgentMixin.notify_server. The batch
notifier spawns a new thread in charge of sending the notifications and
then waits the specified "batch_interval" time. If the callback method
is not synchronous with the notify thread execution (which is what [1]
implemented), the thread can finish while the RPC client is still
sending the HA router states. If another HA state update is received,
then both updates can be executed at the same time. It is possible then
that a new router state can be overwritten with an old one that has not
yet been sent or processed.
The batch notifier is refactored, to improve what was initially
implemented [2] and then updated [3]. Currently, each new event thread
can update the "pending_events" list. Then, a new thread is spawned to
process this event list. This thread decouples the current execution
from the calling thread, making the event processing a non-blocking
process.
But with the current implementation, each new event will spawn a new
thread, synchronized with the previous and new ones (using a
synchronized decorator). That means, during the batch interval time, the
system can have as many threads waiting as new events received. Those
threads end sequentially as the previous threads finish their batch
interval sleep time.
Instead of this, this patch receives and enqueues each new event and
allows only one thread to be alive while processing the event list. If,
at the end of the processing loop, new events are stored, the thread
will process them.
[1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
[2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
[3] I82f403441564955345f47877151e0c457712dd2f
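Conceptually the new behaviour is close to this simplified sketch (plain
threading instead of the agent's eventlet machinery, and all names are
illustrative): events are only enqueued, and at most one worker drains
the queue, looping again if new events arrived during the batch
interval:

    import threading
    import time

    class BatchNotifier(object):
        def __init__(self, batch_interval, callback):
            self.pending_events = []
            self.batch_interval = batch_interval
            self.callback = callback
            self._lock = threading.Lock()
            self._running = False

        def queue_event(self, event):
            with self._lock:
                self.pending_events.append(event)
                if not self._running:
                    self._running = True
                    threading.Thread(target=self._process).start()

        def _process(self):
            while True:
                time.sleep(self.batch_interval)
                with self._lock:
                    batch, self.pending_events = self.pending_events, []
                if batch:
                    self.callback(batch)
                with self._lock:
                    if not self.pending_events:
                        self._running = False
                        return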
Partial-Bug: #1837635
Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
(cherry picked from commit 8b7d2c8a93)
The L3 router will set its devices' links up by default.
For HA routers, the gateway device will be plugged
on all scheduled hosts. When the gateway device is
up on a backup node, it will send out IPv6-related
packets (MLDv2) depending on some kernel config.
This will cause the physical fabric to think that the
gateway MAC is now working on the backup node. And
finally the master node's L3 traffic will be broken.
This patch sets the backup gateway device link down
by default. When VRRP sets the master state on
one host, the L3 agent state change procedure will
bring the gateway device link up.
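A hedged sketch of the link handling using neutron's ip_lib (device and
namespace names are just examples, not the router_info code):

    from neutron.agent.linux import ip_lib

    def set_gateway_link_state(gw_device_name, namespace, is_master):
        device = ip_lib.IPDevice(gw_device_name, namespace=namespace)
        if is_master:
            # Master transition: bring the gateway device up.
            device.link.set_up()
        else:
            # Backup node: keep the gateway device link down so it does not
            # emit MLDv2 packets onto the fabric.
            device.link.set_down()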
Conflicts:
neutron/agent/l3/router_info.py
Closes-Bug: #1859832
Change-Id: I8dca2c1a2f8cb467cfb44420f0eea54ca0932b05
(cherry picked from commit c52029c39a)
(cherry picked from commit b9a2968100)