When the L3 agent gets a router update notification, it tries to
retrieve the router info from the Neutron server. If the message
queue is down or unreachable at that moment, it gets message-queue
related exceptions and the resync actions are run. A RabbitMQ
cluster is sometimes not easy to recover, and a long MQ recovery
time means the router info sync RPC never succeeds before the
maximum retry count is reached. Then the bad thing happens: the L3
agent tries to remove the router, which basically shuts down all
existing L3 traffic of that router.
This patch removes the final router removal action and lets the
router keep running as it is.
Closes-Bug: #1871850
Change-Id: I9062638366b45a7a930f31185cd6e23901a43957
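The changed behavior can be sketched as follows; names like
process_router_update, schedule_resync and MAX_RETRIES are illustrative
stand-ins, not the actual Neutron L3 agent code:

```python
import logging

LOG = logging.getLogger(__name__)
MAX_RETRIES = 5  # illustrative limit, not the real Neutron constant


def process_router_update(agent, router_id, retries=0):
    """Fetch router info from the server; resync on MQ errors.

    On repeated failure, give up quietly instead of tearing the
    router down, so existing L3 traffic keeps flowing.
    """
    try:
        router = agent.get_router_info(router_id)
    except Exception:  # e.g. the message queue is unreachable
        if retries < MAX_RETRIES:
            agent.schedule_resync(router_id, retries + 1)
        else:
            # Previously the router was removed here, killing all of
            # its traffic; now we leave the router running as-is.
            LOG.warning("Giving up on syncing router %s", router_id)
        return None
    return router
```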
Those 2 jobs have been pretty stable for a long time, so we can make
them voting now.
The 'neutron-functional-with-uwsgi' job also follows the
'neutron-functional' job's failure rate, but both jobs still need some
more stabilization before we make them voting.
This patch also moves some jobs to group similar jobs together.
Change-Id: Icb32776198f9b7fc6adfa287081e3feb4297116d
On heavily loaded environments, like the Neutron gates, we can
observe sporadic failures of functional tests that are timeouts.
Let's increase the timeout value to 15 seconds for functional
tests because it looks like 5 seconds is not enough.
Change-Id: I327de751e3ba26c5be03b2571b105492661999cb
Closes-Bug: 1868110
Only reschedule gateways/update segments when things have changed
that would require those actions.
Co-Authored-By: Terry Wilson <twilson@redhat.com>
Change-Id: I62f53dbd862c0f38af4a1434d453e97c18777eb4
Closes-bug: #1861510
Closes-bug: #1861509
"keepalived_state_change" monitor does not use eventlet but normal
Python threads. When "send_ip_addr_adv_notif" is called from inside
the monitor, the arping command is never sent because the eventlet
thread does not start. In order to be able to be called from this
process, this method should also have an alternative implementation
using "threading".
"TestMonitorDaemon.test_new_fip_sends_garp" is also modified to
actually test the GARP sent. The test was originally implemented with
only one interface in the monitored namespace.
"keepalived_state_change" sends a GARP when a new IP address is added
in a interface other than the monitored one. That's why this patch
creates a new interface and sets it as the monitor interface. When
a new IP address is added to the other interface, the monitor populates
it by sending a GARP through the modified interface [1].
[1] 8ee34655b8/neutron/agent/l3/keepalived_state_change.py (L90)
Change-Id: Ib69e21b4645cef71db07595019fac9af77fefaa1
Closes-Bug: #1870313
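A minimal sketch of the threading-based alternative; the real method
shells out to arping via Neutron's ip_lib, so the `_do_arping` helper
below is a hypothetical stand-in for that call:

```python
import threading

CALLS = []  # records arping invocations, for illustration only


def _do_arping(ns_name, iface_name, address, count):
    """Placeholder for the real arping call (hypothetical helper)."""
    CALLS.append((ns_name, iface_name, address, count))


def send_ip_addr_adv_notif(ns_name, iface_name, address,
                           count=3, use_eventlet=True):
    """Send a gratuitous ARP advertisement for the address.

    keepalived_state_change runs plain Python threads, where an
    eventlet greenthread would never be scheduled, so a
    threading-based code path is offered as well.
    """
    if use_eventlet:
        import eventlet  # only needed on the agent's eventlet path
        eventlet.spawn_n(_do_arping, ns_name, iface_name, address, count)
        return None
    thread = threading.Thread(
        target=_do_arping, args=(ns_name, iface_name, address, count))
    thread.start()
    return thread
```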
Some linux.ip_lib functions make use of "ctypes.CDLL" methods
(create_netns, remove_netns). Those methods are called inside a
"privsep" context; that means the function reference and the
arguments are passed to a privileged context that executes the
method.
The "privsep" library makes use of eventlet to implement multitasking.
If the executed method releases the GIL, nothing guarantees that
the "eventlet" executor will hand it back to this task again. This
could lead to timeouts during the execution of those methods.
From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
"Instances of this class behave like CDLL instances, except that
the Python GIL is not released during the function call, and
after the function execution the Python error flag is checked."
Change-Id: I36ef9bf59e9c93f50464457a5d9a968738844079
Closes-Bug: #1870352
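The difference between the two binding styles can be seen in a small
ctypes sketch; libc's getpid is used purely as a harmless stand-in for
create_netns/remove_netns, which need real privileges:

```python
import ctypes
import ctypes.util
import os

# CDLL releases the GIL around each foreign call; PyDLL keeps the GIL
# held, so an eventlet scheduler cannot switch greenthreads mid-call.
libc_path = ctypes.util.find_library("c")
libc_cdll = ctypes.CDLL(libc_path, use_errno=True)
libc_pydll = ctypes.PyDLL(libc_path)

# Same symbol, two binding styles; both return the current PID.
assert libc_cdll.getpid() == os.getpid()
assert libc_pydll.getpid() == os.getpid()
```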
Just to make it clear in the message, add the
tunnel_ip_version config option to the error
message; otherwise the user has to consult the startup
message to know what the value is set to.
Trivialfix
Change-Id: Ic8d8b0d454e8202d1bfae52b77137eb9071508da
This reverts commit 8ebc635a18.
The reverted commit does a mass update on all subports of a trunk.
This is not in line with the original design, since it causes huge
api-side performance effects.
I think that's the reason why we started seeing gate failures of
rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks
in neutron-rally-task.
Change-Id: I6f0fd91c62985207af8dbf29aae463b2b478d5d2
Closes-Bug: #1870110
Related-Bug: #1700428
When "trunk:subport" wasn't added to the list of device owners which
are supported by dvr, there was no proper config in br-int's openflow
rules for such port, e.g. there was no dvr_to_src_mac rule in table 1
added and traffic from such port was never going through br-int.
Trunk ports should be added to this dvr serviced device owners list and
that patch is adding it there.
Change-Id: Ic21089adfa32dbf5d0e29a89713e6e2bf28f0f05
Closes-Bug: #1870114
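The fix amounts to extending the DVR-serviced device-owner check,
roughly as below; the list is abbreviated and the helper name is a
sketch of the real check, which lives in neutron_lib constants:

```python
# Abbreviated sketch of the DVR-serviced device owners; the real list
# in neutron_lib is longer and also covers DHCP, load balancer, etc.
DVR_SERVICED_DEVICE_OWNERS = (
    "compute:",        # prefix used by Nova instance ports
    "trunk:subport",   # newly added so subports get dvr_to_src_mac rules
)


def is_dvr_serviced(device_owner):
    """Return True if DVR should wire up br-int rules for this owner."""
    return device_owner.startswith(DVR_SERVICED_DEVICE_OWNERS)
```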
When users run keepalived on their instances, they often create an
additional port in Neutron to allocate an IP address which is
then used as a VIP in keepalived and configured in the
allowed_address_pairs of other ports plugged into the instances
running keepalived.
This is e.g. Octavia's use case.
Together with DVR, this caused problems with connectivity to such a
VIP, as it was populated in the router's ARP cache with the MAC
address from the Neutron DB.
As this port isn't bound, it is only a Neutron DB entry, so there is
no need to set it in the ARP cache of the router.
This patch does exactly that by filtering such "unbound" and
"binding_failed" ports from the list.
Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
Closes-Bug: #1869887
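The filtering can be sketched like this; the helper name is
illustrative, and the vif_type values follow Neutron's port binding
vocabulary:

```python
# Binding states that mean the port only exists in the Neutron DB, so
# it must not be pushed into the DVR router's ARP cache.
VIF_TYPE_UNBOUND = "unbound"
VIF_TYPE_BINDING_FAILED = "binding_failed"


def ports_for_arp_cache(ports):
    """Drop unbound/binding_failed ports (e.g. keepalived VIP ports)."""
    return [p for p in ports
            if p.get("binding:vif_type") not in (VIF_TYPE_UNBOUND,
                                                 VIF_TYPE_BINDING_FAILED)]
```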
This patch migrates the OVN migration scripts. At the moment, only
migration from ML2/OVS to ML2/OVN in a TripleO environment is supported.
Co-Authored-By: Miguel Angel Ajo <majopela@redhat.com>
Co-Authored-By: Jakub Libosvar <libosvar@redhat.com>
Co-Authored-By: Daniel Alvarez <dalvarez@redhat.com>
Co-Authored-By: Maciej Józefczyk <mjozefcz@redhat.com>
Co-Authored-By: Numan Siddique <nusiddiq@redhat.com>
Co-Authored-By: Roman Safronov <rsafrono@redhat.com>
Co-Authored-By: Terry Wilson <twilson@redhat.com>
Related-Blueprint: neutron-ovn-merge
Change-Id: I925f4b650209b8807290d6a69440c31fd72a1762
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
We can now revert this patch, because the main cause has already been
fixed in core OVN [1]. With that fix, the ARP responder flows are not
installed in the LS pipeline when the LSP has port security disabled
and an 'unknown' address is set in the addresses column.
This makes MAC spoofing possible.
[1] https://patchwork.ozlabs.org/patch/1258152/
This reverts commit 03b87ad963.
Change-Id: Ie4c87d325b671348e133d62818d99af147d50ca2
Closes-Bug: #1864027
The correct name of the extension is "subnet_dns_publish_fixed_ip",
but in the Neutron docs it was "subnet_dns_publish_fixed_ips".
Change-Id: I52e313766d08879b8163b36f41515ce4afd5c470
Closes-Bug: #1869057
The "old" parameter passed to the handle_ha_chassis_group_changes()
method is a delta object and sometimes it does not contain the
"external_ids" column (because it hasn't changed).
The absence of that column was misleading that method into believing
that the "old" object was no longer a gateway chassis, and that
triggered some changes in the HA group. Changing the HA group resulted
in the SRIOV (external in OVN) ports flapping between the gateway
chassis.
This patch adds a check to verify that the "external_ids" column
has changed before acting on it; otherwise the update is ignored.
Closes-Bug: #1869389
Change-Id: I3f7de633e5546dc78c3546b9c34ea81d0a0524d3
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
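The guard can be sketched as below; the delta is modelled as a simple
object, while the real ovsdbapp event carries a similar partial row,
and the function body is an illustrative reduction of the handler:

```python
def handle_ha_chassis_group_changes(event, row, old):
    """React to a gateway-chassis update only if external_ids moved.

    "old" is a delta: it carries only the columns that changed, so a
    missing "external_ids" attribute means that column did NOT change
    and the update must be ignored, rather than being treated as "no
    longer a gateway chassis".
    """
    if not hasattr(old, "external_ids"):
        return False  # external_ids unchanged: nothing to do
    # ... act on the real external_ids change here ...
    return True
```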
The patch fixes an issue where the same port is requested for multiple
instances and the second one can't get metadata due to a cached
instance_id.
Closes-Bug: 1868867
Change-Id: If6a5866e4406c9c6c30e989c79ffb4ee1a88cecf
mech_driver.OVNMechanismDriver's "_ovn_client" is not a class member
but a read-only property and can't be assigned.
Change-Id: I6fdd9d929e75a6092e0a874b8ffcf283c64b076a
Closes-Bug: #1869342
DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
off the port is deleted, and when an instance is powered on a port is
created. This means a reboot is functionally a super fast
delete-then-create. Neutron trunking mode in combination with DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge. The standard way a
trunk bridge works is that when all the subports are deleted, a thread
is spawned to delete the trunk bridge, because that is an expensive and
time-consuming operation. That means that if the port in question is
the only port on the trunk on that compute node, this happens:
1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated
If the trunk is deleted after #3 happens then the instance has no
networking and is inaccessible; this is the scenario that was dealt with
in a previous change [1]. But there continue to be issues with errors
"RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X". What is
happening in this case is that the trunk is being deleted in the middle
of the execution of #3, so that it stops existing in the middle of the
port creation logic but before the port is actually recreated.
Since this is a timing issue between two different threads, it's
difficult to stamp out entirely, but I think the best way to do it is
to add a slight delay in the trunk deletion thread, just a second or
two.
That will give the port time to come back online and avoid the trunk
deletion entirely.
[1] https://review.opendev.org/623275
Related-Bug: #1869244
Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b