When an HA router is created in "standby" mode, IPv6 forwarding is
disabled by default in its namespace.
When the router transitions to "master" on a node, IPv6 forwarding
should be enabled. This worked for routers with a configured gateway,
but the case of a router without a gateway was missed.
Because the IPv6 forwarding setting was missing in that case, IPv6
W-E traffic between two subnets did not work in the L3 HA case.
This patch fixes it by always configuring ipv6_forwarding on the
"all" interface in the router's namespace, even if the router does
not have a gateway configured.
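As a rough illustration of the setting involved, the sketch below builds
the command that toggles IPv6 forwarding on the "all" interface inside a
namespace. The helper name and the namespace name in the usage line are
hypothetical and are not taken from the neutron code; only the sysctl key
net.ipv6.conf.all.forwarding and the "ip netns exec" invocation are
standard.

```python
def build_ipv6_forwarding_cmd(namespace, enabled=True):
    """Return the argv list that toggles IPv6 forwarding in a namespace.

    Hypothetical helper for illustration only; it builds the command
    and does not execute it.
    """
    value = 1 if enabled else 0
    return ['ip', 'netns', 'exec', namespace,
            'sysctl', '-w', 'net.ipv6.conf.all.forwarding=%d' % value]


# Usage: the namespace name here is made up.
cmd = build_ipv6_forwarding_cmd('qrouter-example')
```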
Change-Id: I8b1b2b426f7a26a4b2407a83f9bf29dd6e9ba7b0
Closes-Bug: #1818224
Since iptables-restore doesn't support --dport with protocol vrrp,
it errors out setting the security groups on the hypervisor.
Marking this a partial fix, since we need a change to prevent
adding those incompatible rules in the first place, but this
patch will stop the bleeding.
Change-Id: If5e557a8e61c3aa364ba1e2c60be4cbe74c1ec8f
Partial-Bug: #1818385
Removing an active or a standby HA router from an agent that has a
valid DVR serviceable port (such as DHCP) does not remove the
HA interface associated with the router in the SNAT namespace.
When we try to add the HA router back to the agent, it adds more
than one HA interface to the SNAT namespace, causing further
problems; we sometimes also see multiple active routers.
This bug might have been introduced by this patch [1].
Fix the problem by adding the router namespaces without HA
interfaces when there is no HA, and re-inserting the HA interfaces
into the namespace when the HA router is bound to the agent.
[1] https://review.openstack.org/#/c/522362/
Closes-Bug: #1816698
Change-Id: Ie625abcb73f8185bb2bee06dcd26a01d8af0b0d1
When the L3 agent is running in dvr_snat mode on a compute node
(as it is, for example, in some of the gate jobs), the same router
may be scheduled in standby mode on a compute node that also hosts
an instance connected to it.
In such a case the metadata proxy needs to be spawned in the router
namespace even if the router is in standby mode.
Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
Closes-Bug: #1817956
Closes-Bug: #1606741
In some cases on a DVR HA router, RouterInfo.radvd.disable() may be
called even if the radvd DaemonMonitor wasn't initialized earlier
and is None.
To prevent an exception in such a case, this patch adds a check that
the DaemonMonitor is not None before calling the disable() method on
it.
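A minimal sketch of such a guard is shown below. The classes are
simplified stand-ins invented for illustration, not the real RouterInfo
or DaemonMonitor from neutron; only the "is not None" check before
calling disable() reflects the description above.

```python
class FakeDaemonMonitor:
    """Hypothetical stand-in for the radvd DaemonMonitor."""

    def __init__(self):
        self.disabled = False

    def disable(self):
        self.disabled = True


class RouterInfoSketch:
    """Hypothetical stand-in for RouterInfo with an optional radvd."""

    def __init__(self, radvd=None):
        self.radvd = radvd

    def disable_radvd(self):
        # Only call disable() if the monitor was actually initialized.
        if self.radvd is not None:
            self.radvd.disable()
            return True
        return False
```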
Change-Id: Ib9b5f4eeae6e4cebcb958928e6521cf1d69b049c
Closes-Bug: #1817435
I noticed in the functional logs that the l3-agent is constantly
logging this message, even when just adding or removing a single
router:
Resizing router processing queue green pool size to: 8
It's misleading, as the pool is not being resized (it's still 8),
so let's only log when we're actually changing the pool size.
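The idea can be sketched as follows. The class and method names are
hypothetical and do not come from the l3-agent code; the point is only
that the message is emitted when the requested size differs from the
current one.

```python
import logging

LOG = logging.getLogger(__name__)


class PoolSketch:
    """Hypothetical holder for a green pool size."""

    def __init__(self, size=8):
        self.size = size

    def resize(self, new_size):
        # Nothing to do (and nothing to log) if the size is unchanged.
        if new_size == self.size:
            return False
        LOG.info("Resizing router processing queue green pool size to: %d",
                 new_size)
        self.size = new_size
        return True
```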
Change-Id: I5dc42fa4b4c1964b7d027681b61550cd82e83234
Add minimum egress bandwidth support for Open vSwitch.
The scope of this implementation is reduced to N/S traffic.
There is no QoS applied on traffic between VMs.
The QoS rules are applied to exit ports in bridges other than
br-int; that means all physical bridges. No tunneled traffic
will be shaped. Shaping of tunneled traffic will be implemented
in a following patch.
Partial-Bug: #1560963
Change-Id: I0a2ef52b13151a39e678e9a3e6f75babb47298d0
Centralized floating IPs need to be passed as preserve_ips to
_external_gateway_added during a DVR router update.
Otherwise the IP addresses will be deleted from the gw device in a
certain case: when a router with active centralized floating IPs is
being scheduled to a new dvr_snat L3 agent (rescheduled from a down one).
Please see the corresponding traces in the bug description.
Change-Id: Iaeb9fbed73144df6fcd9092c665ed19986e85f4d
Closes-bug: #1817306
The listed revision no longer supports python2, but afaik, we are
always running under python3 for those tests anyway.
Change-Id: Iba94d73eeb65fb21f5d098afe0fbe4348dbea850
Now that the ip_lib.get_devices_info() function is implemented using
pyroute2, "vlan_in_use" and "vxlan_in_use" can make use of it.
Change-Id: I82a2c3ea76195b10880cf37bf2229341b995b0ae
Closes-Bug: #1815498
Consume the rehomed constant. Remove the constant from neutron.
Change-Id: Ia5d6ec8b66344c0c0c2d1588f8c1215c6c2b1cbe
Depends-On: https://review.openstack.org/631795
Related-Bug: #1811639
In ip_lib.get_devices_info(), privileged.get_link_devices() can return
devices with links not present in this namespace or not listed. In this
situation, get_devices_info() will always try to find the device to set
the parameter "parent_name", which will trigger an exception.
This patch solves this issue by not populating "parent_name"
if the link device is not present in the devices list.
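A simplified sketch of that behaviour follows. The function name and
the dictionary keys "index" and "parent_index" are assumptions made for
this example, not the actual structures used in ip_lib; only
"parent_name" comes from the description above.

```python
def set_parent_names(devices):
    """Populate 'parent_name' only when the parent link is in the list.

    'devices' is a list of dicts with hypothetical keys 'index', 'name'
    and, optionally, 'parent_index'.
    """
    names_by_index = {dev['index']: dev['name'] for dev in devices}
    for dev in devices:
        parent_index = dev.get('parent_index')
        # Skip devices whose parent link is absent from this namespace.
        if parent_index in names_by_index:
            dev['parent_name'] = names_by_index[parent_index]
    return devices
```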
Change-Id: Ic5c7d9008a11da5c406dc383cfdae2892a3118d8
Closes-Bug: #1815758
In the OVS agent, when setting up the ancillary bridges, the parameter
external_id:bridge-id is retrieved. If this parameter is not defined
(e.g. on manually created bridges), ovsdbapp writes an error in the logs.
This information is irrelevant and can cause confusion during debugging.
Change-Id: Ic85db65f651eb67fcb56b937ebe5850ec1e8f29f
Closes-Bug: #1815912
If the l3-agent was restarted by a regular action, such as a config
change, a package upgrade or a manual service restart, we should not
set the HA port down. It should only be set down if the physical host
was rebooted, i.e. the VRRP processes were all terminated.
This patch adds a new RPC call during l3 agent init which first
retrieves the HA router count. The agent then compares the VRRP process
(keepalived) count and the 'neutron-keepalived-state-change' count
with the hosting router count. If the counts match, the action that
sets the HA port to the 'DOWN' state is no longer triggered.
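The comparison described above can be sketched as a small predicate.
The function and parameter names are hypothetical; how the counts are
actually collected (the RPC call and the process inspection) is left
out.

```python
def should_set_ha_ports_down(ha_router_count, keepalived_count,
                             state_change_count):
    """Return True if the HA ports should be set to DOWN on agent init.

    If both process counts match the hosted HA router count, the VRRP
    processes are assumed to have survived the agent restart, so the
    ports are left alone.
    """
    counts_match = (ha_router_count == keepalived_count ==
                    state_change_count)
    return not counts_match
```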
Closes-Bug: #1798475
Change-Id: I5e2bb64df0aaab11a640a798963372c8d91a06a8
There is a race condition between nova-compute booting an instance and
the l3-agent processing the DVR (local) router on the compute node. The
issue can be seen when a large number of instances are booted on the
same host and the instances are under different DVR routers, so the
l3-agent has to process all of these DVR routers on this host
concurrently.
For now we have a green pool for the router ResourceProcessingQueue
with 8 greenlets, but some of these routers can still be left waiting.
Even worse, there are time-consuming actions in the router processing
procedure, for instance installing ARP entries, iptables rules, route
rules etc.
So when the VM is up, it will try to get metadata via the local proxy
hosted by the DVR router, but the router is not ready yet on that
host. As a result, those instances will not be able to set up some
config in the guest OS.
This patch adds a new measurement based on the router quantity to
determine the L3 router processing queue green pool size. The pool size
is limited to the range from 8 (the original value) to 32, because we
do not want the L3 agent to consume too many host resources processing
routers on the compute node.
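One way to express such a bounded pool size is sketched below. The
constant and function names are made up for this example, and the
assumption that the size simply follows the router count between the
two bounds is an illustration, not a statement of the exact formula
used by the patch.

```python
# Hypothetical names; 8 and 32 are the bounds given in the text above.
POOL_SIZE_MIN = 8
POOL_SIZE_MAX = 32


def compute_pool_size(router_count):
    """Clamp the green pool size to [POOL_SIZE_MIN, POOL_SIZE_MAX]."""
    return max(POOL_SIZE_MIN, min(router_count, POOL_SIZE_MAX))
```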
Related-Bug: #1813787
Change-Id: I62393864a103d666d5d9d379073f5fc23ac7d114
To prevent data from being out of sync in the following situations:
1. Create a policy with two rules bound to the virtual machine
2. Stop l2-agent
3. Delete/change/clear policy rule
4. Start l2-agent (the rule is still there, out-of-sync)
Change-Id: I194c918d859172c31ae5ce1af925fdbb388f9cfb
Closes-Bug: #1812576
The RouterInfo class has an internal_ports cache which is updated
in the _process_internal_ports() method.
There was an issue in this update logic: it iterated with enumerate()
over the local variable "internal_ports", which represents the current
router ports, and if such a current port was found in the updated_ports
list, it was stored in RouterInfo().internal_ports under the same index
at which it was found in the local "internal_ports" variable.
This sometimes leads to an issue because the same port can be
stored under different indexes in the internal_ports and
RouterInfo().internal_ports lists, so the wrong port in
RouterInfo().internal_ports was overwritten.
This led to a problem with generating the radvd config file:
the ports cache list contained duplicate info about the same port,
so the radvd config file contained duplicate interface definitions too.
This should be properly fixed by changing RouterInfo.internal_ports
to be a dict instead of a list of ports, but such a patch would be much
bigger and (possibly) harder to backport to stable branches.
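The index mismatch can be avoided by locating each port in the cache by
its id instead of reusing the index from another list. The sketch below
shows that idea on plain dicts; the function name and the "id" key are
assumptions for illustration, and this is not necessarily how the patch
itself updates the cache.

```python
def update_cached_ports(cached_ports, updated_ports):
    """Replace cached entries in place, matching ports by their id.

    The index used for the write comes from the cache list itself, so a
    different ordering in another list cannot overwrite the wrong port.
    """
    updated_by_id = {port['id']: port for port in updated_ports}
    for index, port in enumerate(cached_ports):
        if port['id'] in updated_by_id:
            cached_ports[index] = updated_by_id[port['id']]
    return cached_ports
```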
Change-Id: I2e38457942518c8a3e07e606091bb6720317b77e
Closes-Bug: #1813279
TcCommand.set_tbf_bw_limit() is used now to set and replace a TC TBF
filter.
Related-Bug: #1560963
Change-Id: I162dea499d16db76692dd3d6d99b6be45f44ae59
The neutron.common.rpc module has been in neutron-lib for a while now and
neutron is shimmed to use neutron-lib already.
This patch removes neutron.common.rpc and switches the code over to use
neutron-lib's implementation where needed.
NeutronLibImpact
Change-Id: I733f07a8c4a2af071b3467bd710290eee11a4f4c