When an HA router is created in "standby" mode, IPv6 forwarding is
disabled by default in its namespace.
But when the router transitions to "master" on a node, IPv6
forwarding should be enabled. This was fine for routers with a
configured gateway, but we missed the case where the router doesn't
have a gateway configured.
Because the IPv6 forwarding setting was missing in that case, IPv6
W-E traffic between two subnets did not work in the L3 HA case.
This patch fixes it by always configuring ipv6_forwarding on the
"all" interface in the router's namespace, even if it doesn't have
Since iptables-restore doesn't support --dport with protocol vrrp,
it errors out setting the security groups on the hypervisor.
Marking this a partial fix, since we need a change to prevent
adding those incompatible rules in the first place, but this
patch will stop the bleeding.
Removing an active or a standby HA router from an agent that has a
valid DVR serviceable port (such as DHCP) does not remove the
HA interface associated with the router in the SNAT namespace.
When we then add the HA router back to the agent, it adds more
than one HA interface to the SNAT namespace, causing further
problems; we sometimes also see multiple active routers.
This bug might have been introduced by this patch .
Fix the problem by adding the router namespaces without HA
interfaces when there is no HA, and re-inserting the HA interfaces
into the namespace when the HA router is bound to the agent.
When the L3 agent runs in dvr_snat mode on a compute node (as it
does, for example, in some of the gate jobs), the same router may be
scheduled in standby mode on a compute node that also hosts an
instance connected to it.
In that case the metadata proxy needs to be spawned in the router
namespace even if the router is in standby mode.
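The condition described above can be sketched as a small predicate.
This is only an illustration: the function name, its parameters and
the state strings are hypothetical and are not taken from the agent
code.

```python
def should_spawn_metadata_proxy(ha_state, agent_mode, has_local_instance):
    """Decide whether the metadata proxy should run in the router namespace.

    All names here are hypothetical, chosen for illustration only.
    """
    # An active ("master") router always serves metadata.
    if ha_state == "master":
        return True
    # A standby router still needs the proxy when the agent runs in
    # dvr_snat mode on a node that hosts an instance behind this router.
    return agent_mode == "dvr_snat" and has_local_instance
```

With this rule a standby router on a plain network node keeps its
proxy disabled, while a standby router on a dvr_snat compute node
with a connected instance gets one.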
In some cases on a DVR HA router, it may happen that
RouterInfo.radvd.disable() will be called even if the
radvd DaemonMonitor wasn't initialized earlier and it is
To prevent an exception in such a case, this patch adds a check
that DaemonMonitor is not None before calling the disable() method on
I noticed in the functional logs that the l3-agent is constantly
logging this message, even when just adding or removing a single
Resizing router processing queue green pool size to: 8
It's misleading as the pool is not being resized, it's still 8,
so let's only log when we're actually changing the pool size.
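A minimal sketch of the "log only on change" idea follows. The class
and method names are hypothetical; only the log message text is taken
from the description above, and the real agent resizes a green pool
rather than storing a plain integer.

```python
import logging

LOG = logging.getLogger(__name__)


class PoolSizeTracker:
    """Hypothetical helper that remembers the current pool size."""

    def __init__(self, initial_size=8):
        self.size = initial_size

    def resize(self, new_size):
        """Apply new_size; return True only if the size really changed."""
        if new_size == self.size:
            # Nothing to do, so nothing is logged either.
            return False
        LOG.info("Resizing router processing queue green pool size to: %d",
                 new_size)
        self.size = new_size
        return True
```

Calling resize(8) on a tracker that is already at 8 returns False and
stays silent; the message is emitted only for a real change.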
Add minimum egress bandwidth support for Open vSwitch.
The scope of this implementation is reduced to N/S traffic.
There is no QoS applied on traffic between VMs.
The QoS rules are applied to exit ports in bridges other than
br-int; that means all physical bridges. No tunneled traffic
will be shaped. This feature will be implemented in a following
Need to pass centralized floating IPs as preserve_ips to
_external_gateway_added during DVR router update.
Otherwise IP addresses will be deleted from the gw device in certain cases.
The case is when a router with active centralized floating IPs is
being scheduled to a new dvr_snat L3 agent (rescheduled from a down one).
Please see corresponding traces in the bug description.
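To illustrate why the preserved addresses matter, here is a rough
sketch. It assumes that preserve_ips acts as a list of addresses that
are exempt from cleanup on the gateway device; the function and
parameter names are hypothetical and do not mirror the agent code.

```python
def addresses_to_delete(current_cidrs, wanted_cidrs, preserve_cidrs=()):
    """Return the addresses a cleanup pass would remove from a device.

    Hypothetical helper: an address is kept if it is either explicitly
    wanted or listed as preserved.
    """
    keep = set(wanted_cidrs) | set(preserve_cidrs)
    return sorted(cidr for cidr in current_cidrs if cidr not in keep)


# Example addresses are made up for illustration.
on_device = ["10.0.0.5/24", "172.24.4.10/32"]   # gateway IP + floating IP
gateway_only = ["10.0.0.5/24"]

# Without preserve_cidrs the floating IP is scheduled for deletion.
lost = addresses_to_delete(on_device, gateway_only)
# With the floating IP preserved nothing is deleted.
kept = addresses_to_delete(on_device, gateway_only, ["172.24.4.10/32"])
```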
In ip_lib.get_devices_info(), privileged.get_link_devices() can return
devices with links not present in this namespace or not listed. In this
situation, get_devices_info() will always try to find the device to set
the parameter "parent_name", which triggers an exception.
This patch solves this issue by not populating "parent_name"
if the link device is not present in the devices list.
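The guarded lookup can be sketched as follows. This is a simplified
stand-in that works on plain dicts; the function name and the dict
keys are assumptions and do not reproduce the real data returned by
privileged.get_link_devices().

```python
def add_parent_names(devices):
    """Set "parent_name" only when the parent link is in the device list.

    devices is a list of dicts with "index", "name" and an optional
    "parent_index" key (a hypothetical, simplified layout).
    """
    by_index = {dev["index"]: dict(dev) for dev in devices}
    for dev in by_index.values():
        parent_index = dev.get("parent_index")
        if parent_index is None:
            continue
        parent = by_index.get(parent_index)
        # The parent may live in another namespace or not be listed;
        # in that case "parent_name" is simply left unset.
        if parent is not None:
            dev["parent_name"] = parent["name"]
    return list(by_index.values())
```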
In the OVS agent, when setting up the ancillary bridges, the parameter
external_id:bridge-id is retrieved. If this parameter is not defined
(e.g.: manually created bridges), ovsdbapp writes an error in the logs.
This information is irrelevant and can cause confusion during debugging.
If the l3-agent was restarted by a regular action, such as a config
change, a package upgrade or a manual service restart, we should not
set the HA port down. That should only happen when the physical host
was rebooted, i.e. the VRRP processes were all terminated.
This patch adds a new RPC call during l3 agent init that first
retrieves the HA router count. It then compares the VRRP process
(keepalived) count and the 'neutron-keepalived-state-change' count
with the hosting router count. If the counts match, the action that
sets the HA port to the 'DOWN' state is not triggered.
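The comparison can be sketched as below. The function names are
hypothetical, and the number of keepalived processes per router is
not stated in this message, so it is exposed as a parameter with an
assumed default of one.

```python
def ha_processes_intact(ha_router_count, keepalived_count,
                        state_change_count, keepalived_per_router=1):
    """Return True if the process counts match the hosted HA routers.

    keepalived_per_router is an assumption; adjust it to the number of
    keepalived processes actually spawned per router.
    """
    return (keepalived_count == ha_router_count * keepalived_per_router
            and state_change_count == ha_router_count)


def should_set_ha_ports_down(ha_router_count, keepalived_count,
                             state_change_count, keepalived_per_router=1):
    """Only force the HA ports DOWN when the processes did not survive."""
    return not ha_processes_intact(ha_router_count, keepalived_count,
                                   state_change_count,
                                   keepalived_per_router)
```

After a plain agent restart the counts still match, so the ports are
left alone; after a host reboot both process counts are zero while
the router count is not, so the ports are set DOWN.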
There is a race condition between nova-compute booting an instance and
the l3-agent processing the DVR (local) router on a compute node. The
issue can be seen when a large number of instances are booted on the
same host and the instances are under different DVR routers, so the
l3-agent processes all these DVR routers on this host concurrently.
For now we have a green pool for the router ResourceProcessingQueue
with 8 greenlets, but some of these routers can still be waiting. Even
worse, there are time-consuming actions during the router
processing procedure, for instance installing ARP entries, iptables
rules, route rules etc.
So when the VM is up, it will try to get metadata via the local proxy
hosted by the DVR router, but the router is not ready yet on that
host. As a result, those instances will not be able to set up some
config in the guest OS.
This patch adds a new measurement based on the router quantity to
determine the L3 router process queue green pool size. The pool size
is limited to the range from 8 (the original value) to 32, because we
do not want the L3 agent to cost too much host resource on processing
routers in the
To prevent data from being out of sync in the following situations:
1. Create a policy with two rules bound to the virtual machine
2. Stop l2-agent
3. Delete/change/clear policy rule
4. Start l2-agent (the rule is still there, out-of-sync)
The RouterInfo class has an internal_ports cache which is updated
in the _process_internal_ports() method.
There was an issue in this update logic: it iterated with
enumerate() over the local variable "internal_ports", which
represents the current router ports, and if such a current port
was found in the updated_ports list it was stored in the
RouterInfo().internal_ports variable under the same index at which
it was found in the "internal_ports" local variable.
This sometimes leads to an issue because the same port can be
stored under different indexes in the internal_ports and
RouterInfo().internal_ports lists, so the wrong port in
RouterInfo().internal_ports was overwritten.
This issue leads to a problem with generating the radvd config file:
the ports cache list contained duplicate info about the same port,
so the radvd config file contained duplicate interface definitions too.
This should be properly fixed by changing RouterInfo.internal_ports
to be a dict instead of a list of ports, but such a patch would be much
bigger and (possibly) harder to backport to stable branches.
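The index mismatch can be reproduced with a small stand-alone
sketch. Both functions below are hypothetical simplifications that
work on plain dicts; they are not the code of
_process_internal_ports(), and the second one shows just one possible
way to look up the cache position by port id.

```python
def update_cache_by_shared_index(cache, current_ports, updated_ports):
    """Buggy variant: reuses the index from current_ports for the cache."""
    cache = list(cache)
    for index, port in enumerate(current_ports):
        if port["id"] in updated_ports:
            cache[index] = updated_ports[port["id"]]
    return cache


def update_cache_by_port_id(cache, current_ports, updated_ports):
    """Safer variant: finds the cached entry by port id before replacing."""
    cache = list(cache)
    for port in current_ports:
        new_port = updated_ports.get(port["id"])
        if new_port is None:
            continue
        for index, cached in enumerate(cache):
            if cached["id"] == port["id"]:
                cache[index] = new_port
                break
    return cache


port_a = {"id": "a", "mtu": 1500}
port_b = {"id": "b", "mtu": 1500}
cache = [port_a, port_b]            # cached order
current = [port_b, port_a]          # same ports, different order
updated = {"b": {"id": "b", "mtu": 9000}}

buggy = update_cache_by_shared_index(cache, current, updated)
fixed = update_cache_by_port_id(cache, current, updated)
```

In the buggy variant port "a" is overwritten and port "b" appears
twice in the cache, which is the kind of duplicate that ends up in
the radvd config file.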
The neutron.common.rpc module has been in neutron-lib for a while now and
neutron is shimmed to use neutron-lib already.
This patch removes neutron.common.rpc and switches the code over to use
neutron-lib's implementation where needed.