It may happen that subnet is connected to dvr router using IP address
different than subnet's gateway_ip.
So in br-tun arp to dvr router's port should be dropped instead of
dropping arp to subnet's gateway_ip (or mac in case of IPv6).
(cherry picked from commit ae3aa28f5a)
Ovs-agent can process the ports in large sets, then all
of these ports will have to update DB status or attributes.
But neutron server is centralized. It may have to do
something else, or the database processing can be also
time-consuming. Because of these, it sometimes returns
the RPC timeout exception to ovs-agent. And a fullsync
will be triggered in next rpc loop. The restart time is
becoming longer and longer.
Adds a default step to update the port to reduce
the probability of RPC timeout.
(cherry picked from commit 8408af4f17)
(cherry picked from commit d7d30ea950)
(cherry picked from commit 5d705468de)
In order to avoid inaccurate agent_boot_time setting,
this patch suggests to consider agent as "started" only
after completion of initial sync with server.
(cherry picked from commit 8f20963c5b)
In the OVS agent, when setting up the ancillary bridges, the parameter
external_id:bridge-id is retrieved. If this parameter is not defined
(e.g.: manually created bridges), ovsdbapp writes an error in the logs.
This information is irrelevant and can cause confusion during debugging time.
(cherry picked from commit 769e971293)
The native OVS/ofctl controllers talk to the bridges using a
datapath-id, instead of the bridge name. The datapath ID is
auto-generated based on the MAC address of the bridge's NIC.
In the case where bridges are on VLAN interfaces, they would
have the same MACs, therefore the same datapath-id, causing
flows for one physical bridge to be programmed on each other.
The datapath-id is a 64-bit field, with lower 48 bits being
the MAC. We set the upper 12 unused bits to identify each
unique physical bridge
This could also be fixed manually using ovs-vsctl set, but
it might be beneficial to automate this in the code.
ovs-vsctl set bridge <mybr> other-config:datapath-id=<datapathid>
You can change this yourself using above command.
You can view/verify current datapath-id via
ovs-vsctl get Bridge br-vlan datapath-id
(please note that other-config is needed in the set, but not get)
Co-Authored-By: Rodolfo Alonso Hernandez <firstname.lastname@example.org>
(cherry picked from commit 379a9faf62)
(cherry picked from commit c02b1148db)
(cherry picked from commit c7031e2cd3)
Ovs-agent can be very time-consuming in handling a large number
of ports. At this point, the ovs-agent status report may have
exceeded the set timeout value. Some flows updating operations
will not be triggerred. This results in flows loss during agent
restart, especially for hosts to hosts of vxlan tunnel flow.
This fix will let the ovs-agent explicitly, in the first rpc loop,
indicate that the status is restarted. Then l2pop will be required
to update fdb entries.
(cherry picked from commit a5244d6d44)
(cherry picked from commit cc49ab5501)
(cherry picked from commit 5ffca49668)
The dump-flows action will get a very large sets of flow information
if there are enormous ports or openflow security group rules. For now
we can meet some known exception during such action, for instance,
memory issue, timeout issue.
So after this patch, the cleanup action of the bridge stale flows
will be done one table by one table. But note, this only supports
for 'native' OpenFlow interface driver.
(cherry picked from commit f898ffd71f)
Instead of allowing an error to bubble up and exit from rpc_loop, catch
it and assume the switch is dead which will make the agent to wait until
the switch is back without failing the service.
(cherry picked from commit 544597c6ef)
If the switch misbehaves, we may receive None from db_get_val. In this
case, int() on the return value will raise TypeError which is not
expected by callers and may result in ovs agent crash.
Instead of bubbling up the TypeError exception, we raise RuntimeError if
datapath id is None.
(cherry picked from commit 38d0b2b52d)
This fixes race condition leading to lack of fdb entries
on agent after OVS restart, if agent managed to handle all ports
before sending state report with start_flag set to True.
(cherry picked from commit 3995abefb1)
When ovs-vswitchd process is restarted neutron-ovs-agent will
handle it and reconfigure all ports and openflows in bridges.
Unfortunatelly when tunnel networks are used together with
L2pop mechanism driver, this driver will not notice that agent
lost all openflow config and will not send all fdb entries which
should be added on host.
In such case L2pop mechanism driver should behave in same way like
when neutron-ovs-agent is restarted and send all fdb_entries to
This patch adds "simulate" of agent start flag when ovs_restart is
handled thus neutron-server will send all fdb_entries to agent and
tunnels openflow rules can be reconfigured properly.
(cherry picked from commit ae031d1886)
Sometimes due to NIC driver incorrect behavior, VFs might be
missing in 'ip link show' output. This may lead to a VM boot
failure as agent will just skip such missing devices.
Make the agent do a resync in case a newly added device
'disappears' during processing, which should cause a MAC to
Co-authored-by: Oleg Bondarev <email@example.com>
(cherry picked from commit eea5aaac4f)
neutron-openvswitch-agent will refresh flows when it's restarted.
But the port's binding status is not changed, update_port_postcommit
will be skipped at function '_update_individual_port_db_status' in
'neutron/plugins/ml2/plugin.py', l2pop don't handle DVR ports, the
fdb entries about DVR port will not be added.
So, we can't skip DVR port at notify_l2pop_port_wiring when agent
(cherry picked from commit d0fa2c9ac5)
With high concurrency more than 1 port may be activated on an
OVS agent at the same time (like VM port + a DVR port),
so the patch mitigates the condition by checking for 1 or 2
first active ports.
Given that the condition also contains "or self.agent_restarted(context)"
which makes it True first 180 sec (by default) after agent restart,
I believe the downside of changing 1 to 2 should be negligible.
Please see bug for more details on the issue.
(cherry picked from commit b32db30874)
The Neutron OVS agent logs can get flooded with KeyErrors as the
'_get_port_info' method skips the added/removed dict items if no
ports have been added/removed, which are expected to be present,
even if those are just empty sets.
This change ensures that those port info dict fields are always set.
(cherry picked from commit da5b13df2b)
When HA router's interface on host is going DOWN but router
is still available on this host, L2 population
mechanism driver will now send to other hosts info to remove
fdb unicast entries to this port on host.
It will not send FLOODING_ENTRY because this port is still on
host but in standby mode and might be transformed to master
This solves issue with migration router from Legacy to HA.
In such case, port which was originally attached to legacy
router is transformed to be HA backup port before changing
its status to DOWN.
Now in such case unicast entries to this port and backup
node will be removed properly so packets to HA router will
be really send to host which is master node for router.
(cherry picked from commit 6c300b1a9b)
On hosts with dvr_snat agent mode, after restarting OVS agent,
sometimes the SNAT port is processed first instead of the distributed port.
The subnet_info is cached locally via get_subnet_for_dvr when either of these ports
are processed. However, it returns the MAC address of the port used to query
as the gateway for the subnet. Using the SNAT port, this puts the wrong
MAC as the gateway, causing some flows such as the DVR flows on br-int
for local src VMs to have the wrong MAC.
This patch fixes the get_subnet_for_dvr with fixed_ips as None for the csnat port,
as that causes the server side handler to fill in the subnet's actual gateway
rather than using the port's MAC.
(cherry picked from commit c6de172e58)
There was a race condition during port update to disable it.
In case when neutron-server receives port update call to set
admin_state_down on port, it sends PORT_UPDATE notification
Then both L2 and DHCP agents start doing their job with port.
L2 agent asks neutron-server about device details and during
processing this call server sets port's status to DOWN if
its admin_state_up = False.
Problem is that sometimes just after that, DHCP agent will send
to neutron server notification that provisioning for this port
As there is no any other provisioning block in db (because it
is just port update) neutron-server is setting port's status
This patch fixes this issue by allowing to transition to
ACTIVE only ports which are administratively enabled.
(cherry picked from commit 2a1319ab7a)
When Openvswitch agent will get "port_update" event
(e.g. to set port as unbound) and port is already removed
from br-int when agent tries to get vif port in
treat_devices_added_updated() method (port is removed
because e.g. nova-compute removes it) then resources set
for port by L2 agent extension drivers (like qos) are not
In such case port is added to skipped_ports and is set
as DOWN in neutron-db but ext_manager is not called then
for such port so it will not clear stuff like bandwidth
limit's QoS and queue records and also DSCP marking
open flow rules for this port.
This patch fixes this issue by adding call of
ext_manager.delete_port() method for all skipped ports.
(cherry picked from commit a8271e978a)
In case when external bridge configured in OVS agent's bridge_mappings
will be destroyed and created again (for example by running ifup-ovs
script on Centos) bridge wasn't configured by OVS agent.
That might cause broken connectivity for OpenStack's dataplane if
dataplane network also uses same bridge.
This patch adds additional ovsdb-monitor to monitor if any
of physical bridges configured in bridge_mappings was created.
If so, agent will reconfigure it to restore proper openflow rules
(cherry picked from commit 85b46cd51e)
Inter Tenant Traffic between two different networks that belong
to two different Tenants is not possible when connected through
a shared network that are internally connected through DVR
This issue can be seen in multinode environment where there
is network isolation.
The issue is, we have two different IP for the ports that are
connecting the two routers and DVR does not expose the router
interfaces outside a compute and is blocked by ovs tunnel bridge
This patch fixes the issue by not applying the DVR specific
rules in the tunnel-bridge to the shared network ports that
are connecting the routers.
(cherry picked from commit d019790fe4)
This commit adds common_agent_extension class which is agent API
for L2 extension drivers used e.g. by Linuxbridge agent.
This is necessary to be able to use instance of iptables_manager
used in firewall driver also in L2 extension drivers (like qos).
This patch refactors little bit iptables_manager code to make possible
to initialize e.g. mangle or nat table on demand, even if iptables
is created as "state_less"
(cherry picked from commit cbee0f9f88)
When we delete vm port with attached QoS policy,
it is just doing nothing if vif_port does not exist.
This is fine for egress bandwidth limit as it is configured
directly on vif_port in OVS.
For ingress bw limit however it uses additional records in
Openvswitch database: qos and queue. Those records are not
cleaned up in such case.
This patch also records port in self.ports in the case of
bandwidth limit rules, just as in the case of dscp rules.
Never execute port clear if vif_port not exists. Finally, ovs
driver can clean such qos and queue records
(cherry picked from commit ee423e1fa0)
In case of a live migration and with linux bridge the events are not
sent to Nova, because the port UUID returned by _device_to_port_id may
be a truncated UUID and the current plugin._get_port() can't find it.
Signed-off-by: Sahid Orentino Ferdjaoui <firstname.lastname@example.org>
(cherry picked from commit 38b3d4e16a)
When the OVS agent skips processing a port because it was
not found on the integration bridge, it doesn't send back
any status to the server to notify it. This can cause the
port to get stuck in the BUILD state indefinitely, since
that is the default state it gets before the server tells
the agent to update it.
The OVS agent will now notify the server that any skipped
device should be considered DOWN if it did not exist.
(cherry picked from commit a789d23b02)
Notify agents about port update when "port_update" is called to update
port's status change. L3 agent will spawn keepalived only when HA
network port status is ACTIVE. When L3 agent is restarted, it sets HA
network port status to DOWN (through "port_update"), with the assumption
that L2 agent will again rewire the port (set status to ACTIVE) allowing
L3 agent to spawn keepalived. As server is not notifying L2 agent, port
is remaining in DOWN status and L3 agent was never spawning keepalived
on L3 agent restart(if keepalived is killed).
(cherry picked from commit 4f9a6a8b76)