When multiple ports are present in ports_re_added (say port_one and
port_two), the code loops through them. In the first iteration
events['re_added'] is assigned port_one and events['removed'] is
assigned port_two. In the second iteration events['re_added'] is set
to port_two instead of port_two being added to the list. So after the
loop, only port_two is left in events['re_added'].
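A minimal sketch of the pattern described above (variable names are
illustrative, not the exact agent code): assigning inside the loop
overwrites the previous entry, while appending keeps every port.

    ports_re_added = ['port_one', 'port_two']
    events = {'re_added': [], 'removed': []}

    # Buggy pattern: only the last iterated port survives.
    for port in ports_re_added:
        events['re_added'] = [port]
    assert events['re_added'] == ['port_two']

    # Fixed pattern: accumulate all re-added ports.
    events['re_added'] = []
    for port in ports_re_added:
        events['re_added'].append(port)
    assert events['re_added'] == ['port_one', 'port_two']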
Conflicts:
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: If8edd29dd741f1688ffcac341fd58173539ba000
Closes-Bug: #1864630
(cherry picked from commit 5600163e9b)
(cherry picked from commit 22df469504)
Commit 90212b12 changed the OVS agent so that adding the vital drop
flows on br-int (table 0, priority 2) for packets coming from the
physical bridges was deferred until DVR initialization later on. But if
br-int has no flows left from a previous run (e.g. after a host
reboot), these packets hit the NORMAL flow in table 60. And if there is
more than one physical bridge, the physical interfaces of the different
bridges are then essentially connected at layer 2, so a network loop is
possible in the time before DVR adds the flows. Also, the DVR code only
adds them after RPC calls to the server, so a loop is more likely if
the server is not available.
This patch restores adding these flows when the physical bridges are
first configured. It also updates a comment that was no longer correct
and updates the unit test.
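A hedged sketch of the kind of drop flow the text refers to (helper and
bridge API names are illustrative, not the literal agent code): table 0,
priority 2, dropping anything that enters br-int through the patch port
from a physical bridge until DVR installs its own flows.

    # Roughly equivalent CLI form:
    #   ovs-ofctl add-flow br-int "table=0,priority=2,in_port=<ofport>,actions=drop"

    def install_drop_from_physical(int_br, patch_ofport):
        """Illustrative only: assumes an add_flow() wrapper like ovs_lib's."""
        int_br.add_flow(table=0,
                        priority=2,
                        in_port=patch_ofport,
                        actions='drop')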
Change-Id: I42c33fefaae6a7bee134779c840f35632823472e
Closes-Bug: #1887148
Related-Bug: #1869808
(cherry picked from commit c1a77ef8b7)
(cherry picked from commit 143fe8ff89)
(cherry picked from commit 6a861b8c8c28e5675ec2208057298b811ba2b649)
(cherry picked from commit 8181c5dbfe)
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
When a physical bridge is removed and created again, it is
re-initialized by neutron-ovs-agent.
But if the agent has distributed routing enabled, the DVR-related
flows were not configured again, which led to connectivity issues
for DVR routers.
This patch fixes that by configuring the DVR-related flows whenever
distributed routing is enabled in the agent's configuration.
It also resets the list of phys_brs in dvr_agent. Without that,
different objects were used in the ovs agent and dvr_agent classes, so
e.g. two different cookie ids were set on flows in a physical bridge.
The same issue occurred when openvswitch was restarted and all bridges
were reconfigured.
Now in such a case the new cookie_id is correctly configured for all
flows.
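A hedged sketch of the behaviour described above (method and attribute
names are illustrative, not the real agent code): when a physical bridge
is (re)configured, the DVR flows are set up again if DVR is enabled, and
dvr_agent is pointed at the very same bridge object so only one cookie
is used per bridge.

    def reconfigure_physical_bridge(agent, bridge_name, bridge):
        agent.phys_brs[bridge_name] = bridge
        if agent.enable_distributed_routing:
            # share the same bridge object instead of keeping a stale copy
            agent.dvr_agent.phys_brs[bridge_name] = bridge
            # hypothetical helper: re-add the DVR flows on this bridge
            agent.dvr_agent.reset_dvr_flows(bridge_name, bridge)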
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I710f00f0f542bcf7fa2fc60800797b90f9f77e14
Closes-Bug: #1864822
(cherry picked from commit 91f0bf3c85)
Do not flood the packets to the bridge; since we have the
bridge port list, we can add a simple direct flow to
the right port only.
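A hedged sketch (bridge API and parameter names are illustrative) of
replacing the flood with one direct output flow per known bridge port,
keyed on the destination MAC taken from the bridge port list.

    def add_direct_output_flows(bridge, mac_to_ofport):
        for mac, ofport in mac_to_ofport.items():
            bridge.add_flow(table=0,
                            priority=10,
                            dl_dst=mac,
                            actions='output:%d' % ofport)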
Conflicts:
neutron/agent/linux/openvswitch_firewall/firewall.py
neutron/conf/plugins/ml2/drivers/ovs_conf.py
Closes-Bug: #1732067
Related-Bug: #1841622
Change-Id: I14fefe289a19b718b247bf0740ca9bc47f8903f4
(cherry picked from commit efa8dd0895)
The OVS agent processes the port events in a polling loop. It can
happen (more frequently on a loaded OVS agent) that the "removed"
and "added" events occur in the same polling iteration. Because
of this, the same port is detected as both "removed" and "added".
When a virtual machine is restarted, the port event sequence is
"removed" and then "added". When both events are captured in the same
iteration, the port is already present in the bridge and the port is
discarded from the "removed" list.
Because the port was removed first and then added, the QoS policies no
longer apply (QoS and Queue registers, OF rules). If the QoS
policy does not change, the QoS agent driver will detect it and won't
call the QoS driver methods (based on the OVS agent QoS cache, storing
port and QoS rules). This leads to an unconfigured port.
This patch solves the issue by detecting this double event and
registering it as "removed_and_added". When the "added" port is
handled, the QoS deletion method is called first (if needed) to remove
the unneeded artifacts (OVS registers, OF rules) and remove the QoS
cache (port/QoS policy). Then the QoS policy is applied again on the
port.
NOTE: this is going to be quite difficult to test in a fullstack
test.
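A hedged sketch of the double-event handling described above (function
and driver method names are illustrative): ports seen as both removed
and added in one iteration are recorded separately, and their stale QoS
state is cleared before the policy is applied again.

    def classify_events(events):
        both = set(events['removed']) & set(events['added'])
        events['removed_and_added'] = sorted(both)
        events['removed'] = [p for p in events['removed'] if p not in both]
        events['added'] = [p for p in events['added'] if p not in both]
        return events

    def handle_removed_and_added(events, qos_ext):
        for port_id in events['removed_and_added']:
            qos_ext.delete_port(port_id)   # drop cached policy, OVS/OF leftovers
            qos_ext.handle_port(port_id)   # re-apply the QoS policy from scratch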
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
Closes-Bug: #1845161
(cherry picked from commit 50ffa5173d)
(cherry picked from commit 3eceb6d2ae)
(cherry picked from commit 6376391b45)
lvm.vlan is an int and other_config.get('tag') is a string, so they
can never be equal. We should convert the type before comparing to
avoid unnecessary OVSDB and flow operations.
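A minimal illustration of the comparison fixed here: OVSDB returns
other_config values as strings while the local VLAN mapping keeps an
int, so a direct comparison is always False and the tag is rewritten
needlessly.

    other_config = {'tag': '101'}   # value as returned by OVSDB
    lvm_vlan = 101                  # local VLAN mapping kept as an int

    assert (other_config.get('tag') == lvm_vlan) is False   # buggy compare
    assert int(other_config.get('tag')) == lvm_vlan         # fixed compare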
Change-Id: Ib84da6296ddf3c95be9e9f370eb574bf92ceec15
Closes-Bug: #1843425
(cherry picked from commit 0550c0e1f6)
Neutron-ovs-agent configures physical bridges so that they work
in fail_mode=secure. This means that only packets which match some
OpenFlow rule in the bridge can be processed.
This may cause problems on hosts with only one physical NIC,
where the same bridge is used to provide control plane connectivity
(like the connection to rabbitmq) and data plane connectivity for VMs.
After e.g. a host reboot, the bridge will still be in fail_mode=secure
but there will be no OpenFlow rules on it, so there will be
no communication with rabbitmq.
With the current order of actions in the __init__ method of the
OVSNeutronAgent class, the agent first tries to establish the
connection to rabbitmq and only later configures the physical bridges
with some initial OpenFlow rules. In the case described above this
fails, as there is no connectivity to rabbitmq through the physical
bridge.
So this patch changes the order of actions in __init__ so that it first
sets up the physical bridges and then configures the rpc connection.
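A hedged sketch of the new ordering (class and method bodies are
simplified placeholders, not the full OVSNeutronAgent code): the
physical bridges get their initial OpenFlow rules before any RPC
connection is attempted, so control-plane traffic riding on the same
bridge can pass.

    class OVSNeutronAgentSketch(object):
        def __init__(self, bridge_mappings):
            # first: give the bridges their initial OpenFlow rules
            self.setup_physical_bridges(bridge_mappings)
            # then: connect to rabbitmq / set up RPC
            self.setup_rpc()

        def setup_physical_bridges(self, bridge_mappings):
            ...  # placeholder for the real bridge setup

        def setup_rpc(self):
            ...  # placeholder for the real RPC setup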
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
Closes-Bug: #1840443
(cherry picked from commit d41bd58f31)
(cherry picked from commit 3a2842bdd8)
When a physical bridge is recreated on a host, the ovs agent
tries to reconfigure it.
If there is e.g. a timeout while getting the bridge's datapath_id,
a RuntimeError is raised, which crashed the whole agent.
This patch changes that so the agent does not crash in such a case but
instead tries to reconfigure everything again in the next rpc_loop
iteration.
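A hedged sketch of the new behaviour (function names are illustrative):
a RuntimeError while reconfiguring a recreated bridge no longer kills
the agent; the bridge is simply retried in the next rpc_loop iteration.

    def reconfigure_bridges(bridges, setup_bridge):
        failed = []
        for bridge in bridges:
            try:
                setup_bridge(bridge)   # may raise e.g. on datapath_id timeout
            except RuntimeError:
                failed.append(bridge)  # leave it for the next iteration
        return failed                  # bridges to retry in the next rpc_loop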
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: Ic9b17a420068c0c76748e2c24d97be1ed7c460c7
Closes-Bug: #1837380
(cherry picked from commit b63809715a)
It may happen that a subnet is connected to a DVR router using an IP
address different from the subnet's gateway_ip.
So in br-tun, ARP requests to the DVR router's port should be dropped
instead of dropping ARP requests to the subnet's gateway_ip (or MAC in
the case of IPv6).
Conflicts:
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: Ida6b7ae53f3fc76f54e389c5f7131b5a66f533ce
Closes-bug: #1831575
(cherry picked from commit ae3aa28f5a)
In the OVS agent, when setting up the ancillary bridges, the parameter
external_id:bridge-id is retrieved. If this parameter is not defined
(e.g. for manually created bridges), ovsdbapp writes an error to the logs.
This information is irrelevant and can cause confusion while debugging.
Change-Id: Ic85db65f651eb67fcb56b937ebe5850ec1e8f29f
Closes-Bug: #1815912
(cherry picked from commit 769e971293)
In order to avoid an inaccurate agent_boot_time setting,
this patch considers the agent as "started" only after the
initial sync with the server has completed.
Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
Closes-Bug: #1799178
(cherry picked from commit 8f20963c5b)
The ovs-agent can be very time-consuming when handling a large number
of ports. At that point, the ovs-agent status report may have
exceeded the configured timeout value, and some flow-updating
operations will not be triggered. This results in flow loss during
agent restart, especially for the host-to-host vxlan tunnel flows.
This fix lets the ovs-agent explicitly indicate, in the first rpc loop,
that its status is "restarted". l2pop will then be required
to update the fdb entries.
Conflicts:
neutron/plugins/ml2/rpc.py
Closes-Bug: #1813703
Closes-Bug: #1813714
Closes-Bug: #1813715
Closes-Bug: #1794991
Closes-Bug: #1799178
Change-Id: I8edc2deb509216add1fb21e1893f1c17dda80961
(cherry picked from commit a5244d6d44)
The dump-flows action retrieves a very large set of flow information
if there is an enormous number of ports or openflow security group
rules. We currently hit some known exceptions during this action, for
instance memory issues and timeout issues.
After this patch, the cleanup of the bridge's stale flows is done one
table at a time. Note that this is only supported for the 'native'
OpenFlow interface driver.
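A hedged sketch of the per-table cleanup (bridge method names and the
flow representation are illustrative): instead of one dump-flows over
the whole bridge, stale flows (identified by an old cookie) are dumped
and removed per table.

    def cleanup_stale_flows(bridge, table_ids, valid_cookies):
        for table_id in table_ids:
            for flow in bridge.dump_flows(table=table_id):
                if flow['cookie'] not in valid_cookies:
                    bridge.delete_flows(table=table_id,
                                        cookie=flow['cookie'])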
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
(cherry picked from commit f898ffd71f)
The native OVS/ofctl controllers talk to the bridges using a
datapath-id, instead of the bridge name. The datapath ID is
auto-generated based on the MAC address of the bridge's NIC.
In the case where bridges are on VLAN interfaces, they would
have the same MACs and therefore the same datapath-id, causing
flows intended for one physical bridge to be programmed on another.
The datapath-id is a 64-bit field, with the lower 48 bits being
the MAC. We set the upper 12 unused bits to identify each
unique physical bridge.
This could also be fixed manually using ovs-vsctl set, but
it is beneficial to automate this in the code.
ovs-vsctl set bridge <mybr> other-config:datapath-id=<datapathid>
You can change this yourself using the command above.
You can view/verify the current datapath-id via
ovs-vsctl get Bridge br-vlan datapath-id
"00006ea5a4b38a4a"
(please note that other-config is needed in the set, but not in the get)
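A hedged sketch of generating a distinct datapath-id per physical
bridge (the exact bit layout used by the patch may differ): keep the
NIC MAC in the lower 48 bits and stamp a per-bridge index into the
spare upper bits, so two bridges on VLANs of the same NIC get
different IDs.

    def generate_dpid(mac, bridge_index):
        mac_bits = int(mac.replace(':', ''), 16)   # lower 48 bits: the MAC
        dpid = (bridge_index << 48) | mac_bits     # upper bits: bridge index
        return format(dpid, '016x')                # 16 hex chars, zero padded

    # e.g. generate_dpid('6e:a5:a4:b3:8a:4a', 1) -> '00016ea5a4b38a4a'
    # then: ovs-vsctl set Bridge br-vlan other-config:datapath-id=<that value>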
Closes-Bug: #1697243
Co-Authored-By: Rodolfo Alonso Hernandez <ralonsoh@redhat.com>
Change-Id: I575ddf0a66e2cfe745af3874728809cf54e37745
(cherry picked from commit 379a9faf62)
This fixes a race condition leading to missing fdb entries
on the agent after an OVS restart, if the agent managed to handle all
ports before sending the state report with start_flag set to True.
Change-Id: I943f8d805630cdfbefff9cff1fb4bce89210618b
Closes-Bug: #1808136
(cherry picked from commit 3995abefb1)
When the ovs-vswitchd process is restarted, neutron-ovs-agent handles
it and reconfigures all ports and openflows in the bridges.
Unfortunately, when tunnel networks are used together with the
L2pop mechanism driver, this driver does not notice that the agent
lost all of its openflow config and does not send all of the fdb
entries which should be added on the host.
In such a case the L2pop mechanism driver should behave in the same
way as when neutron-ovs-agent is restarted and send all fdb_entries to
the agent.
This patch "simulates" the agent start flag when an ovs_restart is
handled, so neutron-server sends all fdb_entries to the agent and
the tunnel openflow rules can be reconfigured properly.
Change-Id: I5f1471e20bbad90c4cdcbc6c06d3a4412db55b2a
Closes-bug: #1804842
(cherry picked from commit ae031d1886)
On hosts with the dvr_snat agent mode, after restarting the OVS agent,
sometimes the SNAT port is processed first instead of the distributed port.
The subnet_info is cached locally via get_subnet_for_dvr when either of
these ports is processed. However, it returns the MAC address of the port
used for the query as the gateway for the subnet. When the SNAT port is
used, this puts the wrong MAC as the gateway, causing some flows, such as
the DVR flows on br-int for local src VMs, to have the wrong MAC.
This patch fixes get_subnet_for_dvr by passing fixed_ips as None for the
csnat port, as that causes the server-side handler to fill in the
subnet's actual gateway rather than using the port's MAC.
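A hedged sketch of the call described above (simplified from the DVR
agent code; the wrapper function and the device_owner check are
illustrative): for the csnat port, fixed_ips is passed as None so the
server fills in the subnet's real gateway instead of echoing back the
MAC of the port used for the query.

    def get_subnet_info(plugin_rpc, context, subnet_id, port):
        # For a csnat port, let the server resolve the real gateway.
        if port['device_owner'] == 'network:router_centralized_snat':
            fixed_ips = None
        else:
            fixed_ips = port['fixed_ips']
        return plugin_rpc.get_subnet_for_dvr(context, subnet_id,
                                             fixed_ips=fixed_ips)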
Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
Closes-Bug: #1783470
(cherry picked from commit c6de172e58)
The Neutron OVS agent logs can get flooded with KeyErrors because the
'_get_port_info' method skips the added/removed dict items if no
ports have been added/removed; those keys are expected to be present,
even if they are just empty sets.
This change ensures that those port info dict fields are always set.
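A minimal illustration of the fix (simplified from _get_port_info):
'added' and 'removed' are always present, possibly as empty sets, so
later lookups cannot raise KeyError.

    def get_port_info(registered_ports, current_ports):
        return {
            'current': current_ports,
            'added': current_ports - registered_ports,    # set(), if nothing new
            'removed': registered_ports - current_ports,  # set(), if nothing gone
        }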
Closes-Bug: #1783556
Change-Id: I9e5325aa2d8525231353ba451e8ea895be51b1ca
(cherry picked from commit da5b13df2b)
As part of the implementation of multiple port bindings [1], add binding
activation support to the OVS agent. This will enable the execution in
OVS agents of the complete sequence of steps outlined in [1] during an
instance migration:
1) Create inactive port bindings for destination host
2) Migrate the instance to the destination host and plug its VIFs
3) Activate the port bindings in the destination host
4) Delete the port bindings for the source host
[1] https://review.openstack.org/#/c/309416/
Change-Id: Iabca39364ec95633b2a8891fc295b3ada5f4f5e0
Partial-Bug: #1580880
To support multiple port bindings, the L2 agents have to have the
capability to handle binding deactivation notifications from the
Neutron server. This patch adds the necessary code to the OVS agent.
After receiving the notification, the agent un-plugs the corresponding
VIF from the integration bridge.
Change-Id: I78178de2039ccabc649558de4f6549a38de90418
Partial-Bug: #1580880
When an external bridge configured in the OVS agent's bridge_mappings
is destroyed and created again (for example by running the ifup-ovs
script on CentOS), the bridge was not reconfigured by the OVS agent.
That might break connectivity for OpenStack's dataplane if the
dataplane network also uses the same bridge.
This patch adds an additional ovsdb-monitor to detect when any
of the physical bridges configured in bridge_mappings is created.
If so, the agent reconfigures it to restore the proper openflow rules
on it.
Change-Id: I9c0dc587e70327e03be5a64522d0c679665f79bd
Closes-Bug: #1768990
Instead of allowing an error to bubble up and exit from rpc_loop, catch
it and assume the switch is dead, which makes the agent wait until
the switch is back without failing the service.
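A hedged sketch of this behaviour (constant and helper names are
illustrative): an exception while polling OVS is treated as "switch is
dead" so the loop keeps waiting instead of exiting the service.

    OVS_NORMAL, OVS_DEAD = 'normal', 'dead'

    def check_ovs_status_safe(check_ovs_status):
        try:
            return check_ovs_status()
        except Exception:
            # assume the switch is dead and let the loop retry later
            return OVS_DEAD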
Change-Id: Ic3095dd42b386f56b1f75ebb6a125606f295551b
Closes-Bug: #1731494
Add error handling for get_network_info_for_id rpc call in the
ovs_dvr_neutron_agent.
Closes-Bug: #1758093
Change-Id: I44a5911554c712c89cdc8901cbc7b844c4b0a363
Inter-tenant traffic between two different networks that belong
to two different tenants is not possible when they are connected
through a shared network that is internally connected through DVR
routers.
This issue can be seen in a multinode environment where there
is network isolation.
The issue is that we have two different IPs for the ports
connecting the two routers, and DVR does not expose the router
interfaces outside a compute node; the traffic is blocked by ovs
tunnel bridge rules.
This patch fixes the issue by not applying the DVR-specific
rules in the tunnel bridge to the shared network ports that
connect the routers.
Closes-Bug: #1751396
Change-Id: I0717f29209f1354605d2f4128949ddbaefd99629
Add the ability to set the DSCP field in the outer header of OVS
tunnels, or to inherit it from the inner header's DSCP value, for OVS
and linuxbridge.
Change-Id: Ia59753ded73cd23019605668e60cfbc8841e803d
Closes-Bug: #1692951
When the Openvswitch agent gets a "port_update" event
(e.g. to set a port as unbound) and the port has already been removed
from br-int by the time the agent tries to get the vif port in the
treat_devices_added_updated() method (the port is removed
because e.g. nova-compute removes it), then the resources set up
for the port by the L2 agent extension drivers (like qos) are not
cleaned up properly.
In such a case the port is added to skipped_ports and is set
as DOWN in the neutron DB, but ext_manager is not called
for the port, so it will not clear things like the bandwidth
limit's QoS and Queue records, or the DSCP marking
open flow rules for this port.
This patch fixes the issue by calling the
ext_manager.delete_port() method for all skipped ports.
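A hedged sketch of the cleanup added here (the real signatures differ;
the wrapper and payload shown are illustrative): skipped ports are
still handed to the extension manager's delete hook so QoS/DSCP
leftovers are removed even though the VIF is already gone.

    def treat_skipped_ports(skipped_ports, ext_manager, context):
        for port_id in skipped_ports:
            # port is already gone from br-int, but its QoS/Queue records
            # and DSCP openflow rules must still be cleaned up
            ext_manager.delete_port(context, {'port_id': port_id})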
Change-Id: I3cf5c57c7f232deaa190ab6b0129e398fdabe592
Closes-Bug: #1737892
neutron-lib contains a number of the plugin related constants from
neutron.plugins.common.constants. This patch consumes those constants
from neutron-lib and removes them from neutron. In addition the notion
of the dummy plugin service type is moved strictly into the test
package of neutron since it's not a real service plugin.
NeutronLibImpact
Change-Id: I767c626f3fe6159ab3abd6a7ae3cb9893b79bf66
The neutron-lib commit I360545b6ee4291547e0c5c8e668ad03d3efa4725 moved
the externally consumed globals from neutron.common.constants into lib.
With the exception of PROVISIONAL_IPV6_PD_PREFIX all other constants
in neutron.common.constants should only be used in neutron, and will
hopefully remain that way. External consumers needing access to other
common constants should move them into lib first.
NeutronLibImpact
Change-Id: Ie4bcffccf626a6e1de84af01f3487feb825f8b65
When the OVS agent skips processing a port because it was
not found on the integration bridge, it doesn't send back
any status to the server to notify it. This can cause the
port to get stuck in the BUILD state indefinitely, since
that is the default state it gets before the server tells
the agent to update it.
The OVS agent will now notify the server that any skipped
device should be considered DOWN if it did not exist.
Change-Id: I15dc55951cdb75c6d87d7c645f8e2cbf82b2f3e4
Closes-bug: #1719011
Otherwise we don't see some of them for the agent; for example,
AGENT.root_helper is missing.
To make sure the logging happens as early as possible, and that
options which may be registered by extensions are also logged, some
refactoring was applied to move the extension manager loading as early
as possible, even before the agent's __init__ is called.
Related-Bug: #1718767
Change-Id: I823150cf6406f709d1e4ffa74897d598e80f5329
This was deprecated over a year ago in [1] so let's
get rid of it to clean up some code.
1. Ib63ba8ae7050465a0786ea3d50c65f413f4ebe38
Change-Id: I6039fb7e743c5d9a1a313e3c174ada36c9874c70
DVR flows are not compatible with OVS firewall flows because the
firewall flows have higher priority. As a consequence, the DVR rules
were never matched, as the firewall uses the output action directly.
This patch replaces flows using normal or output actions and resends
packets to the TRANSIENT table instead. This transient table then
applies those normal or output action rules. With this split, we will
be able to match egress/ingress flows in the TRANSIENT table instead of
LOCAL_SWITCHING, putting the DVR pipeline in front of the OVS firewall
pipeline.
Change-Id: I9f738047f131b42d11a90f539435006d16ea7883
Closes-bug: #1696983
Now, with the merge of push notifications, processing a port update
no longer automatically implies a transition from ACTIVE to BUILD
to ACTIVE again.
This resulted in a bug where Nova would unplug and replug an interface
quickly during rebuild and would never get a vif-plugged event.
Nothing in the data model was actually being updated in a way that set
the status to DOWN or BUILD, and the port would return before
the agent processed it as a removed port to mark it as DOWN.
This fixes the bug by making the agent force the port to DOWN whenever
it loses its VLAN. Watching for the VLAN loss was already introduced
to detect these fast unplug/plug events, so this just adds the
status update.
Closes-Bug: #1694371
Change-Id: Ice24eea2534fd6f3b103ec014218a65a45492b1f
Replace the calls to the OVSPluginAPI info retrieval functions
with reads directly from the push notification cache.
Since we now depend on the cache for the source of truth, the
'port_update'/'port_delete'/'network_update' handlers are configured
to be called whenever the cache receives a corresponding resource update.
The OVS agent will no longer subscribe to topic notifications for ports
or networks from the legacy notification API.
Partially-Implements: blueprint push-notifications
Change-Id: Ib2234ec1f5d328649c6bb1c3fe07799d3e351f48
idl_factory was removed in favor of just passing in an Idl instance
as an Idl doesn't start a connection until its .run() is called.
The try/excepts will be removed when the ovsdbapp 0.4.0 constraint
changes are merged.
Change-Id: Id22faa1f6179c2fdf8a136972d65f10749c9fc2e
With the backoff client, setting the .timeout property on it has no
effect. This means that, starting from Mitaka, we broke the
quitting_rpc_timeout option.
Now, when the TERM signal is received, we reset the dict capturing
per-method timeouts and cap the waiting times by the value of the
option. This significantly reduces the time needed for the agent to
shut down gracefully.
Change-Id: I2d86ed7a6f337395bfcfdb0698ec685cf384f172
Related-Bug: #1663458