When both VLAN and VXLAN networks exist in the environment, and
l2population and arp_responder are enabled, updating a port's IP
address on a VLAN network adds ARP responder flows to br-tun. This
causes multiple ARP replies for a single ARP request, and VM
connectivity becomes unreliable.
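A minimal sketch of the guard this fix implies (names follow the OVS
agent code, but the snippet is illustrative, not the actual patch):

    # ARP responder flows belong to tunnel networks only; skipping
    # VLAN networks keeps br-tun from answering the same ARP request
    # more than once.
    def update_arp_responder(tun_br, lvm, port_ip, port_mac):
        if lvm.network_type not in constants.TUNNEL_NETWORK_TYPES:
            return
        tun_br.install_arp_responder(vlan=lvm.vlan, ip=port_ip,
                                     mac=port_mac)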
Closes-Bug: #1824504
Change-Id: I1b6154b9433a9442d3e0118dedfa01c4a9b4740b
(cherry picked from commit 5301ecf41b)
In case the neutron-ovs-agent notices that any of the physical
bridges was "re-created", it should also ensure that stale OpenFlow
rules (with the old cookie id) are cleaned up.
This patch does exactly that.
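A minimal sketch of the cleanup idea (dump_cookies is a hypothetical
helper):

    # Remove flows whose cookie is not owned by the running agent,
    # i.e. leftovers from before the bridge was re-created.
    def cleanup_stale_flows(bridge):
        for cookie in bridge.dump_cookies():  # hypothetical helper
            if cookie not in bridge.reserved_cookies:
                bridge.delete_flows(cookie=cookie)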
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I7c7c8a4c371d6f4afdaab51ed50950e2b20db30f
Related-Bug: #1864822
(cherry picked from commit 63c45b3766)
Blocking traffic between br-int and br-physical is overkill
and will at least
1. interrupt VLAN traffic during startup, particularly so
if DVR is enabled
2. if, say, rabbitmq is not stable, the data plane may
be affected and VLAN will never work.
Running OpenStack on k8s particularly amplifies the problem
because a pod can be killed quite easily by liveness
probes.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I51050c600ba7090fea71213687d94340bac0674a
Closes-Bug: #1869808
(cherry picked from commit 90212b12cd)
In some cases it may be useful to log the new VLAN tag found on a
port when the port loses the old VLAN tag that is expected to be
there.
So this patch adds that value to the log message.
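A hypothetical example of what the amended message could look like
(format and variable names are illustrative):

    LOG.warning("VLAN tag on port %(port)s was reset: expected tag "
                "%(old)s, found %(new)s",
                {'port': port.port_name, 'old': lvm.vlan,
                 'new': new_tag})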
TrivialFix
Depends-On: https://review.opendev.org/735615
Change-Id: I231e624f460510decc6d2237040c8bef207e2e8e
(cherry picked from commit 3ac63422ea)
In case a physical bridge is removed and created again, it is
initialized by neutron-ovs-agent.
But if the agent had distributed routing enabled, DVR-related flows
were not configured again, and that led to connectivity issues for
DVR routers.
This patch fixes it by adding configuration of the DVR-related flows
if distributed routing is enabled in the agent's configuration.
It also resets the list of phys_brs in dvr_agent. Without that,
different objects were used in the ovs agent and dvr_agent classes,
so e.g. two different cookie ids were set on flows in the physical
bridge.
The same issue occurred when openvswitch was restarted and all
bridges were reconfigured.
Now, in such a case, a new cookie_id is correctly configured for all
flows.
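A minimal sketch of the re-initialization path this change implies
(setup_dvr_flows_on_phys_br exists in the DVR agent; sharing the
bridge list is shown in a simplified, illustrative form):

    # Re-create DVR flows whenever physical bridges are
    # (re)initialized, and share the same bridge objects with
    # dvr_agent so both classes stamp flows with a single cookie id.
    if self.enable_distributed_routing:
        self.dvr_agent.phys_brs = self.phys_brs
        self.dvr_agent.setup_dvr_flows_on_phys_br()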
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I710f00f0f542bcf7fa2fc60800797b90f9f77e14
Closes-Bug: #1864822
(cherry picked from commit 91f0bf3c85)
Neutron-ovs-agent can now enable IGMP snooping on the integration
bridge if the config option "igmp_snooping_enable" in the OVS section
of the config is set to True.
It will also set mcast-snooping-disable-flood-unregistered=true, so
flooding of multicast packets to all unregistered ports will be
disabled as well.
Both changes are applied to the integration bridge.
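A minimal sketch of the equivalent OVSDB settings, assuming
ovsdbapp-style db_set commands (the API object is illustrative):

    # Enable IGMP snooping on br-int and stop flooding multicast to
    # unregistered ports.
    ovs.db_set('Bridge', 'br-int',
               ('mcast_snooping_enable', True)).execute(
        check_error=True)
    ovs.db_set('Bridge', 'br-int',
               ('other_config',
                {'mcast-snooping-disable-flood-unregistered': 'true'})
               ).execute(check_error=True)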
Change-Id: I12f4030a35d10d1715d3b4bfb3ed5efb9aa28f2b
Closes-Bug: #1840136
(cherry picked from commit 5b341150e2)
Although notify_nova_on_port_status_changes defaults to true, it
could be set to false, making the nova_notifier attribute unsafe to
use without checking.
This patch checks both the config option and that the attribute
exists, since the config could be changed after the plugin is
already initialized without the nova_notifier attribute being set.
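A minimal sketch of the combined guard (the notifier call is
illustrative):

    # Guard both the option and the attribute: the option may have
    # been flipped after plugin init, in which case nova_notifier was
    # never created.
    if (cfg.CONF.notify_nova_on_port_status_changes and
            hasattr(self, 'nova_notifier')):
        self.nova_notifier.record_port_status_changed(
            port, status, previous_status, None)  # illustrative call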
Change-Id: Ide0f93275e60dffda10b7da59f6d81c5582c3849
Closes-bug: #1843269
(cherry picked from commit ab4320edb4)
In order to reduce the number of elements retrieved from the DB,
this patch, before processing the VLAN allocations per physical
network, deletes those registers belonging to any unconfigured
physical network.
The VLAN registers per physical network are deleted using a bulk
delete operation, to speed up the process.
The missing VLAN registers per network are now created using a bulk
insert operation, available in the ORM. This bulk operation speeds
up the sync process.
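A minimal sketch of the two bulk operations using standard SQLAlchemy
APIs (model and variable names are illustrative):

    # Bulk-delete allocations belonging to unconfigured physnets.
    session.query(VlanAllocation).filter(
        ~VlanAllocation.physical_network.in_(configured_physnets)
    ).delete(synchronize_session=False)

    # Bulk-insert the missing allocations in a single statement.
    session.bulk_insert_mappings(
        VlanAllocation,
        [{'physical_network': net, 'vlan_id': vid, 'allocated': False}
         for net, vid in missing_allocations])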
Conflicts:
neutron/plugins/ml2/drivers/type_vlan.py
Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
Partial-Bug: #1862178
(cherry picked from commit 016e7826f1)
(cherry picked from commit 651eb12bec)
(cherry picked from commit 4fff732b76)
Operators may want to see how long the port processing procedure
takes, since DEBUG logging is generally not enabled in production
environments.
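A minimal sketch of such a timing message (illustrative):

    start = time.time()
    port_info = self.process_port_info()  # illustrative call
    LOG.info("rpc_loop %(iter)d: port processing took %(time).3fs",
             {'iter': self.iter_num, 'time': time.time() - start})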
Related-Bug: #1813703
Related-Bug: #1813707
Related-Bug: #1813706
Related-Bug: #1813709
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I43733546abf5421d0e3f4cd5a959d279e1b89d1e
(cherry picked from commit 8e73de8bc4)
Do not flood the packets to the bridge: since we have the bridge
port list, we can add a simple direct flow to the right port only.
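A minimal sketch of the direct flows in ovs-lib add_flow style (table
and priority values are illustrative):

    # One unicast flow per known bridge port instead of flooding.
    for port in bridge_ports:
        br.add_flow(table=94, priority=12,  # illustrative values
                    dl_dst=port.mac_address,
                    actions='output:%d' % port.ofport)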
Conflicts:
neutron/agent/linux/openvswitch_firewall/firewall.py
neutron/conf/plugins/ml2/drivers/ovs_conf.py
Closes-Bug: #1732067
Related-Bug: #1841622
Change-Id: I14fefe289a19b718b247bf0740ca9bc47f8903f4
(cherry picked from commit efa8dd0895)
Patch https://review.opendev.org/#/c/697655/ cannot be backported
because it includes an RPC version change. This patch is for the
stable branches.
Currently the ovs agent calls update_device_list with the
agent_restarted flag set only on the first loop iteration. The
server then knows to send the l2pop flooding entries for the
network to the agent. But when a compute node with many instances
on many networks reboots, it takes time to re-add all the active
devices and some may be re-added after the first loop iteration.
The server can then fail to send the flooding entries, which means
there will be no flood_to_tuns flow and broadcasts like DHCP will
fail.
This patch fixes that by also setting the agent_restarted flag if
the agent has not yet received the flooding entries for a network.
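A minimal sketch of the amended condition (the flooding-entry check
is an illustrative helper):

    # Treat the agent as "restarted" for a network until its l2pop
    # flooding entries have actually been received.
    agent_restarted = (self.iter_num == 0 or
                       not self.has_flood_entries(network_id))
    devices_details = self.plugin_rpc.update_device_list(
        self.context, devices_up, devices_down, self.agent_id,
        self.conf.host, agent_restarted=agent_restarted)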
Change-Id: Iccc4fe4a785ee042fd76a663d0e76a27facd1809
Closes-Bug: #1853613
(cherry picked from commit bc0ab0fcd7)
(cherry picked from commit aee87e72b1)
The OVS agent processes the port events in a polling loop. It can
happen (and more frequently in a loaded OVS agent) that the "removed"
and "added" events occur in the same polling iteration. Because of
this, the same port is detected as both "removed" and "added".
When a virtual machine is restarted, the port event sequence is
"removed" and then "added". When both events are captured in the same
iteration, the port is already present in the bridge and the port is
discarded from the "removed" list.
Because the port was removed first and then added, the QoS policies
no longer apply (QoS and Queue registers, OF rules). If the QoS
policy does not change, the QoS agent driver will detect it and won't
call the QoS driver methods (based on the OVS agent QoS cache,
storing port and QoS rules). This leads to an unconfigured port.
This patch solves the issue by detecting this double event and
registering it as "removed_and_added". When the "added" port is
handled, the QoS deletion method is called first (if needed) to
remove the unneeded artifacts (OVS registers, OF rules) and clear the
QoS cache (port/QoS policy). Then the QoS policy is applied again on
the port.
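A minimal sketch of the double-event handling (names illustrative):

    # Ports seen as both removed and added in one iteration need
    # their stale QoS state cleared before the policy is re-applied.
    removed_and_added = port_info['removed'] & port_info['added']
    for port_id in removed_and_added:
        self.qos_agent.delete_port(port_id)  # drop cache and OF rules
        self.qos_agent.create_port(port_id)  # re-apply the policy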
NOTE: this is going to be quite difficult to test in a fullstack
test.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
Closes-Bug: #1845161
(cherry picked from commit 50ffa5173d)
(cherry picked from commit 3eceb6d2ae)
(cherry picked from commit 6376391b45)
The OVS agent is a single-threaded module executed in an os-ken
AppManager context. os-ken uses, by default (and no other
implementation is available today [1]), "eventlet" threads. Those
threads are scheduled cooperatively by the code itself; the context
switch is done through yielding. The easiest way to do this is by
executing:
    eventlet.sleep()
If the assigned thread is not ready to take the GIL and does not
yield back the executor, other threads will starve and eventually
time out.
This patch removes the "sleep" command during the DP retrieval. This
keeps the executor on the current thread and prevents the execution
timeouts seen in the related bug.
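A minimal before/after sketch of the retrieval loop (structure is
illustrative):

    # Before: yielding on every retry handed the executor to other
    # green threads, which could hold it long enough for the datapath
    # retrieval to time out.
    while retries_left:
        dp = self._get_datapath()  # illustrative retrieval call
        if dp:
            break
        # eventlet.sleep(0)  <- removed by this patch
        retries_left -= 1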
[1]1f751b2d7d/os_ken/lib/hub.py
Closes-Bug: #1861269
Change-Id: I19e1af1bda788ed970d30ab251e895f7daa11e39
(cherry picked from commit 740741864a)
- This change updates _set_bridge_name to set
the bridge name field in the vif binding details.
- This change adds the integration_bridge name
to the agent configuration report.
Closes-Bug: #1788009
Closes-Bug: #1856152
(cherry picked from commit 995744c576)
Change-Id: I454efcb226745c585935d5bd1b3d378f69a55ca2
- This change adds a max priority flow to drop
all traffic that is associated with the
DEAD VLAN 4095 (see the sketch below).
- This change is part of a partial mitigation of
bug 1734320. Without this change, VLAN 4095 traffic
will be dropped via a low priority flow after being
processed by part/all of the OpenFlow pipeline.
By raising the priority and dropping in table 0,
we drop invalid packets as soon as they enter
the pipeline.
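A minimal sketch of such a flow in ovs-lib style (the priority value
is illustrative):

    # Drop dead-VLAN traffic in table 0, before the rest of the
    # pipeline sees it.
    br_int.add_flow(table=0, priority=65535,
                    dl_vlan=4095, actions='drop')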
Change-Id: I3482c7c4f00942828cc9396cd2f3d646c9e8c9d1
Partial-Bug: #1734320
(cherry picked from commit e3dc447b90)
When a DVR router is migrated from distributed to centralized, we
unbind the router from the agents, but the ml2 distributed
portbindings for the router port remain intact.
This patch fixes the issue by deleting the binding entries for all
hosts.
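A minimal sketch of the cleanup, assuming the ml2
DistributedPortBinding model:

    # Remove every per-host distributed binding left for the port.
    context.session.query(models.DistributedPortBinding).filter_by(
        port_id=port_id).delete()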
Closes-Bug: #1718345
Change-Id: If139790eb336ff13b07b094151946af30322ad3e
(cherry picked from commit 32bfc3edec)
The type of lvm.vlan is int while other_config.get('tag') is a
string, so they can never be equal. We should convert the types
before comparing, to avoid unnecessary ovsdb operations and flow
updates.
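A minimal sketch of the type-safe comparison (illustrative):

    cur_tag = port_other_config.get('tag')
    if cur_tag is not None and int(cur_tag) == lvm.vlan:
        return  # tag already correct; skip ovsdb and flow updates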
Change-Id: Ib84da6296ddf3c95be9e9f370eb574bf92ceec15
Closes-Bug: #1843425
(cherry picked from commit 0550c0e1f6)
In case a VLAN network was created with segmentation_id=0 and
without a physical_network given, it passed validation of the
provider segment and the first available segmentation_id was chosen
for the network.
The problem was that in such a case all available segmentation ids
were allocated, and no other VLAN network could be created later.
This patch fixes the validation of segmentation_id when it is set to
the value 0.
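A minimal sketch of why 0 slipped through, and a safer check
(exception wording is illustrative):

    # `if not segmentation_id:` treats 0 as "not provided"; comparing
    # against None makes 0 subject to the range check.
    if segmentation_id is not None:
        if not (1 <= segmentation_id <= 4094):
            raise exc.InvalidInput(
                error_message="segmentation_id out of the VLAN range")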
Change-Id: Ic768deb84d544db832367f9a4b84a92729eee620
Closes-bug: #1840895
(cherry picked from commit f01f3ae5dd)
When a network is deleted, precommit handlers are notified prior to the
deletion of the network from the database. One handler exists in the ML2
plugin - _network_delete_precommit_handler. This handler queries the
database for the current state of the network and uses it to create a
NetworkContext which it saves under context._mech_context. When the
postcommit handler _network_delete_after_delete_handler is triggered
later, it passes the saved context._mech_context to mechanism drivers.
A problem can occur with provider networks since the segments service
also registers a precommit handler - _delete_segments_for_network. Both
precommit handlers use the default priority, so the order in which they
are called is random, and determined by dict ordering. If the segment
precommit handler executes first, it will delete the segments associated
with the network. When the ML2 plugin precommit handler runs it then
sees no segments for the network and sets the provider attributes of the
network in the NetworkContext to None.
A mechanism driver that is passed a NetworkContext without provider
attributes in its delete_network_postcommit method will not have the
information to perform the necessary actions. In the case of the
networking-generic-switch mechanism driver where this was observed, this
resulted in the driver ignoring the event, because the network did not
look like a VLAN.
This change uses a priority of zero for the ML2 network delete
precommit handler, to ensure it queries the network and stores the
NetworkContext before the segments service has a chance to delete
segments.
A similar change has been made for subnets, both to keep the pattern
consistent and avoid any similar issues.
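A minimal sketch of a priority-aware registration, assuming
neutron-lib's callback priority support:

    # Run the ML2 precommit handler before the segment service's
    # handler so the NetworkContext still sees provider segments.
    registry.subscribe(self._network_delete_precommit_handler,
                       resources.NETWORK, events.PRECOMMIT_DELETE,
                       priority=0)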
Change-Id: I6482223ed2a479de4f5ef4cef056c311c0281408
Closes-Bug: #1841967
Depends-On: https://review.opendev.org/680001
(cherry picked from commit fea2d9091f)
As with https://review.opendev.org/#/c/656066/, if a limit is applied
anywhere other than at the end of the filters, SQLAlchemy will return
an error, and we could possibly return fewer results than intended.
Change-Id: I9a54ae99d2d5dfda63cb0061bcf3d727ed7cc992
Closes-Bug: #1827363
(cherry picked from commit 94bc403078)
Neutron-ovs-agent configures physical bridges to work in
fail_mode=secure. This means that only packets which match some
OpenFlow rule in the bridge can be processed.
This may cause a problem on hosts with only one physical NIC,
where the same bridge is used to provide control plane connectivity,
like the connection to rabbitmq, and data plane connectivity for VMs.
After e.g. a host reboot the bridge will still be in
fail_mode=secure, but there will not be any OpenFlow rules on it,
so there will be no communication to rabbitmq.
With the current order of actions in the __init__ method of the
OVSNeutronAgent class, it first tries to establish a connection to
rabbitmq and only later configures physical bridges with some
initial OpenFlow rules. In the case described above this will fail,
as there is no connectivity to rabbitmq through the physical bridge.
So this patch changes the order of actions in __init__ so that it
first sets up the physical bridges and then configures the RPC
connection.
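A minimal sketch of the reordered __init__ (illustrative):

    def __init__(self, bridge_classes, ext_manager, conf=None):
        # Restore OpenFlow rules first: with fail_mode=secure, RPC
        # traffic cannot traverse an empty bridge after a reboot.
        self.setup_physical_bridges(self.bridge_mappings)
        # Only then bring up RPC; rabbitmq is now reachable.
        self.setup_rpc()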
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
Closes-Bug: #1840443
(cherry picked from commit d41bd58f31)
(cherry picked from commit 3a2842bdd8)
Concurrent calls to _bind_port_if_needed may lead to a missing RPC
notification, which can leave a port stuck in a DOWN state. If the
only caller that succeeds in the concurrent binding does not specify
that an RPC notification is allowed, then no RPC is sent to the
agent. The other caller, which needs to send an RPC notification,
will fail since the resulting PortContext instance will not have any
binding levels set.
The failure has negative effects on consumers of the L2Population
functionality because the L2Population mechanism driver will not be
triggered to publish that a port is UP on a given compute node. Manual
intervention is required in this case.
This patch proposes to handle this by populating the PortContext with
the current binding levels so that the caller can continue on and have
an RPC notification sent out.
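A minimal sketch of the repopulation step, assuming ml2's
get_binding_levels helper:

    # Give the losing caller a complete PortContext so it can still
    # emit the RPC notification for the port.
    if not bind_context._binding_levels:
        bind_context._binding_levels = db.get_binding_levels(
            plugin_context, port_id, bind_context.host)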
Closes-Bug: #1755810
Story: 2003922
Change-Id: Ie2b813b2bdf181fb3c24743dbd13487ace6ee76a
(cherry picked from commit 0dc730c7c0)
This is a stable-only fix since the code around the change was
removed in master: https://review.opendev.org/#/c/641866
Commit a5244d6d44 changed the check order so regular non-DVR ports
are also checked for agent restart. However, regular ports may
already be unbound, which leads to the error in the bug description:
the agent_restarted check is done against a 'None' agent.
This patch moves the logic back: only check agent_restarted for DVR
ports.
It also adds some logging to give a clue why updating port up/down
fails.
Change-Id: I3ad59864eeb42916d2cf15a5292d5aa9484f6e91
Closes-Bug: #1835731
(cherry picked from commit c3a3031f78)
When a port is updated with body={"port": {}}, neutron server
returns a 500.
In _process_port_binding_attributes (plugins/ml2/plugin.py), a body
of {"port": {}} yields attrs={} and then:

    vnic_type = attrs and attrs.get(portbindings.VNIC_TYPE)

Because attrs is falsy, attrs.get(portbindings.VNIC_TYPE) is never
evaluated, and vnic_type ends up being attrs itself ({}) instead of
None.
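A minimal sketch of a safe form of that expression (illustrative):

    # Avoid the `and` short-circuit assigning {} to vnic_type.
    vnic_type = attrs.get(portbindings.VNIC_TYPE) if attrs else None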
Change-Id: I40d388543387ebdd72f26d761339c1829bef9413
Partial-bug: #1838396
(cherry picked from commit dd080c70b4)
In case a physical bridge was recreated on the host, the ovs agent
tries to reconfigure it.
If there is e.g. a timeout while getting the bridge's datapath_id,
a RuntimeError is raised, which crashed the whole agent.
This patch changes that so the agent does not crash in such a case,
but instead tries to reconfigure everything again in the next
rpc_loop iteration.
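A minimal sketch of the non-fatal handling (illustrative):

    try:
        datapath_id = bridge.get_datapath_id()
    except RuntimeError:
        LOG.warning("Failed to get datapath_id of %s; bridge will be "
                    "reconfigured in the next rpc_loop.",
                    bridge.br_name)
        sync = True  # retry next iteration instead of crashing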
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: Ic9b17a420068c0c76748e2c24d97be1ed7c460c7
Closes-Bug: #1837380
(cherry picked from commit b63809715a)
The ovs-agent scans and processes the ports during the
first rpc_loop, and a local port update notification
is sent out. This causes these ports to be processed
again in the ovs-agent's next (second) rpc_loop.
This patch passes the restart flag (iteration num 0)
down the local port_update call trace. After this patch,
the local port_update notification is ignored during
the first RPC loop.
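A minimal sketch of the skip (handler shape is illustrative):

    def port_update(self, context, **kwargs):
        if self.iter_num == 0:
            # The initial full scan is already processing these
            # ports; a second pass in iteration 1 is redundant.
            return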
Related-Bug: #1813703
Change-Id: Ic5bf718cfd056f805741892a91a8d45f7a6e0db3
(cherry picked from commit eaf3ff5786)
direct-physical ports inherit the MAC address of the physical device
when binding happens (VM created). When the VM is deleted, this MAC
has to be cleared so other ports may be bound to the same device
without conflicts.
Conflicts:
neutron/plugins/ml2/plugin.py
Change-Id: I9dc8562546e9a7365d18492678f7fc1cb3553622
Closes-Bug: #1830383
(cherry picked from commit e603d19939)
Check the configured and actual number of VFs to prevent
device registration with 0 VFs.
Closes-Bug: #1831622
Change-Id: Ie699d245f8ae2fc1d16b96432d2962788d9dba57
(cherry picked from commit c148c6df46)
This parameter applies to the OVSDB Controller table when the
native openflow driver is used. There are reports that increasing
it can reduce errors on busy systems. This patch also sets the
default value to 10s, which is more than the OVS default of 5s.
See the ovs-vswitchd.conf.db man page for a full description.
Conflicts:
neutron/tests/functional/agent/common/test_ovs_lib.py
Change-Id: If0d42919412dac75deb4d7f484c42cea630fbc59
Partial-Bug: #1817022
(cherry picked from commit 540d00f68e)
(cherry picked from commit 6e661ecd2d)
It may happen that a subnet is connected to a DVR router using an IP
address different from the subnet's gateway_ip.
In that case, ARP requests to the DVR router's port should be
dropped in br-tun, instead of dropping ARP requests to the subnet's
gateway_ip (or MAC in the case of IPv6).
Conflicts:
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: Ida6b7ae53f3fc76f54e389c5f7131b5a66f533ce
Closes-bug: #1831575
(cherry picked from commit ae3aa28f5a)
For auto-address IPv6 subnets, postcommit performs an update port
action if the network already has ports. This results in a
"cannot be called within a transaction" error for bulk IPv6 subnet
create.
Closes-Bug: #1822582
Change-Id: Ia32ec4c11c0793e7df07dcce19c122b3c7f865e1
(cherry picked from commit 14c76d3181)
In the OVS agent, when setting up the ancillary bridges, the
parameter external_ids:bridge-id is retrieved. If this parameter is
not defined (e.g. manually created bridges), ovsdbapp writes an
error to the logs. This information is irrelevant and can cause
confusion during debugging.
Change-Id: Ic85db65f651eb67fcb56b937ebe5850ec1e8f29f
Closes-Bug: #1815912
(cherry picked from commit 769e971293)
In order to avoid an inaccurate agent_boot_time setting, this patch
suggests considering the agent as "started" only after completion of
the initial sync with the server.
Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
Closes-Bug: #1799178
(cherry picked from commit 8f20963c5b)
On one specific compute node, the security group rules can be
enormous in quantity. This patch adds a step-by-step processing
method to deal with a large number of security group rules, and
also changes or adds some log messages.
Related-Bug: #1813703
Related-Bug: #1813704
Related-Bug: #1813707
Conflicts:
neutron/common/constants.py
Change-Id: I57bf27ec75cf848271c5a28b22beee12b8bd5faa
(cherry picked from commit 6ac420df7e)