When both VLAN and VXLAN networks exist in the environment, and
l2population and arp_responder are enabled, updating the IP address
of a port on a VLAN network adds ARP responder flows to br-tun.
This causes multiple ARP replies to be sent for a single ARP
request, and VM connections behave abnormally.
Closes-Bug: #1824504
Change-Id: I1b6154b9433a9442d3e0118dedfa01c4a9b4740b
(cherry picked from commit 5301ecf41b)
The ovs-agent can take a very long time to handle a large number
of ports. In that case, the ovs-agent status report may exceed the
configured timeout value, and some flow update operations will not
be triggered. This results in flow loss during agent restart,
especially for host-to-host VXLAN tunnel flows.
This fix makes the ovs-agent explicitly indicate, in the first RPC
loop, that it has restarted. l2pop is then required to update the
fdb entries.
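A minimal sketch of the agent-side idea, assuming a simplified agent class: the payload sent to the server carries a restart flag only on the first RPC loop iteration. The class, method and field names (`SketchAgent`, `notify_devices_up`, `agent_restarted`, `iter_num`) are hypothetical stand-ins, not the actual ovs-agent code or RPC signature.

```python
class SketchAgent:
    """Hypothetical stand-in for an L2 agent's main RPC loop."""

    def __init__(self):
        self.iter_num = 0
        self.sent = []  # payloads that would be sent to the server

    def notify_devices_up(self, devices):
        # Only the very first loop iteration after (re)start reports
        # agent_restarted=True, telling the server-side l2pop driver
        # to (re)send full fdb entries to this agent.
        payload = {"devices_up": list(devices),
                   "agent_restarted": self.iter_num == 0}
        self.sent.append(payload)
        return payload

    def rpc_loop(self, batches):
        for devices in batches:
            self.notify_devices_up(devices)
            self.iter_num += 1
```

Tying the flag to the loop counter, rather than to elapsed time, means the signal is sent no matter how long the first iteration takes to process a large number of ports.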
Conflicts:
neutron/plugins/ml2/rpc.py
Closes-Bug: #1813703
Closes-Bug: #1813714
Closes-Bug: #1813715
Closes-Bug: #1794991
Closes-Bug: #1799178
Change-Id: I8edc2deb509216add1fb21e1893f1c17dda80961
(cherry picked from commit a5244d6d44)
With high concurrency more than 1 port may be activated on an
OVS agent at the same time (like VM port + a DVR port),
so the patch mitigates the condition by checking for 1 or 2
first active ports.
Given that the condition also contains "or self.agent_restarted(context)",
which makes it True for the first 180 seconds (by default) after an agent
restart, I believe the downside of changing 1 to 2 should be negligible.
Please see bug for more details on the issue.
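The condition discussed above can be sketched as a pure function. This is a simplified model, not neutron's code: agent uptime is passed in as a number of seconds, the 180-second window is the default quoted above, and both function names are hypothetical.

```python
AGENT_BOOT_TIME = 180  # seconds; the default mentioned above


def agent_restarted(agent_uptime, boot_time=AGENT_BOOT_TIME):
    """True while the agent is still within its boot window."""
    return agent_uptime < boot_time


def should_send_full_fdb(agent_active_ports, agent_uptime):
    # Treat the first one *or two* active ports as the "first active
    # port", since e.g. a VM port and a DVR port can become active at
    # the same time on one agent.
    return agent_active_ports in (1, 2) or agent_restarted(agent_uptime)
```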
Closes-Bug: #1789846
Change-Id: Ieab0186cbe05185d47bbf5a31141563cf923f66f
(cherry picked from commit b32db30874)
When an HA router's interface on a host is going DOWN but the router
is still available on this host, the L2 population
mechanism driver will now send other hosts the information needed
to remove the fdb unicast entries for this port on that host.
It will not send FLOODING_ENTRY, because this port is still on the
host, but in standby mode, and might be transformed to master
in the future.
This solves an issue with migrating a router from legacy to HA.
In that case, the port which was originally attached to the legacy
router is transformed into an HA backup port before its
status changes to DOWN.
Now, in that case, unicast entries for this port and backup
node are removed properly, so packets to the HA router are
really sent to the host that is the master node for the router.
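A simplified sketch of the decision described above, assuming a reduced fdb message layout of `{network_id: {"ports": {agent_ip: [entries]}}}` (real messages carry more fields, such as segment and network type). `PortInfo` and the all-zero `FLOODING_ENTRY` are modeled locally for illustration, and the helper name is hypothetical.

```python
import collections

PortInfo = collections.namedtuple("PortInfo", ["mac_address", "ip_address"])
# Illustrative flooding placeholder entry (all-zero MAC and IP).
FLOODING_ENTRY = PortInfo("00:00:00:00:00:00", "0.0.0.0")


def fdb_entries_to_remove(network_id, agent_ip, port_infos,
                          router_still_on_host):
    """Build a simplified fdb 'remove' message for a port going DOWN."""
    entries = list(port_infos)
    if not router_still_on_host:
        # The port really left the host, so flooding to it can go too.
        entries.insert(0, FLOODING_ENTRY)
    # When the HA router is still on the host (standby), only unicast
    # entries are removed; flooding is kept for a future failover.
    return {network_id: {"ports": {agent_ip: entries}}}
```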
Closes-Bug: #1785582
Change-Id: Icc14e5f5d40fc6fbb49e0f7b18cc3b15ebec8508
(cherry picked from commit 6c300b1a9b)
neutron-openvswitch-agent refreshes flows when it is restarted.
But because the port's binding status does not change,
update_port_postcommit is skipped in the function
'_update_individual_port_db_status' in 'neutron/plugins/ml2/plugin.py',
and since l2pop doesn't handle DVR ports there, the fdb entries for
DVR ports are not added.
So we must not skip DVR ports in notify_l2pop_port_wiring when the
agent is restarted.
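A minimal sketch of the skip rule described above. The device owner string is the one used for DVR router interfaces; the helper name and the decision to treat all other device owners as "not skipped" are assumptions for illustration, since other port types are out of scope here.

```python
# Device owner string used for DVR router interfaces.
DEVICE_OWNER_DVR_INTERFACE = "network:router_interface_distributed"


def should_skip_port_wiring(device_owner, agent_restarted):
    """Return True if the l2pop port-wiring notification can be skipped."""
    if device_owner == DEVICE_OWNER_DVR_INTERFACE:
        # DVR ports are normally skipped here, but after an agent restart
        # update_port_postcommit is not triggered for them, so this is
        # the only chance to get their fdb entries added.
        return not agent_restarted
    return False
```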
Closes-Bug: #1773286
Change-Id: I54e3db4822830a0c83daf7b5150575f8d6e2497b
The neutron.common.topics module was rehomed into neutron-lib with
commit Ie88b84949cbd55a4e7ad06341aab77b286cdc485
This patch consumes it by removing the rehomed module from neutron
and using the module from neutron-lib instead.
NeutronLibImpact
Change-Id: Ia4a4604c259ce862597de80c6deeb3d408bf0e95
Agent object has been merged [1].
This patch uses Agent object in agents_db and test_agents_db.
We also introduce a new function (get_agents_object) and keep
the old function (get_agents_db) for backward compatibility.
[1] https://review.openstack.org/#/c/297887/
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Change-Id: I4c4283cb1aa05d52dca00cc249e094ea7d55b1d3
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
Recently we have been seeing an error in neutron associated
with DVR routers, that says 'Binding info for DVR port not
found'.
This error is thrown when the 'get_bound_port_context' is
called when trying to notify_l2pop_port_wiring.
notify_l2pop_port_wiring is intended for router 'HA' ports, so
the get_bound_port_context should be called here only for 'HA'
ports.
This was introduced by a recent refactor in neutron
Icd4cd4e3f735e88299e86468380c5f786e7628fe
Change-Id: I1c636344068518aa26be6c96c598c61b7f0f3563
Closes-Bug: #1702769
The error sneaked in with Ib6e59ab3405857d3ed4d82df1a80800089c3f06e
where is_ha_router_port expects a NeutronContext object but we still
pass PortContext instead.
Change-Id: I593af5d050de00ddea7d758007d9856c4b97695f
Closes-Bug: #1703938
neutron-lib now contains the API definitions for neutron's core
resources. This patch removes the constant core resource and collection
variables and uses them from lib. Subsequent patches will consume the
actual core resource attribute definitions.
NeutronLibImpact
Change-Id: Ia9afdf620cd538b2aa420593277d6403a45c996b
The well known service type constants are in
neutron_lib.plugins.constants, but for legacy reasons a few still exist
and are referenced from neutron_lib.constants that we'd like to remove.
This patch switches references over to neutron_lib's plugin constants.
Change-Id: I1861448cec303725b30cef8f42029f467f9e03a3
The workaround of using deepcopy calls on the PortBinding
and PortBindingLevel objects prevents the port relationship
from being loaded to bump its revision because it then fails
to merge.
So in order to allow port bindings to bump the revision we
need to stop using sqlalchemy objects in the PortContext. This
patch adds a new snapshot object that just copies the column
values and provides a method to reconcile them back into the
session.
This workaround can go away after we switch to using OVOs, but
this needs to be backportable so we can't just wait for OVO
adoption.
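A minimal sketch of the snapshot idea using plain Python objects instead of SQLAlchemy models. Selected column values are copied into a detached object, and a `reconcile` method writes them back onto a target row. The class name and methods are hypothetical; a real implementation would reconcile into the DB session rather than onto a plain object.

```python
class ColumnSnapshot:
    """Detached copy of selected column values of an ORM-like row."""

    def __init__(self, row, columns):
        self._columns = tuple(columns)
        for name in self._columns:
            # Copy plain column values only; relationships are not touched.
            setattr(self, name, getattr(row, name))

    def as_dict(self):
        return {name: getattr(self, name) for name in self._columns}

    def reconcile(self, row):
        """Write the (possibly modified) values back onto `row`."""
        for name, value in self.as_dict().items():
            setattr(row, name, value)
        return row
```

Because the snapshot never holds a reference to the ORM object, callers can freely modify it without pulling relationships such as the port into a failing merge.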
Partial-Bug: #1699034
Change-Id: Ib85ec8182117fa3c4844dabfffe881e38e68b556
This patch integrates Router Extra Attributes OVO and uses proper
context in calling methods for object operations.
The other integration parts of this OVO in l3_agentschedulers_db.py
and l3_attrs_db.py are being done in patch [1] and [2] respectively.
[1] - I0af665a97087ad72431d58f04089a804088ef005
[2] - Id5ed0a541d09cd1105f9cb067401e2afa8cd9b83
Change-Id: Ib6e59ab3405857d3ed4d82df1a80800089c3f06e
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
Neutron-lib 1.1.0 is now out and contains the portbindings
API definition (as per commit [1]). This patch moves neutron
references over to the neutron-lib version.
NeutronLibImpact
- Consumers using the public constants within neutron's
portbindings API extension must now use the values
from neutron-lib.
[1] 87e42f993c07ae320159d5123662ee9f3bd4d903
Change-Id: I669af9b4c712877772d91a03857ab108714001d4
This reverts commit 4ba0e75254.
This broke the fullstack job, which doesn't set up DHCP agents,
so we get stuck waiting for ports to become active.
Change-Id: I28ac1362c3be7a459cfff5dad2b5251e4edd13fa
Provisioning blocks merged in Newton so for Pike we can
safely assume we are not running with Liberty agents that
don't notify the server when the port is ready.
This allows us to skip a query to the agents and configuration
parsing on every port provisioning block setup.
Related-Bug: #1453350
Change-Id: I8111469ad4b0d88580bff7a77492ad95af8e9377
Neutron-lib 1.1.0 is now out and contains the provider
network API definition (as per commit [1]). This patch
moves neutron references over to the neutron-lib
version.
NeutronLibImpact
- Consumers using the public constants within neutron's
providernet API extension must now use the values
from neutron-lib.
[1] cba0f9f0dd920b1f828c4bba3bd388d5b4eb9abf
Change-Id: I46390a159e93642901de87ea6604f2e7ffa03bad
get_agent_by_host can return None in the l2pop
driver so we need to check for that case before
we blindly try to decode configuration values on
the result.
There are a couple of cases that can lead to this.
* The deployment can be misconfigured and is missing
either a tunneling_ip option for the agent on a
host or is missing an L2 agent with that host_id
entirely.
* Multiple mech drivers are in use and a port is being
deleted from an agentless host.
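A minimal sketch of the defensive check, assuming the agent is represented as a dict whose `configurations` field is a JSON string (a simplification of the agent DB record). The helper name `get_agent_tunnel_ip` is hypothetical.

```python
import json
import logging

LOG = logging.getLogger(__name__)


def get_agent_tunnel_ip(agent, host):
    """Return the agent's tunneling_ip, or None if it cannot be found."""
    if agent is None:
        # Misconfigured host, or an agentless host served by another
        # mechanism driver.
        LOG.warning("Unable to retrieve the agent for host %s", host)
        return None
    configurations = json.loads(agent.get("configurations") or "{}")
    # A missing tunneling_ip option also yields None instead of a KeyError.
    return configurations.get("tunneling_ip")
```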
Related-Bug: #1533013
Closes-Bug: #1672564
Change-Id: I1e79f600172edad1e31e8231a0a6a2c55f46804c
In delete_port_postcommit, a DVR port (port['device_owner'] ==
DEVICE_OWNER_ROUTER_SNAT) can match on l2pop_db.HA_ROUTER_PORTS[1],
but cannot get any fdb entries from _get_ha_port_agents_fdb. Then
fdb_entries[network_id]['ports'] is overwritten by {}, so
the associated flow entries are not deleted.
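The failure mode is a plain dictionary overwrite. This simplified sketch assumes the reduced layout `{network_id: {"ports": {agent_ip: [entries]}}}` and uses hypothetical helper names. It contrasts overwriting with merging, so that an empty HA lookup cannot wipe the entries already collected. It illustrates the pitfall and is not necessarily the exact fix that was merged.

```python
def overwrite_ports(fdb_entries, network_id, ha_ports):
    # Buggy pattern: an empty ha_ports dict wipes out existing entries.
    fdb_entries[network_id]["ports"] = ha_ports
    return fdb_entries


def merge_ports(fdb_entries, network_id, ha_ports):
    # Safer pattern: only add what the HA lookup actually returned.
    ports = fdb_entries.setdefault(network_id, {}).setdefault("ports", {})
    for agent_ip, entries in ha_ports.items():
        ports.setdefault(agent_ip, []).extend(entries)
    return fdb_entries
```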
Closes-Bug: #1668277
Change-Id: I7b621157fe85945acd99e4f08b6370d2f9c3d44d
Change I3447ea5bcb7c57365c6f50efe12a1671e86588b3 introduced a new
running-index for RouterL3AgentBinding, binding_index, which helps to
keep count of how many bindings a router has for each agent (and how
many bindings in total). Since we were able to use this DB column to make
sure concurrency doesn't break on creating a new HA router, we also
postponed the creation of L3HARouterAgentPortBinding to after the first
binding was successfully created.
This patch proposes a change to the way routers are scheduled to an
agent: when creating a new HA router, no L3HARouterAgentPortBinding
entities will be created until after the corresponding
RouterL3AgentBinding was successfully created.
In other words, instead of pre-creating the L3HARouterAgentPortBinding
objects without assigning them to an agent, we'll create them only after
the RouterL3AgentBinding was successfully created.
Related-Bug: #1609738
Change-Id: Ie98d5e3760cdb17450aea546f4b61f5ba14baf1c
Neutron Manager is loaded at the very startup of the neutron
server process and with it plugins are loaded and stored for
lookup purposes as their references are widely used across the
entire neutron codebase.
Rather than holding these references directly in NeutronManager
this patch refactors the code so that these references are held
by a plugin directory.
This allows subprojects and other parts of the Neutron codebase
to use the directory in lieu of the manager. The result is
leaner, cleaner, and more decoupled code.
Usage pattern [1,2] can be translated to [3,4] respectively.
[1] manager.NeutronManager.get_service_plugins()[FOO]
[2] manager.NeutronManager.get_plugin()
[3] directory.get_plugin(FOO)
[4] directory.get_plugin()
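To make the translation concrete, here is a toy plugin directory in plain Python. It mirrors the usage pattern above: `get_plugin()` returns the core plugin, `get_plugin(FOO)` returns a service plugin, and tests call `add_plugin()`. It is a hypothetical stand-in, not the neutron-lib implementation, and the `"CORE"` alias and `"FOO"` key are illustrative.

```python
CORE = "CORE"  # illustrative alias for the core plugin


class PluginDirectory:
    """Toy registry mapping aliases to loaded plugin instances."""

    def __init__(self):
        self._plugins = {}

    def add_plugin(self, alias, plugin):
        self._plugins[alias] = plugin

    def get_plugin(self, alias=CORE):
        # Returns None when nothing is registered under the alias.
        return self._plugins.get(alias)

    def get_plugins(self):
        return dict(self._plugins)
```

Each test can build its own `PluginDirectory`, so registering a fake plugin with `add_plugin()` can take the place of mocking the manager.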
The more entangled part is in the neutron unit tests, where the
use of the manager can be simplified as mocking is typically
replaced by a call to the directory add_plugin() method. This is
safe as each test case gets its own copy of the plugin directory.
That said, unit tests that look more like API tests and rely on
the entire plugin machinery need some tweaking to avoid stumbling
into plugin loading failures.
Due to the massive use of the manager, deprecation warnings are
considered impractical as they cause logs to bloat out of proportion.
Follow-up patches that show how to adopt the directory in neutron
subprojects are tagged with topic:plugin-directory.
NeutronLibImpact
Partially-implements: blueprint neutron-lib
Change-Id: I7331e914234c5f0b7abe836604fdd7e4067551cf
_get_agent_fdb may return None so we need to check for
that before we try to iterate over a key inside of it
in delete_port_postcommit.
Closes-Bug: #1622996
Change-Id: I2256df0e08380e550f32248fb9589ee43b0923ff
This patch makes L3 HA failover independent of neutron components
(during failover).
All HA agents (active and backup) call update_device_up/down after wiring
the ports. But the l2pop driver is called only for the active agent, as the
port binding in the DB reflects the active agent. l2pop then creates unicast
and multicast flows for the active agent.
On failover, flows to the new active agent are created. For this to happen,
the database, messaging server, neutron-server and destination L3 agent
must all be active during failover. This creates two issues:
1) When any of the above resources (i.e. neutron-server, ...) are dead,
flows between the new master and other agents won't be created and
L3 HA failover does not work. In the same scenario, L3 HA failover
works if l2pop is disabled.
2) Packet loss during failover is higher, as the above neutron resources
interact multiple times and so take time to create L2 flows.
In this change, we allow the plugin to notify l2pop when update_device_up/down
is called by backup agents as well. l2pop then creates flood flows to all HA
agents (both active and slave). l2pop won't create a unicast flow for this
port; instead, the unicast flow is created by the learning action of table 10
when keepalived sends a GARP after assigning the IP address to the master
router's qr-xx port. As flood flows are already created and the unicast flow
is added dynamically, L3 HA failover does not depend on l2pop.
This solves two issues:
1) With L3 HA + l2pop, failover works even if any of the above agents
or processes are dead.
2) Failover time is reduced, as we do not depend on neutron to create
flows during failover.
We use the L3HARouterAgentPortBinding table to get all HA agents of a
router port. An HA router port on a slave agent is also considered for l2pop
distributed_active_network_ports and agent_network_active_port_count.
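A simplified sketch of the resulting fdb content for an HA router port. Every agent hosting the router, active or standby, receives only a flooding entry, and unicast reachability is left to MAC learning. The binding tuples, host-to-tunnel-IP map and helper name are hypothetical, and the all-zero flooding placeholder is modeled locally.

```python
FLOODING_ENTRY = ("00:00:00:00:00:00", "0.0.0.0")  # illustrative placeholder


def ha_port_flood_entries(ha_bindings, tunnel_ips):
    """Map each HA agent's tunnel IP to a flooding-only fdb entry list.

    ha_bindings: iterable of (host, state), e.g. state "active" or
    "standby"; the state is deliberately ignored, since flood flows go
    to every HA agent.
    tunnel_ips: dict of host -> tunnel IP; hosts without one are skipped.
    """
    return {tunnel_ips[host]: [FLOODING_ENTRY]
            for host, _state in ha_bindings
            if host in tunnel_ips}
```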
Closes-bug: #1522980
Closes-bug: #1602614
Change-Id: Ie1f5289390b3ff3f7f3ed7ffc8f6a8258ee8662e
Remove deprecation warnings for various constants
and exceptions that have moved to neutron_lib.
Fix miscellaneous other deprecations.
Uses constants instead of l3_constants when importing
neutron-lib constants.
Co-Authored By: Henry Gessau <gessau@gmail.com>
Co-Authored By: Gary Kotton <gkotton@vmware.com>
Change-Id: Ib0e8ff5c3e23677c1009241a1818cbc8a3430c38
Now the ML2 core plugin maps driver errors to MechanismDriverError
and hides the error details from the caller.
This patch changes MechanismDriverError from a subclass of
NeutronException to a subclass of MultipleExceptions, and adds the
exceptions from mechanism drivers as inner_exceptions of
MultipleExceptions. As a result, the API layer will unwrap the
MechanismDriverError and return the real error to the client.
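A minimal sketch of the wrapping and unwrapping, using simplified local exception classes that mirror the names in the text. The rule of surfacing the inner error only when there is exactly one is an assumption for illustration, not necessarily what the API layer does.

```python
class MultipleExceptions(Exception):
    """Container exception that keeps the original errors."""

    def __init__(self, exceptions, *args):
        super().__init__(*args)
        self.inner_exceptions = list(exceptions)


class MechanismDriverError(MultipleExceptions):
    def __init__(self, method, errors=()):
        super().__init__(errors, "%s failed." % method)
        self.method = method


def call_drivers(method, driver_calls):
    """Run every driver call, collecting errors instead of stopping early."""
    errors = []
    for call in driver_calls:
        try:
            call()
        except Exception as exc:  # sketch: collect any driver failure
            errors.append(exc)
    if errors:
        raise MechanismDriverError(method, errors)


def api_call(method, driver_calls):
    """API-layer sketch: return the real driver error, or None on success."""
    try:
        call_drivers(method, driver_calls)
    except MultipleExceptions as exc:
        if len(exc.inner_exceptions) == 1:
            return exc.inner_exceptions[0]
        return exc
    return None
```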
Change-Id: I3a46932848d59f7f027640bfb598650f064b0a12
Closes-bug: #1273730
The issue might happen when VMs are intensively created/deleted.
With the patch deleted ports will be just skipped.
Closes-Bug: #1610303
Change-Id: I32b0de9c452cf973d687c72e8381584012c9f3b4
As part of making DVR portbinding implementation generic, we rename
dvr portbinding functions as distributed portbinding functions.
In the next patch we make the DVR logic for port binding generic,
so that it is useful for all distributed router ports (for example, HA).
Partial-Bug: #1595043
Partial-Bug: #1522980
Change-Id: I402df76c64299156d4ed48ac92ede1e8e9f28f23
This is a cleanup for patch [1]: the function map should be removed
to make the code easier to read. Start a deprecation cycle for these
functions in case external projects use them.
[1] https://review.openstack.org/#/c/242393
Change-Id: I77c83bd7ee0c8ef92d8aaaa8e968479b848532fe
Partially-Implements: blueprint routed-networks
These unit tests initially asserted sequential allocation of IP
addresses, even though they have no need to assert that a specific
IP was allocated. This made it difficult to change the IP allocation
algorithm in the future and made these tests fragile and poorly
isolated.
This change breaks the dependency these unit tests have on a
specific IP allocation strategy and isolates them from any
changes that may be made to the order in which IP addresses
are allocated on a subnet.
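Here is one way to write such order-independent assertions, sketched with the standard `ipaddress` module. Instead of expecting a specific address, such as the first one in the pool, the check only requires that every allocated IP belongs to the subnet and that no IP is handed out twice. The helper name is hypothetical.

```python
import ipaddress


def allocations_are_valid(allocated_ips, cidr):
    """Order-independent checks for IPs allocated from `cidr`."""
    network = ipaddress.ip_network(cidr)
    addresses = [ipaddress.ip_address(ip) for ip in allocated_ips]
    in_subnet = all(addr in network for addr in addresses)
    unique = len(set(addresses)) == len(addresses)
    return in_subnet and unique
```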
Change-Id: Idc879b7f1e6496aa96b4f7ae6c3eaca6079bdcac
Partial-Bug: #1543094
This function isn't necessary. The json encoding of a
named tuple will already turn into a normal list.
ports = [l2pop_rpc.PortInfo('abcdef', '1.1.1.1')]
json.dumps(ports) == json.dumps([(mac, ip) for (mac, ip) in ports])
An argument could be made that the PortInfo object could have
something added to it later that we wouldn't want to serialize
in order to remain backward compatible. However, doing so would
break all of the constructions of PortInfo objects on the agents
once they got the updated code for PortInfo that requires the
new parameter.
So there is no way currently to add a new field to PortInfo without
breaking existing legacy clients or breaking new clients.
Given that, let's stop doing the json encoder's job.
This patch also adds a sanity unit test to make sure the json
serialization method used in oslo does not break on the named tuples.
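The claim above can be checked with the standard library `json` module. Here `PortInfo` is a local stand-in whose field names are assumed for illustration. A named tuple is a tuple subclass, so `json.dumps` encodes it as a plain JSON array, and the receiver can rebuild named tuples from the decoded pairs.

```python
import collections
import json

# Local stand-in for l2pop_rpc.PortInfo; field names are assumed.
PortInfo = collections.namedtuple("PortInfo", ["mac_address", "ip_address"])

ports = [PortInfo("abcdef", "1.1.1.1")]

encoded = json.dumps(ports)  # named tuple -> JSON array
manual = json.dumps([(mac, ip) for (mac, ip) in ports])

# Receiving side: turn the decoded [mac, ip] lists back into PortInfo.
decoded = [PortInfo(*pair) for pair in json.loads(encoded)]
```

Other JSON libraries can behave differently (simplejson, for instance, has a `namedtuple_as_object` option). That is why a sanity test against the serializer actually in use is worthwhile.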
Change-Id: I45ae69ef8c9c15ad21a28dc42f2d78b234ccfb0c
This fixes the problem that when two or more ports in a network
are migrated to a host that did not previously have any ports in
the same network, the new host is sometimes not told about the
IP/MAC addresses of all the other ports in the network. In other
words, initial L2population does not work, for the new host.
This is because the l2pop mechanism driver only sends catch-up
information to the host when it thinks it is dealing with the first
active port on that host; and currently, when multiple ports are
migrated to a new host, there is always more than one active port so
the condition above is never triggered.
The fix is for the ML2 plugin to set a port's status to DOWN when
its binding info changes.
This patch also fixes the bug where nova thinks it should not wait
for any events from neutron because all ports are already active.
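A toy model of why resetting the status helps. It assumes ports are plain dicts with `host` and `status` keys, and that the server sends catch-up fdb information only when a host gets its first ACTIVE port. All helper names are hypothetical.

```python
def update_binding(port, new_host):
    """Rebind a port; reset it to DOWN when its host changes."""
    if port["host"] != new_host:
        port["host"] = new_host
        port["status"] = "DOWN"


def set_active(ports, port, send_catch_up):
    """Mark `port` ACTIVE; send catch-up info on the host's first active port."""
    port["status"] = "ACTIVE"
    active_on_host = sum(1 for p in ports
                         if p["host"] == port["host"] and p["status"] == "ACTIVE")
    if active_on_host == 1:
        send_catch_up(port["host"])


def migrate_and_activate(ports, new_host):
    """Move all ports to new_host, then bring them up one at a time."""
    notified_hosts = []
    for port in ports:
        update_binding(port, new_host)
    for port in ports:
        set_active(ports, port, notified_hosts.append)
    return notified_hosts
```

Without the DOWN reset in `update_binding`, both migrated ports would still be ACTIVE when the first one is processed. The count would then be 2, and the new host would never receive the catch-up information.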
Closes-bug: #1483601
Closes-bug: #1443421
Closes-Bug: #1522824
Related-Bug: #1450604
Change-Id: I342ad910360b21085316c25df2154854fd1001b2
The unmarshalling function was not aware of the data
structure used by update_fdb_entries, so it would not
set up PortInfo named tuples in the 'before' and 'after'
fields. This would break the fdb_chg_ip_tun function,
which expected to be able to use named attributes.
This patch adjusts the unmarshalling function to be aware
of this data structure.
This has likely been broken since the change that added
named tuples here: I7f8c93b0e12ee0179bb23dfbb3a3d814615b1c2e
It probably went undetected for so long because the exception
will only be observed when the updated entry does not have
an agent IP that matches the local agent's (i.e. not single-node).
Even in a multi-node environment, this would only trigger an
error when the fixed_ips of a port changed so it wouldn't show
up in a normal port wiring life-cycle.
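A simplified sketch of an unmarshalling step that is aware of both layouts. The message layouts in the docstring are assumptions modeled on the description above, `PortInfo` is a local stand-in, and the function names are hypothetical.

```python
import collections

# Local stand-in for the PortInfo named tuple; field names are assumed.
PortInfo = collections.namedtuple("PortInfo", ["mac_address", "ip_address"])


def _to_port_infos(pairs):
    return [PortInfo(*pair) for pair in pairs]


def unmarshall_fdb_entries(fdb_entries):
    """Rebuild PortInfo tuples after JSON decoding (simplified layouts).

    Regular entries: {net_id: {..., "ports": {agent_ip: [[mac, ip], ...]}}}
    IP changes:      {"chg_ip": {net_id: {agent_ip: {"before": [...],
                                                     "after": [...]}}}}
    """
    result = {}
    for key, value in fdb_entries.items():
        if key == "chg_ip":
            # The nested before/after lists need converting too.
            result[key] = {
                net_id: {
                    agent_ip: {when: _to_port_infos(pairs)
                               for when, pairs in changes.items()}
                    for agent_ip, changes in agents.items()
                }
                for net_id, agents in value.items()
            }
        else:
            entry = dict(value)
            ports = value.get("ports", {})
            entry["ports"] = {ip: _to_port_infos(pairs)
                              for ip, pairs in ports.items()}
            result[key] = entry
    return result
```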
Closes-Bug: #1538387
Change-Id: I0aacb3af9ebd160ebfb801f77b186075303c3df5
This adds test coverage for the scenario where a port is associated with a
network that has an IPv4 subnet and an IPv6 subnet: the l2pop mechanism
driver will issue an RPC call to the agent with two port info objects,
one for the IPv4 address and one for the IPv6 address.
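The expected payload shape can be illustrated with a small helper. It assumes a port dict with `mac_address` and `fixed_ips` keys (as in the Neutron port API) and a local `PortInfo` stand-in, and the helper name is hypothetical.

```python
import collections

# Local stand-in for the PortInfo named tuple; field names are assumed.
PortInfo = collections.namedtuple("PortInfo", ["mac_address", "ip_address"])


def port_infos_for(port):
    """One PortInfo per fixed IP, so dual-stack ports yield two entries."""
    return [PortInfo(port["mac_address"], fixed_ip["ip_address"])
            for fixed_ip in port["fixed_ips"]]
```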
Change-Id: I4f50894722fad220132fa16b5e62e996a92293f0
None of the L2populationDbMixin methods actually use 'self' for
anything. As the class is basically just used as a namespace and
modules already provide that, this patch gets rid of the mixin. This
makes the code simpler and easier to debug as inheritance doesn't buy
us anything in this case.
Change-Id: Ibf4dfe49a2ebc32d3909d3d7b579d2bb2ea3f61d