Removed E125 (continuation line does not distinguish itself
from next logical line) from the ignore list and fixed all
the indentation issues. Didn't think it was going to be
close to 100 files when I started.
Some extreme conditions can leave the router gateway port
unbound. All the centralized floating IPs then become unreachable,
because the gateway port is set to the 4095 tag.
This patch adds the HA status to the router-related port
processing code path. If it is an HA router, the gateway port
goes to the right HA router processing code branch.
All of the externally consumed variables from neutron.common.constants
now live in neutron-lib. This patch removes neutron.common.constants
and switches all uses over to lib.
Reduces E128 warnings by ~260 to just ~900,
no way we're getting rid of all of them at once (or ever).
Files under neutron/tests still have a ton of E128 warnings.
Co-Authored-By: Akihiro Motoki <email@example.com>
If the l3-agent was restarted by a regular action, such as a config change,
package upgrade, or manual service restart, we should not set the
HA port down. That should only happen when the physical host was
rebooted, i.e. the VRRP processes were all terminated.
This patch adds a new RPC call during l3 agent init that first
retrieves the HA router count. The agent then compares the VRRP process
(keepalived) count and the 'neutron-keepalived-state-change' count
with the hosting router count. If the counts match, the action that
sets the HA port to the 'DOWN' state is not triggered.
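The count comparison described above could look roughly like the sketch
below. The function name and its parameters are hypothetical and only
illustrate the decision; they are not taken from the patch.

```python
def should_set_ha_ports_down(ha_router_count, keepalived_count,
                             state_change_count):
    """Return True when the HA ports should be set to DOWN.

    Hypothetical helper: if both process counts equal the number of HA
    routers hosted by the agent, the processes survived the restart and
    the ports are left alone.
    """
    processes_intact = (keepalived_count == ha_router_count and
                        state_change_count == ha_router_count)
    return not processes_intact
```

A plain service restart (3 routers, 3 keepalived, 3 state-change
processes) yields False, while a host reboot (3 routers, no processes)
yields True.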
This patch implements the plugin.
This patch introduces a new service plugin for port forwarding resources,
named 'pf_plugin', and supports create/update/delete port forwarding
operations towards a free Floating IP.
This patch includes the following work:
* Introduces the portforwarding extension and the base class of the plugin
* Introduces the portforwarding plugin, supporting CRUD for port forwarding
* Adds the policy for portforwarding
The race issue fix in:
Fip extend port forwarding field addition in:
Partially-Implements: blueprint port-forwarding
The externally consumed APIs from neutron.db.api were rehomed into
neutron-lib with https://review.openstack.org/#/c/557040/
This patch consumes the retry_db_errors function from lib by:
- Removing retry_db_errors from neutron.db.api
- Updating the imports for retry_db_errors to use it from lib
- Using the DB API retry fixture from lib in the UTs where applicable
- Removing the UTs for neutron.db.api as they are now covered in lib
Post-binding information about router ports is missing in results of RPC
calls made by l3 agents. sync_routers code ensures that bindings are
present, however, it does not refresh router objects before returning
them - for RPC clients ports remain unbound before the next sync and
there is no necessary address scope information present to create routes
from fip namespaces to qrouter namespaces.
The is_extension_supported function now lives in neutron-lib. This patch
removes the function from neutron and uses lib's version instead.
Commit I81748aa0e48b1275df3e1ea41b1d36a117d0097d added the l3 extension
API definition to neutron-lib and commit
I2324a3a02789c798248cab41c278a2d9981d24be rehomed the l3 exceptions,
while Ifd79eb1a92853e49bd4ef028e7a7bd89811c6957 shims the l3
This patch consumes the l3 api def by:
- Removing the code from neutron that's now in lib.
- Using lib's version of the code where applicable.
- Tidying up the related unit tests: now that the l3 api def from lib
is used, the necessary fixture is already set up in the parent chain when
setting up the unit test class.
In commit 500b255278 we used the
"get_router_ids" RPC to update the HA network port status, but that
was needed in order to backport that commit to other branches.
As the "get_router_ids" RPC is expected to fetch only router ids and
not to do any other processing, we are adding a new RPC,
"update_ha_network_port_status". The L3 agent will call this new RPC
to set the HA network port status to DOWN.
The well known service type constants are in
neutron_lib.plugins.constants, but for legacy reasons a few still exist
and are referenced from neutron_lib.constants that we'd like to remove.
This patch switches references over to neutron_lib's plugin constants.
When l3 agent node is rebooted, if HA network port status is already
ACTIVE in DB, agent will get this status from server and then spawn
the keepalived (though l2 agent might not have wired the port),
resulting in multiple HA masters active at the same time.
To fix this, when the L3 agent starts up we can have it explicitly
set the port status to DOWN for all of the HA ports on that node.
Then we are guaranteed that when they go to ACTIVE it will be because
the L2 agent has wired the ports.
Neutron-lib 1.1.0 is now out and contains the portbindings
API definition (as per commit ). This patch moves neutron
references over to the neutron-lib version.
- Consumers using the public constants within neutron's
portbindings API extension must now use the values
The handler was making the incorrect assumption that once a
host_id was set, a port which failed to bind could only be in
the 'binding_failed' state. So it would not try to rebind ports
that encountered an exception during port binding commit that left
them in the unbound state.
Neutron Manager is loaded at the very startup of the neutron
server process and with it plugins are loaded and stored for
lookup purposes as their references are widely used across the
entire neutron codebase.
Rather than holding these references directly in NeutronManager
this patch refactors the code so that these references are held
by a plugin directory.
This allows subprojects and other parts of the Neutron codebase
to use the directory in lieu of the manager. The result is
leaner, cleaner, and more decoupled code.
Usage pattern [1,2] can be translated to [3,4] respectively.
The more entangled part is in the neutron unit tests, where the
use of the manager can be simplified as mocking is typically
replaced by a call to the directory add_plugin() method. This is
safe as each test case gets its own copy of the plugin directory.
That said, unit tests that look more like API tests and that rely on
the entire plugin machinery, need some tweaking to avoid stumbling
into plugin loading failures.
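As a rough illustration of the directory idea, here is a toy stand-in
written for this note. It is not neutron-lib's code: only add_plugin()
is named above, and get_plugin(), the class names and the alias string
are assumptions made for the sketch.

```python
class PluginDirectory:
    """Toy registry mapping a service alias to a plugin instance."""

    def __init__(self):
        self._plugins = {}

    def add_plugin(self, alias, plugin):
        # Register (or replace) the plugin stored under this alias.
        self._plugins[alias] = plugin

    def get_plugin(self, alias):
        # Return the registered plugin, or None if nothing is registered.
        return self._plugins.get(alias)


class FakeL3Plugin:
    """Hypothetical test double standing in for a real service plugin."""

    def get_routers(self):
        return []


# In a unit test, registering a fake replaces mocking the manager.
directory = PluginDirectory()
directory.add_plugin("L3_ROUTER_NAT", FakeL3Plugin())
```

Because each test case would build its own directory, a registration
made in one test does not leak into another.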
Due to the massive use of the manager, deprecation warnings are
considered impractical as they cause logs to bloat out of proportion.
Follow-up patches that show how to adopt the directory in neutron
subprojects are tagged with topic:plugin-directory.
Partially-implements: blueprint neutron-lib
When everything works as expected, hardly anyone pays any attention
to this log trace, which accounts for an incredible amount of log data.
This change proposes to emit the router payload only during failures
(when debugging info is needed the most), and furthermore it relocates
it to the L3 agent log files, where it is more pertinent.
As part of making the DVR portbinding implementation generic, we rename
the dvr portbinding functions to distributed portbinding functions.
In the next patch we make the dvr logic for port binding generic,
so that it is useful for all distributed router ports (for example, HA).
In this patch, auto scheduling of routers is removed from sync_routers,
so that the reported bug can be fixed. A potential race can also be avoided
according to 
As a result of this patch, the l3 agent can't get the router info
when the router is not bound to the l3 agent, and the router is
removed from the agent during agent processing. This makes sense since, in
neutron server, the router is not tied to the agent. For DVR, if there
are service ports on the agent host, the router info will still be
returned to the l3 agent.
Co-Authored-By: John Schwarz <firstname.lastname@example.org>
Currently, a router_centralized_snat port can be bound to a host where
the l3-agent is in standby state (L3 HA + DVR case). As a result, a VM without
a floating ip is unable to reach the external network. This change passes
ha_router_port flag to _ensure_host_set_on_port when called for
Note: this issue is intermittent; without the changes in l3_rpc.py
the unit test does not fail every time.
Co-Authored-By: Oleg Bondarev <email@example.com>
Routers auto scheduling works when an l3 agent starts and performs
a full sync with neutron server. Neutron server looks for all
unscheduled routers and schedules them to that agent if applicable.
This was broken by commit 0e97feb0f3
which changed full sync logic a bit: now l3 agent requests all ids
of routers scheduled to it first. get_router_ids() didn't call
routers auto scheduling which caused the regression.
This patch adds routers auto scheduling to get_router_ids().
In case there are thousands of routers attached to thousands of
networks, a sync_routers request might take a long time and lead to a timeout
on the agent side, so the agent initiates another resync. This may lead to an
endless loop, causing server overload and the agent not being able to sync state.
This patch makes the l3 agent first check how many routers are assigned to
it and then fetch routers in chunks.
The initial chunk size is set to 256 but may be decreased dynamically in case
timeouts happen while waiting for a response from the server.
This approach reduces the load on the server side and speeds up
resync on the agent side by starting processing right after receiving
the first chunk.
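A minimal sketch of such a chunked fetch follows. The function name, the
fetch callable, the use of the built-in TimeoutError and the choice to
halve the chunk size are all assumptions for illustration; the text above
only says the size may be decreased dynamically.

```python
def fetch_routers_in_chunks(router_ids, fetch, chunk_size=256,
                            min_chunk_size=1):
    """Yield routers chunk by chunk, shrinking the chunk on timeout.

    fetch is a hypothetical callable that takes a list of router ids and
    returns the matching routers, raising TimeoutError when the server
    does not answer in time.
    """
    pos = 0
    while pos < len(router_ids):
        chunk = router_ids[pos:pos + chunk_size]
        try:
            routers = fetch(chunk)
        except TimeoutError:
            if chunk_size <= min_chunk_size:
                raise  # cannot shrink any further, give up
            # Assumed policy: halve the chunk size and retry this chunk.
            chunk_size = max(chunk_size // 2, min_chunk_size)
            continue
        pos += len(chunk)
        # Hand the chunk over immediately so processing can start early.
        yield routers
```

Because the function is a generator, the caller can begin processing the
first chunk while later chunks have not been requested yet.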
The L3 agent needs to know the address scope of the fixed ip of each
floating ip because floating ips are a way to cross scope boundaries.
Without the scope information, there could be ambiguity and no way to
know which scope to send it to.
Partially-Implements: blueprint address-scopes
Without "L3_ROUTER_NAT" in neutron's service_plugins, l3_rpc will
fail when getting l3plugin. So, remove the useless "if" block
A recent change used a keyword argument when it didn't need to,
correct it to fix the multinode DVR job.
End of typical traceback:
in delete_agent_gateway_port(admin_ctx, network_id, host_id=host)
TypeError: delete_floatingip_agent_gateway_port() got multiple
values for keyword argument 'host_id'
Introduced in commit 639f1893dd
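The failure mode can be reproduced in isolation with the sketch below.
The function name and its parameter order are hypothetical; they are only
chosen so that a positional argument and a keyword argument land on the
same parameter.

```python
def delete_gateway_port(context, host_id, ext_net_id):
    # Hypothetical stand-in; returns its arguments so the binding is visible.
    return (host_id, ext_net_id)


error = None
try:
    # "net-1" already fills host_id positionally, so host_id= collides.
    delete_gateway_port("ctx", "net-1", host_id="host-1")
except TypeError as exc:
    error = str(exc)

# Passing every argument positionally, in the declared order, avoids it.
fixed = delete_gateway_port("ctx", "host-1", "net-1")
```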
Today FloatingIP Agent gateway port is deleted and
re-created for DVR based routers based on floatingip
association and disassociation with VMs on compute
nodes by the plugin.
This puts a lot more strain on the plugin to
create and delete these ports when VMs that are associated
with FloatingIps come up and get deleted.
This patch will introduce an RPC call for the agent
to initiate an agent gateway port delete.
Also the agent will check for the last floatingip that
it manages, and if that condition is satisfied, the agent will
request the server to remove the FloatingIP Agent
This patch is to address the failure of manual move of
dvr_snat routers from one service node to another.
The entry in the csnat_l3_agent_bindings table is now removed
during the router to agent unbind operation.
Appropriate notification is now sent to the agent to remove
There were other places in the code
that needed to examine the snat binding table to
check if updates were required -
Additionally, schedule_routers() was made optional
within the rpc _notification path since it can
override the manual move being attempted.
Co-Authored-By: Ila Palanisamy <firstname.lastname@example.org>
Co-Authored-By: Oleg Bondarev <email@example.com>
A misnamed function call and execution order issue was causing
update_subnet to fail when a PD enabled subnet received a new CIDR.
This patch fixes the issues, and introduces an rpc api test to
ensure the function works. This includes altering the process_prefix_update
RPC handler to expose the issue to the test.
This patch includes the DB, IPAM & RPC changes needed for the IPv6 Prefix
To enable this feature, the subnetpool_id attribute of subnets has been
modified to allow for a special subnetpool identifier - "prefix_delegation".
1. Admin sets default_ipv6_subnet_pool in neutron.conf to "prefix_delegation"
2. User creates a new IPv6 subnet without a CIDR or subnetpool ID
3. User creates an interface between this subnet and a router with an existing
The agent-side changes will follow in separate patches.
A documentation patch is up for review here:
Video guides for configuring and using this feature are available on
Co-Authored-By: Baodong (Robert) Li <firstname.lastname@example.org>
Partially-Implements: blueprint ipv6-prefix-delegation
The decorator was previously added at the API layer.
However some RPC handlers are also dealing with port
create/update/delete operations, like dhcp ports for example.
We need to cover these cases too.
Also remove db retry from ml2 plugin delete_port()
as it's not needed once we retry at the API and RPC layers.
(there is already a unit test on this)
The patch also adds a unit test for checking deadlock
handling during port creation at API layer.
Though it's not directly related to the current fix,
I decided to leave it for regression preventing purposes.
An HA port needs to point to the correct host (where the master router
is running) in order for L2Population to work.
Hence, this patch introduces two fixes:
* When a port owned by an HA router is up we make sure it points to the
right node where the master is running, or a random node if there is
no master yet (This corner case is fixed by the 2nd bullet point).
* When an L3 agent reports it's hosting a master, we need to update the
port binding to the host the master is now running on. This fixes
both routers with no elected master (yet) and failovers.
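The host selection in the first bullet could be sketched as follows. The
function name, the (host, state) tuples and the "master" state string
are assumptions for illustration only.

```python
import random


def pick_ha_port_host(bindings):
    """Pick the host an HA router port should be bound to.

    bindings is a hypothetical list of (host, state) tuples, one per L3
    agent hosting the router.
    """
    hosts = [host for host, _state in bindings]
    for host, state in bindings:
        if state == "master":
            return host
    # No master elected yet: fall back to a random hosting node.
    return random.choice(hosts) if hosts else None
```

The random fallback is later corrected when an agent reports that it
hosts the master, as described in the second bullet.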
This patch also changes the L3 HA failover test to use l2pop.
Note that the test does not pass when using l2pop without this patch.
Co-Authored-By: Assaf Muller <email@example.com>
In the L3 RPC code if the host for a port is not
present, it ends up calling update_port with the
host_id set to None. This does not update the host
id at all because it's treated as an unset attribute
which leads to the same thing happening on the next
iteration. These pointless update calls are expensive
because they involve a semaphore and calls to mechanism
This patch adjusts the logic to only send a port
update if it actually has a host to ensure is on
The get_routers method in the l3 RPC code has a log.debug
statement that formats all of the router data as indented
JSON. This method can be expensive if there are hundreds
of routers being synced and it happens even if debugging
is disabled since the function call result is the parameter
to the debug statement.
This patch adds and leverages a small helper class that takes a
callable and its args and defers calling it until the __str__ method
is called on it when it's actually trying to be rendered to a string.
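A minimal sketch of such a helper is shown below. The class name, the
logger name and the dump function are assumptions made for the example
and are not necessarily the names used in the patch.

```python
import json
import logging


class DeferredStr:
    """Defer calling function(*args, **kwargs) until str() is requested."""

    def __init__(self, function, *args, **kwargs):
        self.function = function
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return str(self.function(*self.args, **self.kwargs))


calls = []


def dump(routers):
    # Record each call so it is visible whether the expensive dump ran.
    calls.append(1)
    return json.dumps(routers, indent=4)


LOG = logging.getLogger("l3_rpc_sketch")
LOG.setLevel(logging.INFO)  # debug disabled

# The message is never rendered at INFO level, so dump() is not called.
LOG.debug("Routers returned to l3 agent:\n %s",
          DeferredStr(dump, [{"id": "r1"}]))
```

With debugging enabled, the logging module renders the %s argument via
str(), which is the point at which the JSON dump actually runs.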
The L3 agent gets keepalived state change notifications via
a unix domain socket. These events are now batched and
sent out as a single RPC to the server. In case the same
router got updated multiple times during the batch period,
only the latest state is sent.
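The "latest state wins" batching can be sketched like this. The class
and method names are hypothetical, and the timer that would trigger
flush() at the end of the batch period is left out.

```python
class StateChangeBatcher:
    """Collect router state changes and send them as one batch."""

    def __init__(self, send):
        # send is a hypothetical callable that performs the single RPC.
        self._send = send
        self._pending = {}

    def enqueue(self, router_id, state):
        # A later event for the same router overwrites the earlier one.
        self._pending[router_id] = state

    def flush(self):
        if not self._pending:
            return
        batch, self._pending = self._pending, {}
        self._send(batch)
```

Keying the pending events by router id is what guarantees that only the
most recent state of each router ends up in the batch.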
Partially-Implements: blueprint report-ha-router-master
It's mostly a matter of changing imports to a new location.
Non-obvious changes needed:
* pass overwrite= argument to oslo_context since oslo.log reads context
from its thread local store and not local.store from incubator
* don't store context at local.store now that there is no code that
would consume it
* LOG.deprecated() -> versionutils.report_deprecated_feature()
* dropped LOG.audit check from hacking rule since now the method does
* WritableLogger is now located in oslo_log.loggers
Dropped log module from the tree. Also dropped local module that is now
of no use (and obsolete, as per oslo team).
Added versionutils back to openstack-common.conf since now we use the
module directly from neutron code and not just as a dependency of some
other oslo-incubator module.
Note: tempest tests are expected to be broken now, so instead of fixing
all the oslo.log related issues for the subtree in this patch, I only
added TODOs with directions for later fix.