Currently, the dhcp Provisioning of ports is the crucial bottleneck
of that concurrently boot multiple VM.
The root cause is that these ports will be processed one by one by dhcp
agent when they belong to the same network, And the 'Provisioning complete'
port is still blocked other port's processing in other dhcp agents. The
patch aim to optimize the dispatch strategy of the port cast to agent to
improve the Provisioning process.
In server side, I classify messages to multi levels. Especially, I classify
the port_update_end or port_create_end message to two levels, the high-level
message only cast to one agent, the low-level message cast to all agent. In
agent side I put these messages to `resource_processing_queue`, with the queue,
We can delete `_net_lock` and process these messages in order of priority.
Additonally, I modified the `resource_processing_queue` for my demand. I update
`_queue` from LIST to PriorityQueue in `ExclusiveResourceProcessor`, by this
way, we can sort all message which cached in `ExclusiveResourceProcessor` by
The common rpc and exceptions were rehomed into
neutron-lib with . This patch shims those rehomed
modules in neutron to switch over to neutron-lib's
versions under the covers.
To do so:
- The rpc and common exceptions are changed to
reference their counterpart in neutron-lib effectively
swapping the impl over to neutron-lib.
- The fake_notifier is removed from neutron and lib's
version is used instead.
- The rpc tests are removed; they live in lib now.
- A few unit test related changes are required
including changing mock.patch to mock.patch.object,
changing the mock checks for a few UTs as they don't
quite work the same with the shim in place.
- Using the RPC fixture from neutron-lib rather than
that setup in neutron's base test class.
With this shim in place, consumers are effectively using
neutron-lib's RPC plumbing and thus we can move consumers
over to neutron-lib's version at will. Once all
consumers are moved over we can come back and remove
the RPC logic from neutron and follow-up with a consumption
The is_extension_supported function now lives in neutron-lib. This patch
removes the function from neutron and uses lib's version instead.
In reviews we usually check import grouping but it is boring.
By using flake8-import-order plugin, we can avoid this.
It enforces loose checking so it sounds good to use it.
This flake8 plugin is already used in tempest.
Note that flake8-import-order version is pinned to avoid unexpected
breakage of pep8 job.
Setup for unit tests of hacking rules is tweaked to disable
flake8-import-order checks. This extension assumes an actual file exists
and causes hacking rule unit tests.
The callback modules have been available in neutron-lib since commit 
and are ready for consumption.
As the callback registry is implemented with a singleton manager
instance, sync complications can arise ensuring all consumers switch to
lib's implementation at the same time. Therefore this consumption has
been broken down:
1) Shim neutron's callbacks using lib's callback system and remove
existing neutron internals related to callbacks (devref, UTs, etc.).
2) Switch all neutron's callback imports over to neutron-lib's.
3) Have all sub-projects using callbacks move their imports over to use
neutron-lib's callbacks implementation.
4) Remove the callback shims in neutron-lib once sub-projects are moved
over to lib's callbacks.
5) Follow-on patches moving our existing uses of callbacks to the new
event payload model provided by neutron-lib.callback.events
This patch implements #2 from above, moving all neutron's callback
imports to use neutron-lib's callbacks.
There are also a few places in the UT code that still patch callbacks,
we can address those in step #4 which may need .
On profiling the get_devices_details communications between
the agent and the server, a significant amount of time
(60% in my dev env) is being spent in the AFTER_UPDATE events
for the port updates resulting from the port status changes.
One of the major offenders is the native DHCP agent notifier.
On each port update it ends up retrieving the network for the
port, the DHCP agents for the network, and the segments.
This patch addresses this particular issue by adding logic to
skip a DHCP notification if the only thing that changed on the
port was the status. The DHCP agent doesn't do anything based on
the status field so there is no need to update it when this is
the only change.
Neutron Manager is loaded at the very startup of the neutron
server process and with it plugins are loaded and stored for
lookup purposes as their references are widely used across the
entire neutron codebase.
Rather than holding these references directly in NeutronManager
this patch refactors the code so that these references are held
by a plugin directory.
This allows subprojects and other parts of the Neutron codebase
to use the directory in lieu of the manager. The result is a
leaner, cleaner, and more decoupled code.
Usage pattern [1,2] can be translated to [3,4] respectively.
The more entangled part is in the neutron unit tests, where the
use of the manager can be simplified as mocking is typically
replaced by a call to the directory add_plugin() method. This is
safe as each test case gets its own copy of the plugin directory.
That said, unit tests that look more like API tests and that rely on
the entire plugin machinery, need some tweaking to avoid stumbling
into plugin loading failures.
Due to the massive use of the manager, deprecation warnings are
considered impractical as they cause logs to bloat out of proportion.
Follow-up patches that show how to adopt the directory in neutron
subprojects are tagged with topic:plugin-directory.
Partially-implements: blueprint neutron-lib
This patch set is for breaking the circular dependency between
See:https://review.openstack.org/#/c/297887/ for details.
Co-Authored-By: Victor Morales <firstname.lastname@example.org>
Co-Authored-By: Sindhu Devale <email@example.com>
This makes the notifier subscribe to core resource events
and leverage them if they are available. This solves the
issue where internal core plugin calls from service plugins
were not generating DHCP agent notifications.
Sending arp update to each l3 dvr agent one by one on every port
creation is not scalable and causes serious performance degradation
if router is hosted on lots of l3 dvr agents on compute nodes (see
bug report). This increases port creation time and eventually leads
to timeouts in Nova and VMs going to ERROR state.
This patch changes notification to be fanout.
The downside is that with fanout the arp notification will be sent to
each l3 agent, even those not hosting the router. However such agents
will just skip the notification if not hosting the router - this should
be quite cheap.
Since a network can now be broken up by segments, each segment will
need to have its own DHCP Agent. This maintains backwards
compatibility when a network does not have a segment. However,
once a segment is created on a network, a dhcp agent should be
scheduled per segment with a dhcp enabled subnet.
The scheduling happens by filtering the candidate dhcp agents by
the hosts that are bound to that segment.
Partially-Implements: blueprint routed-networks
Once the spinout is undergoing we should perform the eviction.
Partially-implements: blueprint bgp-spinout
In Dnsmasq, the function get_isolated_subnets() returns a list of
subnets in a network if the subnet is not connected to a router.
The implementation of this function checks all the router interface
ports in a cached network object passed from DHCP agent. But the
cached network object is not updated when a subnet is attached to
or detached from a router.
This patch fixes that by adding callback functions in DHCP RPC client
to notify DHCP agent when changes happen on router interfaces.
This patch implements a new agent named "BgpDrAgent". The new agent
will host different BGP speaking drivers and makes the required BGP
peering session/s for neutron. The agent takes the needed "peer/s and
route/s" information from the BGP speaker entity and synchronize the
same to the registerd driver.
For realizing HA, two BgpDrAgents should host the same BGP speaker.
Partially-Implements: blueprint bgp-dynamic-routing
Co-Authored-By: Ryan Tidwell <firstname.lastname@example.org>
Co-Authored-By: Jaume Devesa <email@example.com>
Co-Authored-By: Numan Siddique <firstname.lastname@example.org>
Previously when admin_state_up of an agent is turned to False,
all services on it will be disabled.
This fix makes existing services on agents with admin_state_up
False keep available.
To keep current behavior available the following configuration
If the parameter is True, existing services on agents with admin_state_up
False keep available. No more service will be scheduled to the agent
automatically. But adding a service to the agent manually is available.
i.e. admin_state_up: False means to stop automatic scheduling under the
parameter is True.
The default of the parameter is False (current behavior).
Change the DHCP notifier behavior to schedule a network
to a DHCP agent when a subnet is created rather than
waiting for the first port to be created.
This will reduce the possibility to get a VM port created
and have it send a DHCP request before the DHCP agent is
ready. Before, the network would be scheduled to an agent
as a result of the API call to create the VM port, so the
DHCP port wouldn't be created until after the VM port.
After this patch, the network will have been scheduled to
a DHCP agent before the first VM port is created.
There is still a possibility that the DHCP agent could be
responding so slowly that it doesn't create its port and
activate the dnsmasq instance before the VM sends traffic.
A proper fix will ensure that the dnsmasq instance is
truly ready to serve requests for a new port will require
significantly more code for barriers (either on the subnet
creation, port creation, or the nova boot process) are too
complex to add this late in the cycle.
This patch also eliminates the logic in the n1kv plugin that
was already doing the same thing.
This is achieved by adjusting the log traces and ensuring
that the right log errors are emitted based on the status
of the network.
Also, the patch drastically simplifies the structure of the
notification method and unit tests to increase coverage.
The Neutron service, when under load, may not be able to process
agent heartbeats in a timely fashion. This can result in
agents being erroneously considered inactive. Previously, DHCP
notifications for which active agents could not be found were
silently dropped. This change ensures that notifications for
a given network are sent to agents even if those agents do not
appear to be active.
Additionally, if no enabled dhcp agents can be found for a given
network, an error will be logged. Raising an exception might be
preferable, but has such a large testing impact that it will be
submitted as a separate patch if deemed necessary.