Currently, DHCP provisioning of ports is the crucial bottleneck when
booting multiple VMs concurrently.
The root cause is that ports belonging to the same network are
processed one by one by the DHCP agent, and a port whose provisioning
is already complete still blocks the processing of other ports in
other DHCP agents. This patch aims to optimize the strategy for
dispatching port messages to agents in order to improve the
provisioning process.
On the server side, messages are classified into multiple priority
levels. In particular, the port_update_end and port_create_end
messages are split into two levels: the high-priority message is cast
to a single agent, while the low-priority message is cast to all
agents. On the agent side, these messages are put into the
`resource_processing_queue`; with the queue in place, `_net_lock` can
be removed and the messages processed in order of priority.
Additionally, `resource_processing_queue` was adapted for this need:
`_queue` in `ExclusiveResourceProcessor` was changed from a list to a
PriorityQueue, so that all messages cached in
`ExclusiveResourceProcessor` are sorted by priority, as sketched
below.
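A minimal sketch of the queueing idea (illustrative, not the exact
code in this patch): messages carry a priority and the processor
drains them in priority order instead of FIFO.

    import itertools
    from queue import PriorityQueue

    class ExclusiveResourceProcessor(object):
        """Sketch: cache updates in a priority queue, not a list."""

        def __init__(self):
            self._queue = PriorityQueue()
            self._counter = itertools.count()  # FIFO tie-breaker

        def queue_update(self, priority, update):
            # Lower number == higher priority; the counter keeps FIFO
            # order among updates that share a priority.
            self._queue.put((priority, next(self._counter), update))

        def updates(self):
            while not self._queue.empty():
                _priority, _seq, update = self._queue.get()
                yield update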
Related-Bug: #1760047
Change-Id: I255caa0571c42fb012fe882259ef181070beccef
The port delete events are not synchronized with network rpc events. This
creates a condition which makes it possible for a port delete event to be
processed just before a previously started network query completes.
The problematic order of operations is as follows:
1) a network is scheduled to an agent; a network rpc is sent to the
agent
2) the agent queries the network data from the server
3) while that query is in progress a port on that network is deleted; a
port rpc is sent to the agent
4) that port delete rpc is received before the network query rpc
completes
5) the port delete results in no action because the port was not present
on the agent
6) the network query finishes and adds the port to the cache (even
though the port has already been deleted)
7) some time passes and a new port is configured with the same IP
address as the port that was deleted in (3)
8) the dhcp host file is corrupted with 2 entries for the same IP
address.
9) dhcp queries for the newest port are rejected because of the
duplicate entry in the dhcp host file.
The solution is to add the network_id to the port_delete_end rpc event
so that the _net_lock(network_id) synchronization point can be acquired
so that it is processed serially with other network related events.
To ensure backwards compatibility with newer agents running against
older servers, the determination of which network_id value to use in
the lock is handled by a utility that falls back to the previous mode
of operation whenever the network_id attribute is not present in the
*_delete_end RPC events. That utility can be removed in the future
once it is guaranteed that the network_id attribute will be present
in RPC messages from the server.
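A sketch of the fallback helper (the name and the cache access are
illustrative; it assumes the agent's existing port cache):

    def _get_network_lock_id(self, payload):
        # Prefer the network_id sent by newer servers; fall back to
        # resolving it from the local port cache for older servers.
        if 'network_id' in payload:
            return payload['network_id']
        if 'port_id' in payload:
            port = self.cache.get_port_by_id(payload['port_id'])
            return port.network_id if port else None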
Closes-Bug: #1732456
Change-Id: I735f8b1c9248b12e5feb6cbe970cf67f321e6ebc
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
The neutron.common.topics module was rehomed into neutron-lib with
commit Ie88b84949cbd55a4e7ad06341aab77b286cdc485
This patch consumes it by removing the rehomed module from neutron
and using the module from neutron-lib instead.
NeutronLibImpact
Change-Id: Ia4a4604c259ce862597de80c6deeb3d408bf0e95
The is_extension_supported function now lives in neutron-lib. This patch
removes the function from neutron and uses lib's version instead.
NeutronLibImpact
Change-Id: Iccb72e00f85043b3dff0299df7eb1279655e313e
This patch switches callbacks over to the payload object style events
[1] for BEFORE_RESPONSE and AFTER_REQUEST based notifications. To do
so, an APIEventPayload object is used with the publish() method to
pass along the API related data. In addition, a few UTs are updated
to work with the changes.
NeutronLibImpact
[1] https://docs.openstack.org/neutron-lib/latest/contributor/callbacks.html#event-payloads
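A sketch of the new style (argument values are illustrative, shown
for the port-create case):

    from neutron_lib.callbacks import events
    from neutron_lib.callbacks import registry

    def notify_before_response(context, body, result, trigger):
        payload = events.APIEventPayload(
            context, 'port.create.end', 'create_port',
            request_body=body,
            states=(result,),
            collection_name='ports')
        registry.publish('port', events.BEFORE_RESPONSE, trigger,
                         payload=payload)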
Change-Id: Ibd8559e0db9dcc995abf8937a0cb764b21a18531
The neutron-lib commit I360545b6ee4291547e0c5c8e668ad03d3efa4725 moved
the externally consumed globals from neutron.common.constants into lib.
With the exception of PROVISIONAL_IPV6_PD_PREFIX, all other constants
in neutron.common.constants should only be used in neutron, and will
hopefully remain that way. External consumers needing access to other
common constants should move them into lib first.
NeutronLibImpact
Change-Id: Ie4bcffccf626a6e1de84af01f3487feb825f8b65
Change the rpc_api.rst file path from doc/source/devref/rpc_api.rst
to doc/source/contributor/internals/rpc_api.rst, since that is where
the file is actually located.
Closes-Bug: #1722072
Change-Id: Ic243aab9e3428bfec69db61a94b4129cd768e233
Since Pike, log messages should not be translated.
This patch removes calls to i18n _LC, _LI, _LE, _LW from
logging logic throughout the code. Translators definition
from neutron._i18n is removed as well.
This patch also removes log translation verification from
ignore directive in tox.ini.
Change-Id: If9aa76fcf121c0e61a7c08088006c5873faee56e
DVR router updates are notified based on where the routers are
hosted. But in the case of the metering RPC notifier, notification to
a specific host was not implemented; a call to
routers_updated_on_host was ignored since the notifier only
implements the routers_updated function call.
Change-Id: I39b0d43b14294a8eecf6cba230d948c2c45a0b7a
Closes-Bug: #1682345
The well known service type constants are in
neutron_lib.plugins.constants, but for legacy reasons a few still exist
and are referenced from neutron_lib.constants that we'd like to remove.
This patch switches references over to neutron_lib's plugin constants.
Change-Id: I1861448cec303725b30cef8f42029f467f9e03a3
The callback modules have been available in neutron-lib since commit [1]
and are ready for consumption.
As the callback registry is implemented with a singleton manager
instance, sync complications can arise in ensuring all consumers
switch to lib's implementation at the same time. Therefore this
consumption has been broken down:
1) Shim neutron's callbacks using lib's callback system and remove
existing neutron internals related to callbacks (devref, UTs, etc.).
2) Switch all neutron's callback imports over to neutron-lib's.
3) Have all sub-projects using callbacks move their imports over to use
neutron-lib's callbacks implementation.
4) Remove the callback shims in neutron-lib once sub-projects are moved
over to lib's callbacks.
5) Follow-on patches moving our existing uses of callbacks to the new
event payload model provided by neutron-lib.callback.events
This patch implements #2 from above, moving all neutron's callback
imports to use neutron-lib's callbacks.
There are also a few places in the UT code that still patch
callbacks; we can address those in step #4, which may need [2].
NeutronLibImpact
[1] fea8bb64ba7ff52632c2bd3e3298eaedf623ee4f
[2] I9966c90e3f90552b41ed84a68b19f3e540426432
Change-Id: I8dae56f0f5c009bdf3e8ebfa1b360756216ab886
According to https://wiki.openstack.org/wiki/Python3, we should now
avoid using six.iteritems and replace it with dict.items.
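For example:

    mapping = {'a': 1, 'b': 2}

    # Before (Python 2 idiom):
    #   for key, value in six.iteritems(mapping):
    #       ...
    # After (works on Python 2 and 3):
    for key, value in mapping.items():
        print(key, value)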
Change-Id: I58a399baa2275f280acc0e6d649f81838648ce5c
Closes-Bug: #1680761
On profiling the get_devices_details communications between
the agent and the server, a significant amount of time
(60% in my dev env) is being spent in the AFTER_UPDATE events
for the port updates resulting from the port status changes.
One of the major offenders is the native DHCP agent notifier.
On each port update it ends up retrieving the network for the
port, the DHCP agents for the network, and the segments.
This patch addresses this particular issue by adding logic to
skip a DHCP notification if the only thing that changed on the
port was the status. The DHCP agent doesn't do anything based on
the status field so there is no need to update it when this is
the only change.
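A sketch of the check (illustrative; the ignored fields assume that a
status change also bumps updated_at and revision_number):

    def _only_status_changed(original, updated):
        if not original or not updated:
            return False
        ignored = ('status', 'updated_at', 'revision_number')
        original = {k: v for k, v in original.items()
                    if k not in ignored}
        updated = {k: v for k, v in updated.items()
                   if k not in ignored}
        return original == updated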
Change-Id: I948132924ec5021a9db78cf17efbba96b2500e8e
Partial-Bug: #1665215
Neutron Manager is loaded at the very startup of the neutron
server process and with it plugins are loaded and stored for
lookup purposes as their references are widely used across the
entire neutron codebase.
Rather than holding these references directly in NeutronManager
this patch refactors the code so that these references are held
by a plugin directory.
This allows subprojects and other parts of the Neutron codebase
to use the directory in lieu of the manager. The result is leaner,
cleaner, and more decoupled code.
Usage pattern [1,2] can be translated to [3,4] respectively.
[1] manager.NeutronManager.get_service_plugins()[FOO]
[2] manager.NeutronManager.get_plugin()
[3] directory.get_plugin(FOO)
[4] directory.get_plugin()
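For instance, patterns [3] and [4] look like this (imports assume
today's neutron-lib locations):

    from neutron_lib.plugins import constants as plugin_constants
    from neutron_lib.plugins import directory

    core_plugin = directory.get_plugin()
    l3_plugin = directory.get_plugin(plugin_constants.L3)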
The more entangled part is in the neutron unit tests, where the
use of the manager can be simplified as mocking is typically
replaced by a call to the directory add_plugin() method. This is
safe as each test case gets its own copy of the plugin directory.
That said, unit tests that look more like API tests and that rely on
the entire plugin machinery need some tweaking to avoid stumbling
into plugin loading failures.
Due to the massive use of the manager, deprecation warnings are
considered impractical as they cause logs to bloat out of proportion.
Follow-up patches that show how to adopt the directory in neutron
subprojects are tagged with topic:plugin-directory.
NeutronLibImpact
Partially-implements: blueprint neutron-lib
Change-Id: I7331e914234c5f0b7abe836604fdd7e4067551cf
This makes the notifier subscribe to core resource events
and leverage them if they are available. This solves the
issue where internal core plugin calls from service plugins
were not generating DHCP agent notifications.
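A sketch of the subscription (the handler body and the exact
resource/event set are illustrative):

    from neutron.callbacks import events
    from neutron.callbacks import registry
    from neutron.callbacks import resources

    def _send_dhcp_notification(resource, event, trigger, **kwargs):
        # Translate the core resource event into the matching DHCP
        # agent notification (e.g. port.create.end).
        pass

    for event in (events.AFTER_CREATE, events.AFTER_UPDATE,
                  events.AFTER_DELETE):
        for resource in (resources.NETWORK, resources.SUBNET,
                         resources.PORT):
            registry.subscribe(_send_dhcp_notification, resource,
                               event)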
Closes-Bug: #1621345
Change-Id: I607635601caff0322fd0c80c9023f5c4f663ca25
Sending an arp update to each l3 dvr agent one by one on every port
creation is not scalable and causes serious performance degradation
if the router is hosted on many l3 dvr agents on compute nodes (see
the bug report). This increases port creation time and eventually
leads to timeouts in Nova and VMs going to ERROR state.
This patch changes the notification to be a fanout cast.
The downside is that with fanout the arp notification will be sent to
every l3 agent, even those not hosting the router. However, such
agents will just skip the notification if not hosting the router -
this should be quite cheap.
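A sketch of the change (the method, topic and version follow the
existing l3_rpc_agent_api conventions and are illustrative):

    def _fanout_arp_update(self, context, method, arp_table):
        # One fanout cast replaces the per-agent host-targeted casts;
        # method is 'add_arp_entry' or 'del_arp_entry'.
        cctxt = self.client.prepare(fanout=True,
                                    topic=topics.L3_AGENT,
                                    version='1.2')
        cctxt.cast(context, method, payload={'arp_table': arp_table})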
Closes-Bug: #1614452
Change-Id: I1fb533d7804b131f709b790fc730ed7b97cb5499
Bug 1591766 unveiled an issue where calling the plugin API does not
trigger DHCP notifications. This is required by the
auto-allocated-topology service plugin, which calls
core_plugin.update_network() and expects notifications to be sent out
on state changes. To accomplish this, the logic has been encapsulated
in the DHCP module and leveraged via callback mechanisms.
For this reason, new events have been introduced: AFTER_REQUEST and
BEFORE_RESPONSE. The latter in particular is the one needed to hook
up DHCP notifications in order to preserve backward compatibility.
More precisely, core plugins that use DHCP as is or implement their
own (with or without an agent) should already instantiate their own
notifier, and if they do not, this should be rectified.
A search on codesearch.openstack.org reveals that out-of-tree plugins
already specify their own notifiers, and the default initialization is
clearly redundant now.
Related-Bug: #1591766
Change-Id: I7440becb6d30af7159ecaeba09d7a28eceb71bea
Since a network can now be broken up by segments, each segment will
need to have its own DHCP Agent. This maintains backwards
compatibility when a network does not have a segment. However,
once a segment is created on a network, a dhcp agent should be
scheduled per segment with a dhcp enabled subnet.
The scheduling happens by filtering the candidate dhcp agents by
the hosts that are bound to that segment.
Partially-Implements: blueprint routed-networks
Change-Id: If73211978e14b7533a1213cfb8c2c155a408f19e
This patch introduces a retry(func, max_attempts) method which wraps
the original func in such a way that, if execution results in a
MessagingException, the given function is retried up to max_attempts
times; if all attempts fail, the MessagingException is raised.
The function is in the utils module and can be reused
by different agentnotifiers if necessary.
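A sketch of the wrapper (logging details are illustrative):

    import oslo_messaging
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def retry(func, max_attempts):
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except oslo_messaging.MessagingException:
                    if attempt == max_attempts:
                        raise
                    LOG.warning('%s failed (attempt %d/%d), retrying',
                                func.__name__, attempt, max_attempts)
        return wrapper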
Change-Id: I0d0c17e500e44c1a17438c29a0e76a9ef00872e8
Once the spinout is underway we should perform the eviction.
Partially-implements: blueprint bgp-spinout
Depends-on: I8be510153edbc496575cde34943ca4c56645e0fb
Change-Id: I20b6ddd37d10eae70e8294d578e53137c0f866fe
In Dnsmasq, the function get_isolated_subnets() returns a list of
subnets in a network if the subnet is not connected to a router.
The implementation of this function checks all the router interface
ports in a cached network object passed from DHCP agent. But the
cached network object is not updated when a subnet is attached to
or detached from a router.
This patch fixes that by adding callback functions in DHCP RPC client
to notify DHCP agent when changes happen on router interfaces.
Closes-Bug: #1554825
Change-Id: Ifaab163f49e0d1c5cb3eba6efa96214104647e4e
Python 3 deprecated the logger.warn method, see:
https://docs.python.org/3/library/logging.html#logging.warning
so we prefer to use warning to avoid DeprecationWarning.
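For example:

    import logging

    LOG = logging.getLogger(__name__)

    LOG.warn('old spelling')     # deprecated alias in Python 3
    LOG.warning('new spelling')  # preferred on both Python 2 and 3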
Closes-Bug: #1529913
Change-Id: Icc01ce5fbd10880440cf75a2e0833394783464a0
Co-Authored-By: Gary Kotton <gkotton@vmware.com>
This patch implements a new agent named "BgpDrAgent". The new agent
hosts different BGP speaking drivers and makes the required BGP
peering session(s) for neutron. The agent takes the needed peer and
route information from the BGP speaker entity and synchronizes it to
the registered driver.
For realizing HA, two BgpDrAgents should host the same BGP speaker.
Partially-Implements: blueprint bgp-dynamic-routing
Co-Authored-By: Ryan Tidwell <ryan.tidwell@hpe.com>
Co-Authored-By: Jaume Devesa <devvesa@gmail.com>
Co-Authored-By: Numan Siddique <nusiddiq@redhat.com>
Change-Id: I3217795bdd0fa2d9d4b39274f4f95fc013c8d29d
When we remove explicit binding of dvr routers to compute nodes
we'll need a way to know all hosts where a dvr router should be
hosted in order to send notifications.
This patch adds such a query and updates l3 rpc notifier to use it.
Partially implements blueprint improve-dvr-l3-agent-binding
Change-Id: Ic6680bb42455189f14c4c913b1d4adeebda83180
There is no reason to try to get enabled_agents and send a
notification if there are no agents associated with the network.
Closes-bug: #1522471
Change-Id: I111967415ce600253fc679837d03c9cd75f19656
- This does NOT break other projects that rely on neutron.i18n,
as this change includes a debtcollector shim to maintain those
older entry points, until they can migrate.
- Also updates _i18n.py to the latest pattern defined by oslo_i18n
- Guidance and template are from the reference:
http://docs.openstack.org/developer/oslo.i18n/usage.html
Partially-Closes-Bug: #1519493
Change-Id: I1aa3a5fd837d9156da4643a367013c869ed8bf9d
Currently router_added (and other) notifications are sent
to agents with an RPC cast() method which does not ensure that
the message is actually delivered to the recipient.
If the message is lost (due to instability of the messaging system in
some failover scenarios, for example) neither the server nor the
agent will be aware of that, and the router will be "lost" until the
next agent resync. A resync will only happen in case of errors on the
agent side or a restart.
The fix makes the server use call() to notify agents about added
routers, thus ensuring no routers will be lost.
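A sketch of the notifier change (method and topic names follow the
existing L3 notifier conventions and are illustrative):

    def _agent_notification_host(self, context, method, host,
                                 **kwargs):
        # cast() is fire-and-forget; call() blocks until the agent
        # acknowledges, so a lost message surfaces as a timeout on
        # the server instead of silently dropping the router.
        cctxt = self.client.prepare(topic=topics.L3_AGENT, server=host)
        cctxt.call(context, method, **kwargs)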
This also unifies reschedule_router() method to avoid code duplication
between legacy and dvr agent schedulers.
Closes-Bug: #1482630
Related-Bug: #1404743
Change-Id: Id08764ba837d8f47a28649d081a5876797fe369e
The fip namespace is associated with an external network on a given
node. In the case of DVR there is just one single FIP namespace for a
given node.
We have seen some race conditions in the agent for creation
and deletion of the fip namespace. See the bug report for
details on the failure.
So in order to address this race condition and make the
code more stable, we will be cleaning up the fip namespace
only when an external network is removed.
The server will be sending a rpc notification message to
the agent to cleanup the fip namespace when the external
net is removed.
This patch addresses the above mentioned issue by not constantly
deleting and creating the fip namespace.
Closes-Bug: #1501873
Change-Id: I86869f66d4afffad7db09942578b1a456a9bd418
Currently when a floating ip is created, a lot of useless work
happens: the floating ip router is scheduled, all l3 agents where the
router is scheduled are notified about the router update, and all
agents request full router info from the server. All this becomes a
big performance problem at scale with lots of compute nodes.
In fact, on (associated) floating IP creation we really only need to
notify the specific l3 agent on the compute node where the associated
VM port is located, and do not need to schedule the router or bother
other agents where the router is scheduled. This should significantly
decrease unneeded load on the neutron server at scale.
Partial-Bug: #1486828
Change-Id: I0cbe8c51c3714e6cbdc48ca37135b783f8014905
This patch is to address the failure of manual move of
dvr_snat routers from one service node to another.
The entry in the csnat_l3_agent_bindings table is now removed
during the router to agent unbind operation.
Appropriate notification is now sent to the agent to remove
snat/qrouter namespace.
There were other places in the code
that needed to examine the snat binding table to
check if updates were required -
validate_agent_router_combination() and
check_agent_router_scheduling_needed().
Additionally, schedule_routers() was made optional
within the rpc _notification path since it can
override the manual move being attempted.
Change-Id: Iac9598eb79f455c4ef3d3243a96bed524e3d2f7c
Closes-Bug: #1369721
Co-Authored-By: Ila Palanisamy <ilavajuthy.palanisamy@hp.com>
Co-Authored-By: Oleg Bondarev <obondarev@mirantis.com>
This cannot be done in Python 3, where dict.keys() returns a view
rather than a list. We need to cast the result of dict.keys() to a
list first.
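For example:

    d = {'a': 1, 'b': 2}

    # Broken on Python 3 (RuntimeError: dictionary changed size
    # during iteration):
    #   for k in d.keys():
    #       del d[k]

    for k in list(d.keys()):  # snapshot the keys first
        del d[k]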
Change-Id: I28986aefb720b4513e3eee9ba0909f79d1dc9695
Blueprint: neutron-python3
This also adds a check to neutron/hacking/checks.py that should catch this
error in the future.
Blueprint: neutron-python3
Change-Id: Ie7b833ffa173772d39b85ee3ecaddace18e1274f
Previously, when admin_state_up of an agent was turned to False, all
services on it would be disabled.
This fix makes existing services on agents with admin_state_up False
remain available.
To keep the current behavior available, the following configuration
parameter is added:
* enable_services_on_agents_with_admin_state_down
If the parameter is True, existing services on agents with
admin_state_up False remain available; no more services will be
scheduled to the agent automatically, but adding a service to the
agent manually is still possible. In other words, when the parameter
is True, admin_state_up: False only means that automatic scheduling
to the agent stops.
The default of the parameter is False (current behavior).
Change-Id: Ifba606a5c1f3f07d717c7695a7a64e16238c2057
Closes-Bug: #1408488
Currently we send all labels and rules on each rule create/delete and
rebuild the whole iptables chains.
With this patch we send only the affected rule and create/delete only
that rule in iptables.
Change-Id: I58ebd8d810c62980c09a340ee1680be17c12b74a
Closes-Bug: #1400280
Change the DHCP notifier behavior to schedule a network
to a DHCP agent when a subnet is created rather than
waiting for the first port to be created.
This will reduce the possibility to get a VM port created
and have it send a DHCP request before the DHCP agent is
ready. Before, the network would be scheduled to an agent
as a result of the API call to create the VM port, so the
DHCP port wouldn't be created until after the VM port.
After this patch, the network will have been scheduled to
a DHCP agent before the first VM port is created.
There is still a possibility that the DHCP agent could be responding
so slowly that it doesn't create its port and activate the dnsmasq
instance before the VM sends traffic.
A proper fix ensuring that the dnsmasq instance is truly ready to
serve requests for a new port would require significantly more code;
the barriers needed (either on subnet creation, port creation, or the
nova boot process) are too complex to add this late in the cycle.
This patch also eliminates the logic in the n1kv plugin that
was already doing the same thing.
Closes-Bug: #1431105
Change-Id: I1c1caed0fdda6b801375a07f9252a9127058a07e
It's mostly a matter of changing imports to a new location.
Non-obvious changes needed:
* pass overwrite= argument to oslo_context since oslo.log reads context
from its thread local store and not local.store from incubator
* don't store context at local.store now that there is no code that
would consume it
* LOG.deprecated() -> versionutils.report_deprecated_feature()
* dropped LOG.audit check from hacking rule since now the method does
not exist
* WritableLogger is now located in oslo_log.loggers
Dropped log module from the tree. Also dropped local module that is now
of no use (and obsolete, as per oslo team).
Added versionutils back to openstack-common.conf since now we use the
module directly from neutron code and not just as a dependency of some
other oslo-incubator module.
Note: tempest tests are expected to be broken now, so instead of fixing
all the oslo.log related issues for the subtree in this patch, I only
added TODOs with directions for later fix.
Closes-Bug: #1425013
Change-Id: I310e059a815377579de6bb2aa204de168e72571e
The Oslo project decided to move away from using the oslo.* namespace
for all its libraries [1], so we should migrate to the new import
paths.
This patch applies new paths for:
- oslo.config
- oslo.db
- oslo.i18n
- oslo.messaging
- oslo.middleware
- oslo.rootwrap
- oslo.serialization
- oslo.utils
Added hacking check to enforce new import paths for all oslo libraries.
Updated setup.cfg entry points.
We'll cleanup old imports from oslo-incubator modules on demand or
if/when oslo officially deprecates old namespace in one of the next
cycles.
[1]: https://blueprints.launchpad.net/oslo-incubator/+spec/drop-namespace-packages
Depends-On: https://review.openstack.org/#/c/147248/
Depends-On: https://review.openstack.org/#/c/152292/
Depends-On: https://review.openstack.org/#/c/147240/
Closes-Bug: #1409733
Change-Id: If0dce29a0980206ace9866112be529436194d47e
There is an rpc interface defined for the Neutron plugin to be able to
execute methods in the DHCP agent. Provide docstring pointers in the
client and server side that tells you where to find the other side of
the interface.
No namespace usage is needed here. This API is the only one exposed
via the DHCP agent, so the default namespace used now is fine.
The DhcpAgent class was updated to explicitly define the
messaging.Target(). Previously it was using the equivalent one
defined in the Manager base class. Having it specified here makes it
more obvious that this is an rpc endpoint, and also provides the
obvious place that must have the version updated if the interface is
changed.
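A sketch of the explicit target (the modern import path is shown and
the version value is illustrative):

    import oslo_messaging

    class DhcpAgent(object):
        # An explicit target marks this class as an rpc endpoint and
        # pins the place where the interface version must be bumped.
        target = oslo_messaging.Target(version='1.0')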
Part of blueprint rpc-docs-and-namespaces.
Change-Id: I4a6eb8dacb9ba01f329a5d5961dc0e0ee6f780ba
Remove usage of the RpcProxy compatibility class from the
DhcpAgentNotifyAPI class. The equivalent oslo.messaging APIs are now
used instead.
Part of blueprint drop-rpc-compat.
Change-Id: Ib658a0d67da1af3b009bc6df9a7c8ec08c04897b