Currently, DHCP provisioning of ports is the crucial bottleneck
when booting multiple VMs concurrently.
The root cause is that ports belonging to the same network are
processed one by one by the dhcp agent, and a port whose provisioning
is already complete still blocks the processing of other ports in
other dhcp agents. This patch aims to optimize the strategy for
dispatching port messages to agents in order to improve the
provisioning process.
On the server side, I classify messages into multiple levels. In
particular, I classify the port_update_end and port_create_end
messages into two levels: the high-level message is cast to only one
agent, while the low-level message is cast to all agents. On the
agent side I put these messages into the `resource_processing_queue`;
with the queue, we can delete `_net_lock` and process these messages
in order of priority.
Additionally, I modified the `resource_processing_queue` for this
need: I updated `_queue` in `ExclusiveResourceProcessor` from a list
to a PriorityQueue, so that all messages cached in
`ExclusiveResourceProcessor` can be sorted by priority.
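A minimal sketch of the queueing idea, using Python's heapq to stand
in for the PriorityQueue-backed `_queue` (the class and message names
here are illustrative, not the actual neutron ones):

    import heapq

    class PrioritizedUpdate(object):
        """Illustrative stand-in for a queued port message."""
        def __init__(self, priority, payload):
            self.priority = priority  # lower value == higher priority
            self.payload = payload

        def __lt__(self, other):
            return self.priority < other.priority

    queue = []
    heapq.heappush(queue, PrioritizedUpdate(2, 'low: cast to all agents'))
    heapq.heappush(queue, PrioritizedUpdate(1, 'high: cast to one agent'))
    first = heapq.heappop(queue)  # the high-priority message pops first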
Related-Bug: #1760047
Change-Id: I255caa0571c42fb012fe882259ef181070beccef
The common rpc and exceptions were rehomed into
neutron-lib with [1]. This patch shims those rehomed
modules in neutron to switch over to neutron-lib's
versions under the covers.
To do so:
- The rpc and common exceptions are changed to
reference their counterpart in neutron-lib effectively
swapping the impl over to neutron-lib.
- The fake_notifier is removed from neutron and lib's
version is used instead.
- The rpc tests are removed; they live in lib now.
- A few unit test related changes are required,
including changing mock.patch to mock.patch.object and
changing the mock checks for a few UTs, as they don't
work quite the same with the shim in place.
- Using the RPC fixture from neutron-lib rather than
the one set up in neutron's base test class.
With this shim in place, consumers are effectively using
neutron-lib's RPC plumbing and thus we can move consumers
over to neutron-lib's version at will. Once all
consumers are moved over we can come back and remove
the RPC logic from neutron and follow up with a consumption
patch.
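For illustration, the shim pattern boils down to re-exporting lib's
names; a sketch, assuming the rehomed symbols keep their names in
neutron_lib.rpc:

    # neutron/common/rpc.py (sketch)
    from neutron_lib import rpc as lib_rpc

    # existing imports of neutron.common.rpc keep working, but the
    # implementation now lives in neutron-lib
    init = lib_rpc.init
    cleanup = lib_rpc.cleanup
    get_client = lib_rpc.get_client
    get_server = lib_rpc.get_server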
NeutronLibImpact
[1] https://review.openstack.org/#/c/319328/
Change-Id: I87685be8764a152ac24366f13e190de9d4f6f8d8
The current "agent remove-add" method to reschedule a router is not
friendly for DVR. In DVR cases, the old agent may still need to keep the
router in a non-master role to service any VM ports that exit on that
host while the new agent handles the master role of the router.
This patch proposes to send "update" instead of "remove" notification
for DVR rescheduling, which aligns with the retain_router check in
remove_router_from_l3_agent as well.
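A sketch of the switch, assuming the l3 agent notifier methods below;
the surrounding rescheduling logic is illustrative:

    def notify_router_rescheduled(l3_notifier, context, router, old_host):
        # DVR routers get an update so the old agent can retain its
        # non-master role; legacy routers still get a removal
        if router.get('distributed'):
            l3_notifier.routers_updated(context, [router['id']])
        else:
            l3_notifier.router_removed_from_agent(
                context, router['id'], old_host)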
Closes-Bug: #1781179
Change-Id: I23431f9f46f72e6bce91f3d1fb0ed328d55930fb
When a network is removed from a dhcp agent, in some scenarios, if
the agent releases its port concurrently, there is a chance that the
unscheduling will fail because the target port is not found.
Catch the PortNotFound exception as an expected error under this kind
of concurrent circumstance and log it in order to move forward.
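A sketch of the handling; the helper call is hypothetical, while
PortNotFound is the standard neutron_lib exception:

    from neutron_lib import exceptions as n_exc
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def unschedule_network(plugin, context, network_id, agent_id):
        try:
            # hypothetical helper that looks up and releases the
            # agent's DHCP port for this network
            plugin.release_dhcp_port_for_agent(context, network_id,
                                               agent_id)
        except n_exc.PortNotFound:
            # the agent released the port concurrently; move forward
            LOG.info("DHCP port for network %s already gone, skipping",
                     network_id)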
Closes-Bug: #1775496
Change-Id: Ib51b364f6ced0de7685c8ee07c1d292308d919f5
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
The dhcpagentscheduler extension's API definition was rehomed into
neutron-lib with https://review.openstack.org/#/c/520751/
This patch consumes it by using the API definition and its constants
where applicable.
NeutronLibImpact
Change-Id: Ib0c97268f01885f6daacb3d1cdbbd94bb6020d60
When listing router_ids on a host, it is possible that there are no
L3 agents on some hosts.
In such a case the AgentNotFoundByTypeHost exception is raised in the
neutron.db.agents_db module in the _get_agent_by_type_and_host()
method.
Now this exception is properly handled when listing routers on a
host.
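A sketch of the handling; the exception's import path reflects
neutron-lib today and the final lookup call is illustrative:

    from neutron_lib import constants
    from neutron_lib.exceptions import agent as agent_exc

    def list_router_ids_on_host(plugin, context, host):
        # plugin mixes in neutron.db.agents_db.AgentDbMixin
        try:
            agent = plugin._get_agent_by_type_and_host(
                context, constants.AGENT_TYPE_L3, host)
        except agent_exc.AgentNotFoundByTypeHost:
            # no L3 agent on this host, hence no routers to list
            return []
        return plugin.list_router_ids_on_host_by_agent(context, agent)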
Change-Id: Ia5ff1b57ef63c98b4ada4f2d46c45336e413be3d
Closes-Bug: #1737917
The Agent object has been merged [1].
This patch uses Agent object in agents_db and test_agents_db.
We also introduce a new function (get_agents_object) and keep
the old function (get_agents_db) for backward compatibility.
[1] https://review.openstack.org/#/c/297887/
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Change-Id: I4c4283cb1aa05d52dca00cc249e094ea7d55b1d3
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
Today our unit test code uses various ways to "patch" the global
RESOURCE_ATTRIBUTE_MAP as well as extension specific maps in some cases.
This patch consolidates such patching whereby tests should use neutron's
AttributeMapMemento in their setup() chain (only once) if they update
the global map and they should individually handle backup/restore of per
extension map updates. This change will simplify the code and make it
easier to phase in API definition usage with neutron-lib where we have
some as API definitions and others not. Longer term the
AttributeMapMemento will be replaced with neutron-lib's fixture as we
move all extension maps to API definitions in neutron-lib.
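For illustration, the intended test pattern looks roughly like this
(the fixture location follows neutron's test tools; the extended
attribute dict is a placeholder):

    from neutron.api.v2 import attributes
    from neutron.tests import base
    from neutron.tests import tools

    class MyExtensionTestCase(base.BaseTestCase):
        def setUp(self):
            super(MyExtensionTestCase, self).setUp()
            # snapshot the global map once; restored on test cleanup
            self.useFixture(tools.AttributeMapMemento())
            my_extended_attrs = {'my_attr': {'allow_post': True}}
            attributes.RESOURCE_ATTRIBUTE_MAP['networks'].update(
                my_extended_attrs)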
Change-Id: I2586f0b11b107d7f57214a0d65bcf7c38a5f0ebb
The neutron-lib commit I360545b6ee4291547e0c5c8e668ad03d3efa4725 moved
the externally consumed globals from neutron.common.constants into lib.
With the exception of PROVISIONAL_IPV6_PD_PREFIX, all other constants
in neutron.common.constants should only be used in neutron, and will
hopefully remain that way. External consumers needing access to other
common constants should move them into lib first.
NeutronLibImpact
Change-Id: Ie4bcffccf626a6e1de84af01f3487feb825f8b65
The well known service type constants are in
neutron_lib.plugins.constants, but for legacy reasons a few still
exist in, and are referenced from, neutron_lib.constants; we'd like
to remove those. This patch switches the references over to
neutron_lib's plugin constants.
Change-Id: I1861448cec303725b30cef8f42029f467f9e03a3
This patch integrates the OVO created for RouterL3AgentBinding into
the code base.
Change-Id: I0af665a97087ad72431d58f04089a804088ef005
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
This prevents all of the warnings in unit tests about
no DHCP agents being available for scheduling during
network and port creation.
TrivialFix
Change-Id: I06cb626496866b90f60b406d1141ecad6e1a47e1
Neutron Manager is loaded at the very startup of the neutron
server process and with it plugins are loaded and stored for
lookup purposes as their references are widely used across the
entire neutron codebase.
Rather than holding these references directly in NeutronManager
this patch refactors the code so that these references are held
by a plugin directory.
This allows subprojects and other parts of the Neutron codebase
to use the directory in lieu of the manager. The result is
leaner, cleaner, and more decoupled code.
Usage pattern [1,2] can be translated to [3,4] respectively.
[1] manager.NeutronManager.get_service_plugins()[FOO]
[2] manager.NeutronManager.get_plugin()
[3] directory.get_plugin(FOO)
[4] directory.get_plugin()
The more entangled part is in the neutron unit tests, where the
use of the manager can be simplified as mocking is typically
replaced by a call to the directory add_plugin() method. This is
safe as each test case gets its own copy of the plugin directory.
That said, unit tests that look more like API tests and rely on
the entire plugin machinery need some tweaking to avoid stumbling
into plugin loading failures.
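For example, a test that previously mocked the manager can simply
register its plugin (a sketch; FakeCorePlugin is a placeholder):

    from neutron_lib.plugins import constants as plugin_consts
    from neutron_lib.plugins import directory

    class FakeCorePlugin(object):
        """Placeholder plugin for the example."""

    # each test case gets its own copy of the plugin directory,
    # so registering directly is safe
    directory.add_plugin(plugin_consts.CORE, FakeCorePlugin())
    plugin = directory.get_plugin()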
Due to the massive use of the manager, deprecation warnings are
considered impractical as they cause logs to bloat out of proportion.
Follow-up patches that show how to adopt the directory in neutron
subprojects are tagged with topic:plugin-directory.
NeutronLibImpact
Partially-implements: blueprint neutron-lib
Change-Id: I7331e914234c5f0b7abe836604fdd7e4067551cf
Renames the AgentStatusCheckWorker class to PeriodicWorker
and moves it into the worker module since there isn't anything
agent-specific about it and it can be used for other periodic
jobs server side.
TrivialFix
Change-Id: Ic7a55ef534f64e6bfc60ae38bb0e139a0078510b
This worker would fail to start again if stop() or reset()
was called on it because of some bad conditional logic. This
doesn't appear to impact the current in-tree use case but
it should behave correctly.
Closes-Bug: #1641788
Change-Id: Id6334c1ef6c99bd112ada31e8fe3746d7e035356
This patch set breaks the circular dependency between
Agent and AgentVersionedObject.
See https://review.openstack.org/#/c/297887/ for details.
Change-Id: I7be4ce2513e49e6da46a7bdffb8538613f0be7c7
Partial-Bug: #1597913
Co-Authored-By: Victor Morales <victor.morales@intel.com>
Co-Authored-By: Sindhu Devale <sindhu.devale@intel.com>
Implementing objects for l3agentbinding would cause cyclic
imports, because the db models definition and the mixins live in
the same file, so this patch relocates the l3agentbinding models.
Change-Id: Idef2fe3e16b245da849e2d29c5578e5f5d081dc4
Partial-Bug: #1597913
This makes the notifier subscribe to core resource events
and leverage them if they are available. This solves the
issue where internal core plugin calls from service plugins
were not generating DHCP agent notifications.
Closes-Bug: #1621345
Change-Id: I607635601caff0322fd0c80c9023f5c4f663ca25
The callback manager was indexing callbacks based on
a callback ID generated from oslo utils reflection.
This presented two problems.
The first was that in py34 get_callable_name would use
__qualname__ which returns the class name the function
is defined on rather than the class of the object itself.
So two classes defined in the same module inheriting from
the same parent class could not both subscribe a method
defined on the parent.
The second more general problem is that two objects which
are instances of the same class cannot subscribe the same
method because they have the same ID.
This adds the hash of the method to the ID to prevent these
issues. The hash by itself could have been used but it's not
very user-friendly so the name is left on for nice log
messages.
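A sketch of the resulting ID scheme:

    from oslo_utils import reflection

    def make_id(callback):
        # keep the readable name for log messages and append the hash
        # so that two instances of the same class get distinct IDs
        return '%s-%s' % (reflection.get_callable_name(callback),
                          hash(callback))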
Change-Id: Iff1ca8c4ddb58ca5907d21fa0de7f0f292b6fc0e
The Stevedore documentation suggests that full import paths are not
supposed to be user visible. Since unit tests emulate users when
configuring oslo.config, we are better off relying on well known
plugin aliases than on internal details.
For in-tree code that may not be a big deal, but with it we set a bad
example for third parties that may later find their tests broken,
e.g. when we decide to move code around.
TrivialFix
Change-Id: I7bd036ac3df7e7f4c678356d0a793e7d38599dda
This is the initial support for flavors and multiple service
providers with the built-in L3 service plugin.
This patch handles a few key components:
* Adds an optional flavor_id to the router data model
* Adds a new driver controller that performs the following tasks:
* Loads up the configured drivers and 4 default drivers representing
the current matrix of ha/dvr options (single node, ha, dvr, and ha+dvr)
* Associates every router with a driver based on ha/dvr attributes
or the flavor_id if specified
Note that the current drivers are very limited because they don't do anything.
All of the complex logic for the in-tree drivers is still tied up in the giant
mixin the service plugin inherits. Breaking that apart will be in follow-up
patches.
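Driver selection then looks roughly like this (the function and
attribute names are illustrative, not the final API):

    def get_provider_for_router(default_drivers, flavor_lookup,
                                context, router):
        """Sketch: the flavor wins, else the ha/dvr default matrix."""
        flavor_id = router.get('flavor_id')
        if flavor_id:
            return flavor_lookup(context, flavor_id)
        key = (bool(router.get('ha')), bool(router.get('distributed')))
        # default_drivers maps (False, False) -> single node,
        # (True, False) -> ha, (False, True) -> dvr,
        # (True, True) -> ha+dvr
        return default_drivers[key]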
Partially-Implements: blueprint multi-l3-backends
Change-Id: Idce75bf0fc1375dcbbff9b9803fd2fe97d158cff
In this patch, router auto scheduling is removed from sync_routers,
so that the reported bug can be fixed. A potential race is also
avoided according to [1].
As a result, the l3 agent can't get the router info when the router
is not bound to that l3 agent, and the router will be removed from
the agent during agent processing. This makes sense since, in the
neutron server, the router is not tied to the agent. For DVR, if
there are service ports on the agent host, the router info will
still be returned to the l3 agent.
[1] https://review.openstack.org/#/c/317949/
Change-Id: Id0a8cf7537fefd626df06064f915d2de7c1680c6
Co-Authored-By: John Schwarz <jschwarz@redhat.com>
Closes-Bug: #1593653
Bug 1591766 unveiled an issue where calling the plugin API does not trigger
DHCP notifications. This is required by the auto-allocated-topology service
plugin that calls core_plugin.update_network() and expects
notifications to be sent out on state changes. To accomplish this,
the logic has been encapsulated in the DHCP module and leveraged via
callback mechanisms. For this reason, new events have been
introduced: AFTER_REQUEST and BEFORE_RESPONSE. The latter in
particular is the one needed to hook up dhcp notifications in order
to preserve backward compatibility.
More precisely, core plugins that use DHCP as is or implement their
own (with or without an agent) should already instantiate their own
notifier, and if they do not, this should be rectified.
A search on codesearch.openstack.org reveals that out-of-tree plugins
already specify their own notifiers, and the default initialization is
clearly redundant now.
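For illustration, hooking the DHCP notifications up via the new event
looks roughly like this (the callback body is omitted; module paths
as in neutron's callback registry at the time):

    from neutron.callbacks import events
    from neutron.callbacks import registry

    def _send_dhcp_notification(resource, event, trigger, **kwargs):
        # build and cast the payload to the DHCP agents just before
        # the API response is returned
        pass

    for resource in ('network', 'subnet', 'port'):
        registry.subscribe(_send_dhcp_notification,
                           resource, events.BEFORE_RESPONSE)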
Related-bug: #1591766
Change-Id: I7440becb6d30af7159ecaeba09d7a28eceb71bea
It was deprecated at [1].
Remove the deprecated config 'router_id' and its related tests.
[1] https://review.openstack.org/#/c/248498
DocImpact: All references of 'router_id' configuration option
and its description should be removed from the docs.
UpgradeImpact: Remove 'router_id' configuration option from the
l3_agent.ini.
Change-Id: Ic9420191e8c1a333e4dcc0b70411591b8573ec7c
Closes-Bug: #1594711
This reverts commit b1cdba1696
The original patch was reverted because it broke neutron plugins'
backward compatibility and needed more work.
This patch fixes those problems:
1) the original behaviour of the add_agent_status_check,
start_periodic_l3_agent_status_check and
start_periodic_dhcp_agent_status_check methods is deprecated but kept
for use in third-party plugins for backward compatibility
2) the new add_agent_status_check_worker, add_periodic_l3_agent_status_check
and add_periodic_dhcp_agent_status_check methods are implemented
instead and are used by the plugins in the neutron codebase
Closes-Bug: #1569404
Change-Id: I3a32a95489831f0d862930384309eefdc881d8f6
get_active_networks() has not been used since Havana. This patch
migrates the existing tests calling this method to the method
currently in use: get_active_networks_info().
Change-Id: Ibd6871b4884a2694983c4a572e00647674df67ae
This patch introduces a retry(func, max_attempts) method
which wraps the original func in such a way
that, if execution results in a MessagingException,
the given function will be retried up to max_attempts times
until it succeeds, or the MessagingException is raised.
The function is in the utils module and can be reused
by different agentnotifiers if necessary.
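A sketch of the wrapper:

    import functools

    import oslo_messaging

    def retry(func, max_attempts):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except oslo_messaging.MessagingException:
                    # raise on the last failed attempt, retry otherwise
                    if attempt == max_attempts:
                        raise
        return wrapper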
Change-Id: I0d0c17e500e44c1a17438c29a0e76a9ef00872e8
In case of intermittent DB failures, router and network
auto-rescheduling tasks may fail due to an error when fetching down
bindings from the DB.
These queries need to be put under try/except to prevent an
unexpected exit.
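A sketch of the guard (the query helper is hypothetical; the
exception class is oslo.db's generic DB error):

    from oslo_db import exception as db_exc
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def reschedule_from_down_agents(plugin, context, cutoff):
        try:
            # hypothetical helper fetching bindings to dead agents
            down_bindings = plugin.get_down_bindings(context, cutoff)
        except db_exc.DBError:
            # intermittent DB failure: log and retry on the next
            # periodic run instead of letting the task die
            LOG.exception("Failed to fetch down bindings, will retry")
            return
        for binding in down_bindings:
            plugin.reschedule_router(context, binding.router_id)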
Closes-Bug: #1546110
Change-Id: Id48e899a5b3d906c6d1da4d03923bdda2681cd92
Currently the neutron DHCP scheduler assumes that every server running
a dhcp-agent can reach every network. As a result the scheduler can
wrongly schedule a vlan network on a dhcp-agent that has no
reachability to the network it's supposed to serve (ex: the network's
physical_network is not supported).
Typically such a use case can happen if:
* physical_networks are dedicated to a specific service and we don't
want to mix dnsmasqs related to different services (for
isolation/configuration purposes),
* physical_networks are dedicated to a specific rack (see example
diagram http://i.imgur.com/NTBxRxk.png); the rack interconnection can
be handled outside of neutron or inside it once routed networks are
supported.
This change makes the DHCP scheduler network reachability aware by
querying the plugin's filter_hosts_with_network_access method.
This change provides an implementation for the ML2 plugin delegating
host filtering to its mechanism drivers: it aggregates the filtering
done by each mechanism, or disables filtering if any mechanism doesn't
overload the default mechanism implementation[1] (for backward
compatibility with out-of-tree mechanisms). Every in-tree mechanism
overloads the default implementation: the OVS/LB/SRIOV mechanisms use
their agent mapping to filter hosts, while the l2pop/test/logger ones
return an empty set (they provide no "L2 capability").
This change provides a default implementation[2] for other plugins
which filters nothing (for backward compatibility); they can overload
it to provide their own implementation.
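A sketch of the ML2 aggregation idea; the overload check and the
union semantics shown here are illustrative:

    def filter_hosts_with_network_access(mech_drivers, context,
                                         network_id, candidate_hosts):
        # backward compatibility: if any driver keeps the default
        # (non-overloaded) implementation, filter nothing;
        # overloads_filtering is a hypothetical predicate
        if any(not overloads_filtering(driver)
               for driver in mech_drivers):
            return candidate_hosts
        # otherwise a host is kept if at least one mechanism can
        # reach the network from it
        hosts = set()
        for driver in mech_drivers:
            hosts |= driver.filter_hosts_with_network_access(
                context, network_id, candidate_hosts)
        return hosts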
Such host filtering has some limitations if a dhcp-agent is on a host
handled by multiple l2 mechanisms, with one mechanism claiming network
reachability but not the one handling the dhcp-agent ports. Indeed the
host is able to reach the network, but not the dhcp-agent ports! This
limitation will be handled in a follow-up change using host+vif_type
filtering.
[1] neutron.plugin.ml2.driver_api.MechanismDriver.\
filter_hosts_with_network_access
[2] neutron.db.agents_db.AgentDbMixin.filter_hosts_with_network_access
Closes-Bug: #1478100
Co-Authored-By: Cedric Brandily <zzelle@gmail.com>
Change-Id: I0501d47404c8adbec4bccb84ac5980e045da68b3
After the dvr scheduling refactoring this method is only used in
l3_dvrscheduler_db. This patch also makes it a private method.
Change-Id: Iac19d1244c63ec1b71360f9dd3b09c3b131e0ec8
Routers auto scheduling works when an l3 agent starts and performs
a full sync with neutron server. Neutron server looks for all
unscheduled routers and schedules them to that agent if applicable.
This was broken by commit 0e97feb0f3
which changed the full sync logic a bit: now the l3 agent first
requests all ids of the routers scheduled to it. get_router_ids()
didn't trigger router auto scheduling, which caused the regression.
This patch adds router auto scheduling to get_router_ids().
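The fix, roughly (the plugin method names are illustrative):

    def get_router_ids(l3plugin, context, host):
        # trigger auto scheduling on full sync before listing, which
        # restores the behaviour lost in commit 0e97feb0f3
        l3plugin.auto_schedule_routers(context, host, router_ids=None)
        return l3plugin.list_router_ids_on_host(context, host)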
Closes-Bug: #1541348
Change-Id: If6d4e7b3a4839c93296985e169631e5583d9fa12
Fix params order to correspond to real signature:
assertEqual(expected, actual)
Change-Id: I722b998f6eae47076f3d10213073296a0a9a2081
Closes-Bug: #1277104
As described in the spec there is no need to explicitly bind
DVR router to each l3 agent running on compute nodes where
there are dvr serviceable ports - this brings complexity to the code,
makes it less readable and very hard to maintain (one could see how many
bugs were filed and fixed regarding dvr scheduling stuff already).
Also this brings scalability problems, as the time needed for router
scheduling grows linearly with the number of compute nodes.
The idea is to align dvr scheduling with legacy router scheduling:
only schedule SNAT portion of the router and use DB queries whenever
we need to know which compute nodes should host the router.
Implements blueprint improve-dvr-l3-agent-binding
Change-Id: I82c8d256c56bb16cdc1b1232ebb660d09909f9c6
If the required extensions are missing, we currently log an error
that is practically going to be ignored. That said, the unfulfilled
requirement will most definitely lead to other failures,
so we might as well fail fast.
This patch also cleans up some <barf>dns-integration nonsense</barf>
within the ML2 framework: the extension must not be declared statically
as it's being loaded by the extension manager, and this fixes the lousy
unit tests we have to live with. As for the db base plugin, some cleanup
is still overdue, but it will have to be taken care of in a follow-up
patch.
Closes-bug: #1538623
Change-Id: Id50eeb52c5d209170042b48821a29af3421c2f5c
This removes check_ports_on_host_and_subnet which mostly duplicates what
check_ports_exist_on_l3agent is doing.
Also rename check_ports_exist_on_l3agent to check_dvr_serviceable_ports_on_host
for more clarity.
Closes-Bug: #1524291
Change-Id: Ie02c68279c2bbafffc7be4d9a81fe25a0e983d58
The check of the tenant done in the method _get_tenant_id_for_create()
is already done by the Neutron Controller in prepare_request_body(),
with a call to attributes.populate_tenant_id().
Moreover, when the Controller processes a "create" request, it
will add the 'tenant_id' to the resource dict.
Thus, _get_tenant_id_for_create() can be deleted.
Calls to this method are replaced by res['tenant_id'].
Changes have to be made in the UTs to explicitly add the tenant_id
while creating resources, since the UT framework bypasses the
controller code that automatically adds the tenant_id to the resource.
Co-Authored-By: Hong Hui Xiao <xiaohhui@cn.ibm.com>
Closes-Bug: #1513825
Change-Id: Icea06dc81344e1120bdf986a97a6b1094bbb765e
Depends-On: I31022e9230fc5404c6a94edabbb08d2b079c3a09
Depends-On: Iea3f014ef17a1e1b755cd2efe99afd1a36ebbc6a
Depends-On: I604602d023e0cbf7f6591149f914d73217d7a574
In check_ports_exist_on_l3agent we have an optimization fix
that checks the subnets associated with the router; if
the subnets have dhcp enabled, we go ahead and create the
router if it is a dvr_snat agent.
This was introduced in Liberty since we saw some race conditions
in the gate with single node failures.
It may not be completely right, since the dhcp agents can
run on non dvr_snat nodes as well.
Based on recommendations from the reviews, and a recent upstream
patch that sends a notification on port create, we want to
remove this and monitor the situation.
This will reduce the load of check_ports_exist_on_l3agent on
non dvr_snat nodes.
Depends-On: I40b8684f6ec9ddd31753f7bbbdb364d1c0ec838a
Related-Bug: #1513678
Change-Id: I0f50dc1101b2013caf03a64a4f48e2d03ea87b26
If there are a lot of routers scheduled to an l3 agent,
rescheduling all of them one by one might take quite a long
time, during which some agents might get back
online. In this case we should skip rescheduling.
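A sketch of the skip; the attribute and method names follow neutron's
agents_db/scheduler code, but the loop itself is illustrative:

    def reschedule_routers_from_down_agents(plugin, context, bindings):
        for binding in bindings:
            if binding.l3_agent.is_active:
                # the agent came back online while we were busy
                # rescheduling earlier routers; leave its routers alone
                continue
            plugin.reschedule_router(context, binding.router_id)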
Closes-Bug: #1522436
Change-Id: If6df1f2878ea3379e8d2dba431de3e358e40189d