The RPC events received by the client are stored in a queue and
processed according to their priority. Before this patch, a pool of
threads was spawned to process the received events. However, this
model does not improve the processing speed and could lead to thread
concurrency issues that were not accounted for. Note that the event
processing methods are thread safe with respect to the sync method
but not among themselves. This patch reduces the number of concurrent
threads processing the received events to a single one, which is safe
against the sync process.
The network sync process could happen when:
* ``_dhcp_ready_ports_loop`` is called.
* ``sync_state`` is called.
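A minimal sketch of the new model (class and method names here are
illustrative, not the actual Neutron code): a single consumer thread
drains the priority queue, and every handler runs under the same lock
that the sync process takes:

    import queue
    import threading

    class EventConsumer:
        def __init__(self):
            self._queue = queue.PriorityQueue()
            self._sync_lock = threading.Lock()  # also held by sync

        def run(self):
            # One thread only: handlers never race among themselves.
            while True:
                _priority, handler = self._queue.get()
                with self._sync_lock:  # safe against the sync process
                    handler()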
Closes-Bug: #2070376
Change-Id: I21d237de97571aaaae3912d060a3e03a37dd20de
The method ``DhcpAgent._dhcp_ready_ports_loop`` updates the instance
variables ``dhcp_prio_ready_ports`` and ``dhcp_ready_ports``. These
variables are also updated by the ``sync_state`` method (which
updates the current status of a network). The related method should
be executed inside a context lock to avoid interference from other
threads.
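A minimal sketch of the idea, with simplified names (the real agent
code differs):

    import threading

    class DhcpAgentSketch:
        def __init__(self):
            self._lock = threading.Lock()
            self.dhcp_prio_ready_ports = set()
            self.dhcp_ready_ports = set()

        def _dhcp_ready_ports_loop(self):
            with self._lock:  # same lock taken by sync_state()
                ports = (self.dhcp_prio_ready_ports |
                         self.dhcp_ready_ports)
                self.dhcp_prio_ready_ports = set()
                self.dhcp_ready_ports = set()
            self._notify_ports_ready(ports)  # hypothetical helper

        def sync_state(self):
            with self._lock:
                # A resync rebuilds the sets without interference.
                self.dhcp_ready_ports = set()

        def _notify_ports_ready(self, ports):
            pass  # placeholder for the RPC notification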
Related-Bug: #2070376
Change-Id: I49adc465a915883478c88f6d830f28dbc5d3b304
https://review.opendev.org/c/openstack/neutron/+/867359 inadvertently
dropped a return when binding:profile was missing, making it possible
to hit a KeyError when trying to access port["binding:profile"]. This
was seen in the Maintenance thread after adding a port.
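A sketch of the restored guard (simplified, not the exact code):

    def get_binding_profile(port):
        # Return early when "binding:profile" is absent instead of
        # falling through to port["binding:profile"], which raises
        # KeyError.
        if "binding:profile" not in port:
            return None
        return port["binding:profile"]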
Fixes: b6750fb2b8
Closes-Bug: #2071822
Change-Id: I232daa2905904d464ddf84e66e857f8b1f08e941
This patch enables the use of the WSGI module with the ML2/OVN
mechanism driver. The ML2/OVN driver requires two events that are
called during the Neutron eventlet server initialization:
* BEFORE_SPAWN: called once before the API workers have been created
and after the ML2 plugin code has been initialized.
* AFTER_INIT: called when the API worker is started; at this point
the different worker processes have been spawned.
The WSGI module did not make these event calls. Now these events are
called during the API server initialization, after the ML2 plugin
has been initialized but before the server is running and serving
any requests.
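A sketch of the event calls, assuming a hypothetical hook name (the
``registry``/``resources``/``events`` API is from neutron_lib):

    from neutron_lib.callbacks import events
    from neutron_lib.callbacks import registry
    from neutron_lib.callbacks import resources

    def init_api_server(server):
        # Published after the ML2 plugin is initialized and before
        # the server starts serving requests.
        registry.publish(resources.PROCESS, events.BEFORE_SPAWN,
                         server)
        registry.publish(resources.PROCESS, events.AFTER_INIT, server)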
This approach differs from the Neutron eventlet server event calls
because the BEFORE_SPAWN event is called once per API worker; that
means the method ``OVNMechanismDriver.pre_fork_initialize`` is called
as many times as there are configured workers.
Closes-Bug: #1912359
Change-Id: I684c6cea620308a6617b665400ce608650a2adfd
The subnet policy rule ``ADMIN_OR_NET_OWNER_MEMBER`` requires
retrieving the network object from the database to read the project
ID. When retrieving a list of subnets, this operation can slow down
the API call. This patch reorders the subnet RBAC policy checks to
perform this check at the end.
As reported in the related LP bug, it is common to have a "creator"
project where different resources are created and then shared with
other projects; in this case, networks and subnets. All these subnets
will belong to the same project. If a non-admin user from this
project lists all the subnets, the code before this patch needed to
retrieve all the networks to read the project ID. With the current
code it is only necessary to check that the user is a project reader.
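An illustrative policy rule (not the exact Neutron definition):
oslo.policy evaluates an "or" check left to right and stops at the
first rule that passes, so the cheap project-reader check goes first
and the network lookup (``ADMIN_OR_NET_OWNER_MEMBER``) goes last:

    from oslo_policy import policy

    rule = policy.RuleDefault(
        'get_subnet',
        'rule:project_reader or rule:admin_or_net_owner_member')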
The following benchmark was run in a VM hosting a standalone
OpenStack deployment. One project created 400 networks and 400
subnets (one per network). Each network was shared with another
project. API time to process "GET /networking/v2.0/subnets":
* Without this patch: 5.5 seconds (average)
* With this patch: 0.25 seconds (average)
Related-Bug: #2071374
Related-Bug: #2037107
Change-Id: Ibca174213bba3c56fc18ec2732d80054ac95e859
In [1], a method to process the DHCP events in the correct order was
implemented. That method checks the port events to match the
"fixed_ips" field, which implies that the Neutron server provides
this information in the port event sent via RPC.
However, in [2] the "fixed_ips" information was removed from
``DhcpAgentNotifyAPI._after_router_interface_deleted``, causing a
periodic error in the ``DHCPResourceUpdate.__lt__`` method, as
reported in the LP bug. This patch restores this field in the RPC
message.
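A simplified paraphrase of the ordering check, showing why the
missing field breaks every comparison (not the real class):

    class DHCPResourceUpdate:
        def __init__(self, obj_type, resource, timestamp):
            self.obj_type = obj_type
            self.resource = resource
            self.timestamp = timestamp

        def __lt__(self, other):
            if self.obj_type == other.obj_type == 'port':
                # KeyError when the RPC payload lacks "fixed_ips".
                self_ips = {ip['ip_address']
                            for ip in self.resource['fixed_ips']}
                other_ips = {ip['ip_address']
                             for ip in other.resource['fixed_ips']}
                if self_ips & other_ips:
                    return self.timestamp < other.timestamp
            return self.timestamp < other.timestamp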
[1] https://review.opendev.org/c/openstack/neutron/+/773160
[2] https://review.opendev.org/c/openstack/neutron/+/639814
Closes-Bug: #2071426
Change-Id: If1362b9b91794e74e8cf6bb233e661fba9fb3b26
To solve a performance issue when using network RBACs with thousands
of entries in the subnets, networks, and network rbacs tables, it is
necessary to change the eager loading strategy so that it does not
create and process a "cartesian" product of thousands of unnecessary
combinations for the relationship between the RBAC rules and the
subnetpool database model.
We do not need a many-to-many relationship here, so we can use
selectin eager loading to make this relationship one-to-many and
build the model with only the necessary steps, without exploding into
thousands of rows caused by the "left outer join" cascade.
The "total" queries from this process are divided into a series of
smaller queries with much better performance, and the resulting huge
select query is resolved much faster without the joined cascade,
representing significant performance gains.
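A minimal SQLAlchemy sketch of the strategy (models simplified, not
the Neutron ones):

    from sqlalchemy import Column, ForeignKey, Integer
    from sqlalchemy.orm import declarative_base, relationship

    Base = declarative_base()

    class SubnetPool(Base):
        __tablename__ = 'subnetpools'
        id = Column(Integer, primary_key=True)
        # lazy='selectin' loads the related rows with a second
        # "SELECT ... WHERE subnetpool_id IN (...)" query instead
        # of a row-multiplying left outer join.
        rbac_entries = relationship('SubnetPoolRBAC',
                                    lazy='selectin',
                                    back_populates='subnetpool')

    class SubnetPoolRBAC(Base):
        __tablename__ = 'subnetpoolrbacs'
        id = Column(Integer, primary_key=True)
        subnetpool_id = Column(Integer,
                               ForeignKey('subnetpools.id'))
        subnetpool = relationship('SubnetPool',
                                  back_populates='rbac_entries')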
Closes-bug: #2071374
Change-Id: I2e4fa0ffd2ad091ab6928bdf0d440b082c37def2
The test_l2_agent_restart test was failing because the agents did
not restart within the 30s timeout. This is fixed by:
* Using `systemctl restart` to restart the service instead of killing
it and creating a new transient service.
* Not blocking on `systemctl` calls, to allow parallel service
operations. Previously these calls were serialized in the rootwrap
daemon, which led to delays.
* Using `KillMode=mixed` to first kill only the main process and give
it 25s to cleanly shut down all other processes. After this timeout
all processes are killed. Previously systemd sent a SIGTERM to all
processes, which caused unclean shutdowns of some Neutron agents that
expected to shut down their child processes themselves. (See the
sketch after this list.)
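A sketch of the resulting service handling (unit names and command
lines are illustrative, not the exact test code):

    import subprocess

    # Restart in place without blocking, so several services can be
    # operated on in parallel:
    subprocess.Popen(['systemctl', 'restart', '--no-block',
                      'my-agent.service'])

    # When the transient unit is created, KillMode=mixed sends
    # SIGTERM only to the main process and TimeoutStopSec gives it
    # 25s before everything left in the cgroup is killed:
    subprocess.run(['systemd-run', '--unit', 'my-agent',
                    '-p', 'KillMode=mixed',
                    '-p', 'TimeoutStopSec=25',
                    '/usr/bin/my-agent'])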
Change-Id: Ic752e36e6fe6ba9b1fc9e7296204c086c465d76f
Closes-Bug: #2070390
When using the Neutron WSGI module, the ML2/OVN maintenance worker
needs to be spawned in a separate service. This patch adds the
service ``neutron-ovn-maintenance-worker``, a single-process service
that runs the ``MaintenanceWorker`` instance. This process is in
charge of performing periodic routines related to the ML2/OVN driver.
This new service should be included in any deployment project that
allows spawning Neutron ML2/OVN with WSGI. Along with this patch, a
new one for devstack will be proposed.
Related-Bug: #1912359
Change-Id: Iea2995adb3343aae74a1b617fbccfce5c62c6b87
When using the Neutron WSGI module, the plugin services (periodic
workers created on demand in the ML2 plugin initialization) were not
spawned.
This patch adds a new service that should be spawned within the Neutron
API processes, similar to the RPC server.
Closes-Bug: #2069581
Change-Id: Ia5376a68bfbcff4156800f550f28b59944b863c3
It was seen that these tests interfered (removed the router
namespace) with test_metadata_proxy_rate_limiting_ipv6, but they can
interfere with others too; let's run them serially to avoid random
failures.
The following tests will now run serially:
- test_periodic_sync_routers_task
- test_periodic_sync_routers_task_routers_deleted_while_agent_down
- test_periodic_sync_routers_task_routers_deleted_while_agent_sync
Closes-Bug: #2069744
Change-Id: I34598cb9ad39c96f5e46d98af1185992c5eb3446
This patch ensures that the "classless-static-route" is wrapped in
{} as expected by OVN, and also merges the default routes with the
user-provided ones so everything works as expected.
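A hypothetical helper illustrating the merge and the wrapping (not
the actual Neutron function):

    def format_static_routes(default_routes, user_routes):
        # Merge default and user-provided routes, then wrap them in
        # the {} set syntax OVN expects for the
        # "classless_static_route" DHCP option.
        routes = list(default_routes)
        routes += [r for r in user_routes if r not in routes]
        return '{%s}' % ', '.join(routes)

    # format_static_routes(['0.0.0.0/0,10.0.0.1'],
    #                      ['30.0.0.0/24,10.0.0.4'])
    # -> '{0.0.0.0/0,10.0.0.1, 30.0.0.0/24,10.0.0.4}'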
Closes-Bug: #2069625
Change-Id: I302a872161c55df447a05b31d99c702537502a2f
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
The patch [1] defines the RpcWorker and RpcReportsWorker process
names. However, the init method of the parent class
``neutron_lib.worker.BaseWorker`` does not read the class variable
defined in [1]. This patch explicitly passes the ``desc`` class
attribute in the init method.
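A sketch of the change (the ``BaseWorker`` signature is simplified
here and assumed to accept ``desc``):

    from neutron_lib import worker as neutron_worker

    class RpcWorker(neutron_worker.BaseWorker):
        desc = 'rpc worker'

        def __init__(self, plugins, worker_process_count=1):
            # BaseWorker does not pick up the class attribute on
            # its own, so pass it explicitly.
            super().__init__(
                worker_process_count=worker_process_count,
                desc=self.desc)
            self._plugins = plugins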
[1] https://review.opendev.org/c/openstack/neutron/+/907712
Closes-Bug: #2069581
Change-Id: I50c2b0567ea10316ad06e6e6c1d01db8b9520e3e
The current comparison strategy is very time-consuming: with
hundreds of thousands of security group rules, the comparison can
take several hours. The main time-consuming operations are [1].
This patch sorts the rules first by security group rule ID and then
compares them; the sorting step is relatively fast. Measured in
practice, the total time consumption is on the order of minutes.
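A sketch of the sorted comparison (illustrative, not the actual sync
code): sort both sides by rule ID once, then walk them in lockstep,
which is O(n log n) instead of repeated linear scans:

    def diff_by_id(neutron_rules, ovn_rules):
        # Returns (rules missing in OVN, stale rules in OVN).
        ntn = sorted(neutron_rules, key=lambda r: r['id'])
        ovn = sorted(ovn_rules, key=lambda r: r['id'])
        missing, stale = [], []
        i = j = 0
        while i < len(ntn) and j < len(ovn):
            if ntn[i]['id'] == ovn[j]['id']:
                i += 1
                j += 1
            elif ntn[i]['id'] < ovn[j]['id']:
                missing.append(ntn[i])
                i += 1
            else:
                stale.append(ovn[j])
                j += 1
        missing.extend(ntn[i:])
        stale.extend(ovn[j:])
        return missing, stale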
Partial-Bug: #2023130
[1] b86ca713f7/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py (L285-L291)
Change-Id: If4c886d928258450aac31e12a4e26e0cbe2ace62
This allows deployment tooling to easily switch from passing a binary
path to passing a Python module path. We'll use it shortly.
Change-Id: I5350dff6be0daf1d4e5e5dfa4aab745b765436f7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We want this module for use elsewhere. This has a natural home in the
neutron.api module so move it there. This is similar to what has
previously been done for nova [1].
Tests for the module are also moved and other tests slightly decoupled.
[1] https://review.opendev.org/c/openstack/nova/+/902686
Change-Id: I835e7ad95b6d7d83d06f4303b476519c16b9a2c8
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
As agreed at the CI meeting [1], this patch moves the
neutron-ovn-grenade-multinode job from the experimental queue to the
check and gate queues.
Now there are two grenade jobs in check/gate: one ovs-multinode job
and one ovn-multinode job.
To avoid increasing the number of jobs in check/gate, this patch also
moves the neutron-ovs-grenade-dvr-multinode job to the periodic (and
experimental) queue.
[1] https://meetings.opendev.org/meetings/neutron_ci/2024/neutron_ci.2024-06-11-15.02.log.html#l-18
Change-Id: I22d0f9a59bca6f412dcf30005678229a859d5e4c
This reverts commit 85d3fff97e55ba85f72cda4365ad0441c10bd9f6.
Reason for revert:
The original change was made as a “cheap win” to optimize the number
of queries the Neutron server makes during testing. It did reduce the
number of queries made, but it introduced a regression in real-world
deployments where some customers (through automation) would define
hundreds of tags per port across a large deployment.
I am proposing to revert this change in favor of the old “subquery”
relation in order to fix this regression. In addition, I filed the
Related-Bug #2069061 to investigate using `selectin` as the more
appropriate long-term solution.
Change-Id: I83ec349e49e1f343da8996cab149d76443120873
Closes-Bug: #2068761
Related-Bug: #2069061