28220 Commits

Author SHA1 Message Date
Rodolfo Alonso Hernandez
6d480cbaf5 [DHCP] Reduce to one single thread the event processing
The RPC events received by the client are stored in a queue and
processed depending on a priority. Before this patch, a pool of
threads was spawned to process the received events. However, this
model does not improve the processing speed and could lead to
concurrency issues between threads that were not considered. Note that
the event processing methods are thread safe against the sync method
but not against each other.

This patch reduces the number of concurrent threads processing the
received events to just one, which is safe against the sync process.
The network sync process could happen when:
* ``_dhcp_ready_ports_loop`` is called.
* ``sync_state`` is called.
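
As an illustration only (this is not the code of the patch), a single
consumer thread draining a priority queue could look like the sketch
below. The names ``events``, ``processed`` and ``sync_lock`` are
hypothetical; ``sync_lock`` stands in for whatever lock is shared with
the sync process.

```python
import queue
import threading

# Hypothetical names: this is not the DHCP agent's real code.
events = queue.PriorityQueue()
processed = []
sync_lock = threading.Lock()  # stands in for the lock shared with the sync


def worker():
    """Single consumer: events are handled one at a time, by priority."""
    while True:
        _priority, _seq, name = events.get()
        if name is None:  # sentinel: stop the worker
            break
        with sync_lock:
            processed.append(name)


# Lower number means higher priority; seq keeps FIFO order on ties.
incoming = [(2, "port_update"), (1, "port_delete"), (2, "network_update")]
for seq, (priority, name) in enumerate(incoming):
    events.put((priority, seq, name))
events.put((99, 99, None))  # lowest priority, so it is consumed last

thread = threading.Thread(target=worker)
thread.start()
thread.join()
```

With only one consumer, two event handlers can never run at the same
time, so they only need to be safe against the sync process.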

Closes-Bug: #2070376
Change-Id: I21d237de97571aaaae3912d060a3e03a37dd20de
2024-07-08 11:03:21 +00:00
Rodolfo Alonso Hernandez
928f41f1fe [DHCP] Lock the execution of `_dhcp_ready_ports_loop`
The method ``DhcpAgent._dhcp_ready_ports_loop`` is updating the
instance variables ``dhcp_prio_ready_ports`` and ``dhcp_ready_ports``.
These variables are also updated by the ``sync_state`` method (that
performs an update of the current status of a network). The related
method should be executed inside a context lock to avoid interference
from other threads.
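
A minimal sketch of the locking pattern, assuming a simplified
stand-in class (``Agent``, ``flush_ready_ports`` and the use of
``threading.RLock`` are hypothetical and not taken from the patch):

```python
import threading


class Agent:
    """Hypothetical stand-in; only the locking pattern matters here."""

    def __init__(self):
        self._lock = threading.RLock()
        self.dhcp_ready_ports = set()
        self.dhcp_prio_ready_ports = set()

    def sync_state(self, ports):
        # The sync path updates the shared sets under the lock.
        with self._lock:
            self.dhcp_ready_ports |= set(ports)

    def flush_ready_ports(self):
        """One iteration of a ready-ports loop, also under the lock."""
        with self._lock:
            sent = self.dhcp_prio_ready_ports | self.dhcp_ready_ports
            self.dhcp_prio_ready_ports.clear()
            self.dhcp_ready_ports.clear()
        return sent
```

Because both methods take the same lock, a flush can never observe the
sets while a sync is halfway through updating them.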

Related-Bug: #2070376
Change-Id: I49adc465a915883478c88f6d830f28dbc5d3b304
2024-07-08 10:07:11 +00:00
Zuul
5602d1db4b Merge "[OVN] Release note for `neutron-ovn-maintenance-worker` process" 2024-07-08 06:26:40 +00:00
Zuul
568c78dcf7 Merge "Change to use selectin for RBACs in SubnetPool DB load strategy" 2024-07-05 20:27:13 +00:00
Zuul
964997b960 Merge "Return empty BpInfo if missing binding:profile" 2024-07-05 04:44:19 +00:00
Zuul
82ebd4df9f Merge "Reorder subnet RBAC policy check strings" 2024-07-04 21:35:25 +00:00
Rodolfo Alonso Hernandez
308db2e048 [OVN] Release note for `neutron-ovn-maintenance-worker` process
Related-Bug: #1912359
Change-Id: Ic7f7edb22c2824a841a7b978b5b3c8eff68fb281
2024-07-04 06:45:10 +00:00
Terry Wilson
e5a8829c56 Return empty BpInfo if missing binding:profile
https://review.opendev.org/c/openstack/neutron/+/867359 inadvertently
dropped a return when binding:profile was missing, making it possible
to hit a KeyError when trying to access port["binding:profile"]. This
was seen in the Maintenance thread after adding a port.
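
The failure mode can be sketched as follows. The field names of
``BpInfo`` and the helper ``get_binding_profile_info`` are assumptions
made for the example, not the actual Neutron definitions:

```python
import collections

# Hypothetical field names, chosen only for this illustration.
BpInfo = collections.namedtuple(
    "BpInfo", ["params", "vnic_type", "capabilities"])


def get_binding_profile_info(port):
    """Return an empty BpInfo instead of raising KeyError."""
    if not port.get("binding:profile"):
        # This early return is the kind of guard that was dropped.
        return BpInfo({}, None, [])
    profile = port["binding:profile"]
    return BpInfo(
        dict(profile),
        port.get("binding:vnic_type"),
        list(profile.get("capabilities", [])),
    )
```

Without the guard, the later ``port["binding:profile"]`` access raises
``KeyError`` for a port that has no binding profile.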

Fixes: b6750fb2b8
Closes-Bug: #2071822
Change-Id: I232daa2905904d464ddf84e66e857f8b1f08e941
2024-07-03 11:03:37 -05:00
Rodolfo Alonso Hernandez
cfab008eef [OVN] Enable the WSGI module for the OVN mechanism driver
This patch enables the use of the WSGI module with the ML2/OVN
mechanism driver. The ML2/OVN requires two events that are called
during the Neutron eventlet server initialization:
* BEFORE_SPAWN: called once before the API workers have been created
  and after the ML2 plugin code has been initialized.
* AFTER_INIT: called when the API worker is started; at this point
  the different worker processes have been spawned.

The WSGI module didn't make these event calls. Now these events are
called during the API server initialization, after the ML2 plugin
has been initialized but before the server is running and attending
any request.

This approach differs from the Neutron eventlet server event calls
because the BEFORE_SPAWN event is called for all API workers; that
means the method ``OVNMechanismDriver.pre_fork_initialize`` is called
as many times as workers are configured.

Closes-Bug: #1912359
Change-Id: I684c6cea620308a6617b665400ce608650a2adfd
2024-07-03 07:33:31 +00:00
Zuul
73a9d509ef Merge "[OVN] Add a new process to spawn ML2/OVN maintenance worker" 2024-07-03 07:31:44 +00:00
Rodolfo Alonso Hernandez
729920da5e Reorder subnet RBAC policy check strings
The subnet policy rule ``ADMIN_OR_NET_OWNER_MEMBER`` requires
retrieving the network object from the database to read the project ID.
When retrieving a list of subnets, this operation can slow down the
API call. This patch reorders the subnet RBAC policy checks so that
this check is made last.

As reported in the related LP bug, it is common to have a "creator"
project where different resources are created and then shared with
others; in this case networks and subnets. All these subnets will
belong to the same project. If a non-admin user from this project lists
all the subnets, the code before this patch needed to retrieve all the
networks to read the project ID. With the current code, it only needs
to check that the user is a project reader.

The following benchmark has been done in a VM running a standalone
OpenStack deployment. One project has created 400 networks and 400
subnets (one per network). Each network has been shared with another
project. API time to process "GET /networking/v2.0/subnets":
* Without this patch: 5.5 seconds (average)
* With this patch: 0.25 seconds (average)
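
The effect of the reordering can be sketched with plain short-circuit
evaluation. The functions below are hypothetical stand-ins for the
policy checks, and ``calls`` only records which checks actually ran:

```python
calls = []


def is_project_reader(user, subnet):
    """Cheap check: compares IDs that are already in memory."""
    calls.append("reader")
    return user["project_id"] == subnet["project_id"]


def is_network_owner(user, subnet, networks):
    """Expensive check: stands in for a database lookup of the network."""
    calls.append("net_owner")
    network = networks[subnet["network_id"]]
    return network["project_id"] == user["project_id"]


def can_list(user, subnet, networks):
    # "or" short-circuits: the lookup runs only if the cheap check fails.
    return (is_project_reader(user, subnet)
            or is_network_owner(user, subnet, networks))
```

When the cheap check comes first and succeeds, the network is never
fetched, which is where the time saving for large subnet lists comes
from.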

Related-Bug: #2071374
Related-Bug: #2037107
Change-Id: Ibca174213bba3c56fc18ec2732d80054ac95e859
2024-07-02 11:13:32 +00:00
Zuul
9649251f4d Merge "Add the port "fixed_ips" information in the DHCP RPC" 2024-07-01 17:14:28 +00:00
Zuul
08889da9b8 Merge "Fix ML2/OVN OVSDB handling of gateway ports" 2024-07-01 17:14:23 +00:00
Rodolfo Alonso Hernandez
b0081ac6c0 Add the port "fixed_ips" information in the DHCP RPC
In [1], a method to process the DHCP events in the correct order was
implemented. That method checks the port events in order to match
the "fixed_ips" field. That implies the Neutron server provides this
information in the port event, sent via RPC.

However in [2], the "fixed_ips" information was removed from the
``DhcpAgentNotifyAPI._after_router_interface_deleted``, causing a
periodic error in the ``DHCPResourceUpdate.__lt__`` method, as reported
in the LP bug. This patch is restoring this field in the RPC message.

[1]https://review.opendev.org/c/openstack/neutron/+/773160
[2]https://review.opendev.org/c/openstack/neutron/+/639814

Closes-Bug: #2071426
Change-Id: If1362b9b91794e74e8cf6bb233e661fba9fb3b26
2024-06-28 14:57:01 +00:00
Zuul
2ce9ca6cdf Merge "Improve Process fixture service restart handling" 2024-06-27 20:18:54 +00:00
Zuul
ac32dbc192 Merge "Add a new process to spawn the plugin services in the Neutron server" 2024-06-27 20:18:36 +00:00
Roberto Bartzen Acosta
46edf255bd Change to use selectin for RBACs in SubnetPool DB load strategy
To solve a performance issue when using network RBACs with thousands
of entries in the subnets, networks, and network RBACs tables, it's
necessary to change the eager loading strategy so that it does not
create and process a "cartesian" product of thousands of unnecessary
combinations for the relationship between RBAC rules and the subnetpool
database model.

We don't need a many-to-many relationship here. So, we can use the
selectin eager loading to make this relationship one-to-many and create
the model with only the necessary steps, without exploding into
thousands of rows caused by the "left outer join" cascade.

The "total" queries from this process would be divided into a series of
smaller queries with much better performance, and the resulting huge
select query will be resolved much faster without joined cascade,
representing significant performance gains.
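
A rough row-count model shows why the joined cascade is costly. It
assumes one parent table with two eagerly loaded collections of equal
size per parent; the function names and the numbers are made up for
the illustration and do not come from the bug report:

```python
def joined_rows(parents, children_a, children_b):
    """Rows from one LEFT OUTER JOIN over two collections per parent."""
    # An outer join still yields one row when a collection is empty.
    return parents * max(children_a, 1) * max(children_b, 1)


def selectin_rows(parents, children_a, children_b):
    """Rows from one parent query plus one IN query per relationship."""
    return parents + parents * children_a + parents * children_b


joined = joined_rows(10, 100, 100)
selectin = selectin_rows(10, 100, 100)
```

In this model the joined strategy returns 100000 rows while the
selectin strategy returns 2010 rows spread over three queries.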

Closes-bug: #2071374
Change-Id: I2e4fa0ffd2ad091ab6928bdf0d440b082c37def2
2024-06-27 18:48:25 +00:00
Zuul
778aebf953 Merge "dhcp: fix dhcp cleaning stale devices process when enable action" 2024-06-27 01:00:53 +00:00
Zuul
8792f32b26 Merge "Remove maintenance task "update_port_virtual_type"" 2024-06-27 00:27:35 +00:00
Gaudenz Steinlin
1c888c94a3 Improve Process fixture service restart handling
The test_l2_agent_restart test was failing due to the agents not
restarting within the timeout of 30s. This is fixed by:

* Use `systemctl restart` to restart the service instead of killing
  it and creating a new transient service.
* Don't block on `systemctl` calls to allow parallel service
  operations. Previously this was serialized in the rootwrap daemon,
  which led to delays.
* Use `KillMode=mixed` to first kill only the main process and give it
  25s to cleanly shut down all other processes. After this timeout all
  processes are killed. Previously systemd sent a SIGTERM to all
  processes, which caused unclean shutdowns of some neutron agents that
  expected to shut down their child processes themselves.

Change-Id: Ic752e36e6fe6ba9b1fc9e7296204c086c465d76f
Closes-Bug: #2070390
2024-06-26 07:59:21 +00:00
Rodolfo Alonso Hernandez
b39b5fc215 Remove maintenance task "update_port_virtual_type"
That method was planned to be removed in the Z+4=D cycle.

Related-Bug: #1973276
Change-Id: Id78889caa530064ddc4a5efb6f60c7e3cfa216da
2024-06-25 22:53:16 +00:00
Rodolfo Alonso Hernandez
5d316e8a87 Remove the Windows libraries
Windows OS was deprecated in 2023.2 and removed in this release
(2024.2), as documented in [1].

[1]https://review.opendev.org/c/openstack/neutron/+/880980

Related-Bug: #2015844
Change-Id: I3e512311a2e7fe46f70b86c0b0cf992f0cf6bd68
2024-06-25 22:43:14 +00:00
Rodolfo Alonso Hernandez
980f9bdab2 [OVN] Add a new process to spawn ML2/OVN maintenance worker
When using the Neutron WSGI module, the ML2/OVN maintenance worker needs
to be spawned in a separate service. This patch adds the service
``neutron-ovn-maintenance-worker``, a single-process service that
runs the ``MaintenanceWorker`` instance. This process is in charge of
performing periodic routines related to the ML2/OVN driver.

This new service should be included in any deployment project that
allows spawning Neutron ML2/OVN with WSGI. A companion patch for
devstack will be proposed along with this one.

Related-Bug: #1912359
Change-Id: Iea2995adb3343aae74a1b617fbccfce5c62c6b87
2024-06-24 12:40:26 +00:00
Rodolfo Alonso Hernandez
811f74d943 Add a new process to spawn the plugin services in the Neutron server
When using the Neutron WSGI module, the plugin services (periodic
workers created on demand in the ML2 plugin initialization) were not
spawned.

This patch adds a new service that should be spawned within the Neutron
API processes, similar to the RPC server.

Closes-Bug: #2069581
Change-Id: Ia5376a68bfbcff4156800f550f28b59944b863c3
2024-06-23 08:12:00 +00:00
Zuul
305e1451bb Merge "[OVN] Sanitize the classless-static-route DHCP option" 25.0.0.0b1 2024-06-19 13:34:11 +00:00
Zuul
466f5e6f1a Merge "[FT] Run test_periodic_sync_routers_task tests serially" 2024-06-18 22:47:48 +00:00
Zuul
a0fb188598 Merge "Set the Neutron server workers name" 2024-06-18 21:17:31 +00:00
Zuul
5198ad8e4f Merge "Add new neutron.wsgi module" 2024-06-18 21:15:28 +00:00
bf82263df0 [FT] Run test_periodic_sync_routers_task tests serially
These tests were seen to interfere (they removed the router namespace)
with test_metadata_proxy_rate_limiting_ipv6, and they can interfere
with others too, so let's run them serially to avoid random failures.

The following tests will now run serially:
- test_periodic_sync_routers_task
- test_periodic_sync_routers_task_routers_deleted_while_agent_down
- test_periodic_sync_routers_task_routers_deleted_while_agent_sync

Closes-Bug: #2069744
Change-Id: I34598cb9ad39c96f5e46d98af1185992c5eb3446
2024-06-18 19:19:07 +05:30
Zuul
1431a08440 Merge "Use transient systemd units in Process fixture" 2024-06-17 20:52:03 +00:00
Zuul
8ee0ddadf6 Merge "Add wsgi tempest job for OVS and OVN" 2024-06-17 20:30:30 +00:00
Lucas Alvares Gomes
ceee380a18 [OVN] Sanitize the classless-static-route DHCP option
This patch ensures that the "classless-static-route" is wrapped in {} as
expected by OVN and also merges the default routes with the
user-provided ones so everything works as expected.
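
A minimal sketch of the idea, assuming routes are handled as
(destination, next hop) pairs. The helper name and the exact output
layout inside the braces are assumptions for the example, not the
code of this patch:

```python
def format_classless_static_route(user_routes, default_routes):
    """Merge default and user routes and wrap the result in braces."""
    merged = []
    for route in list(default_routes) + list(user_routes):
        if route not in merged:  # drop duplicates, keep first occurrence
            merged.append(route)
    body = ", ".join(f"{dest},{hop}" for dest, hop in merged)
    return "{" + body + "}"


value = format_classless_static_route(
    user_routes=[("10.1.0.0/24", "10.0.0.5")],
    default_routes=[("0.0.0.0/0", "10.0.0.1")],
)
```

Here ``value`` is ``{0.0.0.0/0,10.0.0.1, 10.1.0.0/24,10.0.0.5}``.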

Closes-Bug: #2069625
Change-Id: I302a872161c55df447a05b31d99c702537502a2f
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
2024-06-17 15:42:15 +01:00
Rodolfo Alonso Hernandez
cabe18dd79 Set the Neutron server workers name
The patch [1] defines the RpcWorker and RpcReportsWorker process
names. However, the parent class ``neutron_lib.worker.BaseWorker``
init method is not reading the class variable defined in [1]. This
patch is explicitly passing the ``desc`` class name in the init method.

[1]https://review.opendev.org/c/openstack/neutron/+/907712

Closes-Bug: #2069581
Change-Id: I50c2b0567ea10316ad06e6e6c1d01db8b9520e3e
2024-06-17 10:24:44 +00:00
Zuul
0c4793ed2a Merge "Remove neutron.wsgi module" 2024-06-16 12:21:16 +00:00
elajkat
b352917461 Add wsgi tempest job for OVS and OVN
Add wsgi tempest job for OVN and OVS in the
experimental and periodic queue.

Depends-On: https://review.opendev.org/c/919725
Depends-On: https://review.opendev.org/c/openstack/devstack/+/922012

Change-Id: Ibe2e5960f7c5daebda2a8ca3b1f619b0d93b7bc9
2024-06-14 12:09:49 +00:00
Ihar Hrachyshka
79b2d709c8 tests: fix IP address not accepted by latest netaddr
This raised the following error:

netaddr.core.AddrFormatError: invalid IPNetwork 10.10.20/24
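
The same kind of strictness can be shown with the standard library
``ipaddress`` module, used here instead of netaddr so that the sketch
has no third-party dependency (``is_valid_network`` is a hypothetical
helper):

```python
import ipaddress


def is_valid_network(cidr):
    """Return True if ``cidr`` is a fully written IP network."""
    try:
        ipaddress.ip_network(cidr, strict=False)
    except ValueError:
        return False
    return True


abbreviated = is_valid_network("10.10.20/24")  # only three octets
complete = is_valid_network("10.10.20.0/24")
```

The abbreviated form is rejected, while the form with all four octets
is accepted.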

Change-Id: I33d86b40bd8ef8dfb68892ad333a31ae24924a6a
2024-06-13 21:53:23 -04:00
Zuul
2a2d626ead Merge "Change to use selectin for DB load strategy" 2024-06-13 21:17:56 +00:00
Zuul
2b20737eb3 Merge "Improve ACL comparison efficiency" 2024-06-13 17:15:51 +00:00
Zuul
a247c9be2b Merge "[CI] Enable OVN grenade job in the check and gate queue" 2024-06-13 09:37:30 +00:00
zhouhenglc
dbca7e1f8c Improve ACL comparison efficiency
The current comparison strategy is very time-consuming: if
there are hundreds of thousands of security group rules, the
comparison can take several hours. The main
time-consuming operations are [1].

This patch first sorts by security group rule ID and then
compares. The sorting itself is relatively fast.
In actual measurements, the total time consumption is at the
minute level.
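
The sort-then-compare idea can be sketched as a single merge pass over
two sorted lists. ``diff_sorted`` and the ``id`` key are hypothetical
and only illustrate the technique, not the code in ``ovn_db_sync.py``:

```python
def diff_sorted(neutron_acls, ovn_acls, key="id"):
    """Return (to_add, to_remove) using one pass over sorted inputs."""
    left = sorted(neutron_acls, key=lambda acl: acl[key])
    right = sorted(ovn_acls, key=lambda acl: acl[key])
    to_add, to_remove = [], []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][key] == right[j][key]:
            i += 1  # present on both sides, nothing to do
            j += 1
        elif left[i][key] < right[j][key]:
            to_add.append(left[i])  # only in Neutron
            i += 1
        else:
            to_remove.append(right[j])  # only in OVN
            j += 1
    to_add.extend(left[i:])
    to_remove.extend(right[j:])
    return to_add, to_remove
```

Sorting costs O(n log n) and the merge pass is linear, whereas checking
every rule against the whole other list grows quadratically.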

Partial-Bug: #2023130
[1] b86ca713f7/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py (L285-L291)

Change-Id: If4c886d928258450aac31e12a4e26e0cbe2ace62
2024-06-12 21:45:08 +00:00
Stephen Finucane
adb39e2d1c Add new neutron.wsgi module
This allows deployment tooling to easily switch from passing a binary
path to passing a Python module path. We'll use it shortly.

Change-Id: I5350dff6be0daf1d4e5e5dfa4aab745b765436f7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2024-06-12 19:46:25 +00:00
Stephen Finucane
34fafa2d8c Remove neutron.wsgi module
We want this module for use elsewhere. This has a natural home in the
neutron.api module so move it there. This is similar to what has
previously been done for nova [1].

Tests for the module are also moved and other tests slightly decoupled.

[1] https://review.opendev.org/c/openstack/nova/+/902686

Change-Id: I835e7ad95b6d7d83d06f4303b476519c16b9a2c8
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2024-06-12 19:46:11 +00:00
Brian Haley
05fcfef6ce Change to use selectin for DB load strategy
During a mailing list discussion on some OOM issues neutron
has been seeing [0], Mike Bayer recommended we should change
from using subquery to selectin DB load strategy.

A full description of this strategy can be found here [1],
but in short:

- “subquery” loading incurs additional performance / complexity
  issues when used on a many-levels-deep eager load, as
  subqueries will be nested repeatedly.

- "The subqueryload() eager loader is mostly legacy at this
  point, superseded by selectinload()"

- "The only scenario in which selectin eager loading is not
  feasible is when the model is using composite primary keys,
  and the backend database does not support tuples with IN,
  which currently includes SQL Server." So that does not
  apply to us.

The plan agreed to at the neutron drivers meeting [2] was to
make this change early in the cycle so we would be able to
see if there were any issues through the D cycle.

Added hacking checks so new code using subquery loads is
not added back.

[0] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/EHLQQXNG3NLLZYPDGG2ES3DINIJ7YT3N/
[1] https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#selectin-eager-loading
[2] https://meetings.opendev.org/meetings/neutron_drivers/2024/neutron_drivers.2024-05-31-14.00.log.html#l-67

Closes-bug: #2067770
Depends-on: https://review.opendev.org/c/openstack/neutron-lib/+/920936
Change-Id: I6e40a15284da392a3d48d45205a7a5770c14c297
2024-06-12 11:31:22 -04:00
Zuul
abe8110f53 Merge "Revert "Use HasStandardAttributes as parent class for Tags DB model"" 2024-06-12 10:26:05 +00:00
Slawek Kaplonski
e1cf0f2d59 [CI] Enable OVN grenade job in the check and gate queue
As agreed in the CI meeting [1], this patch moves the
neutron-ovn-grenade-multinode job from the experimental queue to the
check and gate queues.
Now in check/gate there are 2 grenade jobs: one ovs-multinode job and
one ovn-multinode job.

To avoid increasing the number of jobs in check/gate, this patch also
moves the neutron-ovs-grenade-dvr-multinode job to the periodic (and
experimental) queue.

[1] https://meetings.opendev.org/meetings/neutron_ci/2024/neutron_ci.2024-06-11-15.02.log.html#l-18

Change-Id: I22d0f9a59bca6f412dcf30005678229a859d5e4c
2024-06-12 09:42:32 +02:00
Zuul
939f86f027 Merge "Remove old excludes" 2024-06-12 00:09:39 +00:00
Zuul
efe7930dd0 Merge "[OVN] Bump revision number after update_virtual_port_host" 2024-06-11 22:09:58 +00:00
Zuul
25608f2165 Merge "Bump neutron-lib to 3.13.0" 2024-06-11 18:04:21 +00:00
Zuul
319489cc8f Merge "[OVN] Fix virtual parent match for PortBindingUpdateVirtualPortsEvent" 2024-06-11 18:04:11 +00:00
Miro Tomaska
bf123dfb38 Revert "Use HasStandardAttributes as parent class for Tags DB model"
This reverts commit 85d3fff97e55ba85f72cda4365ad0441c10bd9f6.

Reason for revert:
The original change was made as a “cheap win” to optimize the number
of queries the neutron server makes during testing. This did
improve the number of queries made but introduced a regression in
real-world deployments where some customers (through automation)
would define hundreds of tags per port across a large deployment.
I am proposing to revert this change in favor of the old “subquery”
relation in order to fix this regression. In addition, I filed the
Related-Bug #2069061 to investigate using `selectin` as the more
appropriate long term solution.

Change-Id: I83ec349e49e1f343da8996cab149d76443120873
Closes-Bug: #2068761
Related-Bug: #2069061
2024-06-11 11:10:18 -04:00