The method ``_extend_tags_dict`` can be called from a "list" operation.
If a resource and its "standardattr" register are deleted concurrently,
the "standard_attr" field retrieval will fail.
The "list" operation is protected with a READER transaction context;
however, this is failing with the PostgreSQL DB backend.
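A minimal sketch of the defensive pattern, assuming a simplified hook and
model layout (the helper name and the tags relationship shape are
illustrative, not the actual Neutron code):
```
from sqlalchemy.orm import exc as orm_exc


def _extend_tags_dict_safe(response_data, db_resource):
    # A concurrent DELETE may remove the standard_attr row between the
    # listing query and this extension hook; treat that as "no tags"
    # instead of raising.
    try:
        standard_attr = db_resource.standard_attr
    except orm_exc.ObjectDeletedError:
        response_data['tags'] = []
        return
    response_data['tags'] = [tag.tag for tag in standard_attr.tags]
```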
Closes-Bug: #2078787
Change-Id: I55142ce21cec8bd8e2d6b7b8b20c0147873699da
(cherry picked from commit c7d07b7421034c2722fb0d0cfd2371e052928b97)
If an ML2/SR-IOV port is disabled (status=DOWN), it takes precedence
over the "auto" value for the VF link state. That stops any
transmission from the VF.
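For illustration only, a hedged sketch of the precedence rule, assuming
the iproute2 "ip link" CLI driven from Python rather than the agent's
real eswitch helper:
```
import subprocess


def set_vf_link_state(pf_ifname, vf_index, port_enabled):
    # "auto" lets the VF follow the PF link; "disable" stops any
    # transmission from the VF while the Neutron port is DOWN.
    state = 'auto' if port_enabled else 'disable'
    subprocess.run(
        ['ip', 'link', 'set', 'dev', pf_ifname,
         'vf', str(vf_index), 'state', state],
        check=True)
```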
Closes-Bug: #2078789
Change-Id: I11d973d245dd391623e501aa14b470daa780b4db
(cherry picked from commit 8211c29158d6fc8a1af938c326dfbaa685428a4a)
This patch fixes two issues related to the port_hardware_offload_type
extension:
1. The API extension is no longer supported by the ML2 plugin directly,
so if the ML2 extension is not loaded, Neutron will not report that
the API extension is available.
2. Fix a 500 error when creating a port with the hardware_offload_type
attribute set but binding:profile unset (i.e. of type Sentinel).
Conflicts:
neutron/plugins/ml2/plugin.py
Closes-bug: #2078432
Closes-bug: #2078434
Change-Id: Ib0038dd39d8d210104ee8a70e4519124f09292da
(cherry picked from commit fbb7c9ae3d672796b72b796c53f89865ea6b3763)
When an IPv6-only network is used as the sole network for a VM and
there are no other bound ports on the same network on the same chassis,
the OVN metadata agent concludes that the associated namespace is not
needed and deletes it. As a consequence, the VM cannot access the
metadata service. With this change, the namespace is preserved if there
is at least one bound port on the chassis with either IPv4 or IPv6
addresses.
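A minimal sketch of the resulting decision, assuming a simplified port
representation rather than the agent's real Port_Binding handling:
```
def should_keep_metadata_namespace(bound_ports_on_chassis):
    # Keep the namespace while any bound port on this chassis still has
    # an address, no matter whether it is IPv4 or IPv6.
    return any(port.get('fixed_ips') for port in bound_ports_on_chassis)
```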
Closes-Bug: #2069482
Change-Id: Ie15c3344161ad521bf10b98303c7bb730351e2d8
(cherry picked from commit f7000f3d57bc59732522c4943d6ff2e9dfcf7d31)
Currently, is_valid_ipv6 accepts IPv6 addresses with a scope. However,
the netaddr library won't accept an address with a scope. Now,
get_noscope_ipv6() can be used to avoid this situation. In the future we
will be able to use the same function, which is also being defined in
oslo.utils: https://review.opendev.org/c/openstack/oslo.utils/+/925469
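A sketch of the helper's intent, assuming the oslo.utils version linked
above ends up with equivalent semantics:
```
import netaddr


def get_noscope_ipv6(address):
    """Return an IPv6 address with any zone/scope suffix removed.

    netaddr rejects scoped addresses such as "fe80::1%eth0", so the
    "%<scope>" part is stripped before handing the value to netaddr.
    """
    return str(netaddr.IPAddress(address.split('%')[0]))


# Example: get_noscope_ipv6('fe80::1%eth0') -> 'fe80::1'
```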
Closes-Bug: #2073894
Signed-off-by: Elvira García <egarciar@redhat.com>
Change-Id: I27f25f90c54d7aaa3c4a7b5317b4b8a4122e4068
(cherry picked from commit 1ed8609a6818d99133bf56483adb9bce8c886fd6)
Required since the Depends-On patch is included; without
it the postgres job fails with:
AttributeError: 'NoneType' object has no attribute 'id'
Depends-On: https://review.opendev.org/c/openstack/neutron-lib/+/923926
Related-Bug: #2072567
Change-Id: I8f2229eb0a9d8dce927ded004037eda93ce3650d
(cherry picked from commit f17cc24e8adb2bf18af32a45a44e68790c50dc6b)
Some of the OVN maintenance tasks are expected to run just once and
then raise periodic.NeverAgain() so that they are not run anymore. Those
tasks also require having acquired the OVN DB lock, so that only one of
the maintenance workers really runs them.
All those tasks had a spacing time of 600 seconds, so they were run
every 600 seconds. This usually works fine, but it may cause a small
issue in environments where Neutron runs as a pod in a k8s/OpenShift
application. In such a case, when e.g. the Neutron configuration is
updated, it may happen that a new pod with Neutron is spawned first and
only once it is already running does k8s stop the old pod. Because of
that, the maintenance worker running in the new neutron-server pod will
not acquire the lock on the OVN DB (the old pod still holds it) and will
not run all those maintenance tasks immediately. After the old pod is
terminated, one of the new pods will at some point acquire that lock and
then run all those maintenance tasks, but this introduces a delay of up
to 600 seconds in running them.
To avoid such a long wait before those maintenance tasks run, this
patch lowers their spacing time from 600 to just 5 seconds.
Additionally, maintenance tasks which are supposed to run only once, and
only by the maintenance worker that has acquired the OVN DB lock, will
now be stopped (periodic.NeverAgain will be raised) after 100 run
attempts.
This avoids running them every 5 seconds forever on the workers
which never acquire the lock on the OVN DB.
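A hedged sketch of the resulting pattern; the task and attribute names
are illustrative, the real tasks live in
neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/maintenance.py:
```
from futurist import periodics

MAX_LOCK_ATTEMPTS = 100


class OneShotMaintenanceTaskSketch(object):

    def __init__(self, has_lock_fn):
        self._has_lock = has_lock_fn
        self._attempts = 0

    def _do_work(self):
        pass  # placeholder for the actual one-shot maintenance work

    # Run every 5 seconds instead of every 600 so a worker that acquires
    # the OVN DB lock late does not delay the task by up to 10 minutes.
    @periodics.periodic(spacing=5, run_immediately=True)
    def run_once_task(self):
        if not self._has_lock():
            self._attempts += 1
            if self._attempts >= MAX_LOCK_ATTEMPTS:
                # Workers that never get the lock stop retrying instead
                # of waking up every 5 seconds forever.
                raise periodics.NeverAgain()
            return
        self._do_work()
        raise periodics.NeverAgain()
```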
Conflicts:
neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/maintenance.py
Closes-bug: #2074209
Change-Id: Iabb4bb427588c1a5da27a5d313f75b5bd23805b2
(cherry picked from commit 04c217bcd0eda07d52a60121b6f86236ba6e26ee)
Setting the 'reside-on-chassis-redirect' option was skipped for LRP
ports of provider tenant networks in patch [1], but the later patch [2]
removed this limitation from the ovn_client and not from the maintenance
task.
Because of that, this option wasn't updated after e.g. a change of the
'enable_distributed_floating_ip' config option, and connectivity to
existing floating IPs associated with ports in VLAN tenant networks
was broken.
This patch removes that limitation, so this option is now updated for
all of the Logical_Router_Ports of VLAN networks, not only for external
gateways.
[1] https://review.opendev.org/c/openstack/neutron/+/871252
[2] https://review.opendev.org/c/openstack/neutron/+/878450
Conflicts:
neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py
Closes-bug: #2073987
Change-Id: I56e791847c8f4f3a07f543689bf22fde8160c9b7
(cherry picked from commit 4b1bfb93e380b8dce78935395b2cda57076e5476)
Commit 260c968118934 broke the gate by causing jobs
not to run when it added RE2 compatibility for
irrelevant-files. Digging found that RE2 doesn't
support negative lookahead (and won't ever) [0], so it's
impossible to replace the previous PCRE filter with a
similar RE2 filter.
Instead of reverting to the original filter, which
is considered obsolete by zuul, this patch fixes the
issue by explicitly listing all files under zuul.d/
except the one that we actually want to trigger the
jobs: zuul.d/project.yaml.
Listing all the files in the directory for every job
is not ideal, and we may revisit it later, or perhaps
even reconsider the extensive use of irrelevant-files
in the neutron tree. This will have to wait for when
the gate is in better shape though.
[0] https://github.com/google/re2/issues/156
Conflicts:
zuul.d/base.yaml
zuul.d/grenade.yaml
zuul.d/job-templates.yaml
zuul.d/project.yaml
zuul.d/rally.yaml
zuul.d/tempest-multinode.yaml
zuul.d/tempest-singlenode.yaml
Related-bug: #2065821
Change-Id: I3bba89ac14414c6b7d375072ae92d2e0b5497736
(cherry picked from commit 11027e3e1ef9a58d5b2faa575a3764bd33cd2a08)
This is a follow-up patch for [1], which introduced this new decorator.
[1] https://review.opendev.org/c/openstack/neutron/+/896544
Change-Id: I2de3b5d7ba5783dd82acacda89ab4b64c2d29149
(cherry picked from commit 2a6bc5db237d28ddfdda16aea7c1b3416f3e14a4)
Neither the fdb_removal_limit nor the mac_binding_removal_limit config
option currently gets set in the OVN DB. This patch corrects that
and adds the missing testing for the MAC_Binding aging maintenance
task.
Fixes: 0a554b4f29 ("Add support for OVN MAC_Binding aging")
Fixes: 1e9f50c736 ("Add support for FDB aging")
Closes-Bug: #2073309
Change-Id: I80d79faeb9f1057d398ee750ae6e246598fd13d2
(cherry picked from commit b4c8cc600a21469e247a0012585969a7897a0929)
When Neutron is killed with SIGTERM (e.g. via systemctl) while using
ML2/OVN, neutron workers do not exit and instead are eventually killed
with SIGKILL when the graceful timeout is reached (often around 1
minute).
This is happening due to the signal handlers for SIGTERM. There are
multiple issues:
1) oslo_service, the ml2/ovn mech_driver, and ml2/ovo_rpc.py all call
signal.signal(signal.SIGTERM, ...), overwriting each other's signal
handlers.
2) SIGTERM is handled in the main thread, and running blocking code
there causes AssertionErrors in eventlet, which also prevents the
process from exiting.
3) The ml2/ovn cleanup code doesn't cause the process to end, so it
interrupts the killing of the process.
oslo_service has a singleton SignalHandler class that solves all of
these issues.
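A hedged sketch of the approach, assuming a trivial callback;
registering through the singleton means handlers accumulate instead of
replacing each other:
```
from oslo_service import service


def _on_sigterm(signo, frame):
    # Keep the handler lightweight; the heavy cleanup stays in the
    # normal shutdown path so the process can actually exit.
    pass


# All callers share the same singleton instance.
service.SignalHandler().add_handler('SIGTERM', _on_sigterm)
```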
Closes-Bug: #2056366
Depends-On: https://review.opendev.org/c/openstack/oslo.service/+/913512
Change-Id: I730a12746bceaa744c658854e38439420efc4629
Signed-off-by: Terry Wilson <twilson@redhat.com>
(cherry picked from commit a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7)
The "tagging" service plugin API extension does use the policy enforcer
since [1]. If a tag API call is done just after the Neutron server has
been initialized and the policy enforcer, that is a global variable per
API worker, has not been initialized, the API call will fail.
This patch initializes the policy enforcer as is done in the
``PolicyHook``, that is called by many other API resources that inherit
from the ``APIExtensionDescriptor`` class.
[1]https://review.opendev.org/q/I9f3e032739824f268db74c5a1b4f04d353742dbd
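A minimal sketch of the idea (not the exact Neutron code), mirroring
what the PolicyHook does for other API resources:
```
from neutron import policy


def _enforce_tag_policy(context, action, target):
    # policy.init() is a no-op once the enforcer exists, so calling it
    # here only matters for the first request handled by this worker.
    policy.init()
    return policy.enforce(context, action, target)
```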
Closes-Bug: #2073782
Change-Id: Ia35c51fb81cfc0a55c5a2436fc5c55f2b4c9bd01
(cherry picked from commit 776178e90763d004ccb595b131cdd4dd617cd34f)
The method ``ProcessMonitor._check_child_processes`` was releasing
the thread executor inside a method that creates a lock for the resource
"_check_child_processes". Although this resource is not used anywhere
else (at least for this instance), this could lead to a potential
deadlock.
The current implementation of ``lockutils.synchronized`` with the
default values "external=False" and "fair=False" is a
``threading.Lock()`` instance. The goal of this lock is precisely to
execute the locked code without any interruption and only then
release the executor.
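For illustration, the decorator usage with the defaults discussed above
(external=False, fair=False), which reduces to a plain threading.Lock
around the decorated method; this is not the real ProcessMonitor code:
```
from oslo_concurrency import lockutils


class ProcessMonitorSketch(object):

    @lockutils.synchronized('_check_child_processes')
    def _check_child_processes(self):
        # Everything here runs under the lock; the thread executor
        # should only be released after the locked section completes.
        pass
```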
Closes-Bug: #2073743
Change-Id: I44c7a4ce81a67b86054832ac050cf5b465727adf
(cherry picked from commit baa57ab38d754bfa2dba488feb9429c1380d616c)
If northd is very busy, it may happen that the port is deleted while
handling an LSP down event, causing the standard attribute to be gone
when bumping the OVN revision number. This is because the port is set
down in the SB DB first, then northd propagates that to the NB DB, and
only then is the event emitted.
This patch just makes sure the traceback is not printed when this
happens.
TrivialFix
Closes-bug: #2069442
Change-Id: I7d21e4adc27fab411346e0458c92191e69ce6b30
Signed-off-by: Jakub Libosvar <libosvar@redhat.com>
(cherry picked from commit 8ab385f97de99c464258ac74cf342b0353580823)
Currently, when the SR-IOV agent is enabled and a non-SR-IOV instance is
migrated, the non-SR-IOV port status is frequently set to BUILD
instead of ACTIVE.
This is because the 'binding_activate' function in the sriov-nic-agent
sets it to BUILD via get_device_details_from_port_id (as it calls
_get_new_status).
This patch checks network_ports in binding_activate and skips binding
the port if it is not an SR-IOV port.
Closes-Bug: #2072154
Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4
(cherry picked from commit a311606fcdae488e76c29e0e5e4035f8da621a34)
If the DB connection is not stopped within the defined timeout (10
seconds), the clean-up process will continue.
Closes-Bug: #2034589
Change-Id: I6c3b4da49364c3fed86053515e79121acac078d6
(cherry picked from commit b40c728cbb0903d78a5d4d47336fe107f06b9f4d)
In [1], a method to process the DHCP events in the correct order was
implemented. That method checks the port events in order to match
the "fixed_ips" field. That implies that the Neutron server provides
this information in the port event, sent via RPC.
However, in [2] the "fixed_ips" information was removed from
``DhcpAgentNotifyAPI._after_router_interface_deleted``, causing a
periodic error in the ``DHCPResourceUpdate.__lt__`` method, as reported
in the LP bug. This patch restores this field in the RPC message.
[1]https://review.opendev.org/c/openstack/neutron/+/773160
[2]https://review.opendev.org/c/openstack/neutron/+/639814
Closes-Bug: #2071426
Change-Id: If1362b9b91794e74e8cf6bb233e661fba9fb3b26
(cherry picked from commit b0081ac6c0eca93f7589f5c910d0f6385d83dd47)
To solve a performance issue when using network RBACs with thousands
of entries in the subnets, networks, and network rbacs tables, it's
necessary to change the eager loader strategy so it does not create and
process a "cartesian" product of thousands of unnecessary combinations
for the relationship between the rbac rules and the subnetpool
database model.
We don't need a many-to-many relationship here, so we can use
selectin eager loading to make this relationship one-to-many and build
the model with only the necessary steps, without exploding into
thousands of rows caused by the "left outer join" cascade.
The "total" queries from this process are divided into a series of
smaller queries with much better performance, and the resulting huge
select query is resolved much faster without the joined cascade,
representing a significant performance gain.
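A hedged SQLAlchemy sketch of the loader-strategy change; the table and
relationship names are simplified stand-ins for the real Neutron
rbac/subnetpool models:
```
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()


class RBACEntry(Base):
    __tablename__ = 'rbac_entries'
    id = sa.Column(sa.String(36), primary_key=True)
    object_id = sa.Column(sa.String(36), sa.ForeignKey('subnetpools.id'))


class SubnetPool(Base):
    __tablename__ = 'subnetpools'
    id = sa.Column(sa.String(36), primary_key=True)
    # "selectin" issues one extra SELECT ... WHERE object_id IN (...)
    # per batch of parent rows instead of a LEFT OUTER JOIN cascade, so
    # listing thousands of rows no longer builds a cartesian product.
    rbac_entries = orm.relationship(RBACEntry, lazy='selectin')
```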
Closes-bug: #2071374
Change-Id: I2e4fa0ffd2ad091ab6928bdf0d440b082c37def2
(cherry picked from commit 46edf255bde0603fe88b2dd9f4e482590e384382)
The subnet policy rule ``ADMIN_OR_NET_OWNER_MEMBER`` requires retrieving
the network object from the database to read the project ID.
When retrieving a list of subnets, this operation can slow down the
API call. This patch reorders the subnet RBAC policy checks so that this
check is done at the end.
As reported in the related LP bug, it is common to have a "creator"
project where different resources are created and then shared with
others; in this case networks and subnets. All these subnets belong to
the same project. If a non-admin user from this project lists all the
subnets, the code before this patch needed to retrieve all the networks
to read the project ID. With the current code it is only necessary to
check that the user is a project reader.
The following benchmark has been done in a VM running a standalone
OpenStack deployment. One project has created 400 networks and 400
subnets (one per network). Each network has been shared with another
project. API time to process "GET /networking/v2.0/subnets":
* Without this patch: 5.5 seconds (average)
* With this patch: 0.25 seconds (average)
Related-Bug: #2071374
Related-Bug: #2037107
Change-Id: Ibca174213bba3c56fc18ec2732d80054ac95e859
(cherry picked from commit 729920da5e836fa7a27b1b85b3b2999146d905ba)
https://review.opendev.org/c/openstack/neutron/+/867359 inadvertently
dropped a return when binding:profile was missing, making it possible
to hit a KeyError when trying to access port["binding:profile"]. This
was seen in the Maintenance thread after adding a port.
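A minimal sketch of the defensive access implied above; the constant
comes from neutron-lib, the helper name is illustrative:
```
from neutron_lib.api.definitions import portbindings


def _get_binding_profile_or_none(port):
    profile = port.get(portbindings.PROFILE)
    if not profile:
        # binding:profile may be absent or empty right after the port
        # is created; bail out instead of raising KeyError later.
        return None
    return profile
```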
Fixes: b6750fb2b8
Closes-Bug: #2071822
Change-Id: I232daa2905904d464ddf84e66e857f8b1f08e941
(cherry picked from commit e5a8829c565755e4c7d4e8b2d52536234c90d8b4)
This patch ensures that the "classless-static-route" option is wrapped
in {} as expected by OVN and also merges the default routes with the
user-provided ones so everything works as expected.
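A hedged sketch of the formatting rule, with example route values only:
```
def format_classless_static_routes(default_routes, user_routes):
    # Merge the defaults (e.g. metadata and default gateway routes) with
    # the user-supplied ones, then wrap everything in one pair of braces
    # as OVN expects for the classless_static_route option.
    routes = list(default_routes)
    routes += [r for r in user_routes if r not in routes]
    return '{%s}' % ', '.join(routes)


# Example:
# format_classless_static_routes(
#     ['169.254.169.254/32,10.0.0.2', '0.0.0.0/0,10.0.0.1'],
#     ['30.0.0.0/24,10.0.0.4'])
# -> '{169.254.169.254/32,10.0.0.2, 0.0.0.0/0,10.0.0.1,
#      30.0.0.0/24,10.0.0.4}'
```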
Closes-Bug: #2069625
Change-Id: I302a872161c55df447a05b31d99c702537502a2f
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
(cherry picked from commit ceee380a1835d706579aa0f3597ad9e0ce1a37ee)
It has been seen that these tests interfered (by removing the router
namespace) with test_metadata_proxy_rate_limiting_ipv6, but they can
interfere with others too; let's run them serially to avoid random
failures.
The following tests will now run serially:
- test_periodic_sync_routers_task
- test_periodic_sync_routers_task_routers_deleted_while_agent_down
- test_periodic_sync_routers_task_routers_deleted_while_agent_sync
Closes-Bug: #2069744
Change-Id: I34598cb9ad39c96f5e46d98af1185992c5eb3446
(cherry picked from commit bf82263df027a5c5213422feb12eefd6de9fa867)
If explicitly_egress_direct=True is enabled and a port has no security
group and port_security=False, the ingress flood reappears. The pipeline
is:
Ingress
table_0 -> table_60 -> NORMAL -> VM
Egress
table_0 -> ... -> table_94 -> output
Because the final ingress action is NORMAL, br-int will learn the
source MAC, but the final egress action is output. So the VM's MAC will
never be learnt by br-int and the ingress flood comes back.
This patch adds a default direct flow to table 94 during the
openflow security group init when explicitly_egress_direct=True, so the
pipeline becomes:
Ingress
table_0 -> table_60 -> table_94 -> output VM
Egress
table_0 -> ... -> table_94 -> output
This patch also adds flows for traffic coming from the patch port, which
match the local vlan and then go to table 94 to apply the same direct
actions. The flood issue above is addressed by these flows.
Closes-Bug: #2051351
Change-Id: Ia61784174ee610b338f26660b2954330abc131a1
(cherry picked from commit d6f56c5f96c42e1682f3d1723a65253429778c20)
This patch adds a revision bump after updating the hostname
of a virtual port (more specifically, its associated port).
This way there is no misalignment between the revision number
in the Neutron DB and the OVN DB.
It also avoids the unnecessary execution of the maintenance
task simply to match the revision_number.
Closes-Bug: #2069046
Change-Id: I2734984f10341ab97ebbdee11389d214bb1150f3
(cherry picked from commit f210a904793b585dafea8085ed62e06f3fed2e6e)
As mentioned in change [1], the condition should be an `is None` check, as per the inline comment.
[1] https://review.opendev.org/c/openstack/neutron/+/896883
Related-Bug: #2038413
Change-Id: I3666cf0509747863ca2a416c8bfc065582573734
(cherry picked from commit 170d99f2d53f77d4c66e505f310fd9d8f3481149)
This reverts commit 85d3fff97e55ba85f72cda4365ad0441c10bd9f6.
Reason for revert:
The original change was made as a “cheap win” to optimize the number
of queries the neutron server makes during testing. It did
improve the number of queries made, but introduced a regression in
real-world deployments where some customers (through automation)
would define hundreds of tags per port across a large deployment.
I am proposing to revert this change in favor of the old “subquery”
relation in order to fix this regression. In addition, I filed the
Related-Bug #2069061 to investigate using `selectin` as the more
appropriate long-term solution.
Change-Id: I83ec349e49e1f343da8996cab149d76443120873
Closes-bug: #2068761
Related-Bug: #2069061
There are three reasons to revert this patch.
1. It broke the RPC push API for trunks because it added a port DB model
to the event payload, which is not serializable.
2. It also broke the callback event payload interface, which requires
that all entries in the .states attribute belong to the same core
object. To quote from neutron-lib,
```
# an iterable of states for the resource from the newest to the oldest
# for example db states or api request/response
# the actual object type for states will vary depending on event caller
self.states = ...
```
3. There is no good justification why ml2/ovn would not allow this
operation. The rationale for the original patch was to align the
behavior with ml2/ovs, but we don't have such parity requirements. The
409 error that can be returned by the API endpoints is backend-specific.
To quote api-ref,
```
409 The operation returns this error code for one of these reasons:
A system configuration prevents the operation from succeeding.
```
AFAIU there is nothing that prevents ml2/ovn from creating a trunk in
this situation.
This will have to be backported to all supported branches (the original
patch was backported down to Wallaby).
Conflicts:
neutron/services/trunk/drivers/ovn/trunk_driver.py
This reverts commit 833a6d82cd705548130cdac73a88d388f52c7824.
Closes-Bug: #2065707
Related-Bug: #2022059
Change-Id: I067c2f7286b2684b67b4389ca085d06a93f856ce
(cherry picked from commit ac15191f88a63bd5e0510c3602fb6d19c9ac1c92)