In ML2/OVN, subport bindings are not updated with the host
information. This patch skips the LSP update in that case.
Currently the method ``update_lsp_host_info`` can get stuck executing
``_wait_for_port_bindings_host``. During this time the subport
can be deleted or removed from the trunk, which clashes with
the newer operation that tries to remove the LSP port host info
and is the cause of the related bug.
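A minimal sketch of the guard this patch adds; the shape of the
binding objects is an assumption, and the real code lives in
``update_lsp_host_info``:

    def should_update_lsp_host_info(port_bindings):
        # Subport bindings carry no host information, so there is
        # nothing to write into the LSP and the update can be skipped.
        return any(binding.get("host") for binding in port_bindings)

    assert should_update_lsp_host_info([{"host": "compute-1"}])
    assert not should_update_lsp_host_info([{"host": ""}])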
Closes-Bug: #2085462
Change-Id: Ic68f9b5aa3b06bc4e1cbfbe577efc33b4b617b45
(cherry picked from commit 63d14a3ff225faa75a825991cf0b33b2fd745b9b)
When building the security group dictionary, there is no need to
build the security group rule objects individually. These objects
(OVOs) are built along with the security group OVO and added to
the resulting dictionary in ``_make_security_group_dict``.
Related-Bug: #2083682
Change-Id: I66fbf8487b390f7685ef0a4e44c3f58b79cab05f
(cherry picked from commit 232d1d26ea096c1e3b5f92b46029e67689185ae1)
There are some operations where the SG DB object can be used instead of
the SG OVO. That saves conversion time, including the conversion of the
SG rule OVOs, which are child resources of the SG OVO.
This optimization applies to the following methods:
* SecurityGroupDbMixin.get_security_groups
* SecurityGroupDbMixin.update_security_group (partially)
The Nova query that retrieves the SG list during the "server list"
command has been benchmarked. The testing environment had a single SG
with 250 SG rules. Call:
"GET /networking/v2.0/security-groups?id=81f64aa4-2cea-46db-8fea-cd944f106aab
&fields=id&fields=name HTTP/1.1"
* Without this patch: around 1.25 seconds
* With this patch: around 0.025 seconds (a 50x improvement).
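A toy illustration of the read-path change, with a stand-in class for
the real DB model (the actual code is in
``SecurityGroupDbMixin.get_security_groups``):

    # Serializing straight from the DB object avoids building one OVO
    # per SG plus one OVO per SG rule just to serve a read request.
    class SGDbModel:
        def __init__(self, id_, name, rules):
            self.id, self.name, self.rules = id_, name, rules

    def make_security_group_dict(db_sg, fields=None):
        sg = {"id": db_sg.id, "name": db_sg.name,
              "security_group_rules": list(db_sg.rules)}
        if fields:
            sg = {k: v for k, v in sg.items() if k in fields}
        return sg

    sg = SGDbModel("81f64aa4", "default", [{"protocol": "tcp"}] * 250)
    print(make_security_group_dict(sg, fields=["id", "name"]))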
Closes-bug: #2083682
Change-Id: Ibd032ea77c5bfbc1fa80b3b3ee9ba7d5c36bb1bc
(cherry picked from commit adbc3e23b7d2251cc7de088e2a757674a41c2f6a)
Since [1], the OVN Metadata agent has support for IPv6. If the agent
is updated, the HA proxy instances need to be reconfigured and
restarted. However, that needs to be done only once: the next time
the OVN agent is restarted, if the HA proxy instances are already
updated (have IPv6 support), they won't be restarted.
[1] https://review.opendev.org/c/openstack/neutron/+/894026
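A sketch of the restart-once check; the IPv6 bind line used as the
"already updated" marker is an assumption, not the agent's real
configuration contents:

    def haproxy_needs_restart(rendered_config):
        # Reconfigure and restart only if the running instance was
        # rendered without IPv6 support; otherwise leave it running.
        return "bind :::" not in rendered_config

    assert haproxy_needs_restart("bind 169.254.169.254:80")
    assert not haproxy_needs_restart("bind :::80")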
Conflicts:
neutron/agent/linux/utils.py
neutron/tests/unit/agent/dhcp/test_agent.py
Closes-Bug: #2079996
Change-Id: Id0f678c7ffe162df42e18dfebb97dce677fc79fc
(cherry picked from commit 7b7f8d986a4f818d289149c6960c9eb8b62b432d)
In those Neutron objects and DB definitions where the declarative
attribute ``standard_attr_id`` is defined, use it instead of accessing
the ``standard_attr`` child object.
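A toy illustration of the access-pattern change; in the real models
``standard_attr_id`` is a plain column while ``standard_attr`` is a
lazy-loaded child object:

    class StandardAttr:
        def __init__(self, id_):
            self.id = id_

    class PortModel:
        def __init__(self):
            self.standard_attr = StandardAttr(42)  # may trigger a DB load
            self.standard_attr_id = 42             # local column, no load

    port = PortModel()
    # Before: port.standard_attr.id; after: port.standard_attr_id.
    assert port.standard_attr.id == port.standard_attr_id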
Closes-Bug: #2081945
Change-Id: Iadfbeff79c0200c3a6b90f785b910dc391f9deb3
(cherry picked from commit 144e140e750987a286e6adc74ff0ffad1da474d6)
Functional tests started to fail randomly with
"Too many open files". The default ulimit in the
OS is configured to 1024; increase it to 4096
to avoid these random failures.
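For reference, the limit can be raised from Python with the standard
``resource`` module (a sketch; the actual change may live in the test
setup or the job definition):

    import resource

    # Raise the soft limit for open file descriptors to 4096; the soft
    # limit may never exceed the hard limit, hence the min().
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))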
Closes-Bug: #2080199
Change-Id: Iff86599678ebdd5189d5b56d11f3373c9b138562
(cherry picked from commit 6970f39a49b83f279b9e0479f7637d03a123a40e)
Fixes a logic error which meant that we didn't iterate over all logical
switches when associating a FIP to an OVN loadbalancer. The symptom was
that the FIP would show in neutron, but would not exist in OVN.
Closes-Bug: #2068644
Change-Id: I6d1979dfb4d6f455ca419e64248087047fbf73d7
Co-Authored-By: Brian Haley <haleyb.dev@gmail.com>
(cherry picked from commit d8a4ad9167afd824a3f823d86a8fd33fb67c4abd)
Currently, if the Nova endpoint does not exist,
an exception is raised. Even after the endpoint is created,
notifications keep failing until the session
expires.
If the endpoint does not exist the session is not useful,
so mark it as invalid; this ensures that if the endpoint is
created later, notifications do not fail.
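A sketch of the direction of the fix using keystoneauth; the
surrounding notifier code and names are assumptions:

    from keystoneauth1 import exceptions as ks_exc

    def send_nova_notification(session, nova_client, events):
        try:
            nova_client.server_external_events.create(events)
        except ks_exc.EndpointNotFound:
            # The session's cached catalog is useless without the
            # endpoint; invalidate it so the endpoint is re-discovered
            # once Nova registers it.
            session.invalidate()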
Closes-Bug: #2081174
Change-Id: I1f7fd1d1371ca0a3c4edb409cffd2177d44a1f23
(cherry picked from commit 7d1a20ed4d458c6682a52679b71b6bc8dea20d07)
Since [1], the SG rule SQL view also retrieves the table
"default_security_group", using a complex relationship [2].
When the number of SG rules in a SG is high (above 50 the
performance degradation is clearly noticeable), the
API call can take several seconds. For example, with 100
SG rules it can take up to one minute.
This patch changes the load method of the SG rule
"default_security_group" relationship to "selectin".
Benchmarks with a single default SG and 100 rules,
doing "openstack security group show $sg":
* 2023.2 (without this feature): around 0.05 seconds
* master: between 45 and 50 seconds (a ~1000x increase)
* loading method "selectin" or "dynamic": around 0.5 seconds.
NOTE: this feature [1] was implemented in 2024.1. At that
time, the SQLAlchemy version was <2.0 and the "selectin" method was
not available; for that version, "dynamic" can be used instead.
[1] https://review.opendev.org/q/topic:%22bug/2019960%22
[2] 08fff4087d/neutron/db/models/securitygroup.py (L120-L121)
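The change boils down to the relationship's loading strategy; a
minimal sketch (the real models live in
neutron/db/models/securitygroup.py):

    import sqlalchemy as sa
    from sqlalchemy import orm

    Base = orm.declarative_base()

    class DefaultSecurityGroup(Base):
        __tablename__ = "default_security_group"
        security_group_id = sa.Column(sa.String(36), primary_key=True)

    class SecurityGroupRule(Base):
        __tablename__ = "securitygrouprules"
        id = sa.Column(sa.String(36), primary_key=True)
        security_group_id = sa.Column(
            sa.String(36),
            sa.ForeignKey("default_security_group.security_group_id"))
        # lazy="selectin" loads the related rows with one extra
        # "SELECT ... WHERE ... IN (...)" for the whole result set,
        # instead of a per-parent-row correlated load.
        default_security_group = orm.relationship(
            DefaultSecurityGroup, lazy="selectin")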
Closes-Bug: #2081087
Change-Id: I46af1179f6905307c0d60b5c0fdee264a40a4eac
(cherry picked from commit c1b05e29adf9d0d68c1ac636013a8a363a92eb85)
The method ``_extend_tags_dict`` can be called from a "list" operation.
If a resource and its "standardattr" record are deleted concurrently,
the "standard_attr" field retrieval will fail.
The "list" operation is protected with a READER transaction context;
however, this fails with the PostgreSQL DB backend.
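A sketch of a defensive retrieval, assuming SQLAlchemy raises
ObjectDeletedError for the vanished row (names are illustrative):

    from sqlalchemy.orm import exc as orm_exc

    def extend_tags_dict(response, db_obj):
        try:
            tags = [assoc.tag for assoc in db_obj.standard_attr.tags]
        except orm_exc.ObjectDeletedError:
            # The resource and its standardattr row were deleted while
            # the "list" operation was running; degrade gracefully.
            tags = []
        response["tags"] = tags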
Closes-Bug: #2078787
Change-Id: I55142ce21cec8bd8e2d6b7b8b20c0147873699da
(cherry picked from commit c7d07b7421034c2722fb0d0cfd2371e052928b97)
If an ML2/SR-IOV port is disabled (status=DOWN), it takes precedence
over the "auto" value when setting the VF link state. That stops any
transmission from the VF.
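A sketch of the intended precedence with hypothetical names; the real
agent drives "ip link set <pf> vf <n> state ..." through its eswitch
manager:

    def vf_link_state(port_enabled, propagate_uplink_state=True):
        if not port_enabled:
            # A disabled (status=DOWN) port wins over "auto": force the
            # VF link down so it cannot transmit.
            return "disable"
        return "auto" if propagate_uplink_state else "enable"

    assert vf_link_state(False) == "disable"
    assert vf_link_state(True) == "auto"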
Closes-Bug: #2078789
Change-Id: I11d973d245dd391623e501aa14b470daa780b4db
(cherry picked from commit 8211c29158d6fc8a1af938c326dfbaa685428a4a)
This patch fixes two issues related to the port_hardware_offload_type
extension:
1. The API extension is no longer supported by the ML2 plugin directly,
   so if the ML2 extension is not loaded Neutron will not report that
   the API extension is available.
2. Fix error 500 when creating a port with the hardware_offload_type
   attribute set but binding:profile not set (it is of type
   Sentinel).
Conflicts:
neutron/plugins/ml2/plugin.py
Closes-bug: #2078432
Closes-bug: #2078434
Change-Id: Ib0038dd39d8d210104ee8a70e4519124f09292da
(cherry picked from commit fbb7c9ae3d672796b72b796c53f89865ea6b3763)
When an IPv6 only network is used as the sole network for a VM and
there are no other bound ports on the same network in the same chassis,
the OVN metadata agent concludes that the associated namespace is not
needed and deletes it. As a consequence, the VM cannot access the
metadata service. With this change, the namespace is preserved if there
is at least one bound port on the chassis with either IPv4 or IPv6
addresses.
Closes-Bug: #2069482
Change-Id: Ie15c3344161ad521bf10b98303c7bb730351e2d8
(cherry picked from commit f7000f3d57bc59732522c4943d6ff2e9dfcf7d31)
Currently, is_valid_ipv6 accepts IPv6 addresses with a scope. However,
the netaddr library won't accept an address with a scope. Now,
get_noscope_ipv6() can be used to avoid this situation. In the future we
will be able to use the same function, which is also being defined in
oslo.utils. https://review.opendev.org/c/openstack/oslo.utils/+/925469
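A sketch of what ``get_noscope_ipv6()`` amounts to; the real helper
lives in Neutron and an equivalent is proposed for oslo.utils:

    import netaddr

    def get_noscope_ipv6(address):
        # "fe80::1%eth0" -> "fe80::1": strip the zone identifier, which
        # netaddr refuses to parse.
        return address.split("%")[0]

    assert netaddr.IPAddress(get_noscope_ipv6("fe80::1%eth0")).version == 6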
Closes-Bug: #2073894
Signed-off-by: Elvira García <egarciar@redhat.com>
Change-Id: I27f25f90c54d7aaa3c4a7b5317b4b8a4122e4068
(cherry picked from commit 1ed8609a6818d99133bf56483adb9bce8c886fd6)
For the openvswitch security group, in some extreme
cases, if an ofport has been processed once, the openvswitch
security driver will cache stale ofport information
whose local vlan differs from the current assignment.
So this patch changes the local_vlan lookup to read
the port's other_config; this value is
managed properly by the ovs_agent, so we can rely
on it.
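A sketch of the new lookup with an ovs_lib-style call; the
other_config key name used here is an assumption:

    def get_port_local_vlan(int_br, port_name):
        # Read the vlan from the port's other_config, which the OVS
        # agent keeps up to date, instead of trusting cached ofport
        # information.
        other_config = int_br.db_get_val("Port", port_name,
                                         "other_config") or {}
        return other_config.get("tag")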
Closes-Bug: #2071451
Change-Id: I7ad7df72807c95571ef3156c99072852d1c4f494
(cherry picked from commit ae587c34ab59a5717630eded2fab84413f3c1742)
Required since the Depends-On patch is included; without
it, the postgres job fails with:
AttributeError: 'NoneType' object has no attribute 'id'
Depends-On: https://review.opendev.org/c/openstack/neutron-lib/+/923926
Related-Bug: #2072567
Change-Id: I8f2229eb0a9d8dce927ded004037eda93ce3650d
(cherry picked from commit f17cc24e8adb2bf18af32a45a44e68790c50dc6b)
Some of the OVN maintenance tasks are expected to run just once and
then raise periodic.NeverAgain() so that they are not run anymore.
Those tasks also require the OVN DB lock to be acquired, so that only
one of the maintenance workers really runs them.
All those tasks had a spacing time of 600 seconds, so they were run
every 600 seconds. This usually works fine, but it may cause a small
issue in environments where Neutron runs in a POD as a k8s/OpenShift
application. In such a case, when e.g. the Neutron configuration is
updated, it may happen that a new POD with Neutron is spawned first
and only once it is already running does k8s stop the old POD. Because
of that, the maintenance worker running in the new neutron-server POD
will not acquire the lock on the OVN DB (the old POD still holds the
lock) and will not run all those maintenance tasks immediately. After
the old POD is terminated, one of the new PODs will at some point
acquire that lock and then run all those maintenance tasks, but this
may delay them by up to 600 seconds.
To avoid such a long wait, this patch lowers the spacing time of those
tasks from 600 to just 5 seconds.
Additionally, maintenance tasks which are supposed to run only once,
and only by the maintenance worker which has acquired the OVN DB lock,
will now be stopped (periodic.NeverAgain will be raised) after 100 run
attempts.
This avoids running them every 5 seconds forever on the workers which
never acquire the OVN DB lock.
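A sketch of the resulting pattern with futurist periodics; the lock
predicate and the worker class are simplified stand-ins:

    from futurist import periodics

    MAX_ATTEMPTS = 100

    class MaintenanceWorker(object):
        def __init__(self, has_lock_fn, task_fn):
            self._attempts = 0
            self._has_lock = has_lock_fn
            self._task = task_fn

        @periodics.periodic(spacing=5, run_immediately=True)
        def run_once_task(self):
            if not self._has_lock():
                self._attempts += 1
                if self._attempts >= MAX_ATTEMPTS:
                    # This worker never got the OVN DB lock; stop
                    # polling every 5 seconds forever.
                    raise periodics.NeverAgain()
                return
            self._task()
            # The one-time work is done; never run again.
            raise periodics.NeverAgain()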
Conflicts:
neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/maintenance.py
Closes-bug: #2074209
Change-Id: Iabb4bb427588c1a5da27a5d313f75b5bd23805b2
(cherry picked from commit 04c217bcd0eda07d52a60121b6f86236ba6e26ee)
Setting the 'reside-on-chassis-redirect' option was skipped for LRPs of
provider tenant networks in patch [1], but the later patch [2] removed
this limitation from ovn_client and not from the maintenance task.
Due to that, this option wasn't updated after e.g. a change of the
'enable_distributed_floating_ip' config option, and connectivity to
existing Floating IPs associated with ports in vlan tenant networks
was broken.
This patch removes that limitation, and this option is now updated for
all of the Logical_Router_Ports of vlan networks, not only for external
gateways.
[1] https://review.opendev.org/c/openstack/neutron/+/871252
[2] https://review.opendev.org/c/openstack/neutron/+/878450
Conflicts:
neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py
Closes-bug: #2073987
Change-Id: I56e791847c8f4f3a07f543689bf22fde8160c9b7
(cherry picked from commit 4b1bfb93e380b8dce78935395b2cda57076e5476)
Commit 260c968118934 broke the gate by causing jobs
to not get run when it added RE2 compatibility for
irrelevant-files. Digging found that RE2 doesn't
support negative lookahead (and won't ever) [0], so it's
impossible to replace the previous PCRE filter with a
similar RE2 filter.
Instead of reverting to the original filter, which
is considered obsolete by zuul, this patch fixes the
issue by explicitly listing all files under zuul.d/
except the one that we actually want to trigger the
jobs: zuul.d/project.yaml.
Listing all the files in the directory for every job
is not ideal, and we may revisit it later, or perhaps
even reconsider the extensive use of irrelevant-files
in the neutron tree. That will have to wait for when
the gate is in better shape, though.
[0] https://github.com/google/re2/issues/156
Conflicts:
zuul.d/base.yaml
zuul.d/grenade.yaml
zuul.d/job-templates.yaml
zuul.d/project.yaml
zuul.d/rally.yaml
zuul.d/tempest-multinode.yaml
zuul.d/tempest-singlenode.yaml
Related-bug: #2065821
Change-Id: I3bba89ac14414c6b7d375072ae92d2e0b5497736
(cherry picked from commit 11027e3e1ef9a58d5b2faa575a3764bd33cd2a08)
This is a follow-up patch for [1], which introduced this new decorator.
[1] https://review.opendev.org/c/openstack/neutron/+/896544
Change-Id: I2de3b5d7ba5783dd82acacda89ab4b64c2d29149
(cherry picked from commit 2a6bc5db237d28ddfdda16aea7c1b3416f3e14a4)
Neither the fdb_removal_limit nor the mac_binding_removal_limit config
option currently gets set in the OVN DB. This patch corrects that
and adds the missing testing for the MAC_Binding aging maintenance
task.
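A sketch of the corrected behaviour, assuming the limits belong in the
NB_Global options column (ovsdbapp-style call):

    def set_aging_limits(nb_idl, fdb_limit, mac_binding_limit):
        options = {}
        if fdb_limit is not None:
            options["fdb_removal_limit"] = str(fdb_limit)
        if mac_binding_limit is not None:
            options["mac_binding_removal_limit"] = str(mac_binding_limit)
        if options:
            # db_set merges the given keys into NB_Global:options.
            nb_idl.db_set("NB_Global", ".",
                          ("options", options)).execute(check_error=True)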
Fixes: 0a554b4f29 ("Add support for OVN MAC_Binding aging")
Fixes: 1e9f50c736 ("Add support for FDB aging")
Closes-Bug: #2073309
Change-Id: I80d79faeb9f1057d398ee750ae6e246598fd13d2
(cherry picked from commit b4c8cc600a21469e247a0012585969a7897a0929)
When Neutron is killed with SIGTERM (like via systemctl), when using
ML2/OVN the neutron workers do not exit and instead are eventually
killed with SIGKILL when the graceful timeout is reached (often around
1 minute).
This is happening due to the signal handlers for SIGTERM. There are
multiple issues:
1) oslo_service, the ml2/ovn mech_driver, and ml2/ovo_rpc.py all call
signal.signal(signal.SIGTERM, ...), overwriting each other's signal
handlers.
2) SIGTERM is handled in the main thread, and running blocking code
there causes AssertionErrors in eventlet which also prevent the
process from exiting.
3) The ml2/ovn cleanup code doesn't cause the process to end, so it
interrupts the killing of the process.
oslo_service has a singleton SignalHandler class that solves all of
these issues.
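A sketch of the direction of the fix: register through oslo.service's
singleton handler instead of calling ``signal.signal`` directly.

    from oslo_service import service

    def install_sigterm_handler(cleanup_callback):
        # SignalHandler is a singleton: each component registers an
        # extra handler here instead of overwriting the previously
        # installed one.
        service.SignalHandler().add_handler("SIGTERM", cleanup_callback)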
Closes-Bug: #2056366
Depends-On: https://review.opendev.org/c/openstack/oslo.service/+/913512
Change-Id: I730a12746bceaa744c658854e38439420efc4629
Signed-off-by: Terry Wilson <twilson@redhat.com>
(cherry picked from commit a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7)
The "tagging" service plugin API extension does use the policy enforcer
since [1]. If a tag API call is done just after the Neutron server has
been initialized and the policy enforcer, that is a global variable per
API worker, has not been initialized, the API call will fail.
This patch initializes the policy enforcer as is done in the
``PolicyHook``, that is called by many other API resources that inherit
from the ``APIExtensionDescriptor`` class.
[1]https://review.opendev.org/q/I9f3e032739824f268db74c5a1b4f04d353742dbd
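A sketch of the fix's shape, mirroring the ``PolicyHook`` pattern (the
enforcement call site is illustrative):

    from neutron import policy

    def enforce_tag_policy(context, action, target):
        # Make sure the per-worker enforcer exists; this is a no-op
        # once it has already been initialized.
        policy.init()
        return policy.enforce(context, action, target)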
Closes-Bug: #2073782
Change-Id: Ia35c51fb81cfc0a55c5a2436fc5c55f2b4c9bd01
(cherry picked from commit 776178e90763d004ccb595b131cdd4dd617cd34f)
The method ``ProcessMonitor._check_child_processes`` was releasing
the thread executor inside a method that holds a lock on the resource
"_check_child_processes". Although this resource is not used anywhere
else (at least for this instance), this could lead to a potential
deadlock.
The current implementation of ``lockutils.synchronized`` with the
default values "external=False" and "fair=False" is a
``threading.Lock()`` instance. The goal of this lock is precisely to
execute the locked code without any interruption and only then
release the executor.
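A sketch of the safe shape, with names assumed: keep the
executor-releasing wait outside the locked section.

    from oslo_concurrency import lockutils

    @lockutils.synchronized("_check_child_processes")
    def _check_child_processes(monitor):
        # Critical section only: no sleep/yield while the lock is held.
        return [process for process in monitor if not process.active()]

    def monitor_loop(monitor, sleep):
        while True:
            _check_child_processes(monitor)
            sleep()  # the executor is released with the lock freed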
Closes-Bug: #2073743
Change-Id: I44c7a4ce81a67b86054832ac050cf5b465727adf
(cherry picked from commit baa57ab38d754bfa2dba488feb9429c1380d616c)
If northd is very busy, it may happen that a port is deleted while an
LSP down event is being handled, causing the standard attribute to be
gone when bumping the OVN revision number. This is because the port is
set down in the SB DB first, then northd propagates that to the NB DB,
and only then is the event emitted.
This patch just makes sure the traceback is not printed if this
happens.
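A sketch of the guard; the exception name is an assumption (a
stand-in for whatever the revision-bumping code raises when the
standard attribute row is gone):

    class StandardAttributeIDNotFound(Exception):
        """Stand-in for the real exception raised by the revision code."""

    def bump_revision_quietly(bump_fn, port, log):
        try:
            bump_fn(port)
        except StandardAttributeIDNotFound:
            # The port was deleted while the LSP down event was being
            # handled; keep quiet instead of printing a traceback.
            log.debug("Port %s deleted concurrently, skipping revision "
                      "bump", port["id"])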
TrivialFix
Closes-bug: #2069442
Change-Id: I7d21e4adc27fab411346e0458c92191e69ce6b30
Signed-off-by: Jakub Libosvar <libosvar@redhat.com>
(cherry picked from commit 8ab385f97de99c464258ac74cf342b0353580823)
Currently, when the sriov agent is enabled and a non-sriov
instance is migrated, the non-sriov port status is frequently set to
BUILD instead of ACTIVE.
This is because the 'binding_activate' function in the sriov-nic-agent
sets it to BUILD via get_device_details_from_port_id (as it calls
_get_new_status).
This patch checks network_ports in binding_activate and
skips binding the port if it is not an sriov port.
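A toy version of the added guard; the real check walks the agent's
network_ports mapping:

    def is_agent_port(port_id, network_ports):
        devices = {port["port_id"]
                   for ports in network_ports.values() for port in ports}
        return port_id in devices

    network_ports = {"net-1": [{"port_id": "sriov-port-1"}]}
    assert is_agent_port("sriov-port-1", network_ports)       # activate
    assert not is_agent_port("regular-port-2", network_ports)  # skip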
Closes-Bug: #2072154
Change-Id: I2d7702e17c75c96ca2f29749dccab77cb2f4bcf4
(cherry picked from commit a311606fcdae488e76c29e0e5e4035f8da621a34)
If the DB connection is not stopped at the defined timeout (10 seconds),
the clean-up process will continue.
Closes-Bug: #2034589
Change-Id: I6c3b4da49364c3fed86053515e79121acac078d6
(cherry picked from commit b40c728cbb0903d78a5d4d47336fe107f06b9f4d)
In [1], a method to process the DHCP events in the correct order was
implemented. That method checks the port events in order to match
the "fixed_ips" field. That implies the Neutron server provides this
information in the port event, sent via RPC.
However, in [2] the "fixed_ips" information was removed from
``DhcpAgentNotifyAPI._after_router_interface_deleted``, causing a
periodic error in the ``DHCPResourceUpdate.__lt__`` method, as reported
in the LP bug. This patch restores this field in the RPC message.
[1] https://review.opendev.org/c/openstack/neutron/+/773160
[2] https://review.opendev.org/c/openstack/neutron/+/639814
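A sketch of why the field matters to the ordering logic from [1]; the
comparison shape is simplified from ``DHCPResourceUpdate.__lt__``:

    def must_keep_order(update_a, update_b):
        # Two port updates sharing a fixed IP must be processed in
        # their original order; without "fixed_ips" in the RPC payload
        # this lookup failed periodically.
        ips_a = {ip["ip_address"] for ip in update_a["fixed_ips"]}
        ips_b = {ip["ip_address"] for ip in update_b["fixed_ips"]}
        return bool(ips_a & ips_b)

    a = {"fixed_ips": [{"ip_address": "10.0.0.5"}]}
    b = {"fixed_ips": [{"ip_address": "10.0.0.5"}]}
    assert must_keep_order(a, b)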
Closes-Bug: #2071426
Change-Id: If1362b9b91794e74e8cf6bb233e661fba9fb3b26
(cherry picked from commit b0081ac6c0eca93f7589f5c910d0f6385d83dd47)