60372 Commits

Author SHA1 Message Date
Gorka Eguileor
18163761d0 Fix guard for NVMeOF volumes
When detaching multiple NVMe-oF volumes from the same host we may end
up with an NVMe subsystem in "connecting" state, and we'll see a bunch
of nvme errors in dmesg.

This happens on storage systems that share the same subsystem for
multiple volumes because Nova has not been updated to support the
tri-state "shared_targets" option that groups the detach and unmap of
volumes to prevent race conditions.

This is related to the issue mentioned in an os-brick commit message [1].

For the guard_connection method of os-brick to work as expected for
NVMe-oF volumes we need to use microversion 3.69 when retrieving the
cinder volume.

In microversion 3.69 we started reporting 3 states for shared_targets:
True, False, and None.

- True is to guard iSCSI volumes and will only be used if the iSCSI
  initiator running on the host doesn't have the manual scans feature.

- False is that no target/subsystem is being shared so no guard is
  necessary.

- None is to force guarding, and it's currently used for NVMe-oF volumes
  when sharing the subsystem.

[1]: https://review.opendev.org/c/openstack/os-brick/+/836062/12//COMMIT_MSG

Closes-Bug: #2035375
Change-Id: I4def1c0f20118d0b8eb7d3bbb09af2948ffd70e1
2024-01-05 16:58:02 +01:00
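
As a rough illustration of the tri-state behaviour this commit relies on, here is a minimal sketch (placeholder names; not the actual os-brick or Nova code) of how a guard can key off the three shared_targets values reported from microversion 3.69 onwards:

    from contextlib import contextmanager

    from oslo_concurrency import lockutils


    @contextmanager
    def guard_connection(volume):
        """Placeholder sketch of guarding detach/unmap based on shared_targets.

        True  -> guard iSCSI volumes when the initiator lacks manual scans
                 (that extra check is omitted in this sketch)
        False -> nothing is shared, no guard needed
        None  -> always guard, e.g. NVMe-oF volumes sharing a subsystem
        """
        if volume['shared_targets'] is False:
            yield  # no coordination needed
            return
        # True or None: serialize detach and unmap against the owning service.
        with lockutils.lock(volume['service_uuid'], external=True):
            yield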
Zuul
53012f1c55 Merge "adapt to oslo.log changes" 2023-09-11 16:42:25 +00:00
Zuul
96ae9708e6 Merge "doc: mark the maximum microversion for 2023.2 Bobcat" 2023-09-11 15:48:39 +00:00
Sean Mooney
ea07f96eb1 adapt to oslo.log changes
This change refactors privsep util test cases to account for the
fact that oslo.log now conditionally uses an internal pipe mutex
when logging under eventlet.

This was added by Iac1b0891ae584ce4b95964e6cdc0ff2483a4e57d
which is part of oslo.log 5.3.0

As a result we need to mock all calls to oslo.log in unit tests
that are asserting whether os.write is called. When the internal
pipe mutex is used, oslo.log calls os.write when the mutex is
released.

Related-Bug: #1983863
Change-Id: Id313669df80f9190b79690fff25f8e3fce2a4aca
2023-09-11 16:15:32 +01:00
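
A hedged sketch of the kind of adjustment described: any oslo.log call in the code under test has to be mocked before asserting on os.write, because oslo.log 5.3.0 may itself call os.write when releasing its pipe mutex under eventlet. The helper below is hypothetical, not one of the actual Nova privsep tests:

    import os
    from unittest import mock

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)


    def write_marker(fd):
        """Hypothetical stand-in for the privsep code under test."""
        LOG.debug("writing marker")   # may trigger os.write inside oslo.log
        os.write(fd, b'marker')


    @mock.patch('os.write')
    @mock.patch.object(LOG, 'debug')
    def test_write_marker(mock_debug, mock_write):
        # With LOG.debug mocked, the only os.write left to assert on is the
        # one made by the code under test, not oslo.log's pipe mutex release.
        write_marker(3)
        mock_write.assert_called_once_with(3, b'marker')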
Zuul
49bff9b9c6 Merge "Follow up for unified limits: PCPU and documentation" 2023-09-05 14:36:09 +00:00
Sylvain Bauza
515d9cbfa4 doc: mark the maximum microversion for 2023.2 Bobcat
We need it for this release.

Change-Id: Ic9aee508a72489ad0cc1948d1ccafad0bd09fb14
2023-09-05 16:20:13 +02:00
OpenStack Proposal Bot
82a17a37de Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I1e8922b801f13a05d82d46d5eb6c237fb4ea56b8
2023-09-05 03:09:58 +00:00
Zuul
ad307b31f8 Merge "[alembic] Alembic operations require keywords only arguments" 2023-09-04 13:25:42 +00:00
Zuul
eee5b39b8e Merge "Make compute node rebalance safer" 2023-09-02 08:26:43 +00:00
Zuul
0318016ea4 Merge "Add nova-manage ironic-compute-node-move" 2023-09-02 02:42:22 +00:00
Sean Mooney
68b2131d81 only attempt to clean up dangling bdm if cinder is installed
This change ensures we only try to clean up dangling bdms if
cinder is installed and reachable.

Closes-Bug: #2033752
Change-Id: I0ada59d8901f8620fd1f3dc20d6be303aa7dabca
2023-09-01 17:00:40 +00:00
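
A minimal sketch of the guard this commit describes, assuming a helper on Nova's Cinder API wrapper (the function name and exception handling below are illustrative, not the merged code):

    from keystoneauth1 import exceptions as ks_exc


    def cleanup_dangling_bdms(volume_api, context, instance_uuid, bdms):
        """Skip the whole cleanup when Cinder is not installed or reachable."""
        try:
            # Assumed helper; see the "get all attachments" change further down.
            attachments = volume_api.get_all_attachments(
                context, instance_id=instance_uuid)
        except (ks_exc.EndpointNotFound, ks_exc.ClientException):
            # No Cinder endpoint or Cinder unreachable: nothing to clean up.
            return []
        cinder_ids = {a['id'] for a in attachments}
        return [bdm for bdm in bdms
                if bdm.attachment_id and bdm.attachment_id not in cinder_ids]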
melanie witt
d42fe462be Follow up for unified limits: PCPU and documentation
This addresses comments from code review to add handling of PCPU during
the migration/copy of limits from the Nova database to Keystone. In
legacy quotas, there is no settable quota limit for PCPU, so the limit
for VCPU is used for PCPU. With unified limits, PCPU will have its own
quota limit, so for the automated migration command, we will simply
create a dedicated limit for PCPU that is the same value as the limit
for VCPU.

On the docs side, this adds more detail about the token authorization
settings needed to use the nova-manage limits migrate_to_unified_limits
CLI command and documents more OSC limit commands like show and delete.

Related to blueprint unified-limits-nova-tool-and-docs

Change-Id: Ifdb1691d7b25d28216d26479418ea323476fee1a
2023-09-01 06:19:51 +00:00
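
A small sketch of the PCPU handling described above (the resource names and mapping are assumptions for illustration, not the exact nova-manage code):

    def build_unified_defaults(legacy_defaults):
        """Copy the VCPU limit to PCPU, which has no legacy equivalent."""
        unified = {}
        if 'cores' in legacy_defaults:
            unified['class:VCPU'] = legacy_defaults['cores']
            # Legacy quotas have no PCPU limit, so reuse the VCPU value.
            unified['class:PCPU'] = legacy_defaults['cores']
        if 'ram' in legacy_defaults:
            unified['class:MEMORY_MB'] = legacy_defaults['ram']
        return unified

For example, build_unified_defaults({'cores': 20, 'ram': 51200}) would yield {'class:VCPU': 20, 'class:PCPU': 20, 'class:MEMORY_MB': 51200}.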
Zuul
88c73e2931 Merge "Bump MIN_{LIBVIRT,QEMU} for "Bobcat"" 2023-09-01 04:56:37 +00:00
Zuul
3fdc97ca5f Merge "Add documentation for unified limits" 2023-08-31 23:54:17 +00:00
Zuul
f2c84c82e2 Merge "nova-manage: Add 'limits migrate_to_unified_limits'" 2023-08-31 23:54:10 +00:00
Zuul
5ab8ba17e8 Merge "Fix tox docs target" 2023-08-31 21:02:48 +00:00
Zuul
33ed31a421 Merge "Remove unused mocks" 2023-08-31 21:02:39 +00:00
Zuul
080d7f561b Merge "Update serial console example client for py3" 2023-08-31 21:02:32 +00:00
Zuul
c8bb6236ae Merge "Delete dangling bdms" 2023-08-31 21:02:24 +00:00
Zuul
9adf7af0e7 Merge "Limit nodes by ironic shard key" 2023-08-31 21:00:22 +00:00
Zuul
579e673f90 Merge "Reproducer for dangling bdms" 2023-08-31 19:06:39 +00:00
Zuul
b71c14b5ab Merge "Add function to get all attachments in Cinder.API module" 2023-08-31 19:06:30 +00:00
John Garbutt
772f5a1ae4 Make compute node rebalance safer
Many bugs around nova-compute rebalancing stem from problems where the
compute node and placement resources are deleted and sometimes never
get re-created.

To limit this class of bugs, we add a check to ensure a compute
node is only ever deleted when it is known to have been deleted
in Ironic.

There is a risk this might leave orphaned compute nodes and
resource providers that need manual clean up because users
do not want to delete the node in Ironic, but are removing it
from nova management. But on balance, it seems safer to leave
these cases up to the operator to resolve manually, and collect
feedback on how to better help those users.

blueprint ironic-shards

Change-Id: I7cd9e5ab878cea05462cac24de581dca6d50b3c3
2023-08-31 17:21:15 +00:00
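
A hedged sketch of the safety check described here, assuming python-ironicclient (the call shape is illustrative, not the exact driver code):

    from ironicclient import exc as ironic_exc


    def safe_to_delete_compute_node(ironic_client, node_uuid):
        """Only delete the compute node once Ironic no longer has the node."""
        try:
            ironic_client.node.get(node_uuid)
        except ironic_exc.NotFound:
            return True   # gone from Ironic, safe to delete the compute node
        return False      # still present in Ironic, keep the compute node record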
John Garbutt
9068db09e4 Add nova-manage ironic-compute-node-move
When people transition from three ironic nova-compute processes down
to one process, we need a way to move the ironic nodes, and any
associated instances, between nova-compute processes.

For safety, a nova-compute process must first be forced_down via
the API, similar to when using evacuate, before moving the associated
ironic nodes to another nova-compute process. The destination
nova-compute process should ideally not be running, but not forced
down.

blueprint ironic-shards

Change-Id: I7ef25e27bf8c47f994e28c59858cf3df30975b05
2023-08-31 18:19:49 +01:00
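
A minimal sketch of the safety rule described above (the service attribute is Nova's forced_down flag; the function itself is illustrative, not the actual nova-manage code):

    def check_can_move_nodes(source_service, dest_service):
        """Enforce the forced_down preconditions before moving ironic nodes."""
        if not source_service.forced_down:
            raise RuntimeError(
                "source nova-compute must be forced down via the API first")
        if dest_service.forced_down:
            raise RuntimeError(
                "destination nova-compute must not be forced down")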
Amit Uniyal
9d5935d007 Delete dangling bdms
On reboot, check the instance's volume status on the Cinder side.
Verify that the volume exists and that Cinder has an attachment ID for
it; otherwise delete its BDM record from the Nova DB, and vice versa.

Updated existing test cases to use CinderFixture while rebooting, as
reboot calls get_all_attachments.

Implements: blueprint https://blueprints.launchpad.net/nova/+spec/cleanup-dangling-volume-attachments
Closes-Bug: 2019078

Change-Id: Ieb619d4bfe0a6472aefb118b58283d7ad8d24c29
2023-08-31 14:19:58 +00:00
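
A sketch of the two-way reconciliation described above (the attachment calls are assumptions standing in for Nova's Cinder API wrapper, not the merged code):

    def reconcile_dangling_bdms(volume_api, context, instance, bdms):
        attachments = volume_api.get_all_attachments(
            context, instance_id=instance.uuid)
        cinder_ids = {a['id'] for a in attachments}
        nova_ids = {bdm.attachment_id for bdm in bdms if bdm.attachment_id}

        # A BDM whose attachment no longer exists in Cinder is dangling in Nova.
        for bdm in bdms:
            if bdm.attachment_id and bdm.attachment_id not in cinder_ids:
                bdm.destroy()

        # An attachment Cinder still tracks but Nova no longer knows about is
        # dangling on the Cinder side, so delete it there instead.
        for attachment_id in cinder_ids - nova_ids:
            volume_api.attachment_delete(context, attachment_id)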
John Garbutt
f5a12f511b Limit nodes by ironic shard key
Ironic in API 1.82 added the option for nodes to be associated with
a specific shard key. This can be used to partition up the nodes within
a single ironic conductor group into smaller sets of nodes that can
each be managed by their own nova-compute ironic service.

We add a new [ironic]shard config option to allow operators to say
which shard each nova-compute process should target.
As such, when the shard is set we ignore the peer_list setting
and always have a hash ring of one.

blueprint ironic-shards

Change-Id: I5c1b5688c96096f4cfecfc5b16ea59d2ee5756d6
2023-08-31 14:31:26 +01:00
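
As a rough sketch of the behaviour described, using tooz's hash ring (the function is illustrative, not the merged driver code): with [ironic]shard set, the peer_list is ignored and the ring collapses to a single member.

    from tooz import hashring


    def build_hash_ring(shard, this_host, peer_list):
        if shard:
            # Shard set: this nova-compute owns every node in its shard, so
            # the ring has exactly one member and peer_list is ignored.
            members = {this_host}
        else:
            members = set(peer_list) | {this_host}
        return hashring.HashRing(members)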
John Garbutt
cbf400df1d Deprecate ironic.peer_list
As part of the move to using Ironic shards, we document that the best
practice for scaling Ironic and Nova deployments is to shard Ironic
nodes between nova-compute processes, rather than attempting to
use the peer_list.

Currently, we only allow users to do this using conductor groups.
This works well for those wanting a conductor group per L2 network
domain. But in general, conductor groups per nova-compute are
a very poor trade off in terms of ironic deployment complexity.
Further patches will look to enable the use of ironic shards,
alongside conductor groups, to more easily shard your ironic nodes
between nova-compute processes.

To avoid confusion, we rename the partition_key configuration
value to conductor_group.

blueprint ironic-shards

Change-Id: Ia2e23a59dbd2f13c6f74ca975c249751bebf54b2
2023-08-31 08:56:10 +00:00
Zuul
4490c8bc84 Merge "Remove deprecated AZ filter." 2023-08-31 07:21:33 +00:00
Amit Uniyal
c33a9ccf4c Reproducer for dangling bdms
- Added reproducer test for dangling BDMs

Related-Bug: 2019078
Change-Id: Ieb76b00600acc9c7610546f5a1afab5b65903825
2023-08-31 05:27:55 +00:00
Zuul
573d28435c Merge "libvirt: Add 'COMPUTE_ADDRESS_SPACE_*' traits support" 2023-08-31 02:47:00 +00:00
melanie witt
8f0817f078 Add documentation for unified limits
This adds documentation for unified limits and signals deprecation of
the nova.quota.DbQuotaDriver.

Related to blueprint unified-limits-nova-tool-and-docs

Change-Id: I3951317111396aa4df36c5700b4d4dd33e721a74
2023-08-30 19:33:50 +00:00
melanie witt
395501c876 nova-manage: Add 'limits migrate_to_unified_limits'
This command aims to help migrate to unified limits quotas by reading
legacy quota limits from the Nova database and calling the Keystone API
to create corresponding unified limits.

Related to blueprint unified-limits-nova-tool-and-docs

Change-Id: I5536010ea1212918e61b3f4f22c2077fadc5ebfe
2023-08-30 19:13:07 +00:00
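
A hedged sketch of the flow described above, using openstacksdk to create Keystone registered limits (the name mapping and call are assumptions for illustration, not the actual command):

    import openstack

    # Assumed mapping from legacy quota names to unified limit resource names.
    LEGACY_TO_UNIFIED = {
        'instances': 'servers',
        'cores': 'class:VCPU',
        'ram': 'class:MEMORY_MB',
    }


    def migrate_defaults(legacy_defaults, nova_service_id):
        conn = openstack.connect()  # credentials come from the environment
        for legacy_name, resource_name in LEGACY_TO_UNIFIED.items():
            if legacy_name not in legacy_defaults:
                continue
            conn.identity.create_registered_limit(
                service_id=nova_service_id,
                resource_name=resource_name,
                default_limit=legacy_defaults[legacy_name])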
Zuul
2e40f7952b Merge "Add a new NumInstancesWeigher" 2023-08-30 18:46:10 +00:00
Zuul
065c5906d2 Merge "[functional]Fix remaining greenlet leaks" 2023-08-25 20:31:27 +00:00
Zuul
2d6adec763 Merge "[functional] Avoid leaking greenlet in UnifiedLimits tests" 2023-08-25 18:44:42 +00:00
Amit Uniyal
32ed205794 Add function to get all attachments in Cinder.API module
- Added function to get all attachments by instance or volume id from Cinder in Cinder.API
- Updated CinderFixture to add mock get_all_attachments functionality.
- Added unit tests for get_all_attachments.

Related-Bug: 2019078

Change-Id: I8619d898f68250bf70a17b1e6b8b0c249245b43b
2023-08-25 16:56:51 +00:00
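
A minimal sketch of what such a helper could look like on top of python-cinderclient's attachment list call (the shape and filters are assumptions; see the merged change for the real signature):

    def get_all_attachments(cinderclient, instance_id=None, volume_id=None):
        """List every attachment Cinder knows about for the given filter."""
        search_opts = {}
        if instance_id:
            search_opts['instance_id'] = instance_id
        if volume_id:
            search_opts['volume_id'] = volume_id
        return cinderclient.attachments.list(search_opts=search_opts)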
Zuul
a183e0d80e Merge "introduce global greenpool" 2023-08-25 16:28:15 +00:00
Balazs Gibizer
bc58c1d2fb [functional]Fix remaining greenlet leaks
We leak due to running background operations like server create, rebuild
and in one case a sleeping claim. So this patch removes the leaks by
making sure that the background operations stop before the test
finishes.

As there are no more leaks in the functional tests either, this patch
makes such leaks an error there too.

Change-Id: I6905999050e8d09b772837034a212c534e9c3226
2023-08-25 16:15:29 +02:00
Balazs Gibizer
0ae1802db6 [functional] Avoid leaking greenlet in UnifiedLimits tests
The leaking tests just start a server create and then finish, so I think
the actual conductor thread leaks. If I add a bit of sleep to the end of
these test cases then the leak disappears. Unfortunately, adding the
sleep in the greenpool fixture does not have this effect. So instead I
added delete_server calls to the end of these tests, as those are nicer
than sleeps and have the same effect regarding the leak.

We are down to 3 leaking functional tests.

Change-Id: I070390c695283bdd9b87cd879aa2a9257ee7bdfb
2023-08-25 15:55:13 +02:00
Zuul
3a3a75698a Merge "Fix exception catch when volume mount fails" 2023-08-25 10:44:22 +00:00
Zuul
fbbfda315d Merge "Avoid lazy-loads in resize" 2023-08-24 23:40:22 +00:00
Sean Mooney
d71d2dc219 introduce global greenpool
This change adds a global greenpool which is used to manage
the greenthreads created via nova.utils.spawn(_n).

A test fixture is also added to use an isolated greenpool
which will raise an exception if a greenthread is leaked.
The fixture will optionally raise if greenlets are leaked.
This is enabled for unit tests by default and is configurable
for functional tests.

This change removes all detected greenthread leaks from the unit
and functional tests. 7 functional tests still leak greenlets but
they have no obvious cause. As such, greenlet leaks are not treated
as errors for functional tests by default, while greenthread leaks
are always treated as errors.
Set NOVA_RAISE_ON_GREENLET_LEAK=1|true|yes when invoking
tox to make greenlet leaks an error for functional tests.

Change-Id: I73b4684744b340bfb80da08537a745167ddea106
2023-08-25 00:03:35 +01:00
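
A simplified sketch of what such a leak-detecting fixture can look like (illustrative only; the real fixture in this change is more involved):

    import fixtures
    from eventlet import greenpool


    class IsolatedGreenPoolFixture(fixtures.Fixture):
        """Give the test its own greenpool and fail if greenthreads leak."""

        def _setUp(self):
            self.pool = greenpool.GreenPool()
            self.addCleanup(self._check_for_leaks)

        def _check_for_leaks(self):
            leaked = self.pool.running()
            if leaked:
                raise AssertionError(
                    "%d greenthread(s) leaked from the test" % leaked)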
Zuul
d970862c9f Merge "Avoid lazy-loads on server create" 2023-08-24 18:39:26 +00:00
Zuul
15f90f9107 Merge "Pick next min libvirt / QEMU versions for "C" (2024) release" 2023-08-24 16:50:24 +00:00
Zuul
a9b1e024eb Merge "Remove a lazy load on every server show" 2023-08-24 02:50:24 +00:00
Zuul
17b46bae84 Merge "Avoid lazy-loading in resize and rebuild/evacuate" 2023-08-22 22:54:12 +00:00
Zuul
f2b5fd0c84 Merge "Log excessive lazy-loading behavior" 2023-08-22 16:05:15 +00:00
Zuul
927eeb54f2 Merge "Remove n-v ceph live migration job from gate" 2023-08-21 22:18:09 +00:00
Dan Smith
8d0a0ec88e Avoid lazy-loads in resize
Change-Id: Ib8f8bb596fa58dde40c1a0b359435ef0a48f15b2
2023-08-21 08:53:19 -07:00
Dan Smith
3a68b5e193 Avoid lazy-loads on server create
If we have no pci_requests we will have no pci_devices, so initialize
that on create so we stop trying to lazy-load them later. Also,
migration_context will always be empty on create, so initialize that
for the same reason.

Change-Id: I546961e6018c3c48cf482cc38ca2d91a29e0da77
2023-08-21 08:53:19 -07:00
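
A small sketch of the pattern described (illustrative, not the exact change): pre-set fields that are known to be empty at create time so later reads do not trigger a lazy-load round trip to the database.

    from nova import objects


    def initialize_known_empty_fields(instance):
        pci_requests = instance.pci_requests
        if not pci_requests or not pci_requests.requests:
            # No PCI requests at create time implies there are no PCI devices.
            instance.pci_devices = objects.PciDeviceList(objects=[])
        # A freshly created instance never has a migration context yet.
        instance.migration_context = None
        return instance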