When detaching multiple NVMe-oF volumes from the same host we may end
up with an NVMe subsystem in the "connecting" state, and we'll see a
number of nvme errors in dmesg.
This happens on storage systems that share the same subsystem for
multiple volumes because Nova has not been updated to support the
tri-state "shared_targets" option that groups the detach and unmap of
volumes to prevent race conditions.
This is related to the issue mentioned in an os-brick commit message [1]
For the guard_connection method of os-brick to work as expected for
NVMe-oF volumes we need to use microversion 3.69 when retrieving the
cinder volume.
In microversion 3.69 we started reporting 3 states for shared_targets:
True, False, and None.
- True is to guard iSCSI volumes and will only be used if the iSCSI
initiator running on the host doesn't have the manual scans feature.
- False means that no target/subsystem is being shared, so no guard is
necessary.
- None is to force guarding; it's currently used for NVMe-oF volumes
when the subsystem is shared.
[1]: https://review.opendev.org/c/openstack/os-brick/+/836062/12//COMMIT_MSG
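As a rough illustration of the guard described above (client setup and
helper names are assumptions, not the actual Nova code):

    from os_brick import utils as brick_utils

    def detach_volume(cinder, connector, connection_info, volume_id):
        # ``cinder`` is assumed to be a cinderclient pinned to
        # microversion 3.69+, so volume.shared_targets can be True,
        # False or None.
        volume = cinder.volumes.get(volume_id)
        # None forces the guard (NVMe-oF shared subsystem), False skips
        # it, and True guards only when the iSCSI initiator lacks the
        # manual scans feature.
        with brick_utils.guard_connection(volume):
            connector.disconnect_volume(connection_info['data'], None)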
Closes-Bug: #2035375
Change-Id: I4def1c0f20118d0b8eb7d3bbb09af2948ffd70e1
This change refactors privsep util test cases to account for the
fact that oslo.log now conditionally uses an internal pipe mutex
when logging under eventlet.
This was added by Iac1b0891ae584ce4b95964e6cdc0ff2483a4e57d,
which is part of oslo.log 5.3.0.
As a result we need to mock all calls to oslo.log in unit tests
that assert whether os.write is called. When the internal
pipe mutex is used, oslo.log calls os.write when the mutex is
released.
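The pattern looks roughly like the following self-contained sketch
(the helper and logger here are stand-ins, not the real nova code):

    import os
    import unittest
    from unittest import mock

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def write_marker(fd):
        # Stand-in for the privsep helper under test: it logs and writes.
        LOG.debug('writing marker')
        os.write(fd, b'marker')

    class TestWriteMarker(unittest.TestCase):
        # Mocking the module logger keeps oslo.log from calling os.write
        # itself when its internal pipe mutex is released, so the
        # assertion only sees the write made by write_marker.
        @mock.patch(__name__ + '.LOG')
        @mock.patch('os.write')
        def test_single_os_write(self, mock_write, mock_log):
            write_marker(3)
            mock_write.assert_called_once_with(3, b'marker')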
Related-Bug: #1983863
Change-Id: Id313669df80f9190b79690fff25f8e3fce2a4aca
This change ensures we only try to clean up dangling BDMs if
cinder is installed and reachable.
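A minimal sketch of the guard (the helper name is assumed; only the
keystoneauth1 exceptions are known to exist as written):

    from keystoneauth1 import exceptions as ks_exc

    def _cleanup_dangling_bdms(context, volume_api, bdms):
        try:
            # Any cinder call raises EndpointNotFound when the service
            # is not in the catalog, or ClientException when it is
            # unreachable; in both cases skip the cleanup entirely.
            attachments = volume_api.get_all_attachments(context)
        except (ks_exc.EndpointNotFound, ks_exc.ClientException):
            return
        # ... otherwise compare bdms against attachments and delete the
        # stale records ...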
Closes-Bug: #2033752
Change-Id: I0ada59d8901f8620fd1f3dc20d6be303aa7dabca
This addresses comments from code review to add handling of PCPU during
the migration/copy of limits from the Nova database to Keystone. In
legacy quotas, there is no settable quota limit for PCPU, so the limit
for VCPU is used for PCPU. With unified limits, PCPU will have its own
quota limit, so for the automated migration command, we will simply
create a dedicated limit for PCPU that is the same value as the limit
for VCPU.
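A hedged sketch of the copy step using openstacksdk (the cloud name
and the "class:VCPU"/"class:PCPU" resource names are assumptions):

    import openstack

    conn = openstack.connect(cloud='devstack-admin')
    nova_service = conn.identity.find_service('nova')

    vcpu = next(conn.identity.registered_limits(
        service_id=nova_service.id, resource_name='class:VCPU'), None)
    if vcpu is not None:
        # Legacy quotas have no PCPU limit, so reuse the VCPU value.
        conn.identity.create_registered_limit(
            service_id=nova_service.id,
            resource_name='class:PCPU',
            default_limit=vcpu.default_limit)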
On the docs side, this adds more detail about the token authorization
settings needed to use the nova-manage limits migrate_to_unified_limits
CLI command and documents more OSC limit commands like show and delete.
Related to blueprint unified-limits-nova-tool-and-docs
Change-Id: Ifdb1691d7b25d28216d26479418ea323476fee1a
Many bugs around nova-compute rebalancing are focused on problems
where the compute node and placement resources are deleted, and
sometimes they never get re-created.
To limit this class of bugs, we add a check to ensure a compute
node is only ever deleted when it is known to have been deleted
in Ironic.
There is a risk this might leave orphaned compute nodes and
resource providers that need manual cleanup because users
do not want to delete the node in Ironic, but are removing it
from nova management. But on balance, it seems safer to leave
these cases up to the operator to resolve manually, and to collect
feedback on how to better help those users.
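A minimal sketch of the check (method and attribute names assumed):

    from ironicclient import exc as ironic_exc

    def _delete_compute_node_if_gone(ironic_client, compute_node):
        try:
            ironic_client.node.get(compute_node.hypervisor_hostname)
        except ironic_exc.NotFound:
            # Ironic confirms the node is gone, so deleting the compute
            # node and its resource provider cannot orphan a live node.
            compute_node.destroy()
        # Otherwise keep the record; a rebalance will pick it up again.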
blueprint ironic-shards
Change-Id: I7cd9e5ab878cea05462cac24de581dca6d50b3c3
When people transition from three ironic nova-compute processes down
to one process, we need a way to move the ironic nodes, and any
associated instances, between nova-compute processes.
For safety, a nova-compute process must first be forced_down via
the API, similar to when using evacuate, before moving the associated
ironic nodes to another nova-compute process. The destination
nova-compute process should ideally not be running, but not forced
down.
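For example, the source service can be forced down with the standard
OSC command before its nodes are moved (host name illustrative):

    openstack compute service set --down compute-1 nova-compute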
blueprint ironic-shards
Change-Id: I7ef25e27bf8c47f994e28c59858cf3df30975b05
On reboot, check the instance volume status on the cinder side.
Verify that the volume exists and cinder has an attachment ID for it;
otherwise delete its BDM data from the nova DB, and vice versa.
Updated existing test cases to use CinderFixture while rebooting, as
reboot calls get_all_attachments.
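A rough sketch of the reconciliation (the get_all_attachments
signature is assumed from the description above):

    def _reconcile_volume_attachments(context, volume_api, instance, bdms):
        attachments = volume_api.get_all_attachments(context, instance.uuid)
        cinder_ids = {att['id'] for att in attachments}
        nova_ids = {bdm.attachment_id for bdm in bdms if bdm.attachment_id}

        for bdm in bdms:
            # Attachment vanished on the cinder side: drop the stale BDM.
            if bdm.attachment_id and bdm.attachment_id not in cinder_ids:
                bdm.destroy()
        for att_id in cinder_ids - nova_ids:
            # BDM vanished on the nova side: drop the stale attachment.
            volume_api.attachment_delete(context, att_id)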
Implements: blueprint https://blueprints.launchpad.net/nova/+spec/cleanup-dangling-volume-attachments
Closes-Bug: #2019078
Change-Id: Ieb619d4bfe0a6472aefb118b58283d7ad8d24c29
Ironic in API 1.82 added the option for nodes to be associated with
a specific shard key. This can be used to partition the nodes within
a single ironic conductor group into smaller sets of nodes that can
each be managed by their own nova-compute ironic service.
We add a new [ironic]shard config option to allow operators to say
which shard each nova-compute process should target.
As such, when the shard is set we ignore the peer_list setting
and always have a hash ring of one.
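For example, each nova-compute process would target its own shard in
nova.conf (the shard name is illustrative):

    [ironic]
    shard = shard-a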
blueprint ironic-shards
Change-Id: I5c1b5688c96096f4cfecfc5b16ea59d2ee5756d6
As part of the move to using Ironic shards, we document that the best
practice for scaling Ironic and Nova deployments is to shard Ironic
nodes between nova-compute processes, rather than attempting to
use the peer_list.
Currently, we only allow users to do this using conductor groups.
This works well for those wanting a conductor group per L2 network
domain. But in general, conductor groups per nova-compute are
a very poor trade-off in terms of ironic deployment complexity.
Further patches will look to enable the use of ironic shards,
alongside conductor groups, to more easily shard your ironic nodes
between nova-compute processes.
To avoid confusion, we rename the partition_key configuration
value to conductor_group.
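A deployment that previously set partition_key would now configure,
for example:

    [ironic]
    # previously: partition_key = rack-1
    conductor_group = rack-1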
blueprint ironic-shards
Change-Id: Ia2e23a59dbd2f13c6f74ca975c249751bebf54b2
This adds documentation for unified limits and signals deprecation of
the nova.quota.DbQuotaDriver.
Related to blueprint unified-limits-nova-tool-and-docs
Change-Id: I3951317111396aa4df36c5700b4d4dd33e721a74
This command aims to help migrate to unified limits quotas by reading
legacy quota limits from the Nova database and calling the Keystone API
to create corresponding unified limits.
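A minimal invocation looks like the following; see the command's help
output for the available scoping options:

    nova-manage limits migrate_to_unified_limits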
Related to blueprint unified-limits-nova-tool-and-docs
Change-Id: I5536010ea1212918e61b3f4f22c2077fadc5ebfe
- Added a function to get all attachments by instance or volume ID from Cinder in Cinder.API.
- Updated CinderFixture to add mock get_all_attachments functionality.
- Added unit tests for get_all_attachments.
Related-Bug: #2019078
Change-Id: I8619d898f68250bf70a17b1e6b8b0c249245b43b
We leak greenthreads due to running background operations like server
create, rebuild, and in one case a sleeping claim. So this patch
removes the leaks by making sure that the background operations stop
before the test finishes.
As there are no more leaks in the functional tests either, this patch
makes such leaks an error there too.
Change-Id: I6905999050e8d09b772837034a212c534e9c3226
The leak tests just start a server create and then finish, so I think
the actual conductor thread leaks. If I add a bit of sleep to the end
of these test cases then the leak disappears. Unfortunately adding the
sleep at the greenpool fixture does not have this effect. So instead I
added delete_server calls to the end of these tests, as those are
nicer than sleeps and have the same effect regarding the leak.
We are down to 3 leaking functional tests.
Change-Id: I070390c695283bdd9b87cd879aa2a9257ee7bdfb
This change adds a global greenpool which is used to manage
the greenthreads created via nova.utils.spawn(_n).
A test fixture is also added to use an isolated greenpool
which will raise an exception if a greenthread is leaked.
The fixture will optionally raise if greenlets are leaked.
This is enabled for unit tests by default and is configurable
for functional tests.
This change removes all greenthread leaks from the unit
and functional tests that were detected. 7 functional
tests still leak greenlets but they have no obvious
cause. As such, greenlet leaks are not treated as errors
for functional tests by default. Greenthread leaks
are always treated as errors.
Set NOVA_RAISE_ON_GREENLET_LEAK=1|true|yes when invoking
tox to make greenlet leaks an error for functional tests.
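For example (the tox environment name may differ per branch):

    NOVA_RAISE_ON_GREENLET_LEAK=1 tox -e functional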
Change-Id: I73b4684744b340bfb80da08537a745167ddea106
If we have no pci_requests we will have no pci_devices, so initialize
that on create so we stop trying to lazy-load them later. Also,
migration_context will always be empty on create, so initialize that
for the same reason.
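A minimal sketch of the idea (the actual change happens when the
Instance object is created):

    from nova import objects

    def _init_known_empty_fields(instance):
        if not instance.obj_attr_is_set('pci_devices'):
            # No pci_requests means there can be no pci_devices yet.
            instance.pci_devices = objects.PciDeviceList(objects=[])
        if not instance.obj_attr_is_set('migration_context'):
            # A freshly created instance never has a migration context.
            instance.migration_context = None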
Change-Id: I546961e6018c3c48cf482cc38ca2d91a29e0da77