Adds cleanup to remove dangling volumes

Change-Id: Ic7a456ecb59dd4498444f953a6bcb7f63ee3c902
Amit Uniyal 2023-03-28 10:16:07 +00:00
parent 5229724450
commit 2e1c161885
1 changed files with 239 additions and 0 deletions

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================================
Cleanup dangling volumes block device mapping
=============================================

https://blueprints.launchpad.net/nova/+spec/nova-manage-cleanup-dangling-volume-attachments

Find any dangling (unattached) volumes recorded in the Nova database and
remove them.


Problem description
===================

After some volume operations, a volume can be detached from an instance
without Nova being notified, so Nova still considers the volume attached to
the instance.

This can cause problems for operations that require volume details from the
block device mapping (BDM) table, such as live migration and resizing of the
instance.

Steps to reproduce:

- Create an instance and attach a volume to it.
- Delete the volume attachment using Cinder, so that Nova does not know about
  the deletion.

  .. code-block:: shell

     $ cinder --os-volume-api-version 3.27 attachment-delete <attachment_id>

- Verify using the Cinder API.

  .. code-block:: shell

     $ openstack volume list

  The volume is not attached to the instance and its status is
  ``available``.

- Verify using the Nova API.

  .. code-block:: shell

     $ openstack server volume list <server>

  The volume is still listed as attached to the instance.

- Verify in the Nova ``block_device_mapping`` table: the volume is still
  listed as attached to the instance.


.. warning::

   The steps above are only meant to help understand and reproduce the
   issue; ``Nova`` does not support deleting volume attachments directly
   through Cinder.


Use Cases
---------

- As an operator, I want all dangling volumes safely removed from my
  instances, because having these volumes in the BDM table makes an instance
  go to an error state on startup.
- As an operator, I want all dangling volumes safely removed from my
  instances, so that volume-related operations are not affected.


Proposed change
===============

To spawn a new instance, Nova retrieves a copy of the base OS image from
Glance. This copy becomes the instance's local storage, so any file created
in the instance persists in that storage. Nova creates a BDM for it in the
``block_device_mapping`` table with ``source_type`` set to ``image`` and
``destination_type`` set to ``local``.

Similarly, when Nova is asked to attach a volume to an instance, it creates a
BDM for that volume in the ``block_device_mapping`` table with both
``source_type`` and ``destination_type`` set to ``volume``.

While restarting the instance, Nova should use ``source_type`` and
``destination_type`` to determine whether each attached BDM is a volume. If it
is, Nova should check whether the volume exists in Cinder and, if it does,
whether its status is ``in-use`` or ``available``:

- ``in-use`` means the volume attachment is correct, and both Nova and Cinder
  are aware of it.
- ``available`` means the volume is not properly attached to the instance, so
  the BDM should be removed (or soft-deleted) from the
  ``block_device_mapping`` table.

Nova should also log the error or exception in its logs, so that operators
are aware of the reason for this database modification.

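
The per-BDM decision described above can be sketched as a small pure
function. This is a minimal illustration under stated assumptions, not Nova
code: ``BDMRow``, ``is_volume_bdm()`` and ``is_dangling()`` are hypothetical
names, the Cinder status is passed in rather than fetched, and a volume that
Cinder does not know about (status ``None``) is left alone because the
proposal does not say how that case should be handled.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class BDMRow:
       """Hypothetical, trimmed-down stand-in for a block_device_mapping row."""

       volume_id: Optional[str]
       source_type: str  # e.g. 'image' or 'volume'
       destination_type: str  # e.g. 'local' or 'volume'


   def is_volume_bdm(bdm: BDMRow) -> bool:
       """Return True if the BDM refers to a Cinder volume.

       A root disk copied from a Glance image has source_type='image' and
       destination_type='local'; an attached volume has source_type='volume'
       and destination_type='volume'. This sketch keys on destination_type.
       """
       return bdm.destination_type == "volume" and bdm.volume_id is not None


   def is_dangling(bdm: BDMRow, cinder_status: Optional[str]) -> bool:
       """Return True only when the BDM should be removed from Nova's table.

       cinder_status is the status Cinder reports for bdm.volume_id, or None
       if Cinder has no such volume. Only 'available' counts as dangling;
       'in-use' means Nova and Cinder agree, and any other value (including
       None) is left alone, which is an assumption of this sketch.
       """
       if not is_volume_bdm(bdm):
           return False
       return cinder_status == "available"


   if __name__ == "__main__":
       root = BDMRow(volume_id=None, source_type="image", destination_type="local")
       data = BDMRow(volume_id="vol-1", source_type="volume", destination_type="volume")
       print(is_dangling(root, None))  # False: local disk built from an image
       print(is_dangling(data, "in-use"))  # False: attachment is consistent
       print(is_dangling(data, "available"))  # True: Nova-only attachment

Keying on ``destination_type`` alone also treats a volume that Nova created
from an image at boot (``source_type`` ``image``, ``destination_type``
``volume``) as a volume BDM; that is a choice made for this sketch.
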

Code Changes
------------

The instance must be shut down before its BDMs can be deleted from the
database, so this functionality should be added to the instance reboot
process: during a reboot, once the instance has shut off properly, perform
all the volume checks and delete the dangling BDMs.

A ``_delete_dangling_bdms()`` method should be added to ``ComputeManager`` and
called from ``ComputeManager.reboot_instance``.

When a dangling volume is found, log the resulting ``InvalidVolume``
exception. An error message similar to the one below should be written to the
nova-compute logs, so that operators are aware of these database
modifications.

.. code-block:: shell

   ERROR nova.compute.manager Traceback (most recent call last):
   ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 4168, in _delete_dangling_bdms
   ERROR nova.compute.manager     self.volume_api.check_attached(admin_ctxt, volume)
   ERROR nova.compute.manager   File "/opt/stack/nova/nova/volume/cinder.py", line 524, in check_attached
   ERROR nova.compute.manager     raise exception.InvalidVolume(reason=msg)
   ERROR nova.compute.manager nova.exception.InvalidVolume: Invalid volume: volume 'VOLUME-ID' status must be 'in-use'. Currently in 'available' status
   ERROR nova.compute.manager
   INFO nova.compute.manager [None REQ-ID admin admin] Deleting volume 'VOLUME-ID' from nova block device mapping.

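
The reboot-time cleanup described above could take roughly the following
shape. This is a hedged, self-contained sketch rather than the planned patch:
``FakeBDM``, ``FakeVolumeAPI`` and ``delete_dangling_bdms()`` are hypothetical
stand-ins for the BDM object, Nova's Cinder API wrapper and the proposed
``_delete_dangling_bdms()``. ``check_attached()`` mimics the behaviour visible
in the log above (raising ``InvalidVolume`` for any status other than
``in-use``), and the sketch assumes the reboot path has already shut the
instance off.

.. code-block:: python

   import logging
   from dataclasses import dataclass
   from typing import Dict, List, Optional

   LOG = logging.getLogger("nova.compute.manager")


   class InvalidVolume(Exception):
       """Stand-in for nova.exception.InvalidVolume."""


   @dataclass
   class FakeBDM:
       """Hypothetical BDM object; destroy() stands in for deleting the row."""

       volume_id: Optional[str]
       destination_type: str = "volume"
       deleted: bool = False

       def destroy(self) -> None:
           self.deleted = True


   class FakeVolumeAPI:
       """Hypothetical Cinder wrapper backed by a dict of volume statuses."""

       def __init__(self, statuses: Dict[str, str]) -> None:
           self.statuses = statuses

       def get(self, ctxt, volume_id: str) -> dict:
           return {"id": volume_id, "status": self.statuses[volume_id]}

       def check_attached(self, ctxt, volume: dict) -> None:
           # Same contract as in the log above: anything but 'in-use' is invalid.
           if volume["status"] != "in-use":
               raise InvalidVolume(
                   "volume '%s' status must be 'in-use'. Currently in '%s' status"
                   % (volume["id"], volume["status"]))


   def delete_dangling_bdms(ctxt, volume_api, bdms: List[FakeBDM]) -> List[str]:
       """Remove BDMs whose volumes Cinder reports as 'available'.

       Assumes the instance is already shut off. Returns the IDs of the
       volumes whose BDMs were removed.
       """
       removed = []
       for bdm in bdms:
           if bdm.destination_type != "volume" or bdm.volume_id is None:
               continue  # local disk from an image: nothing to check in Cinder
           volume = volume_api.get(ctxt, bdm.volume_id)
           try:
               volume_api.check_attached(ctxt, volume)
           except InvalidVolume:
               if volume["status"] != "available":
                   # e.g. 'detaching': not clearly dangling, so keep the BDM.
                   LOG.warning("Volume %s is in status %s; leaving its BDM",
                               volume["id"], volume["status"])
                   continue
               # LOG.exception logs at ERROR level with the traceback attached.
               LOG.exception("Dangling volume %s found", volume["id"])
               LOG.info("Deleting volume '%s' from nova block device mapping.",
                        volume["id"])
               bdm.destroy()
               removed.append(volume["id"])
       return removed


   if __name__ == "__main__":
       logging.basicConfig(level=logging.INFO)
       api = FakeVolumeAPI({"vol-ok": "in-use", "vol-gone": "available"})
       bdms = [FakeBDM(None, destination_type="local"),
               FakeBDM("vol-ok"), FakeBDM("vol-gone")]
       print(delete_dangling_bdms(None, api, bdms))  # ['vol-gone']

Statuses other than ``in-use`` and ``available`` are deliberately left alone
in this sketch, since the proposal only defines those two cases.
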

Alternatives
------------

- A cleanup command for the ``nova-manage`` utility, which takes an instance
  and removes all dangling volumes from it.

  .. code-block:: shell

     $ nova-manage volume_attachment cleanup <server-id>

- A cron job which checks every instance in the BDM table and, if an instance
  has dangling volumes, removes the volume entries from the table. This job
  does not require an instance UUID.

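
The cron-job alternative differs mainly in scope: it scans every volume BDM
instead of one instance's. A minimal sketch, assuming the BDM rows have
already been read as ``(instance_uuid, volume_id, destination_type)`` tuples
and the Cinder statuses fetched into a dict (both hypothetical shapes), could
group the dangling volumes by instance so that each affected instance is
handled in one pass:

.. code-block:: python

   from collections import defaultdict
   from typing import Dict, Iterable, List, Optional, Tuple

   # Hypothetical row shape: (instance_uuid, volume_id, destination_type).
   BDMTuple = Tuple[str, Optional[str], str]


   def find_dangling_by_instance(
           rows: Iterable[BDMTuple],
           cinder_status: Dict[str, str]) -> Dict[str, List[str]]:
       """Map instance UUID -> volume IDs that Cinder reports as 'available'.

       Volumes missing from cinder_status are skipped, because the proposal
       does not say how to treat them.
       """
       dangling: Dict[str, List[str]] = defaultdict(list)
       for instance_uuid, volume_id, destination_type in rows:
           if destination_type != "volume" or volume_id is None:
               continue  # local disks have no Cinder volume to check
           if cinder_status.get(volume_id) == "available":
               dangling[instance_uuid].append(volume_id)
       return dict(dangling)


   if __name__ == "__main__":
       rows = [
           ("inst-a", None, "local"),
           ("inst-a", "vol-1", "volume"),
           ("inst-b", "vol-2", "volume"),
           ("inst-b", "vol-3", "volume"),
       ]
       statuses = {"vol-1": "in-use", "vol-2": "available", "vol-3": "available"}
       print(find_dangling_by_instance(rows, statuses))
       # {'inst-b': ['vol-2', 'vol-3']}

Because the proposal states that BDMs should only be deleted once the
instance is shut off, a job like this might only report its findings, or act
only on stopped instances; this sketch leaves that choice open.
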

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Upgrade impact
--------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  auniyal

Feature Liaison
---------------

Feature liaison:
  None

Work Items
----------

- Create the cleanup functionality and add it to the instance restart
  process.
- Add unit and functional tests for the cleanup.


Dependencies
============

None

Testing
=======

Unit and functional tests will be added.


Documentation Impact
====================

Documentation describing the cleanup of dangling volumes during server
restart will be added to the Nova docs.


References
==========

None

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - 2023.2 Bobcat
     - Introduced