Adds cleanup to remove dangling volumes
Change-Id: Ic7a456ecb59dd4498444f953a6bcb7f63ee3c902
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================================
Cleanup dangling volumes block device mapping
=============================================

https://blueprints.launchpad.net/nova/+spec/nova-manage-cleanup-dangling-volume-attachments

Find any dangling (unattached) volumes recorded in the Nova database and
remove them if they exist.


Problem description
===================

After some volume operations, a volume can become detached from an instance
without Nova being notified, so Nova still believes the volume is attached to
the instance.

This can break operations that require volume details from the block device
mapping table, such as live migration and instance resize.

Steps to reproduce:

- Create an instance and attach a volume to it.

- Delete the volume attachment using Cinder, so that Nova does not know about
  the deletion.

  .. code-block:: shell

     $ cinder --os-volume-api-version 3.27 attachment-delete <attachment_id>

- Verify using the Cinder API.

  .. code-block:: shell

     $ openstack volume list

  The volume is not attached to the instance and its status is 'available'.

- Verify using the Nova API.

  .. code-block:: shell

     $ openstack server volume list <server>

  The volume is listed as attached to the instance.

- Verify in the Nova block device mapping table: the volume is still listed
  as attached to the instance.

.. warning::

   The steps above are only meant to help understand and reproduce the issue.
   ``Nova`` does not support deleting a volume attachment directly through
   Cinder.

Use Cases
---------

- As an operator, I want all dangling volumes safely removed from my instance,
  because leaving these volumes in the BDM puts the instance into an error
  state on startup.

- As an operator, I want all dangling volumes safely removed from my instance,
  so that volume-related operations are not affected.


Proposed change
===============

To spawn a new instance, Nova retrieves a copy of the base OS image from
Glance. This image becomes the instance's storage, which means that any file
created in the instance persists in this storage. Nova creates a BDM for it in
the block_device_mapping database with source_type set to image and
destination_type set to local.

Similarly, when Nova is asked to attach a volume to an instance, it creates a
BDM for the volume in the block_device_mapping database and sets both
source_type and destination_type to volume.

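As a rough illustration of the two kinds of entries described above, the sketch below models BDM rows as plain dictionaries. Only the source_type and destination_type values come from the text; the other fields, the placeholder IDs, and the ``is_attached_volume`` helper are hypothetical and exist only for this example.

```python
# Simplified stand-ins for rows in the block_device_mapping table.
# Only source_type and destination_type follow the spec text; every
# other field and value here is made up for illustration.
root_disk_bdm = {
    "instance_uuid": "INSTANCE-UUID",
    "source_type": "image",
    "destination_type": "local",
    "volume_id": None,
}
attached_volume_bdm = {
    "instance_uuid": "INSTANCE-UUID",
    "source_type": "volume",
    "destination_type": "volume",
    "volume_id": "VOLUME-ID",
}


def is_attached_volume(bdm):
    """Return True for a BDM created by attaching a volume (hypothetical helper)."""
    return bdm["source_type"] == "volume" and bdm["destination_type"] == "volume"


# Only the attached volume needs to be cross-checked against Cinder.
volume_bdms = [
    bdm for bdm in (root_disk_bdm, attached_volume_bdm) if is_attached_volume(bdm)
]
print([bdm["volume_id"] for bdm in volume_bdms])
```
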
While restarting the instance, verify, on the basis of source_type and
volume_type, whether the attached BDM is a volume. If it is a volume, verify
whether this volume exists in Cinder. If it exists, verify whether its status
is 'in-use' or 'available'. If it is 'in-use', the volume attachment is
correct, and both Nova and Cinder are aware of it. If it is 'available', the
volume is not properly attached to the instance, so remove or soft delete the
BDM from the block_device_mapping database.

Also log the error or exception in the Nova logs, so that operators are aware
of the reason for this modification of the database.

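The check described above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the Nova implementation: ``bdm_action`` and the pre-fetched ``cinder_status_by_volume_id`` mapping are hypothetical, the volume test uses source_type and destination_type as in the attach description, and cases the text does not cover (for example a volume unknown to Cinder) are simply kept.

```python
def bdm_action(bdm, cinder_status_by_volume_id):
    """Decide what to do with one BDM: 'skip', 'keep', or 'remove'.

    Hypothetical helper; ``cinder_status_by_volume_id`` stands in for
    whatever lookup would be done against Cinder.
    """
    if bdm["source_type"] != "volume" or bdm["destination_type"] != "volume":
        return "skip"  # not an attached volume, nothing to verify in Cinder
    status = cinder_status_by_volume_id.get(bdm["volume_id"])
    if status == "available":
        return "remove"  # Cinder no longer considers the volume attached
    return "keep"  # 'in-use', or a case the text does not cover


bdms = [
    {"source_type": "image", "destination_type": "local", "volume_id": None},
    {"source_type": "volume", "destination_type": "volume", "volume_id": "vol-a"},
    {"source_type": "volume", "destination_type": "volume", "volume_id": "vol-b"},
]
statuses = {"vol-a": "in-use", "vol-b": "available"}

actions = {bdm["volume_id"]: bdm_action(bdm, statuses) for bdm in bdms}
print(actions)
```

Keeping the decision separate from the database update makes it easy to log why a BDM is being removed before anything is modified.
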
Code Changes
------------

To delete the BDMs from the database, the instance must first be shut down.
This functionality should therefore be added to the instance reboot process:
while rebooting, once the instance has shut off properly, perform all the
volume checks and delete the dangling BDMs.

_delete_dangling_bdms() should be added to ComputeManager and called from
ComputeManager.reboot_instance.

Once a dangling volume is found, log an exception for InvalidVolume.

An error message similar to the one below should be printed in the
nova-compute logs, so that operators are aware of these database
modifications.

.. code-block:: shell

   ERROR nova.compute.manager Traceback (most recent call last):
   ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 4168, in _delete_dangling_bdms
   ERROR nova.compute.manager     self.volume_api.check_attached(admin_ctxt, volume)
   ERROR nova.compute.manager   File "/opt/stack/nova/nova/volume/cinder.py", line 524, in check_attached
   ERROR nova.compute.manager     raise exception.InvalidVolume(reason=msg)
   ERROR nova.compute.manager nova.exception.InvalidVolume: Invalid volume: volume 'VOLUME-ID' status must be 'in-use'. Currently in 'available' status
   ERROR nova.compute.manager
   INFO nova.compute.manager [None REQ-ID admin admin] Deleting volume 'VOLUME-ID' from nova block device mapping.

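The ordering described in this section (shut down, check volumes, delete dangling BDMs, start again) and the logging behaviour might look roughly like the sketch below. Everything in it is a stand-in: the ``InvalidVolume`` class, ``check_attached``, ``delete_dangling_bdms`` and ``reboot_instance`` are simplified local functions operating on dictionaries, not the Nova ComputeManager or volume API code.

```python
import logging

LOG = logging.getLogger("reboot_cleanup_sketch")


class InvalidVolume(Exception):
    """Local stand-in for nova.exception.InvalidVolume."""


def check_attached(volume):
    """Raise InvalidVolume unless the volume status is 'in-use'."""
    if volume["status"] != "in-use":
        raise InvalidVolume(
            "volume '%s' status must be 'in-use'. Currently in '%s' status"
            % (volume["id"], volume["status"])
        )


def delete_dangling_bdms(bdms, volumes_by_id):
    """Return the BDMs that pass the check, logging every removal."""
    kept = []
    for bdm in bdms:
        try:
            check_attached(volumes_by_id[bdm["volume_id"]])
        except InvalidVolume:
            # Log the traceback first, then the database modification.
            LOG.exception("Dangling volume found")
            LOG.info(
                "Deleting volume '%s' from nova block device mapping.",
                bdm["volume_id"],
            )
            continue
        kept.append(bdm)
    return kept


def reboot_instance(instance, bdms, volumes_by_id):
    """Shut off, clean up dangling BDMs, then start again (all simulated)."""
    instance["power_state"] = "shutoff"
    remaining = delete_dangling_bdms(bdms, volumes_by_id)
    instance["power_state"] = "running"
    return remaining


instance = {"uuid": "INSTANCE-UUID", "power_state": "running"}
bdms = [{"volume_id": "vol-a"}, {"volume_id": "vol-b"}]
volumes_by_id = {
    "vol-a": {"id": "vol-a", "status": "in-use"},
    "vol-b": {"id": "vol-b", "status": "available"},
}
remaining = reboot_instance(instance, bdms, volumes_by_id)
print([bdm["volume_id"] for bdm in remaining])
```
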
Alternatives
------------

- A cleanup command for the nova-manage utility, which takes an instance and
  removes all dangling volumes from it.

  .. code-block:: shell

     $ nova-manage volume_attachment cleanup <server-id>

- A cron job that checks every instance in the BDM table and, if an instance
  has dangling volumes, removes the volume entries from the table. This job
  does not require an instance UUID.

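To make the cron job alternative more concrete, here is a minimal sketch of a table-wide sweep that needs no instance UUID as input. The ``sweep_bdm_table`` function, the row layout, and the pre-fetched Cinder status mapping are all assumptions made for illustration; a real job would read from the Nova database and query Cinder.

```python
from collections import defaultdict


def sweep_bdm_table(bdm_rows, cinder_status_by_volume_id):
    """Split the whole BDM table into kept rows and removed volumes per instance."""
    kept = []
    removed = defaultdict(list)
    for row in bdm_rows:
        is_volume = row["destination_type"] == "volume"
        status = cinder_status_by_volume_id.get(row["volume_id"])
        if is_volume and status == "available":
            # Dangling: Nova lists the volume, Cinder says it is unattached.
            removed[row["instance_uuid"]].append(row["volume_id"])
        else:
            kept.append(row)
    return kept, dict(removed)


table = [
    {"instance_uuid": "server-1", "destination_type": "local", "volume_id": None},
    {"instance_uuid": "server-1", "destination_type": "volume", "volume_id": "vol-a"},
    {"instance_uuid": "server-2", "destination_type": "volume", "volume_id": "vol-b"},
]
statuses = {"vol-a": "available", "vol-b": "in-use"}

kept_rows, removed_by_instance = sweep_bdm_table(table, statuses)
print(removed_by_instance)
```

Returning the removed volume IDs grouped by instance gives the job something to log for each affected server.
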
Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Upgrade impact
--------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  auniyal

Feature Liaison
---------------

Feature liaison:
  None

Work Items
----------

- Create the cleanup functionality and add it to the instance restart process.
- Add unit and functional tests for the cleanup.


Dependencies
============

None


Testing
=======

Unit and functional tests will be added.


Documentation Impact
====================

Documentation for cleaning up dangling volumes during server restart will be
added to the Nova docs.


References
==========

None


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - 2023.2 Bobcat
     - Introduced