2ec2222841
There are cases where requests to delete an attachment made by Nova can race other third-party requests to delete the overall volume. This has been observed when running cinder-csi, where it first requests that Nova detaches a volume before itself requesting that the overall volume is deleted once it becomes `available`. This is a cinder race condition, and like most race conditions is not simple to explain. Some context on the issue: - Cinder API uses the volume "status" field as a locking mechanism to prevent concurrent request processing on the same volume. - Most cinder operations are asynchronous, so the API returns before the operation has been completed by the cinder-volume service, but the attachment operations such as creating/updating/deleting an attachment are synchronous, so the API only returns to the caller after the cinder-volume service has completed the operation. - Our current code **incorrectly** modifies the status of the volume both on the cinder-volume and the cinder-api services on the attachment delete operation. The actual set of events that leads to the issue reported in this bug are: [Cinder-CSI] - Requests Nova to detach volume (Request R1) [Nova] - R1: Asks cinder-api to delete the attachment and **waits** [Cinder-API] - R1: Checks the status of the volume - R1: Sends terminate connection request (R1) to cinder-volume and **waits** [Cinder-Volume] - R1: Ask the driver to terminate the connection - R1: The driver asks the backend to unmap and unexport the volume - R1: The last attachment is removed from the DB and the status of the volume is changed in the DB to "available" [Cinder-CSI] - Checks that there are no attachments in the volume and asks Cinder to delete it (Request R2) [Cinder-API] - R2: Check that the volume's status is valid. It doesn't have attachments and is available, so it can be deleted. - R2: Tell cinder-volume to delete the volume and return immediately. [Cinder-Volume] - R2: Volume is deleted and DB entry is deleted - R1: Finish the termination of the connection [Cinder-API] - R1: Now that cinder-volume has finished the termination the code continues - R1: Try to modify the volume in the DB - R1: DB layer raises VolumeNotFound since the volume has been deleted from the DB - R1: VolumeNotFound is converted to HTTP 404 status code which is returned to Nova [Nova] - R1: Cinder responds with 404 on the attachment delete request - R1: Nova leaves the volume as attached, since the attachment delete failed At this point the Cinder and Nova DBs are out of sync, because Nova thinks that the attachment is connected and Cinder has detached the volume and even deleted it. Hardening is also being done on the Nova side [2] to accept that the volume attachment may be gone. This patch fixes the issue mentioned above, but there is a request on Cinder-CSI [1] to use Nova as the source of truth regarding its attachments that, when implemented, would also fix the issue. [1]: https://github.com/kubernetes/cloud-provider-openstack/issues/1645 [2]: https://review.opendev.org/q/topic:%2522bug/1937084%2522+project:openstack/nova Closes-Bug: #1937084 Change-Id: Iaf149dadad5791e81a3c0efd089d0ee66a1a5614
7 lines
223 B
YAML
7 lines
223 B
YAML
---
|
|
fixes:
|
|
- |
|
|
`Bug #1937084 <https://bugs.launchpad.net/cinder/+bug/1937084>`_: Fixed
|
|
race condition between delete attachment and delete volume that can leave
|
|
deleted volumes stuck as attached to instances.
|