nova-specs/specs/queens/approved/multi-attach-volume.rst
Matt Riedemann 24a0513b44 Fix the names on some specs
These spec filenames don't match the blueprint name in launchpad.

Change-Id: I4b1c1447d8cb2dc683002a1966ad140fcb6070fc
2018-02-08 19:21:25 -05:00


..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==================================
Support Cinder Volume Multi-attach
==================================

https://blueprints.launchpad.net/nova/+spec/multi-attach-volume

Currently, Nova only allows a volume to be attached to a single
instance. There are times when a user may want to be able
to attach the same volume to multiple instances.

Problem description
===================

Currently Nova is not prepared to attach a single Cinder volume to
multiple VM instances, even if the volume itself allows that operation.
This document describes the changes required in Nova to introduce this new
functionality and also lists its limitations.

Use Cases
---------

Allow users to share volumes between multiple guests using read-write
attachments, for example a clustered application with two nodes where one is
active and the other is passive. Both nodes require access to the same volume,
although only the active one uses it. When the active node goes down, the
passive one can take over quickly and already has access to the data.

The above example works for active/active scenarios as well; it is the user's
responsibility to choose the right filesystem.

Proposed change
===============

The new 'multi-attach' functionality will be enabled by using the new Cinder
attach/detach API, which is available from API microversion 3.45 [#]_.
Cinder will only allow a volume to be attached more than once if the
'multiattach' flag was set on the volume at create time. Nova is expected to
rely on Cinder to check the volume state when Nova reserves the volume at the
API level by calling attachment_create.
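
The split of responsibilities can be illustrated with a small in-memory model. This is a minimal sketch, not Cinder's implementation: ``FakeCinder``, ``Volume``, ``VolumeNotMultiattach`` and the ``attachment_create(volume_id, instance_uuid)`` signature are hypothetical names chosen for illustration. Only the rule itself comes from this spec: a second attachment is refused unless the volume was created with multiattach=True.

```python
from dataclasses import dataclass, field


class VolumeNotMultiattach(Exception):
    """Hypothetical error: the volume is in use and not multiattach."""


@dataclass
class Volume:
    volume_id: str
    multiattach: bool = False  # set once, when the volume is created
    attachments: list = field(default_factory=list)


class FakeCinder:
    """Hypothetical in-memory stand-in for Cinder's attachment handling."""

    def __init__(self):
        self.volumes = {}

    def create_volume(self, volume_id, multiattach=False):
        self.volumes[volume_id] = Volume(volume_id, multiattach)
        return self.volumes[volume_id]

    def attachment_create(self, volume_id, instance_uuid):
        volume = self.volumes[volume_id]
        # The check lives on the Cinder side; Nova just calls in and
        # propagates the error to the API user.
        if volume.attachments and not volume.multiattach:
            raise VolumeNotMultiattach(volume_id)
        attachment_id = f"{volume_id}:{instance_uuid}"
        volume.attachments.append(attachment_id)
        return attachment_id


cinder = FakeCinder()
cinder.create_volume("vol-shared", multiattach=True)
cinder.attachment_create("vol-shared", "instance-a")
cinder.attachment_create("vol-shared", "instance-b")  # allowed
```

Keeping the check in one place (Cinder) means Nova does not need its own copy of the volume state to decide whether a reservation is allowed.
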

.. todo:: A volume should only be allowed to be attached to a given instance
   once, regardless of multiattach. This requires investigation since the
   ``nova.block_device_mappings`` and ``cinder.volume_attachment`` tables
   do not have unique constraints on volume_id and instance_uuid.

There are problems today when multiple volume attachments share a single
target on the volume backend. If we do not take care, multi-attach would
make these problems much worse. The simplest fix is to serialize all attach and
detach operations involving a shared target. To do this, Cinder will expose
a 'shared_targets' property in the volume info; when it is True, a lock will
be placed around all attachment_update and attachment_delete calls, and the
associated calls to os-brick::

    # Attach. The lock is only taken when the volume has shared targets;
    # its name uses the volume.backend_uuid value.
    with optional_host_local_lock(acquire=volume.shared_targets):
        connector = os_brick.get_connector()
        conn_info = attachment.update(connector).conn_info
        os_brick.connect_volume(conn_info)
        attachment.attach_complete()

    # Detach, serialized by the same lock.
    with optional_host_local_lock(acquire=volume.shared_targets):
        os_brick.disconnect_volume(conn_info)
        attachment.delete()

.. note::

   We assume the detach and attach related calls to Cinder are synchronous, so
   there will be no races between os-brick operations on the host and Cinder
   operations on the backend. Any driver deviation from this pattern will be
   considered a bug.

By default, libvirt assumes all disks are used exclusively by a single guest.
To share a disk between instances, libvirt has to be told so in the guest XML
by setting the 'shareable' flag for that disk. This means that the hypervisor
will not try to take an exclusive lock on the disk, that all I/O caching is
disabled, and that any SELinux labeling allows use by all domains.
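
For illustration, the sketch below builds a libvirt ``<disk>`` element for a block-device volume and adds the empty ``<shareable/>`` child element when the volume is multi-attach. It uses only the Python standard library and is not the Nova libvirt driver's code: the function name, the raw format, ``cache='none'`` and the virtio bus are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def build_disk_xml(source_dev, target_dev, shareable):
    """Return libvirt <disk> XML for a block-device volume (illustrative)."""
    disk = ET.Element("disk", type="block", device="disk")
    # cache='none' is an assumption of this sketch, in line with caching
    # being disabled for shared disks.
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache="none")
    ET.SubElement(disk, "source", dev=source_dev)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    if shareable:
        # Empty <shareable/> element: marks the disk as shared between domains.
        ET.SubElement(disk, "shareable")
    return ET.tostring(disk, encoding="unicode")


disk_xml = build_disk_xml("/dev/sdb", "vdb", shareable=True)
```

Calling it with ``shareable=False`` yields the same element without ``<shareable/>``, which is the behavior wanted for ordinary single-attach volumes.
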

Nova needs to set this 'shareable' flag for multi-attach volumes (where the
'multiattach' flag is set to True) on every single attachment. This spec will
only enable this feature for libvirt; all other drivers should reject attach
calls for multi-attach volumes until the driver adds support for this
functionality. This information is stored in the virt driver capabilities
dict in the base ComputeDriver: multi-attach support will be True for libvirt
and disabled for all other virt drivers. To introduce the usage of the flag,
we will also need to bump the minimum compute version.
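
A minimal sketch of the capability-based rejection follows. It is modeled on, not copied from, Nova: the ``supports_multiattach`` key, the ``check_can_attach`` helper, ``OtherDriver`` and the ``MultiattachNotSupported`` error are assumptions for illustration. The base class defaults the capability to False and only the libvirt driver opts in.

```python
class MultiattachNotSupported(Exception):
    """Hypothetical error raised for drivers without multi-attach support."""


class ComputeDriver:
    # Base driver: multi-attach is disabled unless a driver opts in.
    capabilities = {"supports_multiattach": False}


class LibvirtDriver(ComputeDriver):
    # Copy the base dict so the override does not mutate the parent class.
    capabilities = dict(ComputeDriver.capabilities, supports_multiattach=True)


class OtherDriver(ComputeDriver):
    """Any driver that has not added support keeps the default."""


def check_can_attach(driver, volume_multiattach):
    """Reject multi-attach volumes on drivers that lack the capability."""
    if volume_multiattach and not driver.capabilities.get(
            "supports_multiattach", False):
        raise MultiattachNotSupported(type(driver).__name__)
```

Defaulting to False in the base class means a new or out-of-tree driver is safe by default and has to declare support explicitly.
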

The following policy rules will be added to Cinder:

* Enable/Disable multiattach=True
* Enable/Disable multiattach=True + bootable=True

Ideally, Nova would reject the attach request in the API when the hypervisor
does not support multi-attach, but this is not possible with the current API.
It can be solved in part with the policy rules above. For example, if you are
running a cloud whose computes do not support multiattach, say an all-VMware
deployment, the operator can configure policy to disable multiattach volumes
on the Cinder side. In a mixed hypervisor cloud, if the user tries to attach a
multiattach volume to an instance on a compute whose virt driver does not
support multiattach, the attach request fails on the compute and nova-compute
calls attachment_delete to delete the attachment created in nova-api's
attach_volume code. If nova-api exposed backend compute driver capabilities,
we could check and fail fast in the API, but Nova does not have that yet, so
we are left with policy rules and checks on the backend.
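
The failure path described above (nova-api creates the attachment, the compute rejects the request, nova-compute deletes the attachment) can be sketched as follows. ``FakeCinderAttachments``, ``attach_on_compute``, ``AttachRejected`` and the boolean ``driver_supports`` argument are hypothetical stand-ins; the sketch only shows the ordering, where the attachment is rolled back before the error is re-raised.

```python
class AttachRejected(Exception):
    """Hypothetical error: this compute host cannot do the attach."""


class FakeCinderAttachments:
    """Hypothetical stand-in that only tracks attachment records."""

    def __init__(self):
        self.attachments = set()

    def attachment_create(self, volume_id, instance_uuid):
        attachment_id = f"{volume_id}:{instance_uuid}"
        self.attachments.add(attachment_id)
        return attachment_id

    def attachment_delete(self, attachment_id):
        self.attachments.discard(attachment_id)


def attach_on_compute(cinder, attachment_id, multiattach, driver_supports):
    """Compute-side step; deletes the attachment if the attach fails."""
    try:
        if multiattach and not driver_supports:
            raise AttachRejected(attachment_id)
        # Connecting the volume and updating the attachment would go here.
    except Exception:
        # Remove the attachment nova-api created, then fail the request.
        cinder.attachment_delete(attachment_id)
        raise


cinder = FakeCinderAttachments()
# nova-api creates the attachment before handing off to the compute.
attachment_id = cinder.attachment_create("vol-1", "instance-a")
try:
    attach_on_compute(cinder, attachment_id, multiattach=True,
                      driver_supports=False)
except AttachRejected:
    pass  # the request fails, but no stale attachment remains
```
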

Alternatives
------------

For the use case described above, the failover scenario can be handled by
attaching the volume to the passive/standby instance at failover time. This
means that the standby instance is no longer a hot standby: attaching the
volume takes time, so the new primary instance is without its volume while it
is re-attached, and that time can vary depending on how long it takes to mark
the volume free after the primary instance fails.

Another alternative is to clone a volume and attach the clone to the second
instance. The downside is that changes to the original volume do not show up
in the mounted clone, so this is only a viable alternative if the volume is
read-only.

Data model impact
-----------------

None

REST API impact
---------------

Some features of the Nova API have to be handled with care, or disabled
completely for now, for volumes that support multi-attach.

The create call in the 'os-assisted-volume-snapshots' API calls
'volume_snapshot_create', where we do not have the instance_uuid to retrieve
the right BDM; therefore we need to disable this call for multi-attach
volumes. The API format for this request is not changed; this is only a
protection until the API changes required to support this request with
multi-attach are made.
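
A guard of this kind could look roughly like the sketch below. ``find_bdm_for_snapshot``, the ``(instance_uuid, volume_id)`` tuple shape for BDMs and ``MultiattachSnapshotNotSupported`` are hypothetical; the point is that the request carries only a volume ID, which cannot pick a single BDM once a volume may have several attachments.

```python
class MultiattachSnapshotNotSupported(Exception):
    """Hypothetical error for assisted snapshots of multiattach volumes."""


def find_bdm_for_snapshot(volume_id, volume_multiattach, bdms):
    """Return the one BDM to snapshot, refusing multiattach volumes.

    ``bdms`` is an assumed list of (instance_uuid, volume_id) pairs.
    """
    if volume_multiattach:
        # The request carries no instance_uuid, so with several
        # attachments the right BDM cannot be identified.
        raise MultiattachSnapshotNotSupported(volume_id)
    matches = [bdm for bdm in bdms if bdm[1] == volume_id]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one BDM for {volume_id}")
    return matches[0]
```
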

Another feature that needs further investigation is boot from volume (BFV).
The first aspect of the feature is the 'delete_on_termination' flag, which
will be allowed along with multi-attach: no changes are necessary when the
provided volume has multiattach=True and delete_on_termination=True is passed
in for BFV. When this flag is set to True, the volume attached to the instance
is intended to be removed when the instance is deleted. This does not cause
problems, because Cinder will not delete a volume that still has active
attachments. Nova will receive an error from Cinder that the volume deletion
failed, which will be logged [#]_ (and also in the API in '_local_delete' [#]_),
but this will not affect the instance termination process.
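
The termination behavior can be sketched as follows: deletion is attempted for each volume flagged delete_on_termination, a failure reported by Cinder is logged, and termination carries on. The BDM dict shape, the ``delete_volume`` callable and ``VolumeStillAttached`` are hypothetical placeholders for the Cinder client call and its error, not Nova's actual code.

```python
import logging

LOG = logging.getLogger(__name__)


class VolumeStillAttached(Exception):
    """Hypothetical error Cinder returns while attachments remain."""


def cleanup_volumes(bdms, delete_volume):
    """Try to delete delete_on_termination volumes without failing.

    ``bdms`` is a list of dicts with ``volume_id`` and
    ``delete_on_termination`` keys (an assumed shape). ``delete_volume``
    is a hypothetical callable wrapping the Cinder delete request.
    Returns the IDs of the volumes that were actually deleted.
    """
    deleted = []
    for bdm in bdms:
        if not bdm["delete_on_termination"]:
            continue
        try:
            delete_volume(bdm["volume_id"])
        except Exception:
            # E.g. another instance still uses this multiattach volume:
            # log it and carry on with the termination.
            LOG.exception("Failed to delete volume %s", bdm["volume_id"])
            continue
        deleted.append(bdm["volume_id"])
    return deleted
```
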

The second aspect of BFV is the boot process. In this case Nova only checks
the 'bootable' flag; the policy check on whether a bootable volume may also be
multiattach happens on the Cinder side.

For cases where Nova creates the volume itself, i.e. source_type is
blank/image/snapshot, Nova should not enable multi-attach for the volume, i.e.
there is no change to the existing code for now.

When a multi-attach volume is attached at boot time (BFV with
source=volume,dest=volume), the request will fail if the scheduler selects a
compute that does not support multi-attach. Later on, a new scheduler filter
that checks the compute capabilities can be added to avoid this failure. This
step is considered a future improvement.
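
Such a filter is future work and not part of this spec; the sketch below only illustrates the idea under assumed names. Host capabilities are modeled as plain dicts with a ``supports_multiattach`` key, and ``filter_hosts`` is a hypothetical function, not the nova scheduler filter interface.

```python
def filter_hosts(hosts, request_has_multiattach_volume):
    """Keep only hosts able to attach the requested volumes.

    ``hosts`` maps host name -> capabilities dict (an assumed shape).
    """
    if not request_has_multiattach_volume:
        # Nothing to check: every host is acceptable.
        return list(hosts)
    return [name for name, caps in hosts.items()
            if caps.get("supports_multiattach", False)]


hosts = {
    "kvm-1": {"supports_multiattach": True},
    "vmware-1": {"supports_multiattach": False},
}
```

Filtering before placement would turn the late failure on the compute into an up-front "no valid host" result when no capable host exists.
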

When we enable the feature, we will have a 'multiattach' policy to enable or
disable the operation entirely on the Cinder side, as noted above. A read-only
policy is a future work item and out of the scope of this spec.

.. todo:: Whether or not a new compute API microversion is needed will be
   determined during implementation and code review. API users will need
   some way to discover whether they can perform volume multiattach, and a
   microversion might be the signal, but it is unclear whether Nova would
   block those requests on a lower microversion, e.g. 2.1. It probably makes
   sense to do a microversion like 2.49 for tagged attach capabilities.

Security impact
---------------

In the libvirt driver, the disk is given a shared SELinux label, so that disk
no longer has strong sVirt SELinux isolation.

The OpenStack volume encryption capability is expected to work out of the box
with this use case as well: using the same key for all connections should not
break how the encryptor works below the clustered file system. The attachment
of an encrypted volume to multiple instances should be tested in Tempest to
catch any unexpected issues.

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Based on the work from Walter Boring and Charlie Zhou.
Agreed with Walter to start the work again.

Assignee(s)
-----------

Primary assignee:
  ildiko-vancsa

Work Items
----------

1. Update the libvirt driver to generate proper domain XML for instances with
   multi-attach volumes.
2. Provide the necessary checks in the Nova API to block the operation in the
   cases listed above.
3. Add Tempest test cases and documentation.

Dependencies
============

* This requires python-cinderclient version 3.2.0 or later.
  Corresponding blueprint:
  https://blueprints.launchpad.net/python-cinderclient/+spec/multi-attach-volume

* Corresponding, implemented spec in Cinder:
  https://blueprints.launchpad.net/cinder/+spec/multi-attach-volume

* Link needed to Cinder spec to address detach issues currently captured here:
  https://etherpad.openstack.org/p/cinder-nova-api-changes

Testing
=======

We will have to add new Tempest tests to support the new Cinder volume
multiattach flag. The new Cinder multiattach flag is what allows a volume to be
attached more than once. For instance, the following scenarios will need to be
tested:

* Attach the same volume to two instances.
* Boot from volume with multiattach.
* Encrypted volume with multiattach.
* Boot from a multi-attachable volume with boot_index=0.
* Negative testing:

  * Trying to attach a non-multiattach volume to multiple instances.

In addition to the above, Cinder volume migration needs to be tested in the
gate, as it triggers swap_volume in Nova.

Documentation Impact
====================

We will have to update the documentation to discuss the new ability to
attach a volume to multiple instances if the Cinder multiattach flag is set
on a volume. The documentation also needs to state that creating these types
of volumes will not be supported through the Nova API, due to the deprecation
of the Nova volume creation API. If a volume needs to allow multiple volume
attachments, it has to be created on the Cinder side with the needed
properties specified.

The documentation also needs to outline that attaching a volume multiple
times in read-write mode can cause data corruption if not handled correctly.
It is the users' responsibility to add some type of exclusion (at the file
system or network file system layer) to prevent multiple writers from
corrupting the data. Examples should be provided, if available, to guide users
on how to do this.

References
==========

* Cinder wiki page that discusses the approach to multi-attach:
  https://wiki.openstack.org/wiki/Cinder/blueprints/multi-attach-volume

* Queens PTG etherpad:
  https://etherpad.openstack.org/p/cinder-ptg-queens-thursday-notes

.. [#] https://review.openstack.org/#/c/509005/
.. [#] http://lists.openstack.org/pipermail/openstack-dev/2016-May/094089.html
.. [#] https://github.com/openstack/nova/blob/295224c41e7da07c5ddbdafc72ac5abf2d708c69/nova/compute/manager.py#L2369
.. [#] https://github.com/openstack/nova/blob/295224c41e7da07c5ddbdafc72ac5abf2d708c69/nova/compute/api.py#L1834

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Kilo
     - Introduced
   * - Liberty
     - Re-approved
   * - Mitaka-1
     - Re-approved
   * - Mitaka-2
     - Updated with API limitations and testing scenarios
   * - Newton
     - Re-approved
   * - Queens
     - Re-proposed