Add spec to use cinder's new attachment API

Cinder has added their new attachment API in Ocata.
This is the spec to describe how Nova can adopt that new API.

The aim is to have a solid base on which we can safely implement support
for Cinder multi-attach.

Previously-approved: Pike

blueprint cinder-new-attach-apis

Change-Id: Ic1f0f1f69bac68f6441f0e6dce103a5bee52524c
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===================================
Use Cinder's new Attach/Detach APIs
===================================
https://blueprints.launchpad.net/nova/+spec/cinder-new-attach-apis
Make Nova use Cinder's new attach/detach APIs.
Problem description
===================
In attempting to implement Cinder multi-attach and to get live migration
working with all drivers, it has become clear that the Cinder and Nova
interaction is not well understood, and that this is leading to both bugs
and other issues when trying to evolve the interaction between the two
projects. Let's create a new, clean interface between Nova and Cinder.
You can see details on the new Cinder API here:
http://specs.openstack.org/openstack/cinder-specs/specs/ocata/add-new-attach-apis.html
Use Cases
---------
The main API actions to consider are:
* Attach a volume to an instance, including during spawning an instance,
and calling os-brick to (optionally) connect the volume backend to the
hypervisor.
The connect is optional because when there is a shared connection from the
host to the volume backend, the backend may already be attached.
* Detach volume from an instance, including (optionally) calling os-brick to
disconnect the volume from the hypervisor host.
* Live-migrate an instance; this involves setting up the volume connection on
the destination host before kicking off the live-migration, then removing the
source host connection once the live-migration has completed. If there is a
rollback, the destination host connection is removed.
* Migrate and resize are very similar to live-migrate, from this new view of
the world.
* Evacuate; we know the old host is no longer running, and we need to attach
the volume to a new host.
* Shelve; we want the volume to stay logically attached to the instance, but
we also need to detach it from the host when the instance is offloaded.
* For the shelved-offloaded case the volume is in a 'reserved' state and not
physically attached.
* Attach/Detach a volume to/from a shelved instance.
* Use swap volume to migrate a volume between two different Cinder backends.
In particular, please note:
* Volume attachment is specific to a host uuid, instance uuid, and volume uuid
* You can have multiple attachments to the same volume, to different instances
(on the same host or different hosts), when the volume is marked
multi_attach=True
* For the same instance uuid and volume uuid, you can have connections on two
different hosts, even when multi_attach=False. This is generally used when
moving a VM.
* Volume connections on a host can be shared with other volumes that are
connected to the same volume backend, depending on the chosen driver.
As such, we need to take care when removing that connection, not to add two
connections by mistake, and not to remove an in-use connection too early.
Cinder needs to provide extra information to Nova: in particular, for each
attachment, whether the connection is shared, and if so, which other
attachments that connection is currently shared with.
Proposed change
===============
Cinder now has two different API flows for attach/detach. We need a way to
switch from the old API to the new API without affecting any existing
instances.
Firstly, we need to decide when it is safe to use the new API. We need to have
the Cinder v3 API configured, and that endpoint should have the micro-version
v3.44 available. In addition we should only use the new API when all of the
nova-compute nodes have been upgraded. We can detect that by comparing the
minimum service version against the version in which support for the new
Cinder API was added. Note, this means we will need to increment the service
version so we can explicitly detect support for the new Cinder API.
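
To make that upgrade gate concrete, the following is a minimal sketch of the
check in Python; the helper callables and the service version constant are
purely illustrative placeholders, not names defined by this spec ::

  CINDER_NEW_ATTACH_MICROVERSION = '3.44'
  NEW_ATTACH_FLOW_SERVICE_VERSION = 30  # placeholder value only

  def can_use_new_attach_flow(cinder_has_microversion,
                              minimum_nova_compute_version):
      """Return True only when every precondition above holds."""
      # The Cinder v3 endpoint must expose microversion 3.44 or later.
      if not cinder_has_microversion(CINDER_NEW_ATTACH_MICROVERSION):
          return False
      # Every nova-compute must already understand attachment ids in the
      # BDM, which we detect via the minimum service version.
      return (minimum_nova_compute_version()
              >= NEW_ATTACH_FLOW_SERVICE_VERSION)
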
If we allow the use of the new API, we can use it for all new attachments.
When adding a new attachment we take the following steps (a sketch of these
calls follows the list):
* (api) call attachment_create, with no connector, before API call returns.
BDM record is updated with attachment_id.
Note, if the volume is not multi_attach=True, Cinder will only allow one
instance_uuid to be associated with the volume. While the long term aim
is to enable multi-attach, this spec will not attach to any volume that has
multi_attach=True. While we could still make a single attachment to such a
volume, since we rely on Cinder to restrict the number of attachments to the
volume, for safety we shouldn't allow any attachments when multi_attach=True
until we have that support fully implemented in Nova.
* (compute) get connector info and use that to call attachment_update.
The API now returns with all the information that needs to be given to
os-brick to attach the volume backend, and how to attach the VM to that
connection to the volume backend.
* (compute) Before we can actually connect to the volume we need to wait for
the volume to be ready and fully provisioned. If we timeout waiting for the
volume to be ready, we fail here and delete the attachment. If this is the
first boot of the instance, that will put the instance into the ERROR state.
If the volume is ready, we can continue with the attach process.
* (compute) use os-brick to connect to the volume backend.
If there are any errors, attempt to call os-brick disconnect
(to double check it is fully cleaned up) and then remove the attachment
in Cinder. If there are any issues in the rollback, put instance into the
ERROR state.
* (compute) now the backend is connected, and the volume is ready, we can
attach the backend connection to the VM in the usual way.
* (compute) we call attachment_complete to mark the attachment and volume
'attached' when all the above operations are successfully completed.
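
The attach sequence above, condensed into a sketch. Here volume_api stands in
for a wrapper around the new attachment_create, attachment_update,
attachment_complete and attachment_delete calls, and brick_connector stands
in for os-brick; the function and parameter names are illustrative, not the
final Nova code ::

  def attach_new_style(context, volume_api, brick_connector, bdm,
                       instance_uuid, volume_id, wait_for_volume_ready,
                       attach_to_guest):
      """Sketch of the new style attach flow described above."""
      # (api) create the attachment with no connector and record its id
      # in the BDM before the API call returns.
      attachment = volume_api.attachment_create(context, volume_id,
                                                instance_uuid)
      bdm.attachment_id = attachment['id']
      bdm.save()

      connection_info = None
      try:
          # (compute) pass the host connector to Cinder; the reply tells
          # os-brick how to reach the volume backend.
          connector = brick_connector.get_connector_properties()
          connection_info = volume_api.attachment_update(
              context, bdm.attachment_id, connector)['connection_info']

          # (compute) the volume must be fully provisioned before we
          # touch the backend.
          wait_for_volume_ready(volume_id)

          # (compute) connect the backend, then attach it to the VM.
          brick_connector.connect_volume(connection_info)
          attach_to_guest(connection_info)

          # (compute) everything worked, mark the attachment complete.
          volume_api.attachment_complete(context, bdm.attachment_id)
      except Exception:
          # Roll back: best effort os-brick disconnect, then delete the
          # attachment so the volume is not left reserved. Callers put
          # the instance into the ERROR state where appropriate.
          if connection_info is not None:
              try:
                  brick_connector.disconnect_volume(connection_info)
              except Exception:
                  pass
          volume_api.attachment_delete(context, bdm.attachment_id)
          bdm.attachment_id = None
          bdm.save()
          raise
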
For a detach:
* (compute) if attachment_id is set in the BDM, we use the new detach flow,
otherwise we fall back to the old detach flow. The new flow is...
* (api) usual checks to see if request is valid
* (compute) detach volume from VM, if fails stop request here
* (compute) call os-brick to disconnect from the volume backend
* (compute) if success, the attachment is deleted in Cinder.
If there was an error, we add an instance fault
and set the instance into the error state.
As above, we can use the presence of the attachment_id in the BDM to decide
if the attachment was made using the new or old flow. Long term we want to
migrate all existing attachments to a new style attachment, but this is left
for a later spec.
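
A sketch of that dispatch; volume_api and brick_connector again stand in for
the Cinder wrapper and os-brick, the old path mirrors today's
terminate_connection plus detach calls, and the names are illustrative ::

  def detach_volume(context, volume_api, brick_connector, bdm, guest,
                    connection_info, connector):
      """Sketch: the presence of attachment_id selects the new flow."""
      # (compute) detach from the VM first; a failure stops the request.
      guest.detach_volume(bdm.device_name)

      # (compute) disconnect the volume backend from this host.
      brick_connector.disconnect_volume(connection_info)

      if bdm.attachment_id:
          # New flow: deleting the attachment lets Cinder clean up any
          # resources tied to this host connection.
          volume_api.attachment_delete(context, bdm.attachment_id)
      else:
          # Old flow: the legacy terminate_connection/detach pair.
          volume_api.terminate_connection(context, bdm.volume_id,
                                          connector)
          volume_api.detach(context, bdm.volume_id)
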
Live-migrate
------------
During live-migration, we start the process by ensuring the volume is attached
on both the source and destination. When a volume is multi_attach=False, and
we are about to start live-migrating VM1, you get a situation like this ::
+-------------------+ +-------------------+
| | | |
| +------------+ | | +--------------+ |
| |VM1 (active)| | | |VM1 (inactive)| |
| +---+--------+ | | +--+-----------+ |
| | | | | |
| | Host 1 | | | Host 2 |
+-------------------+ +-------------------+
| |
+-----------+----------+
|
|
+---------------------------+
| | |
| +---------+---------+ |
| | VolA | |
| +-------------------+ |
| |
| Cinder Backend 1 |
| |
+---------------------------+
Note, in cinder we end up with two attachments for this multi_attach=False
volume:
* attachment 1: VolA, VM1, Host 1
* attachment 2: VolA, VM1, Host 2
Logically we have two attachments to the one non-multi-attach volume. Both
attachments are related to VM1, but there is an attachment for both the
source and destination host for the duration of the live-migration.
Note both attachments are associated with the same instance uuid,
which is why the two attachments are allowed even though multi_attach=False.
Should the live-migration succeed, we will delete attachment 1 (i.e. source
host attachment, host 1) and we are left with just attachment 2
(i.e. destination host attachment, host 2). If there are any failures with
os-brick disconnect on the source host, we put the instance into the ERROR
state and don't delete the attachment in Cinder. We do this to signal to the
operator that something needs manual fixing. We also put the migration into
the error state, as we would even if the failure had been cleanly rolled back.
If we have any failures in the live-migration such that the instance is still
running on host 1, we do the opposite of the above. We attempt os-brick
disconnect on host 2. If success we delete attachment 2, otherwise put the
instance into the ERROR state. If the rollback succeeds we are back to one
attachment again, but in this case it's attachment 1.
So for volumes that have an attachment_id in their BDM, we follow this new
flow of API calls to Cinder (a sketch follows the list):
* (destination) get connector, and create new attachment
* (destination) attach the volume backend
* (source) kicks off live-migration
If live-migration succeeds:
* (source) call os-brick to disconnect
* (source) if success, delete the attachment, otherwise put the
instance into an ERROR state
If live-migration rolls back due to an abort or similar:
* (destination) call os-brick to disconnect
* (destination) if success, delete the attachment, otherwise put the
instance into an ERROR state
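
A sketch of the attachment bookkeeping for this flow; the destination
attachment ids are carried in a plain mapping for illustration, the error
handling that puts the instance into the ERROR state is omitted, and the
helper names are illustrative ::

  def pre_live_migration_attachments(context, volume_api, bdms,
                                     dest_connector):
      """(destination) create and connect an attachment per volume."""
      new_attachments = {}
      for bdm in bdms:
          attachment = volume_api.attachment_create(
              context, bdm.volume_id, bdm.instance_uuid)
          volume_api.attachment_update(
              context, attachment['id'], dest_connector)
          new_attachments[bdm.volume_id] = attachment['id']
      return new_attachments

  def finish_live_migration_attachments(context, volume_api, bdms,
                                        new_attachments, succeeded):
      """Keep exactly one attachment per volume once migration ends."""
      for bdm in bdms:
          if succeeded:
              # (source) after a successful os-brick disconnect, delete
              # the source attachment and keep the destination one.
              volume_api.attachment_delete(context, bdm.attachment_id)
              bdm.attachment_id = new_attachments[bdm.volume_id]
              bdm.save()
          else:
              # Rollback: drop the unused destination attachment and
              # keep pointing at the source attachment.
              volume_api.attachment_delete(
                  context, new_attachments[bdm.volume_id])
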
Migrate
-------
Similar to live-migrate, at the start of the migration we have attachments
for both the source and destination node. On confirm resize we do a detach on
the source; on revert resize we do a detach on the destination.
Evacuate
--------
When you call evacuate, and there is a volume that has an attachment_id in its
BDM, we follow this new flow (a sketch follows the list):
* (source) Nothing happens on the source, it is assumed the administrator
has already fenced the host, and confirmed that by calling force host down.
* (destination) Create a second attachment for this instance_uuid for
any attached volumes
* (destination) Follow the usual volume attach flow
* (destination) Now delete the old attachment to ensure Cinder cleans up any
resources relating to that connection. It is similar to how we call
terminate_connection today, except we must call this after creating the
new attachment to ensure the volume is always reserved to this instance
during the whole of the evacuate process.
* (operator) should the source host ever be restarted, the instances that
have been evacuated are detected in the usual way (using the migration
record created when evacuate is called). This may leave some things not
cleaned up by os-brick, but that is fairly safe, and we are in no worse a
situation than we are today.
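
The ordering matters here: the new attachment is created before the old one
is deleted so the volume stays reserved to the instance throughout. A minimal
sketch, with illustrative helper names ::

  def evacuate_attachments(context, volume_api, bdms, instance_uuid,
                           dest_connector):
      """Sketch of the evacuate attachment handling described above."""
      for bdm in bdms:
          old_attachment_id = bdm.attachment_id

          # (destination) second attachment for the same instance and
          # volume, bound to the new host's connector.
          attachment = volume_api.attachment_create(
              context, bdm.volume_id, instance_uuid)
          volume_api.attachment_update(
              context, attachment['id'], dest_connector)
          bdm.attachment_id = attachment['id']
          bdm.save()

          # (destination) only now delete the old attachment so Cinder
          # can release anything tied to the fenced source host.
          volume_api.attachment_delete(context, old_attachment_id)
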
Shelve and Unshelve
-------------------
When a volume attached to an instance has an attachment_id in the BDM, we
follow this new flow of calls to the Cinder API.
Note: it is possible to have both old flow and new flow volumes attached to
the one instance that is getting shelved.
When offloading from an old host, we first add a new attachment (with no
connector set) then perform a disconnect of the old attachment in the
usual way. This ensures the volume is still attached to the instance,
but is safely detached from the host we are offloading from. Should that
detach fail, the instance should be moved into an ERROR state.
Similarly, when it comes to unshelve, we update the existing attachments
with the connector, before continuing with the usual attach volume flow.
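
A sketch of both operations for a new flow volume; the helper names are
illustrative and the ERROR handling for a failed detach is omitted ::

  def shelve_offload_attachment(context, volume_api, brick_connector,
                                bdm, instance_uuid, connection_info):
      """Keep the volume logically attached while leaving the host."""
      old_attachment_id = bdm.attachment_id

      # Reserve the volume to the instance with a connector-less
      # attachment before touching the old one.
      attachment = volume_api.attachment_create(context, bdm.volume_id,
                                                instance_uuid)
      bdm.attachment_id = attachment['id']
      bdm.save()

      # Disconnect from the host we are offloading from and remove the
      # old, host-specific attachment.
      brick_connector.disconnect_volume(connection_info)
      volume_api.attachment_delete(context, old_attachment_id)

  def unshelve_attachment(context, volume_api, bdm, connector):
      """Re-use the reserved attachment: just hand Cinder a connector."""
      return volume_api.attachment_update(context, bdm.attachment_id,
                                          connector)
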
Swap Volume
-----------
For swap volume, we have one host, one instance, one device path, but
multiple volumes.
In this section, we talk about what happens should the volume being swapped
have the attachment_id present in the BDM, and as such we follow the new flow.
Firstly, there is the flow where Cinder calls our API; secondly, the flow
where a user calls our API. Both flows are covered here (a sketch follows the
list):
* The Nova swap volume API is called to swap uuid-old with uuid-new
* The new volume may have been created by the user in cinder, and the
user may have made the Nova API call.
* Alternatively, the user may have called Cinder's migrate volume API.
That means cinder has created the new volume, and calls the Nova API on
the user's behalf.
* (api) create new attachment for the volume uuid-new, fail API call if we
can't create that attachment
* (compute) update cinder attachment with connector for uuid-new
* (compute) os-brick connect the new volume. If there is an error we
deal with this like a failure during attach, and delete the
attachment to the new volume
* (compute) Nova copies the content of volume uuid-old to volume uuid-new;
in libvirt this is done via a rebase operation
* (compute) once the copy is complete, we detach uuid-old from instance
* (compute) update BDM so the attachment_id now points to the attachment
associated with uuid-new
* (compute) once the old volume is detached, we do an os-brick disconnect
* (compute) for a Nova initiated swap we don't call cinder's
migrate_volume_completion callback. We check the state of the volume in this
one case to ensure it's not 'retyping' or 'migrating'.
* (compute) Update the BDM with a new volume-uuid, based on what
migrate_volume_completion has returned (when called). Note if cinder called
swap, it will have deleted the old volume, but renamed the new volume to have
the same uuid as the old volume had. If someone called Nova, we get back
uuid-new, and we update the BDM to reflect the change.
* so on success we have created a new attachment to the new volume
and deleted the attachment to the old volume.
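
A sketch of the attachment and BDM bookkeeping for a new flow swap; the
guest-side rebase, detach and os-brick calls are assumed to have happened
already, and the migrate_volume_completion wrapper and its save_volume_id
result are illustrative of the behaviour described above rather than an
exact API ::

  def swap_volume_attachments(context, volume_api, bdm, old_volume_id,
                              new_volume_id, connector, cinder_initiated):
      """Sketch of the swap volume bookkeeping described above."""
      old_attachment_id = bdm.attachment_id

      # (api/compute) attachment for uuid-new, bound to this host.
      new_attachment = volume_api.attachment_create(
          context, new_volume_id, bdm.instance_uuid)
      volume_api.attachment_update(context, new_attachment['id'],
                                   connector)

      if cinder_initiated:
          # Cinder started the swap (retype/migrate): tell it the copy
          # is done and let it say which volume id the BDM should keep.
          result = volume_api.migrate_volume_completion(
              context, old_volume_id, new_volume_id)
          save_volume_id = result['save_volume_id']
      else:
          # Nova initiated the swap: no completion callback, the BDM
          # simply points at the new volume.
          save_volume_id = new_volume_id

      # Drop the attachment to the old volume and update the BDM.
      volume_api.attachment_delete(context, old_attachment_id)
      bdm.volume_id = save_volume_id
      bdm.attachment_id = new_attachment['id']
      bdm.save()
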
Note: it is assumed if a volume is multi-attach, the swap operation will fail
and not be allowed. That will be true in either the Cinder or Nova started
case. In time we will likely move to Cinder's migrate_volume_completion API
using attachment_ids instead of volume ids. This spec does not look at what is
needed to support multi-attach, but this problem seemed worth noting here.
Alternatives
------------
We could struggle on, fixing bugs in a "whack-a-mole" way.
There are several ways we could structure the API interactions. One of the
key alternatives is to add lots of state machine complexity into the API so
that the shared connection related locking is handled by Cinder in the API
layer. While having the clients do that locking makes them more complex, it
seemed simpler overall for Nova and other clients to do the locking discussed
above.
Nova could look up the attachment uuid rather than store it in the BDM, but
there is a period where the host uuid is not set, so it seems safer to store
the attachment uuid to avoid any possible confusion about which attachment is
associated with each BDM.
During live-migration we could store the additional attachment_ids in the
migrate data, rather than as part of the BDM.
We could continue to save the connection_info in the BDM to be used when we
detach the volume. While it seems like this might help avoid issues with
changes in the connection info that Nova has not been notified of, this is
really a
premature optimization. We should instead work with Cinder and os-brick to
properly fix any such interaction problems in a way that helps all systems
that work with Cinder.
Data model impact
-----------------
When using the new API flow, we no longer need to store the connection_info,
as we don't need to pass that back to Cinder. Instead we just store the
attachment_id for each host the volume is attached to, and any time we need
the connection_info we fetch that from Cinder.
When an attachment_id is populated, we use the new flow to do all attach or
detach operations. When not present, we use the old flow.
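
As an illustration, any code that needs the connection information can look
it up through the stored attachment when the new flow is in use; the
attachment_get wrapper shown here is an illustrative name for whatever call
fetches a single attachment from Cinder ::

  def get_connection_info(context, volume_api, bdm):
      """Fetch connection_info for a BDM under either flow."""
      if bdm.attachment_id:
          # New flow: the authoritative copy lives in Cinder and is
          # fetched on demand via the stored attachment id.
          attachment = volume_api.attachment_get(context,
                                                 bdm.attachment_id)
          return attachment['connection_info']
      # Old flow: fall back to the copy stored in the BDM.
      return bdm.connection_info
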
REST API impact
---------------
No changes to Nova's REST API.
Security impact
---------------
Nova no longer needs to store the volume connection information, however it is
now available at any time from the Cinder API.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
There should be no impact to performance. The focus here is stability across
all drivers. There may be slightly more API calls between Nova and Cinder,
but this is not expected to significantly impact performance.
Other deployer impact
---------------------
To use this more stable API interaction, and the new features that will depend
on this effort, deployers must upgrade Cinder to a version that supports the
new API.
It is expected we will drop support for older versions of Cinder within
two release cycles of this work being completed.
Developer impact
----------------
Nova and Cinder interactions should be better understood.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Ildiko Vancsa
Other contributors:
Matt Riedemann
John Griffith
Steve Noyes
Work Items
----------
To make progress in the previous cycle and this one we needed to split this
work into small patches. The overall strategy is that we implement new style
attach last, since all the other operations depend on the attachment_id being
in the BDM, and that will not be true until the attach code is merged.
* use Cinder v3 API
* detect if the microversion that includes the new attachment API support is present
* detach a new style BDM/volume attachment - Merged in Pike
* reboot / rebuild (get connection info from cinder using attachment_id)
* live-migration
* migration
* evacuate
* shelve and unshelve
* swap volume - Merged in Pike
* attach (this means we now expose all the previous features)
Note there are more steps before we can support multi-attach, but these are
left for future specs:
* migrate old BDMs to the new BDM flow
* add explicit support for shared backend connections
Dependencies
============
Depends on the Cinder work to add the new API.
This was completed in Ocata.
Testing
=======
We need to functionally test both old and new Cinder interactions. A new case
was added to grenade that creates and attaches a volume to an instance before
the upgrade, and detaches it after the upgrade. There is also an addition in
Tempest to check the volume attachments after live migration. Beyond this,
unit and functional tests are added in Nova to reach proper test coverage for
the new flow.
Documentation Impact
====================
We need to add good developer documentation around the updated
Nova and Cinder interactions.
References
==========
* Cinder API spec:
http://specs.openstack.org/openstack/cinder-specs/specs/ocata/add-new-attach-apis.html
* Merged and open reviews:
https://review.openstack.org/#/q/topic:bp/cinder-new-attach-apis
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Pike
- Introduced
* - Queens
- Re-proposed