RBD instance snapshots

This spec covers using rbd snapshots for instance snapshots rather
than copies to local disk and back into rbd.

blueprint rbd-instance-snapshots
Change-Id: Ia3666d53e663eacf8c65dbffbd4bc847dd948171

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================
RBD Instance Snapshots
======================

https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots

When using RBD as storage for glance and nova, instance snapshots are
slow and inefficient, resulting in a poor end user experience. Using
local disk for the upload increases operator costs for supporting
instance snapshots.

As background reading, the following link provides an overview of the
snapshotting capabilities available in ceph:

http://docs.ceph.com/docs/master/rbd/rbd-snapshot/

Problem description
===================

RBD is often used to back glance images and nova disks. When using rbd
for nova's disks, nova 'snapshots' are slow, since they create full
copies by downloading data from rbd to a local file, uploading it to
glance, and putting it back into rbd. Since raw images are normally
used with rbd to enable copy-on-write clones, this process removes any
sparseness in the data uploaded to glance. This is a problem of user
experience, since this slow, inefficient process takes much longer
than necessary to let users customize images.

For operators, this is also a problem of efficiency and cost. For
rbd-backed nova deployments, this is the last part that uses
significant local disk space.

Use Cases
---------

This allows end users to quickly iterate on images, for example to
customize or update them, and to start using the snapshots far more
quickly.

For operators, this eliminates any need for large local disks on
compute nodes, since instance data in rbd stays in rbd. It also
avoids the wasted space of intermediate full copies.

Project Priority
----------------

None

Proposed change
===============

Instead of copying all the data to local disk, keep it in RBD by
taking an RBD snapshot in Nova and cloning it into Glance. Rather
than uploading the data, just tell Glance about its location in
RBD. This way the data stays in the Ceph cluster, and the snapshot is
far more rapidly usable by the end user.

In broad strokes, the workflow is as follows:

1. Create an RBD snapshot of the ephemeral disk via Nova in
   the ceph pool Nova is configured to use.

2. Clone the RBD snapshot into Glance's RBD pool. [7]

3. To keep from having to manage dependencies between snapshots
   and clones, deep-flatten the RBD clone in Glance's RBD pool and
   detach it from the Nova RBD snapshot in ceph. [7]

4. Remove the RBD snapshot from ceph created in (1), as it is no
   longer needed.

5. Update Glance with the location of the RBD clone created and
   flattened in (2) and (3).

This is the reverse of how images are cloned into nova instance disks
when both are on rbd [0].
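
The workflow above maps naturally onto the python-rbd bindings. The
following is a minimal sketch under assumed names; the pools ('vms',
'images'), image and snapshot names, and ceph.conf path are
illustrative only, and nova's actual implementation differs::

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        vms = cluster.open_ioctx('vms')        # nova's pool (assumed name)
        images = cluster.open_ioctx('images')  # glance's pool (assumed name)
        try:
            # 1. Snapshot the instance's ephemeral disk in nova's pool.
            disk = rbd.Image(vms, 'instance-disk')
            disk.create_snap('nova-snap')
            disk.protect_snap('nova-snap')  # clones require a protected snap
            disk.close()

            # 2. Clone the snapshot into glance's pool.
            rbd.RBD().clone(vms, 'instance-disk', 'nova-snap',
                            images, 'image-uuid')

            # 3. Flatten the clone, detaching it from the nova snapshot.
            clone = rbd.Image(images, 'image-uuid')
            clone.flatten()
            clone.close()

            # 4. The nova-side snapshot is no longer needed; the flattened
            #    clone has no parent, so the snap can be unprotected.
            disk = rbd.Image(vms, 'instance-disk')
            disk.unprotect_snap('nova-snap')
            disk.remove_snap('nova-snap')
            disk.close()

            # 5. Nova would now register the clone's location with glance,
            #    e.g. rbd://<fsid>/images/image-uuid/snap, via the image API.
        finally:
            vms.close()
            images.close()
    finally:
        cluster.shutdown()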

If any of these steps fail, clean up any partial state and fall back
to the current full copy method. Failure of the RBD snapshot method
will be quick and usually transient in nature. The cloud admin can
monitor for these failures and address the underlying Ceph issues
causing the RBD snapshot to fail.
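
In outline, the fallback control flow resembles the sketch below; the
function names here are hypothetical placeholders, not nova's actual
methods::

    import logging

    LOG = logging.getLogger(__name__)

    def snapshot(instance, image_id, direct_snapshot, full_copy_snapshot,
                 cleanup_partial):
        """Try the rbd fast path, falling back to the full copy method."""
        try:
            return direct_snapshot(instance, image_id)
        except Exception:
            # This stack trace is what the cloud admin would monitor for.
            LOG.exception('rbd direct snapshot failed; falling back to '
                          'the full copy method')
            cleanup_partial(instance, image_id)  # drop any partial rbd state
            return full_copy_snapshot(instance, image_id)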

Failures will be reported in the form of stack traces in the nova
compute logs.

There are a few reasons for falling back to full copies instead of
bailing out if efficient snapshots fail:

* It makes upgrades graceful, since nova snapshots still work
  before glance has enough permissions for efficient snapshots
  (see Security Impact for glance permission details).

* Nova snapshots still work when efficient snapshots are not
  possible due to architecture choices, such as not using rbd as
  a glance backend, or using different ceph clusters for glance
  and nova.

* This is consistent with existing rbd behavior in nova and cinder.
  If cloning from a glance image fails, both projects fall back
  to full copies when creating volumes or instance disks.

Alternatives
------------

The clone flatten step could be handled as a background task in a
green thread, or completely asynchronously as a periodic task. This
would improve user-facing performance, as the snapshots would be
available for use immediately, but it would also introduce
race-condition-like issues around deleting dependent images.

The flatten step could be omitted completely, and glance could be
made responsible for tracking the various image dependencies. At
the rbd level, an instance snapshot would consist of three things
for each disk. This is true of any instance, regardless of whether
it was itself created from a snapshot or from an ordinary image. In
rbd, there would be:

1. a snapshot of the instance disk

2. a clone of the instance disk

3. a snapshot of the clone

(3) is exposed through glance's backend location.
(2) is an internal detail of glance.
(1) is an internal detail that nova and glance handle.

At the rbd level, a disk with snapshots can't be deleted. This would
be hidden from a user who deletes an instance with snapshots by making
glance responsible for the eventual deletion of its disks, once their
dependent snapshots are deleted. Nova would signal this by renaming
instance disks that it deletes in rbd, so glance is aware that they
can be deleted.

When a glance snapshot is deleted, glance deletes (3), then (2), then
(1). If nova has renamed the parent disk in rbd with a preset suffix,
the instance has already been destroyed, so glance also tries to
delete the original instance disk. The original instance disk is
successfully deleted once its last snapshot is removed.
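
A sketch of that deletion order with python-rbd follows; the pool
names, snapshot name ('snap'), and rename suffix are all assumptions
for illustration, and this alternative was not adopted::

    import rbd

    DELETED_SUFFIX = '.deleted_by_nova'  # assumed rename marker

    def delete_glance_snapshot(vms_ioctx, images_ioctx, parent, clone):
        # (3) then (2): remove the clone's snapshot, then the clone itself.
        img = rbd.Image(images_ioctx, clone)
        img.unprotect_snap('snap')
        img.remove_snap('snap')
        img.close()
        rbd.RBD().remove(images_ioctx, clone)

        # (1): remove the snapshot on the original instance disk.
        disk = rbd.Image(vms_ioctx, parent)
        disk.unprotect_snap('snap')
        disk.remove_snap('snap')
        disk.close()

        # If nova renamed the disk, the instance is already destroyed, so
        # try deleting the disk; this succeeds once its last snapshot is
        # gone.
        if parent.endswith(DELETED_SUFFIX):
            rbd.RBD().remove(vms_ioctx, parent)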

If glance snapshots are created but deleted before the instance is
destroyed, nova will delete the instance disks as usual.

The mechanism nova uses to let glance know it needs to clean up the
original disk could be different. It could use an image property with
certain restrictions which aren't possible in the current glance api:

* it must be writeable only once

* to avoid exposing backend details, it would need to be hidden
  from end users

Storing this state in ceph makes it much easier to keep consistent
with ceph, rather than in an external database which could fall out
of sync. It would also be an odd abstraction leak in the glance_store
api, when upper layers don't need to be aware of it at all.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

Glance will need to be configured with direct_url support enabled
in order for Nova to determine what to clone the image from, and
where. Depending on system configuration, this could leak backend
credentials [5]. Devstack has already been updated to switch
behaviors when Ceph support is requested [6].
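
For reference, direct URL support is enabled with the
show_image_direct_url option in glance-api.conf::

    [DEFAULT]
    show_image_direct_url = True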

Documentation has typically recommended using different ceph pools
for glance and nova, with different access to each. Since nova
would need to be able to create the snapshot in the pool used by
glance, it would need write access to this pool as well.

Notifications impact
--------------------

None

Performance Impact
------------------

Snapshots of RBD-backed instances would be significantly faster.

Other end user impact
---------------------

Snapshots become usable by end users far sooner after they are
requested.

Other deployer impact
---------------------

To use this in an existing installation with cephx authentication,
'allow rwx pool=images' must be added to nova's ceph user
capabilities. The 'ceph auth caps' command can be used for this [1].
If these permissions are not updated, nova will continue using the
existing full copy mechanism for instance snapshots, because the fast
snapshot will fail and nova compute will fall back to the full copy
method.
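
As an illustration, assuming nova's ceph user is named client.nova and
its own pool is named 'vms', the update might look like the following.
Note that 'ceph auth caps' replaces the user's existing capabilities,
so they must be restated in full::

    ceph auth caps client.nova mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=vms, allow rwx pool=images'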

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  nic

Other contributors:
  jdurgin
  pbrady
  nagyz
  cfb-n/cburgess

Work Items
----------

Implementation: [4]

The libvirt imagebackend does not currently recognize AMI images
as raw (and therefore cloneable), so this proposed change is of
limited utility with a very popular image format. This should be
addressed in a separate change.

Dependencies
============

Glance from Havana or newer is required, since direct URL support was
added in Havana.

Testing
=======

The existing tempest tests with ceph in the gate cover instance
snapshots generically. Since fast snapshots are enabled automatically,
there is no need to change the tempest tests. Additionally, unit tests
in nova will verify error handling (falling back to full copies if the
process fails), and verify that, when configured correctly, rbd
snapshots and clones are used rather than full copies.
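
A minimal sketch of the kind of unit test involved, reusing the
hypothetical fallback shape from the Proposed change section (the
names are illustrative, not nova's real classes)::

    import unittest
    from unittest import mock

    class FakeDriver(object):
        def direct_snapshot(self, instance, image_id):
            raise NotImplementedError  # stands in for the rbd fast path

        def full_copy_snapshot(self, instance, image_id):
            return 'full-copy'

        def snapshot(self, instance, image_id):
            try:
                return self.direct_snapshot(instance, image_id)
            except Exception:
                return self.full_copy_snapshot(instance, image_id)

    class TestSnapshotFallback(unittest.TestCase):
        def test_fast_path_used_when_configured_correctly(self):
            driver = FakeDriver()
            with mock.patch.object(driver, 'direct_snapshot',
                                   return_value='rbd-clone'):
                self.assertEqual('rbd-clone',
                                 driver.snapshot('inst', 'img'))

        def test_falls_back_to_full_copy_on_failure(self):
            driver = FakeDriver()
            with mock.patch.object(driver, 'direct_snapshot',
                                   side_effect=RuntimeError('ceph down')):
                self.assertEqual('full-copy',
                                 driver.snapshot('inst', 'img'))

    if __name__ == '__main__':
        unittest.main()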

Documentation Impact
====================

See the security and other deployer impact sections above.

References
==========

[0] http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/rbd-clone-image-handler.html

[1] Ceph authentication docs:
    http://ceph.com/docs/master/rados/operations/user-management/#modify-user-capabilities

[2] Alternative: Glance cleanup patch: https://review.openstack.org/127397

[3] Alternative: Nova patch: https://review.openstack.org/125963

[4] Nova patch: https://review.openstack.org/205282

[5] https://bugs.launchpad.net/glance/+bug/880910

[6] https://review.openstack.org/206039

[7] http://docs.ceph.com/docs/master/dev/rbd-layering/