..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================
RBD Instance Snapshots
======================
https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots
When using RBD as storage for glance and nova, instance snapshots are
slow and inefficient, resulting in poor end user experience. Using
local disk for the upload increases operator costs for supporting
instance snapshots.
As background reading, the following link provides an overview of the
snapshotting capabilities available in ceph:
http://docs.ceph.com/docs/master/rbd/rbd-snapshot/
Problem description
===================
RBD is often used to back glance images and nova disks. When using rbd
for nova's disks, nova 'snapshots' are slow, since they create full
copies by downloading data from rbd to a local file, uploading it to
glance, and putting it back into rbd. Since raw images are normally
used with rbd to enable copy-on-write clones, this process removes any
sparseness in the data uploaded to glance. This is a problem of user
experience, since this slow, inefficient process takes much longer
than necessary to let users customize images.
For operators, this is also a problem of efficiency and cost. For
rbd-backed nova deployments, instance snapshots are the last part of
nova that uses significant local disk space.
Use Cases
----------
This allows end users to quickly iterate on images, for example to
customize or update them, and start using the snapshots far more
quickly.
For operators, this eliminates any need for large local disks on
compute nodes, since instance data in rbd stays in rbd. It also
avoids the wasted space of redundant full copies.
Project Priority
-----------------
None
Proposed change
===============
Instead of copying all the data to local disk, keep it in RBD by
taking an RBD snapshot in Nova and cloning it into Glance. Rather
than uploading the data, just tell Glance about its location in
RBD. This way data stays in the Ceph cluster, and the snapshot is
far more rapidly usable by the end user.
In broad strokes, the workflow is as follows (a sketch using the rbd
python bindings appears below):

1. Create an RBD snapshot of the ephemeral disk via Nova in
   the ceph pool Nova is configured to use.
2. Clone the RBD snapshot into Glance's RBD pool. [7]
3. To avoid having to manage dependencies between snapshots
   and clones, deep-flatten the RBD clone in Glance's RBD pool and
   detach it from the Nova RBD snapshot in ceph. [7]
4. Remove the RBD snapshot created in (1) from ceph, as it is no
   longer needed.
5. Update Glance with the location of the RBD clone created and
   flattened in (2) and (3).
This is the reverse of how images are cloned into nova instance disks
when both are on rbd [0].
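
The following is a minimal sketch of steps (1) through (4) using the
rados and rbd python bindings. The pool names ('vms' and 'images'),
image names, and snapshot name are illustrative assumptions, not
nova's actual naming::

  import rados
  import rbd

  SNAP = 'nova-snap'

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='nova')
  cluster.connect()
  try:
      nova_ioctx = cluster.open_ioctx('vms')
      glance_ioctx = cluster.open_ioctx('images')

      # (1) snapshot the instance disk; cloning requires the
      # snapshot to be protected
      disk = rbd.Image(nova_ioctx, 'instance-0001_disk')
      disk.create_snap(SNAP)
      disk.protect_snap(SNAP)

      # (2) clone the snapshot into glance's pool
      rbd.RBD().clone(nova_ioctx, 'instance-0001_disk', SNAP,
                      glance_ioctx, 'new-image-uuid',
                      features=rbd.RBD_FEATURE_LAYERING)

      # (3) flatten the clone so it no longer depends on the
      # snapshot (a plain flatten for simplicity; the spec's
      # deep-flatten variant also covers the clone's own snapshots)
      clone = rbd.Image(glance_ioctx, 'new-image-uuid')
      clone.flatten()
      clone.close()

      # (4) the snapshot is no longer needed once the clone is
      # detached
      disk.unprotect_snap(SNAP)
      disk.remove_snap(SNAP)
      disk.close()
  finally:
      cluster.shutdown()

Step (5) is a glance API call registering the clone's new location
(glance's rbd URLs take the form rbd://<fsid>/<pool>/<image>/<snapshot>)
rather than an rbd operation.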
If any of these steps fail, clean up any partial state and fall back
to the current full copy method. Failures of the RBD snapshot method
will be detected quickly and are usually transient in nature. The
cloud admin can monitor for these failures and address the underlying
Ceph issues causing the RBD snapshot to fail. Failures will be
reported in the form of stack traces in the nova compute logs.
There are a few reasons for falling back to full copies instead of
bailing out if efficient snapshots fail:
* It makes upgrades graceful, since nova snapshots still work
  before glance has enough permissions for efficient snapshots
  (see Security Impact for glance permission details).
* Nova snapshots still work when efficient snapshots are not
  possible due to architecture choices, such as not using rbd as
  a glance backend, or using different ceph clusters for glance
  and nova.
* This is consistent with existing rbd behavior in nova and cinder.
  If cloning from a glance image fails, both projects fall back
  to full copies when creating volumes or instance disks.
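
A hedged sketch of this fallback logic; the helper names
(_direct_snapshot, _cleanup_failed_snapshot, _full_copy_snapshot) are
hypothetical stand-ins for the efficient path, its cleanup, and the
existing path::

  import logging

  LOG = logging.getLogger(__name__)

  class LibvirtDriverSketch(object):
      def snapshot(self, context, instance, image_id):
          """Snapshot an instance, preferring the efficient rbd path."""
          try:
              self._direct_snapshot(context, instance, image_id)
          except Exception:
              # the failure shows up as a stack trace in the compute
              # log, then the existing full-copy path takes over
              LOG.exception('rbd direct snapshot failed; falling back')
              self._cleanup_failed_snapshot(instance, image_id)
              self._full_copy_snapshot(context, instance, image_id)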
Alternatives
------------
The clone flatten step could be handled as a background task in a
green thread, or completely asynchronously as a periodic task. This
would increase user-facing performance, as the snapshots would be
available for use immediately, but it would also introduce
race-condition-like issues around deleting dependent images.
The flatten step could be omitted completely, and glance could be
made responsible for tracking the various image dependencies. At
the rbd level, an instance snapshot would consist of three things
for each disk. This is true of any instance, regardless of whether
it was created from a snapshot itself or from an ordinary image. In
rbd, there would be:
1. a snapshot of the instance disk
2. a clone of the instance disk
3. a snapshot of the clone
(3) is exposed through glance's backend location.
(2) is an internal detail of glance.
(1) is an internal detail that nova and glance handle.
At the rbd level, a disk with snapshots can't be deleted. Hide this
from the user if they delete an instance with snapshots by making
glance responsible for their eventual deletion, once their dependent
snapshots are deleted. Nova does this by renaming instance disks that
it deletes in rbd, so glance is aware that they can be deleted.
When a glance snapshot is deleted, it deletes (3), then (2), and
(1). If nova has renamed its parent in rbd with a preset suffix, the
instance has been destroyed already, so glance tries to delete the
original instance disk. The original instance disk will be
successfully deleted when the last snapshot is removed.
If glance snapshots are created but deleted before the instance is
destroyed, nova will delete the instance disks as usual.
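
A rough sketch of that deletion order in glance's rbd store follows;
the rename suffix and all names here are hypothetical illustrations::

  import rbd

  DELETED_SUFFIX = '_to_be_deleted_by_glance'  # hypothetical marker

  def delete_snapshot_image(nova_ioctx, glance_ioctx, disk_name,
                            clone_name, snap_name):
      # (3) the snapshot of the clone, exposed via glance's location
      clone = rbd.Image(glance_ioctx, clone_name)
      clone.unprotect_snap(snap_name)
      clone.remove_snap(snap_name)
      clone.close()
      # (2) the clone itself
      rbd.RBD().remove(glance_ioctx, clone_name)
      # (1) the snapshot of the original instance disk
      disk = rbd.Image(nova_ioctx, disk_name)
      disk.unprotect_snap(snap_name)
      disk.remove_snap(snap_name)
      has_snaps = bool(list(disk.list_snaps()))
      disk.close()
      # if nova already renamed the disk for deletion and this was
      # its last snapshot, glance removes the original disk as well
      if disk_name.endswith(DELETED_SUFFIX) and not has_snaps:
          rbd.RBD().remove(nova_ioctx, disk_name)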
The mechanism nova uses to let glance know it needs to clean up the
original disk could be different. It could use an image property with
certain restrictions which aren't possible in the current glance api:
* it must be writeable only once
* to avoid exposing backend details, it would need to be hidden
  from end users
Storing this state in ceph is much easier to keep consistent with
ceph than storing it in an external database, which could fall out of
sync. An image property would also be an odd abstraction leak in the
glance_store api, when upper layers don't need to be aware of it at
all.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
Glance will need to be configured with direct_url support enabled
in order for Nova to determine what and where to clone the image
from. Depending on system configuration, this could leak backend
credentials [5]. Devstack has already been updated to switch
behaviors when Ceph support is requested [6].
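
A minimal example of enabling this (the option name is real; where it
is set depends on the deployment)::

  # glance-api.conf
  [DEFAULT]
  show_image_direct_url = True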
Documentation has typically recommended using different ceph pools
for glance and nova, with different access to each. Since nova
would need to be able to create the snapshot in the pool used by
glance, it would need write access to this pool as well.
Notifications impact
--------------------
None
Performance Impact
------------------
Snapshots of RBD-backed instances would be significantly faster.
Other end user impact
---------------------
End users would see snapshot images become usable far more quickly,
letting them iterate on image customization without long waits.
Other deployer impact
---------------------
To use this in an existing installation with cephx authentication,
adding 'allow rwx pool=images' to nova's ceph user capabilities is
necessary. The 'ceph auth caps' command can be used for this [1]. If
these permissions are not updated, the fast snapshot will fail and
nova compute will fall back to the existing full copy mechanism for
instance snapshots.
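
For example, assuming nova's ceph user is named client.nova and its
existing osd caps cover the 'vms' pool (both names are illustrative).
Note that 'ceph auth caps' replaces the entire capability set, so
existing caps must be restated::

  ceph auth caps client.nova mon 'allow r' \
      osd 'allow rwx pool=vms, allow rwx pool=images'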
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
  nic

Other contributors:
  jdurgin
  pbrady
  nagyz
  cfb-n/cburgess
Work Items
----------
Implementation: [4]
The libvirt imagebackend does not currently recognize AMI images
as raw (and therefore cloneable), so this proposed change is of
limited utility with that very popular image format. This should be
addressed in a separate change.
Dependencies
============
Glance must be Havana or newer, since direct URL support was added
in Havana.
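
For illustration, glance's rbd direct URLs take the form
rbd://<fsid>/<pool>/<image>/<snapshot>; a minimal parser (a sketch,
not nova's actual helper)::

  def parse_rbd_url(url):
      """Split rbd://<fsid>/<pool>/<image>/<snapshot> into its parts."""
      prefix = 'rbd://'
      if not url.startswith(prefix):
          raise ValueError('not an rbd URL: %s' % url)
      pieces = url[len(prefix):].split('/')
      if len(pieces) != 4:
          raise ValueError('expected rbd://fsid/pool/image/snap: %s'
                           % url)
      return pieces  # [fsid, pool, image, snapshot]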
Testing
=======
The existing tempest tests with ceph in the gate cover instance
snapshots generically. As fast snapshots are enabled automatically, there
is no need to change the tempest tests. Additionally, unit tests in nova
will verify error handling (falling back to full copies if the process
fails), and make sure that when configured correctly rbd snapshots and
clones are used rather than full copies.
Documentation Impact
====================
See the security and other deployer impact sections above.
References
==========
[0] RBD clone image handler spec: http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/rbd-clone-image-handler.html
[1] Ceph authentication docs: http://ceph.com/docs/master/rados/operations/user-management/#modify-user-capabilities
[2] Alternative: Glance cleanup patch: https://review.openstack.org/127397
[3] Alternative: Nova patch: https://review.openstack.org/125963
[4] Nova patch: https://review.openstack.org/205282
[5] Glance bug on leaking backend credentials: https://bugs.launchpad.net/glance/+bug/880910
[6] Devstack Ceph support change: https://review.openstack.org/206039
[7] RBD layering documentation: http://docs.ceph.com/docs/master/dev/rbd-layering/