Copy-on-write cloning for RBD-backed disks
Currently RBD-backed ephemeral disks are created by downloading an image from
Glance to a local file, then uploading that file into RBD. Even if the file is
cached, uploading may take a long time, since 'rbd import' is synchronous and
slow. If the image is already stored in RBD by Glance, there's no need for any
local copies - it can be cloned to a new image for a new disk without copying
the data at all.

blueprint rbd-clone-image-handler

Change-Id: Ie71699e845e14251233efc993a690713f1118c31
specs/juno/rbd-clone-image-handler.rst (new file, 222 lines)
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================================================
Storage: Copy-on-write cloning for RBD-backed disks
===================================================

https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler

Currently RBD-backed ephemeral disks are created by downloading an image from
Glance to a local file, then uploading that file into RBD. Even if the file is
cached, uploading may take a long time, since 'rbd import' is synchronous and
slow. If the image is already stored in RBD by Glance, there's no need for any
local copies - it can be cloned to a new image for a new disk without copying
the data at all.

Problem description
===================

The primary use case that benefits from this change is launching an instance
from a Glance image where the Ceph RBD backend is enabled for both Glance and
Nova, and Glance images are stored in RBD in RAW format.

The following problems are addressed:

* Disk space on compute nodes is wasted by caching an additional copy of the
  image on each compute node that runs instances from that image.

* Disk space in Ceph is wasted by uploading a full copy of an image instead of
  creating a copy-on-write clone.

* Network capacity is wasted by downloading the image from RBD to a compute
  node the first time that node launches an instance from that image, and by
  uploading the image to RBD every time a new instance is launched from the
  same image.

* The increased time required to launch an instance reduces the elasticity of
  the cloud environment and increases the number of in-flight operations that
  have to be maintained by Nova.

Proposed change
===============

Extract the RBD-specific utility code into a new file, aligning its structure
and functionality with the similar code in Cinder. This includes the volume
cleanup code, which should be converted from the rbd CLI to the RBD Python
library, roughly as sketched below.
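
The following is a minimal, illustrative sketch of library-based cleanup, not
the actual Nova patch; the helper name ``remove_rbd_volume`` and its default
arguments are assumptions made for this example.

.. code-block:: python

    import rados
    import rbd


    def remove_rbd_volume(pool, name, conffile='/etc/ceph/ceph.conf',
                          rados_id='nova'):
        """Delete an RBD volume and its snapshots via librbd, not the CLI."""
        client = rados.Rados(conffile=conffile, rados_id=rados_id)
        client.connect()
        ioctx = client.open_ioctx(pool)
        try:
            try:
                image = rbd.Image(ioctx, name)
            except rbd.ImageNotFound:
                return  # nothing to clean up
            try:
                # Snapshots (e.g. a protected clone parent) must be
                # unprotected and removed before the image can be deleted.
                for snap in image.list_snaps():
                    if image.is_protected_snap(snap['name']):
                        image.unprotect_snap(snap['name'])
                    image.remove_snap(snap['name'])
            finally:
                image.close()
            rbd.RBD().remove(ioctx, name)
        finally:
            ioctx.close()
            client.shutdown()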

Add utility functions to support cloning, including checks for whether an
image exists and whether it can be cloned.

Add a direct_fetch() method to nova.virt.libvirt.imagebackend and make its
implementation in the Rbd subclass try to clone the image when possible. The
following criteria are used to determine whether the image can be cloned (a
sketch of such a check follows the list):

* The image location uses the rbd:// schema and contains a valid reference to
  an RBD snapshot;

* The image location references the same Ceph cluster as the Nova
  configuration;

* The image disk format is 'raw';

* The RBD snapshot referenced by the image location is accessible by Nova.
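
A minimal sketch of such a check is shown below. It is illustrative only: the
names ``parse_rbd_url``, ``is_cloneable`` and ``snapshot_exists`` and their
signatures are assumptions for this example, not the final Nova API.

.. code-block:: python

    import urllib.parse


    def parse_rbd_url(url):
        """Split rbd://fsid/pool/image/snapshot into its four components."""
        prefix = 'rbd://'
        if not url or not url.startswith(prefix):
            raise ValueError('not an rbd:// URL: %r' % url)
        pieces = [urllib.parse.unquote(piece)
                  for piece in url[len(prefix):].split('/')]
        if len(pieces) != 4 or '' in pieces:
            raise ValueError('malformed rbd:// URL: %r' % url)
        return pieces


    def is_cloneable(location, image_meta, local_fsid, snapshot_exists):
        """Check whether the image behind 'location' can be COW-cloned.

        'snapshot_exists' is a callable(pool, image, snapshot) -> bool that
        verifies Nova's Ceph credentials can reach the parent snapshot.
        """
        if image_meta.get('disk_format') != 'raw':
            # Cloning a non-raw image would expose e.g. a qcow2 header to
            # the guest as raw disk content.
            return False
        try:
            fsid, pool, image, snapshot = parse_rbd_url(
                location.get('url', ''))
        except ValueError:
            return False
        if fsid != local_fsid:
            # The image lives in a different Ceph cluster than Nova uses.
            return False
        return snapshot_exists(pool, image, snapshot)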

Extend fetch_to_raw() in nova.virt.images to try direct_fetch() when a new
optional backend parameter is passed. Make the libvirt driver pass the backend
parameter.
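
The intended control flow, and the clone operation at the heart of it, could
look roughly like the sketch below. The names ``DirectFetchFailed``,
``fetch_and_convert`` and ``clone_into`` are placeholders invented for this
example; only ``rbd.RBD().clone()`` is the real librbd call.

.. code-block:: python

    import rbd


    class DirectFetchFailed(Exception):
        """Raised when a backend cannot fetch an image natively."""


    def fetch_and_convert(context, image_href, path):
        """Placeholder for the existing download-and-convert code path."""
        raise NotImplementedError()


    def fetch_to_raw(context, image_href, path, backend=None):
        """Prefer a backend-native fetch, fall back to downloading."""
        if backend is not None:
            try:
                backend.direct_fetch(context, image_href)
                return
            except DirectFetchFailed:
                # Not cloneable (different cluster, non-raw image, ...):
                # fall back to the regular download path.
                pass
        fetch_and_convert(context, image_href, path)


    def clone_into(parent_ioctx, parent_name, parent_snap,
                   dest_ioctx, dest_name):
        """Create a copy-on-write clone of a protected parent snapshot.

        No image data is copied; the new disk only references the parent
        snapshot until its blocks are overwritten.
        """
        rbd.RBD().clone(parent_ioctx, parent_name, parent_snap,
                        dest_ioctx, dest_name,
                        features=rbd.RBD_FEATURE_LAYERING)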

Instead of calling disk.get_disk_size() directly from verify_base_size(), which
assumes the disk is stored locally, add a new method that is overridden by the
Rbd subclass to get the disk size.
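
A sketch of what that override could look like; the class name and the
passed-in ``ioctx`` (an open rados I/O context for the ephemeral pool, set up
as in the cleanup sketch above) are assumptions for this example.

.. code-block:: python

    import rbd


    class RbdImageBackend(object):
        """Sketch of an Rbd image backend overriding get_disk_size()."""

        def __init__(self, ioctx):
            self.ioctx = ioctx

        def get_disk_size(self, name):
            # The base class would call disk.get_disk_size() on a local
            # path; for RBD the virtual size is read via librbd instead.
            image = rbd.Image(self.ioctx, name)
            try:
                return image.size()
            finally:
                image.close()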

Alternatives
------------

An alternative implementation based on the image-multiple-location blueprint
(https://blueprints.launchpad.net/glance/+spec/multiple-image-locations) was
tried in Icehouse. It was ultimately reverted, for a combination of reasons:

* The implementation in https://review.openstack.org/33409 took a long time to
  stabilize, and didn't land until hours before the Icehouse feature freeze.

* The impact of https://review.openstack.org/33409 was significantly larger
  than that of the ephemeral RBD clone change that was built on top of it.

* The impact included exposing the nova.image.glance._get_locations() method,
  which relies on Glance API v2, to code paths that assume Glance API v1,
  which caused LP bug #1291014
  (https://bugs.launchpad.net/nova/+bug/1291014).

This design has a significantly smaller footprint, and is mostly isolated to
the RBD image backend in the libvirt driver.

Data model impact
-----------------

None.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

When the Ceph RBD backend is enabled for Glance and Nova, there will be a
noticeable difference in launch time and resource consumption between
instances launched from Glance images in RAW format and those launched from
images in non-RAW formats.

Performance Impact
------------------

In the primary use case defined in the `Problem description`_ section above,
there will be a significant performance improvement.

In other use cases, the libvirt driver will introduce one more API call to
Glance to retrieve the list of image locations when the RBD backend is
enabled. The performance impact of that call is insignificant compared to the
time and resources it takes to fetch a full image from Glance.

Other deployer impact
---------------------

None.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  jdurgin

Other contributors:
  angdraug

Work Items
----------

The current implementation (see `References`_) consists of the following
changes:

* Move libvirt RBD utilities to a new file

* Use library instead of CLI to cleanup RBD volumes

* Enable cloning for rbd-backed ephemeral disks


Dependencies
============

None.

Testing
=======

This is a non-functional change with no impact on the test cases that need to
be covered.

There is ongoing work to get all of Tempest running against an environment
using Ceph in OpenStack CI. The first step is Ceph support for devstack, which
you can see here:

https://review.openstack.org/#/c/65113

There is also a test devstack patch which forces Ceph to be enabled, which
results in all of the devstack jobs being run with Ceph enabled. You can find
that here:

https://review.openstack.org/#/c/107472/

Some tests are failing (14 and 15 in the first couple of runs). However, that
also means that the vast majority of the tests that cover this code (anything
that spawns an instance) are passing. So we at least have a way to run these
tests on demand against master. Once the devstack patch merges, we will enable
a job that can run against patches in all projects (perhaps experimental to
start with).

Fuel CI also includes a suite of tests for OpenStack deployments with Ceph:
https://github.com/stackforge/fuel-main/blob/master/fuelweb_test/tests/test_ceph.py

Documentation Impact
====================

None.


References
==========

Mailing list discussions:

* http://lists.openstack.org/pipermail/openstack-dev/2014-March/029127.html

* http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-March/008659.html

Current implementation:

* https://github.com/angdraug/nova/tree/rbd-ephemeral-clone

* https://review.openstack.org/#/q/status:open+topic:bp/rbd-clone-image-handler,n,z