Copy-on-write cloning for RBD-backed disks
Currently RBD-backed ephemeral disks are created by downloading an image from
Glance to a local file, then uploading that file into RBD. Even if the file is
cached, uploading may take a long time, since 'rbd import' is synchronous and
slow. If the image is already stored in RBD by Glance, there's no need for any
local copies - it can be cloned to a new image for a new disk without copying
the data at all.

blueprint rbd-clone-image-handler

Change-Id: Ie71699e845e14251233efc993a690713f1118c31
specs/juno/rbd-clone-image-handler.rst (new file, 222 lines)
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================================================
Storage: Copy-on-write cloning for RBD-backed disks
===================================================

https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler

Currently RBD-backed ephemeral disks are created by downloading an image from
Glance to a local file, then uploading that file into RBD. Even if the file is
cached, uploading may take a long time, since 'rbd import' is synchronous and
slow. If the image is already stored in RBD by Glance, there's no need for any
local copies - it can be cloned to a new image for a new disk without copying
the data at all.

Problem description
===================

The primary use case that benefits from this change is launching an instance
from a Glance image where the Ceph RBD backend is enabled for both Glance and
Nova, and Glance images are stored in RBD in RAW format.

The following problems are addressed:

* Disk space on compute nodes is wasted by caching an additional copy of the
  image on each compute node that runs instances from that image.

* Disk space in Ceph is wasted by uploading a full copy of an image instead of
  creating a copy-on-write clone.

* Network capacity is wasted by downloading the image from RBD to a compute
  node the first time that node launches an instance from that image, and by
  uploading the image to RBD every time a new instance is launched from the
  same image.

* The increased time required to launch an instance reduces the elasticity of
  the cloud environment and increases the number of in-flight operations that
  have to be maintained by Nova.

Proposed change
===============

Extract the RBD-specific utility code into a new file, aligning its structure
and functionality with the similar code in Cinder. This includes the volume
cleanup code, which should be converted from the rbd CLI to the RBD Python
library, roughly as sketched below.
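
The following is a minimal, illustrative sketch of library-based cleanup, not
the actual Nova patch; the helper name ``remove_rbd_volume`` and its default
arguments are assumptions made for this example.

.. code-block:: python

    import rados
    import rbd


    def remove_rbd_volume(pool, name, conffile='/etc/ceph/ceph.conf',
                          rados_id='nova'):
        """Delete an RBD volume and its snapshots via librbd, not the CLI."""
        client = rados.Rados(conffile=conffile, rados_id=rados_id)
        client.connect()
        ioctx = client.open_ioctx(pool)
        try:
            try:
                image = rbd.Image(ioctx, name)
            except rbd.ImageNotFound:
                return  # nothing to clean up
            try:
                # Snapshots (e.g. a protected clone parent) must be
                # unprotected and removed before the image can be deleted.
                for snap in image.list_snaps():
                    if image.is_protected_snap(snap['name']):
                        image.unprotect_snap(snap['name'])
                    image.remove_snap(snap['name'])
            finally:
                image.close()
            rbd.RBD().remove(ioctx, name)
        finally:
            ioctx.close()
            client.shutdown()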

Add utility functions to support cloning, including checks for whether an
image exists and whether it can be cloned.

Add a direct_fetch() method to nova.virt.libvirt.imagebackend and make its
implementation in the Rbd subclass try to clone the image when possible. The
following criteria are used to determine whether the image can be cloned (a
sketch of such a check follows the list):

* The image location uses the rbd:// schema and contains a valid reference to
  an RBD snapshot;

* The image location references the same Ceph cluster as the Nova
  configuration;

* The image disk format is 'raw';

* The RBD snapshot referenced by the image location is accessible by Nova.
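
A minimal sketch of such a check is shown below. It is illustrative only: the
names ``parse_rbd_url``, ``is_cloneable`` and ``snapshot_exists`` and their
signatures are assumptions for this example, not the final Nova API.

.. code-block:: python

    import urllib.parse


    def parse_rbd_url(url):
        """Split rbd://fsid/pool/image/snapshot into its four components."""
        prefix = 'rbd://'
        if not url or not url.startswith(prefix):
            raise ValueError('not an rbd:// URL: %r' % url)
        pieces = [urllib.parse.unquote(piece)
                  for piece in url[len(prefix):].split('/')]
        if len(pieces) != 4 or '' in pieces:
            raise ValueError('malformed rbd:// URL: %r' % url)
        return pieces


    def is_cloneable(location, image_meta, local_fsid, snapshot_exists):
        """Check whether the image behind 'location' can be COW-cloned.

        'snapshot_exists' is a callable(pool, image, snapshot) -> bool that
        verifies Nova's Ceph credentials can reach the parent snapshot.
        """
        if image_meta.get('disk_format') != 'raw':
            # Cloning a non-raw image would expose e.g. a qcow2 header to
            # the guest as raw disk content.
            return False
        try:
            fsid, pool, image, snapshot = parse_rbd_url(
                location.get('url', ''))
        except ValueError:
            return False
        if fsid != local_fsid:
            # The image lives in a different Ceph cluster than Nova uses.
            return False
        return snapshot_exists(pool, image, snapshot)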

Extend fetch_to_raw() in nova.virt.images to try direct_fetch() when a new
optional backend parameter is passed. Make the libvirt driver pass the backend
parameter.
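
The intended control flow, and the clone operation at the heart of it, could
look roughly like the sketch below. The names ``DirectFetchFailed``,
``fetch_and_convert`` and ``clone_into`` are placeholders invented for this
example; only ``rbd.RBD().clone()`` is the real librbd call.

.. code-block:: python

    import rbd


    class DirectFetchFailed(Exception):
        """Raised when a backend cannot fetch an image natively."""


    def fetch_and_convert(context, image_href, path):
        """Placeholder for the existing download-and-convert code path."""
        raise NotImplementedError()


    def fetch_to_raw(context, image_href, path, backend=None):
        """Prefer a backend-native fetch, fall back to downloading."""
        if backend is not None:
            try:
                backend.direct_fetch(context, image_href)
                return
            except DirectFetchFailed:
                # Not cloneable (different cluster, non-raw image, ...):
                # fall back to the regular download path.
                pass
        fetch_and_convert(context, image_href, path)


    def clone_into(parent_ioctx, parent_name, parent_snap,
                   dest_ioctx, dest_name):
        """Create a copy-on-write clone of a protected parent snapshot.

        No image data is copied; the new disk only references the parent
        snapshot until its blocks are overwritten.
        """
        rbd.RBD().clone(parent_ioctx, parent_name, parent_snap,
                        dest_ioctx, dest_name,
                        features=rbd.RBD_FEATURE_LAYERING)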

Instead of calling disk.get_disk_size() directly from verify_base_size(), which
assumes the disk is stored locally, add a new method that is overridden by the
Rbd subclass to get the disk size.
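
A sketch of what that override could look like; the class name and the
passed-in ``ioctx`` (an open rados I/O context for the ephemeral pool, set up
as in the cleanup sketch above) are assumptions for this example.

.. code-block:: python

    import rbd


    class RbdImageBackend(object):
        """Sketch of an Rbd image backend overriding get_disk_size()."""

        def __init__(self, ioctx):
            self.ioctx = ioctx

        def get_disk_size(self, name):
            # The base class would call disk.get_disk_size() on a local
            # path; for RBD the virtual size is read via librbd instead.
            image = rbd.Image(self.ioctx, name)
            try:
                return image.size()
            finally:
                image.close()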

Alternatives
------------

An alternative implementation based on the image-multiple-location blueprint
(https://blueprints.launchpad.net/glance/+spec/multiple-image-locations) was
tried in Icehouse. It was ultimately reverted, for a combination of reasons:

* The implementation in https://review.openstack.org/33409 took a long time to
  stabilize, and didn't land until hours before the Icehouse feature freeze.

* The impact of https://review.openstack.org/33409 was significantly larger
  than that of the ephemeral RBD clone change that was built on top of it.

* The impact included exposing the nova.image.glance._get_locations() method,
  which relies on Glance API v2, to code paths that assume Glance API v1,
  which caused LP bug #1291014
  (https://bugs.launchpad.net/nova/+bug/1291014).

This design has a significantly smaller footprint, and is mostly isolated to
the RBD image backend in the libvirt driver.

Data model impact
-----------------

None.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

When the Ceph RBD backend is enabled for Glance and Nova, there will be a
noticeable difference in launch time and resource consumption between
instances launched from Glance images in RAW format and those launched from
images in non-RAW formats.

Performance Impact
------------------

In the primary use case defined in the `Problem description`_ section above,
there will be a significant performance improvement.

In other use cases, the libvirt driver will introduce one more API call to
Glance to retrieve the list of image locations when the RBD backend is
enabled. The performance impact of that call is insignificant compared to the
time and resources it takes to fetch a full image from Glance.

Other deployer impact
---------------------

None.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  jdurgin

Other contributors:
  angdraug

Work Items
----------

The current implementation (see `References`_) consists of the following
changes:

* Move libvirt RBD utilities to a new file

* Use library instead of CLI to cleanup RBD volumes

* Enable cloning for rbd-backed ephemeral disks


Dependencies
============

None.

Testing
=======

This is a non-functional change with no impact on the test cases that need to
be covered.

There is ongoing work to get all of Tempest running against an environment
using Ceph in OpenStack CI. The first step is Ceph support for devstack, which
you can see here:

https://review.openstack.org/#/c/65113

There is also a test devstack patch which forces Ceph to be enabled, which
results in all of the devstack jobs being run with Ceph enabled. You can find
that here:

https://review.openstack.org/#/c/107472/

Some tests are failing (14 and 15 in the first couple of runs). However, that
also means that the vast majority of the tests that cover this code (anything
that spawns an instance) are passing. So we at least have a way to run these
tests on demand against master. Once the devstack patch merges, we will enable
a job that can run against patches in all projects (perhaps experimental to
start with).

Fuel CI also includes a suite of tests for OpenStack deployments with Ceph:
https://github.com/stackforge/fuel-main/blob/master/fuelweb_test/tests/test_ceph.py

Documentation Impact
====================

None.


References
==========

Mailing list discussions:

* http://lists.openstack.org/pipermail/openstack-dev/2014-March/029127.html

* http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-March/008659.html

Current implementation:

* https://github.com/angdraug/nova/tree/rbd-ephemeral-clone

* https://review.openstack.org/#/q/status:open+topic:bp/rbd-clone-image-handler,n,z