qemu: Make disk image conversion dramatically faster
tl;dr: Use 'writeback' instead of 'writethrough' as the cache mode of
the target image for `qemu-img convert`. Two reasons: (a) if the image
conversion completes successfully, then 'writeback' calls fsync() to
safely write data to the physical disk; and (b) 'writeback' makes the
image conversion a _lot_ faster.
Back-of-the-envelope "benchmark" (on an SSD)
--------------------------------------------
(Ran both the tests thrice each; version: qemu-img-2.11.0)
With 'writethrough':
$> time (qemu-img convert -t writethrough -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 Fedora-Cloud-Base-29.raw)
real 1m43.470s
user 0m8.310s
sys 0m3.661s
With 'writeback':
$> time (qemu-img convert -t writeback -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 5-Fedora-Cloud-Base-29.raw)
real 0m7.390s
user 0m5.179s
sys 0m1.780s
I.e. ~103 seconds of elapsed wall-clock time for 'writethrough' vs. ~7
seconds for 'writeback' -- IOW, 'writeback' is about _14_ times faster!
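
As a quick sanity check of the ratio, the two "real" timings above can be converted to seconds and divided. This is only a sketch; the helper name `parse_real_time` is made up for illustration and handles just the `XmY.ZZZs` form that `time` printed here.

```python
import re


def parse_real_time(text):
    """Convert a `time`-style duration such as '1m43.470s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" values copied from the two benchmark runs above.
writethrough = parse_real_time("1m43.470s")
writeback = parse_real_time("0m7.390s")
speedup = writethrough / writeback

print(f"writethrough={writethrough:.2f}s "
      f"writeback={writeback:.2f}s speedup={speedup:.1f}x")
```

Note that this compares single runs on one SSD, so the ratio is indicative rather than a rigorous measurement.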
Details
-------
Nova commit e6ce9557f8 ("qemu-img do not use cache=none if no O_DIRECT
support") was introduced to make instances
boot on filesystems that don't support 'O_DIRECT' (which bypasses the
host page cache and flushes data directly to the disk), such as 'tmpfs'.
In doing so it introduced the 'writethrough' cache for the target image
for `qemu-img convert`.
This patch proposes to change that to 'writeback'.
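
The resulting selection logic can be sketched roughly as follows. This is a simplified, standalone illustration and not the Nova code itself: `build_convert_cmd` is a hypothetical name, the O_DIRECT probe is reduced to a boolean argument, and compression handling is left out. The argument order follows the expectations in the unit tests touched by this change.

```python
def build_convert_cmd(source, dest, in_format, out_format,
                      direct_io_supported):
    """Assemble a `qemu-img convert` command line as a tuple.

    direct_io_supported is a stand-in (assumption) for a probe of
    whether the instances directory supports O_DIRECT.
    """
    # 'none' needs O_DIRECT; otherwise fall back to 'writeback',
    # which is what this patch proposes instead of 'writethrough'.
    cache_mode = 'none' if direct_io_supported else 'writeback'
    cmd = ('qemu-img', 'convert', '-t', cache_mode, '-O', out_format)
    if in_format is not None:
        cmd += ('-f', in_format)
    return cmd + (source, dest)


# Example: a tmpfs-like filesystem without O_DIRECT support.
print(build_convert_cmd('in.qcow2', 'out.raw', 'qcow2', 'raw', False))
```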
Let's address the 'safety' concern:
"What about data integrity in the event of a host crash (especially
on shared file systems such as NFS)?"
Answer: If the host crashes mid-way during image conversion, then
neither "data integrity" nor the cache mode in use matters. But if the
image conversion completes _successfully_, then 'writeback' will safely
write the data to the physical disk, just as 'writethrough' does.
So we are as safe as we can be, but with the extra benefit of image
conversion being _much_ faster.
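
The intuition can be illustrated with a loose analogy in plain Python. This is not how QEMU implements its cache modes; the function name `write_image` and the chunk sizes are made up for the example. Syncing after every write (writethrough-like) and syncing once at the end (writeback-like) both leave the complete file on stable storage after a successful run, but the second does far fewer flushes.

```python
import os
import tempfile


def write_image(path, chunks, sync_every_write):
    """Write chunks to path and return the number of fsync() calls."""
    syncs = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            if sync_every_write:
                # Writethrough-like: force each write to disk.
                f.flush()
                os.fsync(f.fileno())
                syncs += 1
        # Both variants sync once on successful completion.
        f.flush()
        os.fsync(f.fileno())
        syncs += 1
    return syncs


chunks = [b"\0" * 65536 for _ in range(16)]
with tempfile.TemporaryDirectory() as tmp:
    path_wt = os.path.join(tmp, "writethrough.raw")
    path_wb = os.path.join(tmp, "writeback.raw")
    syncs_wt = write_image(path_wt, chunks, sync_every_write=True)
    syncs_wb = write_image(path_wb, chunks, sync_every_write=False)
    size_wt = os.path.getsize(path_wt)
    size_wb = os.path.getsize(path_wb)

print(syncs_wt, syncs_wb, size_wt == size_wb)
```

If the process dies before the final sync, the output is incomplete in either variant, which is why the cache mode makes no difference for the crash case.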
* * *
The `qemu-img convert` command defaults to 'cache=writeback' for the
source image and to 'cache=unsafe' for the target, because if `qemu-img`
"crashes during the conversion, the user will throw away the broken
output file anyway and start over"[1]. In addition, `qemu-img convert`
has supported[2] fsync() for the target image since QEMU 1.1 (2012).
[1] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1bd8e175
-- "qemu-img convert: Use cache=unsafe for output image"
[2] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=80ccf93b
-- "qemu-img: let 'qemu-img convert' flush data"
Closes-Bug: #1818847
Change-Id: I574be2b629aaff23556e25f8db0d740105be6f07
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@redhat.com>
parent e608568518
commit e7b64eaad8
@@ -35,23 +35,30 @@ def convert_image(source, dest, in_format, out_format, instances_path,
 # NOTE(mikal): this method is deliberately not wrapped in a privsep entrypoint
 def unprivileged_convert_image(source, dest, in_format, out_format,
                                instances_path, compress):
-    # NOTE(mdbooth): qemu-img convert defaults to cache=unsafe, which means
-    # that data is not synced to disk at completion. We explicitly use
-    # cache=none here to (1) ensure that we don't interfere with other
-    # applications using the host's io cache, and (2) ensure that the data is
-    # on persistent storage when the command exits. Without (2), a host crash
-    # may leave a corrupt image in the image cache, which Nova cannot recover
-    # automatically.
-    # NOTE(zigo): we cannot use -t none if the instances dir is mounted on a
-    # filesystem that doesn't have support for O_DIRECT, which is the case
-    # for example with tmpfs. This simply crashes "openstack server create"
-    # in environments like live distributions. In such case, the best choice
-    # is writethrough, which is power-failure safe, but still faster than
-    # writeback.
+    # NOTE(mdbooth, kchamart): `qemu-img convert` defaults to
+    # 'cache=writeback' for the source image, and 'cache=unsafe' for the
+    # target, which means that data is not synced to disk at completion.
+    # We explicitly use 'cache=none' here, for the target image, to (1)
+    # ensure that we don't interfere with other applications using the
+    # host's I/O cache, and (2) ensure that the data is on persistent
+    # storage when the command exits. Without (2), a host crash may
+    # leave a corrupt image in the image cache, which Nova cannot
+    # recover automatically.
+
+    # NOTE(zigo, kchamart): We cannot use `qemu-img convert -t none` if
+    # the 'instance_dir' is mounted on a filesystem that doesn't support
+    # O_DIRECT, which is the case, for example, with 'tmpfs'. This
+    # simply crashes `openstack server create` in environments like live
+    # distributions. In such cases, the best choice is 'writeback',
+    # which (a) makes the conversion multiple times faster; and (b) is
+    # as safe as it can be, because at the end of the conversion it,
+    # just like 'writethrough', calls fsync(2)|fdatasync(2), which
+    # ensures to safely write the data to the physical disk.
+
     if nova.privsep.utils.supports_direct_io(instances_path):
         cache_mode = 'none'
     else:
-        cache_mode = 'writethrough'
+        cache_mode = 'writeback'
     cmd = ('qemu-img', 'convert', '-t', cache_mode, '-O', out_format)
 
     if in_format is not None:
@@ -19092,7 +19092,7 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
         mock_disk_op_sema.__enter__.assert_called_once()
         mock_direct_io.assert_called_once_with(CONF.instances_path)
         mock_execute.assert_has_calls([
-            mock.call('qemu-img', 'convert', '-t', 'writethrough',
+            mock.call('qemu-img', 'convert', '-t', 'writeback',
                       '-O', 'qcow2', '-f', 'raw', path, _path_qcow)])
         mock_rename.assert_has_calls([
             mock.call(_path_qcow, path)])
@@ -532,7 +532,7 @@ disk size: 4.4M
 
         libvirt_utils.extract_snapshot('/path/to/disk/image', src_format,
                                        '/extracted/snap', dest_format)
-        qemu_img_cmd = ('qemu-img', 'convert', '-t', 'writethrough',
+        qemu_img_cmd = ('qemu-img', 'convert', '-t', 'writeback',
                         '-O', out_format, '-f', src_format, )
         if CONF.libvirt.snapshot_compression and dest_format == "qcow2":
             qemu_img_cmd += ('-c',)
@@ -125,7 +125,7 @@ class QemuTestCase(test.NoDBTestCase):
                                     mock_disk_op_sema):
         images._convert_image('source', 'dest', 'in_format', 'out_format',
                               run_as_root=False)
-        expected = ('qemu-img', 'convert', '-t', 'writethrough',
+        expected = ('qemu-img', 'convert', '-t', 'writeback',
                     '-O', 'out_format', '-f', 'in_format', 'source', 'dest')
         mock_disk_op_sema.__enter__.assert_called_once()
         self.assertTupleEqual(expected, mock_execute.call_args[0])
@@ -0,0 +1,8 @@
+---
+fixes:
+  - |
+    By using ``writeback`` QEMU cache mode, make Nova's disk image
+    conversion (e.g. from raw to QCOW2 or vice versa) dramatically
+    faster, without compromising data integrity. `Bug 1818847`_.
+
+    .. _Bug 1818847: https://bugs.launchpad.net/nova/+bug/1818847