libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable

When configuring QEMU cache modes for Nova instances, we use 'writethrough' when 'none' is not available. But that's not correct, because of our misunderstanding of how cache modes work. For example, the function disk_cachemode() in the libvirt driver assumes that the 'writethrough' and 'none' cache modes have the same behaviour with respect to host crash safety, which is not at all true.

The misunderstanding and complexity stem from not realizing that each QEMU cache mode is a shorthand to toggle *three* booleans. Refer to the convenient cache mode table in the code comment (in nova/virt/libvirt/driver.py).

As Kevin Wolf (thanks!), QEMU Block Layer maintainer, explains (I made a couple of micro edits for clarity):

    The thing that makes 'writethrough' so safe against host crashes is that it never keeps data in a "write cache", but it calls fsync() after _every_ write. This is also what makes it horribly slow. But 'cache=none' doesn't do this and therefore doesn't provide this kind of safety. The guest OS must explicitly flush the cache in the right places to make sure data is safe on the disk. And OSes do that.

    So if 'cache=none' is safe enough for you, then 'cache=writeback' should be safe enough for you, too -- because both of them have the boolean 'cache.writeback=on'. The difference is only in 'cache.direct', but 'cache.direct=on' only bypasses the host kernel page cache and data could still sit in other caches that could be present between QEMU and the disk (such as commonly a volatile write cache on the disk itself).

So use 'writeback' mode instead of the debilitatingly slow 'writethrough' for cases where the O_DIRECT-based 'none' is unsupported.

Do the minimum required update to the `disk_cachemodes` config help text. (In a future patch, rewrite the cache modes documentation to fix confusing fragments and outdated information.)

Closes-Bug: #1818847
Change-Id: Ibe236988af24a3b43508eec4efbe52a4ed05d45f
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@redhat.com>
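
To make the reasoning in the commit message concrete, here is a minimal Python sketch of the cache mode table and the fallback rule. It is an illustration only, not the Nova code: the names `CACHE_MODES`, `flush_behaviour`, `probe_direct_io` and `pick_cache_mode` are hypothetical, the 4096-byte probe size is an assumed alignment, and the probe is only a rough stand-in for the check that `nova.privsep.utils.supports_direct_io` performs.

```python
import mmap
import os
import tempfile

# cache=$MODE -> (cache.writeback, cache.direct, cache.no-flush), following
# the table quoted from the QEMU man page in the driver's code comment.
CACHE_MODES = {
    "writeback":    (True,  False, False),
    "none":         (True,  True,  False),
    "writethrough": (False, False, False),
    "directsync":   (False, True,  False),
    "unsafe":       (True,  False, True),
}


def flush_behaviour(mode):
    """Return the two booleans that decide when data gets flushed.

    cache.direct is left out on purpose: it only controls whether the host
    page cache is bypassed, not when fsync() is called.
    """
    writeback, _direct, no_flush = CACHE_MODES[mode]
    return writeback, no_flush


def probe_direct_io(dirpath):
    """Hypothetical probe: can an aligned block be written with O_DIRECT?

    Simplification: any OSError counts as "unsupported". A production check
    would probably separate EINVAL from unrelated failures.
    """
    if not hasattr(os, "O_DIRECT"):  # platforms without O_DIRECT
        return False
    path = os.path.join(dirpath, ".directio.probe.%d" % os.getpid())
    fd = None
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
        # Anonymous mmap memory is page-aligned, which O_DIRECT requires.
        with mmap.mmap(-1, 4096) as block:
            block.write(b"x" * 4096)
            os.write(fd, block)
        return True
    except OSError:
        return False
    finally:
        if fd is not None:
            os.close(fd)
        try:
            os.unlink(path)
        except OSError:
            pass


def pick_cache_mode(direct_io_supported):
    """Prefer 'none'; fall back to 'writeback' when O_DIRECT is unavailable."""
    return "none" if direct_io_supported else "writeback"


if __name__ == "__main__":
    instances_dir = tempfile.mkdtemp()  # stand-in for CONF.instances_path
    print(pick_cache_mode(probe_direct_io(instances_dir)))
    # 'none' and 'writeback' share flush behaviour; 'writethrough' does not.
    print(flush_behaviour("none") == flush_behaviour("writeback"))
```

Comparing `flush_behaviour()` rather than the full tuple mirrors the argument quoted above: 'none' and 'writeback' differ only in `cache.direct`, so they give the same guarantee about when data reaches the disk, while 'writethrough' adds an fsync() after every write.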
2019-03-04 17:20:53 +01:00
parent f58f73978e
commit b9dc86d8d6
4 changed files with 79 additions and 29 deletions
--- a/nova/conf/libvirt.py
+++ b/nova/conf/libvirt.py
@@ -654,7 +654,11 @@ this environment.

 Possible cache modes:

-* default: Same as writethrough.
+* default: "It Depends" -- For Nova-managed disks, ``none``, if the host
+  file system is capable of Linux's 'O_DIRECT' semantics; otherwise
+  ``writeback``.  For volume drivers, the default is driver-dependent:
+  ``none`` for everything except for SMBFS and Virtuzzo (which use
+  ``writeback``).
 * none: With caching mode set to none, the host page cache is disabled, but
  the disk write cache is enabled for the guest. In this mode, the write
  performance in the guest is optimal because write operations bypass the host
@@ -667,25 +671,25 @@ Possible cache modes:
  writethrough mode. Shareable disk devices, like for a multi-attachable block
  storage volume, will have their cache mode set to 'none' regardless of
  configuration.
-* writethrough: writethrough mode is the default caching mode. With
-  caching set to writethrough mode, the host page cache is enabled, but the
-  disk write cache is disabled for the guest. Consequently, this caching mode
-  ensures data integrity even if the applications and storage stack in the
-  guest do not transfer data to permanent storage properly (either through
-  fsync operations or file system barriers). Because the host page cache is
-  enabled in this mode, the read performance for applications running in the
-  guest is generally better. However, the write performance might be reduced
-  because the disk write cache is disabled.
-* writeback: With caching set to writeback mode, both the host page cache
-  and the disk write cache are enabled for the guest. Because of this, the
-  I/O performance for applications running in the guest is good, but the data
-  is not protected in a power failure. As a result, this caching mode is
-  recommended only for temporary data where potential data loss is not a
-  concern.
-  NOTE: Certain backend disk mechanisms may provide safe writeback cache
-  semantics. Specifically those that bypass the host page cache, such as
-  QEMU's integrated RBD driver. Ceph documentation recommends setting this
-  to writeback for maximum performance while maintaining data safety.
+* writethrough: With caching set to writethrough mode, the host page cache is
+  enabled, but the disk write cache is disabled for the guest. Consequently,
+  this caching mode ensures data integrity even if the applications and storage
+  stack in the guest do not transfer data to permanent storage properly (either
+  through fsync operations or file system barriers). Because the host page
+  cache is enabled in this mode, the read performance for applications running
+  in the guest is generally better. However, the write performance might be
+  reduced because the disk write cache is disabled.
+* writeback: With caching set to writeback mode, both the host page
+  cache and the disk write cache are enabled for the guest. Because of
+  this, the I/O performance for applications running in the guest is
+  good, but the data is not protected in a power failure. As a result,
+  this caching mode is recommended only for temporary data where
+  potential data loss is not a concern.
+  NOTE: Certain backend disk mechanisms may provide safe
+  writeback cache semantics. Specifically those that bypass the host
+  page cache, such as QEMU's integrated RBD driver. Ceph documentation
+  recommends setting this to writeback for maximum performance while
+  maintaining data safety.
 * directsync: Like "writethrough", but it bypasses the host page cache.
 * unsafe: Caching mode of unsafe ignores cache transfer operations
  completely. As its name implies, this caching mode should be used only for
--- a/nova/tests/unit/virt/libvirt/test_driver.py
+++ b/nova/tests/unit/virt/libvirt/test_driver.py
@@ -8522,7 +8522,7 @@ class LibvirtConnTestCase(test.NoDBTestCase,
        tree = etree.fromstring(xml)
        disks = tree.findall('./devices/disk/driver')
        for guest_disk in disks:
-            self.assertEqual(guest_disk.get("cache"), "writethrough")
+            self.assertEqual(guest_disk.get("cache"), "writeback")

    def _check_xml_and_disk_bus(self, image_meta,
                                block_device_info, wantConfig):
@@ -16123,7 +16123,7 @@ class LibvirtConnTestCase(test.NoDBTestCase,
        """Tests that when conf.shareable is True, the configuration is
        ignored and the driver_cache is forced to 'none'.
        """
-        self.flags(disk_cachemodes=['block=writethrough'], group='libvirt')
+        self.flags(disk_cachemodes=['block=writeback'], group='libvirt')
        drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), True)
        fake_conf = FakeConfigGuestDisk()
        fake_conf.shareable = True
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -407,16 +407,43 @@ class LibvirtDriver(driver.ComputeDriver):

    @property
    def disk_cachemode(self):
+        # It can be confusing to understand the QEMU cache mode
+        # behaviour, because each cache=$MODE is a convenient shorthand
+        # to toggle _three_ cache.* booleans.  Consult the below table
+        # (quoting from the QEMU man page):
+        #
+        #              | cache.writeback | cache.direct | cache.no-flush
+        # --------------------------------------------------------------
+        # writeback    | on              | off          | off
+        # none         | on              | on           | off
+        # writethrough | off             | off          | off
+        # directsync   | off             | on           | off
+        # unsafe       | on              | off          | on
+        #
+        # Where:
+        #
+        #  - 'cache.writeback=off' means: QEMU adds an automatic fsync()
+        #    after each write request.
+        #
+        #  - 'cache.direct=on' means: Use Linux's O_DIRECT, i.e. bypass
+        #    the kernel page cache.  Caches in any other layer (disk
+        #    cache, QEMU metadata caches, etc.) can still be present.
+        #
+        #  - 'cache.no-flush=on' means: Ignore flush requests, i.e.
+        #    never call fsync(), even if the guest explicitly requested
+        #    it.
+        #
+        # Use cache mode "none" (cache.writeback=on, cache.direct=on,
+        # cache.no-flush=off) for consistent performance and
+        # migration correctness.  Some filesystems don't support
+        # O_DIRECT, though.  For those we fallback to the next
+        # reasonable option that is "writeback" (cache.writeback=on,
+        # cache.direct=off, cache.no-flush=off).
+
        if self._disk_cachemode is None:
-            # We prefer 'none' for consistent performance, host crash
-            # safety & migration correctness by avoiding host page cache.
-            # Some filesystems don't support O_DIRECT though. For those we
-            # fallback to 'writethrough' which gives host crash safety, and
-            # is safe for migration provided the filesystem is cache coherent
-            # (cluster filesystems typically are, but things like NFS are not).
            self._disk_cachemode = "none"
            if not nova.privsep.utils.supports_direct_io(CONF.instances_path):
-                self._disk_cachemode = "writethrough"
+                self._disk_cachemode = "writeback"
        return self._disk_cachemode

    def _set_cache_mode(self, conf):
--- a/releasenotes/notes/writeback-cache-mode-for-guests-a7e4d2806c956164.yaml
+++ b/releasenotes/notes/writeback-cache-mode-for-guests-a7e4d2806c956164.yaml
@@ -0,0 +1,19 @@
+---
+fixes:
+  - |
+    Update the way QEMU cache mode is configured for Nova guests: If the
+    file system hosting the directory with Nova instances is capable of
+    Linux's O_DIRECT, use ``none``; otherwise fallback to ``writeback``
+    cache mode.  This improves performance without compromising data
+    integrity.  `Bug 1818847`_.
+
+    Context: What makes ``writethrough`` so safe against host crashes is
+    that it never keeps data in a "write cache", but it calls fsync()
+    after *every* write.  This is also what makes it horribly slow.  But
+    cache mode ``none`` doesn't do this and therefore doesn't provide
+    this kind of safety.  The guest OS must explicitly flush the cache
+    in the right places to make sure data is safe on the disk; and all
+    modern OSes flush data as needed.  So if cache mode ``none`` is safe
+    enough for you, then ``writeback`` should be safe enough too.
+
+    .. _Bug 1818847: https://bugs.launchpad.net/nova/+bug/1818847