nova/releasenotes/notes/writeback-cache-mode-for-guests-a7e4d2806c956164.yaml
Kashyap Chamarthy b9dc86d8d6 libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable
When configuring QEMU cache modes for Nova instances, we use
'writethrough' when 'none' is not available.  But that choice is not
correct; it rests on our misunderstanding of how cache modes work.  E.g. the
function disk_cachemode() in the libvirt driver assumes that
'writethrough' and 'none' cache modes have the same behaviour with
respect to host crash safety, which is not at all true.

The misunderstanding and complexity stem from not realizing that each
QEMU cache mode is a shorthand to toggle *three* booleans.  Refer to the
convenient cache mode table in the code comment (in
nova/virt/libvirt/driver.py).
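
For readers without the driver source at hand, here is a rough sketch of that mapping as a Python table. The option names follow QEMU's block-layer options (cache.writeback, cache.direct, cache.no-flush); the `CACHE_MODES` dict and the `differing_options()` helper are hypothetical names for illustration and are not part of Nova.

```python
# Sketch only: each QEMU cache mode as a shorthand for three booleans.
# CACHE_MODES and differing_options() are illustrative names, not Nova code.
CACHE_MODES = {
    "writeback":    {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": False},
    "none":         {"cache.writeback": True,  "cache.direct": True,  "cache.no-flush": False},
    "writethrough": {"cache.writeback": False, "cache.direct": False, "cache.no-flush": False},
    "directsync":   {"cache.writeback": False, "cache.direct": True,  "cache.no-flush": False},
    "unsafe":       {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": True},
}


def differing_options(mode_a, mode_b):
    """Return the sorted names of the booleans on which two modes disagree."""
    a, b = CACHE_MODES[mode_a], CACHE_MODES[mode_b]
    return sorted(name for name in a if a[name] != b[name])


if __name__ == "__main__":
    # 'none' and 'writeback' differ only in whether the host page cache is bypassed.
    print(differing_options("none", "writeback"))
    # 'none' and 'writethrough' differ in write-cache behaviour as well.
    print(differing_options("none", "writethrough"))
```

Read this way, 'none' and 'writeback' share cache.writeback=on, whereas 'writethrough' is the mode that turns it off.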

As Kevin Wolf (thanks!), QEMU Block Layer maintainer, explains (I made
a couple of micro edits for clarity):

    The thing that makes 'writethrough' so safe against host crashes is
    that it never keeps data in a "write cache", but it calls fsync()
    after _every_ write.  This is also what makes it horribly slow.  But
    'cache=none' doesn't do this and therefore doesn't provide this kind
    of safety.  The guest OS must explicitly flush the cache in the
    right places to make sure data is safe on the disk.  And OSes do
    that.

    So if 'cache=none' is safe enough for you, then 'cache=writeback'
    should be safe enough for you, too -- because both of them have the
    boolean 'cache.writeback=on'.  The difference is only in
    'cache.direct', but 'cache.direct=on' only bypasses the host kernel
    page cache and data could still sit in other caches that could be
    present between QEMU and the disk (such as commonly a volatile write
    cache on the disk itself).

So use 'writeback' mode instead of the debilitatingly slow
'writethrough' for cases where the O_DIRECT-based 'none' is unsupported.
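
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the driver's actual code: `probe_o_direct()` and `pick_cache_mode()` are hypothetical helpers, and the probe only attempts to open a scratch file with O_DIRECT, assuming that a failed open means direct I/O is unsupported. A thorough check would also attempt an aligned write.

```python
import os
import tempfile


def probe_o_direct(dirpath):
    """Best-effort check: can a file under dirpath be opened with O_DIRECT?

    Hypothetical helper for illustration only.
    """
    flag = getattr(os, "O_DIRECT", None)
    if flag is None:
        # Platform does not expose O_DIRECT at all (e.g. non-Linux).
        return False
    path = os.path.join(dirpath, ".directio.test.%d" % os.getpid())
    fd = None
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | flag)
        return True
    except OSError:
        # Treat any failure (e.g. EINVAL from the file system) as unsupported.
        return False
    finally:
        if fd is not None:
            os.close(fd)
        try:
            os.unlink(path)
        except OSError:
            pass


def pick_cache_mode(supports_o_direct):
    """Prefer 'none'; otherwise fall back to 'writeback', not 'writethrough'."""
    return "none" if supports_o_direct else "writeback"


if __name__ == "__main__":
    print(pick_cache_mode(probe_o_direct(tempfile.gettempdir())))
```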

Do the minimum required update to the `disk_cachemodes` config help
text.  (In a future patch, rewrite the cache modes documentation to fix
confusing fragments and outdated information.)

Closes-Bug: #1818847
Change-Id: Ibe236988af24a3b43508eec4efbe52a4ed05d45f
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@redhat.com>
2019-03-21 14:17:22 +01:00

---
fixes:
  - |
    Update the way QEMU cache mode is configured for Nova guests: If the
    file system hosting the directory with Nova instances is capable of
    Linux's O_DIRECT, use ``none``; otherwise fall back to ``writeback``
    cache mode. This improves performance without compromising data
    integrity. `Bug 1818847`_.

    Context: What makes ``writethrough`` so safe against host crashes is
    that it never keeps data in a "write cache", but it calls fsync()
    after *every* write. This is also what makes it horribly slow. But
    cache mode ``none`` doesn't do this and therefore doesn't provide
    this kind of safety. The guest OS must explicitly flush the cache
    in the right places to make sure data is safe on the disk; and all
    modern OSes flush data as needed. So if cache mode ``none`` is safe
    enough for you, then ``writeback`` should be safe enough too.

    .. _Bug 1818847: https://bugs.launchpad.net/nova/+bug/1818847