Merge "Virtual instance rescue with stable disk devices"

This commit is contained in:
Zuul 2019-11-15 15:42:39 +00:00 committed by Gerrit Code Review
commit b4c9898201
1 changed files with 255 additions and 0 deletions

View File

@ -0,0 +1,255 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
================================================
Virtual instance rescue with stable disk devices
================================================
https://blueprints.launchpad.net/nova/+spec/virt-rescue-stable-disk-devices
This will provide the ability to indicate that the rescue disk image
should be attached as a transient disk device (ie USB stick), so that
existing storage attached to an instance doesn't change its device
address during rescue mode.
Problem description
===================
When an instance is booted normally there are a number of possible disks
that will be attached to the instance
- An ephemeral or persistent cinder volume root disk
- Zero or more ephemeral non-root disks
- Zero or more persistent non-root cinder volumes
- An optional swap disk
- An optional config drive disk
When the instance is booted in rescue mode though, this storage setup
changes significantly, and differently depending on virt drivers. In
the Libvirt driver, the rescue instance gets:
- A rescue root disk
- The original root disk
- An optional config drive disk
There are multiple problems with this. First of all several of the disks
are missing entirely, eg the ephemeral non-root disks, all cinder volumes
and the swap disk. This missing storage limits the scope of work the admin
can do in rescue mode.
The rescue root disk is put on a device that previously held the real
root disk. For example the rescue root is /dev/vda and the real root image is
now shifted to a different device /dev/vdb. Although a well designed
OS setup should not rely on the root device appearing at a fixed device
name, some OSes none the less do depend on this. Moving the root disk
during rescue mode can thus introduce problems of its own, and in fact
contribute to mistakes in rescue mode. For example it may confuse the admin
into setting up their fstab to refer to /dev/vdb, when the root disk will go
back to /dev/vda after rescue mode is finished.
This change in disk presence during rescue mode is very different to
what happens to disks on a baremetal machine when booted from rescue
media. This means that admin knowledge from working in a bare metal
world needs to be re-learned for OpenStack rescue mode, which adds an
undesirable learning burden for the admin.
When disks change what address they appear at, this can cause upset
licensing checks of some guests OS too. For example, if hardware
devices change their address too frequently, Windows may decide to
ask for license re-activation. This is again an undesirable thing
for admins in general.
Use Cases
----------
When the tenant user boots a VM in rescue mode they expect the existing
storage device configuration to be identical to that seen when running
in normal mode, but with an extra transient disk hotplugged to represent
the rescue media.
Proposed change
===============
This spec will not cover the removal of the current boot from volume check in
the compute API that currently blocks any attempt to rescue an instance using a
root cinder volume. The removal of this check and subsequent impact on the
overall API will be covered in a follow up spec.
The compute manager code will be changed such that when rescue is performed the
full block device mapping will be present. This will allow instances to be
configured with the full set of non-root cinder volumes that would appear
during normal boot.
New image properties have already been introduced during Ocata [1]_ that will
be used to indicate the type of device and associated bus to use as the rescue
device.
- hw_rescue_bus=virtio|ide|usb|scsi
- hw_rescue_device=disk|floppy|cdrom
If omitted, the virt driver will default to whatever behaviour it currently
has for setting up the rescue disk. For the Libvirt driver, this means the
default bus would match the hw_disk_bus, and the device type would be "disk".
The expected recommended setup would be to tag the rescue image in glance
with hw_rescue_bus=usb, which would indicate to the virt driver that it
should attach a USB flash drive to the guest, containing the rescue image.
For hypervisors which can't support this an alternative recommendation would
be to tag the rescue image with hw_rescue_bus=ide and hw_rescue_device=cdrom
to cause a new CDROM device to be exposed with the rescue media.
The Libvirt nova driver will be changed so that when booting in rescue mode,
all the non-root cinder volumes, local ephemeral non-root disks and swap disks
are present in rescue mode. The rescue root device will be added as the *last*
device in the configuration, but will be marked as bootable for the BIOS, so it
takes priority over the existing root device. This relies on KVM/QEMU
supporting the "bootindex" parameter, which all supported versions do. This new
rescue mode would not be supported by Xen, nor LXC.
Other virt driver maintainers may wish to also implement this blueprint, so
approval should be considered to give blessing to all virt drivers. If other
virt driver maintainers wish to commit to doing this in this cycle the list
of assignees will be updated.
Alternatives
------------
Do nothing is always an option, but the current setup has a number of
undesirable characteristics described earlier.
An alternative might be to simply hardcode a different approach. eg when
using KVM simply always use a USB flash device as the rescue media, and
don't bother with supporting an image property. This is certainly a viable
option, and if it were not for the sake of maintaining backwards compatibility
with earlier OpenStack, it might even be the preferred approach.
Data model impact
-----------------
None, as the ImageMetaProps object changes have already landed in Ocata [1]_.
REST API impact
---------------
None, as support for BFV instances will be covered in a separate spec.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
The tenant user will gain the ability to set a new image meta property against
rescue disk images which will indicate the type of disk bus and device to use
when rescuing instances.
Performance Impact
------------------
None
Other deployer impact
---------------------
If the admin pre-populates any rescue disk images, they may wish to set the
disk bus and device type to override the historic default behaviour.
Developer impact
----------------
Virt driver maintainers can continue to silently ignore the newly introduced
image properties or optionally start using them by implementing this new stable
device approach.
Upgrade impact
--------------
Older Libvirt based computes that are not able to honour the stable device
rescue image properties will continue to silently ignore them as they have
since these were introduced during Ocata [1]_. Once upgraded to Ussuri they
will then start rescusing instances with a stable device layout.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
lyarwood (Libvirt impl)
Other contributors:
None
Feature Liaison
---------------
lyarwood
Work Items
----------
Extend the compute manager rescue code to handle the full block device mapping
including non-root cinder volume attachments.
Extend the nova Libvirt driver to setup all disks when running in rescue
mode.
Extend the nova Libvirt driver to honour the new image meta properties in
rescue mode disk config.
Dependencies
============
None
Testing
=======
A new tempest Libvirt feature configurable and test will be used to validate
correct operation of the new code.
Documentation Impact
====================
The new image properties should be documented, and any information about
rescue mode should be updated to explain how disks appear.
References
==========
.. [1] hw_rescue_device and hw_rescue_bus image properties https://review.opendev.org/#/c/270285/
.. [2] https://review.opendev.org/#/c/230442/
.. [3] https://review.opendev.org/#/c/273122/
.. [4] https://review.opendev.org/#/c/510106/
.. [5] https://review.opendev.org/#/c/651151/
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced [2]_
* - Newton
- Reproposed [3]_
* - Queens
- Reproposed [4]_
* - Train
- Reproposed [5]_
* - Ussuri
- Reproposed