Merge "Virtual instance rescue with stable disk devices"
This commit is contained in:
commit
b4c9898201
|
@ -0,0 +1,255 @@
|
||||||
|
..
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||||
|
License.
|
||||||
|
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
================================================
|
||||||
|
Virtual instance rescue with stable disk devices
|
||||||
|
================================================
|
||||||
|
|
||||||
|
https://blueprints.launchpad.net/nova/+spec/virt-rescue-stable-disk-devices
|
||||||
|
|
||||||
|
This will provide the ability to indicate that the rescue disk image
|
||||||
|
should be attached as a transient disk device (ie USB stick), so that
|
||||||
|
existing storage attached to an instance doesn't change its device
|
||||||
|
address during rescue mode.
|
||||||
|
|
||||||
|
Problem description
|
||||||
|
===================
|
||||||
|
|
||||||
|
When an instance is booted normally there are a number of possible disks
|
||||||
|
that will be attached to the instance
|
||||||
|
|
||||||
|
- An ephemeral or persistent cinder volume root disk
|
||||||
|
- Zero or more ephemeral non-root disks
|
||||||
|
- Zero or more persistent non-root cinder volumes
|
||||||
|
- An optional swap disk
|
||||||
|
- An optional config drive disk
|
||||||
|
|
||||||
|
When the instance is booted in rescue mode though, this storage setup
|
||||||
|
changes significantly, and differently depending on virt drivers. In
|
||||||
|
the Libvirt driver, the rescue instance gets:
|
||||||
|
|
||||||
|
- A rescue root disk
|
||||||
|
- The original root disk
|
||||||
|
- An optional config drive disk
|
||||||
|
|
||||||
|
There are multiple problems with this. First of all several of the disks
|
||||||
|
are missing entirely, eg the ephemeral non-root disks, all cinder volumes
|
||||||
|
and the swap disk. This missing storage limits the scope of work the admin
|
||||||
|
can do in rescue mode.
|
||||||
|
|
||||||
|
The rescue root disk is put on a device that previously held the real
|
||||||
|
root disk. For example the rescue root is /dev/vda and the real root image is
|
||||||
|
now shifted to a different device /dev/vdb. Although a well designed
|
||||||
|
OS setup should not rely on the root device appearing at a fixed device
|
||||||
|
name, some OSes none the less do depend on this. Moving the root disk
|
||||||
|
during rescue mode can thus introduce problems of its own, and in fact
|
||||||
|
contribute to mistakes in rescue mode. For example it may confuse the admin
|
||||||
|
into setting up their fstab to refer to /dev/vdb, when the root disk will go
|
||||||
|
back to /dev/vda after rescue mode is finished.
|
||||||
|
|
||||||
|
This change in disk presence during rescue mode is very different to
|
||||||
|
what happens to disks on a baremetal machine when booted from rescue
|
||||||
|
media. This means that admin knowledge from working in a bare metal
|
||||||
|
world needs to be re-learned for OpenStack rescue mode, which adds an
|
||||||
|
undesirable learning burden for the admin.
|
||||||
|
|
||||||
|
When disks change what address they appear at, this can cause upset
|
||||||
|
licensing checks of some guests OS too. For example, if hardware
|
||||||
|
devices change their address too frequently, Windows may decide to
|
||||||
|
ask for license re-activation. This is again an undesirable thing
|
||||||
|
for admins in general.
|
||||||
|
|
||||||
|
Use Cases
|
||||||
|
----------
|
||||||
|
|
||||||
|
When the tenant user boots a VM in rescue mode they expect the existing
|
||||||
|
storage device configuration to be identical to that seen when running
|
||||||
|
in normal mode, but with an extra transient disk hotplugged to represent
|
||||||
|
the rescue media.
|
||||||
|
|
||||||
|
Proposed change
|
||||||
|
===============
|
||||||
|
|
||||||
|
This spec will not cover the removal of the current boot from volume check in
|
||||||
|
the compute API that currently blocks any attempt to rescue an instance using a
|
||||||
|
root cinder volume. The removal of this check and subsequent impact on the
|
||||||
|
overall API will be covered in a follow up spec.
|
||||||
|
|
||||||
|
The compute manager code will be changed such that when rescue is performed the
|
||||||
|
full block device mapping will be present. This will allow instances to be
|
||||||
|
configured with the full set of non-root cinder volumes that would appear
|
||||||
|
during normal boot.
|
||||||
|
|
||||||
|
New image properties have already been introduced during Ocata [1]_ that will
|
||||||
|
be used to indicate the type of device and associated bus to use as the rescue
|
||||||
|
device.
|
||||||
|
|
||||||
|
- hw_rescue_bus=virtio|ide|usb|scsi
|
||||||
|
- hw_rescue_device=disk|floppy|cdrom
|
||||||
|
|
||||||
|
If omitted, the virt driver will default to whatever behaviour it currently
|
||||||
|
has for setting up the rescue disk. For the Libvirt driver, this means the
|
||||||
|
default bus would match the hw_disk_bus, and the device type would be "disk".
|
||||||
|
|
||||||
|
The expected recommended setup would be to tag the rescue image in glance
|
||||||
|
with hw_rescue_bus=usb, which would indicate to the virt driver that it
|
||||||
|
should attach a USB flash drive to the guest, containing the rescue image.
|
||||||
|
For hypervisors which can't support this an alternative recommendation would
|
||||||
|
be to tag the rescue image with hw_rescue_bus=ide and hw_rescue_device=cdrom
|
||||||
|
to cause a new CDROM device to be exposed with the rescue media.
|
||||||
|
|
||||||
|
The Libvirt nova driver will be changed so that when booting in rescue mode,
|
||||||
|
all the non-root cinder volumes, local ephemeral non-root disks and swap disks
|
||||||
|
are present in rescue mode. The rescue root device will be added as the *last*
|
||||||
|
device in the configuration, but will be marked as bootable for the BIOS, so it
|
||||||
|
takes priority over the existing root device. This relies on KVM/QEMU
|
||||||
|
supporting the "bootindex" parameter, which all supported versions do. This new
|
||||||
|
rescue mode would not be supported by Xen, nor LXC.
|
||||||
|
|
||||||
|
Other virt driver maintainers may wish to also implement this blueprint, so
|
||||||
|
approval should be considered to give blessing to all virt drivers. If other
|
||||||
|
virt driver maintainers wish to commit to doing this in this cycle the list
|
||||||
|
of assignees will be updated.
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
Do nothing is always an option, but the current setup has a number of
|
||||||
|
undesirable characteristics described earlier.
|
||||||
|
|
||||||
|
An alternative might be to simply hardcode a different approach. eg when
|
||||||
|
using KVM simply always use a USB flash device as the rescue media, and
|
||||||
|
don't bother with supporting an image property. This is certainly a viable
|
||||||
|
option, and if it were not for the sake of maintaining backwards compatibility
|
||||||
|
with earlier OpenStack, it might even be the preferred approach.
|
||||||
|
|
||||||
|
Data model impact
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
None, as the ImageMetaProps object changes have already landed in Ocata [1]_.
|
||||||
|
|
||||||
|
|
||||||
|
REST API impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
None, as support for BFV instances will be covered in a separate spec.
|
||||||
|
|
||||||
|
Security impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Notifications impact
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other end user impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
The tenant user will gain the ability to set a new image meta property against
|
||||||
|
rescue disk images which will indicate the type of disk bus and device to use
|
||||||
|
when rescuing instances.
|
||||||
|
|
||||||
|
Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other deployer impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
If the admin pre-populates any rescue disk images, they may wish to set the
|
||||||
|
disk bus and device type to override the historic default behaviour.
|
||||||
|
|
||||||
|
Developer impact
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Virt driver maintainers can continue to silently ignore the newly introduced
|
||||||
|
image properties or optionally start using them by implementing this new stable
|
||||||
|
device approach.
|
||||||
|
|
||||||
|
Upgrade impact
|
||||||
|
--------------
|
||||||
|
|
||||||
|
Older Libvirt based computes that are not able to honour the stable device
|
||||||
|
rescue image properties will continue to silently ignore them as they have
|
||||||
|
since these were introduced during Ocata [1]_. Once upgraded to Ussuri they
|
||||||
|
will then start rescusing instances with a stable device layout.
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
lyarwood (Libvirt impl)
|
||||||
|
|
||||||
|
Other contributors:
|
||||||
|
None
|
||||||
|
|
||||||
|
|
||||||
|
Feature Liaison
|
||||||
|
---------------
|
||||||
|
lyarwood
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
Extend the compute manager rescue code to handle the full block device mapping
|
||||||
|
including non-root cinder volume attachments.
|
||||||
|
|
||||||
|
Extend the nova Libvirt driver to setup all disks when running in rescue
|
||||||
|
mode.
|
||||||
|
|
||||||
|
Extend the nova Libvirt driver to honour the new image meta properties in
|
||||||
|
rescue mode disk config.
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
|
A new tempest Libvirt feature configurable and test will be used to validate
|
||||||
|
correct operation of the new code.
|
||||||
|
|
||||||
|
Documentation Impact
|
||||||
|
====================
|
||||||
|
|
||||||
|
The new image properties should be documented, and any information about
|
||||||
|
rescue mode should be updated to explain how disks appear.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
.. [1] hw_rescue_device and hw_rescue_bus image properties https://review.opendev.org/#/c/270285/
|
||||||
|
.. [2] https://review.opendev.org/#/c/230442/
|
||||||
|
.. [3] https://review.opendev.org/#/c/273122/
|
||||||
|
.. [4] https://review.opendev.org/#/c/510106/
|
||||||
|
.. [5] https://review.opendev.org/#/c/651151/
|
||||||
|
|
||||||
|
History
|
||||||
|
=======
|
||||||
|
|
||||||
|
.. list-table:: Revisions
|
||||||
|
:header-rows: 1
|
||||||
|
|
||||||
|
* - Release Name
|
||||||
|
- Description
|
||||||
|
* - Mitaka
|
||||||
|
- Introduced [2]_
|
||||||
|
* - Newton
|
||||||
|
- Reproposed [3]_
|
||||||
|
* - Queens
|
||||||
|
- Reproposed [4]_
|
||||||
|
* - Train
|
||||||
|
- Reproposed [5]_
|
||||||
|
* - Ussuri
|
||||||
|
- Reproposed
|
Loading…
Reference in New Issue