..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

Block Device Mapping in Nova
============================

Nova has a concept of block devices that can be exposed to cloud instances.
There are several types of block devices an instance can have (we will go into
more detail about this later in this document), and which ones are available
depends on the particular deployment and the usage limitations set for tenants
and users. Block device mapping is a way to organize and keep data about all of
the block devices an instance has.

When we talk about block device mapping, we usually refer to one of two things:

1. The API/CLI structure and syntax for specifying block devices for an
   instance boot request.

2. The data structure internal to Nova that is used for recording and keeping
   this information, which is ultimately persisted in the block_device_mapping
   table. However, Nova internally has several "slightly" different formats
   for representing the same data. All of them are documented in the code
   and/or represented by a distinct set of classes, but not knowing that they
   exist might trip up people reading the code. So in addition to
   BlockDeviceMapping [1]_ objects that mirror the database schema, we have:

   2.1 The API format - this is the set of raw key-value pairs received from
   the API client, and is almost immediately transformed into the object;
   however, some validations are done using this format. We will refer to
   this format as the 'API BDMs' from now on.

   2.2 The virt driver format - this is the format defined by the classes in
   :mod:`nova.virt.block_device`. This format is used and expected by the
   code in the various virt drivers. These classes, in addition to exposing a
   different format (mimicking the Python dict interface), also provide a
   place to bundle some functionality common to certain types of block
   devices (for example, attaching volumes, which has to interact with both
   Cinder and the virt driver code). We will refer to this format as 'Driver
   BDMs' from now on.

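To make the idea of a Driver BDM more concrete, here is a minimal sketch, not Nova's actual code. The `BDMRecord` dataclass stands in for a BlockDeviceMapping object, and `DriverVolumeBDM` shows how a driver-facing wrapper can mimic the Python dict interface while keeping the full record around for volume-specific helpers. Both class names and the chosen subset of keys are hypothetical.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class BDMRecord:
       """Hypothetical stand-in for a BlockDeviceMapping object (one DB row)."""

       instance_uuid: str
       source_type: str
       destination_type: str
       device_name: Optional[str] = None
       volume_id: Optional[str] = None
       delete_on_termination: bool = False
       boot_index: Optional[int] = None


   class DriverVolumeBDM(dict):
       """Hypothetical driver-facing view of a volume BDM.

       Subclassing dict lets virt driver code use plain key access, while the
       wrapper keeps the full record around for volume-specific helpers.
       """

       # Assumed subset of keys a virt driver needs; not Nova's real field list.
       _fields = ("device_name", "delete_on_termination", "boot_index")

       def __init__(self, record: BDMRecord) -> None:
           if record.destination_type != "volume":
               raise ValueError("not a volume block device")
           super().__init__((name, getattr(record, name)) for name in self._fields)
           self._record = record

       @property
       def volume_id(self) -> Optional[str]:
           # Reaches through to the record instead of exposing it as a dict key.
           return self._record.volume_id

Wrapping a record for a volume attached as `/dev/vdb` gives an object that driver code can treat as a plain dict (`bdm["device_name"]`), while `bdm.volume_id` is read through the property from the underlying record.
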
Data format and its history
----------------------------

In the early days of Nova, the general structure of block device mapping
closely mirrored that of the EC2 API. During the Havana release of Nova, the
block device handling code, and in turn the block device mapping structure,
was reworked to improve its generality and usefulness. These improvements
included exposing additional details and features in the API. To facilitate
this, a new extension called `BlockDeviceMappingV2Boot` [2]_ was added to the
v2 API; it added a `block_device_mapping_v2` field to the instance boot API
request.

Block device mapping v1 (aka legacy)
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
This was the original format, which supported only Cinder volumes (similar to
how EC2 block devices support only EBS volumes). Every entry was keyed by
device name (we will discuss why this was problematic in its own section later
on this page), and would accept only:

* UUID of the Cinder volume or snapshot
* Type field - used only to distinguish between volumes and Cinder volume
  snapshots
* Optional size field
* Optional `delete_on_termination` flag

While all of Nova's internal code uses and stores only the new data structure,
we still need to handle API requests that use the legacy format. This is
handled by the Nova API service on every request. As we will see later, since
block device mapping information can also be stored in the image metadata in
Glance, this is another place where we need to handle the v1 format. The code
to handle legacy conversions is part of the :mod:`nova.block_device` module.

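As a rough illustration of what such a conversion involves, the sketch below turns one legacy entry into a v2-style dict. It is a simplified, hypothetical helper (`legacy_to_v2` is not a Nova function), and it assumes legacy entries carry the Cinder UUID in either a `volume_id` or a `snapshot_id` key. It emits the v2 API key names (`source_type`, `destination_type`, `uuid`, `boot_index`). Following the legacy "boot from volume" convention, the entry whose device name matches the instance's root device name becomes the boot device.

.. code-block:: python

   from typing import Any, Dict, Optional


   def _strip_dev(name: str) -> str:
       """Treat 'vda' and '/dev/vda' as the same device name."""
       return name[len("/dev/"):] if name.startswith("/dev/") else name


   def legacy_to_v2(legacy: Dict[str, Any],
                    root_device_name: Optional[str] = None) -> Dict[str, Any]:
       """Convert one legacy (v1) BDM entry into a v2-style dict.

       Hypothetical, simplified sketch; assumes the Cinder UUID is given as
       either 'snapshot_id' or 'volume_id'.
       """
       if legacy.get("snapshot_id"):
           source_type, uuid = "snapshot", legacy["snapshot_id"]
       elif legacy.get("volume_id"):
           source_type, uuid = "volume", legacy["volume_id"]
       else:
           raise ValueError("legacy entry needs a volume_id or snapshot_id")

       device_name = legacy.get("device_name")
       # Legacy "boot from volume": the entry named after the instance's root
       # device is the one to boot from.
       is_root = (device_name is not None and root_device_name is not None
                  and _strip_dev(device_name) == _strip_dev(root_device_name))

       return {
           "source_type": source_type,
           # v1 entries only ever described Cinder volumes or snapshots.
           "destination_type": "volume",
           "uuid": uuid,
           "device_name": device_name,
           "volume_size": legacy.get("volume_size"),
           # Passed through as-is; a real API layer would also validate it.
           "delete_on_termination": legacy.get("delete_on_termination", False),
           "boot_index": 0 if is_root else None,
       }

For example, a legacy entry `{"device_name": "vda", "volume_id": "vol-1"}` converted with a root device name of `/dev/vda` comes out as a bootable `volume` -> `volume` entry with `boot_index` 0.
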
Intermezzo - problem with device names
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Using device names as the primary per-instance identifier, and exposing them in
the API, is problematic for Nova mostly because several hypervisors that Nova
supports with its drivers can't guarantee that the device names the guest OS
assigns are the ones the user requested from Nova. Exposing such a detail in
the public API of Nova is obviously not ideal, but it needed to stay for
backwards compatibility. It is also required for some (slightly obscure)
features around overriding a block device in a Glance image when booting an
instance [3]_.

The plan for fixing this was to allow users to not specify the device name of a
block device, and have Nova determine it (with the help of the virt driver), so
that it can still be discovered through the API and used when necessary, like
for the features mentioned above (and preferably only then).

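One simple way a name could be chosen when the user leaves it out is to take the first unused letter after a bus-specific prefix. The sketch below is hypothetical: `next_device_name` is not a Nova API, and the `/dev/vd` default prefix is an assumption matching the `/dev/vda`-style names used on this page. A real implementation would also have to deal with different buses and with running past `z`.

.. code-block:: python

   import string
   from typing import Iterable


   def next_device_name(used: Iterable[str], prefix: str = "/dev/vd") -> str:
       """Return the first free device name of the form <prefix><letter>.

       Hypothetical helper for illustration only.
       """
       taken = set(used)
       for letter in string.ascii_lowercase:
           candidate = prefix + letter
           if candidate not in taken:
               return candidate
       raise ValueError("no free device names left for prefix %r" % prefix)

With `/dev/vda` already in use, `next_device_name(["/dev/vda"])` returns `/dev/vdb`; gaps are filled first, so `["/dev/vda", "/dev/vdc"]` also yields `/dev/vdb`.
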
Another use for specifying the device name was to allow the "boot from volume"
functionality, by specifying a device name that matches the root device name
for the instance (usually `/dev/vda`).

Currently (mid Liberty) users are discouraged from specifying device names
for all calls requiring or allowing block device mapping, except when trying to
override the image block device mapping on instance boot, and it will likely
remain like that in the future. The libvirt driver will outright override any
device names passed with its own values.

Block device mapping v2
|
|
^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The new format was introduced in an attempt to solve issues with the original
block device mapping format discussed above, and also to allow for more
flexibility and the addition of features that were not possible with the
simple format we had.

The new block device mapping is a list of dictionaries containing the
following fields (in addition to the ones that were already there):

* source_type - this can have one of the following values:

  * `image`
  * `volume`
  * `snapshot`
  * `blank`

* destination_type - this can have one of the following values:

  * `local`
  * `volume`

  The combination of the above two fields defines what kind of block device
  the entry is referring to. We currently support the following combinations:

  * `image` -> `local` - this is currently reserved only for the entry
    referring to the Glance image that the instance is being booted with (it
    should also be marked as a boot device). It is also worth noting that an
    API request that specifies this also has to provide the same Glance UUID
    as the `image_ref` parameter to the boot request (this is done for
    backwards compatibility and may be changed in the future). This
    functionality might be extended to specify additional Glance images to be
    attached to an instance after boot (similar to kernel/ramdisk images),
    but this is not supported by any of the current drivers.
  * `volume` -> `volume` - this is just a Cinder volume to be attached to the
    instance. It can be marked as a boot device.
  * `snapshot` -> `volume` - this works exactly as passing `type=snap` does.
    It creates a volume from a Cinder volume snapshot and attaches that volume
    to the instance. It can be marked bootable.
  * `image` -> `volume` - as one would imagine, this downloads a Glance image
    to a Cinder volume and attaches it to the instance. It can also be marked
    as bootable. This is really only a shortcut for creating a volume out of
    an image before booting an instance with the newly created volume.
  * `blank` -> `volume` - creates a blank Cinder volume and attaches it. This
    also requires the volume size to be set.
  * `blank` -> `local` - depending on the guest_format field (see below),
    this means either an ephemeral blank disk on hypervisor local storage or
    a swap disk (an instance can have only one swap disk).

* guest_format - tells Nova how/if to format the device prior to attaching;
  it should only be used with blank local devices. It denotes a swap disk if
  the value is `swap`.

* device_name - see the previous section for a more in-depth explanation of
  this. It is currently best left empty (that is, not specified), unless the
  user wants to override the existing device specified in the image metadata.
  In the case of libvirt, even when passed in with the purpose of overriding
  the existing image metadata, the final set of device names for the instance
  may still get changed by the driver.

* disk_bus and device_type - low-level details that some hypervisors
  (currently only libvirt) may support. Some example disk_bus values are
  `ide`, `usb`, `virtio` and `scsi`, while device_type may be `disk`,
  `cdrom`, `floppy` or `lun`. This is not an exhaustive list, as it depends
  on the virtualization driver and may change as more support is added.
  Leaving these empty is the most common thing to do.

* boot_index - defines the order in which a hypervisor will try devices when
  attempting to boot the guest from storage. Each device that is capable of
  being used as a boot device should be given a unique boot index, starting
  from 0 in ascending order. Some hypervisors may not support booting from
  multiple devices, and so will only consider the device with boot index 0.
  Some hypervisors will support booting from multiple devices, but only if
  they are of different types, e.g. a disk and a CD-ROM. Setting a negative
  value or None indicates that the device should not be used for booting.
  The simplest usage is to set it to 0 for the boot device and leave it as
  None for any other devices.

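The rules above can be condensed into a single validation pass over the list. The sketch below is a hypothetical, simplified check written for illustration, not Nova's own validation code. It uses the key names of the v2 API request (`source_type`, `destination_type`, `guest_format`, `volume_size`, `boot_index`), and it assumes that boot indexes must be contiguous (0, 1, 2, ...) in addition to unique. It returns the bootable entries sorted by boot order.

.. code-block:: python

   from typing import Any, Dict, List

   # (source_type, destination_type) pairs listed as supported above.
   SUPPORTED_COMBINATIONS = {
       ("image", "local"),
       ("volume", "volume"),
       ("snapshot", "volume"),
       ("image", "volume"),
       ("blank", "volume"),
       ("blank", "local"),
   }


   def validate_bdm_v2(bdms: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
       """Check a list of v2 BDM dicts; return boot devices in boot order.

       Hypothetical, simplified sketch of the rules described in this section.
       """
       swap_count = 0
       bootable = []
       for bdm in bdms:
           pair = (bdm.get("source_type"), bdm.get("destination_type"))
           if pair not in SUPPORTED_COMBINATIONS:
               raise ValueError("unsupported combination %s -> %s" % pair)
           if pair == ("blank", "volume") and not bdm.get("volume_size"):
               raise ValueError("blank volumes need a volume_size")
           guest_format = bdm.get("guest_format")
           if guest_format is not None and pair != ("blank", "local"):
               raise ValueError("guest_format only applies to blank local disks")
           if guest_format == "swap":
               swap_count += 1
               if swap_count > 1:
                   raise ValueError("an instance can have only one swap disk")
           boot_index = bdm.get("boot_index")
           # None or a negative value means "not a boot device".
           if boot_index is not None and boot_index >= 0:
               bootable.append(bdm)

       bootable.sort(key=lambda b: b["boot_index"])
       indexes = [b["boot_index"] for b in bootable]
       # Assumed rule: indexes must be exactly 0, 1, 2, ... with no repeats.
       if indexes != list(range(len(indexes))):
           raise ValueError("boot indexes must be unique and contiguous "
                            "from 0: %s" % indexes)
       return bootable

For example, booting from a volume created out of an image (`boot_index` 0) together with a swap disk (`boot_index` -1) and an extra Cinder volume (`boot_index` None) passes the check, and only the image-backed volume is returned as bootable.
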
Nova will not allow mixing of the two formats in a single request, and will do
basic validation to make sure that the requested block device mapping is valid
before accepting a boot request.

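A minimal sketch of the first of those checks follows. It assumes that legacy entries arrive under a `block_device_mapping` key and new ones under `block_device_mapping_v2` in the server part of the boot request, and that an empty list counts as absent; `bdm_format` is a hypothetical helper, not a Nova function.

.. code-block:: python

   from typing import Any, Dict, Optional


   def bdm_format(server: Dict[str, Any]) -> Optional[str]:
       """Return 'legacy', 'v2' or None for a boot request's server dict.

       Hypothetical helper; rejects requests that mix the two formats.
       """
       # Truthiness check: a missing key and an empty list are treated alike.
       has_legacy = bool(server.get("block_device_mapping"))
       has_v2 = bool(server.get("block_device_mapping_v2"))
       if has_legacy and has_v2:
           raise ValueError("block_device_mapping and block_device_mapping_v2 "
                            "cannot be used in the same request")
       if has_v2:
           return "v2"
       if has_legacy:
           return "legacy"
       return None

A request carrying only `block_device_mapping_v2` entries is classified as `v2`, while one that carries both keys is rejected before any further validation runs.
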
.. [1] In addition to the BlockDeviceMapping Nova object, we also have the
   BlockDeviceDict class in the :mod:`nova.block_device` module. This class
   handles transforming and validating the API BDM format.

.. [2] This work predates API microversions, and thus the only way to add it
   was by means of an API extension.

.. [3] This is a feature that the EC2 API offers as well and has been in Nova
   for a long time, although it has been broken in several releases. More info
   can be found on `this bug <https://launchpad.net/bugs/1370250>`_.