Add documentation for block device mapping

This commit adds some (long overdue) documentation around block device
mapping and how it's used in Nova.

blueprint devref-refresh-liberty

Change-Id: Idca142f3b34ad896ab99f02a3f9eb72a6a3b4778
parent 4bd8a4bd8e
commit a338a4da11
204
doc/source/block_device_mapping.rst
Normal file
@@ -0,0 +1,204 @@
..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

Block Device Mapping in Nova
============================

Nova has a concept of block devices that can be exposed to cloud instances.
There are several types of block devices an instance can have (we will go into
more details about this later in this document), and which ones are available
depends on a particular deployment and the usage limitations set for tenants
and users. Block device mapping is a way to organize and keep data about all of
the block devices an instance has.

When we talk about block device mapping, we usually refer to one of two things:

1. The API/CLI structure and syntax for specifying block devices for an
   instance boot request.

2. The data structure internal to Nova that is used for recording and keeping
   this data, which is ultimately persisted in the block_device_mapping table.
   However, Nova internally has several "slightly" different formats for
   representing the same data. All of them are documented in the code and/or
   represented by a distinct set of classes, but not knowing that they exist
   might trip up people reading the code. So in addition to the
   BlockDeviceMapping [1]_ objects that mirror the database schema, we have:

   2.1 The API format - this is the set of raw key-value pairs received from
   the API client, and is almost immediately transformed into the object;
   however, some validations are done using this format. We will refer to this
   format as the 'API BDMs' from now on.

   2.2 The virt driver format - this is the format defined by the classes in
   :mod:`nova.virt.block_device`. This format is used and expected by the code
   in the various virt drivers. These classes, in addition to exposing a
   different format (mimicking the Python dict interface), also provide a place
   to bundle some functionality common to certain types of block devices (for
   example, attaching volumes, which has to interact with both Cinder and the
   virt driver code). We will refer to this format as 'Driver BDMs' from now
   on.

Data format and its history
----------------------------

In the early days of Nova, the general structure of block device mapping
closely mirrored that of the EC2 API. During the Havana release of Nova, the
block device handling code, and in turn the block device mapping structure,
was reworked to make it more general and more useful. These improvements
included exposing additional details and features in the API. In order to
facilitate this, a new extension called `BlockDeviceMappingV2Boot` [2]_ was
added to the v2 API. It added a `block_device_mapping_v2` field to the instance
boot API request.

Block device mapping v1 (aka legacy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This was the original format, which supported only Cinder volumes (similar to
how EC2 block devices support only EBS volumes). Every entry was keyed by
device name (we will discuss why this was problematic in its own section later
on this page), and would accept only:

* UUID of the Cinder volume or snapshot
* Type field - used only to distinguish between volumes and Cinder volume
  snapshots
* Optional size field
* Optional `delete_on_termination` flag

While all of Nova's internal code uses and stores only the new data structure,
we still need to handle API requests that use the legacy format. This is
handled by the Nova API service on every request. As we will see later, since
block device mapping information can also be stored in the image metadata in
Glance, this is another place where we need to handle the v1 format. The code
to handle legacy conversions is part of the :mod:`nova.block_device` module.

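To make the legacy format more concrete, here is a minimal sketch of what a v1
entry might look like and how it could be translated into the newer fields
described later on this page. The function name ``legacy_to_v2`` and the key
names ``id``, ``type``, ``size``, ``uuid`` and ``volume_size`` are assumptions
made for illustration only. The real conversion lives in
:mod:`nova.block_device` and may differ in detail.

```python
def legacy_to_v2(device_name, legacy_entry):
    """Translate one legacy (v1) entry into a v2-style dictionary.

    Illustrative only: the key names "id", "type", "size", "uuid" and
    "volume_size" are assumptions, not Nova's actual schema.
    """
    # In v1 the type field only distinguishes volumes from snapshots.
    is_snapshot = legacy_entry.get("type") == "snap"
    v2_entry = {
        "device_name": device_name,
        "source_type": "snapshot" if is_snapshot else "volume",
        # v1 supports Cinder-backed devices only, so the destination is
        # always a volume.
        "dest_type": "volume",
        "uuid": legacy_entry["id"],
        "delete_on_termination": bool(
            legacy_entry.get("delete_on_termination", False)),
    }
    # The size is optional in v1, so copy it only when it was given.
    if legacy_entry.get("size") is not None:
        v2_entry["volume_size"] = int(legacy_entry["size"])
    return v2_entry


# A legacy mapping keyed by device name (placeholder UUID).
legacy = {"/dev/vdb": {"id": "hypothetical-snapshot-uuid", "type": "snap",
                       "size": 10, "delete_on_termination": True}}
converted = [legacy_to_v2(name, entry) for name, entry in legacy.items()]
```

Note that the device name moves from being the key of the mapping to being an
ordinary field of the entry, which is what later allows it to be left out.
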
Intermezzo - problem with device names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using device names as the primary per-instance identifier, and exposing them in
the API, is problematic for Nova mostly because several hypervisors Nova
supports with its drivers can't guarantee that the device names the guest OS
assigns are the ones the user requested from Nova. Exposing such a detail
in the public API of Nova is obviously not ideal, but it needed to stay for
backwards compatibility. It is also required for some (slightly obscure)
features around overloading a block device in a Glance image when booting an
instance [3]_.

The plan for fixing this was to allow users to omit the device name of a
block device and let Nova determine it (with the help of the virt driver), so
that it can still be discovered through the API and used when necessary, like
for the features mentioned above (and preferably only then).

Another use for specifying the device name was to allow the "boot from volume"
functionality, by specifying a device name that matches the root device name
for the instance (usually `/dev/vda`).

Currently (mid Liberty) users are discouraged from specifying device names
for all calls requiring or allowing block device mapping, except when trying to
override the image block device mapping on instance boot, and it will likely
remain like that in the future. The libvirt driver will outright override
any device names passed with its own values.

Block device mapping v2
^^^^^^^^^^^^^^^^^^^^^^^

The new format was introduced in an attempt to solve the issues with the
original block device mapping format discussed above, and also to allow for
more flexibility and the addition of features that were not possible with the
simple format we had.

The new block device mapping is a list of dictionaries containing the following
fields (in addition to the ones that were already there):

* source_type - this can have one of the following values:

  * `image`
  * `volume`
  * `snapshot`
  * `blank`

* dest_type - this can have one of the following values:

  * `local`
  * `volume`

The combination of the above two fields defines what kind of block device the
entry is referring to. We currently support the following combinations:

  * `image` -> `local` - this is currently reserved only for the entry
    referring to the Glance image that the instance is being booted with (it
    should also be marked as a boot device). It is also worth noting that an
    API request that specifies this also has to provide the same Glance image
    UUID as the `image_ref` parameter to the boot request (this is done for
    backwards compatibility and may be changed in the future). This
    functionality might be extended to specify additional Glance images
    to be attached to an instance after boot (similar to kernel/ramdisk
    images), but this is not supported by any of the current drivers.
  * `volume` -> `volume` - this is just a Cinder volume to be attached to the
    instance. It can be marked as a boot device.
  * `snapshot` -> `volume` - this works exactly as passing `type=snap` does.
    It creates a volume from a Cinder volume snapshot and attaches that
    volume to the instance. It can be marked as a boot device.
  * `image` -> `volume` - As one would imagine, this downloads a Glance
    image to a Cinder volume and attaches it to an instance. It can also be
    marked as a boot device. This is really only a shortcut for creating a
    volume out of an image before booting an instance with the newly created
    volume.
  * `blank` -> `volume` - Creates a blank Cinder volume and attaches it. This
    also requires the volume size to be set.
  * `blank` -> `local` - Depending on the guest_format field (see below),
    this means either an ephemeral blank disk on hypervisor local
    storage, or a swap disk (instances can have only one of those).

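The supported combinations listed above can be summarised as a small lookup.
The following is only a sketch of such a check, not the validation Nova
actually performs: the function name ``describe_bdm`` is hypothetical, and the
``volume_size`` key is assumed to be the field that carries the volume size.

```python
# Supported (source_type, dest_type) pairs, as listed above.
SUPPORTED_COMBINATIONS = {
    ("image", "local"),
    ("volume", "volume"),
    ("snapshot", "volume"),
    ("image", "volume"),
    ("blank", "volume"),
    ("blank", "local"),
}


def describe_bdm(bdm):
    """Return a short description of a v2 entry, or raise ValueError.

    Hypothetical helper; "volume_size" is an assumed key name.
    """
    pair = (bdm.get("source_type"), bdm.get("dest_type"))
    if pair not in SUPPORTED_COMBINATIONS:
        raise ValueError("unsupported combination: %s -> %s" % pair)
    # A blank Cinder volume cannot be created without knowing its size.
    if pair == ("blank", "volume") and not bdm.get("volume_size"):
        raise ValueError("a blank volume needs a size")
    if pair == ("blank", "local"):
        # guest_format decides between a swap disk and an ephemeral disk.
        return "swap" if bdm.get("guest_format") == "swap" else "ephemeral"
    return "%s -> %s" % pair
```

For example, an entry with ``source_type`` set to ``volume`` and ``dest_type``
set to ``local`` is rejected by this sketch, because that pair is not in the
list above.
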
* guest_format - Tells Nova how/if to format the device prior to attaching.
  It should be used only with blank local images. Denotes a swap disk if the
  value is `swap`.

* device_name - See the previous section for a more in-depth explanation of
  this. It is currently best left unspecified, unless the user wants to
  override the existing device specified in the image metadata. In the case of
  libvirt, even when passed in with the purpose of overriding the existing
  image metadata, the final set of device names for the instance may still get
  changed by the driver.

* disk_bus and device_type - low-level details that some hypervisors
  (currently only libvirt) may support. Some example disk_bus values are
  `ide`, `usb`, `virtio` and `scsi`, while device_type may be `disk`, `cdrom`,
  `floppy` or `lun`. This is not an exhaustive list, as it depends on the
  virtualization driver and may change as more support is added. Leaving these
  empty is the most common thing to do.

* boot_index - Defines the order in which a hypervisor will try devices when
  attempting to boot the guest from storage. Each device which is capable of
  being used as a boot device should be given a unique boot index, starting
  from 0 in ascending order. Some hypervisors may not support booting from
  multiple devices, so they will only consider the device with a boot index
  of 0. Some hypervisors will support booting from multiple devices, but only
  if they are of different types, e.g. a disk and a CD-ROM. Setting a negative
  value or None indicates that the device should not be used for booting. The
  simplest usage is to set it to 0 for the boot device and leave it as None
  for any other devices.

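The boot_index rules can be illustrated with a short sketch. The helper name
``boot_order`` is hypothetical, and treating a gap in the sequence as an error
is an assumption that reads "starting from 0 in ascending order" strictly. The
checks Nova really performs may be looser or stricter.

```python
def boot_order(bdms):
    """Return the bootable entries ordered by boot_index.

    Entries whose boot_index is None or negative are not boot candidates.
    Hypothetical helper, not part of Nova.
    """
    candidates = [b for b in bdms
                  if b.get("boot_index") is not None and b["boot_index"] >= 0]
    indexes = sorted(b["boot_index"] for b in candidates)
    # Assumption: indexes are unique and form the sequence 0, 1, 2, ...
    if indexes != list(range(len(indexes))):
        raise ValueError("boot indexes must be unique and start from 0")
    return sorted(candidates, key=lambda b: b["boot_index"])


devices = [
    {"source_type": "blank", "dest_type": "local", "boot_index": None},
    {"source_type": "volume", "dest_type": "volume", "boot_index": 1},
    {"source_type": "image", "dest_type": "local", "boot_index": 0},
    {"source_type": "blank", "dest_type": "volume", "boot_index": -1},
]
ordered = boot_order(devices)
```

Here only the image and the volume are considered for booting, in that order.
The two blank devices are skipped.
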
Nova will not allow mixing of the two formats in a single request, and will do
basic validation to make sure that the requested block device mapping is valid
before accepting a boot request.

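As a rough illustration of the "no mixing" rule, the sketch below inspects a
boot request body and picks the format in use. The function name
``pick_bdm_format`` is hypothetical, the legacy key name
``block_device_mapping`` is an assumption, and the UUID is a placeholder. This
is not the code path the Nova API service actually uses.

```python
def pick_bdm_format(server_request):
    """Return (format, entries) for a boot request body.

    format is "v1", "v2" or None when no block devices were requested.
    The legacy key name "block_device_mapping" is an assumption.
    """
    legacy = server_request.get("block_device_mapping")
    v2 = server_request.get("block_device_mapping_v2")
    if legacy and v2:
        # Both formats in one request are rejected outright.
        raise ValueError("legacy and v2 block device mappings cannot be mixed")
    if v2:
        return "v2", v2
    if legacy:
        return "v1", legacy
    return None, []


request = {
    # The image -> local entry repeats the UUID given as image_ref.
    "image_ref": "hypothetical-image-uuid",
    "block_device_mapping_v2": [
        {"source_type": "image", "dest_type": "local",
         "uuid": "hypothetical-image-uuid", "boot_index": 0},
        {"source_type": "blank", "dest_type": "local",
         "guest_format": "swap", "boot_index": None},
    ],
}
fmt, entries = pick_bdm_format(request)
```
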
.. [1] In addition to the BlockDeviceMapping Nova object, we also have the
   BlockDeviceDict class in the :mod:`nova.block_device` module. This class
   handles transforming and validating the API BDM format.
.. [2] This work predates API microversions and thus the only way to add it was
   by means of an API extension.
.. [3] This is a feature that the EC2 API offers as well and has been in Nova
   for a long time, although it has been broken in several releases. More info
   can be found on `this bug <https://launchpad.net/bugs/1370250>`_.

@@ -141,6 +141,7 @@ Open Development.
    filter_scheduler
    rpc
    hooks
    block_device_mapping
    addmethod.openstackapi

Architecture Evolution Plans