Admin guide live-migration config chapter outdated

It contained pre-Kilo information and was not very readable.
This patch only addresses the KVM/Libvirt section. The chapter
also contains a Xenserver section, which is untouched.

Change-Id: Ic226c4845b3fe7286833d59a72f3fe87887eacec
Closes-Bug: #1670225
This commit is contained in:
Bernd Bausch 2017-03-09 20:07:24 +09:00 committed by Anne Gentle
parent 841cae18f3
commit 637081a97a

.. _section_configuring-compute-migrations:

=========================
Configure live migrations
=========================

Migration enables an administrator to move a virtual-machine instance
from one compute host to another. This feature is useful when a compute
host requires maintenance. Migration can also be useful to redistribute
the load when many VM instances are running on a specific physical
machine.

This document covers live migrations using the
:ref:`configuring-migrations-kvm-libvirt`
and :ref:`configuring-migrations-xenserver` hypervisors.

.. note::

   Not all Compute service hypervisor drivers support live-migration,
   or support all live-migration features.

   Consult the `Hypervisor Support Matrix
   <https://docs.openstack.org/developer/nova/support-matrix.html>`_ to
   determine which hypervisors support live-migration.

   See the `Hypervisor configuration pages
   <https://docs.openstack.org/ocata/config-reference/compute/hypervisors.html>`_
   for details on hypervisor-specific configuration settings.

The migration types are:

- **Non-live migration**, also known as cold migration or simply
  migration.

  The instance is shut down, then moved to another hypervisor and
  restarted. The instance recognizes that it was rebooted, and the
  application running on the instance is disrupted.

  This section does not cover cold migration.

- **Live migration**

  The instance keeps running throughout the migration. This is useful
  when it is not possible or desirable to stop the application running
  on the instance.

  Live migrations can be classified further by the way they treat
  instance storage:

  - **Shared storage-based live migration**. The instance has ephemeral
    disks that are located on storage shared between the source and
    destination hosts.

  - **Block live migration**, or simply block migration. The instance
    has ephemeral disks that are not shared between the source and
    destination hosts. Block migration is incompatible with read-only
    devices such as CD-ROMs and `Configuration Drive (config_drive)
    <https://docs.openstack.org/user-guide/cli-config-drive.html>`_.

  - **Volume-backed live migration**. Instances use volumes rather than
    ephemeral disks.

  Block live migration requires copying disks from the source to the
  destination host. It takes more time and puts more load on the
  network. Shared-storage and volume-backed live migrations do not copy
  disks.

.. note::

   In a multi-cell cloud, instances can be live migrated to a different
   host in the same cell, but not across cells.

The following sections describe how to configure your hosts for live
migrations using the KVM and XenServer hypervisors.

.. _configuring-migrations-kvm-libvirt:

KVM-libvirt
~~~~~~~~~~~

.. _configuring-migrations-kvm-general:

General configuration
---------------------

To enable any type of live migration, configure the compute hosts
according to the instructions below:

#. Set the following parameters in ``nova.conf`` on all compute hosts:

   - ``vncserver_listen=0.0.0.0``

     You must not make the VNC server listen to the IP address of its
     compute host, since that address changes when the instance is
     migrated.

     .. important::

        Since this setting allows VNC clients from any IP address to
        connect to instance consoles, you must take additional measures
        like secure networks or firewalls to prevent potential attackers
        from gaining access to instances.

   - ``instances_path`` must have the same value for all compute hosts.
     In this guide, the value ``/var/lib/nova/instances`` is assumed.
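
   For illustration, here is a minimal sketch of how these lines might
   appear in ``nova.conf``. They are shown in the ``[DEFAULT]`` section,
   matching the option names used in this guide; depending on your
   release, the VNC options may live in the ``[vnc]`` section instead:

   .. code-block:: ini

      [DEFAULT]
      # Listen on all addresses so consoles stay reachable after migration
      vncserver_listen = 0.0.0.0
      # Must be identical on all compute hosts
      instances_path = /var/lib/nova/instances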

#. Ensure that name resolution on all compute hosts is identical, so
   that they can connect to each other through their hostnames.

   If you use ``/etc/hosts`` for name resolution and enable SELinux,
   ensure that ``/etc/hosts`` has the correct SELinux context:

   .. code-block:: console

      # restorecon /etc/hosts
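
   For example, an ``/etc/hosts`` file that is kept identical on all
   compute hosts might contain entries like the following (host names
   and addresses are placeholders):

   .. code-block:: console

      192.0.2.11   compute1
      192.0.2.12   compute2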

#. Enable password-less SSH so that root on one compute host can log on
   to any other compute host without providing a password. The
   ``libvirtd`` daemon, which runs as root, uses the SSH protocol to
   copy the instance to the destination and cannot know the passwords
   of all compute hosts.

   You may, for example, compile root's public SSH keys on all compute
   hosts into an ``authorized_keys`` file and deploy that file to the
   compute hosts.
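
   A minimal sketch of this approach, assuming root's key pair already
   exists on each host and ``compute2`` is a peer compute host: collect
   root's public key from every compute host, merge the keys into a
   single ``authorized_keys`` file, deploy it to ``/root/.ssh/`` on all
   compute hosts, and verify that password-less login works:

   .. code-block:: console

      # cat /root/.ssh/id_rsa.pub
      # vi /root/.ssh/authorized_keys
      # ssh compute2 hostname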

#. Configure the firewalls to allow libvirt to communicate between
   compute hosts.

   By default, libvirt uses the TCP port range from 49152 to 49261 for
   copying memory and disk contents. Compute hosts must accept
   connections in this range.

   For information about ports used by libvirt, see the `libvirt
   documentation <http://libvirt.org/remote.html#Remote_libvirtd_configuration>`_.

   .. important::

      Be mindful of the security risks introduced by opening ports.
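
   For example, with an iptables-based firewall, a rule such as the
   following accepts the migration port range. This is a sketch;
   restrict the source network (``192.0.2.0/24`` is a placeholder) to
   your compute hosts:

   .. code-block:: console

      # iptables -I INPUT -p tcp -s 192.0.2.0/24 --dport 49152:49261 -j ACCEPT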

.. _configuring-migrations-kvm-block-and-volume-migration:

Block migration, volume-based live migration
--------------------------------------------

No additional configuration is required for block migration and
volume-backed live migration.

Be aware that block migration adds load to the network and storage
subsystems.

.. _configuring-migrations-kvm-shared-storage:

Shared storage
--------------

Compute hosts have many options for sharing storage, for example NFS,
shared disk array LUNs, Ceph or GlusterFS.

The next steps show how a regular Linux system might be configured as
an NFS v4 server for live migration. For detailed information and
alternative ways to configure NFS on Linux, see instructions for
`Ubuntu <https://help.ubuntu.com/community/SettingUpNFSHowTo>`_,
`RHEL and derivatives <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/nfs-serverconfig.html>`_
or `SLES and OpenSUSE <https://www.suse.com/documentation/sles-12/book_sle_admin/data/sec_nfs_configuring-nfs-server.html>`_.

#. Ensure that the UID and GID of the nova user are identical on the
   compute hosts and the NFS server.
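
   One way to verify this is to compare the output of ``id nova`` on
   every host; the exact numbers below are examples and do not matter,
   as long as they match everywhere:

   .. code-block:: console

      $ id nova
      uid=162(nova) gid=162(nova) groups=162(nova)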

#. Create a directory with enough disk space for all instances in the
   cloud, owned by user nova. In this guide, we assume
   ``/var/lib/nova/instances``.
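
   For example, assuming the ``nova`` user and group already exist:

   .. code-block:: console

      # mkdir -p /var/lib/nova/instances
      # chown nova:nova /var/lib/nova/instances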

#. Set the execute/search bit on the ``instances`` directory:

   .. code-block:: console

      $ chmod o+x /var/lib/nova/instances

   This allows qemu to access the ``instances`` directory tree.

#. Export ``/var/lib/nova/instances`` to the compute hosts. For
   example, add the following line to ``/etc/exports``:

   .. code-block:: ini

      /var/lib/nova/instances *(rw,sync,fsid=0,no_root_squash)

   The asterisk permits access to any NFS client. The option ``fsid=0``
   exports the instances directory as the NFS root.
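
   After editing ``/etc/exports``, activate the export. On most
   distributions this does not require restarting the NFS server, for
   example:

   .. code-block:: console

      # exportfs -ra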

After setting up the NFS server, mount the remote filesystem on all
compute hosts.

#. Assuming the NFS server's hostname is ``nfs-server``, add this line
   to ``/etc/fstab`` to mount the NFS root:

   .. code-block:: console

      nfs-server:/ /var/lib/nova/instances nfs4 defaults 0 0

#. Test NFS by mounting the instances directory and check access
   permissions for the nova user:

   .. code-block:: console

      $ sudo mount -a -v
      $ ls -ld /var/lib/nova/instances/
      drwxr-xr-x. 2 nova nova 6 Mar 14 21:30 /var/lib/nova/instances/

.. _configuring-migrations-kvm-advanced:

Advanced configuration for KVM and QEMU
---------------------------------------

Live migration copies the instance's memory from the source to the
destination compute host. After a memory page has been copied, the
instance may write to it again, so that it has to be copied again.
Instances that frequently write to different memory pages can overwhelm
the memory copy process and prevent the live migration from completing.

This section covers configuration settings that can help live migration
of memory-intensive instances succeed.

#. **Live migration completion timeout**

   The Compute service aborts a migration when it has been running for
   too long. The timeout is calculated based on the instance size,
   which is the instance's memory size in GiB. In the case of block
   migration, the size of ephemeral storage in GiB is added.

   The timeout in seconds is the instance size multiplied by the
   configurable parameter ``live_migration_completion_timeout``, whose
   default is 800. For example, shared-storage live migration of an
   instance with 8 GiB of memory will time out after 6400 seconds.
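
   If this is too aggressive for your environment, you can raise the
   multiplier. A sketch of the setting in ``nova.conf``, assuming the
   option is placed in the ``[libvirt]`` section:

   .. code-block:: ini

      [libvirt]
      # 8 GiB instance * 1600 s/GiB = 12800 s maximum migration time
      live_migration_completion_timeout = 1600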

#. **Live migration progress timeout**

   The Compute service also aborts a live migration when it detects
   that memory copy is not making progress for a certain time. You can
   set this time, in seconds, through the configurable parameter
   ``live_migration_progress_timeout``.

   In Ocata, the default value of ``live_migration_progress_timeout``
   is 0, which disables progress timeouts. You should not change this
   value, since the algorithm that detects memory copy progress has
   been determined to be unreliable. It may be re-enabled in future
   releases.

#. **Instance downtime**

   Near the end of the memory copy, the instance is paused for a short
   time so that the remaining few pages can be copied without
   interference from instance memory writes. The Compute service
   initializes this time to a small value that depends on the instance
   size, typically around 50 milliseconds. When it notices that the
   memory copy does not make sufficient progress, it increases the time
   gradually.

   You can influence the instance downtime algorithm with the help of
   three configuration variables on the compute hosts:

   .. code-block:: ini

      live_migration_downtime = 500
      live_migration_downtime_steps = 10
      live_migration_downtime_delay = 75

   ``live_migration_downtime`` sets the maximum permitted downtime for
   a live migration, in *milliseconds*. The default is 500.

   ``live_migration_downtime_steps`` sets the total number of
   adjustment steps until ``live_migration_downtime`` is reached. The
   default is 10 steps.

   ``live_migration_downtime_delay`` sets the time interval between two
   adjustment steps in *seconds*. The default is 75.

#. **Auto-convergence**

   One strategy for a successful live migration of a memory-intensive
   instance is slowing the instance down. This is called
   auto-convergence. Both libvirt and QEMU implement this feature by
   automatically throttling the instance's CPU when memory copy delays
   are detected.

   Auto-convergence is disabled by default. You can enable it by
   setting ``live_migration_permit_auto_convergence=true``.
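
   For example, a sketch of the setting in ``nova.conf``, assuming it
   is placed in the ``[libvirt]`` section on the compute hosts:

   .. code-block:: ini

      [libvirt]
      live_migration_permit_auto_convergence = true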

   .. caution::

      Before enabling auto-convergence, make sure that the instance's
      application tolerates a slow-down.

      Be aware that auto-convergence does not guarantee live migration
      success.

#. **Post-copy**

   Live migration of a memory-intensive instance is certain to succeed
   when you enable post-copy. This feature, implemented by libvirt and
   QEMU, activates the virtual machine on the destination host before
   all of its memory has been copied. When the virtual machine accesses
   a page that is missing on the destination host, the resulting page
   fault is resolved by copying the page from the source host.

   Post-copy is disabled by default. You can enable it by setting
   ``live_migration_permit_post_copy=true``.
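
   For example, a sketch of the setting, again assuming the
   ``[libvirt]`` section of ``nova.conf``:

   .. code-block:: ini

      [libvirt]
      live_migration_permit_post_copy = true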

   When you enable both auto-convergence and post-copy,
   auto-convergence remains disabled.

   .. caution::

      The page faults introduced by post-copy can slow the instance
      down.

      When the network connection between source and destination host
      is interrupted, page faults cannot be resolved anymore and the
      instance is rebooted.

The full list of live migration configuration parameters is documented
in the `OpenStack Configuration Reference Guide
<https://docs.openstack.org/ocata/config-reference/compute/config-options.html#id24>`_.

.. _configuring-migrations-xenserver: