From 637081a97a9b649d8e438997cc539c02f15b0858 Mon Sep 17 00:00:00 2001 From: Bernd Bausch Date: Thu, 9 Mar 2017 20:07:24 +0900 Subject: [PATCH] Admin guide live-migration config chapter outdated It contained pre-Kilo information and was not very readable. This patch only addresses the KVM/Libvirt section. The chapter also contains a Xenserver section, which is untouched. Change-Id: Ic226c4845b3fe7286833d59a72f3fe87887eacec Closes-Bug: #1670225 --- .../source/compute-configuring-migrations.rst | 514 ++++++++++-------- 1 file changed, 279 insertions(+), 235 deletions(-) diff --git a/doc/admin-guide/source/compute-configuring-migrations.rst b/doc/admin-guide/source/compute-configuring-migrations.rst index 7309374aed..50ae8bb345 100644 --- a/doc/admin-guide/source/compute-configuring-migrations.rst +++ b/doc/admin-guide/source/compute-configuring-migrations.rst @@ -1,259 +1,290 @@ .. _section_configuring-compute-migrations: -==================== -Configure migrations -==================== +========================= +Configure live migrations +========================= + +Migration enables an administrator to move a virtual machine instance +from one compute host to another. A typical scenario is planned +maintenance on the source host, but +migration can also be useful to redistribute +the load when many VM instances are running on a specific physical +machine. + +This document covers live migrations using the +:ref:`configuring-migrations-kvm-libvirt` +and :ref:`configuring-migrations-xenserver` hypervisors. .. :ref:`_configuring-migrations-kvm-libvirt` .. :ref:`_configuring-migrations-xenserver` .. note:: - Only administrators can perform live migrations. If your cloud - is configured to use cells, you can perform live migration within - but not between cells. + Not all Compute service hypervisor drivers support live-migration, + or support all live-migration features. -Migration enables an administrator to move a virtual-machine instance -from one compute host to another. This feature is useful when a compute -host requires maintenance. Migration can also be useful to redistribute -the load when many VM instances are running on a specific physical -machine. + Consult the `Hypervisor Support Matrix + `_ to + determine which hypervisors support live-migration. + + See the `Hypervisor configuration pages + `_ + for details on hypervisor-specific configuration settings. The migration types are: -- **Non-live migration** (sometimes referred to simply as 'migration'). - The instance is shut down for a period of time to be moved to another - hypervisor. In this case, the instance recognizes that it was - rebooted. +- **Non-live migration**, also known as cold migration or simply + migration. -- **Live migration** (or 'true live migration'). Almost no instance - downtime. Useful when the instances must be kept running during the - migration. The different types of live migration are: + The instance is shut down, then moved to another + hypervisor and restarted. The instance recognizes that it was + rebooted, and the application running on the instance is disrupted. - - **Shared storage-based live migration**. Both hypervisors have - access to shared storage. + This section does not cover cold migration. - - **Block live migration**. No shared storage is required. - Incompatible with read-only devices such as CD-ROMs and +- **Live migration** + + The instance keeps running throughout the migration. 
+  This is useful when it is not possible or desirable to stop the application
+  running on the instance.
+
+  Live migrations can be classified further by the way they treat instance
+  storage:
+
+  - **Shared storage-based live migration**. The instance has ephemeral
+    disks that are located on storage shared between the source and
+    destination hosts.
+
+  - **Block live migration**, or simply block migration. The instance has
+    ephemeral disks that are not shared between the source and destination
+    hosts. Block migration is
+    incompatible with read-only devices such as CD-ROMs and
     `Configuration Drive (config\_drive) `_.

-  - **Volume-backed live migration**. Instances are backed by volumes
-    rather than ephemeral disk, no shared storage is required, and
-    migration is supported (currently only available for libvirt-based
-    hypervisors).
+  - **Volume-backed live migration**. Instances use volumes
+    rather than ephemeral disks.

-The following sections describe how to configure your hosts and compute
-nodes for migrations by using the KVM and XenServer hypervisors.
+  Block live migration requires copying disks from the source to the
+  destination host. It takes more time and puts more load on the network.
+  Shared-storage and volume-backed live migrations do not copy disks.
+
+.. note::
+
+   In a multi-cell cloud, instances can be live migrated to a
+   different host in the same cell, but not across cells.
+
+The following sections describe how to configure your hosts
+for live migrations using the KVM and XenServer hypervisors.

 .. _configuring-migrations-kvm-libvirt:

-KVM-Libvirt
+KVM-libvirt
 ~~~~~~~~~~~

+.. :ref:`_configuring-migrations-kvm-general`
+.. :ref:`_configuring-migrations-kvm-block-and-volume-migration`
 .. :ref:`_configuring-migrations-kvm-shared-storage`
-.. :ref:`_configuring-migrations-kvm-block-migration`
+
+.. _configuring-migrations-kvm-general:
+
+General configuration
+---------------------
+
+To enable any type of live migration, configure the compute hosts according
+to the instructions below:
+
+#. Set the following parameters in ``nova.conf`` on all compute hosts:
+
+   - ``vncserver_listen=0.0.0.0``
+
+     You must not make the VNC server listen to the IP address of its
+     compute host, since that address changes when the instance is migrated.
+
+     .. important::
+        Since this setting allows VNC clients from any IP address to connect
+        to instance consoles, you must take additional measures like secure
+        networks or firewalls to prevent potential attackers from gaining
+        access to instances.
+
+   - ``instances_path`` must have the same value for all compute hosts.
+     In this guide, the value ``/var/lib/nova/instances`` is assumed.
+
+#. Ensure that name resolution on all compute hosts is identical, so
+   that they can connect to each other through their hostnames.
+
+   If you use ``/etc/hosts`` for name resolution and enable SELinux,
+   ensure that ``/etc/hosts`` has the correct SELinux context:
+
+   .. code-block:: console
+
+      # restorecon /etc/hosts
+
+#. Enable password-less SSH so that root on one compute host can log on to
+   any other compute host without providing a password. The ``libvirtd``
+   daemon, which runs as root, uses the SSH protocol to copy the instance
+   to the destination and cannot know the passwords of all compute hosts.
+
+   You may, for example, collect the root user's public SSH keys from all
+   compute hosts into a single ``authorized_keys`` file and deploy that
+   file to every compute host.
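+   For illustration only, assuming two compute hosts named ``compute1``
+   and ``compute2`` (the hostnames are placeholders), root's key might be
+   generated on ``compute1`` and distributed like this:
+
+   .. code-block:: console
+
+      # ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
+      # ssh-copy-id -i /root/.ssh/id_rsa.pub root@compute2
+      # ssh root@compute2 hostname
+
+   Repeat the equivalent steps on every compute host, or distribute the
+   keys with your configuration management tool.
+
+#.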
Configure the firewalls to allow libvirt to + communicate between compute hosts. + + By default, libvirt uses the TCP + port range from 49152 to 49261 for copying memory and disk contents. + Compute hosts + must accept connections in this range. + + For information about ports used by libvirt, + see the `libvirt documentation `_. + + .. important:: + Be mindful + of the security risks introduced by opening ports. + +.. _configuring-migrations-kvm-block-and-volume-migration: + +Block migration, volume-based live migration +-------------------------------------------- + +No additional configuration is required for block migration and volume-backed +live migration. + +Be aware that block migration adds load to the network and storage subsystems. .. _configuring-migrations-kvm-shared-storage: Shared storage -------------- -.. :ref:`_section_example-compute-install` -.. :ref:`_true-live-migration-kvm-libvirt` +Compute hosts have many options for sharing storage, +for example NFS, shared disk array LUNs, +Ceph or GlusterFS. -**Prerequisites** +The next steps show how a regular Linux system +might be configured as an NFS v4 server for live migration. +For detailed information and alternative ways to configure +NFS on Linux, see instructions for +`Ubuntu `_, +`RHEL and derivatives `_ +or `SLES and OpenSUSE `_. -- **Hypervisor:** KVM with libvirt +#. Ensure that UID and GID of the nova user + are identical on the compute hosts and the NFS server. -- **Shared storage:** ``NOVA-INST-DIR/instances/`` (for example, - ``/var/lib/nova/instances``) has to be mounted by shared storage. - This guide uses NFS but other options, including the - `OpenStack Gluster Connector `_ - are available. +#. Create a directory + with enough disk space for all + instances in the cloud, owned by user nova. In this guide, we + assume ``/var/lib/nova/instances``. -- **Instances:** Instance can be migrated with iSCSI-based volumes. - -**Notes** - -- Because the Compute service does not use the libvirt live - migration functionality by default, guests are suspended before - migration and might experience several minutes of downtime. For - details, see `Enabling true live migration`. - -- Compute calculates the amount of downtime required using the RAM size of - the disk being migrated, in accordance with the ``live_migration_downtime`` - configuration parameters. Migration downtime is measured in steps, with an - exponential backoff between each step. This means that the maximum - downtime between each step starts off small, and is increased in ever - larger amounts as Compute waits for the migration to complete. This gives - the guest a chance to complete the migration successfully, with a minimum - amount of downtime. - -- This guide assumes the default value for ``instances_path`` in - your ``nova.conf`` file (``NOVA-INST-DIR/instances``). If you - have changed the ``state_path`` or ``instances_path`` variables, - modify the commands accordingly. - -- You must specify ``vncserver_listen=0.0.0.0`` or live migration - will not work correctly. Because of this listening access, you must take - additional security measures to protect access to the VNC proxy from the - hypervisor. Using secure networks for that connection and configuring - firewalls is a best practice to make sure that you do not provide root - access to attackers gaining access to VMs through the proxy. - -- You must specify the ``instances_path`` in each node that runs - ``nova-compute``. 
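+   For example, on an NFS server where the nova user and group already
+   exist, the directory might be created as follows (the path is the one
+   assumed throughout this guide):
+
+   .. code-block:: console
+
+      # mkdir -p /var/lib/nova/instances
+      # chown nova:nova /var/lib/nova/instances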
The mount point for ``instances_path`` must be the - same value for each node, or live migration will not work - correctly. - -.. _section_example-compute-install: - -Example Compute installation environment ----------------------------------------- - -- Prepare at least three servers. In this example, we refer to the - servers as ``HostA``, ``HostB``, and ``HostC``: - - - ``HostA`` is the Cloud Controller, and should run these services: - ``nova-api``, ``nova-scheduler``, ``nova-network``, ``cinder-volume``, - and ``nova-objectstore``. - - - ``HostB`` and ``HostC`` are the compute nodes that run - ``nova-compute``. - - Ensure that ``NOVA-INST-DIR`` (set with ``state_path`` in the - ``nova.conf`` file) is the same on all hosts. - -- In this example, ``HostA`` is the NFSv4 server that exports - ``NOVA-INST-DIR/instances`` directory. ``HostB`` and ``HostC`` are - NFSv4 clients that mount ``HostA``. - -**Configuring your system** - -#. Configure your DNS or ``/etc/hosts`` and ensure it is consistent across - all hosts. Make sure that the three hosts can perform name resolution - with each other. As a test, use the :command:`ping` command to ping each host - from one another: +#. Set the execute/search bit on the ``instances`` directory: .. code-block:: console - $ ping HostA - $ ping HostB - $ ping HostC + $ chmod o+x /var/lib/nova/instances -#. Ensure that the UID and GID of your Compute and libvirt users are - identical between each of your servers. This ensures that the - permissions on the NFS mount works correctly. + This allows qemu to access the ``instances`` directory tree. -#. Ensure you can access SSH without a password and without - StrictHostKeyChecking between ``HostB`` and ``HostC`` as ``nova`` - user (set with the owner of ``nova-compute`` service). Direct access - from one compute host to another is needed to copy the VM file - across. It is also needed to detect if the source and target - compute nodes share a storage subsystem. - -#. Export ``NOVA-INST-DIR/instances`` from ``HostA``, and ensure it is - readable and writable by the Compute user on ``HostB`` and ``HostC``. - - For more information, see: `SettingUpNFSHowTo `_ - or `CentOS/Red Hat: Setup NFS v4.0 File Server `_ - -#. Configure the NFS server at ``HostA`` by adding the following line to - the ``/etc/exports`` file: +#. Export ``/var/lib/nova/instances`` + to the compute hosts. For example, add the following line to + ``/etc/exports``: .. code-block:: ini - NOVA-INST-DIR/instances HostA/255.255.0.0(rw,sync,fsid=0,no_root_squash) + /var/lib/nova/instances *(rw,sync,fsid=0,no_root_squash) - Change the subnet mask (``255.255.0.0``) to the appropriate value to - include the IP addresses of ``HostB`` and ``HostC``. Then restart the - ``NFS`` server: + The asterisk permits access to any NFS client. The option ``fsid=0`` + exports the instances directory as the NFS root. + +After setting up the NFS server, mount the remote filesystem +on all compute hosts. + +#. Assuming the NFS server's hostname is ``nfs-server``, + add this line to ``/etc/fstab`` to mount the NFS root: .. code-block:: console - # /etc/init.d/nfs-kernel-server restart - # /etc/init.d/idmapd restart + nfs-server:/ /var/lib/nova/instances nfs4 defaults 0 0 -#. On both compute nodes, enable the ``execute/search`` bit on your shared - directory to allow qemu to be able to use the images within the - directories. On all hosts, run the following command: +#. Test NFS by mounting the instances directory and + check access permissions for the nova user: .. 
code-block:: console - $ chmod o+x NOVA-INST-DIR/instances + $ sudo mount -a -v + $ ls -ld /var/lib/nova/instances/ + drwxr-xr-x. 2 nova nova 6 Mar 14 21:30 /var/lib/nova/instances/ -#. Configure NFS on ``HostB`` and ``HostC`` by adding the following line to - the ``/etc/fstab`` file +.. _configuring-migrations-kvm-advanced: - .. code-block:: console +Advanced configuration for KVM and QEMU +--------------------------------------- - HostA:/ /NOVA-INST-DIR/instances nfs4 defaults 0 0 +Live migration copies the instance's memory from the source to the +destination compute host. After a memory page has been copied, +the instance +may write to it again, so that it has to be copied again. +Instances that +frequently write to different memory pages can overwhelm the +memory copy +process and prevent the live migration from completing. - Ensure that you can mount the exported directory +This section covers configuration settings that can help live +migration +of memory-intensive instances succeed. - .. code-block:: console +#. **Live migration completion timeout** - $ mount -a -v + The Compute service aborts a migration when it has been running + for too long. + The timeout is calculated based on the instance size, which is the + instance's + memory size in GiB. In the case of block migration, the size of + ephemeral storage in GiB is added. - Check that ``HostA`` can see the ``NOVA-INST-DIR/instances/`` - directory + The timeout in seconds is the instance size multiplied by the + configurable parameter + ``live_migration_completion_timeout``, whose default is 800. For + example, + shared-storage live migration of an instance with 8GiB memory will + time out after 6400 seconds. - .. code-block:: console +#. **Live migration progress timeout** - $ ls -ld NOVA-INST-DIR/instances/ - drwxr-xr-x 2 nova nova 4096 2012-05-19 14:34 nova-install-dir/instances/ + The Compute service also aborts a live migration when it detects that + memory copy is not making progress for a certain time. You can set + this time, in seconds, + through the configurable parameter + ``live_migration_progress_timeout``. - Perform the same check on ``HostB`` and ``HostC``, paying special - attention to the permissions (Compute should be able to write) + In Ocata, + the default value of ``live_migration_progress_timeout`` is 0, + which disables progress timeouts. You should not change + this value, since the algorithm that detects memory copy progress + has been determined to be unreliable. It may be re-enabled in + future releases. - .. code-block:: console +#. **Instance downtime** - $ ls -ld NOVA-INST-DIR/instances/ - drwxr-xr-x 2 nova nova 4096 2012-05-07 14:34 nova-install-dir/instances/ + Near the end of the memory copy, the instance is paused for a + short time + so that the remaining few pages can be copied without + interference from + instance memory writes. The Compute service initializes this + time to a small + value that depends on the instance size, typically around 50 + milliseconds. When + it notices that the memory copy does not make sufficient + progress, it increases + the time gradually. - $ df -k - Filesystem 1K-blocks Used Available Use% Mounted on - /dev/sda1 921514972 4180880 870523828 1% / - none 16498340 1228 16497112 1% /dev - none 16502856 0 16502856 0% /dev/shm - none 16502856 368 16502488 1% /var/run - none 16502856 0 16502856 0% /var/lock - none 16502856 0 16502856 0% /lib/init/rw - HostA: 921515008 101921792 772783104 12% /var/lib/nova/instances ( <--- this line is important.) - -#. 
Update the libvirt configurations so that the calls can be made - securely. These methods enable remote access over TCP and are not - documented here. - - - SSH tunnel to libvirtd's UNIX socket - - - libvirtd TCP socket, with GSSAPI/Kerberos for auth+data encryption - - - libvirtd TCP socket, with TLS for encryption and x509 client certs - for authentication - - - libvirtd TCP socket, with TLS for encryption and Kerberos for - authentication - - Restart ``libvirt``. After you run the command, ensure that libvirt is - successfully restarted - - .. code-block:: console - - # stop libvirt-bin && start libvirt-bin - $ ps -ef | grep libvirt - root 1145 1 0 Nov27 ? 00:00:03 /usr/sbin/libvirtd -d -l\ - -#. Configure your firewall to allow libvirt to communicate between nodes. - By default, libvirt listens on TCP port 16509, and an ephemeral TCP - range from 49152 to 49261 is used for the KVM communications. Based on - the secure remote access TCP configuration you chose, be careful which - ports you open, and always understand who has access. For information - about ports that are used with libvirt, - see the `libvirt documentation `_. - -#. Configure the downtime required for the migration by adjusting these - parameters in the ``nova.conf`` file: + You can influence the instance downtime algorithm with the + help of three + configuration variables on the compute hosts: .. code-block:: ini @@ -261,64 +292,77 @@ Example Compute installation environment live_migration_downtime_steps = 10 live_migration_downtime_delay = 75 - The ``live_migration_downtime`` parameter sets the maximum permitted - downtime for a live migration, in milliseconds. This setting defaults to - 500 milliseconds. + ``live_migration_downtime`` sets the maximum permitted + downtime for a live migration, in *milliseconds*. + The default is 500. - The ``live_migration_downtime_steps`` parameter sets the total number of - incremental steps to reach the maximum downtime value. This setting - defaults to 10 steps. + ``live_migration_downtime_steps`` sets the total number of + adjustment steps until ``live_migration_downtime`` is reached. + The default is 10 steps. - The ``live_migration_downtime_delay`` parameter sets the amount of time - to wait between each step, in seconds. This setting defaults to 75 seconds. + ``live_migration_downtime_delay`` + sets the time interval between two + adjustment steps in *seconds*. The default is 75. -#. You can now configure other options for live migration. In most cases, you - will not need to configure any options. For advanced configuration options, - see the `OpenStack Configuration Reference Guide `_. +#. **Auto-convergence** -.. _true-live-migration-kvm-libvirt: + One strategy for a successful live migration of a + memory-intensive instance + is slowing the instance down. This is called auto-convergence. + Both libvirt and QEMU implement this feature by automatically + throttling the instance's CPU when memory copy delays are detected. -Enabling true live migration ----------------------------- + Auto-convergence is disabled by default. + You can enable it by setting + ``live_migration_permit_auto_convergence=true``. -Prior to the Kilo release, the Compute service did not use the libvirt -live migration function by default. To enable this function, add the -following line to the ``[libvirt]`` section of the ``nova.conf`` file: + .. caution:: -.. code-block:: ini + Before enabling auto-convergence, + make sure that the instance's application + tolerates a slow-down. 
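+      If it does, you can set the option in ``nova.conf`` on the compute
+      hosts, for example in the ``[libvirt]`` section:
+
+      .. code-block:: ini
+
+         [libvirt]
+         live_migration_permit_auto_convergence = true
+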
-
-    live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
+
+      Be aware that auto-convergence does not guarantee live migration
+      success.

-On versions older than Kilo, the Compute service does not use libvirt's
-live migration by default because there is a risk that the migration
-process will never end. This can happen if the guest operating system
-uses blocks on the disk faster than they can be migrated.
+#. **Post-copy**

-.. _configuring-migrations-kvm-block-migration:
+   Live migration of a memory-intensive instance is much more likely to
+   succeed when you enable post-copy. This feature, implemented by libvirt
+   and QEMU, activates the virtual machine on the destination host before
+   all of its memory has been copied. When the virtual machine accesses a
+   page that is missing on the destination host, the resulting page fault
+   is resolved by copying the page from the source host.

-Block migration
----------------
+   Post-copy is disabled by default. You can enable it by setting
+   ``live_migration_permit_post_copy=true``.

-Configuring KVM for block migration is exactly the same as the above
-configuration in :ref:`configuring-migrations-kvm-shared-storage`
-the section called shared storage, except that ``NOVA-INST-DIR/instances``
-is local to each host rather than shared. No NFS client or server
-configuration is required.
+   When you enable both auto-convergence and post-copy, auto-convergence
+   is not used.

-.. note::
+   .. caution::

-   - To use block migration, you must use the ``--block-migrate``
-     parameter with the live migration command.
+      The page faults introduced by post-copy can slow the
+      instance down.

-   - Block migration is incompatible with read-only devices such as
-     CD-ROMs and `Configuration Drive (config_drive) `_.
+      When the network connection between the source and destination
+      hosts is interrupted, page faults cannot be resolved anymore and
+      the instance is rebooted.

-   - Since the ephemeral drives are copied over the network in block
-     migration, migrations of instances with heavy I/O loads may never
-     complete if the drives are writing faster than the data can be
-     copied over the network.
+
+The full list of live migration configuration parameters is documented
+in the `OpenStack Configuration Reference Guide
+`_.

 .. _configuring-migrations-xenserver: