Moved security hardening HowTos from Config Ref to Cloud Admin

Moved procedures and theory over to the admin guide. Also moved node recovery into its own file.
Edited the trusted-flavor procedure, with minor edits on the rest.

Change-Id: I060d79271130d49b9c6b37638943e2f85ffae5cd
Partial-Bug: #290687

parent 469af0158b
commit 0ccb2136b4

doc/admin-guide-cloud/compute/section_compute-recover-nodes.xml (new file, 405 lines)

@@ -0,0 +1,405 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xml:id="section_nova-compute-node-down"
xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0">
<title>Recover from a failed compute node</title>
<para>If you deployed Compute with a shared file system, you can quickly recover from a failed
compute node. Of the two methods covered in these sections, evacuating is the preferred
method even in the absence of shared storage. Evacuating provides many benefits over manual
recovery, such as re-attachment of volumes and floating IPs.</para>
<xi:include href="../../common/section_cli_nova_evacuate.xml"/>
<section xml:id="nova-compute-node-down-manual-recovery">
<title>Manual recovery</title>
<para>To recover a KVM/libvirt compute node, see the previous section. Use the
following procedure for all other hypervisors.</para>
<procedure>
<title>Review host information</title>
<step>
<para>Identify the VMs on the affected hosts, using tools such as a
combination of <literal>nova list</literal> and <literal>nova show</literal> or
<literal>euca-describe-instances</literal>. For example, the following
output displays information about instance <systemitem>i-000015b9</systemitem>
that is running on node <systemitem>np-rcc54</systemitem>:</para>
<screen><prompt>$</prompt> <userinput>euca-describe-instances</userinput>
<computeroutput>i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60</computeroutput></screen>
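<para>If you prefer the <command>nova</command> client over euca2ools, a roughly equivalent
check (assuming an administrative account; the exact option names can vary between client
versions) is:</para>
<screen><prompt>$</prompt> <userinput>nova list --host np-rcc54 --all-tenants</userinput></screen>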
</step>
<step>
<para>Review the status of the host by querying the Compute database. Some of the
important information is shown below. The following example converts an
EC2 API instance ID into an OpenStack ID; if you used the
<literal>nova</literal> commands, you can substitute the ID directly. You
can find the credentials for your database in
<filename>/etc/nova.conf</filename>.</para>
<screen><prompt>mysql></prompt> <userinput>SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G;</userinput>
<computeroutput>*************************** 1. row ***************************
created_at: 2012-06-19 00:48:11
updated_at: 2012-07-03 00:35:11
deleted_at: NULL
...
id: 5561
...
power_state: 5
vm_state: shutoff
...
hostname: at3-ui02
host: np-rcc54
...
uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
...
task_state: NULL
...</computeroutput></screen></step>
</procedure>
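<para>As a quick sanity check, the hexadecimal suffix of the EC2-style ID is simply the
instance's database ID, which is what the <literal>CONV('15b9', 16, 10)</literal> expression
computes; you can verify the conversion from a shell:</para>
<screen><prompt>$</prompt> <userinput>printf '%d\n' 0x15b9</userinput>
<computeroutput>5561</computeroutput></screen>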
<procedure>
<title>Recover the VM</title>
<step>
<para>After you have determined the status of the VM on the failed host,
decide to which compute host the affected VM should be moved. For example, run
the following database command to move the VM to
<systemitem>np-rcc46</systemitem>:</para>
<screen><prompt>mysql></prompt> <userinput>UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';</userinput></screen>
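<para>Before moving on, you can confirm that the row was updated by selecting the same columns
that were shown in the previous procedure:</para>
<screen><prompt>mysql></prompt> <userinput>SELECT host, hostname, vm_state FROM instances WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';</userinput></screen>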
</step>
<step>
<para>If using a hypervisor that relies on libvirt (such as KVM), it is a
good idea to update the <literal>libvirt.xml</literal> file (found in
<literal>/var/lib/nova/instances/[instance ID]</literal>). The important
changes to make are:</para>
<para>
<itemizedlist>
<listitem>
<para>Change the <literal>DHCPSERVER</literal> value to the host IP
address of the compute host that is now the VM's new home.</para>
</listitem>
<listitem>
<para>Update the VNC IP, if it is not already set, to
<literal>0.0.0.0</literal>.</para>
</listitem>
</itemizedlist>
</para>
</step>
<step>
<para>Reboot the VM:</para>
<screen><prompt>$</prompt> <userinput>nova reboot --hard 3f57699a-e773-4650-a443-b4b37eed5a06</userinput></screen>
</step>
</procedure>
<para>In theory, the above database update and <literal>nova
reboot</literal> command are all that is required to recover a VM from a
failed host. However, if further problems occur, consider looking at
recreating the network filter configuration using <literal>virsh</literal>,
restarting the Compute services, or updating the <literal>vm_state</literal>
and <literal>power_state</literal> in the Compute database.</para>
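<para>If you suspect the network filters, a quick read-only way to see what libvirt knows about
on the new host is to list the defined filters and check the instance's domain XML; the domain
name below follows the usual <literal>instance-%08x</literal> naming and is only an
example:</para>
<screen><prompt>#</prompt> <userinput>virsh nwfilter-list</userinput>
<prompt>#</prompt> <userinput>virsh dumpxml instance-000015b9 | grep filterref</userinput></screen>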
</section>
<section xml:id="section_nova-uid-mismatch">
<title>Recover from a UID/GID mismatch</title>
<para>When running OpenStack Compute, using a shared file system or an automated
configuration tool, you could encounter a situation where some files on your compute
node are using the wrong UID or GID. This causes a number of errors, such as being
unable to do live migration or start virtual machines.</para>
<para>The following procedure runs on <systemitem class="service"
>nova-compute</systemitem> hosts, based on the KVM hypervisor, and could help to
restore the situation:</para>
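<para>Before changing anything, it can help to record the identifiers that are currently in use
on each host; for example:</para>
<screen><prompt>#</prompt> <userinput>getent passwd nova libvirt-qemu</userinput>
<prompt>#</prompt> <userinput>getent group nova libvirtd</userinput></screen>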
<procedure>
<title>To recover from a UID/GID mismatch</title>
<step>
<para>Ensure you do not use numbers that are already used for some other
user/group.</para>
</step>
<step>
<para>Set the nova uid in <filename>/etc/passwd</filename> to the same number in
all hosts (for example, 112).</para>
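<para>One way to apply this, equivalent to editing <filename>/etc/passwd</filename> by hand
(the value 112 is only an example), is:</para>
<screen><prompt>#</prompt> <userinput>usermod -u 112 nova</userinput></screen>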
</step>
<step>
<para>Set the libvirt-qemu uid in <filename>/etc/passwd</filename> to the
same number in all hosts (for example, 119).</para>
</step>
<step>
<para>Set the nova group in the <filename>/etc/group</filename> file to the
same number in all hosts (for example, 120).</para>
</step>
<step>
<para>Set the libvirtd group in the <filename>/etc/group</filename> file to the
same number in all hosts (for example, 119).</para>
</step>
<step>
<para>Stop the services on the compute node.</para>
</step>
<step>
<para>Change all the files owned by user <systemitem>nova</systemitem> or by
group <systemitem>nova</systemitem>. For example:</para>
<screen><prompt>#</prompt> <userinput>find / -uid 108 -exec chown nova {} \; </userinput># note the 108 here is the old nova uid before the change
<prompt>#</prompt> <userinput>find / -gid 120 -exec chgrp nova {} \;</userinput></screen>
</step>
<step>
<para>Repeat the steps for the libvirt-qemu owned files if those needed to
change.</para>
</step>
<step>
<para>Restart the services.</para>
</step>
<step>
<para>Now you can run the <command>find</command> command to verify that
all files use the correct identifiers.</para>
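<para>For example, if the old nova uid was 108 as in the example above, a search for files
that are still owned by it should return nothing:</para>
<screen><prompt>#</prompt> <userinput>find / -uid 108</userinput></screen>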
</step>
</procedure>
</section>
<section xml:id="section_nova-disaster-recovery-process">
<title>Recover cloud after disaster</title>
<para>Use the following procedures to manage your cloud after a disaster, and to easily
back up its persistent storage volumes. Backups <emphasis role="bold">are</emphasis>
mandatory, even outside of disaster scenarios.</para>
<para>For a DRP definition, see <link
xlink:href="http://en.wikipedia.org/wiki/Disaster_Recovery_Plan"
>http://en.wikipedia.org/wiki/Disaster_Recovery_Plan</link>.</para>
<simplesect>
<title>Disaster recovery example</title>
<para>A disaster could happen to several components of your architecture (for
example, a disk crash, a network loss, or a power cut). In this example, the
following components are configured:</para>
<orderedlist>
<listitem>
<para>A cloud controller (<systemitem>nova-api</systemitem>,
<systemitem>nova-objectstore</systemitem>,
<systemitem>nova-network</systemitem>)</para>
</listitem>
<listitem>
<para>A compute node (<systemitem class="service">nova-compute</systemitem>)</para>
</listitem>
<listitem>
<para>A Storage Area Network (SAN) used by OpenStack Block Storage
(<systemitem class="service">cinder-volumes</systemitem>)</para>
</listitem>
</orderedlist>
<para>The worst disaster for a cloud is a power loss, which applies to all three
components. Before a power loss:</para>
<itemizedlist>
<listitem>
<para>From the SAN to the cloud controller, we have an active iSCSI session
(used for the "cinder-volumes" LVM's VG).</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, we also have active
iSCSI sessions (managed by <systemitem class="service"
>cinder-volume</systemitem>).</para>
</listitem>
<listitem>
<para>For every volume, an iSCSI session is made (so 14 EBS volumes equals
14 sessions).</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, we also have
iptables/ebtables rules which allow access from the cloud controller to the
running instance.</para>
</listitem>
<listitem>
<para>Finally, saved in the database on the cloud controller are the current
state of the instances (in this case, "running") and their volume attachments
(mount point, volume ID, volume status, and so on).</para>
</listitem>
</itemizedlist>
<para>After the power loss occurs and all hardware components restart:</para>
<itemizedlist>
<listitem>
<para>From the SAN to the cloud, the iSCSI session no longer exists.</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, the iSCSI sessions no
longer exist.</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, the iptables and
ebtables are recreated, since at boot, <systemitem>nova-network</systemitem>
reapplies configurations.</para>
</listitem>
<listitem>
<para>From the cloud controller, instances are in a shutdown state (because
they are no longer running).</para>
</listitem>
<listitem>
<para>In the database, data was not updated at all, since Compute could not
have anticipated the crash.</para>
</listitem>
</itemizedlist>
<para>Before going further, and to prevent the administrator from making fatal
mistakes: <emphasis role="bold">instances won't be lost</emphasis>, because no
"<command>destroy</command>" or "<command>terminate</command>" command was
invoked, so the files for the instances remain on the compute node.</para>
<para>Perform these tasks in the following order.
<warning><para>Do not add any extra steps at this stage.</para></warning></para>
<para>
<orderedlist>
<listitem>
<para>Get the current relation from a volume to its instance, so that you
can recreate the attachment.</para>
</listitem>
<listitem>
<para>Update the database to clean the stalled state. (After that, you
cannot perform the first step).</para>
</listitem>
<listitem>
<para>Restart the instances. In other words, go from a shutdown to a
running state.</para>
</listitem>
<listitem>
<para>After the restart, reattach the volumes to their respective
instances (optional).</para>
</listitem>
<listitem>
<para>SSH into the instances to reboot them.</para>
</listitem>
</orderedlist>
</para>
</simplesect>
<simplesect>
<title>Recover after a disaster</title>
<procedure>
<title>To perform disaster recovery</title>
<step>
<title>Get the instance-to-volume relationship</title>
<para>You must determine the current relationship from a volume to its
instance, because you will re-create the attachment.</para>
<para>You can find this relationship by running <command>nova
volume-list</command>. Note that the <command>nova</command> client
includes the ability to get volume information from OpenStack Block
Storage.</para>
</step>
<step>
<title>Update the database</title>
<para>Update the database to clean the stalled state. You must restore for
every volume, using these queries to clean up the database:</para>
<screen><prompt>mysql></prompt> <userinput>use cinder;</userinput>
<prompt>mysql></prompt> <userinput>update volumes set mountpoint=NULL;</userinput>
<prompt>mysql></prompt> <userinput>update volumes set status="available" where status <>"error_deleting";</userinput>
<prompt>mysql></prompt> <userinput>update volumes set attach_status="detached";</userinput>
<prompt>mysql></prompt> <userinput>update volumes set instance_id=0;</userinput></screen>
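<para>You can check the result of these updates before restarting anything; for example:</para>
<screen><prompt>mysql></prompt> <userinput>select id, status, attach_status, instance_id from volumes;</userinput></screen>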
<para>You can then run <command>nova volume-list</command> commands to list
all volumes.</para>
</step>
<step>
<title>Restart instances</title>
<para>Restart the instances using the <command>nova reboot
<replaceable>$instance</replaceable></command> command.</para>
<para>At this stage, depending on your image, some instances completely
reboot and become reachable, while others stop on the "plymouth"
stage.</para>
</step>
<step>
<title>DO NOT reboot a second time</title>
<para>Do not reboot instances that are stopped at this point. Instance state
depends on whether you added an <filename>/etc/fstab</filename> entry for
that volume. Images built with the <package>cloud-init</package> package
remain in a pending state, while others skip the missing volume and start.
The idea of that stage is only to ask Compute to reboot every instance, so
the stored state is preserved. For more information about
<package>cloud-init</package>, see <link
xlink:href="https://help.ubuntu.com/community/CloudInit"
>help.ubuntu.com/community/CloudInit</link>.</para>
</step>
<step>
<title>Reattach volumes</title>
<para>After the restart, and Compute has restored the right status, you can
reattach the volumes to their respective instances using the <command>nova
volume-attach</command> command. The following snippet uses a file of
listed volumes to reattach them:</para>
<programlisting language="bash">#!/bin/bash

while read line; do
    volume=`echo $line | $CUT -f 1 -d " "`
    instance=`echo $line | $CUT -f 2 -d " "`
    mount_point=`echo $line | $CUT -f 3 -d " "`
    echo "ATTACHING VOLUME FOR INSTANCE - $instance"
    nova volume-attach $instance $volume $mount_point
    sleep 2
done < $volumes_tmp_file</programlisting>
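<para>This snippet is only a sketch: it assumes that <literal>$CUT</literal> points to the
<command>cut</command> binary and that <literal>$volumes_tmp_file</literal> names a file you
prepared from the <command>nova volume-list</command> output in the first step, with one
space-separated line per attachment in the form <literal>volume-ID instance-ID
mount-point</literal>; for example:</para>
<programlisting><replaceable>volume_id</replaceable> 3f57699a-e773-4650-a443-b4b37eed5a06 /dev/vdb</programlisting>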
<para>At this stage, instances that were pending on the boot sequence
(<application>plymouth</application>) automatically continue their boot,
and restart normally, while the ones that booted see the volume.</para>
</step>
<step>
<title>SSH into instances</title>
<para>If some services depend on the volume, or if a volume has an entry
in <systemitem>fstab</systemitem>, you should now simply restart the
instance. This restart needs to be made from the instance itself, not
through <command>nova</command>.</para>
<para>SSH into the instance and perform a reboot:</para>
<screen><prompt>#</prompt> <userinput>shutdown -r now</userinput></screen>
</step>
</procedure>
<para>By completing this procedure, you can successfully recover your
cloud.</para>
<note>
<para>Follow these guidelines:</para>
<itemizedlist>
<listitem>
<para>Use the <parameter>errors=remount-ro</parameter> parameter in the
<filename>fstab</filename> file, which prevents data corruption.</para>
<para>The system locks any write to the disk if it detects an I/O error.
This configuration option should be added into the <systemitem
class="service">cinder-volume</systemitem> server (the one which
performs the iSCSI connection to the SAN), but also into the instances'
<filename>fstab</filename> file.</para>
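<para>A hypothetical <filename>fstab</filename> entry for an attached volume using this option
(the device name and mount point are examples only) might look like:</para>
<programlisting>/dev/vdb  /mnt/data  ext4  defaults,errors=remount-ro  0  2</programlisting>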
</listitem>
<listitem>
<para>Do not add the entry for the SAN's disks to the <systemitem
class="service">cinder-volume</systemitem>'s
<filename>fstab</filename> file.</para>
<para>Some systems hang on that step, which means you could lose access to
your cloud-controller. To re-run the session manually, run the following
commands before performing the mount:
<screen><prompt>#</prompt> <userinput>iscsiadm -m discovery -t st -p $SAN_IP</userinput>
<prompt>#</prompt> <userinput>iscsiadm -m node --targetname $IQN -p $SAN_IP -l</userinput></screen></para>
</listitem>
<listitem>
<para>For your instances, if you have the whole <filename>/home/</filename>
directory on the disk, leave a user's directory with the user's bash
files and the <filename>authorized_keys</filename> file (instead of
emptying the <filename>/home</filename> directory and mapping the disk
on it).</para>
<para>This enables you to connect to the instance, even without the volume
attached, if you allow only connections through public keys.</para>
</listitem>
</itemizedlist>
</note>
</simplesect>
<simplesect>
<title>Script the DRP</title>
<para>You can download from <link
xlink:href="https://github.com/Razique/BashStuff/blob/master/SYSTEMS/OpenStack/SCR_5006_V00_NUAC-OPENSTACK-DRP-OpenStack.sh"
>here</link> a bash script which performs the following steps:</para>
<orderedlist>
<listitem><para>An array is created for instances and their attached volumes.</para></listitem>
<listitem><para>The MySQL database is updated.</para></listitem>
<listitem><para>Using <systemitem>euca2ools</systemitem>, all instances are restarted.</para></listitem>
<listitem><para>The volume attachment is made.</para></listitem>
<listitem><para>An SSH connection is performed into every instance using Compute credentials.</para></listitem>
</orderedlist>
<para>The script's "test mode" allows you to perform that whole sequence for only one
instance.</para>
<para>To reproduce the power loss, connect to the compute node which runs
that same instance and close the iSCSI session. Do not detach the volume using the <command>nova
volume-detach</command> command; instead, manually close the iSCSI session. The following
example command closes iSCSI session number 15:</para>
<screen><prompt>#</prompt> <userinput>iscsiadm -m session -u -r 15</userinput></screen>
<para>Do not forget the <literal>-r</literal> flag. Otherwise, you close ALL
sessions.</para>
</simplesect>
</section>
</section>
@@ -500,437 +500,6 @@ local0.error @@172.20.1.43:1024</programlisting>
</section>
<xi:include href="../../common/section_compute-configure-console.xml"/>
<xi:include href="section_compute-configure-service-groups.xml"/>
<xi:include href="section_compute-security.xml"/>
<xi:include href="section_compute-recover-nodes.xml"/>
</section>
@@ -4,34 +4,26 @@
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
xml:id="nova_cli_evacuate">
<title>Evacuate instances</title>
<para>If a cloud compute node fails due to a hardware malfunction or another reason, you can
evacuate instances to make them available again. You can choose evacuation parameters for
your use case.</para>
<para>To preserve user data on server disk, you must configure shared storage on the target
host. Also, you must validate that the current VM host is down; otherwise, the evacuation
fails with an error.</para>
<procedure xml:id="evacuate_shared">
<step>
<para>To list hosts and find a different host for the evacuated instance, run:</para>
<screen><prompt>$</prompt> <userinput>nova host-list</userinput></screen>
</step>
<step>
<para>Evacuate the instance. You can pass the instance password to the command by using
the <literal>--password <pwd></literal> option. If you do not specify a
password, one is generated and printed after the command finishes successfully. The
following command evacuates a server without shared storage from a host that is down
to the specified <replaceable>host_b</replaceable>:</para>
<screen><prompt>$</prompt> <userinput>nova evacuate <replaceable>evacuated_server_name</replaceable> <replaceable>host_b</replaceable></userinput></screen>
<para>The instance is booted from a new disk, but preserves its configuration including
its ID, name, uid, IP address, and so on. The command returns a password:</para>
<screen><computeroutput><?db-font-size 70%?>+-----------+--------------+
| Property | Value |
+-----------+--------------+
@@ -39,14 +31,12 @@
+-----------+--------------+</computeroutput></screen>
</step>
<step>
<para>To preserve the user disk data on the evacuated server, deploy OpenStack Compute
with a shared file system. To configure your system, see <link
xlink:href="http://docs.openstack.org/havana/config-reference/content/configuring-openstack-compute-basics.html#section_configuring-compute-migrations"
>Configure migrations</link> in <citetitle>OpenStack Configuration
Reference</citetitle>. In the following example, the password remains
unchanged:</para>
<screen><prompt>$</prompt> <userinput>nova evacuate <replaceable>evacuated_server_name</replaceable> <replaceable>host_b</replaceable> --on-shared-storage</userinput></screen>
</step>
</procedure>
@@ -4,16 +4,14 @@
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
xml:id="trusted-compute-pools">
<title>Trusted compute pools</title>
<para>Trusted compute pools enable administrators to designate a group of compute hosts as
trusted. These hosts use hardware-based security features, such as the Intel Trusted
Execution Technology (TXT), to provide an additional level of security. Combined with an
external stand-alone, web-based remote attestation server, cloud providers can ensure that
the compute node runs only software with verified measurements and can ensure a secure cloud
stack.</para>
<para>Using the trusted compute pools, cloud subscribers can request services to run on verified
compute nodes.</para>
<para>The remote attestation server performs node verification as
follows:</para>
<orderedlist>
@@ -26,13 +24,12 @@
measured.</para>
</listitem>
<listitem>
<para>Measured data is sent to the attestation server when challenged by the attestation
server.</para>
</listitem>
<listitem>
<para>The attestation server verifies those measurements against a good and known
database to determine node trustworthiness.</para>
</listitem>
</orderedlist>
<para>A description of how to set up an attestation service is
@@ -57,27 +54,40 @@
<title>Configure Compute to use trusted compute pools</title>
<procedure>
<step>
<para>Enable scheduling support for trusted compute pools by adding the following
lines in the <literal>DEFAULT</literal> section in the
<filename>/etc/nova/nova.conf</filename> file:</para>
<programlisting language="ini">[DEFAULT]
compute_scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,ComputeFilter,TrustedFilter</programlisting>
</step>
<step>
<para>Specify the connection information for your attestation service by adding the
following lines to the <literal>trusted_computing</literal> section in the
<filename>/etc/nova/nova.conf</filename> file:</para>
<programlisting language="ini">[trusted_computing]
server=10.1.71.206
port=8443
server_ca_file=/etc/nova/ssl.10.1.71.206.crt
# If using OAT v1.5, use this api_url:
api_url=/AttestationService/resources
# If using OAT pre-v1.5, use this api_url:
#api_url=/OpenAttestationWebServices/V1.0
auth_blob=i-am-openstack</programlisting>
<para>Where:</para>
<variablelist>
<varlistentry>
<term>server</term>
<listitem>
<para>Host name or IP address of the host that runs the attestation
service.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>port</term>
<listitem>
<para>HTTPS port for the attestation service.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -90,8 +100,7 @@
<varlistentry>
<term>api_url</term>
<listitem>
<para>The attestation service's URL path.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -104,31 +113,6 @@
</varlistentry>
</variablelist>
</step>
<step>
<para>Restart the <systemitem class="service"
>nova-compute</systemitem> and <systemitem
@@ -138,35 +122,41 @@ auth_blob=i-am-openstack</programlisting>
</procedure>
<section xml:id="config_ref">
<title>Configuration reference</title>
<para>To customize the trusted compute pools, use the following configuration
option settings:
</para>
<xi:include href="tables/nova-trustedcomputing.xml"/>
</section>
</section>
<section xml:id="trusted_flavors">
<title>Specify trusted flavors</title>
<para>To designate hosts as trusted:</para>
<procedure>
<step>
<para>Configure one or more flavors as trusted by using the <command>nova
flavor-key set</command> command. For example, to set the
<literal>m1.tiny</literal> flavor as trusted:</para>
<screen><prompt>$</prompt> <userinput>nova flavor-key m1.tiny set trust:trusted_host trusted</userinput></screen>
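<para>If you later want to stop scheduling this flavor onto trusted hosts, the key can be
removed again; for example (mirroring the command above):</para>
<screen><prompt>$</prompt> <userinput>nova flavor-key m1.tiny unset trust:trusted_host</userinput></screen>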
</step>
<step><para>Request that your instance be run on a trusted host, by specifying a trusted flavor when
booting the instance. For example:</para>
<screen><prompt>$</prompt> <userinput>nova boot --flavor m1.tiny --key_name myKeypairName --image myImageID newInstanceName</userinput></screen>
<figure xml:id="concept_trusted_pool">
<title>Trusted compute pool</title>
<mediaobject>
<imageobject role="fo">
<imagedata
fileref="figures/OpenStackTrustedComputePool2.png"
format="PNG" contentwidth="6in"/>
</imageobject>
<imageobject role="html">
<imagedata
fileref="figures/OpenStackTrustedComputePool2.png"
format="PNG" contentwidth="6in"/>
</imageobject>
</mediaobject>
</figure>
</step>
</procedure>
</section>
</section>
@@ -92,7 +92,6 @@
<xi:include href="compute/section_compute-scheduler.xml"/>
<xi:include href="compute/section_compute-cells.xml"/>
<xi:include href="compute/section_compute-conductor.xml"/>
<xi:include href="compute/section_compute-security.xml"/>
<xi:include href="compute/section_compute-config-samples.xml"/>
<xi:include href="compute/section_nova-log-files.xml"/>
<xi:include href="compute/section_compute-options-reference.xml"/>