Moved security hardening HowTos from Config Ref to Cloud Admin

Moved procedures and theory over to the admin guide. Also moved node recovery into its own file.
Edited the trusted-flavor procedure, with minor edits on the rest.

Change-Id: I060d79271130d49b9c6b37638943e2f85ffae5cd
Partial-Bug: #290687

parent 469af0158b
commit 0ccb2136b4

doc/admin-guide-cloud/compute/section_compute-recover-nodes.xml (new file, 405 lines)

@@ -0,0 +1,405 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xml:id="section_nova-compute-node-down"
xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0">
<title>Recover from a failed compute node</title>
<para>If you deployed Compute with a shared file system, you can quickly recover from a failed
compute node. Of the two methods covered in these sections, evacuating is the preferred
method even in the absence of shared storage. Evacuating provides many benefits over manual
recovery, such as re-attachment of volumes and floating IPs.</para>
<xi:include href="../../common/section_cli_nova_evacuate.xml"/>
<section xml:id="nova-compute-node-down-manual-recovery">
<title>Manual recovery</title>
<para>To recover a KVM/libvirt compute node, see the previous section. Use the
following procedure for all other hypervisors.</para>
<procedure>
<title>Review host information</title>
<step>
<para>Identify the VMs on the affected hosts, using tools such as a
combination of <literal>nova list</literal> and <literal>nova show</literal> or
<literal>euca-describe-instances</literal>. For example, the following
output displays information about instance <systemitem>i-000015b9</systemitem>
that is running on node <systemitem>np-rcc54</systemitem>:</para>
<screen><prompt>$</prompt> <userinput>euca-describe-instances</userinput>
<computeroutput>i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60</computeroutput></screen>
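<para>If you prefer the <command>nova</command> client over euca2ools, a roughly equivalent
check (assuming an administrative account; the exact option names can vary between client
versions) is:</para>
<screen><prompt>$</prompt> <userinput>nova list --host np-rcc54 --all-tenants</userinput></screen>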
</step>
<step>
<para>Review the status of the host by querying the Compute database. Some of the
important information is shown below. The following example converts an
EC2 API instance ID into an OpenStack ID; if you used the
<literal>nova</literal> commands, you can substitute the ID directly. You
can find the credentials for your database in
<filename>/etc/nova.conf</filename>.</para>
<screen><prompt>mysql></prompt> <userinput>SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G;</userinput>
<computeroutput>*************************** 1. row ***************************
created_at: 2012-06-19 00:48:11
updated_at: 2012-07-03 00:35:11
deleted_at: NULL
...
id: 5561
...
power_state: 5
vm_state: shutoff
...
hostname: at3-ui02
host: np-rcc54
...
uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
...
task_state: NULL
...</computeroutput></screen></step>
</procedure>
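<para>As a quick sanity check, the hexadecimal suffix of the EC2-style ID is simply the
instance's database ID, which is what the <literal>CONV('15b9', 16, 10)</literal> expression
computes; you can verify the conversion from a shell:</para>
<screen><prompt>$</prompt> <userinput>printf '%d\n' 0x15b9</userinput>
<computeroutput>5561</computeroutput></screen>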
<procedure>
<title>Recover the VM</title>
<step>
<para>After you have determined the status of the VM on the failed host,
decide to which compute host the affected VM should be moved. For example, run
the following database command to move the VM to
<systemitem>np-rcc46</systemitem>:</para>
<screen><prompt>mysql></prompt> <userinput>UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';</userinput></screen>
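<para>Before moving on, you can confirm that the row was updated by selecting the same columns
that were shown in the previous procedure:</para>
<screen><prompt>mysql></prompt> <userinput>SELECT host, hostname, vm_state FROM instances WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';</userinput></screen>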
</step>
<step>
<para>If using a hypervisor that relies on libvirt (such as KVM), it is a
good idea to update the <literal>libvirt.xml</literal> file (found in
<literal>/var/lib/nova/instances/[instance ID]</literal>). The important
changes to make are:</para>
<para>
<itemizedlist>
<listitem>
<para>Change the <literal>DHCPSERVER</literal> value to the host IP
address of the compute host that is now the VM's new home.</para>
</listitem>
<listitem>
<para>Update the VNC IP, if it is not already set, to
<literal>0.0.0.0</literal>.</para>
</listitem>
</itemizedlist>
</para>
</step>
<step>
<para>Reboot the VM:</para>
<screen><prompt>$</prompt> <userinput>nova reboot --hard 3f57699a-e773-4650-a443-b4b37eed5a06</userinput></screen>
</step>
</procedure>
<para>In theory, the above database update and <literal>nova
reboot</literal> command are all that is required to recover a VM from a
failed host. However, if further problems occur, consider looking at
recreating the network filter configuration using <literal>virsh</literal>,
restarting the Compute services, or updating the <literal>vm_state</literal>
and <literal>power_state</literal> in the Compute database.</para>
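<para>If you suspect the network filters, a quick read-only way to see what libvirt knows about
on the new host is to list the defined filters and check the instance's domain XML; the domain
name below follows the usual <literal>instance-%08x</literal> naming and is only an
example:</para>
<screen><prompt>#</prompt> <userinput>virsh nwfilter-list</userinput>
<prompt>#</prompt> <userinput>virsh dumpxml instance-000015b9 | grep filterref</userinput></screen>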
</section>
<section xml:id="section_nova-uid-mismatch">
<title>Recover from a UID/GID mismatch</title>
<para>When running OpenStack Compute, using a shared file system or an automated
configuration tool, you could encounter a situation where some files on your compute
node are using the wrong UID or GID. This causes a number of errors, such as being
unable to do live migration or start virtual machines.</para>
<para>The following procedure runs on <systemitem class="service"
>nova-compute</systemitem> hosts, based on the KVM hypervisor, and could help to
restore the situation:</para>
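<para>Before changing anything, it can help to record the identifiers that are currently in use
on each host; for example:</para>
<screen><prompt>#</prompt> <userinput>getent passwd nova libvirt-qemu</userinput>
<prompt>#</prompt> <userinput>getent group nova libvirtd</userinput></screen>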
<procedure>
<title>To recover from a UID/GID mismatch</title>
<step>
<para>Ensure you do not use numbers that are already used for some other
user/group.</para>
</step>
<step>
<para>Set the nova uid in <filename>/etc/passwd</filename> to the same number in
all hosts (for example, 112).</para>
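<para>One way to apply this, equivalent to editing <filename>/etc/passwd</filename> by hand
(the value 112 is only an example), is:</para>
<screen><prompt>#</prompt> <userinput>usermod -u 112 nova</userinput></screen>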
</step>
<step>
<para>Set the libvirt-qemu uid in <filename>/etc/passwd</filename> to the
same number in all hosts (for example, 119).</para>
</step>
<step>
<para>Set the nova group in the <filename>/etc/group</filename> file to the
same number in all hosts (for example, 120).</para>
</step>
<step>
<para>Set the libvirtd group in the <filename>/etc/group</filename> file to the
same number in all hosts (for example, 119).</para>
</step>
<step>
<para>Stop the services on the compute node.</para>
</step>
<step>
<para>Change all the files owned by user <systemitem>nova</systemitem> or by
group <systemitem>nova</systemitem>. For example:</para>
<screen><prompt>#</prompt> <userinput>find / -uid 108 -exec chown nova {} \; </userinput># note the 108 here is the old nova uid before the change
<prompt>#</prompt> <userinput>find / -gid 120 -exec chgrp nova {} \;</userinput></screen>
</step>
<step>
<para>Repeat the steps for the libvirt-qemu owned files if those needed to
change.</para>
</step>
<step>
<para>Restart the services.</para>
</step>
<step>
<para>Now you can run the <command>find</command> command to verify that
all files use the correct identifiers.</para>
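<para>For example, if the old nova uid was 108 as in the example above, a search for files
that are still owned by it should return nothing:</para>
<screen><prompt>#</prompt> <userinput>find / -uid 108</userinput></screen>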
</step>
</procedure>
</section>
<section xml:id="section_nova-disaster-recovery-process">
<title>Recover cloud after disaster</title>
<para>Use the following procedures to manage your cloud after a disaster, and to easily
back up its persistent storage volumes. Backups <emphasis role="bold">are</emphasis>
mandatory, even outside of disaster scenarios.</para>
<para>For a DRP definition, see <link
xlink:href="http://en.wikipedia.org/wiki/Disaster_Recovery_Plan"
>http://en.wikipedia.org/wiki/Disaster_Recovery_Plan</link>.</para>
<simplesect>
<title>Disaster recovery example</title>
<para>A disaster could happen to several components of your architecture (for
example, a disk crash, a network loss, or a power cut). In this example, the
following components are configured:</para>
<orderedlist>
<listitem>
<para>A cloud controller (<systemitem>nova-api</systemitem>,
<systemitem>nova-objectstore</systemitem>,
<systemitem>nova-network</systemitem>)</para>
</listitem>
<listitem>
<para>A compute node (<systemitem class="service">nova-compute</systemitem>)</para>
</listitem>
<listitem>
<para>A Storage Area Network (SAN) used by OpenStack Block Storage
(<systemitem class="service">cinder-volumes</systemitem>)</para>
</listitem>
</orderedlist>
<para>The worst disaster for a cloud is a power loss, which applies to all three
components. Before a power loss:</para>
<itemizedlist>
<listitem>
<para>From the SAN to the cloud controller, we have an active iSCSI session
(used for the "cinder-volumes" LVM's VG).</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, we also have active
iSCSI sessions (managed by <systemitem class="service"
>cinder-volume</systemitem>).</para>
</listitem>
<listitem>
<para>For every volume, an iSCSI session is made (so 14 EBS volumes equals
14 sessions).</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, we also have
iptables/ebtables rules which allow access from the cloud controller to the
running instance.</para>
</listitem>
<listitem>
<para>Finally, saved in the database on the cloud controller are the current
state of the instances (in this case, "running") and their volume attachments
(mount point, volume ID, volume status, and so on).</para>
</listitem>
</itemizedlist>
<para>After the power loss occurs and all hardware components restart:</para>
<itemizedlist>
<listitem>
<para>From the SAN to the cloud, the iSCSI session no longer exists.</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, the iSCSI sessions no
longer exist.</para>
</listitem>
<listitem>
<para>From the cloud controller to the compute node, the iptables and
ebtables are recreated, since at boot, <systemitem>nova-network</systemitem>
reapplies configurations.</para>
</listitem>
<listitem>
<para>From the cloud controller, instances are in a shutdown state (because
they are no longer running).</para>
</listitem>
<listitem>
<para>In the database, data was not updated at all, since Compute could not
have anticipated the crash.</para>
</listitem>
</itemizedlist>
<para>Before going further, and to prevent the administrator from making fatal
mistakes: <emphasis role="bold">instances won't be lost</emphasis>, because no
"<command>destroy</command>" or "<command>terminate</command>" command was
invoked, so the files for the instances remain on the compute node.</para>
<para>Perform these tasks in the following order.
<warning><para>Do not add any extra steps at this stage.</para></warning></para>
<para>
<orderedlist>
<listitem>
<para>Get the current relation from a volume to its instance, so that you
can recreate the attachment.</para>
</listitem>
<listitem>
<para>Update the database to clean the stalled state. (After that, you
cannot perform the first step).</para>
</listitem>
<listitem>
<para>Restart the instances. In other words, go from a shutdown to a
running state.</para>
</listitem>
<listitem>
<para>After the restart, reattach the volumes to their respective
instances (optional).</para>
</listitem>
<listitem>
<para>SSH into the instances to reboot them.</para>
</listitem>
</orderedlist>
</para>
</simplesect>
<simplesect>
<title>Recover after a disaster</title>
<procedure>
<title>To perform disaster recovery</title>
<step>
<title>Get the instance-to-volume relationship</title>
<para>You must determine the current relationship from a volume to its
instance, because you will re-create the attachment.</para>
<para>You can find this relationship by running <command>nova
volume-list</command>. Note that the <command>nova</command> client
includes the ability to get volume information from OpenStack Block
Storage.</para>
</step>
<step>
<title>Update the database</title>
<para>Update the database to clean the stalled state. You must restore for
every volume, using these queries to clean up the database:</para>
<screen><prompt>mysql></prompt> <userinput>use cinder;</userinput>
<prompt>mysql></prompt> <userinput>update volumes set mountpoint=NULL;</userinput>
<prompt>mysql></prompt> <userinput>update volumes set status="available" where status <>"error_deleting";</userinput>
<prompt>mysql></prompt> <userinput>update volumes set attach_status="detached";</userinput>
<prompt>mysql></prompt> <userinput>update volumes set instance_id=0;</userinput></screen>
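<para>You can check the result of these updates before restarting anything; for example:</para>
<screen><prompt>mysql></prompt> <userinput>select id, status, attach_status, instance_id from volumes;</userinput></screen>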
<para>You can then run <command>nova volume-list</command> commands to list
all volumes.</para>
</step>
<step>
<title>Restart instances</title>
<para>Restart the instances using the <command>nova reboot
<replaceable>$instance</replaceable></command> command.</para>
<para>At this stage, depending on your image, some instances completely
reboot and become reachable, while others stop on the "plymouth"
stage.</para>
</step>
<step>
<title>DO NOT reboot a second time</title>
<para>Do not reboot instances that are stopped at this point. Instance state
depends on whether you added an <filename>/etc/fstab</filename> entry for
that volume. Images built with the <package>cloud-init</package> package
remain in a pending state, while others skip the missing volume and start.
The idea of that stage is only to ask Compute to reboot every instance, so
the stored state is preserved. For more information about
<package>cloud-init</package>, see <link
xlink:href="https://help.ubuntu.com/community/CloudInit"
>help.ubuntu.com/community/CloudInit</link>.</para>
</step>
<step>
<title>Reattach volumes</title>
<para>After the restart, and Compute has restored the right status, you can
reattach the volumes to their respective instances using the <command>nova
volume-attach</command> command. The following snippet uses a file of
listed volumes to reattach them:</para>
<programlisting language="bash">#!/bin/bash

while read line; do
    volume=`echo $line | $CUT -f 1 -d " "`
    instance=`echo $line | $CUT -f 2 -d " "`
    mount_point=`echo $line | $CUT -f 3 -d " "`
    echo "ATTACHING VOLUME FOR INSTANCE - $instance"
    nova volume-attach $instance $volume $mount_point
    sleep 2
done < $volumes_tmp_file</programlisting>
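<para>This snippet is only a sketch: it assumes that <literal>$CUT</literal> points to the
<command>cut</command> binary and that <literal>$volumes_tmp_file</literal> names a file you
prepared from the <command>nova volume-list</command> output in the first step, with one
space-separated line per attachment in the form <literal>volume-ID instance-ID
mount-point</literal>; for example:</para>
<programlisting><replaceable>volume_id</replaceable> 3f57699a-e773-4650-a443-b4b37eed5a06 /dev/vdb</programlisting>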
<para>At this stage, instances that were pending on the boot sequence
(<application>plymouth</application>) automatically continue their boot,
and restart normally, while the ones that booted see the volume.</para>
</step>
<step>
<title>SSH into instances</title>
<para>If some services depend on the volume, or if a volume has an entry
in <systemitem>fstab</systemitem>, you should now simply restart the
instance. This restart needs to be made from the instance itself, not
through <command>nova</command>.</para>
<para>SSH into the instance and perform a reboot:</para>
<screen><prompt>#</prompt> <userinput>shutdown -r now</userinput></screen>
</step>
</procedure>
<para>By completing this procedure, you can successfully recover your
cloud.</para>
<note>
<para>Follow these guidelines:</para>
<itemizedlist>
<listitem>
<para>Use the <parameter>errors=remount-ro</parameter> parameter in the
<filename>fstab</filename> file, which prevents data corruption.</para>
<para>The system locks any write to the disk if it detects an I/O error.
This configuration option should be added into the <systemitem
class="service">cinder-volume</systemitem> server (the one which
performs the iSCSI connection to the SAN), but also into the instances'
<filename>fstab</filename> file.</para>
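<para>A hypothetical <filename>fstab</filename> entry for an attached volume using this option
(the device name and mount point are examples only) might look like:</para>
<programlisting>/dev/vdb  /mnt/data  ext4  defaults,errors=remount-ro  0  2</programlisting>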
</listitem>
<listitem>
<para>Do not add the entry for the SAN's disks to the <systemitem
class="service">cinder-volume</systemitem>'s
<filename>fstab</filename> file.</para>
<para>Some systems hang on that step, which means you could lose access to
your cloud-controller. To re-run the session manually, run the following
commands before performing the mount:
<screen><prompt>#</prompt> <userinput>iscsiadm -m discovery -t st -p $SAN_IP</userinput>
<prompt>#</prompt> <userinput>iscsiadm -m node --targetname $IQN -p $SAN_IP -l</userinput></screen></para>
</listitem>
<listitem>
<para>For your instances, if you have the whole <filename>/home/</filename>
directory on the disk, leave a user's directory with the user's bash
files and the <filename>authorized_keys</filename> file (instead of
emptying the <filename>/home</filename> directory and mapping the disk
on it).</para>
<para>This enables you to connect to the instance, even without the volume
attached, if you allow only connections through public keys.</para>
</listitem>
</itemizedlist>
</note>
</simplesect>
<simplesect>
<title>Script the DRP</title>
<para>You can download from <link
xlink:href="https://github.com/Razique/BashStuff/blob/master/SYSTEMS/OpenStack/SCR_5006_V00_NUAC-OPENSTACK-DRP-OpenStack.sh"
>here</link> a bash script which performs the following steps:</para>
<orderedlist>
<listitem><para>An array is created for instances and their attached volumes.</para></listitem>
<listitem><para>The MySQL database is updated.</para></listitem>
<listitem><para>Using <systemitem>euca2ools</systemitem>, all instances are restarted.</para></listitem>
<listitem><para>The volume attachment is made.</para></listitem>
<listitem><para>An SSH connection is performed into every instance using Compute credentials.</para></listitem>
</orderedlist>
<para>The script's "test mode" allows you to perform that whole sequence for only one
instance.</para>
<para>To reproduce the power loss, connect to the compute node which runs
that same instance and close the iSCSI session. Do not detach the volume using the <command>nova
volume-detach</command> command; instead, manually close the iSCSI session. The following
example command closes iSCSI session number 15:</para>
<screen><prompt>#</prompt> <userinput>iscsiadm -m session -u -r 15</userinput></screen>
<para>Do not forget the <literal>-r</literal> flag. Otherwise, you close ALL
sessions.</para>
</simplesect>
</section>
</section>
@@ -500,437 +500,6 @@ local0.error @@172.20.1.43:1024</programlisting>
</section>
<xi:include href="../../common/section_compute-configure-console.xml"/>
<xi:include href="section_compute-configure-service-groups.xml"/>
<xi:include href="section_compute-security.xml"/>
<xi:include href="section_compute-recover-nodes.xml"/>
</section>
@@ -4,34 +4,26 @@
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
xml:id="nova_cli_evacuate">
<title>Evacuate instances</title>
<para>If a cloud compute node fails due to a hardware malfunction or another reason, you can
evacuate instances to make them available again. You can choose evacuation parameters for
your use case.</para>
<para>To preserve user data on server disk, you must configure shared storage on the target
host. Also, you must validate that the current VM host is down; otherwise, the evacuation
fails with an error.</para>
<procedure xml:id="evacuate_shared">
<step>
<para>To list hosts and find a different host for the evacuated instance, run:</para>
<screen><prompt>$</prompt> <userinput>nova host-list</userinput></screen>
</step>
<step>
<para>Evacuate the instance. You can pass the instance password to the command by using
the <literal>--password <pwd></literal> option. If you do not specify a
password, one is generated and printed after the command finishes successfully. The
following command evacuates a server without shared storage from a host that is down
to the specified <replaceable>host_b</replaceable>:</para>
<screen><prompt>$</prompt> <userinput>nova evacuate <replaceable>evacuated_server_name</replaceable> <replaceable>host_b</replaceable></userinput></screen>
<para>The instance is booted from a new disk, but preserves its configuration including
its ID, name, uid, IP address, and so on. The command returns a password:</para>
<screen><computeroutput><?db-font-size 70%?>+-----------+--------------+
| Property | Value |
+-----------+--------------+
@@ -39,14 +31,12 @@
+-----------+--------------+</computeroutput></screen>
</step>
<step>
<para>To preserve the user disk data on the evacuated server, deploy OpenStack Compute
with a shared file system. To configure your system, see <link
xlink:href="http://docs.openstack.org/havana/config-reference/content/configuring-openstack-compute-basics.html#section_configuring-compute-migrations"
>Configure migrations</link> in <citetitle>OpenStack Configuration
Reference</citetitle>. In the following example, the password remains
unchanged:</para>
<screen><prompt>$</prompt> <userinput>nova evacuate <replaceable>evacuated_server_name</replaceable> <replaceable>host_b</replaceable> --on-shared-storage</userinput></screen>
</step>
</procedure>
@@ -4,16 +4,14 @@
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
xml:id="trusted-compute-pools">
<title>Trusted compute pools</title>
<para>Trusted compute pools enable administrators to designate a group of compute hosts as
trusted. These hosts use hardware-based security features, such as the Intel Trusted
Execution Technology (TXT), to provide an additional level of security. Combined with an
external stand-alone, web-based remote attestation server, cloud providers can ensure that
the compute node runs only software with verified measurements and can ensure a secure cloud
stack.</para>
<para>Using the trusted compute pools, cloud subscribers can request services to run on verified
compute nodes.</para>
<para>The remote attestation server performs node verification as
follows:</para>
<orderedlist>
@@ -26,13 +24,12 @@
measured.</para>
</listitem>
<listitem>
<para>Measured data is sent to the attestation server when challenged by the attestation
server.</para>
</listitem>
<listitem>
<para>The attestation server verifies those measurements against a good and known
database to determine node trustworthiness.</para>
</listitem>
</orderedlist>
<para>A description of how to set up an attestation service is
@@ -57,27 +54,40 @@
<title>Configure Compute to use trusted compute pools</title>
<procedure>
<step>
<para>Enable scheduling support for trusted compute pools by adding the following
lines in the <literal>DEFAULT</literal> section in the
<filename>/etc/nova/nova.conf</filename> file:</para>
<programlisting language="ini">[DEFAULT]
compute_scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,ComputeFilter,TrustedFilter</programlisting>
</step>
<step>
<para>Specify the connection information for your attestation service by adding the
following lines to the <literal>trusted_computing</literal> section in the
<filename>/etc/nova/nova.conf</filename> file:</para>
<programlisting language="ini">[trusted_computing]
server=10.1.71.206
port=8443
server_ca_file=/etc/nova/ssl.10.1.71.206.crt
# If using OAT v1.5, use this api_url:
api_url=/AttestationService/resources
# If using OAT pre-v1.5, use this api_url:
#api_url=/OpenAttestationWebServices/V1.0
auth_blob=i-am-openstack</programlisting>
<para>Where:</para>
<variablelist>
<varlistentry>
<term>server</term>
<listitem>
<para>Host name or IP address of the host that runs the attestation
service.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>port</term>
<listitem>
<para>HTTPS port for the attestation service.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -90,8 +100,7 @@
<varlistentry>
<term>api_url</term>
<listitem>
<para>The attestation service's URL path.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -104,31 +113,6 @@
</varlistentry>
</variablelist>
</step>
<step>
<para>Restart the <systemitem class="service"
>nova-compute</systemitem> and <systemitem
@@ -138,35 +122,41 @@ auth_blob=i-am-openstack</programlisting>
</procedure>
<section xml:id="config_ref">
<title>Configuration reference</title>
<para>To customize the trusted compute pools, use the following configuration
option settings:
</para>
<xi:include href="tables/nova-trustedcomputing.xml"/>
</section>
</section>
<section xml:id="trusted_flavors">
<title>Specify trusted flavors</title>
<para>To designate hosts as trusted:</para>
<procedure>
<step>
<para>Configure one or more flavors as trusted by using the <command>nova
flavor-key set</command> command. For example, to set the
<literal>m1.tiny</literal> flavor as trusted:</para>
<screen><prompt>$</prompt> <userinput>nova flavor-key m1.tiny set trust:trusted_host trusted</userinput></screen>
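<para>If you later want to stop scheduling this flavor onto trusted hosts, the key can be
removed again; for example (mirroring the command above):</para>
<screen><prompt>$</prompt> <userinput>nova flavor-key m1.tiny unset trust:trusted_host</userinput></screen>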
</step>
<step><para>Request that your instance be run on a trusted host, by specifying a trusted flavor when
booting the instance. For example:</para>
<screen><prompt>$</prompt> <userinput>nova boot --flavor m1.tiny --key_name myKeypairName --image myImageID newInstanceName</userinput></screen>
<figure xml:id="concept_trusted_pool">
<title>Trusted compute pool</title>
<mediaobject>
<imageobject role="fo">
<imagedata
fileref="figures/OpenStackTrustedComputePool2.png"
format="PNG" contentwidth="6in"/>
</imageobject>
<imageobject role="html">
<imagedata
fileref="figures/OpenStackTrustedComputePool2.png"
format="PNG" contentwidth="6in"/>
</imageobject>
</mediaobject>
</figure>
</step>
</procedure>
</section>
</section>
@@ -92,7 +92,6 @@
<xi:include href="compute/section_compute-scheduler.xml"/>
<xi:include href="compute/section_compute-cells.xml"/>
<xi:include href="compute/section_compute-conductor.xml"/>
<xi:include href="compute/section_compute-security.xml"/>
<xi:include href="compute/section_compute-config-samples.xml"/>
<xi:include href="compute/section_nova-log-files.xml"/>
<xi:include href="compute/section_compute-options-reference.xml"/>