
<?xml version="1.0" encoding="UTF-8"?>
<section xml:id="section_compute-system-admin"
    xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0">
  <title>System administration</title>
  <para>By understanding how the different installed nodes interact with each other, you can
    administer the Compute installation. Compute offers many ways to install using multiple
    servers, but the general idea is that you can have multiple compute nodes that control the
    virtual servers, and a cloud controller node that contains the remaining Compute
    services.</para>
  <para>The Compute cloud works through the interaction of a series of daemon processes named
    <systemitem>nova-*</systemitem> that reside persistently on the host machine or machines.
    These binaries can all run on the same machine or be spread out on multiple boxes in a large
    deployment. The responsibilities of services and drivers are:</para>
  <para>
    <itemizedlist>
      <listitem>
        <para>Services:</para>
        <itemizedlist>
          <listitem>
            <para><systemitem class="service">nova-api</systemitem>. Receives XML requests and
              sends them to the rest of the system. It is a WSGI application that routes and
              authenticates requests. It supports the EC2 and OpenStack APIs. A
              <filename>nova-api.conf</filename> file is created when you install Compute.</para>
          </listitem>
          <listitem>
            <para><systemitem>nova-cert</systemitem>. Provides the certificate manager.</para>
          </listitem>
          <listitem>
            <para><systemitem class="service">nova-compute</systemitem>. Responsible for managing
              virtual machines. It loads a Service object, which exposes the public methods on
              ComputeManager through Remote Procedure Call (RPC).</para>
          </listitem>
          <listitem>
            <para><systemitem>nova-conductor</systemitem>. Provides database-access support for
              Compute nodes (thereby reducing security risks).</para>
          </listitem>
          <listitem>
            <para><systemitem>nova-consoleauth</systemitem>. Handles console
              authentication.</para>
          </listitem>
          <listitem>
            <para><systemitem class="service">nova-objectstore</systemitem>. The
              <systemitem class="service">nova-objectstore</systemitem> service is an ultra-simple
              file-based storage system for images that replicates most of the S3 API. It can be
              replaced with the OpenStack Image Service and a simple image manager, or you can use
              OpenStack Object Storage as the virtual machine image storage facility. It must
              reside on the same node as
              <systemitem class="service">nova-compute</systemitem>.</para>
          </listitem>
          <listitem>
            <para><systemitem class="service">nova-network</systemitem>. Responsible for managing
              floating and fixed IPs, DHCP, bridging, and VLANs. It loads a Service object, which
              exposes the public methods on one of the subclasses of NetworkManager. Different
              networking strategies are available to the service by changing the
              <literal>network_manager</literal> configuration option to FlatManager,
              FlatDHCPManager, or VlanManager (the default is VLAN networking if no other option
              is specified); see the configuration sketch after this list.</para>
          </listitem>
          <listitem>
            <para><systemitem>nova-scheduler</systemitem>. Dispatches requests for new virtual
              machines to the correct node.</para>
          </listitem>
          <listitem>
            <para><systemitem>nova-novncproxy</systemitem>. Provides a VNC proxy for browsers
              (enabling VNC consoles to access virtual machines).</para>
          </listitem>
        </itemizedlist>
      </listitem>
      <listitem>
        <para>Some services have drivers that change how the service implements the core of its
          functionality. For example, the <systemitem>nova-compute</systemitem> service supports
          drivers that let you choose which hypervisor type it talks to.
          <systemitem>nova-network</systemitem> and <systemitem>nova-scheduler</systemitem> also
          have drivers.</para>
      </listitem>
    </itemizedlist>
  </para>
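  <para>For example, a minimal sketch of selecting a networking strategy in
    <filename>/etc/nova/nova.conf</filename> (the manager class path shown is an assumption;
    verify it against your release):</para>
  <programlisting language="ini"># Use flat DHCP networking instead of the default VLAN networking
network_manager = nova.network.manager.FlatDHCPManager</programlisting>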
  <section xml:id="section_compute-service-arch">
    <title>Compute service architecture</title>
    <para>The following basic categories describe the service architecture and what is happening
      within the cloud controller.</para>
    <simplesect>
      <title>API server</title>
      <para>At the heart of the cloud framework is an API server. This API server makes command
        and control of the hypervisor, storage, and networking programmatically available to
        users.</para>
      <para>The API endpoints are basic HTTP web services which handle authentication,
        authorization, and basic command and control functions using various API interfaces under
        the Amazon, Rackspace, and related models. This enables API compatibility with multiple
        existing tool sets created for interaction with offerings from other vendors. This broad
        compatibility prevents vendor lock-in.</para>
    </simplesect>
    <simplesect>
      <title>Message queue</title>
      <para>A messaging queue brokers the interaction between compute nodes (processing), the
        networking controllers (software which controls network infrastructure), API endpoints,
        the scheduler (determines which physical hardware to allocate to a virtual resource), and
        similar components. Communication to and from the cloud controller is handled by HTTP
        requests through multiple API endpoints.</para>
      <para>A typical message passing event begins with the API server receiving a request from a
        user. The API server authenticates the user and ensures that the user is permitted to
        issue the subject command. The availability of objects implicated in the request is
        evaluated and, if available, the request is routed to the queuing engine for the relevant
        workers. Workers continually listen to the queue based on their role, and sometimes their
        type or host name. When an applicable work request arrives on the queue, the worker takes
        assignment of the task and begins its execution. Upon completion, a response is dispatched
        to the queue, which is received by the API server and relayed to the originating user.
        Database entries are queried, added, or removed as necessary throughout the
        process.</para>
    </simplesect>
    <simplesect>
      <title>Compute worker</title>
      <para>Compute workers manage computing instances on host machines. The API dispatches
        commands to compute workers to complete these tasks:</para>
      <itemizedlist>
        <listitem>
          <para>Run instances</para>
        </listitem>
        <listitem>
          <para>Terminate instances</para>
        </listitem>
        <listitem>
          <para>Reboot instances</para>
        </listitem>
        <listitem>
          <para>Attach volumes</para>
        </listitem>
        <listitem>
          <para>Detach volumes</para>
        </listitem>
        <listitem>
          <para>Get console output</para>
        </listitem>
      </itemizedlist>
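      <para>For reference, these tasks map onto <command>nova</command> client commands similar to
        the following (the instance, flavor, image, volume, and device names are
        placeholders):</para>
      <screen><prompt>$</prompt> <userinput>nova boot --flavor m1.small --image cirros test-instance</userinput>
<prompt>$</prompt> <userinput>nova reboot test-instance</userinput>
<prompt>$</prompt> <userinput>nova volume-attach test-instance VOLUME_ID /dev/vdb</userinput>
<prompt>$</prompt> <userinput>nova volume-detach test-instance VOLUME_ID</userinput>
<prompt>$</prompt> <userinput>nova console-log test-instance</userinput>
<prompt>$</prompt> <userinput>nova delete test-instance</userinput></screen>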
    </simplesect>
    <simplesect>
      <title>Network Controller</title>
      <para>The Network Controller manages the networking resources on host machines. The API
        server dispatches commands through the message queue, which are subsequently processed by
        Network Controllers. Specific operations include:</para>
      <itemizedlist>
        <listitem>
          <para>Allocate fixed IP addresses</para>
        </listitem>
        <listitem>
          <para>Configure VLANs for projects</para>
        </listitem>
        <listitem>
          <para>Configure networks for compute nodes</para>
        </listitem>
      </itemizedlist>
    </simplesect>
  </section>
  <section xml:id="section_manage-compute-users">
    <title>Manage Compute users</title>
    <para>Access to the Euca2ools (EC2) API is controlled by an access key and a secret key. The
      user's access key needs to be included in the request, and the request must be signed with
      the secret key. Upon receipt of API requests, Compute verifies the signature and runs
      commands on behalf of the user.</para>
    <para>To begin using Compute, you must create a user with the Identity Service.</para>
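    <para>For example, a minimal sketch using the <command>keystone</command> client (the user
      name, password, and IDs are placeholders, and option names can vary slightly between client
      versions), followed by generating the EC2 access and secret key pair for that user:</para>
    <screen><prompt>$</prompt> <userinput>keystone user-create --name demo-user --pass secretword</userinput>
<prompt>$</prompt> <userinput>keystone ec2-credentials-create --user-id USER_ID --tenant-id TENANT_ID</userinput></screen>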
  </section>
  <section xml:id="section_manage-the-cloud">
    <title>Manage the cloud</title>
    <para>A system administrator can use the <command>nova</command> client and the
      <command>Euca2ools</command> commands to manage the cloud.</para>
    <para>Both the nova client and euca2ools can be used by all users, though specific commands
      might be restricted by Role Based Access Control in the Identity Service.</para>
    <procedure>
      <title>To use the nova client</title>
      <step>
        <para>Installing the <package>python-novaclient</package> package gives you a
          <code>nova</code> shell command that enables Compute API interactions from the command
          line. Install the client, and then provide your user name and password (typically set as
          environment variables for convenience), and you can send commands to your cloud from the
          command line.</para>
        <para>To install <package>python-novaclient</package>, download the tarball from
          <link xlink:href="http://pypi.python.org/pypi/python-novaclient/2.6.3#downloads"
          >http://pypi.python.org/pypi/python-novaclient/2.6.3#downloads</link> and then install it
          in your favorite Python environment.</para>
        <screen><prompt>$</prompt> <userinput>curl -O http://pypi.python.org/packages/source/p/python-novaclient/python-novaclient-2.6.3.tar.gz</userinput>
<prompt>$</prompt> <userinput>tar -zxvf python-novaclient-2.6.3.tar.gz</userinput>
<prompt>$</prompt> <userinput>cd python-novaclient-2.6.3</userinput></screen>
        <para>As <systemitem class="username">root</systemitem>, execute:</para>
        <screen><prompt>#</prompt> <userinput>python setup.py install</userinput></screen>
      </step>
      <step>
        <para>Confirm the installation by running:</para>
        <screen><prompt>$</prompt> <userinput>nova help</userinput>
<computeroutput>usage: nova [--version] [--debug] [--os-cache] [--timings]
            [--timeout &lt;seconds&gt;] [--os-username &lt;auth-user-name&gt;]
            [--os-password &lt;auth-password&gt;]
            [--os-tenant-name &lt;auth-tenant-name&gt;]
            [--os-tenant-id &lt;auth-tenant-id&gt;] [--os-auth-url &lt;auth-url&gt;]
            [--os-region-name &lt;region-name&gt;] [--os-auth-system &lt;auth-system&gt;]
            [--service-type &lt;service-type&gt;] [--service-name &lt;service-name&gt;]
            [--volume-service-name &lt;volume-service-name&gt;]
            [--endpoint-type &lt;endpoint-type&gt;]
            [--os-compute-api-version &lt;compute-api-ver&gt;]
            [--os-cacert &lt;ca-certificate&gt;] [--insecure]
            [--bypass-url &lt;bypass-url&gt;]
            &lt;subcommand&gt; ...</computeroutput></screen>
        <note><para>This command returns a list of <command>nova</command> commands and
          parameters. To obtain help for a subcommand, run:</para>
          <screen><prompt>$</prompt> <userinput>nova help <replaceable>subcommand</replaceable></userinput></screen>
          <para>You can also refer to the <link
            xlink:href="http://docs.openstack.org/cli-reference/content/">
            <citetitle>OpenStack Command-Line Reference</citetitle></link> for a complete listing
            of <command>nova</command> commands and parameters.</para></note>
      </step>
      <step>
        <para>Set the required parameters as environment variables to make running commands
          easier. For example, you can add <parameter>--os-username</parameter> as a
          <command>nova</command> option, or set it as an environment variable. To set the user
          name, password, and tenant as environment variables, use:</para>
        <screen><prompt>$</prompt> <userinput>export OS_USERNAME=joecool</userinput>
<prompt>$</prompt> <userinput>export OS_PASSWORD=coolword</userinput>
<prompt>$</prompt> <userinput>export OS_TENANT_NAME=coolu</userinput></screen>
      </step>
      <step>
        <para>Using the Identity Service, you are supplied with an authentication endpoint, which
          Compute recognizes as the <literal>OS_AUTH_URL</literal>.</para>
        <para>
          <screen><prompt>$</prompt> <userinput>export OS_AUTH_URL=http://hostname:5000/v2.0</userinput>
<prompt>$</prompt> <userinput>export NOVA_VERSION=1.1</userinput></screen>
        </para>
      </step>
    </procedure>
    <simplesect>
      <title>Use the euca2ools commands</title>
      <para>For a command-line interface to EC2 API calls, use the <command>euca2ools</command>
        command-line tool. See <link
        xlink:href="http://open.eucalyptus.com/wiki/Euca2oolsGuide_v1.3"
        >http://open.eucalyptus.com/wiki/Euca2oolsGuide_v1.3</link>.</para>
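      <para>For example, a minimal sketch, assuming euca2ools is installed and that EC2
        credentials have already been created for your user (the endpoint URL and key values are
        placeholders):</para>
      <screen><prompt>$</prompt> <userinput>export EC2_URL=http://hostname:8773/services/Cloud</userinput>
<prompt>$</prompt> <userinput>export EC2_ACCESS_KEY=ACCESS_KEY</userinput>
<prompt>$</prompt> <userinput>export EC2_SECRET_KEY=SECRET_KEY</userinput>
<prompt>$</prompt> <userinput>euca-describe-instances</userinput></screen>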
    </simplesect>
  </section>
  <xi:include href="../../common/section_cli_nova_usage_statistics.xml"/>
  <section xml:id="section_manage-logs">
    <title>Manage logs</title>
    <simplesect>
      <title>Logging module</title>
      <para>To specify a configuration file that changes the logging behavior, such as the logging
        level (<literal>DEBUG</literal>, <literal>INFO</literal>, <literal>WARNING</literal>, or
        <literal>ERROR</literal>), add this line to the <filename>/etc/nova/nova.conf</filename>
        file:
        <programlisting language="ini">log-config=/etc/nova/logging.conf</programlisting></para>
      <para>The logging configuration file is an INI-style configuration file that must contain a
        section called <literal>logger_nova</literal>, which controls the behavior of the logging
        facility in the <literal>nova-*</literal> services. For
        example:<programlisting language="ini">[logger_nova]
level = INFO
handlers = stderr
qualname = nova</programlisting></para>
      <para>This example sets the debugging level to <literal>INFO</literal> (which is less
        verbose than the default <literal>DEBUG</literal> setting). <itemizedlist>
          <listitem>
            <para>For more details on the logging configuration syntax, including the meaning of
              the <literal>handlers</literal> and <literal>qualname</literal> variables, see the
              <link
              xlink:href="http://docs.python.org/release/2.7/library/logging.html#configuration-file-format"
              >Python documentation on logging configuration file format</link>.</para>
          </listitem>
          <listitem>
            <para>For an example <filename>logging.conf</filename> file with various defined
              handlers, see the
              <link xlink:href="http://docs.openstack.org/trunk/config-reference/content/">
              <citetitle>OpenStack Configuration Reference</citetitle></link>.</para>
          </listitem>
        </itemizedlist>
      </para>
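      <para>As a starting point, the following is a minimal, self-contained
        <filename>logging.conf</filename> sketch in the standard Python logging configuration file
        format (the handler and formatter names are illustrative):</para>
      <programlisting language="ini">[loggers]
keys = root, nova

[handlers]
keys = stderr

[formatters]
keys = default

[logger_root]
level = WARNING
handlers = stderr

[logger_nova]
level = INFO
handlers = stderr
qualname = nova

[handler_stderr]
class = StreamHandler
args = (sys.stderr,)
formatter = default

[formatter_default]
format = %(asctime)s %(levelname)s %(name)s %(message)s</programlisting>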
    </simplesect>
    <simplesect>
      <title>Syslog</title>
      <para>You can configure OpenStack Compute services to send logging information to
        <systemitem>syslog</systemitem>. This is useful if you want to use
        <systemitem>rsyslog</systemitem>, which forwards the logs to a remote machine. You need to
        separately configure the Compute service (nova), the Identity Service (keystone), the
        Image Service (glance), and, if you are using it, the Block Storage service (cinder) to
        send log messages to <systemitem>syslog</systemitem>. To do so, add the following lines
        to:</para>
      <itemizedlist>
        <listitem>
          <para><filename>/etc/nova/nova.conf</filename></para>
        </listitem>
        <listitem>
          <para><filename>/etc/keystone/keystone.conf</filename></para>
        </listitem>
        <listitem>
          <para><filename>/etc/glance/glance-api.conf</filename></para>
        </listitem>
        <listitem>
          <para><filename>/etc/glance/glance-registry.conf</filename></para>
        </listitem>
        <listitem>
          <para><filename>/etc/cinder/cinder.conf</filename></para>
        </listitem>
      </itemizedlist>
      <programlisting language="ini">verbose = False
debug = False
use_syslog = True
syslog_log_facility = LOG_LOCAL0</programlisting>
      <para>In addition to enabling <systemitem>syslog</systemitem>, these settings also turn off
        more verbose output and debugging output from the log.<note>
          <para>Although the example above uses the same local facility for each service
            (<literal>LOG_LOCAL0</literal>, which corresponds to <systemitem>syslog</systemitem>
            facility <literal>LOCAL0</literal>), we recommend that you configure a separate local
            facility for each service, as this provides better isolation and more flexibility. For
            example, you may want to capture logging information at different severity levels for
            different services. <systemitem>syslog</systemitem> allows you to define up to eight
            local facilities, <literal>LOCAL0, LOCAL1, ..., LOCAL7</literal>. For more details,
            see the <systemitem>syslog</systemitem> documentation.</para>
        </note></para>
    </simplesect>
    <simplesect>
      <title>Rsyslog</title>
      <para><systemitem>rsyslog</systemitem> is a useful tool for setting up a centralized log
        server across multiple machines. We briefly describe the configuration to set up an
        <systemitem>rsyslog</systemitem> server; a full treatment of
        <systemitem>rsyslog</systemitem> is beyond the scope of this document. We assume
        <systemitem>rsyslog</systemitem> has already been installed on your hosts (it is installed
        by default on most Linux distributions).</para>
      <para>This example provides a minimal configuration for
        <filename>/etc/rsyslog.conf</filename> on the log server host, which receives the log
        files:</para>
      <programlisting language="bash"># provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 1024</programlisting>
      <para>Add a filter rule to <filename>/etc/rsyslog.conf</filename> which looks for a host
        name. The example below uses <replaceable>compute-01</replaceable> as an example of a
        compute host name:</para>
      <programlisting language="bash">:hostname, isequal, "<replaceable>compute-01</replaceable>" /mnt/rsyslog/logs/compute-01.log</programlisting>
      <para>On each compute host, create a file named
        <filename>/etc/rsyslog.d/60-nova.conf</filename>, with the following content:</para>
      <programlisting language="bash"># prevent debug from dnsmasq with the daemon.none parameter
*.*;auth,authpriv.none,daemon.none,local0.none -/var/log/syslog
# Specify a log level of ERROR
local0.error @@172.20.1.43:1024</programlisting>
      <para>Once you have created this file, restart your <systemitem>rsyslog</systemitem> daemon.
        Error-level log messages on the compute hosts should now be sent to your log
        server.</para>
    </simplesect>
  </section>
  <xi:include href="section_compute-rootwrap.xml"/>
  <xi:include href="section_compute-configure-migrations.xml"/>
  <section xml:id="section_live-migration-usage">
    <title>Migrate instances</title>
    <para>Before starting migrations, review the <link
      linkend="section_configuring-compute-migrations">Configure migrations
      section</link>.</para>
    <para>Migration provides a scheme to migrate running instances from one OpenStack Compute
      server to another OpenStack Compute server.</para>
    <procedure>
      <title>To migrate instances</title>
      <step>
        <para>List the running instances to get the ID of the instance you want to
          migrate.</para>
        <screen><prompt>$</prompt> <userinput>nova list</userinput>
<computeroutput><![CDATA[+--------------------------------------+------+--------+-----------------+
|                  ID                  | Name | Status | Networks        |
+--------------------------------------+------+--------+-----------------+
| d1df1b5a-70c4-4fed-98b7-423362f2c47c | vm1  | ACTIVE | private=a.b.c.d |
| d693db9e-a7cf-45ef-a7c9-b3ecb5f22645 | vm2  | ACTIVE | private=e.f.g.h |
+--------------------------------------+------+--------+-----------------+]]></computeroutput></screen>
      </step>
      <step>
        <para>Look at the information associated with that instance. This example uses 'vm1' from
          above.</para>
        <screen><prompt>$</prompt> <userinput>nova show d1df1b5a-70c4-4fed-98b7-423362f2c47c</userinput>
<computeroutput><![CDATA[+-------------------------------------+----------------------------------------------------------+
|              Property               |                          Value                           |
+-------------------------------------+----------------------------------------------------------+
...
| OS-EXT-SRV-ATTR:host                | HostB                                                    |
...
| flavor                              | m1.tiny                                                  |
| id                                  | d1df1b5a-70c4-4fed-98b7-423362f2c47c                     |
| name                                | vm1                                                      |
| private network                     | a.b.c.d                                                  |
| status                              | ACTIVE                                                   |
...
+-------------------------------------+----------------------------------------------------------+]]></computeroutput></screen>
        <para>In this example, vm1 is running on HostB.</para>
      </step>
      <step>
        <para>Select the server to which instances will be migrated:</para>
        <screen><prompt>#</prompt> <userinput>nova service-list</userinput>
<computeroutput>+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| Binary           | Host       | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| nova-consoleauth | HostA      | internal | enabled | up    | 2014-03-25T10:33:25.000000 | -               |
| nova-scheduler   | HostA      | internal | enabled | up    | 2014-03-25T10:33:25.000000 | -               |
| nova-conductor   | HostA      | internal | enabled | up    | 2014-03-25T10:33:27.000000 | -               |
| nova-compute     | HostB      | nova     | enabled | up    | 2014-03-25T10:33:31.000000 | -               |
| nova-compute     | HostC      | nova     | enabled | up    | 2014-03-25T10:33:31.000000 | -               |
| nova-cert        | HostA      | internal | enabled | up    | 2014-03-25T10:33:31.000000 | -               |
+------------------+------------+----------+---------+-------+----------------------------+-----------------+</computeroutput></screen>
        <para>In this example, HostC can be picked up because
          <systemitem class="service">nova-compute</systemitem> is running on it.</para>
      </step>
      <step>
        <para>Ensure that HostC has enough resources for migration.</para>
        <screen><prompt>#</prompt> <userinput>nova host-describe HostC</userinput>
<computeroutput>+-----------+------------+-----+-----------+---------+
| HOST      | PROJECT    | cpu | memory_mb | disk_gb |
+-----------+------------+-----+-----------+---------+
| HostC     | (total)    | 16  | 32232     | 878     |
| HostC     | (used_now) | 13  | 21284     | 442     |
| HostC     | (used_max) | 13  | 21284     | 442     |
| HostC     | p1         | 13  | 21284     | 442     |
| HostC     | p2         | 13  | 21284     | 442     |
+-----------+------------+-----+-----------+---------+</computeroutput></screen>
        <itemizedlist>
          <listitem>
            <para><emphasis role="bold">cpu: </emphasis>the number of CPUs</para>
          </listitem>
          <listitem>
            <para><emphasis role="bold">memory_mb: </emphasis>the total amount of memory (in
              MB)</para>
          </listitem>
          <listitem>
            <para><emphasis role="bold">disk_gb: </emphasis>the total amount of space for
              NOVA-INST-DIR/instances (in GB)</para>
          </listitem>
          <listitem>
            <para><emphasis role="bold">1st line shows </emphasis>the total amount of resources
              for the physical server.</para>
          </listitem>
          <listitem>
            <para><emphasis role="bold">2nd line shows </emphasis>the currently used
              resources.</para>
          </listitem>
          <listitem>
            <para><emphasis role="bold">3rd line shows </emphasis>the maximum used
              resources.</para>
          </listitem>
          <listitem>
            <para><emphasis role="bold">4th line and under</emphasis> show the resources used by
              each project.</para>
          </listitem>
        </itemizedlist>
      </step>
      <step>
        <para>Use the <command>nova live-migration</command> command to migrate the
          instances:<screen><prompt>$</prompt> <userinput>nova live-migration <replaceable>server</replaceable> <replaceable>host_name</replaceable></userinput></screen></para>
        <para>Where <replaceable>server</replaceable> can be either the server's ID or name. For
          example:</para>
        <screen><prompt>$</prompt> <userinput>nova live-migration d1df1b5a-70c4-4fed-98b7-423362f2c47c HostC</userinput>
<computeroutput><![CDATA[Migration of d1df1b5a-70c4-4fed-98b7-423362f2c47c initiated.]]></computeroutput></screen>
        <para>Ensure instances are migrated successfully with <command>nova list</command>. If
          instances are still running on HostB, check the log files (src/dest
          <systemitem class="service">nova-compute</systemitem> and
          <systemitem class="service">nova-scheduler</systemitem>) to determine why. <note>
            <para>Although the <command>nova</command> command is called
              <command>live-migration</command>, under the default Compute configuration options
              the instances are suspended before migration.</para>
            <para>For more details, see <link
              xlink:href="http://docs.openstack.org/trunk/config-reference/content/configuring-openstack-compute-basics.html"
              >Configure migrations</link> in the <citetitle>OpenStack Configuration
              Reference</citetitle>.</para>
          </note>
        </para>
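        <para>If your compute hosts do not share instance storage, the client also accepts a
          block-migration variant of this command. This is only a hedged sketch; verify the option
          name against your <command>nova</command> client version:</para>
        <screen><prompt>$</prompt> <userinput>nova live-migration --block-migrate <replaceable>server</replaceable> <replaceable>host_name</replaceable></userinput></screen>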
      </step>
    </procedure>
  </section>
  <section xml:id="section_nova-compute-node-down">
    <title>Recover from a failed compute node</title>
    <para>If you have deployed Compute with a shared file system, you can quickly recover from a
      failed compute node. Of the two methods covered in these sections, the evacuate API is the
      preferred method even in the absence of shared storage. The evacuate API provides many
      benefits over manual recovery, such as re-attachment of volumes and floating IPs.</para>
    <xi:include href="../../common/section_cli_nova_evacuate.xml"/>
    <section xml:id="nova-compute-node-down-manual-recovery">
      <title>Manual recovery</title>
      <para>For KVM/libvirt compute node recovery, see the previous section. Use the following
        procedure for all other hypervisors.</para>
      <procedure>
        <title>To work with host information</title>
        <step>
          <para>Identify the VMs on the affected hosts, using tools such as a combination of
            <literal>nova list</literal> and <literal>nova show</literal> or
            <literal>euca-describe-instances</literal>. Here is an example using the EC2 API:
            instance i-000015b9 is running on node np-rcc54:</para>
          <programlisting language="bash">i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60</programlisting>
        </step>
        <step>
          <para>You can review the status of the host by using the Compute database. Some of the
            important information is shown below. This example converts an EC2 API instance ID
            into an OpenStack ID; if you used the <literal>nova</literal> commands, you can
            substitute the ID directly. You can find the credentials for your database in
            <filename>/etc/nova/nova.conf</filename>.</para>
          <programlisting language="bash">SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G;
*************************** 1. row ***************************
created_at: 2012-06-19 00:48:11
updated_at: 2012-07-03 00:35:11
deleted_at: NULL
...
id: 5561
...
power_state: 5
vm_state: shutoff
...
hostname: at3-ui02
host: np-rcc54
...
uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
...
task_state: NULL
...</programlisting>
        </step>
      </procedure>
      <procedure>
        <title>To recover the VM</title>
        <step>
          <para>When you know the status of the VM on the failed host, determine to which compute
            host the affected VM should be moved. For example, run the following database command
            to move the VM to np-rcc46:</para>
          <programlisting language="bash">UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';</programlisting>
        </step>
        <step>
          <para>If using a hypervisor that relies on libvirt (such as KVM), it is a good idea to
            update the <literal>libvirt.xml</literal> file (found in
            <literal>/var/lib/nova/instances/[instance ID]</literal>). The important changes to
            make are:</para>
          <para>
            <itemizedlist>
              <listitem>
                <para>Change the <literal>DHCPSERVER</literal> value to the host IP address of the
                  compute host that is now the VM's new home.</para>
              </listitem>
              <listitem>
                <para>Update the VNC IP, if it is not already
                  <literal>0.0.0.0</literal>.</para>
              </listitem>
            </itemizedlist>
          </para>
        </step>
        <step>
          <para>Reboot the VM:</para>
          <screen><prompt>$</prompt> <userinput>nova reboot --hard 3f57699a-e773-4650-a443-b4b37eed5a06</userinput></screen>
        </step>
      </procedure>
      <para>In theory, the above database update and <literal>nova reboot</literal> command are
        all that is required to recover a VM from a failed host. However, if further problems
        occur, consider looking at recreating the network filter configuration using
        <literal>virsh</literal>, restarting the Compute services, or updating the
        <literal>vm_state</literal> and <literal>power_state</literal> in the Compute
        database.</para>
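      <para>For example, a hedged sketch of resetting those database fields manually (back up the
        database first, and adjust the UUID and values to your situation; a
        <literal>power_state</literal> of 1 conventionally means "running"):</para>
      <programlisting language="bash">UPDATE instances SET vm_state = 'active', power_state = 1 WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';</programlisting>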
    </section>
  </section>
  <section xml:id="section_nova-uid-mismatch">
    <title>Recover from a UID/GID mismatch</title>
    <para>When you run OpenStack Compute with a shared file system or an automated configuration
      tool, you could encounter a situation where some files on your compute node use the wrong
      UID or GID. This causes a number of errors, such as being unable to live migrate or start
      virtual machines.</para>
    <para>The following procedure runs on <systemitem class="service">nova-compute</systemitem>
      hosts, based on the KVM hypervisor, and could help to restore the situation:</para>
    <procedure>
      <title>To recover from a UID/GID mismatch</title>
      <step>
        <para>Ensure that you do not use numbers that are already used for some other user or
          group.</para>
      </step>
      <step>
        <para>Set the nova UID in <filename>/etc/passwd</filename> to the same number on all hosts
          (for example, 112).</para>
      </step>
      <step>
        <para>Set the libvirt-qemu UID in <filename>/etc/passwd</filename> to the same number on
          all hosts (for example, 119).</para>
      </step>
      <step>
        <para>Set the nova group in the <filename>/etc/group</filename> file to the same number on
          all hosts (for example, 120).</para>
      </step>
      <step>
        <para>Set the libvirtd group in the <filename>/etc/group</filename> file to the same
          number on all hosts (for example, 119).</para>
      </step>
      <step>
        <para>Stop the services on the compute node (see the example commands after this
          procedure).</para>
      </step>
      <step>
        <para>Change all the files owned by user nova or by group nova. For example:</para>
        <programlisting language="bash">find / -uid 108 -exec chown nova {} \; # note the 108 here is the old nova UID before the change
find / -gid 120 -exec chgrp nova {} \;</programlisting>
      </step>
      <step>
        <para>Repeat the steps for the files owned by libvirt-qemu, if those needed to
          change.</para>
      </step>
      <step>
        <para>Restart the services.</para>
      </step>
      <step>
        <para>Now you can run the <command>find</command> command to verify that all files are
          using the correct identifiers.</para>
      </step>
    </procedure>
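    <para>The exact commands for stopping and restarting the services depend on your distribution
      and installation; the service names below are assumptions for an Ubuntu-based compute node,
      so adjust them to the services actually installed on the host:</para>
    <screen><prompt>#</prompt> <userinput>service nova-compute stop</userinput>
<prompt>#</prompt> <userinput>service libvirt-bin stop</userinput></screen>
    <para>and, after the ownership changes are complete:</para>
    <screen><prompt>#</prompt> <userinput>service nova-compute start</userinput>
<prompt>#</prompt> <userinput>service libvirt-bin start</userinput></screen>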
  </section>
  <section xml:id="section_nova-disaster-recovery-process">
    <title>Compute disaster recovery process</title>
    <para>Use the following procedures to manage your cloud after a disaster, and to easily back
      up its persistent storage volumes. Backups <emphasis role="bold">are</emphasis> mandatory,
      even outside of disaster scenarios.</para>
    <para>For a definition of a disaster recovery plan (DRP), see <link
      xlink:href="http://en.wikipedia.org/wiki/Disaster_Recovery_Plan"
      >http://en.wikipedia.org/wiki/Disaster_Recovery_Plan</link>.</para>
    <simplesect>
      <title>A - The disaster recovery process presentation</title>
      <para>A disaster could happen to several components of your architecture: a disk crash, a
        network loss, a power cut, and so on. In this example, assume the following setup:</para>
      <orderedlist>
        <listitem>
          <para>A cloud controller (<systemitem>nova-api</systemitem>,
            <systemitem>nova-objectstore</systemitem>,
            <systemitem>nova-network</systemitem>)</para>
        </listitem>
        <listitem>
          <para>A compute node (<systemitem class="service">nova-compute</systemitem>)</para>
        </listitem>
        <listitem>
          <para>A Storage Area Network (SAN) used by the
            <systemitem class="service">cinder-volume</systemitem> service</para>
        </listitem>
      </orderedlist>
      <para>The disaster example is the worst one: a power loss. That power loss applies to all
        three components. <emphasis role="italic">Let's see what runs and how it runs before the
        crash</emphasis>:</para>
      <itemizedlist>
        <listitem>
          <para>From the SAN to the cloud controller, we have an active iSCSI session (used for
            the "cinder-volumes" LVM volume group).</para>
        </listitem>
        <listitem>
          <para>From the cloud controller to the compute node, we also have active iSCSI sessions
            (managed by <systemitem class="service">cinder-volume</systemitem>).</para>
        </listitem>
        <listitem>
          <para>For every volume, an iSCSI session is made (so 14 EBS volumes equal 14
            sessions).</para>
        </listitem>
        <listitem>
          <para>From the cloud controller to the compute node, we also have iptables/ebtables
            rules which allow access from the cloud controller to the running instance.</para>
        </listitem>
        <listitem>
          <para>Finally, the database stores the current state of the instances (in this case,
            "running") and their volume attachments (mount point, volume ID, volume status, and so
            on).</para>
        </listitem>
      </itemizedlist>
      <para>Now, after the power loss occurs and all hardware components restart, the situation is
        as follows:</para>
      <itemizedlist>
        <listitem>
          <para>From the SAN to the cloud controller, the iSCSI session no longer exists.</para>
        </listitem>
        <listitem>
          <para>From the cloud controller to the compute node, the iSCSI sessions no longer
            exist.</para>
        </listitem>
        <listitem>
          <para>From the cloud controller to the compute node, the iptables and ebtables rules are
            re-created, since, at boot, <systemitem>nova-network</systemitem> reapplies the
            configurations.</para>
        </listitem>
        <listitem>
          <para>From the cloud controller, instances are in a shutdown state (because they are no
            longer running).</para>
        </listitem>
        <listitem>
          <para>In the database, data was not updated at all, since Compute could not have
            anticipated the crash.</para>
        </listitem>
      </itemizedlist>
      <para>Before going further, and to prevent the administrator from making fatal mistakes,
        note that <emphasis role="bold">the instances won't be lost</emphasis>, because no
        "<command>destroy</command>" or "<command>terminate</command>" command was invoked, so the
        files for the instances remain on the compute node.</para>
      <para>Perform these tasks in this exact order. <emphasis role="underline">Any extra step
        would be dangerous at this stage</emphasis>:</para>
      <para>
        <orderedlist>
          <listitem>
            <para>Get the current relationship between each volume and its instance, so that you
              can re-create the attachment.</para>
          </listitem>
          <listitem>
            <para>Update the database to clean the stalled state. (After that, you cannot perform
              the first step).</para>
          </listitem>
          <listitem>
            <para>Restart the instances. In other words, go from a shutdown to a running
              state.</para>
          </listitem>
          <listitem>
            <para>After the restart, reattach the volumes to their respective instances
              (optional).</para>
          </listitem>
          <listitem>
            <para>SSH into the instances to reboot them.</para>
          </listitem>
        </orderedlist>
      </para>
    </simplesect>
    <simplesect>
      <title>B - Disaster recovery</title>
      <procedure>
        <title>To perform disaster recovery</title>
        <step>
          <title>Get the instance-to-volume relationship</title>
          <para>You must get the current relationship from each volume to its instance, because
            you will re-create the attachment.</para>
          <para>You can find this relationship by running <command>nova volume-list</command>.
            Note that the <command>nova</command> client includes the ability to get volume
            information from Block Storage. Record each relationship, for example in a temporary
            file like the one sketched below, so that you can replay the attachments later.</para>
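          <para>The file name, format, and values below are only an assumption that matches the
            reattachment script shown later (one line per volume, with the volume ID, instance ID,
            and mount point separated by spaces):</para>
          <programlisting language="bash"># Hypothetical example: fill this in from the "nova volume-list" output.
cat &gt; /root/volumes-attached.txt &lt;&lt;'EOF'
VOLUME_ID INSTANCE_ID /dev/vdb
EOF</programlisting>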
        </step>
        <step>
          <title>Update the database</title>
          <para>Update the database to clean the stalled state. Do this for every volume, using
            these queries to clean up the database:</para>
          <screen><prompt>mysql></prompt> <userinput>use cinder;</userinput>
<prompt>mysql></prompt> <userinput>update volumes set mountpoint=NULL;</userinput>
<prompt>mysql></prompt> <userinput>update volumes set status="available" where status &lt;&gt; "error_deleting";</userinput>
<prompt>mysql></prompt> <userinput>update volumes set attach_status="detached";</userinput>
<prompt>mysql></prompt> <userinput>update volumes set instance_id=0;</userinput></screen>
          <para>Then, when you run <command>nova volume-list</command>, all volumes appear in the
            listing.</para>
        </step>
        <step>
          <title>Restart instances</title>
          <para>Restart the instances using the <command>nova reboot
            <replaceable>$instance</replaceable></command> command.</para>
          <para>At this stage, depending on your image, some instances completely reboot and
            become reachable, while others stop at the "plymouth" stage.</para>
        </step>
        <step>
          <title>DO NOT reboot a second time</title>
          <para>Do not reboot instances that are stopped at this point. Instance state depends on
            whether you added an <filename>/etc/fstab</filename> entry for that volume. Images
            built with the <package>cloud-init</package> package remain in a pending state, while
            others skip the missing volume and start. The idea of this stage is only to ask nova
            to reboot every instance, so that the stored state is preserved. For more information
            about <package>cloud-init</package>, see <link
            xlink:href="https://help.ubuntu.com/community/CloudInit"
            >help.ubuntu.com/community/CloudInit</link>.</para>
        </step>
        <step>
          <title>Reattach volumes</title>
          <para>After the restart, you can reattach the volumes to their respective instances. Now
            that <command>nova</command> has restored the right status, it is time to perform the
            attachments through a <command>nova volume-attach</command> command.</para>
          <para>This simple snippet uses the file created earlier, which lists each volume ID,
            instance ID, and mount point on one line:</para>
          <programlisting language="bash">#!/bin/bash
# Assumes that $volumes_tmp_file contains one line per volume, in the form
#   "volume_ID instance_ID mount_point",
# and that $CUT points to the cut binary (for example, CUT=cut).

while read line; do
    volume=`echo $line | $CUT -f 1 -d " "`
    instance=`echo $line | $CUT -f 2 -d " "`
    mount_point=`echo $line | $CUT -f 3 -d " "`
    echo "ATTACHING VOLUME FOR INSTANCE - $instance"
    nova volume-attach $instance $volume $mount_point
    sleep 2
done &lt; $volumes_tmp_file</programlisting>
          <para>At this stage, instances that were pending on the boot sequence (<emphasis
            role="italic">plymouth</emphasis>) automatically continue booting and start normally,
            while the instances that had already booted see the volume.</para>
        </step>
        <step>
          <title>SSH into instances</title>
          <para>If some services depend on the volume, or if a volume has an entry in
            <systemitem>fstab</systemitem>, it could be good to simply restart the instance. This
            restart needs to be made from the instance itself, not through
            <command>nova</command>. So, we SSH into the instance and perform a reboot:</para>
          <screen><prompt>#</prompt> <userinput>shutdown -r now</userinput></screen>
        </step>
      </procedure>
      <para>By completing this procedure, you can successfully recover your cloud.</para>
      <note>
        <para>Follow these guidelines:</para>
        <itemizedlist>
          <listitem>
            <para>Use the <parameter>errors=remount</parameter> parameter in the
              <filename>fstab</filename> file, which prevents data corruption.</para>
            <para>The system locks any write to the disk if it detects an I/O error. This
              configuration option should be added into the <systemitem
              class="service">cinder-volume</systemitem> server (the one which performs the iSCSI
              connection to the SAN), but also into the instances'
              <filename>fstab</filename> file.</para>
          </listitem>
          <listitem>
            <para>Do not add the entry for the SAN's disks to the <systemitem
              class="service">cinder-volume</systemitem>'s <filename>fstab</filename> file.</para>
            <para>Some systems hang on that step, which means you could lose access to your cloud
              controller. To re-run the session manually, run the following commands before
              performing the mount:
              <screen><prompt>#</prompt> <userinput>iscsiadm -m discovery -t st -p $SAN_IP</userinput>
<prompt>#</prompt> <userinput>iscsiadm -m node --target-name $IQN -p $SAN_IP -l</userinput></screen></para>
          </listitem>
          <listitem>
            <para>For your instances, if you have the whole <filename>/home/</filename> directory
              on the disk, leave a user's directory with the user's bash files and the
              <filename>authorized_keys</filename> file, instead of emptying the
              <filename>/home</filename> directory and mapping the disk on it.</para>
            <para>This enables you to connect to the instance, even without the volume attached,
              if you allow only connections through public keys.</para>
          </listitem>
        </itemizedlist>
      </note>
    </simplesect>
    <simplesect>
      <title>C - Scripted DRP</title>
      <procedure>
        <title>To use scripted DRP</title>
        <para>You can download a bash script that performs these steps from <link
          xlink:href="https://github.com/Razique/BashStuff/blob/master/SYSTEMS/OpenStack/SCR_5006_V00_NUAC-OPENSTACK-DRP-OpenStack.sh"
          >here</link>:</para>
        <step>
          <para>The "test mode" allows you to perform that whole sequence for only one
            instance.</para>
        </step>
        <step>
          <para>To reproduce the power loss, connect to the compute node which runs that same
            instance and close the iSCSI session. <emphasis role="underline">Do not detach the
            volume through <command>nova volume-detach</command></emphasis>, but instead manually
            close the iSCSI session.</para>
        </step>
        <step>
          <para>In this example, the iSCSI session is number 15 for that instance:</para>
          <screen><prompt>#</prompt> <userinput>iscsiadm -m session -u -r 15</userinput></screen>
        </step>
        <step>
          <para>Do not forget the <literal>-r</literal> flag. Otherwise, you close ALL
            sessions.</para>
        </step>
      </procedure>
    </simplesect>
  </section>
</section>