System administration

By understanding how the different installed nodes
interact with each other, you can administer the Compute
installation. Compute offers many ways to install using
multiple servers but the general idea is that you can have
multiple compute nodes that control the virtual servers
and a cloud controller node that contains the remaining
Compute services.

The Compute cloud works through the interaction of a series of daemon processes named
nova-* that reside persistently on the host machine or
machines. These binaries can all run on the same machine or be spread out on multiple boxes
in a large deployment. The responsibilities of services and drivers are:

Services:

nova-api. Receives XML requests and sends them to the rest of the system. It is a WSGI app that routes and authenticates requests. It supports the EC2 and OpenStack APIs. There is a nova-api.conf file created when you install Compute.

nova-cert. Provides the certificate manager.

nova-compute. Responsible for managing virtual machines. It loads a Service object, which exposes the public methods on ComputeManager through Remote Procedure Call (RPC).

nova-conductor. Provides database-access support for Compute nodes (thereby reducing security risks).

nova-consoleauth. Handles console authentication.

nova-objectstore. The nova-objectstore service is an ultra-simple, file-based storage system for images that replicates most of the S3 API. It can be replaced with the OpenStack Image Service and a simple image manager, or you can use OpenStack Object Storage as the virtual machine image storage facility. It must reside on the same node as nova-compute.

nova-network. Responsible for managing floating and fixed IPs, DHCP, bridging, and VLANs. It loads a Service object, which exposes the public methods on one of the subclasses of NetworkManager. Different networking strategies are available to the service by changing the network_manager configuration option to FlatManager, FlatDHCPManager, or VlanManager (the default is VLAN if no other is specified).

nova-scheduler. Dispatches requests for new virtual machines to the correct node.

nova-novncproxy. Provides a VNC proxy for browsers (enabling VNC consoles to access virtual machines).

Some services have drivers that change how the service implements the core of
its functionality. For example, the nova-compute
service supports drivers that let you choose with which hypervisor type it will
talk. nova-network and
nova-scheduler also have drivers.
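As a rough sketch only (option names and driver paths vary between releases, so check the configuration reference for your release), selecting drivers for these services in /etc/nova/nova.conf might look like:

[DEFAULT]
# hypervisor driver used by nova-compute (libvirt/KVM assumed here)
compute_driver = libvirt.LibvirtDriver
# networking strategy used by nova-network
network_manager = nova.network.manager.FlatDHCPManager
# scheduler driver used by nova-scheduler
scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler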
Compute service architecture

The following basic categories describe the service architecture and what's going on within the cloud controller.

API server

At the heart of the cloud framework is an API server. This API server makes
command and control of the hypervisor, storage, and networking programmatically
available to users.

The API endpoints are basic HTTP web services
which handle authentication, authorization, and
basic command and control functions using various
API interfaces under the Amazon, Rackspace, and
related models. This enables API compatibility
with multiple existing tool sets created for
interaction with offerings from other vendors.
This broad compatibility prevents vendor
lock-in.

Message queue

A messaging queue brokers the interaction
between compute nodes (processing), the networking
controllers (software which controls network
infrastructure), API endpoints, the scheduler
(determines which physical hardware to allocate to
a virtual resource), and similar components.
Communication to and from the cloud controller is
by HTTP requests through multiple API
endpoints.

A typical message passing event begins with the API server receiving a request
from a user. The API server authenticates the user and ensures that the user is
permitted to issue the subject command. The availability of objects implicated in
the request is evaluated and, if available, the request is routed to the queuing
engine for the relevant workers. Workers continually listen to the queue based on
their role and, occasionally, their type and host name. When an applicable work request
arrives on the queue, the worker takes assignment of the task and begins its
execution. Upon completion, a response is dispatched to the queue which is received
by the API server and relayed to the originating user. Database entries are queried,
added, or removed as necessary throughout the process.
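As a minimal sketch, assuming RabbitMQ is the broker (the usual default) and using placeholder values for the host and credentials, the queue connection is configured in /etc/nova/nova.conf on every node that runs a nova-* service:

[DEFAULT]
# message broker connection shared by all nova-* services
rabbit_host = controller
rabbit_userid = nova
rabbit_password = RABBIT_PASS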
Compute worker

Compute workers manage computing instances on host machines. The API dispatches commands to
compute workers to complete these tasks:

Run instances
Terminate instances
Reboot instances
Attach volumes
Detach volumes
Get console output
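As an illustration, these tasks correspond to nova client commands such as the following, which are ultimately dispatched to a compute worker; the server name, image, flavor, and volume ID below are placeholders:

$nova boot --flavor m1.small --image cirros test-vm
$nova reboot test-vm
$nova volume-attach test-vm VOLUME_ID /dev/vdb
$nova volume-detach test-vm VOLUME_ID
$nova console-log test-vm
$nova delete test-vm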
Network Controller

The Network Controller manages the networking resources on host machines. The API server
dispatches commands through the message queue,
which are subsequently processed by Network
Controllers. Specific operations include:

Allocate fixed IP addresses
Configure VLANs for projects
Configure networks for compute nodes

Manage Compute users

Access to the Euca2ools (ec2) API is controlled by
an access and secret key. The user’s access key needs
to be included in the request, and the request must be
signed with the secret key. Upon receipt of API
requests, Compute verifies the signature and runs
commands on behalf of the user.

To begin using Compute, you must create a user with the Identity Service.
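A minimal sketch, assuming the legacy keystone command-line client and admin credentials already set in your environment (option names vary slightly between keystoneclient versions); the tenant, user name, password, email, and role below are placeholders that reuse the examples later in this section:

$keystone tenant-create --name coolu
$keystone user-create --name joecool --pass coolword --email joecool@example.com
$keystone user-role-add --user joecool --tenant coolu --role Member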
Manage the cloud

A system administrator can use the nova client and the Euca2ools commands to manage the cloud.

Both the nova client and euca2ools can be used by all users, though specific commands
might be restricted by Role Based Access Control in the Identity Service.

To use the nova client

Installing the python-novaclient package gives you a
nova shell command that enables Compute API interactions from
the command line. Install the client, then provide your user name and password (typically set as environment variables for convenience), and you can send commands to your cloud from the command line.

To install python-novaclient, download the tarball from http://pypi.python.org/pypi/python-novaclient/2.6.3#downloads and then install it in your favorite Python environment.

$curl -O http://pypi.python.org/packages/source/p/python-novaclient/python-novaclient-2.6.3.tar.gz
$tar -zxvf python-novaclient-2.6.3.tar.gz
$cd python-novaclient-2.6.3

As root, execute:

#python setup.py install

Confirm the installation by running:

$nova help
usage: nova [--version] [--debug] [--os-cache] [--timings]
[--timeout <seconds>] [--os-username <auth-user-name>]
[--os-password <auth-password>]
[--os-tenant-name <auth-tenant-name>]
[--os-tenant-id <auth-tenant-id>] [--os-auth-url <auth-url>]
[--os-region-name <region-name>] [--os-auth-system <auth-system>]
[--service-type <service-type>] [--service-name <service-name>]
[--volume-service-name <volume-service-name>]
[--endpoint-type <endpoint-type>]
[--os-compute-api-version <compute-api-ver>]
[--os-cacert <ca-certificate>] [--insecure]
[--bypass-url <bypass-url>]
<subcommand> ...

This command returns a list of nova commands and parameters. To obtain help for a subcommand, run:

$nova help subcommand

You can also refer to the
OpenStack Command-Line Reference
for a complete listing of nova
commands and parameters.

Set the required parameters as environment variables to make running commands easier. For example, you can add --os-username as a nova option, or set it as an environment variable. To set the user name, password, and tenant as environment variables, use:

$export OS_USERNAME=joecool
$export OS_PASSWORD=coolword
$export OS_TENANT_NAME=coolu

Using the Identity Service, you are supplied with an authentication endpoint, which Compute recognizes as the OS_AUTH_URL.

$export OS_AUTH_URL=http://hostname:5000/v2.0
$export NOVA_VERSION=1.1

Use the euca2ools commands

For a command-line interface to EC2 API calls, use the
euca2ools command-line tool. See http://open.eucalyptus.com/wiki/Euca2oolsGuide_v1.3
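A brief sketch of a typical setup, assuming you have generated EC2 credentials for your user; the endpoint and keys are placeholders:

$export EC2_URL=http://hostname:8773/services/Cloud
$export EC2_ACCESS_KEY=youraccesskey
$export EC2_SECRET_KEY=yoursecretkey
$euca-describe-instances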
Manage logs

Logging module

To specify a configuration file that changes the logging behavior, such as setting the logging level (DEBUG, INFO, WARNING, or ERROR), add this line to the /etc/nova/nova.conf file:

log-config=/etc/nova/logging.conf

The logging configuration file is an ini-style configuration file, which must
contain a section called logger_nova, which controls the behavior
of the logging facility in the nova-* services. For
example:

[logger_nova]
level = INFO
handlers = stderr
qualname = nova

This example sets the logging level to INFO (which is less verbose than the default DEBUG setting). For more details on the logging configuration syntax, including the
meaning of the handlers and qualname variables, see the Python documentation on the logging configuration file format.

For an example logging.conf file with various
defined handlers, see the
OpenStack Configuration Reference.

Syslog

You can configure OpenStack Compute services to send logging information to
syslog. This is useful if you want to use
rsyslog, which forwards the logs to a remote machine.
You need to separately configure the Compute service (nova), the Identity service
(keystone), the Image Service (glance), and, if you are using it, the Block Storage
service (cinder) to send log messages to syslog. To do so,
add the following lines to each of these files:

/etc/nova/nova.conf
/etc/keystone/keystone.conf
/etc/glance/glance-api.conf
/etc/glance/glance-registry.conf
/etc/cinder/cinder.conf

verbose = False
debug = False
use_syslog = True
syslog_log_facility = LOG_LOCAL0

In addition to enabling syslog, these settings also
turn off more verbose output and debugging output from the log.

Although the example above uses the same local facility for each service
(LOG_LOCAL0, which corresponds to
syslog facility LOCAL0), we
recommend that you configure a separate local facility for each service, as
this provides better isolation and more flexibility. For example, you may
want to capture logging information at different severity levels for
different services. syslog allows you to define up to eight local facilities, LOCAL0, LOCAL1, ..., LOCAL7.
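As a sketch of that recommendation, you might dedicate one facility per service; the assignments below are arbitrary examples:

# /etc/nova/nova.conf
syslog_log_facility = LOG_LOCAL0
# /etc/glance/glance-api.conf and /etc/glance/glance-registry.conf
syslog_log_facility = LOG_LOCAL1
# /etc/cinder/cinder.conf
syslog_log_facility = LOG_LOCAL2
# /etc/keystone/keystone.conf
syslog_log_facility = LOG_LOCAL3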
For more details, see the syslog
documentation.

Rsyslog

rsyslog is a useful tool for setting up a centralized
log server across multiple machines. We briefly describe the configuration to set up
an rsyslog server; a full treatment of
rsyslog is beyond the scope of this document. We assume
rsyslog has already been installed on your hosts
(the default for most Linux distributions).

This example provides a minimal configuration for /etc/rsyslog.conf on the log server host, which receives the log files:

# provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 1024

Add a filter rule to /etc/rsyslog.conf which looks for a
host name. The example below uses compute-01 as an
example of a compute host name:

:hostname, isequal, "compute-01" /mnt/rsyslog/logs/compute-01.log

On each compute host, create a file named
/etc/rsyslog.d/60-nova.conf, with the following
content:

# prevent debug from dnsmasq with the daemon.none parameter
*.*;auth,authpriv.none,daemon.none,local0.none -/var/log/syslog
# Specify a log level of ERROR
local0.error @@172.20.1.43:1024

Once you have created this file, restart your rsyslog
daemon. Error-level log messages on the compute hosts should now be sent to your log
server.
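How you restart it depends on your distribution's init system; on most systems something like the following works:

#service rsyslog restart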
Migrate instances

Before starting migrations, review the Configure migrations section.

Migration provides a scheme to migrate running instances from one OpenStack Compute server to another
OpenStack Compute server.

To migrate instances

Look at the running instances to get the ID of the instance you wish to migrate.

$nova list

Look at information associated with that instance. This example uses 'vm1' from above.

$nova show d1df1b5a-70c4-4fed-98b7-423362f2c47c

In this example, vm1 is running on HostB.

Select the server to which instances will be migrated:

#nova service-list
+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| nova-consoleauth | HostA | internal | enabled | up | 2014-03-25T10:33:25.000000 | - |
| nova-scheduler | HostA | internal | enabled | up | 2014-03-25T10:33:25.000000 | - |
| nova-conductor | HostA | internal | enabled | up | 2014-03-25T10:33:27.000000 | - |
| nova-compute | HostB | nova | enabled | up | 2014-03-25T10:33:31.000000 | - |
| nova-compute | HostC | nova | enabled | up | 2014-03-25T10:33:31.000000 | - |
| nova-cert | HostA | internal | enabled | up | 2014-03-25T10:33:31.000000 | - |
+------------------+------------+----------+---------+-------+----------------------------+-----------------+

In this example, HostC can be picked up
because nova-compute
is running on it.

Ensure that HostC has enough resources for migration.

#nova host-describe HostC
+-----------+------------+-----+-----------+---------+
| HOST | PROJECT | cpu | memory_mb | disk_gb |
+-----------+------------+-----+-----------+---------+
| HostC | (total) | 16 | 32232 | 878 |
| HostC | (used_now) | 13 | 21284 | 442 |
| HostC | (used_max) | 13 | 21284 | 442 |
| HostC | p1 | 13 | 21284 | 442 |
| HostC | p2 | 13 | 21284 | 442 |
+-----------+------------+-----+-----------+---------+

cpu: the number of CPUs
memory_mb: total amount of memory (in MB)
disk_gb: total amount of space for NOVA-INST-DIR/instances (in GB)

The first line shows the total amount of resources for the physical server. The second line shows the currently used resources. The third line shows the maximum used resources. The fourth line and below show the resources used by each project.

Use the nova live-migration command to migrate the
instances:

$nova live-migration server host_name

Where server can be either the server's ID or name. For example:

$nova live-migration d1df1b5a-70c4-4fed-98b7-423362f2c47c HostC

Ensure instances are migrated successfully with nova
list. If instances are still running on HostB, check log files
(src/dest nova-compute and nova-scheduler) to determine why. Although the nova command is called
live-migration, under the default Compute
configuration options the instances are suspended before
migration.

For more details, see Configure migrations in the OpenStack Configuration Reference.

Recover from a failed compute node

If you have deployed Compute with a shared file
system, you can quickly recover from a failed compute
node. Of the two methods covered in these sections,
the evacuate API is the preferred method even in the
absence of shared storage. The evacuate API provides
many benefits over manual recovery, such as
re-attachment of volumes and floating IPs.
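As a hedged illustration (check nova help evacuate on your release for the exact options), rebuilding an instance from a failed node onto another host with shared storage might look like the following, reusing the instance UUID and host name from the examples in this chapter:

$nova evacuate 3f57699a-e773-4650-a443-b4b37eed5a06 HostC --on-shared-storage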
Manual recovery

For KVM/libvirt compute node recovery, see the previous section. Use the following procedure for all other hypervisors.

To work with host information

Identify the VMs on the affected hosts, using tools such as a combination of nova list and nova show or euca-describe-instances. Here's an example using the EC2 API, for instance i-000015b9, which is running on node np-rcc54:

i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60

You can review the status of the host by using the Compute database.
Some of the important information is highlighted below. This example
converts an EC2 API instance ID into an OpenStack ID; if you used the
nova commands, you can substitute the ID directly.
You can find the credentials for your database in /etc/nova/nova.conf.

SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G;
*************************** 1. row ***************************
created_at: 2012-06-19 00:48:11
updated_at: 2012-07-03 00:35:11
deleted_at: NULL
...
id: 5561
...
power_state: 5
vm_state: shutoff
...
hostname: at3-ui02
host: np-rcc54
...
uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
...
task_state: NULL
...

To recover the VM

When you know the status of the VM on the failed host, determine to which compute host the affected VM should be moved. For example, run the following database command to move the VM to np-rcc46:

UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';

If using a hypervisor that relies on libvirt (such as KVM), it is a
good idea to update the libvirt.xml file (found in
/var/lib/nova/instances/[instance ID]). The important
changes to make are:

Change the DHCPSERVER value to the host IP address of the compute host that is now the VM's new home.
Update the VNC IP, if it isn't already, to 0.0.0.0.

Reboot the VM:

$nova reboot --hard 3f57699a-e773-4650-a443-b4b37eed5a06

In theory, the above database update and nova
reboot command are all that is required to recover a VM from a
failed host. However, if further problems occur, consider looking at
recreating the network filter configuration using virsh,
restarting the Compute services or updating the vm_state
and power_state in the Compute database.
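If you do end up correcting those columns by hand, the query is of the following form; the state values shown are assumptions for a running KVM instance, so verify them against a healthy row in your own database first:

UPDATE instances SET vm_state = 'active', power_state = 1 WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';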
Recover from a UID/GID mismatch

When running OpenStack Compute, using a shared file system or an automated configuration tool, you could encounter a situation where some files on your compute
node are using the wrong UID or GID. This causes a
raft of errors, such as being unable to live migrate,
or start virtual machines.

The following procedure runs on nova-compute hosts, based on the KVM hypervisor, and could help to restore the situation:

To recover from a UID/GID mismatch

Ensure you don't use numbers that are already used for some other user/group.
Set the nova uid in /etc/passwd to the same number on all hosts (for example, 112).
Set the libvirt-qemu uid in /etc/passwd to the same number on all hosts (for example, 119).
Set the nova group in /etc/group to the same number on all hosts (for example, 120).
Set the libvirtd group in /etc/group to the same number on all hosts (for example, 119).
Stop the services on the compute node.
Change all the files owned by user nova or by group nova. For example:

find / -uid 108 -exec chown nova {} \; # note the 108 here is the old nova uid before the change
find / -gid 120 -exec chgrp nova {} \;

Repeat the steps for the libvirt-qemu owned files if those needed to change.
Restart the services.
Now you can run the find command to verify that all files use the correct identifiers.
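One way to double-check, assuming 108 was the old nova uid as in the example above, is to list anything still owned by the old ID or left without a valid owner; no output means nothing was missed:

find / -uid 108 -ls
find / -nouser -o -nogroup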
Compute disaster recovery process

Use the following procedures to manage your cloud after a disaster, and to easily back up its persistent storage volumes. Backups are
mandatory, even outside of disaster scenarios. For a DRP definition, see http://en.wikipedia.org/wiki/Disaster_Recovery_Plan.

A - The disaster recovery process presentation

A disaster could happen to several components of
your architecture: a disk crash, a network loss, a
power cut, and so on. In this example, assume the
following set up:

A cloud controller (nova-api, nova-objectstore, nova-network)
A compute node (nova-compute)
A Storage Area Network (SAN) used by cinder-volumes

The disaster example is the worst one: a power loss. That power loss applies to all three components. Let's see what runs and how it runs before the crash:

From the SAN to the cloud controller, we
have an active iSCSI session (used for the "cinder-volumes" LVM volume group).
From the cloud controller to the compute node, we also have active iSCSI sessions (managed by cinder-volume).
For every volume, an iSCSI session is made (so 14 EBS volumes equals 14 sessions).
From the cloud controller to the compute node, we also have iptables/ebtables rules which allow access from the cloud controller to the running instance.
And finally, for the instances on the compute node, the database holds their current state (in this case "running") and their volume attachments (mount point, volume ID, volume status, and so on).

Now, after the power loss occurs and all
hardware components restart, the situation is as
follows:

From the SAN to the cloud controller, the iSCSI session no longer exists.
From the cloud controller to the compute node, the iSCSI sessions no longer exist.
From the cloud controller to the compute node, the iptables and ebtables rules are recreated, since, at boot, nova-network reapplies the configurations.
From the cloud controller, instances are in a shutdown state (because they are no longer running).
In the database, data was not updated at all, since Compute could not have anticipated the crash.

Before going further, and to prevent the administrator from making fatal
mistakes, note that the instances won't be lost: because no "destroy" or "terminate" command was invoked, the files for the instances remain on the compute node.

Perform these tasks in this exact order. Any extra step would be dangerous at this stage:

Get the current relation from a volume to its instance, so that you can recreate the attachment.
Update the database to clean the stalled state. (After that, you cannot perform the first step.)
Restart the instances. In other words, go from a shutdown to a running state.
After the restart, reattach the volumes to their respective instances (optional).
SSH into the instances to reboot them.

B - Disaster recovery

To perform disaster recovery

Get the instance-to-volume
relationship

You must get the current relationship from a volume to its instance, because you will re-create the attachment. You can find this relationship by running nova volume-list. Note that the nova client includes the ability to get volume information from Block Storage.

Update the database

Update the database to clean the stalled state. You must restore for every volume, using these queries to clean up the database:

mysql>use cinder;
mysql>update volumes set mountpoint=NULL;
mysql>update volumes set status="available" where status <>"error_deleting";
mysql>update volumes set attach_status="detached";
mysql>update volumes set instance_id=0;

Then, when you run nova volume-list commands, all
volumes appear in the listing.

Restart instances

Restart the instances using the nova reboot $instance command.

At this stage, depending on your image, some instances completely reboot and become reachable, while others stop at the "plymouth" stage.

DO NOT reboot a second time

Do not reboot instances that are stopped at this point. Instance state depends on whether you added an /etc/fstab entry for that volume. Images built with the cloud-init package remain in a pending state, while others skip the missing volume and start. The idea of that stage is only to ask nova to reboot every instance, so the stored state is preserved. For more information about cloud-init, see help.ubuntu.com/community/CloudInit.

Reattach volumes

After the restart, you can reattach the volumes to their respective
instances. Now that nova has restored the right status, it is time to perform the attachments through nova volume-attach. This simple snippet uses the previously created file:

#!/bin/bash
# assumes $volumes_tmp_file holds one "volume instance mount_point" triple per line
CUT=cut
while read line; do
    volume=`echo $line | $CUT -f 1 -d " "`
    instance=`echo $line | $CUT -f 2 -d " "`
    mount_point=`echo $line | $CUT -f 3 -d " "`
    echo "ATTACHING VOLUME FOR INSTANCE - $instance"
    nova volume-attach $instance $volume $mount_point
    sleep 2
done < $volumes_tmp_file

At that stage, instances that were
pending on the boot sequence (plymouth) automatically continue their boot and restart normally, while the ones that had already booted see the volume.

SSH into instances

If some services depend on the volume, or if a volume has an entry in fstab, it could be good to simply restart the instance. This restart needs to be made from the instance itself, not through nova. So, we SSH into the instance and perform a reboot:

#shutdown -r now

By completing this procedure, you can successfully recover your cloud.

Follow these guidelines:

Use the errors=remount parameter in the
fstab file, which prevents data corruption. The system locks any write to the disk if it detects an I/O error. This configuration option should be added to the cinder-volume server (the one which performs the iSCSI connection to the SAN), but also to the instances' fstab files.

Do not add the entry for the SAN's disks to the cinder-volume's fstab file. Some systems hang on that step, which means you could lose access to your cloud controller. To re-run the session manually, you would run the following commands before performing the mount:

#iscsiadm -m discovery -t st -p $SAN_IP
#iscsiadm -m node --target-name $IQN -p $SAN_IP -l

For your instances, if you have the whole /home/
directory on the disk, instead of emptying the /home directory and mapping the disk onto it, leave a user's directory in place with the user's bash files and the authorized_keys file. This enables you to connect to the instance even without the volume attached, if you allow only connections through public keys.

C - Scripted DRP

To use scripted DRP

You can download from here a bash script which performs
these steps:The "test mode" allows you to perform
that whole sequence for only one
instance.To reproduce the power loss, connect to
the compute node which runs that same
instance and close the iscsi session.
Do not
detach the volume through
nova
volume-detach,
but instead manually close the iscsi
session.In this example, the iscsi session is
number 15 for that instance:#iscsiadm -m session -u -r 15Do not forget the -r
flag. Otherwise, you close ALL
sessions.