openstack-manuals/doc/src/docbkx/common/support.xml

140 lines
15 KiB
XML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:id="ch_support-and-troubleshooting">
<title>Support and Troubleshooting</title>
<para>Online resources aid in supporting OpenStack and the community members are willing and able to answer questions and help with bug suspicions. We are constantly improving and adding to the main features of OpenStack, but if you have any problems, do not hesitate to ask. Here are some ideas for supporting OpenStack and troubleshooting your existing installations.</para>
<section xml:id="community-support">
<title>Community Support</title> <para>Here are some places you can locate others who want to help.</para>
<simplesect><title>The Launchpad Answers area</title>
<para>During setup or testing, you may have questions about how to do something, or end up in
a situation where you can't seem to get a feature to work correctly. One place to
look for help is the Answers section on Launchpad. Launchpad is the "home" for the
project code and its developers and thus is a natural place to ask about the
project. When visiting the Answers section, it is usually good to at least scan over
recently asked questions to see if your question has already been answered. If that
is not the case, then proceed to adding a new question. Be sure you give a clear,
concise summary in the title and provide as much detail as possible in the
description. Paste in your command output or stack traces, link to screenshots, and
so on. The Launchpad Answers areas are available here - OpenStack Compute: <link
xlink:href="https://answers.launchpad.net/nova"
>https://answers.launchpad.net/nova</link> OpenStack Object Storage: <link
xlink:href="https://answers.launchpad.net/swift"
>https://answers.launchpad.net/swift</link>. </para></simplesect>
<simplesect><title>OpenStack mailing list</title>
<para>Posting your question or scenario to the OpenStack mailing list is a great way to get
answers and insights. You can learn from and help others who may have the same
scenario as you. Go to <link xlink:href="https://launchpad.net/~openstack"
>https://launchpad.net/~openstack</link> and click "Subscribe to mailing list"
or view the archives at <link xlink:href="https://lists.launchpad.net/openstack/"
>https://lists.launchpad.net/openstack/</link>.</para></simplesect><simplesect>
<title>The OpenStack Wiki search </title>
<para>The <link xlink:href="http://wiki.openstack.org/">OpenStack wiki</link> contains content
on a broad range of topics, but some of it sits a bit below the surface. Fortunately, the wiki
search feature is very powerful in that it can do both searches by title and by content. If
you are searching for specific information, say about "networking" or "api" for nova, you can
find lots of content using the search feature. More is being added all the time, so be sure to
check back often. You can find the search box in the upper right hand corner of any OpenStack wiki
page. </para></simplesect>
<simplesect><title>The Launchpad Bugs area </title>
<para>So you think you've found a bug. That's great! Seriously, it is. The OpenStack community
values your setup and testing efforts and wants your feedback. To log a bug you must
have a Launchpad account, so sign up at https://launchpad.net/+login if you do not
already have a Launchpad ID. You can view existing bugs and report your bug in the
Launchpad Bugs area. It is suggested that you first use the search facility to see
if the bug you found has already been reported (or even better, already fixed). If
it still seems like your bug is new or unreported then it is time to fill out a bug
report. </para>
<para>Some tips: </para>
<itemizedlist><listitem><para>Give a clear, concise summary! </para></listitem>
<listitem><para>Provide as much detail as possible
in the description. Paste in your command output or stack traces, link to
screenshots, etc. </para></listitem>
<listitem><para>Be sure to include what version of the software you are using.
This is especially critical if you are using a development branch eg. "Austin
release" vs lp:nova rev.396. </para></listitem>
<listitem><para>Any deployment specific info is helpful as well. eg.
Ubuntu 10.04, multi-node install.</para></listitem> </itemizedlist>
<para>The Launchpad Bugs areas are available here - OpenStack Compute: <link
xlink:href="https://bugs.launchpad.net/nova"
>https://bugs.launchpad.net/nova</link> OpenStack Object Storage: <link
xlink:href="https://bugs.launchpad.net/swift"
>https://bugs.launchpad.net/swift</link>
</para></simplesect>
<simplesect><title>The OpenStack IRC channel </title>
<para>The OpenStack community lives and breathes in the #openstack IRC channel on the
Freenode network. You can come by to hang out, ask questions, or get immediate
feedback for urgent and pressing issues. To get into the IRC channel you need to
install an IRC client or use a browser-based client by going to
http://webchat.freenode.net/. You can also use Colloquy (Mac OS X,
http://colloquy.info/) or mIRC (Windows, http://www.mirc.com/) or XChat (Linux).
When you are in the IRC channel and want to share code or command output, the
generally accepted method is to use a Paste Bin, the OpenStack project has one at
http://paste.openstack.org. Just paste your longer amounts of text or logs in the
web form and you get a URL you can then paste into the channel. The OpenStack IRC
channel is: #openstack on irc.freenode.net. </para></simplesect>
</section>
<section xml:id="troubleshooting-openstack-object-storage"><title>Troubleshooting OpenStack Object Storage</title>
<para>For OpenStack Object Storage, everything is logged in /var/log/syslog (or messages on some distros). Several settings enable further customization of logging, such as log_name, log_facility, and log_level, within the object server configuration files.</para>
<section xml:id="handling-drive-failure">
<title>Handling Drive Failure</title>
<para> In the event that a drive has failed, the first step is to make sure the drive is unmounted. This will make it easier for OpenStack Object Storage to work around the failure until it has been resolved. If the drive is going to be replaced immediately, then it is just best to replace the drive, format it, remount it, and let replication fill it up.</para>
<para>If the drive cant be replaced immediately, then it is best to leave it unmounted, and remove the drive from the ring. This will allow all the replicas that were on that drive to be replicated elsewhere until the drive is replaced. Once the drive is replaced, it can be re-added to the ring.</para></section>
<section xml:id="handling-server-failure">
<title>Handling Server Failure</title>
<para>If a server is having hardware issues, it is a good idea to make sure the OpenStack Object Storage services are not running. This will allow OpenStack Object Storage to work around the failure while you troubleshoot.</para>
<para>If the server just needs a reboot, or a small amount of work that should only last a couple of hours, then it is probably best to let OpenStack Object Storage work around the failure and get the machine fixed and back online. When the machine comes back online, replication will make sure that anything that is missing during the downtime will get updated.</para>
<para>If the server has more serious issues, then it is probably best to remove all of the servers devices from the ring. Once the server has been repaired and is back online, the servers devices can be added back into the ring. It is important that the devices are reformatted before putting them back into the ring as it is likely to be responsible for a different set of partitions than before.</para>
</section>
<section xml:id="detecting-failed-drives">
<title>Detecting Failed Drives</title>
<para>It has been our experience that when a drive is about to fail, error messages will spew into /var/log/kern.log. There is a script called swift-drive-audit that can be run via cron to watch for bad drives. If errors are detected, it will unmount the bad drive, so that OpenStack Object Storage can work around it. The script takes a configuration file with the following settings:
</para>
<programlisting>
[drive-audit]
Option Default Description
log_facility LOG_LOCAL0 Syslog log facility
log_level INFO Log level
device_dir /srv/node Directory devices are mounted under
minutes 60 Number of minutes to look back in /var/log/kern.log
error_limit 1 Number of errors to find before a device is unmounted
</programlisting>
<para>This script has only been tested on Ubuntu 10.04, so if you are using a different distro or OS, some care should be taken before using in production.
</para></section></section>
<section xml:id="troubleshooting-openstack-compute"><title>Troubleshooting OpenStack Compute</title>
<para>Common problems for Compute typically involve misconfigured networking or credentials that are not sourced properly in the environment. Also, most flat networking configurations do not enable ping or ssh from a compute node to the instances running on that node. Another common problem is trying to run 32-bit images on a 64-bit compute node. This section offers more information about how to troubleshoot Compute.</para>
<section xml:id="log-files-for-openstack-compute"><title>Log files for OpenStack Compute</title><para>Log files are stored in /var/log/nova and there is a log file for each service, for example nova-compute.log. You can format the log strings using flags for the nova.log module. The flags used to set format strings are: logging_context_format_string and logging_default_format_string. If the log level is set to debug, you can also specify logging_debug_format_suffix to append extra formatting. For information about what variables are available for the formatter see: http://docs.python.org/library/logging.html#formatter </para>
<para>You have two options for logging for OpenStack Compute based on configuration settings. In nova.conf, include the --logfile flag to enable logging. Alternatively you can set --use_syslog=1, and then the nova daemon logs to syslog.</para></section>
<section xml:id="common-errors-and-fixes-for-openstack-compute">
<title>Common Errors and Fixes for OpenStack Compute</title>
<para>The Launchpad Answers site offers a place to ask and answer questions, and you can also mark questions as frequently asked questions. This section describes some errors people have posted to Launchpad Answers and IRC. We are constantly fixing bugs, so online resources are a great way to get the most up-to-date errors and fixes.</para>
<para>Credential errors, 401, 403 forbidden errors</para>
<para>A 403 forbidden error is caused by missing credentials. Through current installation methods, there are basically two ways to get the novarc file. The manual method requires getting it from within a project zipfile, and the scripted method just generates novarc out of the project zip file and sources it for you. If you do the manual method through a zip file, then the following novarc alone, you end up losing the creds that are tied to the user you created with nova-manage in the steps before.</para>
<para>When you run nova-api the first time, it generates the certificate authority information, including openssl.cnf. If it gets started out of order, you may not be able to create your zip file. Once your CA information is available, you should be able to go back to nova-manage to create your zipfile. </para><para>You may also need to check your proxy settings to see if they are causing problems with the novarc creation.</para>
<para>Instance errors</para>
<para>Sometimes a particular instance shows "pending" or you cannot SSH to it. Sometimes the image itself is the problem. For example, when using flat manager networking, you do not have a dhcp server, and an ami-tiny image doesn't support interface injection so you cannot connect to it. The fix for this type of problem is to use an Ubuntu image, which should obtain an IP address correctly with FlatManager network settings. To troubleshoot other possible problems with an instance, such as one that stays in a spawning state, first check your instances directory for i-ze0bnh1q dir to make sure it has the following files:</para>
<itemizedlist><listitem><para>libvirt.xml</para></listitem>
<listitem><para>disk</para></listitem>
<listitem><para>disk-raw</para></listitem>
<listitem><para>kernel</para></listitem>
<listitem><para>ramdisk</para></listitem>
<listitem><para>console.log (Once the instance actually starts you should see a console.log.)</para></listitem>
</itemizedlist>
<para>Check the file sizes to see if they are reasonable. If any are missing/zero/very small then nova-compute has somehow not completed download of the images from objectstore. </para>
<para>Also check nova-compute.log for exceptions. Sometimes they don't show up in the
console output. </para><para>
Next, check the /var/log/libvirt/qemu/i-ze0bnh1q.log file to see if it exists and has any useful error messages in it.</para>
<para>Finally, from the instances/i-ze0bnh1q directory, try <code>virsh create libvirt.xml</code> and see if you get an error there.</para>
</section></section>
</chapter>