<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:id="ch_support-and-troubleshooting">
<title>Support and Troubleshooting</title>
<para>Online resources help you support OpenStack, and community members are willing and able to answer questions and help investigate suspected bugs. We are constantly improving and adding to the main features of OpenStack, but if you have any problems, do not hesitate to ask. Here are some ideas for supporting OpenStack and troubleshooting your existing installations.</para>
<section xml:id="community-support">
<title>Community Support</title>
<para>Here are some places where you can find others who want to help.</para>
<simplesect><title>The Launchpad Answers area</title>
<para>During setup or testing, you may have questions about how to do something, or find
that you cannot get a feature to work correctly. One place to look for help is the
Answers section on Launchpad. Launchpad is the "home" for the project code and its
developers, and thus is a natural place to ask about the project. When visiting the
Answers section, scan the recently asked questions first to see whether your question
has already been answered. If it has not, add a new question. Give a clear, concise
summary in the title and provide as much detail as possible in the description. Paste
in your command output or stack traces, link to screenshots, and so on. The Launchpad
Answers areas are available at the following locations. OpenStack Compute: <link
xlink:href="https://answers.launchpad.net/nova"
>https://answers.launchpad.net/nova</link>. OpenStack Object Storage: <link
xlink:href="https://answers.launchpad.net/swift"
>https://answers.launchpad.net/swift</link>.</para></simplesect>
<simplesect><title>OpenStack mailing list</title>
<para>Posting your question or scenario to the OpenStack mailing list is a great way to get
answers and insights. You can learn from and help others who may have the same
scenario as you. Go to <link xlink:href="https://launchpad.net/~openstack"
>https://launchpad.net/~openstack</link> and click "Subscribe to mailing list",
or view the archives at <link xlink:href="https://lists.launchpad.net/openstack/"
>https://lists.launchpad.net/openstack/</link>.</para></simplesect>
<simplesect><title>The OpenStack Wiki search</title>
<para>The <link xlink:href="http://wiki.openstack.org/">OpenStack wiki</link> contains content
on a broad range of topics, but some of it sits a bit below the surface. Fortunately, the wiki
search feature can search both by title and by content. If you are searching for specific
information, say about "networking" or "api" for nova, you can find lots of content using the
search feature. More is being added all the time, so be sure to check back often. You can find
the search box in the upper right-hand corner of any OpenStack wiki page.</para></simplesect>
<simplesect><title>The Launchpad Bugs area</title>
<para>So you think you've found a bug. That's great! Seriously, it is. The OpenStack community
values your setup and testing efforts and wants your feedback. To log a bug you must
have a Launchpad account, so sign up at <link xlink:href="https://launchpad.net/+login"
>https://launchpad.net/+login</link> if you do not already have a Launchpad ID. You can
view existing bugs and report your bug in the Launchpad Bugs area. First use the search
facility to see whether the bug you found has already been reported (or even better,
already fixed). If your bug still seems to be new or unreported, fill out a bug
report.</para>
<para>Some tips:</para>
<itemizedlist><listitem><para>Give a clear, concise summary.</para></listitem>
<listitem><para>Provide as much detail as possible in the description. Paste in your
command output or stack traces, link to screenshots, and so on.</para></listitem>
<listitem><para>Be sure to include the version of the software you are using.
This is especially critical if you are using a development branch, for example
"Austin release" versus lp:nova rev.396.</para></listitem>
<listitem><para>Any deployment-specific information is helpful as well, for example
Ubuntu 10.04, multi-node install.</para></listitem></itemizedlist>
<para>The Launchpad Bugs areas are available at the following locations. OpenStack Compute: <link
xlink:href="https://bugs.launchpad.net/nova"
>https://bugs.launchpad.net/nova</link>. OpenStack Object Storage: <link
xlink:href="https://bugs.launchpad.net/swift"
>https://bugs.launchpad.net/swift</link>.
</para></simplesect>
<simplesect><title>The OpenStack IRC channel</title>
<para>The OpenStack community lives and breathes in the #openstack IRC channel on the
Freenode network. You can come by to hang out, ask questions, or get immediate
feedback for urgent and pressing issues. To get into the IRC channel, install an IRC
client or use the browser-based client at http://webchat.freenode.net/. Installable
clients include Colloquy (Mac OS X, http://colloquy.info/), mIRC (Windows,
http://www.mirc.com/), and XChat (Linux). When you are in the IRC channel and want to
share code or command output, the generally accepted method is to use a paste bin. The
OpenStack project has one at http://paste.openstack.org. Paste your longer amounts of
text or logs into the web form and you get a URL that you can then paste into the
channel. The OpenStack IRC channel is #openstack on irc.freenode.net.</para></simplesect>
</section>
<section xml:id="troubleshooting-openstack-object-storage"><title>Troubleshooting OpenStack Object Storage</title>
<para>For OpenStack Object Storage, everything is logged in /var/log/syslog (or /var/log/messages on some distros). Several settings within the object server configuration files enable further customization of logging, such as log_name, log_facility, and log_level.</para>
<section xml:id="handling-drive-failure">
<title>Handling Drive Failure</title>
<para>If a drive has failed, the first step is to make sure the drive is unmounted. This makes it easier for OpenStack Object Storage to work around the failure until it has been resolved. If the drive is going to be replaced immediately, it is best to replace the drive, format it, remount it, and let replication fill it up.</para>
<para>If the drive cannot be replaced immediately, it is best to leave it unmounted and remove the drive from the ring. This allows all the replicas that were on that drive to be replicated elsewhere until the drive is replaced. Once the drive is replaced, it can be re-added to the ring.</para></section>
<section xml:id="handling-server-failure">
<title>Handling Server Failure</title>
<para>If a server is having hardware issues, it is a good idea to make sure the OpenStack Object Storage services are not running. This allows OpenStack Object Storage to work around the failure while you troubleshoot.</para>
<para>If the server just needs a reboot, or a small amount of work that should only last a couple of hours, then it is probably best to let OpenStack Object Storage work around the failure while you get the machine fixed and back online. When the machine comes back online, replication makes sure that anything that was missed during the downtime is updated.</para>
<para>If the server has more serious issues, then it is probably best to remove all of the server's devices from the ring. Once the server has been repaired and is back online, the server's devices can be added back into the ring. It is important that the devices are reformatted before putting them back into the ring, because they are likely to be responsible for a different set of partitions than before.</para>
</section>
<section xml:id="detecting-failed-drives">
<title>Detecting Failed Drives</title>
<para>In our experience, when a drive is about to fail, error messages appear in /var/log/kern.log. A script called swift-drive-audit can be run via cron to watch for bad drives. If errors are detected, it unmounts the bad drive so that OpenStack Object Storage can work around it. The script takes a configuration file with the following settings in the [drive-audit] section:</para>
<programlisting>
[drive-audit]
Option        Default     Description
log_facility  LOG_LOCAL0  Syslog log facility
log_level     INFO        Log level
device_dir    /srv/node   Directory devices are mounted under
minutes       60          Number of minutes to look back in /var/log/kern.log
error_limit   1           Number of errors to find before a device is unmounted
</programlisting>
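<para>As an illustration, the defaults in the settings table can be expressed as a short Python sketch that reads a [drive-audit] section and falls back to the documented default for any option that is not set. This is not the swift-drive-audit implementation. The function name load_drive_audit_settings and the sample configuration text are hypothetical; only the option names and defaults come from the table.</para>
<programlisting language="python"><![CDATA[
```python
import configparser

# Defaults taken from the settings table for the [drive-audit] section.
DRIVE_AUDIT_DEFAULTS = {
    "log_facility": "LOG_LOCAL0",
    "log_level": "INFO",
    "device_dir": "/srv/node",
    "minutes": "60",
    "error_limit": "1",
}


def load_drive_audit_settings(config_text):
    """Parse INI-style text and return the [drive-audit] settings.

    Hypothetical helper: options missing from the text fall back to the
    documented defaults, and the two numeric options are converted to int.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    settings = dict(DRIVE_AUDIT_DEFAULTS)
    if parser.has_section("drive-audit"):
        settings.update(parser.items("drive-audit"))
    settings["minutes"] = int(settings["minutes"])
    settings["error_limit"] = int(settings["error_limit"])
    return settings


# Example configuration (assumed values, for illustration only).
SAMPLE_CONFIG = """
[drive-audit]
device_dir = /srv/node
minutes = 30
"""

if __name__ == "__main__":
    for option, value in sorted(load_drive_audit_settings(SAMPLE_CONFIG).items()):
        print(option, "=", value)
```
]]></programlisting>
<para>In this sketch, only minutes is overridden by the sample text, so the other options keep the defaults listed in the table.</para>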
<para>This script has only been tested on Ubuntu 10.04, so if you are using a different distro or OS, take care before using it in production.</para></section></section>
<section xml:id="troubleshooting-openstack-compute"><title>Troubleshooting OpenStack Compute</title>
<para>Common problems for Compute typically involve misconfigured networking or credentials that are not sourced properly in the environment. Also, most flat networking configurations do not enable ping or ssh from a compute node to the instances running on that node. Another common problem is trying to run 32-bit images on a 64-bit compute node. This section offers more information about how to troubleshoot Compute.</para>
<section xml:id="log-files-for-openstack-compute"><title>Log files for OpenStack Compute</title><para>Log files are stored in /var/log/nova, with one log file for each service, for example nova-compute.log. You can format the log strings using flags for the nova.log module. The flags used to set format strings are logging_context_format_string and logging_default_format_string. If the log level is set to debug, you can also specify logging_debug_format_suffix to append extra formatting. For information about the variables that are available to the formatter, see http://docs.python.org/library/logging.html#formatter.</para>
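<para>The format-string flags take standard Python logging format strings. The following sketch uses only the Python standard library, not nova.log, to show how such a string is expanded into a log line. The format string, the logger name, and the function name format_example_record are illustrative assumptions, not Nova defaults.</para>
<programlisting language="python"><![CDATA[
```python
import logging

# Illustrative format string built from standard LogRecord attributes.
# It is an assumed example, not the default value of any nova flag.
EXAMPLE_FORMAT = "%(levelname)s %(name)s [%(funcName)s] %(message)s"


def format_example_record(message, level=logging.INFO, name="nova.compute"):
    """Render one hand-built log record with EXAMPLE_FORMAT."""
    record = logging.LogRecord(
        name=name,
        level=level,
        pathname="example.py",
        lineno=1,
        msg=message,
        args=None,
        exc_info=None,
        func="spawn",
    )
    return logging.Formatter(EXAMPLE_FORMAT).format(record)


if __name__ == "__main__":
    print(format_example_record("instance started"))
```
]]></programlisting>
<para>Each %(name)s placeholder is replaced by the matching attribute of the log record, which is why the Python logging documentation is the reference for the variables you can use.</para>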
<para>OpenStack Compute offers two logging options, selected through configuration settings. In nova.conf, include the --logfile flag to enable logging to a file. Alternatively, set --use_syslog=1 so that the nova daemons log to syslog.</para></section>
<section xml:id="common-errors-and-fixes-for-openstack-compute">
<title>Common Errors and Fixes for OpenStack Compute</title>
<para>The Launchpad Answers site offers a place to ask and answer questions, and you can also mark questions as frequently asked questions. This section describes some errors people have posted to Launchpad Answers and IRC. We are constantly fixing bugs, so online resources are a great way to find the most up-to-date errors and fixes.</para>
<para>Credential errors, 401, 403 forbidden errors</para>
<para>A 403 forbidden error is caused by missing credentials. With current installation methods, there are basically two ways to get the novarc file. The manual method requires getting it from within a project zip file, and the scripted method generates novarc out of the project zip file and sources it for you. If you use the manual method and then rely on the novarc file alone, you lose the credentials that are tied to the user you created with nova-manage in the earlier steps.</para>
<para>When you run nova-api the first time, it generates the certificate authority information, including openssl.cnf. If the services are started out of order, you may not be able to create your zip file. Once your CA information is available, you should be able to go back to nova-manage to create your zip file.</para><para>You may also need to check your proxy settings to see whether they are causing problems with the novarc creation.</para>
<para>Instance errors</para>
<para>Sometimes a particular instance shows "pending" or you cannot SSH to it. Sometimes the image itself is the problem. For example, when using flat manager networking, you do not have a DHCP server, and an ami-tiny image does not support interface injection, so you cannot connect to it. The fix for this type of problem is to use an Ubuntu image, which should obtain an IP address correctly with FlatManager network settings. To troubleshoot other possible problems with an instance, such as one that stays in a spawning state, first check the directory for that instance under your instances directory (i-ze0bnh1q in this example) to make sure it has the following files:</para>
<itemizedlist><listitem><para>libvirt.xml</para></listitem>
<listitem><para>disk</para></listitem>
<listitem><para>disk-raw</para></listitem>
<listitem><para>kernel</para></listitem>
<listitem><para>ramdisk</para></listitem>
<listitem><para>console.log (Once the instance actually starts you should see a console.log.)</para></listitem>
</itemizedlist>
<para>Check the file sizes to see whether they are reasonable. If any files are missing, zero-length, or very small, then nova-compute has not completed the download of the images from objectstore.</para>
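<para>The file check above can be scripted. The following Python sketch reports which of the expected files are missing or zero-length in an instance directory. The function name find_suspect_files is hypothetical, and the script only inspects the file system; it does not call any Nova API. Judging whether a non-empty file is too small is left to you.</para>
<programlisting language="python"><![CDATA[
```python
import os

# File names from the list above. console.log appears only after the
# instance actually starts, so it is checked only on request.
EXPECTED_FILES = ["libvirt.xml", "disk", "disk-raw", "kernel", "ramdisk"]


def find_suspect_files(instance_dir, include_console_log=False):
    """Return {file name: problem} for files that are missing or empty."""
    names = list(EXPECTED_FILES)
    if include_console_log:
        names.append("console.log")
    problems = {}
    for name in names:
        path = os.path.join(instance_dir, name)
        if not os.path.isfile(path):
            problems[name] = "missing"
        elif os.path.getsize(path) == 0:
            problems[name] = "empty"
    return problems


if __name__ == "__main__":
    import sys

    # Pass the instance directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, problem in sorted(find_suspect_files(target).items()):
        print(name, problem)
```
]]></programlisting>
<para>An empty result means that every expected file exists and contains data, which narrows the problem to the logs and libvirt.</para>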
<para>Also check nova-compute.log for exceptions. Sometimes they don't show up in the
console output.</para>
<para>Next, check whether the /var/log/libvirt/qemu/i-ze0bnh1q.log file exists and contains any useful error messages.</para>
<para>Finally, from the instances/i-ze0bnh1q directory, try <code>virsh create libvirt.xml</code> and see whether you get an error there.</para>
</section></section>
</chapter>