Adding 'running slowly' troubleshooting section

Added in Steve Deaton's content about troubleshooting a slow cloud.

Also, address the broken link.

Change-Id: Iadf7d2df62e9d4d77e0c36cb33467af3546bb2cb
Closes-Bug: #1251088
Co-Authored-By: Steven Deaton <sdeaton2@gmail.com>
Lana Brindley 2016-03-09 12:38:52 +10:00 committed by KATO Tomoyuki
parent b08db66706
commit a15d78f652


@ -899,7 +899,7 @@ inner join nova.instances on cinder.volumes.instance_uuid=nova.instances.uuid
xlink:href="https://github.com/opscode/openstack-chef-repo">OpenStack Chef recipes</link>.
Other newer configuration tools include <link
xlink:href="https://juju.ubuntu.com/">Juju</link>, <link
xlink:href="http://www.ansible.com/home">Ansible</link>, and <link
xlink:href="https://www.ansible.com/">Ansible</link>, and <link
xlink:href="http://www.saltstack.com/">Salt</link>; and more mature
configuration management tools include <link
xlink:href="http://cfengine.com/">CFEngine</link> and <link
@ -1330,6 +1330,127 @@ sql_connection = mysql+pymysql://cinder:password@cloud.example.com/cinder
<?hard-pagebreak ?>
<section xml:id="runningslow">
<?dbhtml stop-chunking?>
<title>What to do when things are running slowly</title>
<para>
When you are getting slow responses from various services, it can be
hard to know where to start looking. The first thing to check is the
extent of the slowness: is it specific to a single service, or does it
affect several different services? If the problem is isolated to one
service, restarting that service can fix it temporarily, but this
often treats only the symptom, not the underlying problem.
</para>
<para>
The following sections collect ideas from experienced operators about
common causes of slowness. They are not intended to be an exhaustive
list.
</para>
<section xml:id="runningslow_keystone">
<?dbhtml stop-chunking?>
<title>OpenStack Identity service</title>
<para>
If OpenStack Identity is responding slowly, the cause could be a large
token table. You can remove expired tokens from it by running the
<command>keystone-manage token_flush</command> command.
</para>
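<para>
This is a minimal Python sketch of a wrapper that you could schedule, for
example from cron, so that expired tokens are purged regularly. Without it,
you would only run the flush after slowness appears. The sketch assumes only
that <command>keystone-manage</command> is installed on the node. The
function name and its <literal>runner</literal> and <literal>which</literal>
parameters are hypothetical. They let you test the logic without a real
Identity installation.
</para>

```python
import shutil
import subprocess


def flush_expired_tokens(runner=subprocess.run, which=shutil.which):
    """Purge expired Identity tokens with `keystone-manage token_flush`.

    Returns True if the command ran, False if keystone-manage is not found.
    `runner` and `which` are hypothetical injection points for testing.
    """
    path = which("keystone-manage")
    if path is None:
        # Not an Identity node, or the tool is not on PATH: nothing to do.
        return False
    # check=True raises CalledProcessError if the flush fails.
    runner([path, "token_flush"], check=True)
    return True


if __name__ == "__main__":
    ran = flush_expired_tokens()
    print("token_flush ran" if ran else "keystone-manage not found")
```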
<para>
Additionally, for Identity-related issues, try the tips in
<xref linkend="runningslow_sql" />.
</para>
</section>
<section xml:id="runningslow_glance">
<?dbhtml stop-chunking?>
<title>OpenStack Image service</title>
<para>
The OpenStack Image service can be slowed down by problems in the
Identity service, but it can also be slow on its own if connectivity
to its back-end storage is slow or otherwise problematic. For example,
your back-end NFS server might have gone down.
</para>
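<para>
One quick way to tell whether the back end is at fault is to time a
simple filesystem call against it. The following Python sketch assumes
a file-based store that is mounted locally, for example over NFS. The
path <filename>/var/lib/glance/images</filename> used below is an
assumption, so substitute the directory your store is actually
configured to use. The call runs in a daemon thread because an
operation on a hung NFS mount can block indefinitely.
</para>

```python
import os
import threading
import time


def probe_path(path, timeout=5.0):
    """Time os.stat() on `path`, giving up after `timeout` seconds.

    Returns (ok, elapsed_seconds). If the call is still blocked when the
    timeout expires, returns (False, None): the back end is hung or very slow.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            # Missing path, stale NFS handle, permission error, and so on.
            result["ok"] = False
        result["elapsed"] = time.monotonic() - start

    # A daemon thread does not keep the interpreter alive if stat() never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return False, None
    return result["ok"], result["elapsed"]


if __name__ == "__main__":
    # Hypothetical store location; adjust to your configuration.
    ok, elapsed = probe_path("/var/lib/glance/images")
    if elapsed is None:
        print("back end did not respond within the timeout")
    else:
        print(f"ok={ok} in {elapsed:.3f}s")
```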
</section>
<section xml:id="runningslow_cinder">
<?dbhtml stop-chunking?>
<title>OpenStack Block Storage service</title>
<para>
The OpenStack Block Storage service is similar to the Image service, so
start by checking Identity-related services and the back-end storage.
Additionally, both the Block Storage and Image services rely on AMQP
and SQL functionality, so consider these when debugging.
</para>
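<para>
Because both services depend on Identity, AMQP, and SQL, a useful first
check is whether each of those endpoints accepts connections promptly.
This Python sketch measures only TCP connect time. It does not
authenticate or speak any service protocol. The following values are
assumptions for illustration:
</para>
<itemizedlist>
<listitem><para>the host name <literal>controller</literal></para></listitem>
<listitem><para>the ports: 5000 for Identity, 5672 for RabbitMQ, and 3306 for MySQL</para></listitem>
<listitem><para>the 0.5-second threshold</para></listitem>
</itemizedlist>
<para>
Take the real values from your own configuration.
</para>

```python
import socket
import time


def connect_time(host, port, timeout=3.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Refused, unreachable, name resolution failure, or timed out.
        return None


# Hypothetical endpoints; take the real ones from your service configuration.
DEPENDENCIES = {
    "identity": ("controller", 5000),
    "amqp": ("controller", 5672),
    "sql": ("controller", 3306),
}


def check_dependencies(deps=DEPENDENCIES, slow_after=0.5):
    """Map each dependency name to 'down', 'slow', or 'ok'."""
    report = {}
    for name, (host, port) in deps.items():
        elapsed = connect_time(host, port)
        if elapsed is None:
            report[name] = "down"
        elif elapsed > slow_after:
            report[name] = "slow"
        else:
            report[name] = "ok"
    return report
```

<para>
<literal>check_dependencies()</literal> returns a dictionary that maps
each dependency name to <literal>down</literal>, <literal>slow</literal>,
or <literal>ok</literal>. A fast TCP connection does not prove that the
service itself is healthy. A refused or slow connection, however, narrows
the search.
</para>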
</section>
<section xml:id="runningslow_nova">
<?dbhtml stop-chunking?>
<title>OpenStack Compute service</title>
<para>
Services related to OpenStack Compute are normally fairly fast and
rely on a couple of back-end services: Identity for authentication and
authorization, and AMQP for interoperability. Any slowness in Compute
is usually related to one of these. Also, as with all other services,
Compute uses SQL extensively.
</para>
</section>
<section xml:id="runningslow_neutron">
<?dbhtml stop-chunking?>
<title>OpenStack Networking service</title>
<para>
Slowness in the OpenStack Networking service can be caused by services
that it relies upon, but it can also be related to either physical or
virtual networking. For example: network namespaces that do not exist
or are not tied to interfaces correctly; DHCP daemons that have hung
or are not running; a cable being physically disconnected; a switch
not being configured correctly. When debugging Networking service
problems, begin by verifying all physical networking functionality
(switch configuration, physical cabling, and so on). After you verify
the physical networking, check that all of the Networking services are
running (neutron-server, neutron-dhcp-agent, and so on), then check the
AMQP and SQL back ends.
</para>
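<para>
For the namespace case, the DHCP agent normally creates a namespace for
each network it serves. The name is <literal>qdhcp-</literal> followed by
the network ID. The following Python sketch compares the output of
<command>ip netns list</command> with the network IDs that you expect to
be served. The list of expected IDs would come from your own inventory,
and the helper names are hypothetical.
</para>

```python
import subprocess


def parse_netns(output):
    """Return the set of namespace names from `ip netns list` output.

    Newer iproute2 versions append " (id: N)", so keep only the first field.
    """
    names = set()
    for line in output.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_dhcp_namespaces(network_ids, netns_output):
    """Return network IDs that have no qdhcp-NETWORK_ID namespace."""
    present = parse_netns(netns_output)
    return sorted(nid for nid in network_ids if f"qdhcp-{nid}" not in present)


def current_netns():
    """Run `ip netns list` on this host and return its output."""
    return subprocess.run(
        ["ip", "netns", "list"], capture_output=True, text=True, check=True
    ).stdout
```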
</section>
<section xml:id="runningslow_amqp">
<?dbhtml stop-chunking?>
<title>AMQP broker</title>
<para>
Regardless of which AMQP broker you use, such as RabbitMQ, some common
issues not only slow down operations but can also cause more serious
problems. Sometimes messages queued for services stay on the queues
and are never consumed. This can be due to dead or stagnant services,
and can commonly be cleared up by restarting either the AMQP-related
services or the OpenStack service in question.
</para>
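<para>
With RabbitMQ, you can find queues that are accumulating messages that
nobody consumes. To do so, list message and consumer counts with
<command>rabbitmqctl list_queues name messages consumers</command>. The
Python sketch below parses that tab-separated output. It skips banner
and column-header lines, whose exact wording varies between RabbitMQ
versions. The <literal>min_messages</literal> threshold is an arbitrary
assumption.
</para>

```python
import subprocess


def parse_queues(output):
    """Parse `rabbitmqctl list_queues name messages consumers` output.

    Returns a list of (name, messages, consumers). Lines that are not
    name/count/count rows (banners, column headers) are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        name, messages, consumers = fields
        try:
            rows.append((name, int(messages), int(consumers)))
        except ValueError:
            continue  # for example, the "name messages consumers" header row
    return rows


def stuck_queues(rows, min_messages=1):
    """Queues holding at least `min_messages` messages but with no consumers."""
    return [name for name, messages, consumers in rows
            if consumers == 0 and messages >= min_messages]


def list_queues():
    """Run rabbitmqctl where it can reach the broker (it needs the Erlang cookie)."""
    return subprocess.run(
        ["rabbitmqctl", "list_queues", "name", "messages", "consumers"],
        capture_output=True, text=True, check=True,
    ).stdout
```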
</section>
<section xml:id="runningslow_sql">
<?dbhtml stop-chunking?>
<title>SQL back end</title>
<para>
Whether you use SQLite or an RDBMS (such as MySQL), SQL
interoperability is essential to a functioning OpenStack environment.
With SQLite, a large or fragmented database file can cause slowness.
With an RDBMS, a locked or long-running query can delay most services
that use the database. In this case, do not kill the query immediately.
First investigate whether it is hung, or whether it is simply taking a
long time to run and needs to finish on its own. The administration of
an RDBMS is outside the scope of this document, but it should be noted
that a properly functioning RDBMS is essential to most OpenStack
services.
</para>
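<para>
The following Python sketch covers both cases under stated assumptions:
</para>
<itemizedlist>
<listitem><para>For SQLite, it uses the standard <literal>sqlite3</literal>
module and the <literal>page_count</literal> and
<literal>freelist_count</literal> pragmas to estimate how much of the file
is unused. A high ratio suggests that the file would shrink after a
<literal>VACUUM</literal>.</para></listitem>
<listitem><para>For an RDBMS, a hypothetical filter over process-list rows
lists long-running queries for investigation rather than killing them.
The rows are assumed to follow the column order of MySQL's
<literal>SHOW FULL PROCESSLIST</literal>. The 60-second threshold is
arbitrary.</para></listitem>
</itemizedlist>

```python
import sqlite3


def freelist_ratio(conn):
    """Fraction of pages in an open SQLite database that are on the free list."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return free_pages / page_count if page_count else 0.0


def long_running_queries(processlist_rows, min_seconds=60):
    """Select long-running queries to investigate (not to kill).

    Each row is assumed to follow SHOW FULL PROCESSLIST column order:
    (Id, User, Host, db, Command, Time, State, Info).
    """
    report = []
    for row_id, user, _host, db, command, seconds, state, info in processlist_rows:
        if command == "Query" and seconds >= min_seconds:
            report.append((row_id, user, db, seconds, state, info))
    # Longest-running first, since those are the likeliest blockers.
    return sorted(report, key=lambda r: r[3], reverse=True)
```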
</section>
</section>
<?hard-pagebreak ?>
<section xml:id="uninstalling">
<?dbhtml stop-chunking?>