Adding 'running slowly' troubleshooting section

This adds Steve Deaton's content about troubleshooting a slow cloud. It also fixes the broken Ansible link.

Change-Id: Iadf7d2df62e9d4d77e0c36cb33467af3546bb2cb
Closes-Bug: #1251088
Co-Authored-By: Steven Deaton <sdeaton2@gmail.com>
2016-03-09 12:38:52 +10:00 · a15d78f652
commit a15d78f652
parent b08db66706
1 changed file with 122 additions and 1 deletion
--- a/doc/openstack-ops/ch_ops_maintenance.xml
+++ b/doc/openstack-ops/ch_ops_maintenance.xml
@@ -899,7 +899,7 @@ inner join nova.instances on cinder.volumes.instance_uuid=nova.instances.uuid
      xlink:href="https://github.com/opscode/openstack-chef-repo">OpenStack Chef recipes</link>.
      Other newer configuration tools include <link
      xlink:href="https://juju.ubuntu.com/">Juju</link>, <link
-      xlink:href="http://www.ansible.com/home">Ansible</link>, and <link
+      xlink:href="https://www.ansible.com/">Ansible</link>, and <link
      xlink:href="http://www.saltstack.com/">Salt</link>; and more mature
      configuration management tools include <link
      xlink:href="http://cfengine.com/">CFEngine</link> and <link
@@ -1330,6 +1330,127 @@ sql_connection = mysql+pymysql://cinder:password@cloud.example.com/cinder

  <?hard-pagebreak ?>

+  <section xml:id="runningslow">
+    <?dbhtml stop-chunking?>
+
+    <title>What to do when things are running slowly</title>
+
+    <para>
+      When you are getting slow responses from various services, it can be
+      hard to know where to start looking. The first thing to check is the
+      extent of the slowness: is it specific to a single service, or does it
+      affect several services? If the problem is isolated to a single
+      service, restarting that service may fix it temporarily, but this
+      usually treats the symptom rather than the underlying cause.
+    </para>
+
+    <para>
+      This section collects ideas from experienced operators about common
+      causes of slowness. It is not intended to be an exhaustive list.
+    </para>
+
+    <section xml:id="runningslow_keystone">
+      <?dbhtml stop-chunking?>
+      <title>OpenStack Identity service</title>
+      <para>
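+        If OpenStack Identity is responding slowly, the cause could be a large
+        token table: when tokens are stored in the SQL back end, expired
+        tokens accumulate until they are removed. Running the
+        <command>keystone-manage token_flush</command> command removes expired
+        tokens from the table.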
+      </para>
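+      <para>
+        The following is a minimal sketch of running the flush by hand and
+        scheduling it. It assumes tokens are stored in the SQL back end and
+        that a <literal>keystone</literal> system user owns the Identity log
+        files; the hourly schedule, binary path, and log path are illustrative
+        assumptions to adapt to your deployment.
+      </para>
+      <programlisting language="bash"><![CDATA[# Run the flush once as the keystone user, so that any log files it
+# creates keep the right owner (the user name is an assumption).
+su -s /bin/sh -c "keystone-manage token_flush" keystone
+
+# Append an hourly flush to the keystone user's crontab. The schedule and
+# the binary and log paths are assumptions; running this more than once
+# adds duplicate entries.
+(crontab -u keystone -l 2>/dev/null; echo '@hourly /usr/bin/keystone-manage token_flush >/var/log/keystone/keystone-tokenflush.log 2>&1') | crontab -u keystone -]]></programlisting>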
+      <para>
+        Additionally, for Identity-related issues, try the tips in
+        <xref linkend="runningslow_sql" />.
+      </para>
+    </section>
+
+    <section xml:id="runningslow_glance">
+      <?dbhtml stop-chunking?>
+      <title>OpenStack Image service</title>
+      <para>
+        The OpenStack Image service can be slowed down by problems with the
+        Identity service, but it can also be slow if connectivity to its
+        back-end storage is slow or otherwise problematic. For example, the
+        NFS server behind your image store might have gone down.
+      </para>
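+      <para>
+        If the Image service uses the filesystem store on an NFS mount, one
+        quick check is whether the mount still responds. This sketch assumes
+        the store directory is set by <literal>filesystem_store_datadir</literal>
+        in <filename>/etc/glance/glance-api.conf</filename>, and uses
+        <filename>/var/lib/glance/images</filename> only as the common default;
+        substitute the value from your own configuration.
+      </para>
+      <programlisting language="bash"><![CDATA[# Show the active (uncommented) image store setting; the config file
+# path is an assumption.
+grep '^filesystem_store_datadir' /etc/glance/glance-api.conf
+
+# Assumed default store path; replace it with the value found above.
+# A hung NFS mount makes df block, so kill it after 10 seconds.
+timeout -s KILL 10 df -h /var/lib/glance/images || echo "image store not responding (or path not found)"]]></programlisting>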
+    </section>
+
+    <section xml:id="runningslow_cinder">
+      <?dbhtml stop-chunking?>
+      <title>OpenStack Block Storage service</title>
+      <para>
+        The OpenStack Block Storage service is similar to the Image service,
+        so start by checking the Identity service and the back-end storage.
+        Additionally, both the Block Storage and Image services rely on AMQP
+        and SQL, so consider these when debugging.
+      </para>
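+      <para>
+        To see whether any Block Storage service has stopped responding, you
+        can list the services and their states. This sketch assumes the
+        <command>cinder</command> client is installed and admin credentials
+        are loaded in your environment.
+      </para>
+      <programlisting language="bash"><![CDATA[# List Block Storage services (assumes admin credentials are loaded).
+# A cinder-volume service in the "down" state has stopped reporting in,
+# which may point at its storage back end or its AMQP or SQL connection.
+cinder service-list]]></programlisting>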
+    </section>
+
+    <section xml:id="runningslow_nova">
+      <?dbhtml stop-chunking?>
+      <title>OpenStack Compute service</title>
+      <para>
+        The OpenStack Compute services are normally fairly fast and rely on
+        two back-end services: Identity, for authentication and
+        authorization, and AMQP, for messaging between the Compute services.
+        Slowness in Compute is usually caused by one of these. Also, as with
+        all other services, Compute relies heavily on SQL.
+      </para>
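+      <para>
+        A quick way to tell whether Identity or the Compute services
+        themselves are the bottleneck is to time an authentication request and
+        then check the Compute service states. This sketch assumes the
+        <command>openstack</command> and <command>nova</command> clients are
+        installed and admin credentials are loaded in your environment.
+      </para>
+      <programlisting language="bash"><![CDATA[# Time a bare authentication request (assumes admin credentials are
+# loaded). A slow result here points at Identity rather than Compute.
+time openstack token issue
+
+# List the Compute services; a service in the "down" state has stopped
+# reporting in, which can indicate AMQP or database trouble.
+nova service-list]]></programlisting>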
+    </section>
+
+    <section xml:id="runningslow_neutron">
+      <?dbhtml stop-chunking?>
+      <title>OpenStack Networking service</title>
+      <para>
+        Slowness in the OpenStack Networking service can be caused by the
+        services it relies on, but it can also be related to either physical
+        or virtual networking. For example: network namespaces that do not
+        exist or are not tied to interfaces correctly; DHCP daemons that have
+        hung or are not running; a physically disconnected cable; or a
+        misconfigured switch. When debugging Networking service problems,
+        begin by verifying all physical networking functionality (switch
+        configuration, physical cabling, and so on). After the physical
+        networking is verified, check that all of the Networking services are
+        running (neutron-server, neutron-dhcp-agent, and so on), and then
+        check the AMQP and SQL back ends.
+      </para>
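+      <para>
+        The following commands are one way to run through the virtual
+        networking checks described in this section. They assume admin
+        credentials are loaded in your environment and that the namespace
+        checks run on the node hosting the DHCP agent.
+        <replaceable>NETWORK_ID</replaceable> is a placeholder for the ID of
+        the network you are debugging.
+      </para>
+      <programlisting language="bash"><![CDATA[# Show all Networking agents; an agent that is not alive is marked "xxx"
+# instead of ":-)" (assumes admin credentials are loaded).
+neutron agent-list
+
+# On the network node, list the namespaces (qdhcp-*, qrouter-*).
+ip netns
+
+# Check that the DHCP namespace has its interface; NETWORK_ID is a
+# placeholder for the network you are debugging.
+ip netns exec qdhcp-NETWORK_ID ip addr
+
+# Confirm that the dnsmasq processes started by the DHCP agent are running.
+pgrep -a dnsmasq]]></programlisting>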
+    </section>
+
+    <section xml:id="runningslow_amqp">
+      <?dbhtml stop-chunking?>
+      <title>AMQP broker</title>
+      <para>
+        Regardless of which AMQP broker you use, such as RabbitMQ, some
+        common issues not only slow down operations but can also cause real
+        failures. Sometimes messages queued for services stay on the queues
+        and are never consumed. This can be due to dead or hung consumer
+        services, and it can commonly be cleared up by restarting either the
+        AMQP-related services or the OpenStack service in question.
+      </para>
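+      <para>
+        With RabbitMQ, one way to spot queues that are not being consumed is
+        to list each queue with its message and consumer counts. This sketch
+        assumes it runs on a broker node with permission to use
+        <command>rabbitmqctl</command>, and that OpenStack uses the default
+        virtual host; otherwise, add <option>-p</option> with your virtual
+        host name. Queues it reports are candidates to investigate, not
+        confirmed faults.
+      </para>
+      <programlisting language="bash"><![CDATA[# List every queue with its message and consumer counts (default vhost
+# assumed; add "-p <vhost>" for a different one).
+rabbitmqctl list_queues name messages consumers
+
+# Show only queues that hold messages but have no consumers. Header lines
+# printed by rabbitmqctl fail the test on the third column.
+rabbitmqctl -q list_queues name messages consumers | awk '$2 > 0 && $3 == 0']]></programlisting>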
+    </section>
+
+    <section xml:id="runningslow_sql">
+      <?dbhtml stop-chunking?>
+      <title>SQL back end</title>
+      <para>
+        Whether you use SQLite or an RDBMS (such as MySQL), a working SQL back
+        end is essential to a functioning OpenStack environment. With SQLite,
+        a large or fragmented database file can cause slowness. With an RDBMS,
+        a locked or long-running query can delay other requests. In this
+        case, do not kill the query immediately; first determine whether it is
+        hung or is simply taking a long time and needs to finish on its own.
+        RDBMS administration is outside the scope of this document, but note
+        that a properly functioning RDBMS is essential to most OpenStack
+        services.
+      </para>
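+      <para>
+        As a starting point for that investigation, the following sketch
+        assumes a MySQL or MariaDB back end and an account that can see all
+        sessions, such as <literal>root</literal>. It lists statements that
+        have been running for longer than 60 seconds (an arbitrary threshold),
+        and then prints the InnoDB status report, which includes transactions
+        waiting on locks and the most recent deadlock, if any.
+      </para>
+      <programlisting language="bash"><![CDATA[# Assumes a MySQL or MariaDB back end and an account with the PROCESS
+# privilege (root is an assumption). The 60-second threshold is arbitrary.
+mysql -u root -p -e "SELECT id, user, db, time, state, LEFT(info, 100) AS query
+  FROM information_schema.PROCESSLIST
+  WHERE command <> 'Sleep' AND time > 60
+  ORDER BY time DESC;"
+
+# Print the InnoDB status report, including lock waits and the latest deadlock.
+mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"]]></programlisting>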
+    </section>
+
+  </section>
+
+  <?hard-pagebreak ?>
+
  <section xml:id="uninstalling">
    <?dbhtml stop-chunking?>