From 3864dadb87427742bd0aa056a155e8daebcebccf Mon Sep 17 00:00:00 2001
From: Tom Fifield
Date: Sun, 16 Mar 2014 21:01:41 +1100
Subject: [PATCH] Copyedit of logging and monitoring chapter

Using the latest O'Reilly PDF, update spelling, grammar and markup for the
ops guide logging and monitoring chapter

Change-Id: Ib2452a47c4e9dc1201256113fa8a9dd1ba350f04
---
 doc/openstack-ops/ch_ops_log_monitor.xml | 153 ++++++++++++-----------
 1 file changed, 77 insertions(+), 76 deletions(-)

diff --git a/doc/openstack-ops/ch_ops_log_monitor.xml b/doc/openstack-ops/ch_ops_log_monitor.xml
index 976f8230..ea310366 100644
--- a/doc/openstack-ops/ch_ops_log_monitor.xml
+++ b/doc/openstack-ops/ch_ops_log_monitor.xml
@@ -13,15 +13,16 @@
     Logging and Monitoring
     As an OpenStack cloud is composed of so many different
-        services, there are a large number of log files. This section
-        aims to assist you in locating and working with them, and
-        other ways to track the status of your deployment.
+        services, there are a large number of log files. This chapter
+        aims to assist you in locating and working with them and
+        describes other ways to track the status of your deployment.
Where Are the Logs? Most services use the convention of writing their log files to subdirectories of the /var/log - directory. - + directory, as listed in OpenStack Log Locations. + + @@ -31,7 +32,7 @@ - + @@ -40,7 +41,7 @@ - + @@ -49,7 +50,7 @@ - + @@ -58,7 +59,7 @@ - + @@ -67,7 +68,7 @@ - + @@ -76,7 +77,7 @@ - + - - + - + - + - +
OpenStack Log Locations
Node Type
Cloud ControllerCloud controller nova-*
Cloud ControllerCloud controller glance-*
Cloud ControllerCloud controller cinder-*
Cloud ControllerCloud controller keystone-*
Cloud ControllerCloud controller neutron-*
Cloud ControllerCloud controller horizon /var/log/apache2/ @@ -84,21 +85,21 @@
All nodesmisc (Swift, + misc (swift, dnsmasq) /var/log/syslog
Compute NodesCompute nodes libvirt /var/log/libvirt/libvirtd.log
Compute NodesCompute nodes Console (boot up messages) for VM instances: /var/lib/nova/instances/instance-<instance @@ -106,36 +107,36 @@
Block Storage NodesBlock Storage nodes cinder-volume /var/log/cinder/cinder-volume.log
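            The table above shows where each service writes by default. A quick way to
            confirm this on a running system is to list the directory and tail a recent
            file. This is a minimal sketch run on the cloud controller; exact file names
            vary by distribution and release:
            # ls /var/log/nova/
            # tail -n 20 /var/log/nova/nova-api.log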
Reading the Logs OpenStack services use the standard logging levels, at increasing severity: DEBUG, INFO, AUDIT, WARNING, ERROR, CRITICAL, and TRACE. That is, messages only appear in the logs - if they are more "severe" than the particular log level + if they are more "severe" than the particular log level, with DEBUG allowing all log statements through. For example, TRACE is logged only if the software has a stack trace, while INFO is logged for every message including those that are only for information. To disable DEBUG-level logging, edit - /etc/nova/nova.conf: + /etc/nova/nova.conf as follows: debug=false Keystone is handled a little differently. To modify the logging level, edit the /etc/keystone/logging.conf file and look at the logger_root and handler_file sections. - Logging for Horizon is configured in + Logging for horizon is configured in /etc/openstack_dashboard/local_settings.py. - As Horizon is a Django web application, it follows the + Because horizon is a Django web application, it follows the Django Logging @@ -144,7 +145,7 @@ The first step in finding the source of an error is typically to search for a CRITICAL, TRACE, or ERROR message in the log starting at the bottom of the log file. - An example of a CRITICAL log message, with the + Here is an example of a CRITICAL log message, with the corresponding TRACE (Python traceback) immediately following: 2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group @@ -179,10 +180,10 @@ 2013-02-25 21:05:51 17409 TRACE cinder In this example, cinder-volumes failed to start and has provided a stack trace, since its volume back-end has been - unable to setup the storage volume - probably because the + unable to set up the storage volume—probably because the LVM volume that is expected from the configuration does not exist. - An example error log: + Here is an example error log: 2013-02-25 20:26:33 6619 ERROR nova.openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 23 seconds. In this error, a nova service has failed to connect to @@ -209,10 +210,10 @@ faf7ded8-4a46-413b-b113-f19590746ffe. If you search for this string on the cloud controller in the /var/log/nova-*.log files, it appears in - nova-api.log, and + nova-api.log and nova-scheduler.log. If you search for this on the compute nodes in - /var/log/nova-*.log, it appears + /var/log/nova-*.log, it appears in nova-network.log and nova-compute.log. If no ERROR or CRITICAL messages appear, the most recent log entry that reports @@ -233,11 +234,11 @@ LOG = logging.getLogger(__name__) To add a DEBUG logging statement, you would do: LOG.debug("This is a custom debugging statement") - You may notice that all of the existing logging messages + You may notice that all the existing logging messages are preceded by an underscore and surrounded by parentheses, for example: LOG.debug(_("Logging statement appears here")) - This is used to support translation of logging messages + This formatting is used to support translation of logging messages into different languages using the gettext @@ -256,9 +257,7 @@ LOG = logging.getLogger(__name__) issues. Instead, we recommend you use the RabbitMQ web management interface. 
Enable it on your cloud controller: - # - /usr/lib/rabbitmq/bin/rabbitmq-plugins enable - rabbitmq_management + # /usr/lib/rabbitmq/bin/rabbitmq-plugins enable rabbitmq_management # service rabbitmq-server restart The RabbitMQ web management interface is accessible on your cloud controller at http://localhost:55672. @@ -271,11 +270,11 @@ LOG = logging.getLogger(__name__) $ dpkg -s rabbitmq-server | grep "Version:" Version: 2.7.1-0ubuntu4 - An alternative to enabling the RabbitMQ Web Management - Interface is to use the rabbitmqctl commands. For example, + An alternative to enabling the RabbitMQ web management + interface is to use the rabbitmqctl commands. For example, rabbitmqctl list_queues| grep cinder displays any messages - left in the queue. If there are, it's a possible sign that + left in the queue. If any messages are there, it's a possible sign that cinder services didn't connect properly to rabbitmq and might have to be restarted. Items to monitor for RabbitMQ include the number of @@ -287,14 +286,14 @@ Version: 2.7.1-0ubuntu4 Because your cloud is most likely composed of many servers, you must check logs on each of those servers to properly piece an event together. A better solution is to - send the logs of all servers to a central location so they + send the logs of all servers to a central location so that they can all be accessed from the same area. Ubuntu uses rsyslog as the default logging service. Since it is natively able to send logs to a remote location, you don't have to install anything extra to enable this feature, just modify the configuration file. In doing this, consider running your logging over a - management network, or using an encrypted VPN to avoid + management network or using an encrypted VPN to avoid interception.
rsyslog Client Configuration @@ -327,8 +326,8 @@ syslog_log_facility=LOG_LOCAL3 following line: *.* @192.168.1.10 This instructs rsyslog to send all logs to the IP - listed. In this example, the IP points to the Cloud - Controller. + listed. In this example, the IP points to the cloud + controller.
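            Once the client configuration is saved, restarting rsyslog and emitting a
            test message is a quick way to confirm that forwarding works. This is a
            minimal sketch; the nova-test tag passed to logger is arbitrary and is used
            only to make the message easy to find on the server:
            # service rsyslog restart
            # logger -t nova-test "rsyslog client forwarding test"
            The message should then appear on the central logging server configured in
            the next section.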
rsyslog Server Configuration @@ -360,7 +359,7 @@ $template DynFile,"/var/log/rsyslog/%HOSTNAME%/syslog.log" local0.* ?NovaFile local0.* ?NovaAll & ~ - The above example configuration handles the nova service only. + This example configuration handles the nova service only. It first configures rsyslog to act as a server that runs on port 514. Next, it creates a series of logging templates. Logging templates control where received logs are stored. Using @@ -378,7 +377,7 @@ local0.* ?NovaAll - This is useful as logs from c02.example.com go to: + This is useful, as logs from c02.example.com go to: @@ -397,10 +396,12 @@ local0.* ?NovaAll
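            To confirm that logs are arriving, you can watch one of the per-host files on
            the central server. This is a hedged example: it assumes the DynFile template
            shown above is referenced by a catch-all rule (not shown here) and that
            c02.example.com is one of the clients configured earlier:
            # tail -f /var/log/rsyslog/c02.example.com/syslog.log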
+ StackTach StackTach is a tool created by Rackspace to collect and report the notifications sent by nova. - Notifications are essentially the same as logs, but can be + Notifications are essentially the same as logs but can be much more detailed. A good overview of notifications can be found at Process Monitoring A basic type of alert monitoring is to simply check - and see if a required process is running. For example, + and see whether a required process is running. For example, ensure that the nova-api service is - running on the Cloud Controller: + running on the cloud controller: # ps aux | grep nova-api nova 12786 0.0 0.0 37952 1312 ? Ss Feb11 0:00 su -s /bin/sh -c exec nova-api --config-file=/etc/nova/nova.conf nova nova 12787 0.0 0.1 135764 57400 ? S Feb11 0:01 /usr/bin/python /usr/bin/nova-api --config-file=/etc/nova/nova.conf @@ -477,22 +478,22 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api< more resources are critically low. While the monitoring thresholds should be tuned to your specific OpenStack environment, monitoring resource usage is - not specific to OpenStack at all – any generic type of + not specific to OpenStack at all–any generic type of alert will work fine. Some of the resources that you want to monitor include: - Disk Usage + Disk usage - Server Load + Server load - Memory Usage + Memory usage - Network IO + Network I/O Available vCPUs @@ -512,8 +513,8 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api< configuration: command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -e Nagios alerts you with a WARNING when any disk on - the compute node is 80% full and CRITICAL when 90% is - full. + the compute node is 80 percent full and CRITICAL when 90 + percent is full.
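            The same Nagios approach used for disk space also works for process checks.
            The following NRPE command is an illustrative sketch rather than part of the
            original configuration; it uses the standard check_procs plugin to raise a
            CRITICAL alert when no nova-api process is running:
            command[check_nova_api]=/usr/lib/nagios/plugins/check_procs -c 1: -a nova-api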
Metering and Telemetry with Ceilometer @@ -530,13 +531,13 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api< xlink:href="http://docs.openstack.org/developer/ceilometer/" >http://docs.openstack.org/developer/ceilometer/.
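            If you have deployed ceilometer, a quick way to confirm that meters are
            actually being collected is to list them with its command-line client. This
            is a hedged sketch; it assumes the python-ceilometerclient package is
            installed and that your OpenStack credentials are loaded into the
            environment:
            $ ceilometer meter-list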
-                    OpenStack-specific Resources
+                    OpenStack-Specific Resources
                    Resources such as memory, disk, and CPU are
                        generic resources that all servers (even
                        non-OpenStack servers) have and are important to
                        the overall health of the server. When dealing
                        with OpenStack specifically, these resources are
                        important for a
-                        second reason: ensuring enough are available in order
+                        second reason: ensuring that enough are available
                        to launch instances. There are a few ways you can
                        see OpenStack resource usage. The first is through
                        the nova
@@ -545,14 +546,14 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api<
                        This command displays a list of how many instances a tenant has
                        running and some light usage statistics about the combined
                        instances. This command is useful
-                        for a quick overview of your cloud, but doesn't really
+                        for a quick overview of your cloud, but it doesn't really
                        get into a lot of details.
                        Next, the nova database contains three
                        tables that store usage information.
                        The nova.quotas and
                        nova.quota_usages tables store quota
-                        information. If a tenant's quota is different than the
-                        default quota settings, their quota is stored in
+                        information. If a tenant's quota is different from the
+                        default quota settings, its quota is stored in
                        the nova.quotas table. For example:
                        mysql> select project_id, resource, hard_limit from quotas;
@@ -587,12 +588,12 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api<
                        By comparing a tenant's hard limit with their
                        current resource usage, you can see their usage
                        percentage. For example, if this tenant is using 1
-                        Floating IP out of 10, then they are using 10% of
-                        their Floating IP quota. Rather than doing the
+                        floating IP out of 10, then they are using 10 percent of
+                        their floating IP quota. Rather than doing the
                        calculation manually, you can use SQL or the
                        scripting language of your choice and create a
                        formatted report:
-                        +----------------------------------+------------+-------------+---------------+
++----------------------------------+------------+-------------+---------------+
 | some_tenant |
 +-----------------------------------+------------+------------+---------------+
 | Resource | Used | Limit | |
@@ -613,8 +614,8 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api<
 | security_groups | 0 | 10 | 0 % |
 | volumes | 2 | 10 | 20 % |
 +-----------------------------------+------------+------------+---------------+
-                        The above was generated using a custom script which
-                        can be found on GitHub
+                        The above information was generated by using a custom script
+                        that can be found on GitHub
                        (https://github.com/cybera/novac/blob/dev/libexec/novac-quota-report).
                        This script is specific to a certain OpenStack
@@ -627,15 +628,15 @@ root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api<
                    Intelligent Alerting
                    Intelligent alerting can be thought of as a form of
                        continuous integration for operations. For example,
-                        you can easily check to see if the Image Service is up and
+                        you can easily check to see whether the Image Service is up and
                        running by ensuring that the glance-api and
                        glance-registry processes are running
-                        or by seeing if glace-api is responding
+                        or by seeing whether glance-api is responding
                        on port 9292.
-                        But how can you tell if images are being
+                        But how can you tell whether images are being
                        successfully uploaded to the Image Service? Maybe the disk
                        that Image Service is storing the images on is
-                        full or the S3 back-end is down. You could naturally
+                        full or the S3 backend is down. 
You could naturally check this by doing a quick image upload: #!/bin/bash # @@ -649,35 +650,35 @@ glance image-create --name='cirros image' --is-public=true --container-format=ba 6_64-disk.img By taking this script and rolling it into an alert for your monitoring system (such as Nagios), you now - have an automated way of ensuring image uploads to the + have an automated way of ensuring that image uploads to the Image Catalog are working. You must remove the image after each test. Even better, test whether you can successfully delete an image from the Image Service. - Intelligent alerting takes a considerable more - amount of time to plan and implement than the other + Intelligent alerting takes considerably more + time to plan and implement than the other alerts described in this chapter. A good outline to implement intelligent alerting is: - Review common actions in your cloud + Review common actions in your cloud. Create ways to automatically test these - actions + actions. Roll these tests into an alerting - system + system. Some other examples for Intelligent Alerting include: - Can instances launch and destroyed? + Can instances launch and be destroyed? Can users be created? @@ -693,7 +694,7 @@ glance image-create --name='cirros image' --is-public=true --container-format=ba
Trending Trending can give you great insight into how your - cloud is performing day to day. For example, if a busy + cloud is performing day to day. You can learn, for example, if a busy day was simply a rare occurrence or if you should start adding new compute nodes. Trending takes a slightly different approach than @@ -733,7 +734,7 @@ glance image-create --name='cirros image' --is-public=true --container-format=ba As an example, recording nova-api usage can allow you to track the need to scale your cloud controller. By keeping an eye on nova-api - requests, you can determine if you need to spawn more + requests, you can determine whether you need to spawn more nova-api processes or go as far as introducing an entirely new server to run nova-api. To get an approximate count of the requests, look for @@ -762,10 +763,10 @@ glance image-create --name='cirros image' --is-public=true --container-format=ba Summary For stable operations, you want to detect failure promptly and determine causes efficiently. With a distributed system, it's even - more important to track the right items to meet a service level target. + more important to track the right items to meet a service-level target. Learning where these logs are located in the file system or API gives - you an advantage. Plus, we have discussed how to read, interpret, and - manipulate information from OpenStack services so you can monitor + you an advantage. This chapter also showed how to read, interpret, and + manipulate information from OpenStack services so that you can monitor effectively.