Cleanup of Swift Ops Runbook
This patch cleans up some rough edges that were left (due to time constraints) in the original commit.

Change-Id: Id4480be8dc1b5c920c19988cb89ca8b60ace91b4
Co-Authored-By: Gerry Drudy <gerry.drudy@hpe.com>
This commit is contained in:
parent
643dbce134
commit
e38b53393f
@ -234,9 +234,11 @@ using the format `regex_pattern_X = regex_expression`, where `X` is a number.
|
|||||||
This script has been tested on Ubuntu 10.04 and Ubuntu 12.04, so if you are
|
This script has been tested on Ubuntu 10.04 and Ubuntu 12.04, so if you are
|
||||||
using a different distro or OS, some care should be taken before using it in production.
|
using a different distro or OS, some care should be taken before using it in production.
|
||||||
|
|
||||||
--------------
|
.. _dispersion_report:
|
||||||
Cluster Health
|
|
||||||
--------------
|
-----------------
|
||||||
|
Dispersion Report
|
||||||
|
-----------------
|
||||||
|
|
||||||
There is a swift-dispersion-report tool for measuring overall cluster health.
|
There is a swift-dispersion-report tool for measuring overall cluster health.
|
||||||
This is accomplished by checking if a set of deliberately distributed
|
This is accomplished by checking if a set of deliberately distributed
|
||||||
|
@ -2,15 +2,53 @@
|
|||||||
Identifying issues and resolutions
|
Identifying issues and resolutions
|
||||||
==================================
|
==================================
|
||||||
|
|
||||||
|
Is the system up?
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
If you have a report that Swift is down, perform the following basic checks:
|
||||||
|
|
||||||
|
#. Run swift functional tests.
|
||||||
|
|
||||||
|
#. From a server in your data center, use ``curl`` to check ``/healthcheck``
|
||||||
|
(see below).
|
||||||
|
|
||||||
|
#. If you have a monitoring system, check your monitoring system.
|
||||||
|
|
||||||
|
#. Check your hardware load balancers infrastructure.
|
||||||
|
|
||||||
|
#. Run swift-recon on a proxy node.
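For the last step, a minimal sketch of a quick ``swift-recon`` check run from a proxy node (the flags shown are standard ``swift-recon`` options; pick the checks that matter to you):

.. code::

   # Disk usage, load averages and ring/config md5 consistency across the cluster
   $ swift-recon -d -l --md5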
|
||||||
|
|
||||||
|
Functional tests usage
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
We would recommend that you set up the functional tests to run against your
|
||||||
|
production system. Run regularly, it can be a useful tool to validate
|
||||||
|
that the system is configured correctly. In addition, it can provide
|
||||||
|
early warning about failures in your system (if the functional tests stop
|
||||||
|
working, user applications will also probably stop working).
|
||||||
|
|
||||||
|
A script for running the functional tests is located in ``swift/.functests``.
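For example, a minimal sketch of invoking that script from a swift source checkout (assumes the functional test configuration already points at your production endpoint and test accounts):

.. code::

   $ cd swift
   $ ./.functests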
|
||||||
|
|
||||||
|
|
||||||
|
External monitoring
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
We use pingdom.com to monitor the external Swift API. We suggest the
|
||||||
|
following:
|
||||||
|
|
||||||
|
- Do a GET on ``/healthcheck``
|
||||||
|
|
||||||
|
- Create a container, make it public (x-container-read:
|
||||||
|
.r*,.rlistings), create a small file in the container; do a GET
|
||||||
|
on the object
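A minimal sketch of the second check using ``curl`` directly (the endpoint, token, project ID and container name are placeholders, not values from this runbook):

.. code::

   # Create a public container, upload a small object, then fetch it anonymously
   $ curl -X PUT -H "X-Auth-Token: $TOKEN" \
         -H "X-Container-Read: .r*,.rlistings" \
         https://<endpoint>/v1/AUTH_<project-id>/monitor_container
   $ echo "canary" | curl -X PUT -H "X-Auth-Token: $TOKEN" --data-binary @- \
         https://<endpoint>/v1/AUTH_<project-id>/monitor_container/canary.txt
   $ curl https://<endpoint>/v1/AUTH_<project-id>/monitor_container/canary.txt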
|
||||||
|
|
||||||
Diagnose: General approach
|
Diagnose: General approach
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
- Look at service status in your monitoring system.
|
- Look at service status in your monitoring system.
|
||||||
|
|
||||||
- In addition to system monitoring tools and issue logging by users,
|
- In addition to system monitoring tools and issue logging by users,
|
||||||
swift errors will often result in log entries in the ``/var/log/swift``
|
swift errors will often result in log entries (see :ref:`swift_logs`).
|
||||||
files: ``proxy.log``, ``server.log`` and ``background.log`` (see:``Swift
|
|
||||||
logs``).
|
|
||||||
|
|
||||||
- Look at any logs your deployment tool produces.
|
- Look at any logs your deployment tool produces.
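For example, a quick scan for recent errors in the Swift logs described in :ref:`swift_logs` (log file names follow the layout used elsewhere in this runbook; adjust to your deployment):

.. code::

   $ sudo grep -i error /var/log/swift/proxy.log | tail -20
   $ sudo grep -i error /var/log/swift/server.log /var/log/swift/background.log | tail -20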
|
||||||
|
|
||||||
@ -33,22 +71,24 @@ Diagnose: Swift-dispersion-report
|
|||||||
---------------------------------
|
---------------------------------
|
||||||
|
|
||||||
The swift-dispersion-report is a useful tool to gauge the general
|
The swift-dispersion-report is a useful tool to gauge the general
|
||||||
health of the system. Configure the ``swift-dispersion`` report for
|
health of the system. Configure the ``swift-dispersion`` report to cover at
|
||||||
100% coverage. The dispersion report regularly monitors
|
a minimum every disk drive in your system (usually 1% coverage).
|
||||||
these and gives a report of the amount of objects/containers are still
|
See :ref:`dispersion_report` for details of how to configure and
|
||||||
available as well as how many copies of them are also there.
|
use the dispersion reporting tool.
|
||||||
|
|
||||||
The dispersion-report output is logged on the first proxy of the first
|
The ``swift-dispersion-report`` tool can take a long time to run, especially
|
||||||
AZ or each system (proxy with the monitoring role) under
|
if any servers are down. We suggest you run it regularly
|
||||||
``/var/log/swift/swift-dispersion-report.log``.
|
(e.g., in a cron job) and save the results. This makes it easy to refer
|
||||||
|
to the last report without having to wait for a long-running command
|
||||||
|
to complete.
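A minimal sketch of such a cron job (the schedule, user and log path are assumptions; adapt them to your environment):

.. code::

   # /etc/cron.d/swift-dispersion-report -- run hourly as the swift user
   0 * * * * swift /usr/bin/swift-dispersion-report >> /var/log/swift/swift-dispersion-report.log 2>&1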
|
||||||
|
|
||||||
Diagnose: Is swift running?
|
Diagnose: Is system responding to /healthcheck?
|
||||||
---------------------------
|
-----------------------------------------------
|
||||||
|
|
||||||
When you want to establish if a swift endpoint is running, run ``curl -k``
|
When you want to establish if a swift endpoint is running, run ``curl -k``
|
||||||
against either: https://*[REPLACEABLE]*./healthcheck OR
|
against https://*[ENDPOINT]*/healthcheck.
|
||||||
https:*[REPLACEABLE]*.crossdomain.xml
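For example (the endpoint is a placeholder; the healthcheck middleware normally answers a GET with ``OK``):

.. code::

   $ curl -k https://<endpoint>/healthcheck
   OK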
|
|
||||||
|
|
||||||
|
.. _swift_logs:
|
||||||
|
|
||||||
Diagnose: Interpreting messages in ``/var/log/swift/`` files
|
Diagnose: Interpreting messages in ``/var/log/swift/`` files
|
||||||
------------------------------------------------------------
|
------------------------------------------------------------
|
||||||
@ -70,25 +110,20 @@ The following table lists known issues:
|
|||||||
- **Signature**
|
- **Signature**
|
||||||
- **Issue**
|
- **Issue**
|
||||||
- **Steps to take**
|
- **Steps to take**
|
||||||
* - /var/log/syslog
|
|
||||||
- kernel: [] hpsa .... .... .... has check condition: unknown type:
|
|
||||||
Sense: 0x5, ASC: 0x20, ASC Q: 0x0 ....
|
|
||||||
- An unsupported command was issued to the storage hardware
|
|
||||||
- Understood to be a benign monitoring issue, ignore
|
|
||||||
* - /var/log/syslog
|
* - /var/log/syslog
|
||||||
- kernel: [] sd .... [csbu:sd...] Sense Key: Medium Error
|
- kernel: [] sd .... [csbu:sd...] Sense Key: Medium Error
|
||||||
- Suggests disk surface issues
|
- Suggests disk surface issues
|
||||||
- Run swift diagnostics on the target node to check for disk errors,
|
- Run ``swift-drive-audit`` on the target node to check for disk errors,
|
||||||
repair disk errors
|
repair disk errors
|
||||||
* - /var/log/syslog
|
* - /var/log/syslog
|
||||||
- kernel: [] sd .... [csbu:sd...] Sense Key: Hardware Error
|
- kernel: [] sd .... [csbu:sd...] Sense Key: Hardware Error
|
||||||
- Suggests storage hardware issues
|
- Suggests storage hardware issues
|
||||||
- Run swift diagnostics on the target node to check for disk failures,
|
- Run diagnostics on the target node to check for disk failures,
|
||||||
replace failed disks
|
replace failed disks
|
||||||
* - /var/log/syslog
|
* - /var/log/syslog
|
||||||
- kernel: [] .... I/O error, dev sd.... ,sector ....
|
- kernel: [] .... I/O error, dev sd.... ,sector ....
|
||||||
-
|
-
|
||||||
- Run swift diagnostics on the target node to check for disk errors
|
- Run diagnostics on the target node to check for disk errors
|
||||||
* - /var/log/syslog
|
* - /var/log/syslog
|
||||||
- pound: NULL get_thr_arg
|
- pound: NULL get_thr_arg
|
||||||
- Multiple threads woke up
|
- Multiple threads woke up
|
||||||
@ -96,59 +131,61 @@ The following table lists known issues:
|
|||||||
* - /var/log/swift/proxy.log
|
* - /var/log/swift/proxy.log
|
||||||
- .... ERROR .... ConnectionTimeout ....
|
- .... ERROR .... ConnectionTimeout ....
|
||||||
- A storage node is not responding in a timely fashion
|
- A storage node is not responding in a timely fashion
|
||||||
- Run swift diagnostics on the target node to check for node down,
|
- Check if node is down, not running Swift,
|
||||||
node unconfigured, storage off-line or network issues between the
|
unconfigured, storage off-line or for network issues between the
|
||||||
proxy and non-responding node
|
proxy and non-responding node
|
||||||
* - /var/log/swift/proxy.log
|
* - /var/log/swift/proxy.log
|
||||||
- proxy-server .... HTTP/1.0 500 ....
|
- proxy-server .... HTTP/1.0 500 ....
|
||||||
- A proxy server has reported an internal server error
|
- A proxy server has reported an internal server error
|
||||||
- Run swift diagnostics on the target node to check for issues
|
- Examine the logs for any errors at the time the error was reported to
|
||||||
|
attempt to understand the cause of the error.
|
||||||
* - /var/log/swift/server.log
|
* - /var/log/swift/server.log
|
||||||
- .... ERROR .... ConnectionTimeout ....
|
- .... ERROR .... ConnectionTimeout ....
|
||||||
- A storage server is not responding in a timely fashion
|
- A storage server is not responding in a timely fashion
|
||||||
- Run swift diagnostics on the target node to check for a node or
|
- Check if node is down, not running Swift,
|
||||||
service, down, unconfigured, storage off-line or network issues
|
unconfigured, storage off-line or for network issues between the
|
||||||
between the two nodes
|
server and non-responding node
|
||||||
* - /var/log/swift/server.log
|
* - /var/log/swift/server.log
|
||||||
- .... ERROR .... Remote I/O error: '/srv/node/disk....
|
- .... ERROR .... Remote I/O error: '/srv/node/disk....
|
||||||
- A storage device is not responding as expected
|
- A storage device is not responding as expected
|
||||||
- Run swift diagnostics and check the filesystem named in the error
|
- Run ``swift-drive-audit`` and check the filesystem named in the error
|
||||||
for corruption (unmount & xfs_repair)
|
for corruption (unmount & xfs_repair). Check if the filesystem
|
||||||
|
is mounted and working.
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- object-server ERROR container update failed .... Connection refused
|
- object-server ERROR container update failed .... Connection refused
|
||||||
- Peer node is not responding
|
- A container server node could not be contacted
|
||||||
- Check status of the network and peer node
|
- Check if node is down, not running Swift,
|
||||||
|
unconfigured, storage off-line or for network issues between the
|
||||||
|
server and non-responding node
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- object-updater ERROR with remote .... ConnectionTimeout
|
- object-updater ERROR with remote .... ConnectionTimeout
|
||||||
-
|
- The remote container server is busy
|
||||||
- Check status of the network and peer node
|
- If the container is very large, some errors updating it can be
|
||||||
|
expected. However, this error can also occur if there is a networking
|
||||||
|
issue.
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- account-reaper STDOUT: .... error: ECONNREFUSED
|
- account-reaper STDOUT: .... error: ECONNREFUSED
|
||||||
- Network connectivity issue
|
- Network connectivity issue or the target server is down.
|
||||||
- Resolve network issue and re-run diagnostics
|
- Resolve network issue or reboot the target server
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- .... ERROR .... ConnectionTimeout
|
- .... ERROR .... ConnectionTimeout
|
||||||
- A storage server is not responding in a timely fashion
|
- A storage server is not responding in a timely fashion
|
||||||
- Run swift diagnostics on the target node to check for a node
|
- The target server may be busy. However, this error can also occur if
|
||||||
or service, down, unconfigured, storage off-line or network issues
|
there is a networking issue.
|
||||||
between the two nodes
|
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- .... ERROR syncing .... Timeout
|
- .... ERROR syncing .... Timeout
|
||||||
- A storage server is not responding in a timely fashion
|
- A timeout occurred syncing data to another node.
|
||||||
- Run swift diagnostics on the target node to check for a node
|
- The target server may be busy. However, this error can also occur if
|
||||||
or service, down, unconfigured, storage off-line or network issues
|
there is a networking issue.
|
||||||
between the two nodes
|
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- .... ERROR Remote drive not mounted ....
|
- .... ERROR Remote drive not mounted ....
|
||||||
- A storage server disk is unavailable
|
- A storage server disk is unavailable
|
||||||
- Run swift diagnostics on the target node to check for a node or
|
- Repair and remount the file system (on the remote node)
|
||||||
service, failed or unmounted disk on the target, or a network issue
|
|
||||||
* - /var/log/swift/background.log
|
* - /var/log/swift/background.log
|
||||||
- object-replicator .... responded as unmounted
|
- object-replicator .... responded as unmounted
|
||||||
- A storage server disk is unavailable
|
- A storage server disk is unavailable
|
||||||
- Run swift diagnostics on the target node to check for a node or
|
- Repair and remount the file system (on the remote node)
|
||||||
service, failed or unmounted disk on the target, or a network issue
|
* - /var/log/swift/*.log
|
||||||
* - /var/log/swift/\*.log
|
|
||||||
- STDOUT: EXCEPTION IN
|
- STDOUT: EXCEPTION IN
|
||||||
- An unexpected error occurred
|
- An unexpected error occurred
|
||||||
- Read the Traceback details, if it matches known issues
|
- Read the Traceback details, if it matches known issues
|
||||||
@ -157,19 +194,14 @@ The following table lists known issues:
|
|||||||
* - /var/log/rsyncd.log
|
* - /var/log/rsyncd.log
|
||||||
- rsync: mkdir "/disk....failed: No such file or directory....
|
- rsync: mkdir "/disk....failed: No such file or directory....
|
||||||
- A local storage server disk is unavailable
|
- A local storage server disk is unavailable
|
||||||
- Run swift diagnostics on the node to check for a failed or
|
- Run diagnostics on the node to check for a failed or
|
||||||
unmounted disk
|
unmounted disk
|
||||||
* - /var/log/swift*
|
* - /var/log/swift*
|
||||||
- Exception: Could not bind to 0.0.0.0:600xxx
|
- Exception: Could not bind to 0.0.0.0:6xxx
|
||||||
- Possible Swift process restart issue. This indicates an old swift
|
- Possible Swift process restart issue. This indicates an old swift
|
||||||
process is still running.
|
process is still running.
|
||||||
- Run swift diagnostics, if some swift services are reported down,
|
- Restart Swift services. If some swift services are reported down,
|
||||||
check if they left a residual process behind.
|
check if they left a residual process behind.
|
||||||
* - /var/log/rsyncd.log
|
|
||||||
- rsync: recv_generator: failed to stat "/disk....." (in object)
|
|
||||||
failed: Not a directory (20)
|
|
||||||
- Swift directory structure issues
|
|
||||||
- Run swift diagnostics on the node to check for issues
|
|
||||||
|
|
||||||
Diagnose: Parted reports the backup GPT table is corrupt
|
Diagnose: Parted reports the backup GPT table is corrupt
|
||||||
--------------------------------------------------------
|
--------------------------------------------------------
|
||||||
@ -188,7 +220,7 @@ Diagnose: Parted reports the backup GPT table is corrupt
|
|||||||
|
|
||||||
OK/Cancel?
|
OK/Cancel?
|
||||||
|
|
||||||
To fix, go to: Fix broken GPT table (broken disk partition)
|
To fix, go to :ref:`fix_broken_gpt_table`.
|
||||||
|
|
||||||
|
|
||||||
Diagnose: Drives diagnostic reports a FS label is not acceptable
|
Diagnose: Drives diagnostic reports a FS label is not acceptable
|
||||||
@ -240,9 +272,10 @@ Diagnose: Failed LUNs
|
|||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
The HPE Helion Public Cloud uses direct attach SmartArry
|
The HPE Helion Public Cloud uses direct attach SmartArray
|
||||||
controllers/drives. The information here is specific to that
|
controllers/drives. The information here is specific to that
|
||||||
environment.
|
environment. The hpacucli utility mentioned here may be called
|
||||||
|
hpssacli in your environment.
|
||||||
|
|
||||||
The ``swift_diagnostics`` mount checks may return a warning that a LUN has
|
The ``swift_diagnostics`` mount checks may return a warning that a LUN has
|
||||||
failed, typically accompanied by DriveAudit check failures and device
|
failed, typically accompanied by DriveAudit check failures and device
|
||||||
@ -254,7 +287,7 @@ the procedure to replace the disk.
|
|||||||
|
|
||||||
Otherwise the LUN can be re-enabled as follows:
|
Otherwise the LUN can be re-enabled as follows:
|
||||||
|
|
||||||
#. Generate a hpssacli diagnostic report. This report allows the swift
|
#. Generate a hpssacli diagnostic report. This report allows the DC
|
||||||
team to troubleshoot potential cabling or hardware issues so it is
|
team to troubleshoot potential cabling or hardware issues so it is
|
||||||
imperative that you run it immediately when troubleshooting a failed
|
imperative that you run it immediately when troubleshooting a failed
|
||||||
LUN. You will come back later and grep this file for more details, but
|
LUN. You will come back later and grep this file for more details, but
|
||||||
@ -262,8 +295,7 @@ Otherwise the lun can be re-enabled as follows:
|
|||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
sudo hpssacli controller all diag file=/tmp/hpacu.diag ris=on \
|
sudo hpssacli controller all diag file=/tmp/hpacu.diag ris=on xml=off zip=off
|
||||||
xml=off zip=off
|
|
||||||
|
|
||||||
Export the following variables using the below instructions before
|
Export the following variables using the below instructions before
|
||||||
proceeding further.
|
proceeding further.
|
||||||
@ -317,8 +349,7 @@ proceeding further.
|
|||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
sudo hpssacli controller slot=1 ld ${LDRIVE} show detail \
|
sudo hpssacli controller slot=1 ld ${LDRIVE} show detail | grep -i "Disk Name"
|
||||||
grep -i "Disk Name"
|
|
||||||
|
|
||||||
#. Export the device name variable from the preceding command (example:
|
#. Export the device name variable from the preceding command (example:
|
||||||
/dev/sdk):
|
/dev/sdk):
|
||||||
@ -396,6 +427,8 @@ proceeding further.
|
|||||||
should be checked. For example, log a DC ticket to check the sas cables
|
should be checked. For example, log a DC ticket to check the sas cables
|
||||||
between the drive and the expander.
|
between the drive and the expander.
|
||||||
|
|
||||||
|
.. _diagnose_slow_disk_drives:
|
||||||
|
|
||||||
Diagnose: Slow disk devices
|
Diagnose: Slow disk devices
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
@ -404,7 +437,8 @@ Diagnose: Slow disk devices
|
|||||||
collectl is an open-source performance gathering/analysis tool.
|
collectl is an open-source performance gathering/analysis tool.
|
||||||
|
|
||||||
If the diagnostics report a message such as ``sda: drive is slow``, you
|
If the diagnostics report a message such as ``sda: drive is slow``, you
|
||||||
should log onto the node and run the following comand:
|
should log onto the node and run the following command (remove the ``-c 1`` option to continuously monitor
|
||||||
|
the data):
|
||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
@ -431,13 +465,12 @@ should log onto the node and run the following comand:
|
|||||||
dm-3 0 0 0 0 0 0 0 0 0 0 0 0 0
|
dm-3 0 0 0 0 0 0 0 0 0 0 0 0 0
|
||||||
dm-4 0 0 0 0 0 0 0 0 0 0 0 0 0
|
dm-4 0 0 0 0 0 0 0 0 0 0 0 0 0
|
||||||
dm-5 0 0 0 0 0 0 0 0 0 0 0 0 0
|
dm-5 0 0 0 0 0 0 0 0 0 0 0 0 0
|
||||||
...
|
|
||||||
(repeats -- type Ctrl/C to stop)
|
|
||||||
|
|
||||||
Look at the ``Wait`` and ``SvcTime`` values. It is not normal for
|
Look at the ``Wait`` and ``SvcTime`` values. It is not normal for
|
||||||
these values to exceed 50msec. This is known to impact customer
|
these values to exceed 50msec. This is known to impact customer
|
||||||
performance (upload/download. For a controller problem, many/all drives
|
performance (upload/download). For a controller problem, many/all drives
|
||||||
will show how wait and service times. A reboot may correct the prblem;
|
will show long wait and service times. A reboot may correct the problem;
|
||||||
otherwise hardware replacement is needed.
|
otherwise hardware replacement is needed.
|
||||||
|
|
||||||
Another way to look at the data is as follows:
|
Another way to look at the data is as follows:
|
||||||
@ -526,12 +559,12 @@ be disabled on a per-drive basis.
|
|||||||
Diagnose: Slow network link - Measuring network performance
|
Diagnose: Slow network link - Measuring network performance
|
||||||
-----------------------------------------------------------
|
-----------------------------------------------------------
|
||||||
|
|
||||||
Network faults can cause performance between Swift nodes to degrade. The
|
Network faults can cause performance between Swift nodes to degrade. Testing
|
||||||
following tests are recommended. Other methods (such as copying large
|
with ``netperf`` is recommended. Other methods (such as copying large
|
||||||
files) may also work, but can produce inconclusive results.
|
files) may also work, but can produce inconclusive results.
|
||||||
|
|
||||||
Use netperf on all production systems. Install on all systems if not
|
Install ``netperf`` on all systems if not
|
||||||
already installed. And the UFW rules for its control port are in place.
|
already installed. Check that the UFW rules for its control port are in place.
|
||||||
However, there are no pre-opened ports for netperf's data connection. Pick a
|
However, there are no pre-opened ports for netperf's data connection. Pick a
|
||||||
port number. In this example, 12866 is used because it is one higher
|
port number. In this example, 12866 is used because it is one higher
|
||||||
than netperf's default control port number, 12865. If you get very
|
than netperf's default control port number, 12865. If you get very
|
||||||
@ -561,11 +594,11 @@ Running tests
|
|||||||
|
|
||||||
#. On the ``source`` node, run the following command to check
|
#. On the ``source`` node, run the following command to check
|
||||||
throughput. Note the double-dash before the -P option.
|
throughput. Note the double-dash before the -P option.
|
||||||
The command takes 10 seconds to complete.
|
The command takes 10 seconds to complete. The ``target`` node is 192.168.245.5.
|
||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
$ netperf -H <redacted>.72.4
|
$ netperf -H 192.168.245.5 -- -P 12866
|
||||||
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12866 AF_INET to
|
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12866 AF_INET to
|
||||||
<redacted>.72.4 (<redacted>.72.4) port 12866 AF_INET : demo
|
<redacted>.72.4 (<redacted>.72.4) port 12866 AF_INET : demo
|
||||||
Recv Send Send
|
Recv Send Send
|
||||||
@ -578,7 +611,7 @@ Running tests
|
|||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
$ netperf -H <redacted>.72.4 -t TCP_RR -- -P 12866
|
$ netperf -H 192.168.245.5 -t TCP_RR -- -P 12866
|
||||||
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 12866
|
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 12866
|
||||||
AF_INET to <redacted>.72.4 (<redacted>.72.4) port 12866 AF_INET : demo
|
AF_INET to <redacted>.72.4 (<redacted>.72.4) port 12866 AF_INET : demo
|
||||||
: first burst 0
|
: first burst 0
|
||||||
@ -763,7 +796,7 @@ Diagnose: High system latency
|
|||||||
used by the monitor program happen to live on the bad object server.
|
used by the monitor program happen to live on the bad object server.
|
||||||
|
|
||||||
- A general network problem within the data center. Compare the results
|
- A general network problem within the data center. Compare the results
|
||||||
with the Pingdom monitors too see if they also have a problem.
|
with the Pingdom monitors to see if they also have a problem.
|
||||||
|
|
||||||
Diagnose: Interface reports errors
|
Diagnose: Interface reports errors
|
||||||
----------------------------------
|
----------------------------------
|
||||||
@ -802,59 +835,21 @@ If the nic supports self test, this can be performed with:
|
|||||||
Self tests should read ``PASS`` if the nic is operating correctly.
|
Self tests should read ``PASS`` if the nic is operating correctly.
|
||||||
|
|
||||||
Nic module drivers can be re-initialised by carefully removing and
|
Nic module drivers can be re-initialised by carefully removing and
|
||||||
re-installing the modules. Case in point being the mellanox drivers on
|
re-installing the modules (this avoids rebooting the server).
|
||||||
Swift Proxy servers. which use a two part driver mlx4_en and
|
For example, mellanox drivers use a two part driver mlx4_en and
|
||||||
mlx4_core. To reload these you must carefully remove the mlx4_en
|
mlx4_core. To reload these you must carefully remove the mlx4_en
|
||||||
(ethernet) then the mlx4_core modules, and reinstall them in the
|
(ethernet) then the mlx4_core modules, and reinstall them in the
|
||||||
reverse order.
|
reverse order.
|
||||||
|
|
||||||
As the interface will be disabled while the modules are unloaded, you
|
As the interface will be disabled while the modules are unloaded, you
|
||||||
must be very careful not to lock the interface out. The following
|
must be very careful not to lock yourself out, so it may be better
|
||||||
script can be used to reload the melanox drivers, as a side effect, this
|
to script this.
|
||||||
resets error counts on the interface.
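A minimal sketch of such a reload script, assuming the mlx4 driver pair described above (run it from a console or out-of-band session, since the interface drops while the modules are unloaded):

.. code::

   #!/bin/bash
   # reload_mlx4.sh -- remove the ethernet module then the core module,
   # then reinstall them in the reverse order.
   set -e
   modprobe -r mlx4_en
   modprobe -r mlx4_core
   modprobe mlx4_core
   modprobe mlx4_en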
|
|
||||||
|
|
||||||
|
|
||||||
Diagnose: CorruptDir diagnostic reports corrupt directories
|
|
||||||
-----------------------------------------------------------
|
|
||||||
|
|
||||||
From time to time Swift data structures may become corrupted by
|
|
||||||
misplaced files in filesystem locations that swift would normally place
|
|
||||||
a directory. This causes issues for swift when directory creation is
|
|
||||||
attempted at said location, it may fail due to the pre-existent file. If
|
|
||||||
the CorruptDir diagnostic reports Corrupt directories, they should be
|
|
||||||
checked to see if they exist.
|
|
||||||
|
|
||||||
Checking existence of entries
|
|
||||||
-----------------------------
|
|
||||||
|
|
||||||
Swift data filesystems are located under the ``/srv/node/disk``
|
|
||||||
mountpoints and contain accounts, containers and objects
|
|
||||||
subdirectories which in turn contain partition number subdirectories.
|
|
||||||
The partition number directories contain md5 hash subdirectories. md5
|
|
||||||
hash directories contain md5sum subdirectories. md5sum directories
|
|
||||||
contain the Swift data payload as either a database (.db), for
|
|
||||||
accounts and containers, or a data file (.data) for objects.
|
|
||||||
If the entries reported in diagnostics correspond to a partition
|
|
||||||
number, md5 hash or md5sum directory, check the entry with ``ls
|
|
||||||
-ld *entry*``.
|
|
||||||
If it turns out to be a file rather than a directory, it should be
|
|
||||||
carefully removed.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Please do not ``ls`` the partition level directory contents, as
|
|
||||||
this *especially objects* may take a lot of time and system resources,
|
|
||||||
if you need to check the contents, use:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
echo /srv/node/disk#/type/partition#/
|
|
||||||
|
|
||||||
Diagnose: Hung swift object replicator
|
Diagnose: Hung swift object replicator
|
||||||
--------------------------------------
|
--------------------------------------
|
||||||
|
|
||||||
The swift diagnostic message ``Object replicator: remaining exceeds
|
A replicator reports in its log that the remaining time exceeds
|
||||||
100hrs:`` may indicate that the swift ``object-replicator`` is stuck and not
|
100 hours. This may indicate that the swift ``object-replicator`` is stuck and not
|
||||||
making progress. Another useful way to check this is with the
|
making progress. Another useful way to check this is with the
|
||||||
'swift-recon -r' command on a swift proxy server:
|
'swift-recon -r' command on a swift proxy server:
|
||||||
|
|
||||||
@ -866,14 +861,13 @@ making progress. Another useful way to check this is with the
|
|||||||
--> Starting reconnaissance on 384 hosts
|
--> Starting reconnaissance on 384 hosts
|
||||||
===============================================================================
|
===============================================================================
|
||||||
[2013-07-17 12:56:19] Checking on replication
|
[2013-07-17 12:56:19] Checking on replication
|
||||||
http://<redacted>.72.63:6000/recon/replication: <urlopen error timed out>
|
|
||||||
[replication_time] low: 2, high: 80, avg: 28.8, total: 11037, Failed: 0.0%, no_result: 0, reported: 383
|
[replication_time] low: 2, high: 80, avg: 28.8, total: 11037, Failed: 0.0%, no_result: 0, reported: 383
|
||||||
Oldest completion was 2013-06-12 22:46:50 (12 days ago) by <redacted>.31:6000.
|
Oldest completion was 2013-06-12 22:46:50 (12 days ago) by 192.168.245.3:6000.
|
||||||
Most recent completion was 2013-07-17 12:56:19 (5 seconds ago) by <redacted>.204.113:6000.
|
Most recent completion was 2013-07-17 12:56:19 (5 seconds ago) by 192.168.245.5:6000.
|
||||||
===============================================================================
|
===============================================================================
|
||||||
|
|
||||||
The ``Oldest completion`` line in this example indicates that the
|
The ``Oldest completion`` line in this example indicates that the
|
||||||
object-replicator on swift object server <redacted>.31 has not completed
|
object-replicator on swift object server 192.168.245.3 has not completed
|
||||||
the replication cycle in 12 days. This replicator is stuck. The object
|
the replication cycle in 12 days. This replicator is stuck. The object
|
||||||
replicator cycle is generally less than 1 hour. Though a replicator
|
replicator cycle is generally less than 1 hour. Though a replicator
|
||||||
cycle of 15-20 hours can occur if nodes are added to the system and a
|
cycle of 15-20 hours can occur if nodes are added to the system and a
|
||||||
@ -886,22 +880,22 @@ the following command:
|
|||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
# sudo grep object-rep /var/log/swift/background.log | grep -e "Starting object replication" -e "Object replication complete" -e "partitions rep"
|
# sudo grep object-rep /var/log/swift/background.log | grep -e "Starting object replication" -e "Object replication complete" -e "partitions rep"
|
||||||
Jul 16 06:25:46 <redacted> object-replicator 15344/16450 (93.28%) partitions replicated in 69018.48s (0.22/sec, 22h remaining)
|
Jul 16 06:25:46 192.168.245.4 object-replicator 15344/16450 (93.28%) partitions replicated in 69018.48s (0.22/sec, 22h remaining)
|
||||||
Jul 16 06:30:46 <redacted> object-replicator 15344/16450 (93.28%) partitions replicated in 69318.58s (0.22/sec, 22h remaining)
|
Jul 16 06:30:46 192.168.245.4 object-replicator 15344/16450 (93.28%) partitions replicated in 69318.58s (0.22/sec, 22h remaining)
|
||||||
Jul 16 06:35:46 <redacted> object-replicator 15344/16450 (93.28%) partitions replicated in 69618.63s (0.22/sec, 23h remaining)
|
Jul 16 06:35:46 192.168.245.4 object-replicator 15344/16450 (93.28%) partitions replicated in 69618.63s (0.22/sec, 23h remaining)
|
||||||
Jul 16 06:40:46 <redacted> object-replicator 15344/16450 (93.28%) partitions replicated in 69918.73s (0.22/sec, 23h remaining)
|
Jul 16 06:40:46 192.168.245.4 object-replicator 15344/16450 (93.28%) partitions replicated in 69918.73s (0.22/sec, 23h remaining)
|
||||||
Jul 16 06:45:46 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 70218.75s (0.22/sec, 24h remaining)
|
Jul 16 06:45:46 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 70218.75s (0.22/sec, 24h remaining)
|
||||||
Jul 16 06:50:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 70518.85s (0.22/sec, 24h remaining)
|
Jul 16 06:50:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 70518.85s (0.22/sec, 24h remaining)
|
||||||
Jul 16 06:55:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 70818.95s (0.22/sec, 25h remaining)
|
Jul 16 06:55:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 70818.95s (0.22/sec, 25h remaining)
|
||||||
Jul 16 07:00:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 71119.05s (0.22/sec, 25h remaining)
|
Jul 16 07:00:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 71119.05s (0.22/sec, 25h remaining)
|
||||||
Jul 16 07:05:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 71419.15s (0.21/sec, 26h remaining)
|
Jul 16 07:05:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 71419.15s (0.21/sec, 26h remaining)
|
||||||
Jul 16 07:10:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 71719.25s (0.21/sec, 26h remaining)
|
Jul 16 07:10:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 71719.25s (0.21/sec, 26h remaining)
|
||||||
Jul 16 07:15:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 72019.27s (0.21/sec, 27h remaining)
|
Jul 16 07:15:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 72019.27s (0.21/sec, 27h remaining)
|
||||||
Jul 16 07:20:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 72319.37s (0.21/sec, 27h remaining)
|
Jul 16 07:20:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 72319.37s (0.21/sec, 27h remaining)
|
||||||
Jul 16 07:25:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 72619.47s (0.21/sec, 28h remaining)
|
Jul 16 07:25:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 72619.47s (0.21/sec, 28h remaining)
|
||||||
Jul 16 07:30:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 72919.56s (0.21/sec, 28h remaining)
|
Jul 16 07:30:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 72919.56s (0.21/sec, 28h remaining)
|
||||||
Jul 16 07:35:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 73219.67s (0.21/sec, 29h remaining)
|
Jul 16 07:35:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 73219.67s (0.21/sec, 29h remaining)
|
||||||
Jul 16 07:40:47 <redacted> object-replicator 15348/16450 (93.30%) partitions replicated in 73519.76s (0.21/sec, 29h remaining)
|
Jul 16 07:40:47 192.168.245.4 object-replicator 15348/16450 (93.30%) partitions replicated in 73519.76s (0.21/sec, 29h remaining)
|
||||||
|
|
||||||
The above status is output every 5 minutes to ``/var/log/swift/background.log``.
|
The above status is output every 5 minutes to ``/var/log/swift/background.log``.
|
||||||
|
|
||||||
@ -921,7 +915,7 @@ of a corrupted filesystem detected by the object replicator:
|
|||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
# sudo bzgrep "Remote I/O error" /var/log/swift/background.log* |grep srv | - tail -1
|
# sudo bzgrep "Remote I/O error" /var/log/swift/background.log* |grep srv | - tail -1
|
||||||
Jul 12 03:33:30 <redacted> object-replicator STDOUT: ERROR:root:Error hashing suffix#012Traceback (most recent call last):#012 File
|
Jul 12 03:33:30 192.168.245.4 object-replicator STDOUT: ERROR:root:Error hashing suffix#012Traceback (most recent call last):#012 File
|
||||||
"/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 199, in get_hashes#012 hashes[suffix] = hash_suffix(suffix_dir,
|
"/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 199, in get_hashes#012 hashes[suffix] = hash_suffix(suffix_dir,
|
||||||
reclaim_age)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 84, in hash_suffix#012 path_contents =
|
reclaim_age)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 84, in hash_suffix#012 path_contents =
|
||||||
sorted(os.listdir(path))#012OSError: [Errno 121] Remote I/O error: '/srv/node/disk4/objects/1643763/b51'
|
sorted(os.listdir(path))#012OSError: [Errno 121] Remote I/O error: '/srv/node/disk4/objects/1643763/b51'
|
||||||
@ -996,7 +990,7 @@ to repair the problem filesystem.
|
|||||||
# sudo xfs_repair -P /dev/sde1
|
# sudo xfs_repair -P /dev/sde1
|
||||||
|
|
||||||
#. If the ``xfs_repair`` fails then it may be necessary to re-format the
|
#. If the ``xfs_repair`` fails then it may be necessary to re-format the
|
||||||
filesystem. See Procedure: fix broken XFS filesystem. If the
|
filesystem. See :ref:`fix_broken_xfs_filesystem`. If the
|
||||||
``xfs_repair`` is successful, re-enable chef using the following command
|
``xfs_repair`` is successful, re-enable chef using the following command
|
||||||
and replication should commence again.
|
and replication should commence again.
|
||||||
|
|
||||||
@ -1025,7 +1019,183 @@ load:
|
|||||||
$ uptime
|
$ uptime
|
||||||
07:44:02 up 18:22, 1 user, load average: 407.12, 406.36, 404.59
|
07:44:02 up 18:22, 1 user, load average: 407.12, 406.36, 404.59
|
||||||
|
|
||||||
.. toctree::
|
Further issues and resolutions
|
||||||
:maxdepth: 2
|
------------------------------
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
The urgency levels in each **Action** column indicate whether or
|
||||||
|
not it is required to take immediate action, or if the problem can be worked
|
||||||
|
on during business hours.
|
||||||
|
|
||||||
|
.. list-table::
|
||||||
|
:widths: 33 33 33
|
||||||
|
:header-rows: 1
|
||||||
|
|
||||||
|
* - **Scenario**
|
||||||
|
- **Description**
|
||||||
|
- **Action**
|
||||||
|
* - ``/healthcheck`` latency is high.
|
||||||
|
- The ``/healthcheck`` test does not tax the proxy very much, so any drop in value is probably related to
|
||||||
|
network issues, rather than the proxies being very busy. A very slow proxy might impact the average
|
||||||
|
number, but it would need to be very slow to shift the number that much.
|
||||||
|
- Check networks. Do a ``curl https://<ip-address>:<port>/healthcheck`` where
|
||||||
|
``ip-address`` is an individual proxy IP address.
|
||||||
|
Repeat this for every proxy server to see if you can pinpoint the problem.
|
||||||
|
|
||||||
|
Urgency: If there are other indications that your system is slow, you should treat
|
||||||
|
this as an urgent problem.
|
||||||
|
* - Swift process is not running.
|
||||||
|
- You can use ``swift-init`` status to check if swift processes are running on any
|
||||||
|
given server.
|
||||||
|
- Run this command:
|
||||||
|
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
sudo swift-init all start
|
||||||
|
|
||||||
|
Examine messages in the swift log files to see if there are any
|
||||||
|
error messages related to any of the swift processes since the time you
|
||||||
|
ran the ``swift-init`` command.
|
||||||
|
|
||||||
|
Take any corrective actions that seem necessary.
|
||||||
|
|
||||||
|
Urgency: If this only affects one server, and you have more than one,
|
||||||
|
identifying and fixing the problem can wait until business hours.
|
||||||
|
If this same problem affects many servers, then you need to take corrective
|
||||||
|
action immediately.
|
||||||
|
* - ntpd is not running.
|
||||||
|
- NTP is not running.
|
||||||
|
- Configure and start NTP.
|
||||||
|
|
||||||
|
Urgency: For proxy servers, this is vital.
|
||||||
|
|
||||||
|
* - Host clock is not syncd to an NTP server.
|
||||||
|
- Node time settings do not match NTP server time.
|
||||||
|
This may take some time to sync after a reboot.
|
||||||
|
- Assuming NTP is configured and running, you have to wait until the times sync.
|
||||||
|
* - A swift process has hundreds, to thousands of open file descriptors.
|
||||||
|
- May happen to any of the swift processes.
|
||||||
|
Known to have happened with an ``rsyslogd`` restart and where ``/tmp`` was hanging.
|
||||||
|
|
||||||
|
- Restart the swift processes on the affected node:
|
||||||
|
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
% sudo swift-init all reload
|
||||||
|
|
||||||
|
Urgency:
|
||||||
|
If known performance problem: Immediate
|
||||||
|
|
||||||
|
If system seems fine: Medium
|
||||||
|
* - A swift process is not owned by the swift user.
|
||||||
|
- If the UID of the swift user has changed, then the processes might not be
|
||||||
|
owned by that UID.
|
||||||
|
- Urgency: If this only affects one server, and you have more than one,
|
||||||
|
identifying and fixing the problem can wait until business hours.
|
||||||
|
If this same problem affects many servers, then you need to take corrective
|
||||||
|
action immediately.
|
||||||
|
* - Object account or container files not owned by swift.
|
||||||
|
- This typically happens if, during a reinstall or a re-image of a server, the UID
|
||||||
|
of the swift user was changed. The data files in the object account and container
|
||||||
|
directories are owned by the original swift UID. As a result, the current swift
|
||||||
|
user does not own these files.
|
||||||
|
- Correct the UID of the swift user to reflect that of the original UID. An alternate
|
||||||
|
action is to change the ownership of every file on all file systems. This alternate
|
||||||
|
action is often impractical and will take considerable time.
|
||||||
|
|
||||||
|
Urgency: If this only affects one server, and you have more than one,
|
||||||
|
identifying and fixing the problem can wait until business hours.
|
||||||
|
If this same problem affects many servers, then you need to take corrective
|
||||||
|
action immediately.
|
||||||
|
* - A disk drive has a high IO wait or service time.
|
||||||
|
- If high wait IO times are seen for a single disk, then the disk drive is the problem.
|
||||||
|
If most/all devices are slow, the controller is probably the source of the problem.
|
||||||
|
The controller cache may also be misconfigured, which will cause similarly long
|
||||||
|
wait or service times.
|
||||||
|
- As a first step, if your controllers have a cache, check that it is enabled and that the battery/capacitor
|
||||||
|
is working.
|
||||||
|
|
||||||
|
Second, reboot the server.
|
||||||
|
If problem persists, file a DC ticket to have the drive or controller replaced.
|
||||||
|
See :ref:`diagnose_slow_disk_drives` on how to check the drive wait or service times.
|
||||||
|
|
||||||
|
Urgency: Medium
|
||||||
|
* - The network interface is not up.
|
||||||
|
- Use the ``ifconfig`` and ``ethtool`` commands to determine the network state.
|
||||||
|
- You can try restarting the interface. However, generally the interface
|
||||||
|
(or cable) is probably broken, especially if the interface is flapping.
|
||||||
|
|
||||||
|
Urgency: If this only affects one server, and you have more than one,
|
||||||
|
identifying and fixing the problem can wait until business hours.
|
||||||
|
If this same problem affects many servers, then you need to take corrective
|
||||||
|
action immediately.
|
||||||
|
* - Network interface card (NIC) is not operating at the expected speed.
|
||||||
|
- The NIC is running at a slower speed than its nominal rated speed.
|
||||||
|
For example, it is running at 100 Mb/s and the NIC is a 1Ge NIC.
|
||||||
|
- 1. Try resetting the interface with:
|
||||||
|
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
sudo ethtool -s eth0 speed 1000
|
||||||
|
|
||||||
|
... and then run:
|
||||||
|
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
sudo lshw -class network
|
||||||
|
|
||||||
|
See if the speed goes to the expected value. Failing
|
||||||
|
that, check hardware (NIC cable/switch port).
|
||||||
|
|
||||||
|
2. If persistent, consider shutting down the server (especially if a proxy)
|
||||||
|
until the problem is identified and resolved. If you leave this server
|
||||||
|
running it can have a large impact on overall performance.
|
||||||
|
|
||||||
|
Urgency: High
|
||||||
|
* - The interface RX/TX error count is non-zero.
|
||||||
|
- A value of 0 is typical, but counts of 1 or 2 do not indicate a problem.
|
||||||
|
- 1. For low numbers (for example, 1 or 2), you can simply ignore them. Numbers in the range
|
||||||
|
3-30 probably indicate that the error count has crept up slowly over a long time.
|
||||||
|
Consider rebooting the server to remove the report from the noise.
|
||||||
|
|
||||||
|
Typically, when a cable or interface is bad, the error count goes to 400+, so that
|
||||||
|
it stands out. There may be other symptoms such as the interface going up and down or
|
||||||
|
not running at correct speed. A server with a high error count should be watched.
|
||||||
|
|
||||||
|
2. If the error count continues to climb, consider taking the server down until
|
||||||
|
it can be properly investigated. In any case, a reboot should be done to clear
|
||||||
|
the error count.
|
||||||
|
|
||||||
|
Urgency: High, if the error count is increasing.
|
||||||
|
|
||||||
|
* - In a swift log you see a message that a process has not replicated in over 24 hours.
|
||||||
|
- The replicator has not successfully completed a run in the last 24 hours.
|
||||||
|
This indicates that the replicator has probably hung.
|
||||||
|
- Use ``swift-init`` to stop and then restart the replicator process.
|
||||||
|
|
||||||
|
Urgency: Low. However if you
|
||||||
|
recently added or replaced disk drives then you should treat this urgently.
|
||||||
|
* - Container Updater has not run in 4 hour(s).
|
||||||
|
- The service may appear to be running; however, it may be hung. Examine the swift
|
||||||
|
logs to see if there are any error messages relating to the container updater. This
|
||||||
|
may explain why the container updater is not running.
|
||||||
|
- Urgency: Medium
|
||||||
|
This may have been triggered by a recent restart of the rsyslog daemon.
|
||||||
|
Restart the service with:
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
sudo swift-init <service> reload
|
||||||
|
* - Object replicator: Reports the remaining time and that time is more than 100 hours.
|
||||||
|
- Each replication cycle the object replicator writes a log message to its log
|
||||||
|
reporting statistics about the current cycle. This includes an estimate for the
|
||||||
|
remaining time needed to replicate all objects. If this time is longer than
|
||||||
|
100 hours, there is a problem with the replication process.
|
||||||
|
- Urgency: Medium
|
||||||
|
Restart the service with:
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
sudo swift-init object-replicator reload
|
||||||
|
|
||||||
|
Check that the remaining replication time is going down.
|
||||||
|
|
||||||
sec-furtherdiagnose.rst
|
|
||||||
|
@ -1,36 +0,0 @@
|
|||||||
==================
|
|
||||||
General Procedures
|
|
||||||
==================
|
|
||||||
|
|
||||||
Getting a swift account stats
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
``swift-direct`` is specific to the HPE Helion Public Cloud. Go look at
|
|
||||||
``swifty`` for an alternate, this is an example.
|
|
||||||
|
|
||||||
This procedure describes how you determine the swift usage for a given
|
|
||||||
swift account, that is the number of containers, number of objects and
|
|
||||||
total bytes used. To do this you will need the project ID.
|
|
||||||
|
|
||||||
Log onto one of the swift proxy servers.
|
|
||||||
|
|
||||||
Use swift-direct to show this accounts usage:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
$ sudo -u swift /opt/hp/swift/bin/swift-direct show AUTH_redacted-9a11-45f8-aa1c-9e7b1c7904c8
|
|
||||||
Status: 200
|
|
||||||
Content-Length: 0
|
|
||||||
Accept-Ranges: bytes
|
|
||||||
X-Timestamp: 1379698586.88364
|
|
||||||
X-Account-Bytes-Used: 67440225625994
|
|
||||||
X-Account-Container-Count: 1
|
|
||||||
Content-Type: text/plain; charset=utf-8
|
|
||||||
X-Account-Object-Count: 8436776
|
|
||||||
Status: 200
|
|
||||||
name: my_container count: 8436776 bytes: 67440225625994
|
|
||||||
|
|
||||||
This account has 1 container. That container has 8436776 objects. The
|
|
||||||
total bytes used is 67440225625994.
|
|
@ -13,67 +13,15 @@ information, suggestions or recommendations. This document is provided
|
|||||||
for reference only. We are not responsible for your use of any
|
for reference only. We are not responsible for your use of any
|
||||||
information, suggestions or recommendations contained herein.
|
information, suggestions or recommendations contained herein.
|
||||||
|
|
||||||
This document also contains references to certain tools that we use to
|
|
||||||
operate the Swift system within the HPE Helion Public Cloud.
|
|
||||||
Descriptions of these tools are provided for reference only, as the tools themselves
|
|
||||||
are not publically available at this time.
|
|
||||||
|
|
||||||
- ``swift-direct``: This is similar to the ``swiftly`` tool.
|
|
||||||
|
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 2
|
:maxdepth: 2
|
||||||
|
|
||||||
general.rst
|
|
||||||
diagnose.rst
|
diagnose.rst
|
||||||
procedures.rst
|
procedures.rst
|
||||||
maintenance.rst
|
maintenance.rst
|
||||||
troubleshooting.rst
|
troubleshooting.rst
|
||||||
|
|
||||||
Is the system up?
|
|
||||||
~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
If you have a report that Swift is down, perform the following basic checks:
|
|
||||||
|
|
||||||
#. Run swift functional tests.
|
|
||||||
|
|
||||||
#. From a server in your data center, use ``curl`` to check ``/healthcheck``.
|
|
||||||
|
|
||||||
#. If you have a monitoring system, check your monitoring system.
|
|
||||||
|
|
||||||
#. Check on your hardware load balancers infrastructure.
|
|
||||||
|
|
||||||
#. Run swift-recon on a proxy node.
|
|
||||||
|
|
||||||
Run swift function tests
|
|
||||||
------------------------
|
|
||||||
|
|
||||||
We would recommend that you set up your function tests against your production
|
|
||||||
system.
|
|
||||||
|
|
||||||
A script for running the function tests is located in ``swift/.functests``.
|
|
||||||
|
|
||||||
|
|
||||||
External monitoring
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
- We use pingdom.com to monitor the external Swift API. We suggest the
|
|
||||||
following:
|
|
||||||
|
|
||||||
- Do a GET on ``/healthcheck``
|
|
||||||
|
|
||||||
- Create a container, make it public (x-container-read:
|
|
||||||
.r\*,.rlistings), create a small file in the container; do a GET
|
|
||||||
on the object
|
|
||||||
|
|
||||||
Reference information
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Reference: Swift startup/shutdown
|
|
||||||
---------------------------------
|
|
||||||
|
|
||||||
- Use reload - not stop/start/restart.
|
|
||||||
|
|
||||||
- Try to roll sets of servers (especially proxy) in groups of less
|
|
||||||
than 20% of your servers.
|
|
||||||
|
|
||||||
|
@ -54,8 +54,8 @@ system. Rules-of-thumb for 'good' recon output are:
|
|||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
\-> [http://<redacted>.29:6000/recon/load:] <urlopen error [Errno 111] ECONNREFUSED>
|
-> [http://<redacted>.29:6000/recon/load:] <urlopen error [Errno 111] ECONNREFUSED>
|
||||||
\-> [http://<redacted>.31:6000/recon/load:] <urlopen error timed out>
|
-> [http://<redacted>.31:6000/recon/load:] <urlopen error timed out>
|
||||||
|
|
||||||
- That could be okay or could require investigation.
|
- That could be okay or could require investigation.
|
||||||
|
|
||||||
@ -154,18 +154,18 @@ Running recon shows some async pendings:
|
|||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
bob@notso:~/swift-1.4.4/swift$ ssh \\-q <redacted>.132.7 sudo swift-recon \\-alr
|
bob@notso:~/swift-1.4.4/swift$ ssh -q <redacted>.132.7 sudo swift-recon -alr
|
||||||
===============================================================================
|
===============================================================================
|
||||||
\[2012-03-14 17:25:55\\] Checking async pendings on 384 hosts...
|
[2012-03-14 17:25:55] Checking async pendings on 384 hosts...
|
||||||
Async stats: low: 0, high: 23, avg: 8, total: 3356
|
Async stats: low: 0, high: 23, avg: 8, total: 3356
|
||||||
===============================================================================
|
===============================================================================
|
||||||
\[2012-03-14 17:25:55\\] Checking replication times on 384 hosts...
|
[2012-03-14 17:25:55] Checking replication times on 384 hosts...
|
||||||
\[Replication Times\\] shortest: 1.49303831657, longest: 39.6982825994, avg: 4.2418222066
|
[Replication Times] shortest: 1.49303831657, longest: 39.6982825994, avg: 4.2418222066
|
||||||
===============================================================================
|
===============================================================================
|
||||||
\[2012-03-14 17:25:56\\] Checking load avg's on 384 hosts...
|
[2012-03-14 17:25:56] Checking load avg's on 384 hosts...
|
||||||
\[5m load average\\] lowest: 2.35, highest: 8.88, avg: 4.45911458333
|
[5m load average] lowest: 2.35, highest: 8.88, avg: 4.45911458333
|
||||||
\[15m load average\\] lowest: 2.41, highest: 9.11, avg: 4.504765625
|
[15m load average] lowest: 2.41, highest: 9.11, avg: 4.504765625
|
||||||
\[1m load average\\] lowest: 1.95, highest: 8.56, avg: 4.40588541667
|
[1m load average] lowest: 1.95, highest: 8.56, avg: 4.40588541667
|
||||||
===============================================================================
|
===============================================================================
|
||||||
|
|
||||||
Why? Running recon again with -av swift (not shown here) tells us that
|
Why? Running recon again with -av swift (not shown here) tells us that
|
||||||
@ -231,7 +231,7 @@ Procedure
|
|||||||
This procedure should be run three times, each time specifying the
|
This procedure should be run three times, each time specifying the
|
||||||
appropriate ``*.builder`` file.
|
appropriate ``*.builder`` file.
|
||||||
|
|
||||||
#. Determine whether all three nodes are different Swift zones by
|
#. Determine whether all three nodes are in different Swift zones by
|
||||||
running the ring builder on a proxy node to determine which zones
|
running the ring builder on a proxy node to determine which zones
|
||||||
the storage nodes are in. For example:
|
the storage nodes are in. For example:
|
||||||
|
|
||||||
@ -253,10 +253,10 @@ Procedure
|
|||||||
have any ring partitions in common; there is little/no data
|
have any ring partitions in common; there is little/no data
|
||||||
availability risk if all three nodes are down.
|
availability risk if all three nodes are down.
|
||||||
|
|
||||||
#. If the nodes are in three distinct Swift zonesit is necessary to
|
#. If the nodes are in three distinct Swift zones it is necessary to check
|
||||||
whether the nodes have ring partitions in common. Run ``swift-ring``
|
whether the nodes have ring partitions in common. Run ``swift-ring``
|
||||||
builder again, this time with the ``list_parts`` option and specify
|
builder again, this time with the ``list_parts`` option and specify
|
||||||
the nodes under consideration. For example (all on one line):
|
the nodes under consideration. For example:
|
||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
@@ -302,12 +302,12 @@ Procedure

   .. code::

      % sudo swift-ring-builder /etc/swift/object.builder list_parts <redacted>.8 <redacted>.15 <redacted>.72.2 | grep "3$" | wc -l

      30

#. In this case the nodes have 30 out of a total of 2097152 partitions
   in common (about 0.001%), so the risk is small but nonzero.
   Recall that a partition is simply a portion of the ring mapping
   space, not actual data. So having partitions in common is a necessary
   but not sufficient condition for data unavailability.
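   As a quick sanity check of the percentage quoted above (a minimal sketch;
   any Python 3 interpreter on the proxy will do):

   .. code::

      $ python3 -c 'print(30 / 2097152 * 100)'
      0.001430511474609375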
@@ -320,3 +320,11 @@ Procedure

If three nodes that have 3 partitions in common are all down, there is
a nonzero probability that data are unavailable and we should work to
bring some or all of the nodes up ASAP.

Swift startup/shutdown
~~~~~~~~~~~~~~~~~~~~~~

- Use reload - not stop/start/restart.

- Try to roll sets of servers (especially proxy) in groups of less
  than 20% of your servers.
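For example, when rolling the proxy tier, a graceful reload on each node in the
current group is enough (a minimal sketch; substitute whichever service you are
rolling):

.. code::

   $ sudo swift-init proxy reload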
@@ -2,6 +2,8 @@
Software configuration procedures
=================================

.. _fix_broken_gpt_table:

Fix broken GPT table (broken disk partition)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -102,6 +104,8 @@ Fix broken GPT table (broken disk partition)

   $ sudo aptitude remove gdisk

.. _fix_broken_xfs_filesystem:

Procedure: Fix broken XFS filesystem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -165,7 +169,7 @@ Procedure: Fix broken XFS filesystem

.. code::

   $ sudo dd if=/dev/zero of=/dev/sdb2 bs=$((1024*1024)) count=1
   1+0 records in
   1+0 records out
   1048576 bytes (1.0 MB) copied, 0.00480617 s, 218 MB/s
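After zeroing the start of the partition, the filesystem is normally recreated
and remounted before returning the device to service. A minimal sketch, assuming
``/dev/sdb2`` is the affected device and it has an entry in ``/etc/fstab`` (your
deployment's ``mkfs.xfs`` options may differ):

.. code::

   $ sudo mkfs.xfs -f /dev/sdb2
   $ sudo mount -a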
@@ -187,129 +191,173 @@ Procedure: Fix broken XFS filesystem

   $ mount

.. _checking_if_account_ok:

Procedure: Checking if an account is okay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

   ``swift-direct`` is only available in the HPE Helion Public Cloud.
   Use ``swiftly`` as an alternative (or use ``swift-get-nodes`` as explained
   here).

You must know the tenant/project ID. You can check whether the account is okay
as follows from a proxy.
.. code::

   $ sudo -u swift /opt/hp/swift/bin/swift-direct show AUTH_<project-id>

The response will either be similar to a swift list of the account's
containers, or an error indicating that the resource could not be found.

Alternatively, you can use ``swift-get-nodes`` to find the account database
files. Run the following on a proxy:

.. code::

   $ sudo swift-get-nodes /etc/swift/account.ring.gz AUTH_<project-id>

The response will print ``curl`` and ``ssh`` commands that list the replicated
account databases. Use those commands to check the status and existence of
the account.
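As a sketch of what that check looks like, a ``HEAD`` request against one of
the account servers printed by ``swift-get-nodes`` (the IP, device and partition
below are placeholders) returns ``204 No Content`` when the account exists:

.. code::

   $ curl -I -XHEAD "http://<storage-ip>:6002/<device>/<partition>/AUTH_<project-id>"
   HTTP/1.1 204 No Content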
Procedure: Getting swift account stats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

   ``swift-direct`` is specific to the HPE Helion Public Cloud. Use
   ``swiftly`` as an alternative, or use ``swift-get-nodes`` as explained
   in :ref:`checking_if_account_ok`.

This procedure describes how to determine the Swift usage of a given
account, that is, the number of containers, the number of objects and the
total bytes used. To do this you will need the project ID.

Log onto one of the swift proxy servers.

Use ``swift-direct`` to show the account's usage:

.. code::

   $ sudo -u swift /opt/hp/swift/bin/swift-direct show AUTH_<project-id>
   Status: 200
   Content-Length: 0
   Accept-Ranges: bytes
   X-Timestamp: 1379698586.88364
   X-Account-Bytes-Used: 67440225625994
   X-Account-Container-Count: 1
   Content-Type: text/plain; charset=utf-8
   X-Account-Object-Count: 8436776
   Status: 200
   name: my_container count: 8436776 bytes: 67440225625994

This account has 1 container. That container has 8436776 objects, and the
total bytes used is 67440225625994.
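If you have the user's credentials exported as the usual ``OS_*`` environment
variables, the standard ``swift stat`` client command reports the same totals;
a sketch of the relevant lines of its output:

.. code::

   $ swift stat
      Containers: 1
         Objects: 8436776
           Bytes: 67440225625994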
Procedure: Revive a deleted account
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Swift accounts are normally not recreated. If a tenant/project is deleted,
the account can then be deleted. If the user wishes to use Swift again,
the normal process is to create a new tenant/project -- and hence a
new Swift account.

However, if the Swift account is deleted but the tenant/project is not
deleted from Keystone, the user can no longer access the account. This
is because the account is marked deleted in Swift. You can revive
the account as described in this process.
.. note::

   The containers and objects in the "old" account cannot be listed
   anymore. In addition, if the Account Reaper process has not
   finished reaping the containers and objects in the "old" account, these
   are effectively orphaned and it is virtually impossible to find and delete
   them to free up disk space.

The solution is to delete the account database files and
re-create the account as follows:

#. You must know the tenant/project ID. The account name is AUTH_<project-id>.
   In this example, the tenant/project is ``4ebe3039674d4864a11fe0864ae4d905``,
   so the Swift account name is ``AUTH_4ebe3039674d4864a11fe0864ae4d905``.
#. Use ``swift-get-nodes`` to locate the account's database files (on three
   servers). The output below has been truncated so we can focus on the
   important pieces of data:

   .. code::

      $ sudo swift-get-nodes /etc/swift/account.ring.gz AUTH_4ebe3039674d4864a11fe0864ae4d905
      ...
      curl -I -XHEAD "http://192.168.245.5:6002/disk1/3934/AUTH_4ebe3039674d4864a11fe0864ae4d905"
      curl -I -XHEAD "http://192.168.245.3:6002/disk0/3934/AUTH_4ebe3039674d4864a11fe0864ae4d905"
      curl -I -XHEAD "http://192.168.245.4:6002/disk1/3934/AUTH_4ebe3039674d4864a11fe0864ae4d905"
      ...
      Use your own device location of servers:
      such as "export DEVICE=/srv/node"
      ssh 192.168.245.5 "ls -lah ${DEVICE:-/srv/node*}/disk1/accounts/3934/052/f5ecf8b40de3e1b0adb0dbe576874052"
      ssh 192.168.245.3 "ls -lah ${DEVICE:-/srv/node*}/disk0/accounts/3934/052/f5ecf8b40de3e1b0adb0dbe576874052"
      ssh 192.168.245.4 "ls -lah ${DEVICE:-/srv/node*}/disk1/accounts/3934/052/f5ecf8b40de3e1b0adb0dbe576874052"
      ...
      note: `/srv/node*` is used as default value of `devices`, the real value is set in the config file on each storage node.
#. Before proceeding, check that the account is really deleted by using curl.
   Execute the commands printed by ``swift-get-nodes``. For example:

   .. code::

      $ curl -I -XHEAD "http://192.168.245.5:6002/disk1/3934/AUTH_4ebe3039674d4864a11fe0864ae4d905"
      HTTP/1.1 404 Not Found
      Content-Length: 0
      Content-Type: text/html; charset=utf-8

   Repeat for the other two servers (192.168.245.3 and 192.168.245.4).
   A ``404 Not Found`` indicates that the account is deleted (or never existed).

   If you get a ``204 No Content`` response, do **not** proceed.
#. Use the ssh commands printed by ``swift-get-nodes`` to check whether database
   files exist. For example:

   .. code::

      $ ssh 192.168.245.5 "ls -lah ${DEVICE:-/srv/node*}/disk1/accounts/3934/052/f5ecf8b40de3e1b0adb0dbe576874052"
      total 20K
      drwxr-xr-x 2 swift swift 110 Mar 9 10:22 .
      drwxr-xr-x 3 swift swift 45 Mar 9 10:18 ..
      -rw------- 1 swift swift 17K Mar 9 10:22 f5ecf8b40de3e1b0adb0dbe576874052.db
      -rw-r--r-- 1 swift swift 0 Mar 9 10:22 f5ecf8b40de3e1b0adb0dbe576874052.db.pending
      -rwxr-xr-x 1 swift swift 0 Mar 9 10:18 .lock

   Repeat for the other two servers (192.168.245.3 and 192.168.245.4).

   If no files exist, no further action is needed.

#. Stop Swift processes on all nodes listed by ``swift-get-nodes``
   (in this example, that is 192.168.245.3, 192.168.245.4 and 192.168.245.5).

#. We recommend you make backup copies of the database files.

#. Delete the database files. For example:

   .. code::

      $ ssh 192.168.245.5
      $ cd /srv/node/disk1/accounts/3934/052/f5ecf8b40de3e1b0adb0dbe576874052
      $ sudo rm *

   Repeat for the other two servers (192.168.245.3 and 192.168.245.4).

#. Restart Swift on all three servers (a combined sketch of the stop, backup
   and restart steps appears after this procedure).

At this stage, the account is fully deleted. If you enable the auto-create option, the
next time the user attempts to access the account, the account will be created.
You may also use ``swiftly`` to recreate the account.
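The stop, backup and restart steps above are deliberately generic. A minimal
sketch for one of the storage nodes in this example (the backup location is
arbitrary, and the exact ``swift-init`` targets depend on which services run
on the node):

.. code::

   $ ssh 192.168.245.5
   $ sudo swift-init all stop
   $ sudo cp -a /srv/node/disk1/accounts/3934/052/f5ecf8b40de3e1b0adb0dbe576874052 /var/tmp/
   # ... delete the database files as shown in the procedure above ...
   $ sudo swift-init all start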
Procedure: Temporarily stop load balancers from directing traffic to a proxy server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -319,7 +367,7 @@ follows. This can be useful when a proxy is misbehaving but you need

Swift running to help diagnose the problem. By removing it from the load
balancers, customers are not impacted by the misbehaving proxy.

#. Ensure that in /etc/swift/proxy-server.conf the ``disable_path`` variable is set to
   ``/etc/swift/disabled-by-file``.

#. Log onto the proxy node.
@@ -346,13 +394,10 @@ balancers, customers are not impacted by the misbehaving proxy.

   sudo swift-init proxy start

It works because the healthcheck middleware looks for ``/etc/swift/disabled-by-file``.
If that file exists, the middleware returns a 503 error instead of 200/OK, and the load
balancer should stop sending traffic to the proxy.
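A quick way to confirm from the node itself that the proxy is now reporting
itself unhealthy (a sketch; adjust the port to your proxy's ``bind_port``):

.. code::

   $ curl -i http://localhost:8080/healthcheck
   HTTP/1.1 503 Service Unavailable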
Procedure: Ad-Hoc disk performance test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
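As a very rough, non-destructive check of a drive's sequential read rate
(a sketch; ``/dev/sdb`` is illustrative):

.. code::

   $ sudo dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct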
@@ -1,177 +0,0 @@
==============================
Further issues and resolutions
==============================

.. note::

   The urgency levels in each **Action** column indicate whether or
   not it is required to take immediate action, or if the problem can be worked
   on during business hours.

.. list-table::
   :widths: 33 33 33
   :header-rows: 1

   * - **Scenario**
     - **Description**
     - **Action**
   * - ``/healthcheck`` latency is high.
     - The ``/healthcheck`` test does not tax the proxy very much, so any drop in value is
       probably related to network issues, rather than the proxies being very busy. A very
       slow proxy might impact the average number, but it would need to be very slow to shift
       the number that much.
     - Check networks. Do a ``curl https://<ip-address>/healthcheck``, where ``<ip-address>``
       is an individual proxy IP address, to see if you can pinpoint a problem in the network.

       Urgency: If there are other indications that your system is slow, you should treat
       this as an urgent problem.
   * - A swift process is not running.
     - You can use ``swift-init status`` to check if swift processes are running on any
       given server.
     - Run this command:

       .. code::

          sudo swift-init all start

       Examine messages in the swift log files to see if there are any
       error messages related to any of the swift processes since the time you
       ran the ``swift-init`` command.

       Take any corrective actions that seem necessary.

       Urgency: If this only affects one server, and you have more than one,
       identifying and fixing the problem can wait until business hours.
       If this same problem affects many servers, then you need to take corrective
       action immediately.
   * - ntpd is not running.
     - NTP is not running.
     - Configure and start NTP.

       Urgency: For proxy servers, this is vital.
   * - Host clock is not synced to an NTP server.
     - The node's time settings do not match the NTP server time.
       This may take some time to sync after a reboot.
     - Assuming NTP is configured and running, you have to wait until the times sync.
   * - A swift process has hundreds to thousands of open file descriptors.
     - May happen to any of the swift processes.
       Known to have happened with an ``rsyslogd`` restart and where ``/tmp`` was hanging.
     - Restart the swift processes on the affected node:

       .. code::

          % sudo swift-init all reload

       Urgency:
       If known performance problem: Immediate.
       If system seems fine: Medium.
   * - A swift process is not owned by the swift user.
     - If the UID of the swift user has changed, then the processes might not be
       owned by that UID.
     - Urgency: If this only affects one server, and you have more than one,
       identifying and fixing the problem can wait until business hours.
       If this same problem affects many servers, then you need to take corrective
       action immediately.
   * - Object, account or container files are not owned by swift.
     - This typically happens if, during a reinstall or a re-image of a server, the UID
       of the swift user was changed. The data files in the object, account and container
       directories are owned by the original swift UID. As a result, the current swift
       user does not own these files.
     - Correct the UID of the swift user to reflect that of the original UID. An alternative
       action is to change the ownership of every file on all file systems. This alternative
       action is often impractical and will take considerable time.

       Urgency: If this only affects one server, and you have more than one,
       identifying and fixing the problem can wait until business hours.
       If this same problem affects many servers, then you need to take corrective
       action immediately.
   * - A disk drive has a high IO wait or service time.
     - If high IO wait times are seen for a single disk, then the disk drive is the problem.
       If most/all devices are slow, the controller is probably the source of the problem.
       The controller cache may also be misconfigured, which will cause similar long
       wait or service times.
     - As a first step, if your controllers have a cache, check that it is enabled and that
       its battery/capacitor is working.

       Second, reboot the server.
       If the problem persists, file a DC ticket to have the drive or controller replaced.
       See `Diagnose: Slow disk devices` on how to check the drive wait or service times.

       Urgency: Medium
   * - The network interface is not up.
     - Use the ``ifconfig`` and ``ethtool`` commands to determine the network state.
     - You can try restarting the interface. However, generally the interface
       (or cable) is probably broken, especially if the interface is flapping.

       Urgency: If this only affects one server, and you have more than one,
       identifying and fixing the problem can wait until business hours.
       If this same problem affects many servers, then you need to take corrective
       action immediately.
   * - Network interface card (NIC) is not operating at the expected speed.
     - The NIC is running at a slower speed than its nominal rated speed.
       For example, it is running at 100 Mb/s and the NIC is a 1Ge NIC.
     - 1. Try resetting the interface with:

          .. code::

             sudo ethtool -s eth0 speed 1000

          ... and then run:

          .. code::

             sudo lshw -class network

          See if the speed goes to the expected value. Failing
          that, check the hardware (NIC cable/switch port).

       2. If the problem persists, consider shutting down the server (especially if a proxy)
          until the problem is identified and resolved. If you leave this server
          running it can have a large impact on overall performance.

       Urgency: High
   * - The interface RX/TX error count is non-zero.
     - A value of 0 is typical, but counts of 1 or 2 do not indicate a problem.
     - 1. For low numbers (for example, 1 or 2), you can simply ignore them. Numbers in the
          range 3-30 probably indicate that the error count has crept up slowly over a long
          time. Consider rebooting the server to remove the report from the noise.

          Typically, when a cable or interface is bad, the error count goes to 400+ so that
          it stands out. There may be other symptoms such as the interface going up and down
          or not running at the correct speed. A server with a high error count should be
          watched.

       2. If the error count continues to climb, consider taking the server down until
          it can be properly investigated. In any case, a reboot should be done to clear
          the error count.

       Urgency: High, if the error count is increasing.
   * - In a swift log you see a message that a process has not replicated in over 24 hours.
     - The replicator has not successfully completed a run in the last 24 hours.
       This indicates that the replicator has probably hung.
     - Use ``swift-init`` to stop and then restart the replicator process.

       Urgency: Low; however, if you recently added or replaced disk drives
       then you should treat this urgently.
   * - Container Updater has not run in 4 hour(s).
     - The service may appear to be running; however, it may be hung. Examine the swift
       logs to see if there are any error messages relating to the container updater. This
       may potentially explain why the container updater is not running.
     - Urgency: Medium

       This may have been triggered by a recent restart of the rsyslog daemon.
       Restart the service with:

       .. code::

          sudo swift-init <service> reload
   * - Object replicator: Reports the remaining time and that time is more than 100 hours.
     - Each replication cycle the object replicator writes a log message to its log
       reporting statistics about the current cycle. This includes an estimate for the
       remaining time needed to replicate all objects. If this time is longer than
       100 hours, there is a problem with the replication process.
     - Urgency: Medium

       Restart the service with:

       .. code::

          sudo swift-init object-replicator reload

       Check that the remaining replication time is going down.
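Several of the checks in the table above lean on standard tools; a quick sketch
of the process-status and interface checks (assuming the data-plane interface
is ``eth0``):

.. code::

   $ sudo swift-init all status
   $ ifconfig eth0
   $ sudo ethtool eth0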
@@ -18,16 +18,14 @@ files. For example:

.. code::

   $ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
     -w <redacted>.68.[4-11,132-139 4-11,132-139],<redacted>.132.[4-11,132-139] \
     'sudo bzgrep -w AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log*' | dshbak -c
   .
   .
   ----------------
   <redacted>.132.6
   ----------------
   Feb 29 08:51:57 sw-aw2az2-proxy011 proxy-server <redacted>.16.132
   <redacted>.66.8 29/Feb/2012/08/51/57 GET /v1.0/AUTH_redacted-4962-4692-98fb-52ddda82a5af
   /%3Fformat%3Djson HTTP/1.0 404 - - <REDACTED>_4f4d50c5e4b064d88bd7ab82 - - -
@@ -37,39 +35,36 @@ This shows a ``GET`` operation on the user's account.

.. note::

   The HTTP status returned is 404, Not Found, rather than 500 as reported by the user.

Using the transaction ID, ``tx429fc3be354f434ab7f9c6c4206c1dc3``, you can
search the swift object servers' log files for this transaction ID:

.. code::

   $ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
     -w <redacted>.72.[4-67|4-67],<redacted>.[4-67|4-67],<redacted>.[4-67|4-67],<redacted>.204.[4-131] \
     'sudo bzgrep tx429fc3be354f434ab7f9c6c4206c1dc3 /var/log/swift/server.log*' | dshbak -c
   .
   .
   ----------------
   <redacted>.72.16
   ----------------
   Feb 29 08:51:57 sw-aw2az1-object013 account-server <redacted>.132.6 - -
   [29/Feb/2012:08:51:57 +0000] "GET /disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
   404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-"
   0.0016 ""
   ----------------
   <redacted>.31
   ----------------
   Feb 29 08:51:57 node-az2-object060 account-server <redacted>.132.6 - -
   [29/Feb/2012:08:51:57 +0000] "GET /disk6/198875/AUTH_redacted-4962-
   4692-98fb-52ddda82a5af" 404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0011 ""
   ----------------
   <redacted>.204.70
   ----------------
   Feb 29 08:51:57 sw-aw2az3-object0067 account-server <redacted>.132.6 - -
   [29/Feb/2012:08:51:57 +0000] "GET /disk6/198875/AUTH_redacted-4962-
@@ -79,10 +74,10 @@ search the swift object servers' log files for this transaction ID:

These are the 3 GET operations to the 3 different object servers that hold the 3
replicas of this user's account. Each ``GET`` returns an HTTP status of 404,
Not Found.

Next, use the ``swift-get-nodes`` command to determine exactly where the
user's account data is stored:

.. code::
@@ -114,23 +109,23 @@ user's account data is stored:

   curl -I -XHEAD "http://<redacted>.72.27:6002/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" # [Handoff]

   ssh <redacted>.31 "ls -lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
   ssh <redacted>.204.70 "ls -lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
   ssh <redacted>.72.16 "ls -lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
   ssh <redacted>.204.64 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
   ssh <redacted>.26 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
   ssh <redacted>.72.27 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]

Check each of the primary servers, <redacted>.31, <redacted>.204.70 and <redacted>.72.16, for
this user's account. For example, on <redacted>.72.16:

.. code::

   $ ls -lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/
   total 1.0M
   drwxrwxrwx 2 swift swift 98 2012-02-23 14:49 .
   drwxrwxrwx 3 swift swift 45 2012-02-03 23:28 ..
   -rw------- 1 swift swift 15K 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db
   -rw-rw-rw- 1 swift swift 0 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db.pending

So this user's account db, an SQLite db, is present. Use sqlite to
@@ -155,7 +150,7 @@ check out the account:

   status_changed_at = 1330001026.00514
   metadata =

.. note::

   The status is ``DELETED``. So this account was deleted. This explains
   why the GET operations are returning 404, Not Found. Check the account
@@ -174,14 +169,14 @@ server logs:

.. code::

   $ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
     -w <redacted>.68.[4-11,132-139 4-11,132-139],<redacted>.132.[4-11,132-139|4-11,132-139] \
     'sudo bzgrep AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log* \
     | grep -w DELETE | awk "{print \$3,\$10,\$12}"' | dshbak -c
   .
   .
   Feb 23 12:43:46 sw-aw2az2-proxy001 proxy-server <redacted> <redacted>.66.7 23/Feb/2012/12/43/46 DELETE /v1.0/AUTH_redacted-4962-4692-98fb-
   52ddda82a5af/ HTTP/1.0 204 - Apache-HttpClient/4.1.2%20%28java%201.5%29 <REDACTED>_4f458ee4e4b02a869c3aad02 - - -
   tx4471188b0b87406899973d297c55ab53 - 0.0086

From this you can see the operation that resulted in the account being deleted.
@@ -252,8 +247,8 @@ Finally, use ``swift-direct`` to delete the container.

Procedure: Decommissioning swift nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Should Swift nodes need to be decommissioned (for example, where they are being
re-purposed), it is very important to follow these steps.

#. In the case of object servers, follow the procedure for removing
   the node from the rings (see the sketch below).
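That ring-removal step is usually a matter of removing the node's devices from
each builder and rebalancing. A minimal sketch for the object ring (the IP
address is illustrative; repeat for the account and container builders, then
redistribute the updated ``*.ring.gz`` files):

.. code::

   $ sudo swift-ring-builder /etc/swift/object.builder remove 192.168.245.99
   $ sudo swift-ring-builder /etc/swift/object.builder rebalance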
|
Loading…
x
Reference in New Issue
Block a user