Merge "[User Guides] Object Storage chapter edits"
commit 3214af2522
doc/admin-guide-cloud/source

@@ -2,8 +2,11 @@

Object Storage monitoring
=========================

.. note::

   This section was excerpted from a blog post by `Darrell
   Bishop <http://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd>`_ and
   has since been edited.

An OpenStack Object Storage cluster is a collection of many daemons that
work together across many nodes. With so many different components, you
@@ -11,30 +14,22 @@ must be able to tell what is going on inside the cluster. Tracking
server-level meters like CPU utilization, load, memory consumption, disk
usage and utilization, and so on is necessary, but not sufficient.

What are the different daemons doing on each server? What is the volume
of object replication on node8? How long is it taking? Are there errors?
If so, when did they happen?

In such a complex ecosystem, you can use multiple approaches to get the
answers to these questions. This section describes several approaches.

Swift Recon
~~~~~~~~~~~

The Swift Recon middleware (see `Cluster Telemetry and Monitoring
<http://swift.openstack.org/admin_guide.html#cluster-telemetry-and-monitoring>`_)
provides general machine statistics, such as load average, socket
statistics, ``/proc/meminfo`` contents, as well as Swift-specific meters:

- The ``MD5`` sum of each ring file.

- The most recent object replication time.

- Count of each type of quarantined file: account, container, or
  object.

- Count of "async_pendings" (deferred container updates) on disk.

Swift Recon is middleware that is installed in the object servers
pipeline and takes one required option: a local cache directory. To
@@ -43,24 +38,23 @@ each object server. You access data by either sending HTTP requests
directly to the object server or using the ``swift-recon`` command-line
client.
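
For example, a minimal sketch of an ``object-server.conf`` with Recon
enabled might look like the following; the pipeline contents and cache
path are illustrative and depend on your deployment:

.. code-block:: ini

   # Illustrative object-server.conf fragment.
   [pipeline:main]
   pipeline = healthcheck recon object-server

   [filter:recon]
   use = egg:swift#recon
   # The one required option: the local directory where Recon caches
   # the statistics it collects.
   recon_cache_path = /var/cache/swift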

Swift Recon offers good Object Storage cluster statistics, but the general
server meters overlap with existing server monitoring systems. To get
the Swift-specific meters into a monitoring system, they must be polled.
Swift Recon acts as a middleware meters collector. The process that feeds
meters to your statistics system, such as ``collectd`` or ``gmond``,
probably already runs on the storage node, so you can choose either to
talk to Swift Recon or to collect the meters directly.

Swift-Informant
~~~~~~~~~~~~~~~

The Swift-Informant middleware (see
`swift-informant <https://github.com/pandemicsyn/swift-informant>`_),
developed by Florian Hines, gives real-time visibility into Object Storage
client requests. It sits in the pipeline for the proxy server, and after
each request to the proxy server it sends three meters to a StatsD server:

- A counter increment for a meter like ``obj.GET.200`` or
  ``cont.PUT.404``.
@@ -77,26 +71,24 @@ http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/):

- A counter increase by the bytes transferred for a meter like
  ``tfer.obj.PUT.201``.
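
As a sketch of how this kind of middleware is typically enabled, the
fragment below adds an ``informant`` filter to the front of the proxy
pipeline. The filter name, egg entry point, and option names are
illustrative assumptions, not the project's documented configuration;
check the project's README for the real ones:

.. code-block:: ini

   # Hypothetical proxy-server.conf fragment; names are illustrative.
   [pipeline:main]
   pipeline = informant cache tempauth proxy-server

   [filter:informant]
   use = egg:informant#informant
   # Where the StatsD daemon listens (assumed option names).
   statsd_host = 127.0.0.1
   statsd_port = 8125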

With the timing meters, you can get a feel for the quality of service
clients are experiencing, and the counters show the volume of each
combination of request server type, command, and response code.
Swift-Informant requires no change to core Object Storage code because
it is implemented as middleware. However, it gives no insight into the
workings of the cluster past the proxy server. If the responsiveness of
one storage node degrades, you can only see that some of the requests
are bad, either as high latency or error status codes. You do not know
exactly why or where that request tried to go; the container server in
question may have been on a good node while the object server was on a
different, poorly performing node.

Statsdlog
~~~~~~~~~

The `Statsdlog <https://github.com/pandemicsyn/statsdlog>`_ project, also
by Florian Hines, increments StatsD counters based on logged events. Like
Swift-Informant, it is non-intrusive, but statsdlog can track events from
all Object Storage daemons, not just the proxy server. The daemon listens
to a UDP stream of syslog messages, and StatsD counters are incremented
when a log line matches a regular expression. Meter names are mapped to
regex match patterns in a JSON file, allowing flexible configuration of
what meters are extracted from the log stream.
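
To make the mapping idea concrete, here is a small self-contained Python
sketch that matches meter names to regex patterns loaded from JSON. The
key names and patterns are hypothetical illustrations, not statsdlog's
actual schema:

.. code-block:: python

   import json
   import re

   # Hypothetical meter-name-to-regex mapping, in the spirit of the JSON
   # file described above (key names are illustrative).
   patterns = json.loads('''
   [
       {"name": "object-server.timeout",
        "pattern": "Timeout talking to object server"},
       {"name": "container-server.404",
        "pattern": "container-server .* 404"}
   ]
   ''')

   def meters_for(log_line):
       # Return the meter names whose regex matches the syslog line.
       return [p["name"] for p in patterns
               if re.search(p["pattern"], log_line)]

   print(meters_for("proxy-server ERROR Timeout talking to object server"))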

@@ -123,7 +115,7 @@ Swift StatsD logging

StatsD (see
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/)
was designed for application code to be deeply instrumented. Meters are
sent in real-time by the code that just noticed or did something. The
overhead of sending a meter is extremely low: a ``sendto`` of one UDP
packet. If that overhead is still too high, the StatsD client library
@@ -133,10 +125,10 @@ actual number when flushing meters upstream.

To avoid the problems inherent with middleware-based monitoring and
after-the-fact log processing, the sending of StatsD meters is
integrated into Object Storage itself. The submitted change set (see
`<https://review.openstack.org/#change,6058>`_) currently reports 124 meters
across 15 Object Storage daemons and the tempauth middleware. Details of
the meters tracked are in the `Administrator's
Guide <http://docs.openstack.org/developer/swift/admin_guide.html>`_.

The sending of meters is integrated with the logging framework. To
enable, configure ``log_statsd_host`` in the relevant config file. You
@@ -144,7 +136,7 @@ can also specify the port and a default sample rate. The specified
default sample rate is used unless a specific call to a statsd logging
method (see the list below) overrides it. Currently, no logging calls
override the sample rate, but it is conceivable that some meters may
require accuracy (sample_rate == 1) while others may not.

.. code-block:: ini

@@ -184,63 +176,53 @@ in ``self.logger``, has these new methods:

   Convenience method to record a timing meter whose value is "now"
   minus an existing timestamp.

.. note::

   These logging methods may safely be called anywhere you have a
   logger object. If StatsD logging has not been configured, the methods
   are no-ops. This avoids messy conditional logic each place a meter is
   recorded. These example usages show the new logging methods:

.. code-block:: python

   # swift/obj/replicator.py
   def update(self, job):
       # ...
       begin = time.time()
       try:
           hashed, local_hash = tpool.execute(tpooled_get_hashes, job['path'],
               do_listdir=(self.replication_count % 10) == 0,
               reclaim_age=self.reclaim_age)
           # See tpooled_get_hashes "Hack".
           if isinstance(hashed, BaseException):
               raise hashed
           self.suffix_hash += hashed
           self.logger.update_stats('suffix.hashes', hashed)
       # ...
       finally:
           self.partition_times.append(time.time() - begin)
           self.logger.timing_since('partition.update.timing', begin)

.. code-block:: python

   # swift/container/updater.py
   def process_container(self, dbfile):
       # ...
       start_time = time.time()
       # ...
           for event in events:
               if 200 <= event.wait() < 300:
                   successes += 1
               else:
                   failures += 1
           if successes > failures:
               self.logger.increment('successes')
               # ...
           else:
               self.logger.increment('failures')
               # ...
           # Only track timing data for attempted updates:
           self.logger.timing_since('timing', start_time)
       else:
           self.logger.increment('no_changes')
           self.no_changes += 1

The development team of StatsD wanted to use the
`pystatsd <https://github.com/sivy/py-statsd>`_ client library (not to
be confused with a `similar-looking
project <https://github.com/sivy/py-statsd>`_ also hosted on GitHub),
but the released version on PyPI was missing two desired features the
latest version in GitHub had: the ability to configure a meters prefix
in the client object and a convenience method for sending timing data
between ``now`` and a ``start`` timestamp you already have. So they
implemented a simple StatsD client library from scratch with the same
interface. This has the nice fringe benefit of not introducing another
external library dependency into Object Storage.
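
The StatsD wire format itself is tiny: each meter is a plain-text UDP
datagram such as ``name:1|c`` for a counter or ``name:37|ms`` for a
timing. The following sketch illustrates the ideas above; it is not the
client shipped with Object Storage, only an example of how a meters
prefix and a ``timing_since`` convenience method can be layered over a
single ``sendto`` call:

.. code-block:: python

   import socket
   import time


   class TinyStatsDClient(object):
       """A minimal, illustrative StatsD client (not Swift's)."""

       def __init__(self, host='127.0.0.1', port=8125, prefix=''):
           self.target = (host, port)
           self.prefix = prefix
           self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

       def _send(self, payload):
           # Each meter is a single fire-and-forget UDP datagram.
           self.sock.sendto(payload.encode('utf-8'), self.target)

       def increment(self, metric, n=1):
           # Counter: "name:value|c"
           self._send('%s%s:%d|c' % (self.prefix, metric, n))

       def timing_since(self, metric, start):
           # Timer in milliseconds: "name:value|ms", measured from an
           # existing timestamp to "now".
           elapsed_ms = (time.time() - start) * 1000
           self._send('%s%s:%d|ms' % (self.prefix, metric, elapsed_ms))


   client = TinyStatsDClient(prefix='swift.')
   begin = time.time()
   # ... do some work ...
   client.increment('successes')
   client.timing_since('partition.update.timing', begin)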

@@ -2,12 +2,11 @@

Account reaper
==============

The account reaper removes data from deleted accounts in the background.

A reseller marks an account for deletion by issuing a ``DELETE`` request
on the account's storage URL. This action sets the ``status`` column of
the account_stat table in the account database and replicas to
``DELETED``, marking the account's data for deletion.

Typically, a specific retention time or undelete feature is not provided.
@@ -15,8 +14,8 @@ However, you can set a ``delay_reaping`` value in the
``[account-reaper]`` section of the ``account-server.conf`` file to
delay the actual deletion of data. At this time, to undelete an account
you must update the account database replicas directly: set the
``status`` column to an empty string and set the ``put_timestamp`` to be
greater than the ``delete_timestamp``.
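
For example, to keep data from reaped accounts around for a week before
deletion, the relevant section might look like the following sketch; the
value is illustrative and ``delay_reaping`` is expressed in seconds:

.. code-block:: ini

   # account-server.conf (illustrative)
   [account-reaper]
   # Wait 7 days (in seconds) before actually deleting data from
   # accounts whose status is DELETED.
   delay_reaping = 604800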

.. note::

@@ -2,26 +2,26 @@

Components
==========

Object Storage uses the following components to deliver high
availability, high durability, and high concurrency:

- **Proxy servers** - Handle all of the incoming API requests.

- **Rings** - Map logical names of data to locations on particular
  disks.

- **Zones** - Isolate data from other zones. A failure in one zone
  does not impact the rest of the cluster because data replicates
  across zones.

- **Accounts and containers** - Each account and container is an
  individual database that is distributed across the cluster. An
  account database contains the list of containers in that account. A
  container database contains the list of objects in that container.

- **Objects** - The data itself.

- **Partitions** - A partition stores objects, account databases, and
  container databases and helps manage locations where data lives in
  the cluster.

@@ -38,7 +38,7 @@ Proxy servers

Proxy servers are the public face of Object Storage and handle all of
the incoming API requests. Once a proxy server receives a request, it
determines the storage node based on the object's URL, for example:
https://swift.example.com/v1/account/container/object. Proxy servers
also coordinate responses, handle failures, and coordinate timestamps.
@@ -47,14 +47,14 @@ needed based on projected workloads. A minimum of two proxy servers
should be deployed for redundancy. If one proxy server fails, the others
take over.

For more information concerning proxy server configuration, see
`Configuration Reference
<http://docs.openstack.org/liberty/config-reference/content/proxy-server-configuration.html>`_.

Rings
-----

A ring represents a mapping between the names of entities stored on disks
and their physical locations. There are separate rings for accounts,
containers, and objects. When other components need to perform any
operation on an object, container, or account, they need to interact
@@ -90,15 +90,14 @@ The ring is used by the proxy server and several background processes

.. figure:: figures/objectstorage-ring.png

These rings are externally managed, in that the server processes
themselves do not modify the rings; they are instead given new rings
modified by other tools.

The ring uses a configurable number of bits from an ``MD5`` hash of a path
as a partition index that designates a device. The number of bits kept
from the hash is known as the partition power, and 2 to the partition
power indicates the partition count. Partitioning the full ``MD5`` hash ring
allows other parts of the cluster to work in batches of items at once,
which ends up either more efficient or at least less complex than
working with each item separately or the entire cluster all at once.
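
As a rough illustration of the idea (this is not Swift's actual ring
code, which also mixes a cluster-specific hash prefix and suffix into
the path before hashing), the partition index can be derived from the
top bits of the ``MD5`` digest:

.. code-block:: python

   import hashlib
   import struct

   def partition_for(path, part_power):
       # Simplified: take the top `part_power` bits of the MD5 digest
       # of the path as the partition index.
       digest = hashlib.md5(path.encode('utf-8')).digest()
       top_four_bytes = struct.unpack('>I', digest[:4])[0]
       return top_four_bytes >> (32 - part_power)

   # With a partition power of 10 there are 2 ** 10 = 1024 partitions.
   print(partition_for('/account/container/object', 10))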

@@ -115,7 +114,7 @@ Zones
-----

Object Storage allows configuring zones in order to isolate failure
boundaries. If possible, each data replica resides in a separate zone.
At the smallest level, a zone could be a single drive or a grouping of a
few drives. If there were five object storage servers, then each server
would represent its own zone. Larger deployments would have an entire
@@ -123,13 +122,6 @@ rack (or multiple racks) of object servers, each representing a zone.
The goal of zones is to allow the cluster to tolerate significant
outages of storage servers without losing all replicas of the data.

As mentioned earlier, everything in Object Storage is stored, by
default, three times. Swift will place each replica
"as-uniquely-as-possible" to ensure both high availability and high
durability. This means that when choosing a replica location, Object
Storage chooses a server in an unused zone before an unused server in a
zone that already has a replica of the data.

.. _objectstorage-zones-figure:

@@ -138,9 +130,6 @@ zone that already has a replica of the data.

.. figure:: figures/objectstorage-zones.png

When a disk fails, replica data is automatically distributed to the
other zones to ensure there are three copies of the data.

Accounts and containers
-----------------------

@@ -164,7 +153,7 @@ database references each object.

Partitions
----------

A partition is a collection of stored data. This includes account databases,
container databases, and objects. Partitions are core to the replication
system.

@@ -2,11 +2,11 @@

Introduction to Object Storage
==============================

OpenStack Object Storage (swift) is used for redundant, scalable data
storage using clusters of standardized servers to store petabytes of
accessible data. It is a long-term storage system for large amounts of
static data that can be retrieved and updated. Object Storage uses a
distributed architecture with no central point of control, providing
greater scalability, redundancy, and permanence. Objects are written to
multiple hardware devices, with the OpenStack software responsible for
ensuring data