Convert object-storage sections to rst

Change-Id: Ic5af9e87475f1bad657f72d7b65d1a05211f3ca5
Implements: blueprint reorganise-user-guides
This commit is contained in:
asettle 2015-06-11 10:39:07 +10:00 committed by Alexandra Settle
parent a1588d26c7
commit eb85580cc4
3 changed files with 261 additions and 2 deletions


@ -0,0 +1,12 @@
========================================
System administration for Object Storage
========================================
By understanding Object Storage concepts, you can better monitor and
administer your storage solution. The majority of the administration
information is maintained in developer documentation at
`docs.openstack.org/developer/swift/ <http://docs.openstack.org/developer/swift/>`__.
See the `OpenStack Configuration
Reference <http://docs.openstack.org/kilo/config-reference/content/>`__
for a list of configuration options for Object Storage.


@ -0,0 +1,247 @@
=========================
Object Storage monitoring
=========================
Excerpted from a blog post by `Darrell
Bishop <http://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd>`__
An OpenStack Object Storage cluster is a collection of many daemons that
work together across many nodes. With so many different components, you
must be able to tell what is going on inside the cluster. Tracking
server-level meters like CPU utilization, load, memory consumption, disk
usage and utilization, and so on is necessary, but not sufficient.
What are the different daemons doing on each server? What is the volume
of object replication on node8? How long is it taking? Are there errors?
If so, when did they happen?
In such a complex ecosystem, you can use multiple approaches to get the
answers to these questions. This section describes several approaches.
Swift Recon
~~~~~~~~~~~
The Swift Recon middleware (see
http://swift.openstack.org/admin_guide.html#cluster-telemetry-and-monitoring)
provides general machine statistics, such as load average, socket
statistics, ``/proc/meminfo`` contents, and so on, as well as
Swift-specific meters:
- The MD5 sum of each ring file.
- The most recent object replication time.
- Count of each type of quarantined file: account, container, or
  object.
- Count of "async\_pendings" (deferred container updates) on disk.
Swift Recon is middleware that is installed in the object server
pipeline and takes one required option: a local cache directory. To
track ``async_pendings``, you must set up an additional cron job for
each object server. You access data by either sending HTTP requests
directly to the object server or using the ``swift-recon`` command-line
client.
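
For example, the Recon middleware is typically enabled in each storage node's
``object-server.conf`` along the following lines (a minimal sketch; the
pipeline contents and the cache path vary by deployment):

.. code-block:: ini

   [pipeline:main]
   pipeline = healthcheck recon object-server

   [filter:recon]
   use = egg:swift#recon
   recon_cache_path = /var/cache/swift

The collected data can then be queried with, for example,
``swift-recon --md5 -r``, which verifies ring checksums and reports recent
replication activity across the cluster (option names may vary between
releases).
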
Swift Recon provides some good Object Storage cluster statistics, but the
general server meters overlap with existing server monitoring systems. To get
the Swift-specific meters into a monitoring system, they must be polled.
Swift Recon essentially acts as a middleware meters collector. The
process that feeds meters to your statistics system, such as
``collectd`` and ``gmond``, probably already runs on the storage node.
So, you can choose to either talk to Swift Recon or collect the meters
directly.
Swift-Informant
~~~~~~~~~~~~~~~
Florian Hines developed the Swift-Informant middleware (see
https://github.com/pandemicsyn/swift-informant) to get real-time
visibility into Object Storage client requests. It sits in the pipeline
for the proxy server, and after each request to the proxy server, sends
three meters to a StatsD server (see
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/):
- A counter increment for a meter like ``obj.GET.200`` or
``cont.PUT.404``.
- Timing data for a meter like ``acct.GET.200`` or ``obj.GET.200``.
[The README says the meters look like ``duration.acct.GET.200``, but
I do not see the ``duration`` in the code. I am not sure what the
Etsy server does but our StatsD server turns timing meters into five
derivative meters with new segments appended, so it probably works as
coded. The first meter turns into ``acct.GET.200.lower``,
``acct.GET.200.upper``, ``acct.GET.200.mean``,
``acct.GET.200.upper_90``, and ``acct.GET.200.count``].
- A counter increase by the bytes transferred for a meter like
``tfer.obj.PUT.201``.
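
On the wire, each of these meters is a small StatsD UDP datagram. The
following lines only illustrate the standard StatsD wire format; the actual
names and values depend on the request:

.. code-block:: text

   obj.GET.200:1|c              (counter increment)
   acct.GET.200:112|ms          (timing data, in milliseconds)
   tfer.obj.PUT.201:4096|c      (counter increased by bytes transferred)
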
The timing meters give a feel for the quality of service that clients are
experiencing, while the counters show the volume of the various
permutations of request server type, command, and response code.
Swift-Informant also requires no change to core Object
Storage code because it is implemented as middleware. However, it gives
you no insight into the workings of the cluster past the proxy server.
If the responsiveness of one storage node degrades, you can only see
that some of your requests are bad, either as high latency or error
status codes. You do not know exactly why or where that request tried to
go. Maybe the container server in question was on a good node but the
object server was on a different, poorly-performing node.
Statsdlog
~~~~~~~~~
Florian's `Statsdlog <https://github.com/pandemicsyn/statsdlog>`__
project increments StatsD counters based on logged events. Like
Swift-Informant, it is also non-intrusive, but statsdlog can track
events from all Object Storage daemons, not just proxy-server. The
daemon listens to a UDP stream of syslog messages and StatsD counters
are incremented when a log line matches a regular expression. Meter
names are mapped to regex match patterns in a JSON file, allowing
flexible configuration of what meters are extracted from the log stream.
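
The general mechanism is easy to picture. The following is only a rough
Python sketch of the idea; it is not statsdlog's actual code, and the real
configuration format is documented in the project's README:

.. code-block:: python

   import re
   import socket

   # Hypothetical meter-to-regex mapping; statsdlog loads an equivalent
   # mapping from its JSON configuration file.
   PATTERNS = [
       ('object-server.errors', re.compile(r'object-server: ERROR')),
       ('container-server.404', re.compile(r'container-server: .* 404')),
   ]

   statsd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

   def process_log_line(line):
       # Increment the counter for the first pattern that matches the line.
       for meter, pattern in PATTERNS:
           if pattern.search(line):
               statsd.sendto(('%s:1|c' % meter).encode('utf-8'),
                             ('127.0.0.1', 8125))
               break
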
Currently, only the first matching regex triggers a StatsD counter
increment, and the counter is always incremented by one. There is no way
to increment a counter by more than one or send timing data to StatsD
based on the log line content. The tool could be extended to handle more
meters for each line and data extraction, including timing data. But a
coupling would still exist between the log textual format and the log
parsing regexes, which would themselves be more complex to support
multiple matches for each line and data extraction. Also, log processing
introduces a delay between the triggering event and sending the data to
StatsD. It would be preferable to increment error counters where they
occur and to send timing data as soon as it is known, which would avoid
the coupling between log strings and parsing regexes and remove the
delay between an event and the data reaching StatsD.
The next section describes another method for gathering Object Storage
operational meters.
Swift StatsD logging
~~~~~~~~~~~~~~~~~~~~
StatsD (see
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/)
was designed for application code to be deeply instrumented; meters are
sent in real-time by the code that just noticed or did something. The
overhead of sending a meter is extremely low: a ``sendto`` of one UDP
packet. If that overhead is still too high, the StatsD client library
can send only a random portion of samples and StatsD approximates the
actual number when flushing meters upstream.
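
A minimal sketch of that sampling behavior, using the standard StatsD
counter format (this is not Swift's actual client code):

.. code-block:: python

   import random
   import socket

   statsd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

   def increment(meter, sample_rate=1.0, addr=('127.0.0.1', 8125)):
       # With sample_rate < 1, only a random fraction of events is sent;
       # the rate is encoded in the datagram ("|@0.5") so the StatsD server
       # can scale the counts back up when flushing meters upstream.
       if sample_rate >= 1 or random.random() < sample_rate:
           payload = '%s:1|c' % meter
           if sample_rate < 1:
               payload += '|@%s' % sample_rate
           statsd.sendto(payload.encode('utf-8'), addr)
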
To avoid the problems inherent with middleware-based monitoring and
after-the-fact log processing, the sending of StatsD meters is
integrated into Object Storage itself. The submitted change set (see
https://review.openstack.org/#change,6058) currently reports 124 meters
across 15 Object Storage daemons and the tempauth middleware. Details of
the meters tracked are in the `Administrator's
Guide <http://docs.openstack.org/developer/swift/admin_guide.html>`__.
The sending of meters is integrated with the logging framework. To
enable, configure ``log_statsd_host`` in the relevant config file. You
can also specify the port and a default sample rate. The specified
default sample rate is used unless a specific call to a statsd logging
method (see the list below) overrides it. Currently, no logging calls
override the sample rate, but it is conceivable that some meters may
require accuracy (sample\_rate == 1) while others may not.

.. code-block:: ini

   [DEFAULT]
   ...
   log_statsd_host = 127.0.0.1
   log_statsd_port = 8125
   log_statsd_default_sample_rate = 1

Then the LogAdapter object returned by ``get_logger()``, usually stored
in ``self.logger``, has these new methods:
- ``set_statsd_prefix(self, prefix)`` Sets the client library stat
  prefix value which gets prefixed to every meter. The default prefix
  is the "name" of the logger, such as "object-server",
  "container-auditor", and so on. This is currently used to turn
  "proxy-server" into one of "proxy-server.Account",
  "proxy-server.Container", or "proxy-server.Object" as soon as the
  Controller object is determined and instantiated for the request.
- ``update_stats(self, metric, amount, sample_rate=1)`` Increments the
  supplied metric by the given amount. This is used when you need to add
  or subtract more than one from a counter, like incrementing
  "suffix.hashes" by the number of computed hashes in the object
  replicator.
- ``increment(self, metric, sample_rate=1)`` Increments the given counter
  metric by one.
- ``decrement(self, metric, sample_rate=1)`` Decrements the given counter
  metric by one.
- ``timing(self, metric, timing_ms, sample_rate=1)`` Records that the given
  metric took the supplied number of milliseconds.
- ``timing_since(self, metric, orig_time, sample_rate=1)`` Convenience method
  to record a timing metric whose value is "now" minus an existing
  timestamp.
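
As a quick, hypothetical illustration (the method names come from the list
above, but this snippet is not taken from the Swift source; the real usages
appear below):

.. code-block:: python

   from swift.common.utils import get_logger

   logger = get_logger({'log_statsd_host': '127.0.0.1'}, name='proxy-server')
   logger.set_statsd_prefix('proxy-server.Object')
   logger.increment('GET.errors')              # counter += 1
   logger.update_stats('suffix.hashes', 42)    # counter += 42
   logger.timing('GET.timing', 127.5)          # 127.5 milliseconds
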
Note that these logging methods may safely be called anywhere you have a
logger object. If StatsD logging has not been configured, the methods
are no-ops. This avoids messy conditional logic each place a meter is
recorded. These example usages show the new logging methods:

.. code-block:: python

   # swift/obj/replicator.py
   def update(self, job):
       # ...
       begin = time.time()
       try:
           hashed, local_hash = tpool.execute(tpooled_get_hashes, job['path'],
                   do_listdir=(self.replication_count % 10) == 0,
                   reclaim_age=self.reclaim_age)
           # See tpooled_get_hashes "Hack".
           if isinstance(hashed, BaseException):
               raise hashed
           self.suffix_hash += hashed
           self.logger.update_stats('suffix.hashes', hashed)
           # ...
       finally:
           self.partition_times.append(time.time() - begin)
           self.logger.timing_since('partition.update.timing', begin)


.. code-block:: python

   # swift/container/updater.py
   def process_container(self, dbfile):
       # ...
       start_time = time.time()
       # ...
           # The elided code above opens an "if" block that attempts the
           # container update; the final "else" below pairs with it.
           for event in events:
               if 200 <= event.wait() < 300:
                   successes += 1
               else:
                   failures += 1
           if successes > failures:
               self.logger.increment('successes')
               # ...
           else:
               self.logger.increment('failures')
               # ...
           # Only track timing data for attempted updates:
           self.logger.timing_since('timing', start_time)
       else:
           self.logger.increment('no_changes')
           self.no_changes += 1

The Object Storage development team wanted to use the
`pystatsd <https://github.com/sivy/py-statsd>`__ client library (not to
be confused with a `similar-looking
project <https://github.com/sivy/py-statsd>`__ also hosted on GitHub),
but the released version on PyPI was missing two desired features that the
latest version in GitHub had: the ability to configure a meter prefix
in the client object and a convenience method for sending timing data
between "now" and a "start" timestamp you already have. So they just
implemented a simple StatsD client library from scratch with the same
interface. This has the nice fringe benefit of not introducing another
external library dependency into Object Storage.


@ -12,6 +12,8 @@ Contents
objectstorage_features.rst
objectstorage_characteristics.rst
objectstorage_components.rst
objectstorage-monitoring.rst
objectstorage-admin.rst
.. TODO (karenb)
objectstorage_ringbuilder.rst
@ -19,6 +21,4 @@ Contents
objectstorage_replication.rst
objectstorage_account_reaper.rst
objectstorage_tenant_specific_image_storage.rst
objectstorage_monitoring.rst
objectstorage_admin.rst
objectstorage_troubleshoot.rst