Merge "[User Guides] Object Storage chapter edits"
commit 3214af2522
doc/admin-guide-cloud/source

@@ -2,8 +2,11 @@

Object Storage monitoring
=========================

.. note::

   This section was excerpted from a blog post by `Darrell
   Bishop <http://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd>`_ and
   has since been edited.

An OpenStack Object Storage cluster is a collection of many daemons that
work together across many nodes. With so many different components, you
@@ -11,30 +14,22 @@ must be able to tell what is going on inside the cluster. Tracking
server-level meters like CPU utilization, load, memory consumption, disk
usage and utilization, and so on is necessary, but not sufficient.

What are the different daemons doing on each server? What is the volume
of object replication on node8? How long is it taking? Are there errors?
If so, when did they happen?

In such a complex ecosystem, you can use multiple approaches to get the
answers to these questions. This section describes several approaches.

Swift Recon
~~~~~~~~~~~

The Swift Recon middleware (see `Cluster Telemetry and Monitoring
<http://swift.openstack.org/admin_guide.html#cluster-telemetry-and-monitoring>`_)
provides general machine statistics, such as load average, socket
statistics, ``/proc/meminfo`` contents, as well as Swift-specific meters:

- The ``MD5`` sum of each ring file.

- The most recent object replication time.

- Count of each type of quarantined file: account, container, or
  object.

- Count of "async_pendings" (deferred container updates) on disk.

Swift Recon is middleware that is installed in the object servers
pipeline and takes one required option: a local cache directory. To
@@ -43,24 +38,23 @@ each object server. You access data by either sending HTTP requests
directly to the object server or using the ``swift-recon`` command-line
client.
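
For example, a minimal sketch of an ``object-server.conf`` with Recon
enabled might look like the following; the pipeline contents and cache
path are illustrative and depend on your deployment:

.. code-block:: ini

   # Illustrative object-server.conf fragment.
   [pipeline:main]
   pipeline = healthcheck recon object-server

   [filter:recon]
   use = egg:swift#recon
   # The one required option: the local directory where Recon caches
   # the statistics it collects.
   recon_cache_path = /var/cache/swift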

Swift Recon offers good Object Storage cluster statistics, but the general
server meters overlap with existing server monitoring systems. To get
the Swift-specific meters into a monitoring system, they must be polled.
Swift Recon acts as a middleware meters collector. The process that feeds
meters to your statistics system, such as ``collectd`` or ``gmond``,
probably already runs on the storage node, so you can choose either to
talk to Swift Recon or to collect the meters directly.

Swift-Informant
~~~~~~~~~~~~~~~

The Swift-Informant middleware (see
`swift-informant <https://github.com/pandemicsyn/swift-informant>`_),
developed by Florian Hines, gives real-time visibility into Object Storage
client requests. It sits in the pipeline for the proxy server, and after
each request to the proxy server it sends three meters to a StatsD server:

- A counter increment for a meter like ``obj.GET.200`` or
  ``cont.PUT.404``.
@@ -77,26 +71,24 @@ http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/):

- A counter increase by the bytes transferred for a meter like
  ``tfer.obj.PUT.201``.
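
As a sketch of how this kind of middleware is typically enabled, the
fragment below adds an ``informant`` filter to the front of the proxy
pipeline. The filter name, egg entry point, and option names are
illustrative assumptions, not the project's documented configuration;
check the project's README for the real ones:

.. code-block:: ini

   # Hypothetical proxy-server.conf fragment; names are illustrative.
   [pipeline:main]
   pipeline = informant cache tempauth proxy-server

   [filter:informant]
   use = egg:informant#informant
   # Where the StatsD daemon listens (assumed option names).
   statsd_host = 127.0.0.1
   statsd_port = 8125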

With the timing meters, you can get a feel for the quality of service
clients are experiencing, and the counters show the volume of each
combination of request server type, command, and response code.
Swift-Informant requires no change to core Object Storage code because
it is implemented as middleware. However, it gives no insight into the
workings of the cluster past the proxy server. If the responsiveness of
one storage node degrades, you can only see that some of the requests
are bad, either as high latency or error status codes. You do not know
exactly why or where that request tried to go; the container server in
question may have been on a good node while the object server was on a
different, poorly performing node.

Statsdlog
~~~~~~~~~

The `Statsdlog <https://github.com/pandemicsyn/statsdlog>`_ project, also
by Florian Hines, increments StatsD counters based on logged events. Like
Swift-Informant, it is non-intrusive, but statsdlog can track events from
all Object Storage daemons, not just the proxy server. The daemon listens
to a UDP stream of syslog messages, and StatsD counters are incremented
when a log line matches a regular expression. Meter names are mapped to
regex match patterns in a JSON file, allowing flexible configuration of
what meters are extracted from the log stream.
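
To make the mapping idea concrete, here is a small self-contained Python
sketch that matches meter names to regex patterns loaded from JSON. The
key names and patterns are hypothetical illustrations, not statsdlog's
actual schema:

.. code-block:: python

   import json
   import re

   # Hypothetical meter-name-to-regex mapping, in the spirit of the JSON
   # file described above (key names are illustrative).
   patterns = json.loads('''
   [
       {"name": "object-server.timeout",
        "pattern": "Timeout talking to object server"},
       {"name": "container-server.404",
        "pattern": "container-server .* 404"}
   ]
   ''')

   def meters_for(log_line):
       # Return the meter names whose regex matches the syslog line.
       return [p["name"] for p in patterns
               if re.search(p["pattern"], log_line)]

   print(meters_for("proxy-server ERROR Timeout talking to object server"))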

@@ -123,7 +115,7 @@ Swift StatsD logging

StatsD (see
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/)
was designed for application code to be deeply instrumented. Meters are
sent in real-time by the code that just noticed or did something. The
overhead of sending a meter is extremely low: a ``sendto`` of one UDP
packet. If that overhead is still too high, the StatsD client library
@@ -133,10 +125,10 @@ actual number when flushing meters upstream.

To avoid the problems inherent with middleware-based monitoring and
after-the-fact log processing, the sending of StatsD meters is
integrated into Object Storage itself. The submitted change set (see
`<https://review.openstack.org/#change,6058>`_) currently reports 124 meters
across 15 Object Storage daemons and the tempauth middleware. Details of
the meters tracked are in the `Administrator's
Guide <http://docs.openstack.org/developer/swift/admin_guide.html>`_.

The sending of meters is integrated with the logging framework. To
enable, configure ``log_statsd_host`` in the relevant config file. You
@@ -144,7 +136,7 @@ can also specify the port and a default sample rate. The specified
default sample rate is used unless a specific call to a statsd logging
method (see the list below) overrides it. Currently, no logging calls
override the sample rate, but it is conceivable that some meters may
require accuracy (sample_rate == 1) while others may not.

.. code-block:: ini

@@ -184,63 +176,53 @@ in ``self.logger``, has these new methods:

   Convenience method to record a timing meter whose value is "now"
   minus an existing timestamp.

.. note::

   These logging methods may safely be called anywhere you have a
   logger object. If StatsD logging has not been configured, the methods
   are no-ops. This avoids messy conditional logic each place a meter is
   recorded. These example usages show the new logging methods:

.. code-block:: python

   # swift/obj/replicator.py
   def update(self, job):
       # ...
       begin = time.time()
       try:
           hashed, local_hash = tpool.execute(tpooled_get_hashes, job['path'],
               do_listdir=(self.replication_count % 10) == 0,
               reclaim_age=self.reclaim_age)
           # See tpooled_get_hashes "Hack".
           if isinstance(hashed, BaseException):
               raise hashed
           self.suffix_hash += hashed
           self.logger.update_stats('suffix.hashes', hashed)
       # ...
       finally:
           self.partition_times.append(time.time() - begin)
           self.logger.timing_since('partition.update.timing', begin)

.. code-block:: python

   # swift/container/updater.py
   def process_container(self, dbfile):
       # ...
       start_time = time.time()
       # ...
           for event in events:
               if 200 <= event.wait() < 300:
                   successes += 1
               else:
                   failures += 1
           if successes > failures:
               self.logger.increment('successes')
               # ...
           else:
               self.logger.increment('failures')
               # ...
           # Only track timing data for attempted updates:
           self.logger.timing_since('timing', start_time)
       else:
           self.logger.increment('no_changes')
           self.no_changes += 1

The development team of StatsD wanted to use the
`pystatsd <https://github.com/sivy/py-statsd>`_ client library (not to
be confused with a `similar-looking
project <https://github.com/sivy/py-statsd>`_ also hosted on GitHub),
but the released version on PyPI was missing two desired features the
latest version in GitHub had: the ability to configure a meters prefix
in the client object and a convenience method for sending timing data
between ``now`` and a ``start`` timestamp you already have. So they
implemented a simple StatsD client library from scratch with the same
interface. This has the nice fringe benefit of not introducing another
external library dependency into Object Storage.
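
The StatsD wire format itself is tiny: each meter is a plain-text UDP
datagram such as ``name:1|c`` for a counter or ``name:37|ms`` for a
timing. The following sketch illustrates the ideas above; it is not the
client shipped with Object Storage, only an example of how a meters
prefix and a ``timing_since`` convenience method can be layered over a
single ``sendto`` call:

.. code-block:: python

   import socket
   import time


   class TinyStatsDClient(object):
       """A minimal, illustrative StatsD client (not Swift's)."""

       def __init__(self, host='127.0.0.1', port=8125, prefix=''):
           self.target = (host, port)
           self.prefix = prefix
           self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

       def _send(self, payload):
           # Each meter is a single fire-and-forget UDP datagram.
           self.sock.sendto(payload.encode('utf-8'), self.target)

       def increment(self, metric, n=1):
           # Counter: "name:value|c"
           self._send('%s%s:%d|c' % (self.prefix, metric, n))

       def timing_since(self, metric, start):
           # Timer in milliseconds: "name:value|ms", measured from an
           # existing timestamp to "now".
           elapsed_ms = (time.time() - start) * 1000
           self._send('%s%s:%d|ms' % (self.prefix, metric, elapsed_ms))


   client = TinyStatsDClient(prefix='swift.')
   begin = time.time()
   # ... do some work ...
   client.increment('successes')
   client.timing_since('partition.update.timing', begin)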

@@ -2,12 +2,11 @@

Account reaper
==============

The account reaper removes data from deleted accounts in the background.

A reseller marks an account for deletion by issuing a ``DELETE`` request
on the account's storage URL. This action sets the ``status`` column of
the account_stat table in the account database and replicas to
``DELETED``, marking the account's data for deletion.

Typically, a specific retention time or undelete feature is not provided.
@@ -15,8 +14,8 @@ However, you can set a ``delay_reaping`` value in the
``[account-reaper]`` section of the ``account-server.conf`` file to
delay the actual deletion of data. At this time, to undelete an account
you must update the account database replicas directly: set the
``status`` column to an empty string and set the ``put_timestamp`` to be
greater than the ``delete_timestamp``.
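
For example, to keep data from reaped accounts around for a week before
deletion, the relevant section might look like the following sketch; the
value is illustrative and ``delay_reaping`` is expressed in seconds:

.. code-block:: ini

   # account-server.conf (illustrative)
   [account-reaper]
   # Wait 7 days (in seconds) before actually deleting data from
   # accounts whose status is DELETED.
   delay_reaping = 604800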

.. note::

@@ -2,26 +2,26 @@

Components
==========

Object Storage uses the following components to deliver high
availability, high durability, and high concurrency:

- **Proxy servers** - Handle all of the incoming API requests.

- **Rings** - Map logical names of data to locations on particular
  disks.

- **Zones** - Isolate data from other zones. A failure in one zone
  does not impact the rest of the cluster because data replicates
  across zones.

- **Accounts and containers** - Each account and container is an
  individual database that is distributed across the cluster. An
  account database contains the list of containers in that account. A
  container database contains the list of objects in that container.

- **Objects** - The data itself.

- **Partitions** - A partition stores objects, account databases, and
  container databases and helps manage locations where data lives in
  the cluster.

@@ -38,7 +38,7 @@ Proxy servers

Proxy servers are the public face of Object Storage and handle all of
the incoming API requests. Once a proxy server receives a request, it
determines the storage node based on the object's URL, for example:
https://swift.example.com/v1/account/container/object. Proxy servers
also coordinate responses, handle failures, and coordinate timestamps.
@@ -47,14 +47,14 @@ needed based on projected workloads. A minimum of two proxy servers
should be deployed for redundancy. If one proxy server fails, the others
take over.

For more information concerning proxy server configuration, see
`Configuration Reference
<http://docs.openstack.org/liberty/config-reference/content/proxy-server-configuration.html>`_.

Rings
-----

A ring represents a mapping between the names of entities stored on disks
and their physical locations. There are separate rings for accounts,
containers, and objects. When other components need to perform any
operation on an object, container, or account, they need to interact
@@ -90,15 +90,14 @@ The ring is used by the proxy server and several background processes

.. figure:: figures/objectstorage-ring.png

These rings are externally managed, in that the server processes
themselves do not modify the rings; they are instead given new rings
modified by other tools.

The ring uses a configurable number of bits from an ``MD5`` hash of a path
as a partition index that designates a device. The number of bits kept
from the hash is known as the partition power, and 2 to the partition
power indicates the partition count. Partitioning the full ``MD5`` hash ring
allows other parts of the cluster to work in batches of items at once,
which ends up either more efficient or at least less complex than
working with each item separately or the entire cluster all at once.
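
As a rough illustration of the idea (this is not Swift's actual ring
code, which also mixes a cluster-specific hash prefix and suffix into
the path before hashing), the partition index can be derived from the
top bits of the ``MD5`` digest:

.. code-block:: python

   import hashlib
   import struct

   def partition_for(path, part_power):
       # Simplified: take the top `part_power` bits of the MD5 digest
       # of the path as the partition index.
       digest = hashlib.md5(path.encode('utf-8')).digest()
       top_four_bytes = struct.unpack('>I', digest[:4])[0]
       return top_four_bytes >> (32 - part_power)

   # With a partition power of 10 there are 2 ** 10 = 1024 partitions.
   print(partition_for('/account/container/object', 10))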

@@ -115,7 +114,7 @@ Zones
-----

Object Storage allows configuring zones in order to isolate failure
boundaries. If possible, each data replica resides in a separate zone.
At the smallest level, a zone could be a single drive or a grouping of a
few drives. If there were five object storage servers, then each server
would represent its own zone. Larger deployments would have an entire
@@ -123,13 +122,6 @@ rack (or multiple racks) of object servers, each representing a zone.
The goal of zones is to allow the cluster to tolerate significant
outages of storage servers without losing all replicas of the data.

As mentioned earlier, everything in Object Storage is stored, by
default, three times. Swift will place each replica
"as-uniquely-as-possible" to ensure both high availability and high
durability. This means that when choosing a replica location, Object
Storage chooses a server in an unused zone before an unused server in a
zone that already has a replica of the data.

.. _objectstorage-zones-figure:

@@ -138,9 +130,6 @@ zone that already has a replica of the data.

.. figure:: figures/objectstorage-zones.png

When a disk fails, replica data is automatically distributed to the
other zones to ensure there are three copies of the data.

Accounts and containers
-----------------------

@@ -164,7 +153,7 @@ database references each object.

Partitions
----------

A partition is a collection of stored data. This includes account databases,
container databases, and objects. Partitions are core to the replication
system.

@@ -2,11 +2,11 @@

Introduction to Object Storage
==============================

OpenStack Object Storage (swift) is used for redundant, scalable data
storage using clusters of standardized servers to store petabytes of
accessible data. It is a long-term storage system for large amounts of
static data that can be retrieved and updated. Object Storage uses a
distributed architecture with no central point of control, providing
greater scalability, redundancy, and permanence. Objects are written to
multiple hardware devices, with the OpenStack software responsible for
ensuring data