[admin-guide] Fix the rst markups for Object Storage files

Change-Id: I67fa565e6d6826d5486146278186997ce0dbe7d0
This commit is contained in:
venkatamahesh 2015-12-15 11:07:25 +05:30
parent a68ac6d95a
commit 610345f2cc
11 changed files with 128 additions and 121 deletions

View File

@ -146,29 +146,29 @@ method (see the list below) overrides it. Currently, no logging calls
override the sample rate, but it is conceivable that some meters may
require accuracy (sample\_rate == 1) while others may not.
.. code-block:: ini
[DEFAULT]
...
log_statsd_host = 127.0.0.1
log_statsd_port = 8125
log_statsd_default_sample_rate = 1
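
For reference, each meter travels as a small UDP datagram in the StatsD wire
format ``<name>:<value>|<type>[|@<sample_rate>]``. The following minimal
sketch (not Swift's actual client) shows how a sampled counter could be sent
to the host, port, and sample rate configured above; the metric name is only
an example:

.. code-block:: python

   import random
   import socket

   host, port, sample_rate = '127.0.0.1', 8125, 1  # values from the sample config

   def send_counter(metric, value=1):
       # With a sample rate below 1, only that fraction of events is sent;
       # the StatsD server scales the received counts back up.
       if sample_rate < 1 and random.random() >= sample_rate:
           return
       payload = '%s:%s|c' % (metric, value)
       if sample_rate < 1:
           payload += '|@%s' % sample_rate
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       sock.sendto(payload.encode(), (host, port))

   send_counter('object-server.async_pendings')  # example metric name
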
Then the LogAdapter object returned by ``get_logger()``, usually stored
in ``self.logger``, has these new methods:
- ``set_statsd_prefix(self, prefix)`` Sets the client library stat
prefix value which gets prefixed to every meter. The default prefix
is the "name" of the logger such as "object-server",
"container-auditor", and so on. This is currently used to turn
"proxy-server" into one of "proxy-server.Account",
"proxy-server.Container", or "proxy-server.Object" as soon as the
is the ``name`` of the logger such as ``object-server``,
``container-auditor``, and so on. This is currently used to turn
``proxy-server`` into one of ``proxy-server.Account``,
``proxy-server.Container``, or ``proxy-server.Object`` as soon as the
Controller object is determined and instantiated for the request.
- ``update_stats(self, metric, amount, sample_rate=1)`` Increments
the supplied meter by the given amount. This is used when you need
to add or subtract more than one from a counter, like incrementing
``suffix.hashes`` by the number of computed hashes in the object
replicator.
- ``increment(self, metric, sample_rate=1)`` Increments the given counter
@ -189,49 +189,49 @@ logger object. If StatsD logging has not been configured, the methods
are no-ops. This avoids messy conditional logic each place a meter is
recorded. These example usages show the new logging methods:
.. code-block:: python
# swift/obj/replicator.py
def update(self, job):
# ...
begin = time.time()
try:
hashed, local_hash = tpool.execute(tpooled_get_hashes, job['path'],
do_listdir=(self.replication_count % 10) == 0,
reclaim_age=self.reclaim_age)
# See tpooled_get_hashes "Hack".
if isinstance(hashed, BaseException):
raise hashed
self.suffix_hash += hashed
self.logger.update_stats('suffix.hashes', hashed)
# ...
finally:
self.partition_times.append(time.time() - begin)
self.logger.timing_since('partition.update.timing', begin)
.. code-block:: python
# swift/container/updater.py
def process_container(self, dbfile):
# ...
start_time = time.time()
# ...
for event in events:
if 200 <= event.wait() < 300:
successes += 1
else:
failures += 1
if successes > failures:
self.logger.increment('successes')
# ...
else:
self.logger.increment('failures')
# ...
# Only track timing data for attempted updates:
self.logger.timing_since('timing', start_time)
else:
self.logger.increment('no_changes')
self.no_changes += 1
The Object Storage development team wanted to use the
`pystatsd <https://github.com/sivy/py-statsd>`__ client library (not to
@ -240,7 +240,7 @@ project <https://github.com/sivy/py-statsd>`__ also hosted on GitHub),
but the released version on PyPI was missing two desired features the
latest version in GitHub had: the ability to configure a meter prefix
in the client object and a convenience method for sending timing data
between "now" and a "start" timestamp you already have. So they just
between ``now`` and a ``start`` timestamp you already have. So they just
implemented a simple StatsD client library from scratch with the same
interface. This has the nice fringe benefit of not introducing another
external library dependency into Object Storage.
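
The interface described above is small enough that a rough sketch fits in a
few lines. This is illustrative only and is not Swift's actual client code;
it simply shows the two features mentioned, a configurable prefix and a
``timing_since()`` convenience method:

.. code-block:: python

   import socket
   import time

   class MinimalStatsdClient(object):
       """Illustrative sketch of a prefix-aware StatsD client."""

       def __init__(self, host, port, prefix=''):
           self.target = (host, port)
           self.prefix = prefix + '.' if prefix else ''
           self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

       def _send(self, metric, value, vtype):
           payload = '%s%s:%s|%s' % (self.prefix, metric, value, vtype)
           self.sock.sendto(payload.encode(), self.target)

       def increment(self, metric):
           self._send(metric, 1, 'c')

       def timing(self, metric, timing_ms):
           self._send(metric, timing_ms, 'ms')

       def timing_since(self, metric, start):
           # timing between "now" and a start timestamp you already have
           self.timing(metric, (time.time() - start) * 1000)

   client = MinimalStatsdClient('127.0.0.1', 8125, prefix='proxy-server.Object')
   start = time.time()
   client.timing_since('GET.timing', start)
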

View File

@ -2,8 +2,8 @@
Troubleshoot Object Storage
===========================
For Object Storage, everything is logged in ``/var/log/syslog`` (or
``messages`` on some distros). Several settings enable further
customization of logging, such as ``log_name``, ``log_facility``, and
``log_level``, within the object server configuration files.
@ -22,7 +22,7 @@ replicas that were on that drive to be replicated elsewhere until the
drive is replaced. Once the drive is replaced, it can be re-added to the
ring.
You can look at error messages in the ``/var/log/kern.log`` file for hints of
drive failure.
Server failure
@ -49,7 +49,7 @@ Detect failed drives
~~~~~~~~~~~~~~~~~~~~
It has been our experience that when a drive is about to fail, error
messages appear in the ``/var/log/kern.log`` file. There is a script called
``swift-drive-audit`` that can be run via cron to watch for bad drives. If
errors are detected, it will unmount the bad drive, so that Object
Storage can work around it. The script takes a configuration file with
@ -79,7 +79,7 @@ the following settings:
* - ``log_to_console = False``
- No help text available for this option.
* - ``minutes = 60``
- Number of minutes to look back in ``/var/log/kern.log``
* - ``recon_cache_path = /var/cache/swift``
- Directory where stats for a few items will be stored
* - ``regex_pattern_1 = \berror\b.*\b(dm-[0-9]{1,2}\d?)\b``
@ -100,7 +100,7 @@ an emergency occurs, this procedure may assist in returning your cluster
to an operational state.
Using existing swift tools, there is no way to recover a builder file
from a ``ring.gz`` file. However, if you know some Python, it
is possible to construct a builder file that is pretty close to the one
you have lost.
@ -161,13 +161,13 @@ you have lost.
>>> pickle.dump(builder.to_dict(), open('account.builder', 'wb'), protocol=2)
>>> exit ()
#. You should now have a file called ``account.builder`` in the current
working directory. Run
:command:`swift-ring-builder account.builder write_ring` and compare the new
``account.ring.gz`` to the ``account.ring.gz`` that you started
from. They probably are not byte-for-byte identical, but if you load them
in a REPL and their ``_replica2part2dev_id`` and ``devs`` attributes are
the same (or nearly so), then you are in good shape (see the comparison sketch after this list).
#. Repeat the procedure for ``container.ring.gz`` and
``object.ring.gz``, and you might get usable builder files.
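
If you prefer to compare the rings programmatically rather than eyeballing
REPL output, a short sketch along these lines should work. It assumes
``swift.common.ring.RingData.load`` is available (true for recent Swift
releases) and that the rebuilt ring was written to a separate, hypothetical
path:

.. code-block:: python

   from swift.common.ring import RingData

   original = RingData.load('account.ring.gz')         # the ring you started from
   rebuilt = RingData.load('account.ring.gz.rebuilt')  # hypothetical path for the new ring

   print(original._replica2part2dev_id == rebuilt._replica2part2dev_id)
   print(original.devs == rebuilt.devs)
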

View File

@ -21,9 +21,10 @@ not only separate devices but possibly even entire nodes dedicated for erasure
coding.
.. important::
The erasure code support in Object Storage is considered beta in Kilo.
Most major functionality is included, but it has not been tested or
validated at large scale. This feature relies on ``ssync`` for durability.
We recommend deployers do extensive testing and not deploy production
data using an erasure code storage policy.
If any bugs are found during testing, please report them to

View File

@ -12,7 +12,7 @@ the account\_stat table in the account database and replicas to
Typically, a specific retention time or undelete option is not provided.
However, you can set a ``delay_reaping`` value in the
``[account-reaper]`` section of the ``account-server.conf`` file to
delay the actual deletion of data. At this time, to undelete you have to update
the account database replicas directly, set the status column to an
empty string and update the put\_timestamp to be greater than the
@ -41,10 +41,12 @@ which point the database reclaim process within the db\_replicator will
remove the database files.
A persistent error state may prevent the deletion of an object or
container. If this happens, you will see a message in the log, for example:
"Account <name> has not been reaped since <date>"
.. code-block:: console
Account <name> has not been reaped since <date>
You can control when this is logged with the ``reap_warn_after`` value in the
``[account-reaper]`` section of the ``account-server.conf`` file.
The default value is 30 days.
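
As an illustration of the warning condition only (this is not the reaper's
actual code), the check amounts to comparing the account's deletion timestamp
against ``reap_warn_after``:

.. code-block:: python

   import time

   reap_warn_after = 30 * 86400  # seconds; mirrors the 30-day default above

   def should_warn(delete_timestamp):
       # Warn when an account marked for deletion still has not been fully
       # reaped after reap_warn_after seconds.
       return time.time() - float(delete_timestamp) > reap_warn_after
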

View File

@ -12,18 +12,17 @@ and authentication services. It runs the (distributed) brain of the
Object Storage system: the proxy server processes.
.. note::
If you want to use OpenStack Identity API v3 for authentication, you
have the following options available in ``/etc/swift/dispersion.conf``:
``auth_version``, ``user_domain_name``, ``project_domain_name``,
and ``project_name``.
**Object Storage architecture**
|
.. figure:: figures/objectstorage-arch.png
|
Because access servers are collocated in their own tier, you can scale
out read/write access regardless of the storage capacity. For example,
@ -39,11 +38,12 @@ Typically, the tier consists of a collection of 1U servers. These
machines use a moderate amount of RAM and are network I/O intensive.
Since these systems field each incoming API request, you should
provision them with two high-throughput (10GbE) interfaces - one for the
incoming "front-end" requests and the other for the "back-end" access to
incoming ``front-end`` requests and the other for the ``back-end`` access to
the object storage nodes to put and fetch data.
Factors to consider
-------------------
For most publicly facing deployments as well as private deployments
available across a wide-reaching corporate network, you use SSL to
encrypt traffic to the client. SSL adds significant processing load to
@ -53,6 +53,7 @@ deployments on trusted networks.
Storage nodes
~~~~~~~~~~~~~
In most configurations, each of the five zones should have an equal
amount of storage capacity. Storage nodes use a reasonable amount of
memory and CPU. Metadata needs to be readily available to return objects
@ -64,11 +65,10 @@ workload and desired performance.
**Object Storage (swift)**
|
.. figure:: figures/objectstorage-nodes.png
|
Currently, a 2 TB or 3 TB SATA disk delivers good performance for the
price. You can use desktop-grade drives if you have responsive remote
@ -76,6 +76,7 @@ hands in the datacenter and enterprise-grade drives if you don't.
Factors to consider
-------------------
You should keep in mind the desired I/O performance for single-threaded
requests. This system does not use RAID, so a single disk handles each
request for an object. Disk performance impacts single-threaded response

View File

@ -6,20 +6,26 @@ On system failures, the XFS file system can sometimes truncate files it is
trying to write and produce zero-byte files. The object-auditor will catch
these problems but in the case of a system crash it is advisable to run
an extra, less rate limited sweep, to check for these specific files.
You can run this command as follows:
.. code-block:: console
$ swift-object-auditor /path/to/object-server/config/file.conf once -z 1000
.. note::
"-z" means to only check for zero-byte files at 1000 files per second.
It is useful to run the object auditor on a specific device or set of devices.
You can run the object-auditor once as follows:
.. code-block:: console
$ swift-object-auditor /path/to/object-server/config/file.conf once \
--devices=sda,sdb
.. note::
This will run the object auditor on only the ``sda`` and ``sdb`` devices.
This parameter accepts a comma-separated list of values.

View File

@ -25,7 +25,6 @@ high durability, and high concurrency are:
container databases and helps manage locations where data lives in
the cluster.
|
.. _objectstorage-building-blocks-figure:
@ -33,7 +32,6 @@ high durability, and high concurrency are:
.. figure:: figures/objectstorage-buildingblocks.png
|
Proxy servers
-------------
@ -85,7 +83,6 @@ sized drives are used in a cluster.
The ring is used by the proxy server and several background processes
(like replication).
|
.. _objectstorage-ring-figure:
@ -93,7 +90,6 @@ The ring is used by the proxy server and several background processes
.. figure:: figures/objectstorage-ring.png
|
These rings are externally managed, in that the server processes
themselves do not modify the rings, they are instead given new rings
@ -134,7 +130,6 @@ durability. This means that when choosing a replica location, Object
Storage chooses a server in an unused zone before an unused server in a
zone that already has a replica of the data.
|
.. _objectstorage-zones-figure:
@ -142,7 +137,6 @@ zone that already has a replica of the data.
.. figure:: figures/objectstorage-zones.png
|
When a disk fails, replica data is automatically distributed to the
other zones to ensure there are three copies of the data.
@ -155,7 +149,6 @@ distributed across the cluster. An account database contains the list of
containers in that account. A container database contains the list of
objects in that container.
|
.. _objectstorage-accountscontainers-figure:
@ -163,7 +156,6 @@ objects in that container.
.. figure:: figures/objectstorage-accountscontainers.png
|
To keep track of object data locations, each account in the system has a
database that references all of its containers, and each container
@ -190,7 +182,6 @@ Implementing a partition is conceptually simple, a partition is just a
directory sitting on a disk with a corresponding hash table of what it
contains.
|
.. _objectstorage-partitions-figure:
@ -198,7 +189,6 @@ contains.
.. figure:: figures/objectstorage-partitions.png
|
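
To make the idea concrete, the sketch below builds the kind of on-disk path a
partition directory contains. It is illustrative only: the mount point and
partition number are assumed values, and the real object server also mixes a
per-cluster hash path prefix and suffix into the hash:

.. code-block:: python

   import hashlib
   from os.path import join

   device_path = '/srv/node/sda'   # assumed device mount point
   partition = '139672'            # assumed partition number from the ring

   name_hash = hashlib.md5(b'/AUTH_test/photos/cat.jpg').hexdigest()
   suffix = name_hash[-3:]         # objects are grouped by the last three hex digits

   print(join(device_path, 'objects', partition, suffix, name_hash))
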
Replicators
-----------
@ -224,7 +214,6 @@ of hashes to compare.
The cluster eventually has a consistent behavior where the newest data
has a priority.
|
.. _objectstorage-replication-figure:
@ -232,7 +221,6 @@ has a priority.
.. figure:: figures/objectstorage-replication.png
|
If a zone goes down, one of the nodes containing a replica notices and
proactively copies data to a handoff location.
@ -263,7 +251,6 @@ successful before the client is notified that the upload was successful.
Next, the container database is updated asynchronously to reflect that
there is a new object in it.
|
.. _objectstorage-usecase-figure:
@ -271,7 +258,6 @@ there is a new object in it.
.. figure:: figures/objectstorage-usecase.png
|
Download
~~~~~~~~

View File

@ -2,7 +2,7 @@
Introduction to Object Storage
==============================
OpenStack Object Storage (code-named swift) is open source software for
creating redundant, scalable data storage using clusters of standardized
servers to store petabytes of accessible data. It is a long-term storage
system for large amounts of static data that can be retrieved,

View File

@ -20,8 +20,8 @@ that data gets to where it belongs. The ring handles replica placement.
To replicate deletions in addition to creations, every deleted record or
file in the system is marked by a tombstone. The replication process
cleans up tombstones after a time period known as the ``consistency
window``. This window defines the duration of the replication and how
long a transient failure can remove a node from the cluster. Tombstone
cleanup must be tied to replication to reach replica convergence.
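
As a rough illustration of that rule (not Swift's actual code), a tombstone is
only reclaimable once it is older than the consistency window, so every
replica has had a chance to see the delete:

.. code-block:: python

   import time

   def tombstone_is_reclaimable(tombstone_timestamp, reclaim_age):
       # reclaim_age plays the role of the consistency window here.
       return (time.time() - tombstone_timestamp) > reclaim_age

   # A delete recorded ten days ago, with an assumed seven-day window:
   print(tombstone_is_reclaimable(time.time() - 10 * 86400, 7 * 86400))  # True
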
@ -47,6 +47,7 @@ The main replication types are:
Database replication
~~~~~~~~~~~~~~~~~~~~
Database replication completes a low-cost hash comparison to determine
whether two replicas already match. Normally, this check can quickly
verify that most databases in the system are already synchronized. If
@ -73,6 +74,7 @@ performed.
Object replication
~~~~~~~~~~~~~~~~~~
The initial implementation of object replication performed an rsync to
push data from a local partition to all remote servers where it was
expected to reside. While this worked at small scale, replication times

View File

@ -25,6 +25,7 @@ loss is possible, but data would be unreachable for an extended time.
Ring data structure
~~~~~~~~~~~~~~~~~~~
The ring data structure consists of three top level fields: a list of
devices in the cluster, a list of lists of device ids indicating
partition to device assignments, and an integer indicating the number of
@ -32,6 +33,7 @@ bits to shift an MD5 hash to calculate the partition for the hash.
Partition assignment list
~~~~~~~~~~~~~~~~~~~~~~~~~
This is a list of ``array('H')`` of device ids. The outermost list
contains an ``array('H')`` for each replica. Each ``array('H')`` has a
length equal to the partition count for the ring. Each integer in the
@ -39,10 +41,12 @@ length equal to the partition count for the ring. Each integer in the
list is known internally to the Ring class as ``_replica2part2dev_id``.
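
A toy example of the structure (sizes are made up for illustration): three
replicas, eight partitions, and four devices with ids 0-3, so each inner
``array('H')`` holds one device id per partition:

.. code-block:: python

   from array import array

   replica2part2dev_id = [
       array('H', [0, 1, 2, 3, 0, 1, 2, 3]),   # replica 0
       array('H', [1, 2, 3, 0, 1, 2, 3, 0]),   # replica 1
       array('H', [2, 3, 0, 1, 2, 3, 0, 1]),   # replica 2
   ]

   # Device ids assigned to partition 5, one per replica:
   print([part2dev_id[5] for part2dev_id in replica2part2dev_id])  # [1, 2, 3]
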
So, to create a list of device dictionaries assigned to a partition, the
Python code would look like:
.. code-block:: python
devices = [self.devs[part2dev_id[partition]] for
part2dev_id in self._replica2part2dev_id]
That code is a little simplistic because it does not account for the
removal of duplicate devices. If a ring has more replicas than devices,
@ -91,6 +95,7 @@ and B's disks are only 72.7% full.
Replica counts
~~~~~~~~~~~~~~
To support the gradual change in replica counts, a ring can have a real
number of replicas and is not restricted to an integer number of
replicas.
@ -100,12 +105,12 @@ partitions. It indicates the average number of replicas for each
partition. For example, a replica count of 3.2 means that 20 percent of
partitions have four replicas and 80 percent have three replicas.
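
A quick check of that arithmetic (illustrative only; the partition power of 16
is an assumed value):

.. code-block:: python

   replica_count = 3.2
   part_power = 16
   partitions = 2 ** part_power                       # 65536

   whole_replicas = int(replica_count)                # every partition gets 3 replicas
   extra_fraction = replica_count - whole_replicas    # 0.2
   print(whole_replicas, int(partitions * extra_fraction))  # 3 13107
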
The replica count is adjustable. For example:
.. code-block:: console
$ swift-ring-builder account.builder set_replicas 4
$ swift-ring-builder account.builder rebalance
You must rebalance the replica ring in globally distributed clusters.
Operators of these clusters generally want an equal number of replicas
@ -114,42 +119,46 @@ operator adds or removes a replica. Removing unneeded replicas saves on
the cost of disks.
You can gradually increase the replica count at a rate that does not
adversely affect cluster performance. For example:
.. code-block:: console
$ swift-ring-builder object.builder set_replicas 3.01
$ swift-ring-builder object.builder rebalance
<distribute rings and wait>...
$ swift-ring-builder object.builder set_replicas 3.02
$ swift-ring-builder object.builder rebalance
<distribute rings and wait>...
Changes take effect after the ring is rebalanced. Therefore, if you
intend to change from 3 replicas to 3.01 but you accidentally type
2.01, no data is lost.
Additionally, the :command:`swift-ring-builder X.builder create` command can
now take a decimal argument for the number of replicas.
Partition shift value
~~~~~~~~~~~~~~~~~~~~~
The partition shift value is known internally to the Ring class as
``_part_shift``. This value is used to shift an MD5 hash to calculate
the partition where the data for that hash should reside. Only the top
four bytes of the hash are used in this process. For example, to compute
the partition for the ``/account/container/object`` path using Python:
.. code-block:: python
partition = unpack_from('>I',
md5('/account/container/object').digest())[0] >>
self._part_shift
For a ring generated with part\_power P, the partition shift value is
``32 - P``.
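
A worked example of that relationship (the partition power of 16 is an assumed
value, and a real cluster also folds a hash path prefix and suffix into the
hashed path):

.. code-block:: python

   from hashlib import md5
   from struct import unpack_from

   part_power = 16
   part_shift = 32 - part_power   # 16

   path = '/account/container/object'
   partition = unpack_from('>I', md5(path.encode()).digest())[0] >> part_shift
   print(0 <= partition < 2 ** part_power)  # True
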
Build the ring
~~~~~~~~~~~~~~
The ring builder process includes these high-level steps:
#. The utility calculates the number of partitions to assign to each

View File

@ -15,9 +15,9 @@ created image:
**To configure tenant-specific image locations**
#. Configure swift as your ``default_store`` in the
``glance-api.conf`` file.
#. Set these configuration options in the ``glance-api.conf`` file:
- swift_store_multi_tenant
Set to ``True`` to enable tenant-specific storage locations.