Convert Object Storage files to RST
- Object Storage Intro to RST.
- Object Storage Features table to RST.
- Object Storage Components to RST.
- Added related figures.

Change-Id: I835a4387d64afa38705fbf8e67ff89f1d7f45a3f
Implements: blueprint reorganise-user-guides

New binary figure files (PNG) added under
doc/admin-guide-cloud-rst/source/figures/, including
objectstorage-ring.png and objectstorage-zones.png.

@@ -8,13 +8,12 @@ Contents

.. toctree::
   :maxdepth: 2

   objectstorage_intro.rst
   objectstorage_features.rst
   objectstorage_characteristics.rst
   objectstorage_components.rst

.. TODO (karenb)
   objectstorage_ringbuilder.rst
   objectstorage_arch.rst
   objectstorage_replication.rst

doc/admin-guide-cloud-rst/source/objectstorage_components.rst (new file, 283 lines)
@@ -0,0 +1,283 @@

==========
Components
==========

The components that enable Object Storage to deliver high availability,
high durability, and high concurrency are:

- **Proxy servers.** Handle all of the incoming API requests.

- **Rings.** Map logical names of data to locations on particular
  disks.

- **Zones.** Isolate data from other zones. A failure in one zone
  does not affect the rest of the cluster because data is replicated
  across zones.

- **Accounts and containers.** Each account and each container is an
  individual database that is distributed across the cluster. An
  account database contains the list of containers in that account. A
  container database contains the list of objects in that container.

- **Objects.** The data itself.

- **Partitions.** A partition stores objects, account databases, and
  container databases, and helps manage locations where data lives in
  the cluster.

.. _objectstorage-building-blocks-figure:

**Object Storage building blocks**

.. figure:: figures/objectstorage-buildingblocks.png

Proxy servers
-------------

Proxy servers are the public face of Object Storage and handle all of
the incoming API requests. Once a proxy server receives a request, it
determines the storage node based on the object's URL, for example:
https://swift.example.com/v1/account/container/object. Proxy servers
also coordinate responses, handle failures, and coordinate timestamps.

Proxy servers use a shared-nothing architecture and can be scaled as
needed based on projected workloads. A minimum of two proxy servers
should be deployed for redundancy. If one proxy server fails, the
others take over.

For more information concerning proxy server configuration, see the
`Configuration Reference
<http://docs.openstack.org/trunk/config-reference/content/proxy-server-configuration.html>`__.

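The mapping from an incoming request to a storage location starts with
the URL path itself. The following is a minimal sketch, not the actual
swift proxy code, of how the ``/v1/<account>/<container>/<object>`` path
shown above can be split into the names that the ring lookups operate on:

.. code-block:: python

   from urllib.parse import urlparse

   def split_object_url(url):
       """Split a Swift-style object URL into account, container, object.

       Assumes a path of the form /v1/<account>/<container>/<object>.
       """
       path = urlparse(url).path
       _version, account, container, obj = path.lstrip('/').split('/', 3)
       return account, container, obj

   # ('account', 'container', 'object')
   print(split_object_url(
       'https://swift.example.com/v1/account/container/object'))
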
Rings
-----

A ring represents a mapping between the names of entities stored on disk
and their physical locations. There are separate rings for accounts,
containers, and objects. When other components need to perform any
operation on an object, container, or account, they need to interact
with the appropriate ring to determine their location in the cluster.

The ring maintains this mapping using zones, devices, partitions, and
replicas. Each partition in the ring is replicated, by default, three
times across the cluster, and partition locations are stored in the
mapping maintained by the ring. The ring is also responsible for
determining which devices are used for handoff in failure scenarios.

Data can be isolated into zones in the ring. Each partition replica is
guaranteed to reside in a different zone. A zone could represent a
drive, a server, a cabinet, a switch, or even a data center.

The partitions of the ring are equally divided among all of the devices
in the Object Storage installation. When partitions need to be moved
around (for example, if a device is added to the cluster), the ring
ensures that a minimum number of partitions are moved at a time, and
only one replica of a partition is moved at a time.

You can use weights to balance the distribution of partitions on drives
across the cluster. This can be useful, for example, when differently
sized drives are used in a cluster.

The ring is used by the proxy server and several background processes
(like replication).

.. _objectstorage-ring-figure:

**The ring**

.. figure:: figures/objectstorage-ring.png

Rings are externally managed: the server processes themselves do not
modify the rings. Instead, they are given new rings modified by other
tools.

The ring uses a configurable number of bits from the MD5 hash of a path
as a partition index that designates a device. The number of bits kept
from the hash is known as the partition power, and 2 to the partition
power indicates the partition count. Partitioning the full MD5 hash ring
allows other parts of the cluster to work in batches of items at once,
which ends up being either more efficient or at least less complex than
working with each item separately or the entire cluster all at once.

Another configurable value is the replica count, which indicates how
many of the partition-device assignments make up a single ring. For a
given partition number, each replica's device will not be in the same
zone as any other replica's device. Zones can be used to group devices
based on physical locations, power separations, network separations, or
any other attribute that would improve the availability of multiple
replicas at the same time.

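The following sketch makes the partition-power arithmetic concrete. It
is a simplified illustration, not the swift ring implementation, and it
ignores details such as the per-cluster hash path prefix and suffix that
a real deployment mixes into the hash:

.. code-block:: python

   import hashlib

   PART_POWER = 10   # example value; 2**10 = 1024 partitions

   def get_partition(account, container=None, obj=None, part_power=PART_POWER):
       """Map a storage path to a partition by keeping the hash's top bits."""
       path = '/' + '/'.join(p for p in (account, container, obj) if p)
       digest = hashlib.md5(path.encode('utf-8')).digest()
       # The MD5 digest is 128 bits; keep only the top `part_power` bits.
       return int.from_bytes(digest, 'big') >> (128 - part_power)

   # Every object maps to one of 2**PART_POWER partitions, and the ring
   # maps each partition to the devices that hold its replicas.
   print(get_partition('AUTH_test', 'photos', 'cat.jpg'))
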
Zones
-----

Object Storage allows configuring zones in order to isolate failure
boundaries. Each data replica resides in a separate zone, if possible.
At the smallest level, a zone could be a single drive or a grouping of a
few drives. If there were five object storage servers, then each server
would represent its own zone. Larger deployments would have an entire
rack (or multiple racks) of object servers, each representing a zone.
The goal of zones is to allow the cluster to tolerate significant
outages of storage servers without losing all replicas of the data.

As mentioned earlier, everything in Object Storage is stored, by
default, three times. Swift places each replica
"as-uniquely-as-possible" to ensure both high availability and high
durability. This means that when choosing a replica location, Object
Storage chooses a server in an unused zone before an unused server in a
zone that already has a replica of the data.

.. _objectstorage-zones-figure:

**Zones**

.. figure:: figures/objectstorage-zones.png

When a disk fails, replica data is automatically distributed to the
other zones to ensure there are three copies of the data.

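The zone-first preference described above can be sketched in a few lines
(this is not the ring builder's actual placement algorithm, only the
rule it expresses): a candidate device in a zone that holds no replica
yet always wins over a device in a zone that already has one.

.. code-block:: python

   def pick_devices(devices, replica_count=3):
       """Pick devices for replicas, preferring zones without a replica.

       `devices` is a list of dicts such as {'id': 1, 'zone': 2}.
       """
       chosen, used_zones = [], set()
       for _ in range(replica_count):
           unused_zone = [d for d in devices
                          if d['zone'] not in used_zones and d not in chosen]
           pool = unused_zone or [d for d in devices if d not in chosen]
           if not pool:
               break
           device = pool[0]
           chosen.append(device)
           used_zones.add(device['zone'])
       return chosen

   # With three devices in zone 1 and one in zone 2, the second replica
   # lands in zone 2 even though zone 1 still has unused devices.
   print(pick_devices([{'id': 1, 'zone': 1}, {'id': 2, 'zone': 1},
                       {'id': 3, 'zone': 1}, {'id': 4, 'zone': 2}]))
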
Accounts and containers
-----------------------

Each account and container is an individual SQLite database that is
distributed across the cluster. An account database contains the list of
containers in that account. A container database contains the list of
objects in that container.

.. _objectstorage-accountscontainers-figure:

**Accounts and containers**

.. figure:: figures/objectstorage-accountscontainers.png

To keep track of object data locations, each account in the system has a
database that references all of its containers, and each container
database references each object.

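From a client's point of view, these databases surface as plain listings
over the REST API. As an illustration only (the endpoint, token, and
names below are placeholders; the storage URL normally comes from the
authentication service):

.. code-block:: python

   import requests

   storage_url = 'https://swift.example.com/v1/account'    # placeholder
   headers = {'X-Auth-Token': '<auth-token>'}               # placeholder

   # A GET on the account URL returns the containers listed in the
   # account database, one name per line.
   containers = requests.get(storage_url, headers=headers).text.splitlines()

   # A GET on a container URL returns the objects listed in that
   # container database.
   objects = requests.get(storage_url + '/container',
                          headers=headers).text.splitlines()
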
Partitions
----------

A partition is a collection of stored data, including account databases,
container databases, and objects. Partitions are core to the replication
system.

Think of a partition as a bin moving throughout a fulfillment center
warehouse. Individual orders get thrown into the bin. The system treats
that bin as a cohesive entity as it moves throughout the system. A bin
is easier to deal with than many little things. It makes for fewer
moving parts throughout the system.

System replicators and object uploads/downloads operate on partitions.
As the system scales up, its behavior continues to be predictable
because the number of partitions is a fixed number.

Implementing a partition is conceptually simple: a partition is just a
directory sitting on a disk with a corresponding hash table of what it
contains.

.. _objectstorage-partitions-figure:

**Partitions**

.. figure:: figures/objectstorage-partitions.png

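As a rough sketch of that idea (the mount point and directory layout
below are illustrative assumptions, not a guaranteed on-disk format),
the partition index computed from an object's hash simply becomes a
directory on whichever device the ring assigns:

.. code-block:: python

   import os

   def partition_dir(device_mount, ring_type, partition):
       """Return the directory that would hold a partition's data."""
       return os.path.join(device_mount, ring_type, str(partition))

   # e.g. /srv/node/sdb1/objects/1024
   print(partition_dir('/srv/node/sdb1', 'objects', 1024))
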
Replicators
-----------

In order to ensure that there are three copies of the data everywhere,
replicators continuously examine each partition. For each local
partition, the replicator compares it against the replicated copies in
the other zones to see if there are any differences.

The replicator knows if replication needs to take place by examining
hashes. A hash file is created for each partition, which contains hashes
of each directory in the partition. For a given partition, the hash
files for each of the partition's copies are compared. If the hashes are
different, then it is time to replicate, and the directory that needs to
be replicated is copied over.

This is where partitions come in handy. With fewer things in the system,
larger chunks of data are transferred around (rather than lots of little
TCP connections, which is inefficient) and there is a consistent number
of hashes to compare.

The cluster eventually has a consistent behavior where the newest data
has a priority.

.. _objectstorage-replication-figure:

**Replication**

.. figure:: figures/objectstorage-replication.png

If a zone goes down, one of the nodes containing a replica notices and
proactively copies data to a handoff location.

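At its core, that check is a straightforward hash comparison. A minimal
sketch (the data layout is assumed for illustration; this is not the
swift replicator code) of deciding which directories in a partition need
to be pushed to a remote copy:

.. code-block:: python

   def dirs_to_sync(local_hashes, remote_hashes):
       """Return the partition subdirectories whose content hashes differ.

       Both arguments map a directory name to the hash of its contents,
       as recorded in each copy's hash file.
       """
       return [name for name, value in local_hashes.items()
               if remote_hashes.get(name) != value]

   local = {'0ab': 'hash-1', '1cd': 'hash-2'}
   remote = {'0ab': 'hash-1', '1cd': 'stale'}
   print(dirs_to_sync(local, remote))   # -> ['1cd']
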
Use cases
---------

The following sections show use cases for object uploads and downloads
and introduce the components.

Upload
~~~~~~

A client uses the REST API to make an HTTP request to PUT an object into
an existing container. The cluster receives the request. First, the
system must figure out where the data is going to go. To do this, the
account name, container name, and object name are all used to determine
the partition where this object should live.

Then a lookup in the ring figures out which storage nodes contain the
partitions in question.

The data is then sent to each storage node where it is placed in the
appropriate partition. At least two of the three writes must be
successful before the client is notified that the upload was successful.

Next, the container database is updated asynchronously to reflect that
there is a new object in it.

.. _objectstorage-usecase-figure:

**Object Storage in use**

.. figure:: figures/objectstorage-usecase.png

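From the client's side, the whole upload flow above is triggered by a
single PUT request. A hedged example using the Python ``requests``
library (the endpoint, token, and object names are placeholders):

.. code-block:: python

   import requests

   url = 'https://swift.example.com/v1/account/container/object'
   headers = {'X-Auth-Token': '<auth-token>'}   # placeholder token

   with open('report.pdf', 'rb') as data:
       resp = requests.put(url, headers=headers, data=data)

   # A 201 Created response means enough replicas were written.
   print(resp.status_code)
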
Download
~~~~~~~~

A request comes in for an account/container/object. Using the same
consistent hashing, the partition name is generated. A lookup in the
ring reveals which storage nodes contain that partition. A request is
made to one of the storage nodes to fetch the object and, if that fails,
requests are made to the other nodes.

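The matching client request is a GET on the same URL; the ring lookup
and any retries against other storage nodes happen inside the cluster
and are invisible to the caller. Again a sketch with placeholder values:

.. code-block:: python

   import requests

   url = 'https://swift.example.com/v1/account/container/object'
   headers = {'X-Auth-Token': '<auth-token>'}   # placeholder token

   resp = requests.get(url, headers=headers)
   if resp.status_code == 200:
       with open('object.copy', 'wb') as f:
           f.write(resp.content)
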
doc/admin-guide-cloud-rst/source/objectstorage_features.rst (new file, 72 lines)
@@ -0,0 +1,72 @@

=====================
Features and benefits
=====================

+-----------------------------+--------------------------------------------------+
| Features                    | Benefits                                         |
+=============================+==================================================+
| Leverages commodity         | No lock-in, lower price/GB.                      |
| hardware                    |                                                  |
+-----------------------------+--------------------------------------------------+
| HDD/node failure agnostic   | Self-healing, reliable, data redundancy protects |
|                             | from failures.                                   |
+-----------------------------+--------------------------------------------------+
| Unlimited storage           | Large and flat namespace, highly scalable        |
|                             | read/write access, able to serve content         |
|                             | directly from storage system.                    |
+-----------------------------+--------------------------------------------------+
| Multi-dimensional           | Scale-out architecture: Scale vertically and     |
| scalability                 | horizontally-distributed storage. Backs up       |
|                             | and archives large amounts of data with          |
|                             | linear performance.                              |
+-----------------------------+--------------------------------------------------+
| Account/container/object    | No nesting, not a traditional file system:       |
| structure                   | Optimized for scale, it scales to multiple       |
|                             | petabytes and billions of objects.               |
+-----------------------------+--------------------------------------------------+
| Built-in replication        | A configurable number of accounts, containers    |
| 3✕ + data redundancy        | and object copies for high availability.         |
| (compared with 2✕ on RAID)  |                                                  |
+-----------------------------+--------------------------------------------------+
| Easily add capacity (unlike | Elastic data scaling with ease                   |
| RAID resize)                |                                                  |
+-----------------------------+--------------------------------------------------+
| No central database         | Higher performance, no bottlenecks               |
+-----------------------------+--------------------------------------------------+
| RAID not required           | Handle many small, random reads and writes       |
|                             | efficiently                                      |
+-----------------------------+--------------------------------------------------+
| Built-in management         | Account management: Create, add, verify,         |
| utilities                   | and delete users; Container management: Upload,  |
|                             | download, and verify; Monitoring: Capacity,      |
|                             | host, network, log trawling, and cluster health. |
+-----------------------------+--------------------------------------------------+
| Drive auditing              | Detect drive failures preempting data corruption |
+-----------------------------+--------------------------------------------------+
| Expiring objects            | Users can set an expiration time or a TTL on an  |
|                             | object to control access                         |
+-----------------------------+--------------------------------------------------+
| Direct object access        | Enable direct browser access to content, such as |
|                             | for a control panel                              |
+-----------------------------+--------------------------------------------------+
| Realtime visibility into    | Know what users are requesting.                  |
| client requests             |                                                  |
+-----------------------------+--------------------------------------------------+
| Supports S3 API             | Utilize tools that were designed for the popular |
|                             | S3 API.                                          |
+-----------------------------+--------------------------------------------------+
| Restrict containers per     | Limit access to control usage by user.           |
| account                     |                                                  |
+-----------------------------+--------------------------------------------------+
| Support for NetApp,         | Unified support for block volumes using a        |
| Nexenta, SolidFire          | variety of storage systems.                      |
+-----------------------------+--------------------------------------------------+
| Snapshot and backup API for | Data protection and recovery for VM data.        |
| block volumes               |                                                  |
+-----------------------------+--------------------------------------------------+
| Standalone volume API       | Separate endpoint and API for integration with   |
| available                   | other compute systems.                           |
+-----------------------------+--------------------------------------------------+
| Integration with Compute    | Fully integrated with Compute for attaching      |
|                             | block volumes and reporting on usage.            |
+-----------------------------+--------------------------------------------------+

doc/admin-guide-cloud-rst/source/objectstorage_intro.rst (new file, 23 lines)
@@ -0,0 +1,23 @@

==============================
Introduction to Object Storage
==============================

OpenStack Object Storage (code-named swift) is open source software for
creating redundant, scalable data storage using clusters of standardized
servers to store petabytes of accessible data. It is a long-term storage
system for large amounts of static data that can be retrieved,
leveraged, and updated. Object Storage uses a distributed architecture
with no central point of control, providing greater scalability,
redundancy, and permanence. Objects are written to multiple hardware
devices, with the OpenStack software responsible for ensuring data
replication and integrity across the cluster. Storage clusters scale
horizontally by adding new nodes. Should a node fail, OpenStack works to
replicate its content from other active nodes. Because OpenStack uses
software logic to ensure data replication and distribution across
different devices, inexpensive commodity hard drives and servers can be
used in lieu of more expensive equipment.

Object Storage is ideal for cost-effective, scale-out storage. It
provides a fully distributed, API-accessible storage platform that can
be integrated directly into applications or used for backup, archiving,
and data retention.