Attempted to differentiate between what a ring represents and what it is constructed of without losing the conversational tone of the document. Also added a link to the more in-depth document 'overview_ring'. More work and more links to more information need to be added, but this is my first change, and I want to be sure I'm in line with others' ideas.
@@ -2,28 +2,53 @@
|
|||||||
Swift Architectural Overview
|
Swift Architectural Overview
|
||||||
============================
|
============================
|
||||||
|
|
||||||
|
.. TODO - add links to more detailed overview in each section below.
|
||||||
|
|
||||||
|
------------
|
||||||
|
Proxy Server
|
||||||
|
------------
|
||||||
|
|
||||||
|
The Proxy Server is responsible for tying together the rest of the Swift
|
||||||
|
architecture. For each request, it will look up the location of the account,
|
||||||
|
container, or object in the ring (see below) and route the request accordingly.
|
||||||
|
The public API is also exposed through the Proxy Server.
|
||||||
|
|
||||||
|
A large number of failures are also handled in the Proxy Server. For
|
||||||
|
example, if a server is unavailable for an object PUT, it will ask the
|
||||||
|
ring for a handoff server and route there instead.
|
||||||
|
|
||||||
|
When objects are streamed to or from an object server, they are streamed
|
||||||
|
directly through the proxy server to or from the user -- the proxy server
|
||||||
|
does not spool them.
|
||||||
|
|
||||||
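The handoff behaviour described for the Proxy Server can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Swift's proxy code: ``put_with_handoff``, ``primaries``, ``handoffs``, and ``send`` are hypothetical names, and a failed server is modelled simply as a ``ConnectionError``.

```python
def put_with_handoff(primaries, handoffs, send):
    """Try each primary node; if it is unavailable, fall back to the next
    unused handoff node. Returns the nodes that accepted the request.

    All names here are hypothetical and exist only for this sketch.
    """
    handoff_iter = iter(handoffs)
    accepted = []
    for node in primaries:
        target = node
        while target is not None:
            try:
                send(target)              # assumed to raise ConnectionError on failure
                accepted.append(target)
                break
            except ConnectionError:
                # Ask for the next handoff node; None means none are left.
                target = next(handoff_iter, None)
    return accepted


def make_sender(down):
    """Build a fake send() that fails for every node named in `down`."""
    def send(node):
        if node in down:
            raise ConnectionError(node)
    return send


if __name__ == "__main__":
    print(put_with_handoff(["a", "b", "c"], ["h1", "h2"], make_sender({"b"})))
```

The sketch handles one node at a time and does no streaming; it is only meant to show the "ask for a handoff and route there instead" step.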
--------
|
--------
|
||||||
The Ring
|
The Ring
|
||||||
--------
|
--------
|
||||||
|
|
||||||
The Ring is a mapping of a requested account, container, or object to the
|
A ring represents a mapping between the names of entities stored on disk and
|
||||||
server, device, and partition that it should reside in. The partitions
|
their physical location. There are separate rings for accounts, containers, and
|
||||||
of the ring are equally divided among all the devices in the cluster.
|
objects. When other components need to perform any operation on an object,
|
||||||
When an event occurs that requires partitions to be moved around (for
|
container, or account, they need to interact with the appropriate ring to
|
||||||
example if a device is added to the cluster), it ensures that a minimum
|
determine its location in the cluster.
|
||||||
number of partitions are moved at a time, and only one replica of a
|
|
||||||
partition is moved at a time.
|
|
||||||
|
|
||||||
|
The Ring maintains this mapping using zones, devices, partitions, and replicas.
|
||||||
Each partition in the ring is replicated, by default, 3 times across the
|
Each partition in the ring is replicated, by default, 3 times across the
|
||||||
cluster, and thus stored in the mapping. The ring is also responsible
|
cluster, and the locations for a partition are stored in the mapping maintained
|
||||||
for determining which devices are used for handoff in failure scenarios.
|
by the ring. The ring is also responsible for determining which devices are
|
||||||
|
used for handoff in failure scenarios.
|
||||||
|
|
||||||
Data can be isolated with the concept of zones in the ring. Each replica
|
Data can be isolated with the concept of zones in the ring. Each replica
|
||||||
of a partition is guaranteed to reside in a different zone, A zone could
|
of a partition is guaranteed to reside in a different zone. A zone could
|
||||||
represent a drive, a server, a cabinet, a switch, or even a datacenter.
|
represent a drive, a server, a cabinet, a switch, or even a datacenter.
|
||||||
|
|
||||||
|
The partitions of the ring are equally divided among all the devices in the
|
||||||
|
Swift installation. When partitions need to be moved around (for example, if a
|
||||||
|
device is added to the cluster), the ring ensures that a minimum number of
|
||||||
|
partitions are moved at a time, and only one replica of a partition is moved at
|
||||||
|
a time.
|
||||||
|
|
||||||
Weights can be used to balance the distribution of partitions on drives
|
Weights can be used to balance the distribution of partitions on drives
|
||||||
across the cluster. This can be useful, for example, if different sized
|
across the cluster. This can be useful, for example, when different sized
|
||||||
drives are used in a cluster.
|
drives are used in a cluster.
|
||||||
|
|
||||||
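As a rough illustration of the mapping described above, the sketch below hashes an entity's path to a partition and looks that partition up in a small replica table. It is a toy model under stated assumptions, not Swift's ring code: the names ``PART_POWER``, ``DEVICES``, ``partition_for``, ``build_table``, and ``get_nodes`` are hypothetical, the device list is made up, and weights and minimal partition movement are not modelled.

```python
import hashlib
import struct

PART_POWER = 4   # assumption: 2**4 = 16 partitions, chosen only for illustration
REPLICAS = 3     # the default replica count mentioned in the text

# Hypothetical devices, each tagged with the zone it lives in.
DEVICES = [
    {"id": 0, "zone": 1},
    {"id": 1, "zone": 1},
    {"id": 2, "zone": 2},
    {"id": 3, "zone": 3},
]


def partition_for(path):
    """Map a path such as /account/container/object to a partition number."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Keep only the top PART_POWER bits of the first 32 bits of the hash.
    return struct.unpack_from(">I", digest)[0] >> (32 - PART_POWER)


def build_table(devices, part_count, replicas):
    """Assign each replica of each partition to a device in a distinct zone."""
    zones = sorted({d["zone"] for d in devices})
    if len(zones) < replicas:
        raise ValueError("need at least as many zones as replicas")
    by_zone = {z: [d for d in devices if d["zone"] == z] for z in zones}
    table = []
    for part in range(part_count):
        row = []
        for replica in range(replicas):
            # Consecutive replicas land in consecutive (hence different) zones.
            candidates = by_zone[zones[(part + replica) % len(zones)]]
            row.append(candidates[part % len(candidates)]["id"])
        table.append(row)
    return table


def get_nodes(table, devices, account, container=None, obj=None):
    """Return (partition, devices holding its replicas) for an entity."""
    path = "/" + "/".join(p for p in (account, container, obj) if p)
    part = partition_for(path)
    by_id = {d["id"]: d for d in devices}
    return part, [by_id[dev_id] for dev_id in table[part]]


if __name__ == "__main__":
    table = build_table(DEVICES, 2 ** PART_POWER, REPLICAS)
    print(get_nodes(table, DEVICES, "AUTH_test", "photos", "cat.jpg"))
```

Because the partition comes from a hash of the name alone, any component holding the same table resolves the same name to the same devices without a central lookup service.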
The ring is used by the Proxy server and several background processes
|
The ring is used by the Proxy server and several background processes
|
||||||
@@ -37,7 +62,7 @@ The Object Server is a very simple blob storage server that can store,
|
|||||||
retrieve and delete objects stored on local devices. Objects are stored
|
retrieve and delete objects stored on local devices. Objects are stored
|
||||||
as binary files on the filesystem with metadata stored in the file's
|
as binary files on the filesystem with metadata stored in the file's
|
||||||
extended attributes (xattrs). This requires that the underlying filesystem
|
extended attributes (xattrs). This requires that the underlying filesystem
|
||||||
choice for object servers must support xattrs on files. Some filesystems,
|
choice for object servers support xattrs on files. Some filesystems,
|
||||||
like ext3, have xattrs turned off by default.
|
like ext3, have xattrs turned off by default.
|
||||||
|
|
||||||
Each object is stored using a path derived from the object name's hash and
|
Each object is stored using a path derived from the object name's hash and
|
||||||
@@ -65,29 +90,12 @@ Account Server
|
|||||||
The Account Server is very similar to the Container Server, excepting that
|
The Account Server is very similar to the Container Server, excepting that
|
||||||
it is responsible for listings of containers rather than objects.
|
it is responsible for listings of containers rather than objects.
|
||||||
|
|
||||||
------------
|
|
||||||
Proxy Server
|
|
||||||
------------
|
|
||||||
|
|
||||||
The Proxy Server is responsible for tying the above servers together. For
|
|
||||||
each request, it will look up the location of the account, container, or
|
|
||||||
object in the ring and route the request accordingly. The public API is
|
|
||||||
also exposed through the Proxy Server.
|
|
||||||
|
|
||||||
A large number of failures are also handled in the Proxy Server. For
|
|
||||||
example, if a server is unavailable for an object PUT, it will ask the
|
|
||||||
ring for a handoff server, and route there instead.
|
|
||||||
|
|
||||||
When objects are streamed to or from an object server, they are streamed
|
|
||||||
directly through the proxy server to or from the user -- the proxy server
|
|
||||||
does not spool them.
|
|
||||||
|
|
||||||
-----------
|
-----------
|
||||||
Replication
|
Replication
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
Replication is designed to keep the system in a consistent state in the face
|
Replication is designed to keep the system in a consistent state in the face
|
||||||
of temporary error conditions like network partitions or drive failures.
|
of temporary error conditions like network outages or drive failures.
|
||||||
|
|
||||||
The replication processes compare local data with each remote copy to ensure
|
The replication processes compare local data with each remote copy to ensure
|
||||||
they all contain the latest version. Object replication uses a hash list to
|
they all contain the latest version. Object replication uses a hash list to
|
||||||
@@ -134,3 +142,4 @@ for example), the file is quarantined, and replication will replace the bad
|
|||||||
file from another replica. If other errors are found they are logged (for
|
file from another replica. If other errors are found they are logged (for
|
||||||
example, an object's listing can't be found on any container server it
|
example, an object's listing can't be found on any container server it
|
||||||
should be).
|
should be).
|
||||||
|
|
||||||
|
|||||||