Attempted to differentiate between what a ring represents and what it is constructed of without losing the conversational tone of the document. Also added a link to the more in-depth document 'overview_ring'. More work and more links to more information need to be added, but this is my first change, and I want to be sure I'm in line with others' ideas.

Brian K. Jones
2010-07-23 18:16:58 +00:00
committed by Tarmac


@@ -2,28 +2,53 @@
Swift Architectural Overview
============================
.. TODO - add links to more detailed overview in each section below.
------------
Proxy Server
------------
The Proxy Server is responsible for tying together the rest of the Swift
architecture. For each request, it will look up the location of the account,
container, or object in the ring (see below) and route the request accordingly.
The public API is also exposed through the Proxy Server.
A large number of failures are also handled in the Proxy Server. For
example, if a server is unavailable for an object PUT, it will ask the
ring for a handoff server and route there instead.
When objects are streamed to or from an object server, they are streamed
directly through the proxy server to or from the user -- the proxy server
does not spool them.
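As a rough illustration of the handoff behaviour described above, here is a minimal sketch. The function name ``put_with_handoff``, the ``send`` callback, and the node lists are all hypothetical and are not Swift's actual proxy code; the sketch only assumes that the ring can supply an ordered list of handoff nodes.

```python
def put_with_handoff(primaries, handoffs, send):
    """Send to every primary node, substituting a handoff node on failure.

    primaries: nodes the ring assigned to the partition (hypothetical list)
    handoffs:  ordered fallback nodes supplied by the ring (hypothetical list)
    send:      callable that raises ConnectionError if a node is unavailable
    Returns the nodes that actually accepted the request.
    """
    handoff_iter = iter(handoffs)
    accepted = []
    for node in primaries:
        target = node
        while target is not None:
            try:
                send(target)
                accepted.append(target)
                break
            except ConnectionError:
                # Primary (or previous handoff) is down: try the next handoff.
                target = next(handoff_iter, None)
    return accepted


if __name__ == "__main__":
    down = {"node-b"}

    def send(node):
        if node in down:
            raise ConnectionError(node)

    print(put_with_handoff(["node-a", "node-b", "node-c"], ["handoff-1"], send))
```

Each handoff node is consumed at most once, so two failed primaries never land on the same fallback.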
--------
The Ring
--------
The Ring is a mapping of a requested account, container, or object to the
server, device, and partition that it should reside in. The partitions
of the ring are equally divided among all the devices in the cluster.
When an event occurs that requires partitions to be moved around (for
example if a device is added to the cluster), it ensures that a minimum
number of partitions are moved at a time, and only one replica of a
partition is moved at a time.
A ring represents a mapping between the names of entities stored on disk and
their physical location. There are separate rings for accounts, containers, and
objects. When other components need to perform any operation on an object,
container, or account, they need to interact with the appropriate ring to
determine its location in the cluster.
The Ring maintains this mapping using zones, devices, partitions, and replicas.
Each partition in the ring is replicated, by default, 3 times across the
cluster, and thus stored in the mapping. The ring is also responsible
for determining which devices are used for handoff in failure scenarios.
cluster, and the locations for a partition are stored in the mapping maintained
by the ring. The ring is also responsible for determining which devices are
used for handoff in failure scenarios.
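The lookup described above can be sketched as a toy model. Everything here is an assumption made for illustration: the partition count, the use of an MD5 hash of the entity path, the device table, and the names ``get_partition`` and ``get_nodes`` are not taken from this document and should not be read as the real ring implementation.

```python
import hashlib
import struct

PART_POWER = 4   # hypothetical: 2**4 = 16 partitions
REPLICAS = 3     # default replica count mentioned in the text

# Hypothetical devices; the zone is derived from the id for simplicity.
DEVICES = [{"id": i, "zone": i % 3} for i in range(6)]

# Toy replica -> partition -> device id table. With six devices in three
# zones, (part + replica) % 6 places each replica in a different zone.
REPLICA2PART2DEV = [
    [(part + replica) % len(DEVICES) for part in range(2 ** PART_POWER)]
    for replica in range(REPLICAS)
]


def get_partition(account, container=None, obj=None):
    """Hash the entity path and keep the top PART_POWER bits."""
    path = "/" + "/".join(p for p in (account, container, obj) if p is not None)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)


def get_nodes(account, container=None, obj=None):
    """Return the partition and the devices holding each of its replicas."""
    part = get_partition(account, container, obj)
    return part, [DEVICES[row[part]] for row in REPLICA2PART2DEV]


if __name__ == "__main__":
    part, nodes = get_nodes("AUTH_demo", "photos", "cat.jpg")
    print(part, [(n["id"], n["zone"]) for n in nodes])
```

The point of the sketch is that the name determines the partition, and the stored table, not the name, determines which devices hold that partition's replicas.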
Data can be isolated with the concept of zones in the ring. Each replica
of a partition is guaranteed to reside in a different zone, A zone could
of a partition is guaranteed to reside in a different zone. A zone could
represent a drive, a server, a cabinet, a switch, or even a datacenter.
The partitions of the ring are equally divided among all the devices in the
Swift installation. When partitions need to be moved around (for example if a
device is added to the cluster), the ring ensures that a minimum number of
partitions are moved at a time, and only one replica of a partition is moved at
a time.
Weights can be used to balance the distribution of partitions on drives
across the cluster. This can be useful, for example, if different sized
across the cluster. This can be useful, for example, when different sized
drives are used in a cluster.
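To make the effect of weights concrete, here is a small sketch that splits a fixed number of partitions in proportion to per-device weights. The function ``partitions_per_device`` and the device names are hypothetical, and the largest-remainder rounding is an assumption chosen for the example rather than the ring's actual balancing algorithm.

```python
def partitions_per_device(weights, total_parts):
    """Split total_parts across devices in proportion to their weights."""
    total_weight = sum(weights.values())
    shares = {dev: total_parts * w / total_weight for dev, w in weights.items()}
    counts = {dev: int(share) for dev, share in shares.items()}

    # Hand out partitions lost to rounding, largest fractional share first.
    leftover = total_parts - sum(counts.values())
    by_remainder = sorted(shares, key=lambda d: shares[d] - counts[d], reverse=True)
    for dev in by_remainder[:leftover]:
        counts[dev] += 1
    return counts


if __name__ == "__main__":
    # A drive with twice the weight receives twice the partitions.
    print(partitions_per_device({"sda": 1.0, "sdb": 1.0, "sdc": 2.0}, 1024))
```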
The ring is used by the Proxy server and several background processes
@@ -37,7 +62,7 @@ The Object Server is a very simple blob storage server that can store,
retrieve and delete objects stored on local devices. Objects are stored
as binary files on the filesystem with metadata stored in the file's
extended attributes (xattrs). This requires that the underlying filesystem
choice for object servers must support xattrs on files. Some filesystems,
choice for object servers support xattrs on files. Some filesystems,
like ext3, have xattrs turned off by default.
Each object is stored using a path derived from the object name's hash and
@@ -65,29 +90,12 @@ Account Server
The Account Server is very similar to the Container Server, excepting that
it is responsible for listings of containers rather than objects.
------------
Proxy Server
------------
The Proxy Server is responsible for tying the above servers together. For
each request, it will look up the location of the account, container, or
object in the ring and route the request accordingly. The public API is
also exposed through the Proxy Server.
A large number of failures are also handled in the Proxy Server. For
example, if a server is unavailable for an object PUT, it will ask the
ring for a handoff server, and route there instead.
When objects are streamed to or from an object server, they are streamed
directly through the proxy server to or from the user -- the proxy server
does not spool them.
-----------
Replication
-----------
Replication is designed to keep the system in a consistent state in the face
of temporary error conditions like network partitions or drive failures.
of temporary error conditions like network outages or drive failures.
The replication processes compare local data with each remote copy to ensure
they all contain the latest version. Object replication uses a hash list to
@@ -134,3 +142,4 @@ for example), the file is quarantined, and replication will replace the bad
file from another replica. If other errors are found they are logged (for
example, an object's listing can't be found on any container server it
should be).