Add Storage Policy Documentation

Add overview and example information for using Storage Policies.

DocImpact
Implements: blueprint storage-policies
Change-Id: I6f11f7a1bdaa6f3defb3baa56a820050e5f727f1
2014-04-07 14:22:27 -07:00
parent c1dc2fa624
commit e52e8bc917
13 changed files with 923 additions and 20 deletions
--- a/doc/saio/bin/remakerings
+++ b/doc/saio/bin/remakerings
@@ -10,6 +10,12 @@ swift-ring-builder object.builder add r1z2-127.0.0.1:6020/sdb2 1
 swift-ring-builder object.builder add r1z3-127.0.0.1:6030/sdb3 1
 swift-ring-builder object.builder add r1z4-127.0.0.1:6040/sdb4 1
 swift-ring-builder object.builder rebalance
+swift-ring-builder object-1.builder create 10 2 1
+swift-ring-builder object-1.builder add r1z1-127.0.0.1:6010/sdb1 1
+swift-ring-builder object-1.builder add r1z2-127.0.0.1:6020/sdb2 1
+swift-ring-builder object-1.builder add r1z3-127.0.0.1:6030/sdb3 1
+swift-ring-builder object-1.builder add r1z4-127.0.0.1:6040/sdb4 1
+swift-ring-builder object-1.builder rebalance
 swift-ring-builder container.builder create 10 3 1
 swift-ring-builder container.builder add r1z1-127.0.0.1:6011/sdb1 1
 swift-ring-builder container.builder add r1z2-127.0.0.1:6021/sdb2 1
--- a/doc/saio/swift/container-reconciler.conf
+++ b/doc/saio/swift/container-reconciler.conf
@@ -0,0 +1,47 @@
+[DEFAULT]
+# swift_dir = /etc/swift
+user = <your-user-name>
+# You can specify default log routing here if you want:
+# log_name = swift
+# log_facility = LOG_LOCAL0
+# log_level = INFO
+# log_address = /dev/log
+#
+# Comma-separated list of functions to call to set up custom log handlers.
+# Functions get passed: conf, name, log_to_console, log_route, fmt, logger,
+# adapted_logger
+# log_custom_handlers =
+#
+# If set, log_udp_host will override log_address
+# log_udp_host =
+# log_udp_port = 514
+#
+# You can enable StatsD logging here:
+# log_statsd_host = localhost
+# log_statsd_port = 8125
+# log_statsd_default_sample_rate = 1.0
+# log_statsd_sample_rate_factor = 1.0
+# log_statsd_metric_prefix =
+
+[container-reconciler]
+# reclaim_age = 604800
+# interval = 300
+# request_tries = 3
+
+[pipeline:main]
+pipeline = catch_errors proxy-logging cache proxy-server
+
+[app:proxy-server]
+use = egg:swift#proxy
+# See proxy-server.conf-sample for options
+
+[filter:cache]
+use = egg:swift#memcache
+# See proxy-server.conf-sample for options
+
+[filter:proxy-logging]
+use = egg:swift#proxy_logging
+
+[filter:catch_errors]
+use = egg:swift#catch_errors
+# See proxy-server.conf-sample for options
--- a/doc/saio/swift/swift.conf
+++ b/doc/saio/swift/swift.conf
@@ -2,3 +2,10 @@
 # random unique strings that can never change (DO NOT LOSE)
 swift_hash_path_prefix = changeme
 swift_hash_path_suffix = changeme
+
+[storage-policy:0]
+name = gold
+default = yes
+
+[storage-policy:1]
+name = silver
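The two `[storage-policy:N]` sections added to the SAIO `swift.conf` above are ordinary INI sections. The following minimal sketch reads them into an index-to-policy mapping using only Python's standard `configparser`. It makes some assumptions: the `load_policies` helper and the set of accepted "true" spellings are hypothetical, and this is not the parser in `swift.common.storage_policy`.

```python
import configparser

# The [storage-policy:N] sections added to the SAIO swift.conf above.
SAIO_SWIFT_CONF = """
[swift-hash]
swift_hash_path_prefix = changeme
swift_hash_path_suffix = changeme

[storage-policy:0]
name = gold
default = yes

[storage-policy:1]
name = silver
"""

# Assumption: the spellings treated as "true"; Swift's own parser may differ.
TRUE_VALUES = {"true", "1", "yes", "on", "t", "y"}


def load_policies(conf_text):
    """Map policy index -> {"name", "default"} for each storage-policy section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    policies = {}
    for section in parser.sections():
        prefix, _, index = section.partition(":")
        if prefix != "storage-policy":
            continue  # e.g. [swift-hash] is not a policy section
        default = parser.get(section, "default", fallback="no")
        policies[int(index)] = {
            "name": parser.get(section, "name"),
            "default": default.strip().lower() in TRUE_VALUES,
        }
    return policies


print(load_policies(SAIO_SWIFT_CONF))
```

Only index 0 (`gold`) comes back flagged as the default, which matches the `default = yes` line above.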
--- a/doc/source/admin_guide.rst
+++ b/doc/source/admin_guide.rst
@@ -2,6 +2,33 @@
 Administrator's Guide
 =====================

+-------------------------
+Defining Storage Policies
+-------------------------
+
+Defining your Storage Policies is straightforward with Swift.  It is important
+that the administrator understand the concepts behind Storage Policies
+before creating and using them, both to get the most benefit out of the
+feature and, more importantly, to avoid having to make unnecessary changes
+once a set of policies has been deployed to a cluster.
+
+It is highly recommended that the reader fully read and understand
+:doc:`overview_policies` before proceeding with administration of
+policies.  Plan carefully, and experiment first on a non-production
+cluster to be certain that the desired configuration meets the needs
+of the users.  See :ref:`upgrade-policy` before planning the upgrade
+of your existing deployment.
+
+The following is a high-level view of the few steps it takes to configure
+policies once you have decided what you want to do:
+
+  #. Define your policies in ``/etc/swift/swift.conf``
+  #. Create the corresponding object rings
+  #. Communicate the names of the Storage Policies to cluster users
+
+For a specific example that takes you through these steps, please see
+:doc:`policies_saio`.
+
 ------------------
 Managing the Rings
 ------------------
@@ -32,15 +59,15 @@ For more information see :doc:`overview_ring`.
 Removing a device from the ring::

    swift-ring-builder <builder-file> remove <ip_address>/<device_name>
-    
+
 Removing a server from the ring::

    swift-ring-builder <builder-file> remove <ip_address>
-    
+
 Adding devices to the ring:

 See :ref:`ring-preparing`
-    
+
 See what devices for a server are in the ring::

    swift-ring-builder <builder-file> search <ip_address>
@@ -49,7 +76,7 @@ Once you are done with all changes to the ring, the changes need to be
 "committed"::

    swift-ring-builder <builder-file> rebalance
-    
+
 Once the new rings are built, they should be pushed out to all the servers
 in the cluster.

@@ -126,7 +153,7 @@ is replaced.  Once the drive is replaced, it can be re-added to the ring.
 Handling Server Failure
 -----------------------

-If a server is having hardware issues, it is a good idea to make sure the 
+If a server is having hardware issues, it is a good idea to make sure the
 swift services are not running.  This will allow Swift to work around the
 failure while you troubleshoot.

@@ -149,7 +176,7 @@ Detecting Failed Drives

 It has been our experience that when a drive is about to fail, error messages
 will spew into `/var/log/kern.log`.  There is a script called
-`swift-drive-audit` that can be run via cron to watch for bad drives.  If 
+`swift-drive-audit` that can be run via cron to watch for bad drives.  If
 errors are detected, it will unmount the bad drive, so that Swift can
 work around it.  The script takes a configuration file with the following
 settings:
@@ -170,7 +197,7 @@ log_file_pattern    /var/log/kern*  Location of the log file with globbing
                                    pattern to check against device errors
 regex_pattern_X     (see below)     Regular expression patterns to be used to
                                    locate device blocks with errors in the
-                                    log file  
+                                    log file
 ==================  ==============  ===========================================

 The default regex pattern used to locate device blocks with errors are
@@ -235,7 +262,7 @@ the cluster. Here is an example of a cluster in perfect health::
    Queried 2621 containers for dispersion reporting, 19s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
-    
+
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space
@@ -251,7 +278,7 @@ that has::
    Queried 2621 containers for dispersion reporting, 8s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
-    
+
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    There were 1763 partitions missing one copy.
    77.56% of object copies found (6094 of 7857)
@@ -285,7 +312,7 @@ You can also run the report for only containers or objects::
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space

-Alternatively, the dispersion report can also be output in json format. This 
+Alternatively, the dispersion report can also be output in json format. This
 allows it to be more easily consumed by third party utilities::

    $ swift-dispersion-report -j
@@ -499,7 +526,7 @@ Request URI                 Description
 This information can also be queried via the swift-recon command line utility::

    fhines@ubuntu:~$ swift-recon -h
-    Usage: 
+    Usage:
            usage: swift-recon <server_type> [-v] [--suppress] [-a] [-r] [-u] [-d]
            [-l] [--md5] [--auditor] [--updater] [--expirer] [--sockstat]

@@ -893,8 +920,8 @@ Metric Name                              Description
 `object-server.PUT.timing`               Timing data for each PUT request not resulting in an
                                         error.
 `object-server.PUT.<device>.timing`      Timing data per kB transferred (ms/kB) for each
-                                         non-zero-byte PUT request on each device. 
-                                         Monitoring problematic devices, higher is bad. 
+                                         non-zero-byte PUT request on each device.
+                                         Monitoring problematic devices, higher is bad.
 `object-server.GET.errors.timing`        Timing data for GET request errors: bad request,
                                         not mounted, header timestamps before the epoch,
                                         precondition failed.
@@ -1046,7 +1073,7 @@ Managing Services
 -----------------

 Swift services are generally managed with `swift-init`. The general usage is
-``swift-init <service> <command>``, where service is the swift service to 
+``swift-init <service> <command>``, where service is the swift service to
 manage (for example object, container, account, proxy) and command is one of:

 ==========  ===============================================
@@ -1059,8 +1086,8 @@ shutdown    Attempt to gracefully shutdown the service
 reload      Attempt to gracefully restart the service
 ==========  ===============================================

-A graceful shutdown or reload will finish any current requests before 
-completely stopping the old service.  There is also a special case of 
+A graceful shutdown or reload will finish any current requests before
+completely stopping the old service.  There is also a special case of
 `swift-init all <command>`, which will run the command for all swift services.

 In cases where there are multiple configs for a service, a specific config
--- a/doc/source/container.rst
+++ b/doc/source/container.rst
@@ -34,6 +34,18 @@ Container Server
    :undoc-members:
    :show-inheritance:

+.. _container-replicator:
+
+Container Replicator
+====================
+
+.. automodule:: swift.container.replicator
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. _container-sync-daemon:
+
 Container Sync
 ==============

--- a/doc/source/development_saio.rst
+++ b/doc/source/development_saio.rst
@@ -341,6 +341,10 @@ commands are as follows:

     .. literalinclude:: /../saio/swift/object-expirer.conf

+  #. ``/etc/swift/container-reconciler.conf``
+
+     .. literalinclude:: /../saio/swift/container-reconciler.conf
+
  #. ``/etc/swift/account-server/1.conf``

     .. literalinclude:: /../saio/swift/account-server/1.conf
@@ -447,8 +451,15 @@ Setting up scripts for running Swift

        .. literalinclude:: /../saio/bin/remakerings

-     You can expect the ouptut from this command to produce the following::
+     You can expect the output from this command to produce the following (note
+     that two object rings are created in order to test storage policies in the
+     SAIO environment; however, they map to the same nodes)::

+        Device d0r1z1-127.0.0.1:6010R127.0.0.1:6010/sdb1_"" with 1.0 weight got id 0
+        Device d1r1z2-127.0.0.1:6020R127.0.0.1:6020/sdb2_"" with 1.0 weight got id 1
+        Device d2r1z3-127.0.0.1:6030R127.0.0.1:6030/sdb3_"" with 1.0 weight got id 2
+        Device d3r1z4-127.0.0.1:6040R127.0.0.1:6040/sdb4_"" with 1.0 weight got id 3
+        Reassigned 1024 (100.00%) partitions. Balance is now 0.00.
        Device d0r1z1-127.0.0.1:6010R127.0.0.1:6010/sdb1_"" with 1.0 weight got id 0
        Device d1r1z2-127.0.0.1:6020R127.0.0.1:6020/sdb2_"" with 1.0 weight got id 1
        Device d2r1z3-127.0.0.1:6030R127.0.0.1:6030/sdb3_"" with 1.0 weight got id 2
@@ -465,6 +476,8 @@ Setting up scripts for running Swift
        Device d3r1z4-127.0.0.1:6042R127.0.0.1:6042/sdb4_"" with 1.0 weight got id 3
        Reassigned 1024 (100.00%) partitions. Balance is now 0.00.

+  #. Read more about Storage Policies and your SAIO in :doc:`policies_saio`
+
  #. Verify the unit tests run::

        $HOME/swift/.unittests
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -45,6 +45,7 @@ Overview and Concepts
    Swift's API docs <http://docs.openstack.org/api/openstack-object-storage/1.0/content/>
    overview_architecture
    overview_ring
+    overview_policies
    overview_reaper
    overview_auth
    overview_replication
@@ -65,6 +66,7 @@ Developer Documentation

    development_guidelines
    development_saio
+    policies_saio
    development_auth
    development_middleware
    development_ondisk_backends
--- a/doc/source/middleware.rst
+++ b/doc/source/middleware.rst
@@ -130,6 +130,8 @@ KeystoneAuth
    :members:
    :show-inheritance:

+.. _list_endpoints:
+
 List Endpoints
 ==============

--- a/doc/source/misc.rst
+++ b/doc/source/misc.rst
@@ -120,3 +120,12 @@ WSGI
 .. automodule:: swift.common.wsgi
    :members:
    :show-inheritance:
+
+.. _storage_policy:
+
+Storage Policy
+==============
+
+.. automodule:: swift.common.storage_policy
+    :members:
+    :show-inheritance:
--- a/doc/source/overview_architecture.rst
+++ b/doc/source/overview_architecture.rst
@@ -27,9 +27,9 @@ The Ring

 A ring represents a mapping between the names of entities stored on disk and
 their physical location. There are separate rings for accounts, containers, and
-objects. When other components need to perform any operation on an object,
-container, or account, they need to interact with the appropriate ring to
-determine its location in the cluster.
+one object ring per storage policy. When other components need to perform any
+operation on an object, container, or account, they need to interact with the
+appropriate ring to determine its location in the cluster.

 The Ring maintains this mapping using zones, devices, partitions, and replicas.
 Each partition in the ring is replicated, by default, 3 times across the
@@ -54,6 +54,33 @@ drives are used in a cluster.
 The ring is used by the Proxy server and several background processes
 (like replication).

+----------------
+Storage Policies
+----------------
+
+Storage Policies provide a way for object storage providers to differentiate
+service levels, features and behaviors of a Swift deployment.  Each Storage
+Policy configured in Swift is exposed to the client via an abstract name.
+Each device in the system is assigned to one or more Storage Policies.  This
+is accomplished through the use of multiple object rings, where each Storage
+Policy has an independent object ring, which may include a subset of hardware
+implementing a particular differentiation.
+
+For example, one might have the default policy with 3x replication, and create
+a second policy which, when applied to new containers, uses only 2x replication.
+Another might add SSDs to a set of storage nodes and create a performance tier
+storage policy for certain containers to have their objects stored there.
+
+This mapping is then exposed on a per-container basis, where each container
+can be assigned a specific storage policy when it is created, which remains in
+effect for the lifetime of the container.  Applications require minimal
+awareness of storage policies to use them; once a container has been created
+with a specific policy, all objects stored in it will be stored in accordance
+with that policy.
+
+Storage Policies are not implemented as a separate code module but are a core
+abstraction of the Swift architecture.
+
 -------------
 Object Server
 -------------
--- a/doc/source/overview_policies.rst
+++ b/doc/source/overview_policies.rst
@@ -0,0 +1,603 @@
+================
+Storage Policies
+================
+
+Storage Policies allow for some level of segmenting the cluster for various
+purposes through the creation of multiple object rings. Storage Policies are
+not implemented as a separate code module but are an important concept in
+understanding Swift architecture.
+
+As described in :doc:`overview_ring`, Swift uses modified hashing rings to
+determine where data should reside in the cluster. There is a separate ring
+for account databases and for container databases, and there is one object
+ring per storage policy.  Each object ring behaves exactly the same way
+and is maintained in the same manner, but with policies, different devices
+can belong to different rings with varying levels of replication. By supporting
+multiple object rings, Swift allows the application and/or deployer to
+essentially segregate the object storage within a single cluster.  There are
+many reasons why this might be desirable:
+
+* Different levels of replication:  If a provider wants to offer, for example,
+  2x replication and 3x replication but doesn't want to maintain 2 separate clusters,
+  they would set up a 2x policy and a 3x policy and assign the nodes to their
+  respective rings.
+
+* Performance:  Just as SSDs can be used as the exclusive members of an account or
+  container ring, an SSD-only object ring can be created as well and used to
+  implement a low-latency/high performance policy.
+
+* Collecting nodes into groups:  Different object rings may have different
+  physical servers so that objects in specific storage policies are always
+  placed in a particular data center or geography.
+
+* Different storage implementations:  Another example would be to collect
+  together a set of nodes that use a different Diskfile (e.g., Kinetic,
+  GlusterFS) and use a policy to direct traffic just to those nodes.
+
+.. note::
+
+    Today, choosing a different storage policy allows the use of different
+    object rings, but future policies (such as Erasure Coding) will also
+    change some of the actual code paths when processing a request.  Also note
+    that Diskfile refers to the backend object storage plug-in architecture.
+
+-----------------------
+Containers and Policies
+-----------------------
+
+Policies are implemented at the container level.  There are many advantages to
+this approach, not the least of which is how easy it makes life for
+applications that want to take advantage of them.  It also ensures that
+Storage Policies remain a core feature of Swift independent of the auth
+implementation.  Policies were not implemented at the account/auth layer
+because it would require changes to all auth systems in use by Swift
+deployers.  Each container has a new special immutable metadata element called
+the storage policy index.  Note that internally, Swift relies on policy
+indexes and not policy names.  Policy names exist for human readability and
+translation is managed in the proxy.  When a container is created, one new
+optional header is supported to specify the policy name.  If nothing is
+specified, the default policy is used (and if no other policies are defined,
+Policy-0 is considered the default).  The difference between default and
+Policy-0 is covered in :ref:`default-policy`.
+
+Policies are assigned when a container is created.  Once a container has been
+assigned a policy, it cannot be changed until the container is deleted.  The
+implications on data placement/movement for large datasets would make this a
+task best left for applications to perform.  Therefore, if a container has an
+existing policy of, for example, 3x replication, and one wanted to migrate that
+data to a policy that specifies a different replication level, the application
+would create another container specifying the other policy name and then
+simply move the data from one container to the other.  Policies apply on a
+per-container basis, allowing for minimal application awareness; once a
+container has been created with a specific policy, all objects stored in it
+will be stored in accordance with that policy.  If a container with a specific
+name is deleted (which requires the container to be empty), a new container may
+be created with the same name without any restriction on storage policy
+enforced by the deleted container which previously shared the same name.
+
+Containers have a many-to-one relationship with policies, meaning that any number
+of containers can share one policy.  There is no limit to how many containers can use
+a specific policy.
+
+The notion of associating a ring with a container introduces an interesting scenario:
+What would happen if 2 containers of the same name were created with different
+Storage Policies on either side of a network outage at the same time?  Furthermore,
+what would happen if objects were placed in those containers, a whole bunch of them,
+and then later network connectivity was restored?  Well, without special care it
+would be a big problem, as an application could end up using the wrong ring to try
+and find an object.  Luckily there is a solution for this problem: a daemon, covered
+in the next section, works tirelessly to identify and rectify this potential scenario.
+
+--------------------
+Container Reconciler
+--------------------
+
+Because atomicity of container creation cannot be enforced in a
+distributed eventually consistent system, object writes into the wrong
+storage policy must be eventually merged into the correct storage policy
+by an asynchronous daemon.  Recovery from object writes during a network
+partition which resulted in a split-brain container created with
+different storage policies is handled by the
+`swift-container-reconciler` daemon.
+
+The container reconciler works off a queue similar to the
+object-expirer.  The queue is populated during container-replication.
+It is never considered incorrect to enqueue an object to be evaluated by
+the container-reconciler because if there is nothing wrong with the location
+of the object, the reconciler will simply dequeue it.  The
+container-reconciler queue is an indexed log for the real location of an
+object for which a discrepancy in the storage policy of the container was
+discovered.
+
+To determine the correct storage policy of a container, it is necessary
+to update the status_changed_at field in the container_stat table when a
+container changes status from deleted to re-created.  This transaction
+log allows the container-replicator to update the correct storage policy
+both when replicating a container and handling REPLICATE requests.
+
+Because each object write is a separate distributed transaction, it is
+not possible to determine the correctness of the storage policy for each
+object write with respect to the entire transaction log at a given
+container database.  As such, container databases will always record the
+object write regardless of the storage policy on a per-object-row basis.
+Object byte and count stats are tracked per storage policy in each
+container and reconciled using normal object row merge semantics.
+
+The object rows are ensured to be fully durable during replication using
+the normal container replication.  After the container
+replicator pushes its object rows to available primary nodes, any
+misplaced object rows are bulk-loaded into containers based off the
+object timestamp under the ``.misplaced_objects`` system account.  The
+rows are initially written to a handoff container on the local node, and
+at the end of the replication pass the ``.misplaced_objects`` containers are
+replicated to the correct primary nodes.
+
+The container-reconciler processes the ``.misplaced_objects`` containers in
+descending order and reaps those containers as the objects represented by
+the rows are successfully reconciled.  The container-reconciler will
+always validate the correct storage policy for enqueued objects using
+direct container HEAD requests which are accelerated via caching.
+
+Because failure of individual storage nodes in aggregate is assumed to
+be common at scale, the container-reconciler will make forward progress
+with a simple quorum majority.  During a combination of failures and
+rebalances it is possible that a quorum could provide an incomplete
+record of the correct storage policy, so an object write may have to be
+applied more than once.  Because storage nodes and container databases
+will not process writes with an ``X-Timestamp`` less than or equal to
+their existing record, when object writes are re-applied their timestamp
+is slightly incremented.  In order for this increment to be applied
+transparently to the client, a second vector of time has been added to
+Swift for internal use.  See :class:`~swift.common.utils.Timestamp`.
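The following minimal sketch shows why a second vector of time is needed. It assumes two hypothetical classes. `VersionedTimestamp` orders by the client-visible time first and an internal offset second. `Replica`, like the storage nodes described above, rejects writes that are not newer than its existing record. The sketch does not reproduce the API of Swift's `Timestamp` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class VersionedTimestamp:
    """Hypothetical two-part timestamp, compared as (timestamp, offset).

    ``timestamp`` is what the client sees; ``offset`` is internal only.
    """
    timestamp: float
    offset: int = 0

    def bumped(self):
        # Same client-visible time, but sorts strictly after the original.
        return VersionedTimestamp(self.timestamp, self.offset + 1)


class Replica:
    """Hypothetical node that ignores writes not newer than what it holds."""

    def __init__(self):
        self.records = {}  # object name -> (VersionedTimestamp, policy index)

    def write(self, name, ts, policy_index):
        current = self.records.get(name)
        if current is not None and ts <= current[0]:
            return False  # analogous to X-Timestamp <= existing record
        self.records[name] = (ts, policy_index)
        return True


node = Replica()
t = VersionedTimestamp(1396905747.12345)
node.write("obj", t, policy_index=1)                    # original (misplaced) write
print(node.write("obj", t, policy_index=0))             # False: same timestamp
print(node.write("obj", t.bumped(), policy_index=0))    # True: offset breaks the tie
print(node.records["obj"][0].timestamp == t.timestamp)  # True: client time unchanged
```

Re-applying the write with an unchanged timestamp is silently ignored. The bumped copy wins while still reporting the original time to the client.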
+
+As the reconciler applies object writes to the correct storage policy, it
+cleans up writes which no longer apply to the incorrect storage policy
+and removes the rows from the ``.misplaced_objects`` containers.  After all
+rows have been successfully processed, it sleeps and periodically
+checks for newly enqueued rows to be discovered during container
+replication.
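The queue-driven flow described in this section can be sketched with in-memory stand-ins. In that flow, any row may be enqueued safely. Rows that turn out to be correct are simply dequeued. Otherwise the object is moved and the misplaced copy is cleaned up. Everything in the sketch is hypothetical: `reconcile`, the queue tuple layout and the `stores` dictionaries are illustrations only. The real daemon also compares timestamps and works through `.misplaced_objects` containers and direct requests rather than Python dicts.

```python
from collections import deque


def reconcile(queue, correct_policy, stores):
    """Drain a queue of (container, obj, written_policy) rows.

    correct_policy: container -> policy index a quorum of container HEADs reports.
    stores: policy index -> {(container, obj): data}, standing in for object rings.
    Returns the number of objects moved.
    """
    moved = 0
    while queue:
        container, obj, written = queue.popleft()  # every row is dequeued
        target = correct_policy[container]
        if written == target:
            continue  # location was already correct; enqueuing was harmless
        key = (container, obj)
        if key in stores[written]:
            # Copy into the correct policy first (keeping any existing copy),
            # then clean up the write in the incorrect policy.
            stores[target].setdefault(key, stores[written][key])
            del stores[written][key]
            moved += 1
    return moved


stores = {0: {("c", "a"): b"A"}, 1: {("c", "b"): b"B"}}
queue = deque([("c", "a", 0), ("c", "b", 1)])
print(reconcile(queue, {"c": 0}, stores))  # 1
print(stores)  # {0: {('c', 'a'): b'A', ('c', 'b'): b'B'}, 1: {}}
```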
+
+.. _default-policy:
+
+-------------------------
+Default versus 'Policy-0'
+-------------------------
+
+Storage Policies is a versatile feature intended to support both new and
+pre-existing clusters with the same level of flexibility.  For that reason, we
+introduce the ``Policy-0`` concept, which is not the same as the "default"
+policy.  As you will see when we begin to configure policies, each policy has
+both a name (human friendly, configurable) as well as an index (or simply
+policy number). Swift reserves index 0 to map to the object ring that's
+present in all installations (e.g., ``/etc/swift/object.ring.gz``).  You can
+name this policy anything you like, and if no policies are defined it will
+report itself as ``Policy-0``; however, you cannot change the index, as there
+must always be a policy with index 0.
+
+Another important concept is the default policy, which can be any policy
+in the cluster.  The default policy is the policy that is automatically
+chosen when a container creation request is sent without a storage
+policy being specified. :ref:`configure-policy` describes how to set the
+default policy.  The difference from ``Policy-0`` is subtle but
+extremely important.  ``Policy-0`` is what is used by Swift when
+accessing pre-storage-policy containers, which won't have a policy; in
+this case we would not use the default, as it might not have the same
+policy as legacy containers.  When no other policies are defined, Swift
+will always choose ``Policy-0`` as the default.
+
+In other words, default means "create using this policy if nothing else is specified"
+and ``Policy-0`` means "use the legacy policy if a container doesn't have one" which
+really means use ``object.ring.gz`` for lookups.
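The distinction can be made concrete with a minimal sketch. It assumes a hypothetical parsed configuration in which the default is deliberately *not* index 0. The function names are illustrative and are not Swift's proxy code.

```python
from typing import Optional

# Hypothetical parsed swift.conf in which the default is NOT index 0, to show
# that "default" and "Policy-0" are different things.
POLICIES = {0: "gold", 1: "silver"}  # index -> name
DEFAULT_INDEX = 1                     # "default = yes" on [storage-policy:1]


def index_for_new_container(requested_name: Optional[str]) -> int:
    """Container creation: the named policy, else the configured default."""
    if requested_name is None:
        return DEFAULT_INDEX
    for index, name in POLICIES.items():
        if name.lower() == requested_name.lower():  # names are case insensitive
            return index
    raise ValueError(f"unknown policy name {requested_name!r}")


def index_for_existing_container(stored_index: Optional[int]) -> int:
    """Lookups: a pre-storage-policy container has no index, so use Policy-0."""
    return 0 if stored_index is None else stored_index


print(index_for_new_container(None))       # 1: the default policy
print(index_for_new_container("GOLD"))     # 0
print(index_for_existing_container(None))  # 0: legacy container -> object.ring.gz
```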
+
+.. note::
+
+    With the Storage Policy based code, it's not possible to create a
+    container that doesn't have a policy.  If nothing is provided, Swift will
+    still select the default and assign it to the container.  For containers
+    created before Storage Policies were introduced, the legacy Policy-0 will
+    be used.
+
+.. _deprecate-policy:
+
+--------------------
+Deprecating Policies
+--------------------
+
+There will be times when a policy is no longer desired; however, simply
+deleting the policy and associated rings would be problematic for existing
+data.  In order to ensure that resources are not orphaned in the cluster (left
+on disk but no longer accessible) and to provide proper messaging to
+applications when a policy needs to be retired, the notion of deprecation is
+used.  :ref:`configure-policy` describes how to deprecate a policy.
+
+Swift's behavior with deprecated policies will change as follows:
+
+ * The deprecated policy will not appear in ``/info``
+ * PUT/GET/DELETE/POST/HEAD are still allowed on the pre-existing containers
+   created with a deprecated policy
+ * Clients will get a ``400 Bad Request`` error when trying to create a new
+   container using the deprecated policy
+ * Clients still have access to policy statistics via HEAD on pre-existing
+   containers
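The first and third behaviors above can be sketched as two small, hypothetical proxy-side checks. The `Policy` class and both functions are illustrative only. The 201 status stands in for a successful container create and is an assumption of the sketch. The policy list mirrors the example `swift.conf` shown under Configuring Policies, where `silver` is deprecated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    index: int
    name: str
    deprecated: bool = False


# Mirrors the example swift.conf under "Configuring Policies": silver is deprecated.
POLICIES = [Policy(0, "gold"), Policy(1, "silver", deprecated=True)]


def advertised_policies(policies):
    """Policy names a hypothetical /info handler would list."""
    return [p.name for p in policies if not p.deprecated]


def create_container_status(requested_name, policies):
    """HTTP status for creating a NEW container with the named policy.

    201 stands in for a successful create; deprecated policies get 400.
    Requests against existing containers are not modelled here.
    """
    for p in policies:
        if p.name.lower() == requested_name.lower():
            return 400 if p.deprecated else 201
    raise LookupError(f"unknown policy {requested_name!r}")  # out of scope here


print(advertised_policies(POLICIES))                # ['gold']
print(create_container_status("silver", POLICIES))  # 400
print(create_container_status("gold", POLICIES))    # 201
```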
+
+.. note::
+
+    A policy cannot be both the default and deprecated.  If you deprecate the
+    default policy, you must specify a new default.
+
+You can also use the deprecated feature to roll out new policies.  If you
+want to test a new storage policy before making it generally available,
+you could deprecate the policy when you initially roll out the new
+configuration and rings to all nodes.  Being deprecated will render it
+inert and unable to be used.  To test it you will need to create a
+container with that storage policy, which will require a single proxy
+instance (or a set of proxy-servers which are only internally
+accessible) that has been one-off configured with the new policy NOT
+marked deprecated.  Once the container has been created with the new
+storage policy, any client authorized to use that container will be able
+to add and access data stored in that container in the new storage
+policy.  When satisfied, you can roll out a new ``swift.conf`` which does
+not mark the policy as deprecated to all nodes.
+
+.. _configure-policy:
+
+--------------------
+Configuring Policies
+--------------------
+
+Policies are configured in ``swift.conf`` and it is important that the deployer have a solid
+understanding of the semantics for configuring policies.  Recall that a policy must have
+a corresponding ring file, so configuring a policy is a two-step process.  First, edit
+your ``/etc/swift/swift.conf`` file to add your new policy and, second, create the
+corresponding policy object ring file.
+
+See :doc:`policies_saio` for a step by step guide on adding a policy to the SAIO setup.
+
+Note that each policy has a section starting with ``[storage-policy:N]`` where N is the
+policy index.  Indexes do not need to be sequential (sequential numbering only aids
+readability), but there are a number of rules enforced by Swift when parsing this file:
+
+    * If a policy with index 0 is not declared and no other policies are
+      defined, Swift will create one
+    * The policy index must be a non-negative integer
+    * If no policy is declared as the default and no other policies are
+      defined, the policy with index 0 is set as the default
+    * Policy indexes must be unique
+    * Policy names are required
+    * Policy names are case insensitive
+    * Policy names must contain only letters, digits or a dash
+    * Policy names must be unique
+    * The policy name 'Policy-0' can only be used for the policy with index 0
+    * If any policies are defined, exactly one policy must be declared default
+    * Deprecated policies cannot be declared the default
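The rules above can be checked mechanically. The following is a minimal sketch of a hypothetical validator that works on already-parsed policy dictionaries. `validate_policies` and `PolicyError` are illustrative names, not Swift's parser, and the sketch enforces only the rules listed here.

```python
import re

# Hypothetical validator for the rules listed above; not Swift's own parser.
NAME_RE = re.compile(r"[A-Za-z0-9-]+")  # letters, digits or a dash


class PolicyError(ValueError):
    pass


def validate_policies(policies):
    """Check a list of {"index", "name", "default", "deprecated"} dicts.

    Returns a copy with implicit defaults applied, or raises PolicyError.
    """
    policies = [dict(p) for p in policies]  # don't mutate the caller's dicts
    if not policies:
        # Nothing declared: Swift creates policy 0, which is the default.
        return [{"index": 0, "name": "Policy-0", "default": True, "deprecated": False}]

    indexes, names = set(), set()
    for p in policies:
        index, name = p.get("index"), p.get("name")
        if type(index) is not int or index < 0:
            raise PolicyError(f"invalid policy index {index!r}")
        if index in indexes:
            raise PolicyError(f"duplicate policy index {index}")
        if not name:
            raise PolicyError(f"policy {index} has no name")
        if not NAME_RE.fullmatch(name):
            raise PolicyError(f"invalid characters in policy name {name!r}")
        if name.lower() in names:  # names are case insensitive
            raise PolicyError(f"duplicate policy name {name!r}")
        if name.lower() == "policy-0" and index != 0:
            raise PolicyError("'Policy-0' may only name the policy with index 0")
        indexes.add(index)
        names.add(name.lower())

    defaults = [p for p in policies if p.get("default")]
    if not defaults and len(policies) == 1 and policies[0]["index"] == 0:
        policies[0]["default"] = True  # a lone policy 0 becomes the default
        defaults = policies
    if len(defaults) != 1:
        raise PolicyError(f"exactly one default policy required, found {len(defaults)}")
    if defaults[0].get("deprecated"):
        raise PolicyError("a deprecated policy cannot be the default")
    return policies


conf = [
    {"index": 0, "name": "gold", "default": True},
    {"index": 1, "name": "silver", "deprecated": True},
]
print([p["name"] for p in validate_policies(conf)])  # ['gold', 'silver']
```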
+
+The following is an example of a properly configured ``swift.conf`` file. See :doc:`policies_saio`
+for full instructions on setting up an all-in-one with this example configuration::
+
+        [swift-hash]
+        # random unique strings that can never change (DO NOT LOSE)
+        swift_hash_path_prefix = changeme
+        swift_hash_path_suffix = changeme
+
+        [storage-policy:0]
+        name = gold
+        default = yes
+
+        [storage-policy:1]
+        name = silver
+        deprecated = yes
+
+Review :ref:`default-policy` and :ref:`deprecate-policy` for more
+information about the ``default`` and ``deprecated`` options.
+
+There are some other considerations when managing policies:
+
+    * Policy names can be changed (but be sure that users are aware; aliases are
+      not currently supported but could be implemented in custom middleware)
+    * You cannot change the index of a policy once it has been created
+    * The default policy can be changed at any time by adding the
+      default directive to the desired policy section
+    * Any policy may be deprecated by adding the deprecated directive to
+      the desired policy section, but a deprecated policy may not also
+      be declared the default, and you must specify a default, so you
+      must have at least one policy which is not deprecated at all times.
+
+There will be additional parameters for policies as new features are added
+(e.g., Erasure Coding), but for now only a section name/index and name are
+required.  Once ``swift.conf`` is configured for a new policy, a new ring must be
+created.  The ring tools are not policy name aware, so it's critical that the
+correct policy index be used when creating the new policy's ring file.
+Additional object rings are created in the same manner as the legacy ring
+except that ``-N`` is appended after the word ``object``, where N matches the
+policy index used in ``swift.conf``.  This naming convention follows the pattern
+for per-policy storage node data directories as well.  So, to create the ring
+for policy 1::
+
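+        swift-ring-builder object-1.builder create 10 3 1
+        swift-ring-builder object-1.builder add r1z1-<ip>:<port>/<device> <weight>
+        # repeat "add" for every device in this policy's ring, then:
+        swift-ring-builder object-1.builder rebalance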
+
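+Because the ring tools only see file names, a small helper can keep the
+``object``/``object-N`` convention straight.  This is a hypothetical sketch,
+and the function names are invented for the example.  The ``rebalance``
+command writes the ring itself to a matching ``.ring.gz`` file, for example
+``object-1.ring.gz``.
+
+.. code-block:: python
+
+    def object_ring_name(policy_index):
+        """Policy 0 keeps the legacy 'object' name; policy N uses 'object-N'."""
+        if policy_index < 0:
+            raise ValueError('policy index must be non-negative')
+        return 'object' if policy_index == 0 else 'object-%d' % policy_index
+
+
+    def ring_create_command(policy_index, part_power, replicas, min_part_hours):
+        # swift-ring-builder <file> create <part_power> <replicas> <min_part_hours>
+        return 'swift-ring-builder %s.builder create %d %d %d' % (
+            object_ring_name(policy_index), part_power, replicas, min_part_hours)
+
+For example, ``ring_create_command(1, 10, 3, 1)`` produces the same ``create``
+line as the one used for policy 1 above.
+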
+.. note::
+
+    The same drives can be used for multiple policies; the details of how
+    that is managed on disk are covered in a later section.  It is important
+    to understand the implications of such a configuration before setting
+    one up.  Make sure it is really what you want to do: in many cases it
+    will be, but in others it may not.
+
+--------------
+Using Policies
+--------------
+
+Using policies is very simple: a policy is only specified when a container is
+initially created, and there are no other API changes.  Creating a container can
+be done without any special policy information::
+
+        curl -v -X PUT -H 'X-Auth-Token: <your auth token>' \
+            http://127.0.0.1:8080/v1/AUTH_test/myCont0
+
+This creates a container that is associated with the policy named 'gold',
+assuming we're using the ``swift.conf`` example from above.  It uses 'gold'
+because that policy was specified as the default.  Now, when we put an object
+into this container, it will get placed on nodes that are part of the ring we
+created for policy 'gold'.
+
+If we wanted to explicitly state that we wanted policy 'gold', the command
+would simply need to include a new header, as shown below::
+
+        curl -v -X PUT -H 'X-Auth-Token: <your auth token>' \
+            -H 'X-Storage-Policy: gold' http://127.0.0.1:8080/v1/AUTH_test/myCont1
+
+And that's it!  The application never needs to specify the policy name
+again.  There are some illegal operations, however:
+
+* If an invalid (typo, non-existent) policy is specified: 400 Bad Request
+* If you try to change the policy either via PUT or POST: 409 Conflict
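+
+The same container requests can be built from Python with only the standard
+library.  This sketch assumes the storage URL and token used in the ``curl``
+examples.  The ``container_put_request`` and ``send`` helpers are hypothetical
+names.  ``send`` is not exercised here because it needs a running cluster.
+
+.. code-block:: python
+
+    import urllib.error
+    import urllib.request
+
+
+    def container_put_request(storage_url, container, token, policy=None):
+        """Build a container PUT; leave ``policy`` unset to use the default."""
+        headers = {'X-Auth-Token': token}
+        if policy is not None:
+            headers['X-Storage-Policy'] = policy
+        url = '%s/%s' % (storage_url.rstrip('/'), container)
+        return urllib.request.Request(url, method='PUT', headers=headers)
+
+
+    def send(request):
+        """Return the HTTP status, e.g. 400 or 409 for the cases above."""
+        try:
+            with urllib.request.urlopen(request) as response:
+                return response.status
+        except urllib.error.HTTPError as err:
+            return err.code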
+
+If you'd like to see how the storage in the cluster is being used, simply HEAD
+the account and you'll see not only the cumulative numbers, as before, but
+per policy statistics as well.  In the example below there are 3 objects in
+total, with two of them in policy 'gold' and one in policy 'silver'::
+
+        curl -i -X HEAD -H 'X-Auth-Token: <your auth token>' \
+            http://127.0.0.1:8080/v1/AUTH_test
+
+and your results will include (some output removed for readability)::
+
+        X-Account-Container-Count: 3
+        X-Account-Object-Count: 3
+        X-Account-Bytes-Used: 21
+        X-Account-Storage-Policy-Gold-Object-Count: 2
+        X-Account-Storage-Policy-Gold-Bytes-Used: 14
+        X-Account-Storage-Policy-Silver-Object-Count: 1
+        X-Account-Storage-Policy-Silver-Bytes-Used: 7
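+
+A client can pull the per-policy figures out of those headers with a small
+parser.  This is a sketch, and ``per_policy_stats`` is a hypothetical helper.
+The pattern accepts both the ``X-Account-Storage-Policy-...`` and the
+``X-Storage-Policy-...`` spellings of the per-policy counters.
+
+.. code-block:: python
+
+    import re
+
+    # Matches e.g. X-Account-Storage-Policy-Gold-Bytes-Used (case insensitive).
+    POLICY_STAT_RE = re.compile(
+        r'^x-(?:account-)?storage-policy-(.+)-(object-count|bytes-used)$',
+        re.IGNORECASE)
+
+
+    def per_policy_stats(headers):
+        """Map lower-cased policy name -> {'object-count': n, 'bytes-used': n}."""
+        stats = {}
+        for key, value in headers.items():
+            match = POLICY_STAT_RE.match(key)
+            if match:
+                name, field = match.group(1).lower(), match.group(2).lower()
+                stats.setdefault(name, {})[field] = int(value)
+        return stats
+
+Summing ``bytes-used`` over all policies should give the account total.  In
+the example above, 14 + 7 = 21.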
+
+--------------
+Under the Hood
+--------------
+
+Now that we've explained a little about what Policies are and how to
+configure/use them, let's explore how Storage Policies fit in at the
+nuts-n-bolts level.
+
+Parsing and Configuring
+-----------------------
+
+The module, :ref:`storage_policy`, is responsible for parsing the
+``swift.conf`` file, validating the input, and creating a global collection of
+configured policies via class :class:`.StoragePolicyCollection`.  This
+collection is made up of policies of class :class:`.StoragePolicy`.  The
+collection class includes handy functions for getting to a policy either by
+name or by index, getting info about the policies, etc.  There's also one
+very important function, :meth:`~.StoragePolicyCollection.get_object_ring`.
+Object rings are now members of the :class:`.StoragePolicy` class and are
+actually not instantiated until the :meth:`~.StoragePolicy.load_ring`
+method is called.  Any caller anywhere in the code base that needs to access
+an object ring must use the :data:`.POLICIES` global singleton to access the
+:meth:`~.StoragePolicyCollection.get_object_ring` function and provide the
+policy index, which will call :meth:`~.StoragePolicy.load_ring` if
+needed.  However, when request handling services such as the
+:ref:`proxy-server` start, their rings are proactively loaded to provide
+moderate protection against a misconfiguration resulting in a runtime error.
+The global is instantiated when Swift starts and provides a mechanism to patch
+policies for the test code.
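+
+The lazy ring loading described above can be pictured with two simplified
+stand-in classes.  They are a hypothetical sketch, not Swift's
+:class:`.StoragePolicy` and :class:`.StoragePolicyCollection`.  In particular,
+``ring_loader`` is an assumed callable standing in for real ring construction.
+
+.. code-block:: python
+
+    class SketchPolicy:
+        def __init__(self, idx, name, is_default=False, ring_loader=None):
+            self.idx = idx
+            self.name = name
+            self.is_default = is_default
+            self.object_ring = None          # not built until first needed
+            self._ring_loader = ring_loader
+
+        def load_ring(self, swift_dir):
+            if self.object_ring is None:     # load once, then reuse
+                self.object_ring = self._ring_loader(swift_dir, self.idx)
+            return self.object_ring
+
+
+    class SketchPolicyCollection:
+        def __init__(self, policies):
+            self.by_index = {p.idx: p for p in policies}
+            self.by_name = {p.name.lower(): p for p in policies}
+            self.default = next(p for p in policies if p.is_default)
+
+        def get_by_index(self, index):
+            return self.by_index.get(int(index))
+
+        def get_by_name(self, name):
+            return self.by_name.get(name.lower())
+
+        def get_object_ring(self, index, swift_dir='/etc/swift'):
+            policy = self.get_by_index(index)
+            if policy is None:
+                raise ValueError('no policy with index %r' % (index,))
+            return policy.load_ring(swift_dir)
+
+Calling ``get_object_ring`` repeatedly for the same index invokes
+``ring_loader`` only once.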
+
+Middleware
+----------
+
+Middleware can take advantage of policies through the :data:`.POLICIES` global
+and by importing :func:`.get_container_info` to gain access to the policy
+index associated with the container in question.  From the index it
+can then use the :data:`.POLICIES` singleton to grab the right ring.  For example,
+:ref:`list_endpoints` is policy aware using the means just described. Another
+example is :ref:`recon` which will report the md5 sums for all object rings.
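+
+A minimal sketch of that lookup follows, with several assumptions:
+
+* the container info is a dict that carries the policy index under a
+  ``storage_policy`` key, and a missing value means policy 0
+* ``policies`` offers a ``get_object_ring(index, swift_dir)`` method like the
+  one described earlier
+* ``object_ring_for_container`` is a hypothetical helper name
+
+.. code-block:: python
+
+    def object_ring_for_container(container_info, policies,
+                                  swift_dir='/etc/swift'):
+        # Assumption: the index is stored as a string such as '1'; a missing
+        # or empty value is treated as policy 0.
+        index = int(container_info.get('storage_policy') or 0)
+        return policies.get_object_ring(index, swift_dir)
+
+Any object with a compatible ``get_object_ring`` method, including a test
+double, can be passed as ``policies``.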
+
+Proxy Server
+------------
+
+The :ref:`proxy-server` module's role in Storage Policies is essentially to make sure the
+correct object ring is used.  Before policies, the one object ring
+would be instantiated when the :class:`.Application` class was instantiated and could
+be overridden by test code via an init parameter.  With policies, however, there is
+no init parameter and the :class:`.Application` class instead depends on the :data:`.POLICIES`
+global singleton to retrieve the ring, which is instantiated the first time it's
+needed.  So, instead of an object ring member of the :class:`.Application` class, there is
+an accessor function, :meth:`~.Application.get_object_ring`, that gets the ring from :data:`.POLICIES`.
+
+In general, when any module running on the proxy requires an object ring, it
+does so by first getting the policy index from the cached container info.  The
+exception is during container creation, where it uses the policy name from the
+request header to look up the policy index from the :data:`.POLICIES` global.  Once the
+proxy has determined the policy index, it can use the :meth:`~.Application.get_object_ring` method
+described earlier to gain access to the correct ring.  It then has the responsibility
+of passing the index information, not the policy name, on to the back-end servers
+via the header ``X-Backend-Storage-Policy-Index``.  Going the other way, the proxy also
+strips the index out of headers that go back to clients, and makes sure they only
+see the friendly policy names.
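+
+The translation in both directions might look like the following sketch.  The
+helper names are hypothetical and the logic is simplified.  Only the
+``X-Storage-Policy`` and ``X-Backend-Storage-Policy-Index`` header names come
+from this document.
+
+.. code-block:: python
+
+    BACKEND_INDEX = 'X-Backend-Storage-Policy-Index'
+
+
+    def to_backend(client_headers, policy_index):
+        """Forward the policy index, not the name, to back-end servers."""
+        headers = {k: v for k, v in client_headers.items()
+                   if k.lower() != 'x-storage-policy'}
+        headers[BACKEND_INDEX] = str(policy_index)
+        return headers
+
+
+    def to_client(backend_headers, index_to_name):
+        """Replace the back-end index with the friendly policy name."""
+        headers = {}
+        for key, value in backend_headers.items():
+            if key.lower() == BACKEND_INDEX.lower():
+                headers['X-Storage-Policy'] = index_to_name[int(value)]
+            else:
+                headers[key] = value
+        return headers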
+
+On Disk Storage
+---------------
+
+Policies each have their own directories on the back-end servers and are identified by
+their storage policy indexes.  Organizing the back-end directory structures by policy
+index helps keep track of things and also allows for sharing of disks between policies
+which may or may not make sense depending on the needs of the provider.  More
+on this later, but for now be aware of the following directory naming convention:
+
+* ``/objects`` maps to objects associated with Policy-0
+* ``/objects-N`` maps to storage policy index #N
+* ``/async_pending`` maps to async pending updates for Policy-0
+* ``/async_pending-N`` maps to async pending updates for storage policy index #N
+* ``/tmp`` maps to the DiskFile temporary directory for Policy-0
+* ``/tmp-N`` maps to the DiskFile temporary directory for policy index #N
+* ``/quarantined/objects`` maps to the quarantine directory for Policy-0
+* ``/quarantined/objects-N`` maps to the quarantine directory for policy index #N
+
+Note that these directory names are actually owned by the specific DiskFile
+implementation; the names shown above are used by the default DiskFile.
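+
+The naming convention above can be captured in one helper, in the spirit of
+:func:`.get_policy_string`.  The helper below is a hypothetical sketch of the
+convention, not the DiskFile code, and the dictionary keys are invented labels.
+
+.. code-block:: python
+
+    def policy_string(base, policy_index):
+        """Return ``base`` for policy 0 and ``base-N`` for policy N."""
+        index = int(policy_index or 0)
+        if index < 0:
+            raise ValueError('policy index must be non-negative')
+        return base if index == 0 else '%s-%d' % (base, index)
+
+
+    def policy_dirs(policy_index):
+        """Directory names used for one policy on a device (default layout)."""
+        return {
+            'objects': policy_string('objects', policy_index),
+            'async_pending': policy_string('async_pending', policy_index),
+            'tmp': policy_string('tmp', policy_index),
+            'quarantined': 'quarantined/' + policy_string('objects', policy_index),
+        }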
+
+Object Server
+-------------
+
+The :ref:`object-server` is not involved with selecting the storage policy
+placement directly.  However, because of how back-end directory structures are
+set up for policies, as described earlier, the object server modules do play a
+role.  When the object server gets a :class:`.DiskFile`, it passes in the
+policy index and leaves the actual directory naming/structure mechanisms to
+:class:`.DiskFile`.  By passing in the index, the instance of
+:class:`.DiskFile` being used will assure that data is properly located in the
+tree based on its policy.
+
+For the same reason, the :ref:`object-updater` is also policy aware; as previously
+described, different policies use different async pending directories, so the
+updater needs to know how to scan them appropriately.
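+
+Scanning a device for every policy's async pending directory could look like
+the sketch below.  It assumes the default on-disk names listed earlier.  The
+``async_pending_dirs`` generator is hypothetical and is not the updater's
+actual code.
+
+.. code-block:: python
+
+    import os
+    import re
+
+    # async_pending for policy 0, async_pending-N for policy N
+    ASYNC_DIR_RE = re.compile(r'^async_pending(?:-(\d+))?$')
+
+
+    def async_pending_dirs(device_path):
+        """Yield (policy_index, path) for each async pending dir on a device."""
+        for entry in sorted(os.listdir(device_path)):
+            match = ASYNC_DIR_RE.match(entry)
+            path = os.path.join(device_path, entry)
+            if match and os.path.isdir(path):
+                yield int(match.group(1) or 0), path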
+
+The :ref:`object-replicator` is policy aware in that, depending on the policy, it may have to
+do drastically different things, or maybe not.  For example, the difference in
+handling a replication job for 2x versus 3x is trivial; however, the difference in
+handling replication between 3x and erasure code is most definitely not.  In
+fact, the term 'replication' really isn't appropriate for some policies
+like erasure code; however, the majority of the framework for collecting and
+processing jobs remains the same.  Thus, those functions in the replicator are
+leveraged for all policies, and any policy-specific code that a policy requires
+is added when that policy is defined.
+
+The ssync functionality is policy aware for the same reason.  Some of the
+other modules may not obviously be affected, but the back-end directory
+structure owned by :class:`.DiskFile` requires the policy index
+parameter.  Therefore ssync being policy aware really means passing the
+policy index along.  See :mod:`~swift.obj.ssync_sender` and
+:mod:`~swift.obj.ssync_receiver` for more information on ssync.
+
+For :class:`.DiskFile` itself, being policy aware is all about managing the back-end
+structure using the provided policy index.  In other words, callers who get
+a :class:`.DiskFile` instance provide a policy index and :class:`.DiskFile`'s job is to keep data
+separated via this index (however it chooses) such that policies can share
+the same media/nodes if desired.  The included implementation of :class:`.DiskFile`
+lays out the directory structure described earlier, but that's owned within
+:class:`.DiskFile`; external modules have no visibility into that detail.  A common
+function is provided to map various directory names and/or strings
+based on their policy index.  For example, :class:`.DiskFile` defines :func:`.get_data_dir`,
+which builds off of a generic :func:`.get_policy_string` to consistently build
+policy aware strings for various uses.
+
+Container Server
+----------------
+
+The :ref:`container-server` plays a very important role in Storage Policies: it is
+responsible for handling the assignment of a policy to a container and for
+preventing bad things like changing policies or picking the wrong policy
+to use when nothing is specified (recall the earlier discussion on Policy-0 versus
+default).
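+
+The decision for a container PUT can be summarized in a hypothetical helper
+that also covers the 400 and 409 cases from the *Using Policies* section.  It
+is a simplified sketch, and in a real cluster these checks may be split
+between the proxy and the container server.  POST handling is not shown.
+
+.. code-block:: python
+
+    class PolicyError(Exception):
+        def __init__(self, status, message):
+            super().__init__(message)
+            self.status = status
+
+
+    def resolve_container_policy(requested_name, existing_index,
+                                 name_to_index, default_index):
+        """Return the policy index a container PUT should end up with.
+
+        requested_name: X-Storage-Policy value, or None when absent
+        existing_index: index already stored for the container, or None
+        name_to_index:  lower-cased policy name -> index
+        """
+        requested_index = None
+        if requested_name is not None:
+            requested_index = name_to_index.get(requested_name.lower())
+            if requested_index is None:
+                raise PolicyError(400, 'unknown policy %r' % requested_name)
+        if existing_index is None:  # new container
+            return default_index if requested_index is None else requested_index
+        if requested_index is not None and requested_index != existing_index:
+            raise PolicyError(409, 'cannot change the storage policy')
+        return existing_index  # no header: keep the policy already assigned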
+
+The :ref:`container-updater` is policy aware; however, its job is very simple: to
+pass the policy index along to the :ref:`account-server` via a request header.
+
+The :ref:`container-backend` is responsible for both altering existing DB
+schema as well as assuring new DBs are created with a schema that supports
+storage policies.  The "on-demand" migration of container schemas allows Swift
+to upgrade without downtime (sqlite's alter statements are fast regardless of
+row count).  To support rolling upgrades (and downgrades) the incompatible
+schema changes to the ``container_stat`` table are made to a
+``container_info`` table, and the ``container_stat`` table is replaced with a
+view that includes an ``INSTEAD OF UPDATE`` trigger which makes it behave like
+the old table.
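+
+The view-plus-trigger technique can be tried out with Python's :mod:`sqlite3`
+module.  The schema below is a deliberately reduced, hypothetical version: it
+has only a handful of columns and is not Swift's real ``container_info``
+definition.  It only shows how an ``UPDATE`` against the old
+``container_stat`` name is redirected to the new table.
+
+.. code-block:: python
+
+    import sqlite3
+
+    REDUCED_SCHEMA = """
+    CREATE TABLE container_info (
+        account TEXT,
+        container TEXT,
+        object_count INTEGER DEFAULT 0,
+        bytes_used INTEGER DEFAULT 0,
+        storage_policy_index INTEGER DEFAULT 0
+    );
+    -- The old name survives as a view over the new table.
+    CREATE VIEW container_stat AS
+        SELECT account, container, object_count, bytes_used
+        FROM container_info;
+    -- Legacy UPDATEs on the view are applied to container_info instead.
+    -- (A container DB holds a single row, so no WHERE clause is needed.)
+    CREATE TRIGGER container_stat_update
+    INSTEAD OF UPDATE ON container_stat
+    BEGIN
+        UPDATE container_info
+        SET account = NEW.account,
+            container = NEW.container,
+            object_count = NEW.object_count,
+            bytes_used = NEW.bytes_used;
+    END;
+    """
+
+
+    def open_reduced_db():
+        conn = sqlite3.connect(':memory:')
+        conn.executescript(REDUCED_SCHEMA)
+        return conn
+
+Old code that runs ``UPDATE container_stat SET ...`` keeps working, while
+``storage_policy_index`` lives only in the new table and is left untouched.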
+
+The policy index is stored here for use in reporting information
+about the container as well as in managing discrepancies between containers
+and their storage policies that are induced by split-brain scenarios.
+Furthermore, during a split-brain, containers must be prepared to track object
+updates from multiple policies, so the object table also includes a
+``storage_policy_index`` column.  Per-policy object counts and bytes are
+updated in the ``policy_stat`` table using ``INSERT`` and ``DELETE`` triggers
+similar to the pre-policy triggers that updated ``container_stat`` directly.
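+
+The following reduced, hypothetical schema is not Swift's actual table
+definitions.  It shows how ``INSERT`` and ``DELETE`` triggers can keep
+``policy_stat`` in step with the object table, using Python's :mod:`sqlite3`
+module.
+
+.. code-block:: python
+
+    import sqlite3
+
+    POLICY_STAT_SCHEMA = """
+    CREATE TABLE object (
+        name TEXT,
+        size INTEGER,
+        deleted INTEGER DEFAULT 0,
+        storage_policy_index INTEGER DEFAULT 0
+    );
+    CREATE TABLE policy_stat (
+        storage_policy_index INTEGER PRIMARY KEY,
+        object_count INTEGER DEFAULT 0,
+        bytes_used INTEGER DEFAULT 0
+    );
+    -- Count a new (non-deleted) row against its own policy.
+    CREATE TRIGGER object_insert_policy_stat AFTER INSERT ON object
+    BEGIN
+        INSERT OR IGNORE INTO policy_stat (storage_policy_index)
+            VALUES (NEW.storage_policy_index);
+        UPDATE policy_stat
+        SET object_count = object_count + (1 - NEW.deleted),
+            bytes_used = bytes_used + (1 - NEW.deleted) * NEW.size
+        WHERE storage_policy_index = NEW.storage_policy_index;
+    END;
+    -- Remove a row's contribution when the row is deleted.
+    CREATE TRIGGER object_delete_policy_stat AFTER DELETE ON object
+    BEGIN
+        UPDATE policy_stat
+        SET object_count = object_count - (1 - OLD.deleted),
+            bytes_used = bytes_used - (1 - OLD.deleted) * OLD.size
+        WHERE storage_policy_index = OLD.storage_policy_index;
+    END;
+    """
+
+
+    def open_policy_stat_db():
+        conn = sqlite3.connect(':memory:')
+        conn.executescript(POLICY_STAT_SCHEMA)
+        return conn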
+
+The :ref:`container-replicator` daemon will proactively migrate legacy
+schemas as part of its normal consistency checking process when it updates the
+``reconciler_sync_point`` entry in the ``container_info`` table.  This ensures
+that read-heavy containers which do not encounter any writes will still get
+migrated to be fully compatible with the post-storage-policy queries without
+having to fall back and retry queries with the legacy schema to service
+container read requests.
+
+The :ref:`container-sync-daemon` functionality only needs to be policy aware in that it
+accesses the object rings.  Therefore, it needs to pull the policy index
+out of the container information and use it to select the appropriate
+object ring from the :data:`.POLICIES` global.
+
+Account Server
+--------------
+
+The :ref:`account-server`'s role in Storage Policies is really limited to reporting.
+When a HEAD request is made on an account (see example provided earlier),
+the account server is provided with the storage policy index and builds
+the ``object_count`` and ``byte_count`` information for the client on a per
+policy basis.
+
+The account servers are able to report per-storage-policy object and byte
+counts because of some policy specific DB schema changes.  A policy specific
+table, ``policy_stat``, maintains information on a per policy basis (one row
+per policy) in the same manner in which the ``account_stat`` table does.  The
+``account_stat`` table still serves the same purpose and is not replaced by
+``policy_stat``; it holds the total account stats, whereas ``policy_stat`` just
+has the breakdowns.  The backend is also responsible for migrating
+pre-storage-policy accounts by altering the DB schema and populating the
+``policy_stat`` table for Policy-0 with current ``account_stat`` data at that
+point in time.
+
+The per-storage-policy object and byte counts are not updated with each object
+PUT and DELETE request; container updates to the account server are performed
+asynchronously by the ``swift-container-updater``.
+
+.. _upgrade-policy:
+
+Upgrading and Confirming Functionality
+--------------------------------------
+
+Upgrading to a version of Swift that has Storage Policy support is not difficult;
+in fact, the cluster administrator isn't required to make any special configuration
+changes to get going.  Swift will automatically begin using the existing object
+ring as both the default ring and the Policy-0 ring.  Adding the declaration of
+policy 0 is totally optional and in its absence, the name given to the implicit
+policy 0 will be 'Policy-0'.  Let's say for testing purposes that you wanted to take
+an existing cluster that already has lots of data on it and upgrade to Swift with
+Storage Policies. From there you want to go ahead and create a policy and test a
+few things out.  All you need to do is:
+
+  #. Define your policies in ``/etc/swift/swift.conf``
+  #. Create the corresponding object rings
+  #. Create containers and objects and confirm their placement is as expected
+
+For a specific example that takes you through these steps, please see
+:doc:`policies_saio`.
+
+.. note::
+
+    If you downgrade from a Storage Policy enabled version of Swift to an
+    older version that doesn't support policies, you will not be able to
+    access any data stored in policies other than the policy with index 0 but
+    those objects WILL appear in container listings (possibly as duplicates if
+    there was a network partition and un-reconciled objects).  It is EXTREMELY
+    important that you perform any necessary integration testing on the
+    upgraded deployment before enabling an additional storage policy to ensure
+    a consistent API experience for your clients.  DO NOT downgrade to a
+    version of Swift that does not support storage policies once you expose
+    multiple storage policies.
--- a/doc/source/overview_replication.rst
+++ b/doc/source/overview_replication.rst
@@ -93,6 +93,8 @@ systems, it was designed so that around 2% of the hash space on a normal node
 will be invalidated per day, which has experimentally given us acceptable
 replication speeds.

+.. _ssync:
+
 Work continues with a new ssync method where rsync is not used at all and
 instead all-Swift code is used to transfer the objects. At first, this ssync
 will just strive to emulate the rsync behavior. Once deemed stable it will open
--- a/doc/source/policies_saio.rst
+++ b/doc/source/policies_saio.rst
@@ -0,0 +1,146 @@
+===========================================
+Adding Storage Policies to an Existing SAIO
+===========================================
+
+Depending on when you downloaded your SAIO environment, it may already
+be prepared with two storage policies that enable some basic functional
+tests.  In the event that you are adding a storage policy to an existing
+installation, however, the following section will walk you through the
+steps for setting up Storage Policies.  Note that configuring more than
+one storage policy on your development environment is recommended but
+optional.  Enabling multiple Storage Policies is very easy regardless of
+whether you are working with an existing installation or starting a
+brand new one.
+
+Now we will create two policies: the first one will be a standard triple
+replication policy that we will also explicitly set as the default, and
+the second will be set up for reduced replication using a factor of 2x.
+We will call the first one 'gold' and the second one 'silver'.  In this
+example both policies map to the same devices, because it's also
+important for this sample implementation to be simple and easy
+to understand, and adding a bunch of new devices isn't really required
+to implement a usable set of policies.
+
+1. To define your policies, add the following to your ``/etc/swift/swift.conf``
+   file::
+
+        [storage-policy:0]
+        name = gold
+        default = yes
+
+        [storage-policy:1]
+        name = silver
+
+   See :doc:`overview_policies` for detailed information on ``swift.conf`` policy
+   options.
+
+2. To create the object ring for the silver policy (index 1), add the following
+   to your ``bin/remakerings`` script and re-run it (your script may already have
+   these changes)::
+
+        swift-ring-builder object-1.builder create 10 2 1
+        swift-ring-builder object-1.builder add r1z1-127.0.0.1:6010/sdb1 1
+        swift-ring-builder object-1.builder add r1z2-127.0.0.1:6020/sdb2 1
+        swift-ring-builder object-1.builder add r1z3-127.0.0.1:6030/sdb3 1
+        swift-ring-builder object-1.builder add r1z4-127.0.0.1:6040/sdb4 1
+        swift-ring-builder object-1.builder rebalance
+
+   Note that the reduced replication of the silver policy is only a function
+   of the replication parameter in the ``swift-ring-builder create`` command
+   and is not specified in ``/etc/swift/swift.conf``.
+
+3. Copy ``etc/container-reconciler.conf-sample`` to
+   ``/etc/swift/container-reconciler.conf`` and fix the user option::
+
+        cp etc/container-reconciler.conf-sample /etc/swift/container-reconciler.conf
+        sed -i "s/# user.*/user = $USER/g" /etc/swift/container-reconciler.conf
+
+------------------
+Using Policies
+------------------
+
+Setting up Storage Policies was very simple, and using them is even
+simpler.  In this section, we will run some commands to create a few
+containers with different policies and store objects in them and see how
+Storage Policies affect placement of data in Swift.
+
+1. We will be using the list_endpoints middleware to confirm object locations,
+   so enable that now in your ``proxy-server.conf`` file by adding it to the pipeline
+   and including the filter section as shown below (be sure to restart your proxy
+   after making these changes)::
+
+        pipeline = catch_errors gatekeeper healthcheck proxy-logging cache bulk \
+          slo dlo ratelimit crossdomain list-endpoints tempurl tempauth staticweb \
+          container-quotas account-quotas proxy-logging proxy-server
+
+        [filter:list-endpoints]
+        use = egg:swift#list_endpoints
+
+2. Check to see that your policies are reported via /info::
+
+        swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing info
+
+   You should see this (only the policy output is shown here)::
+
+        policies: [{'default': True, 'name': 'gold'}, {'name': 'silver'}]
+
+3. Now create a container without specifying a policy; it will use the
+   default, 'gold'.  Then put a test object in it (create the file ``file0.txt``
+   with your favorite editor and give it some content)::
+
+        curl -v -X PUT -H 'X-Auth-Token: <your auth token>' \
+            http://127.0.0.1:8080/v1/AUTH_test/myCont0
+        curl -X PUT -v -T file0.txt -H 'X-Auth-Token: <your auth token>' \
+            http://127.0.0.1:8080/v1/AUTH_test/myCont0/file0.txt
+
+4. Now confirm placement of the object with the :ref:`list_endpoints` middleware::
+
+        curl -X GET -v http://127.0.0.1:8080/endpoints/AUTH_test/myCont0/file0.txt
+
+   You should see this (note placement on the expected devices)::
+
+        ["http://127.0.0.1:6030/sdb3/761/AUTH_test/myCont0/file0.txt",
+        "http://127.0.0.1:6010/sdb1/761/AUTH_test/myCont0/file0.txt",
+        "http://127.0.0.1:6020/sdb2/761/AUTH_test/myCont0/file0.txt"]
+
+5. Create a container using policy 'silver' and put a different file,
+   ``file1.txt``, in it::
+
+        curl -v -X PUT -H 'X-Auth-Token: <your auth token>' \
+            -H "X-Storage-Policy: silver" \
+            http://127.0.0.1:8080/v1/AUTH_test/myCont1
+        curl -X PUT -v -T file1.txt -H 'X-Auth-Token: <your auth token>' \
+            http://127.0.0.1:8080/v1/AUTH_test/myCont1/file1.txt
+
+6. Confirm placement of the object for policy 'silver'::
+
+        curl -X GET -v http://127.0.0.1:8080/endpoints/AUTH_test/myCont1/file1.txt
+
+   You should see this (note placement on the expected devices)::
+
+        ["http://127.0.0.1:6010/sdb1/32/AUTH_test/myCont1/file1.txt",
+         "http://127.0.0.1:6040/sdb4/32/AUTH_test/myCont1/file1.txt"]
+
+7. Confirm account information with HEAD.  Make sure that your container-updater
+   service is running and has executed once since you performed the PUTs, or the
+   account database won't be updated yet::
+
+        curl -i -X HEAD -H 'X-Auth-Token: <your auth token>' \
+            http://127.0.0.1:8080/v1/AUTH_test
+
+   You should see something like this (note that the total and per-policy
+   object sizes will vary)::
+
+        HTTP/1.1 204 No Content
+        Content-Length: 0
+        X-Account-Object-Count: 2
+        X-Account-Bytes-Used: 174
+        X-Account-Container-Count: 2
+        X-Account-Storage-Policy-Gold-Object-Count: 1
+        X-Account-Storage-Policy-Gold-Bytes-Used: 84
+        X-Account-Storage-Policy-Silver-Object-Count: 1
+        X-Account-Storage-Policy-Silver-Bytes-Used: 90
+        X-Timestamp: 1397230339.71525
+        Content-Type: text/plain; charset=utf-8
+        Accept-Ranges: bytes
+        X-Trans-Id: tx96e7496b19bb44abb55a3-0053482c75
+        Date: Fri, 11 Apr 2014 17:55:01 GMT
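+
+The placement checks in steps 4 and 6 can also be done programmatically.  The
+following sketch parses the JSON list returned by the :ref:`list_endpoints`
+middleware.  It returns one ``(port, device, partition)`` tuple per replica,
+so you can confirm that 'gold' objects get three replicas and 'silver'
+objects two.  ``placement`` is a hypothetical helper name, and the URL layout
+it assumes is the one visible in the sample output above.
+
+.. code-block:: python
+
+    import json
+    from urllib.parse import urlsplit
+
+
+    def placement(endpoints_json):
+        """Sorted (port, device, partition) tuples, one per replica."""
+        replicas = []
+        for url in json.loads(endpoints_json):
+            parts = urlsplit(url)
+            # Layout in the sample output: /<device>/<partition>/<account>/...
+            device, partition = parts.path.lstrip('/').split('/')[:2]
+            replicas.append((parts.port, device, int(partition)))
+        return sorted(replicas)
+
+To check a live SAIO, read and decode the body returned by
+``urllib.request.urlopen`` for the ``/endpoints/...`` URL used above, then pass
+that text to ``placement``.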