Changing Policies spec

This is the proposal to give swift users power to change storage policies
of containers after creating them.

Implements: blueprint changing-policies

Change-Id: Ia4b3f8471e9b8347439dc2f6c41df15c5d84db8d
This commit is contained in:
Daisuke Morita 2015-03-27 15:51:10 +09:00
parent 2df51741a3
commit 0d00d362b7
1 changed files with 368 additions and 0 deletions

View File

@ -0,0 +1,368 @@
::
This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode
=============================
Changing Policy of Containers
=============================
Our proposal is to give swift users power to change storage policies of
containers and objects which are contained in those containers.
Problem description
===================
Swift currently prohibits users from changing containers' storage policies so
this constraint raises at least two problems.
One problem is the flexibility. For example, there is an organization using
Swift as a backup storage of office data and all data is archived monthly in a
container named after date like 'backup-201502'. Older archive becomes less
important so users want to reduce the consumed capacity to store it. Then Swift
users will try to change the storage policy of the container into cheaper one
like '2-replica policy' or 'EC policy' but they will be strongly
disappointed to find out that they cannot change the policy of the container
once created. The workaround for this problem is creating other new container
with other storage policy then copying all objects from an existing container
to it but this workaround raises another problem.
Another problem is the reachability. Copying all files to other container
brings about the change of all files' URLs. That makes users confused and
frustrated. The workaround for this problem is that after copying all files to
new container, users delete an old container and create the same name container
again with other storage policy then copy all objects back to the original name
container. However this obviously involves twice as heavy workload and long
time as a single copy.
Proposed change
===============
The ring normally differs from one policy to another so 'a/c/o' object of
policy 1 is likely to be placed in devices of different nodes from 'a/c/o'
object of policy 0. Therefore, objects replacement associated with the policy
change needs very long time and heavy internal traffic. For this reason,
an user request to change a policy must be translated
into asynchronous behavior of transferring objects among storage nodes which is
driven by background daemons. Obviously, Swift must not suspend any
user's requests to store or get information during changing policies.
We need to add or modify Swift servers' and daemons' behaviors as follows:
**Servers' changes**
1. Adding POST container API to send a request for changing a storage policy
of a container
#. Adding response headers for GET/HEAD container API to notify how many
objects are placed in a new policy or still in an old policy
#. Modifying GET/HEAD object API to get an object even if replicas are placed
in a new policy or in an old policy
**Daemons' changes**
1. Adding container-replicator a behavior to watch a container which is
requested to change its storage policy
#. Adding a new background daemon which transfers objects among storage nodes
from an old policy to a new policy
Servers' changes
----------------
1. Add New Behavior for POST Container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Currently, Swift returns "204 No Content" for the user POST container request
with X-Storage-Policy header. This indicates "nothing done." For the purpose
of maintaining backward compatibility and avoiding accidental execution, we
prefer to remain this behavior unchanged. Therefore, we propose introducing the
new header to 'forcibly' execute policy changing as follows.
.. list-table:: Table 1: New Request Header to change Storage Policy
:widths: 30 8 12 50
:header-rows: 1
* - Parameter
- Style
- Type
- Description
* - X-Forced-Change-Storage-Policy: <policy_name> (Optional)
- header
- xsd:string
- Change a storage policy of a container to the policy specified by
'policy_name'. This change accompanies asynchronous background process
to transfer objects.
Possible responses for this API are as follows.
.. list-table:: Table 2: Possible Response Codes for the New Request
:widths: 2 8
:header-rows: 1
* - Code
- Notes
* - 202 Accepted
- Accept the request properly and start to prepare objects replacement.
* - 400 Bad Request
- Reject the request with a policy which is deprecated or is not defined
in a configuration file.
* - 409 Conflict
- Reject the request because another changing policy process is not
completed yet (relating to 3-c change)
When a request of changing policies is accepted (response code is 202), a
target container stores following two sysmetas.
.. list-table:: Table 3: Container Sysmetas for Changing Policies
:widths: 2 8
:header-rows: 1
* - Sysmeta
- Notes
* - X-Container-Sysmeta-Prev-Index: <int>
- "Pre-change" policy index. It will be used for GET or DELETE objects
which are not transferred to the new policy yet.
* - X-Container-Sysmeta-Objects-Queued: <bool>
- This will be used for determining the status of policy changing by
daemon processes. If False, policy change request is accepted but not
ready for objects transferring. If True, objects have been queued to the
special container for policy changing so those are ready for
transferring. If undefined, policy change is not requested to that
container.
This feature should be implemented as middleware 'change-policy' because of
the following two reasons:
1. This operation probably should be authorized only to limitted group
(e.g., swift cluster's admin (reseller_admin)) because this operation
occurs heavy internal traffic.
Therefore, authority of this operation should be managed in the middleware
level.
#. This operation needs to POST sysmetas to the container. Sysmeta must be
managed in middleware level according to Swift's design principle
2. Add Response Headers for GET/HEAD Container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Objects will be transferred gradually by backend processes. From the viewpoint
of Swift operators, it is important to know the progress of policy changing,
that is, how many objects are already transferred or still remain
untransferred. This can be accomplished by simply exposing policy_stat table of
container DB file for each storage policy. Each policy's stat will be exposed
by ``X-Container-Storage-Policy-<Policy_name>-Bytes-Used`` and
``X-Container-Storage-Policy-<Policy_name>-Object-Count`` headers as follows::
$ curl -v -X HEAD -H "X-Auth-Token: tkn" http://<host>/v1/AUTH_test/container
< HTTP/1.1 200 OK
< X-Container-Storage-Policy-Gold-Object-Count: 3
< X-Container-Storage-Policy-Gold-Bytes-Used: 12
< X-Container-Storage-Policy-Ec42-Object-Count: 7
< X-Container-Storage-Policy-Ec42-Bytes-Used: 28
< X-Container-Object-Count: 10
< X-Container-Bytes-Used: 40
< Accept-Ranges: bytes
< X-Storage-Policy: ec42
< ...
Above response indicates 70% of object transferring is done.
3. Modify Behavior of GET/HEAD object API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In my current consideration, object PUT should be done only to the new policy.
This does not affect any object in the previous policy so this makes the
process of changing policies simple.
Therefore, the best way to get an object is firstly sending a GET request to
object servers according to the new policy's ring, and if the response code is
404 NOT FOUND, then a proxy resends GET requests to the previous policy's
object servers.
However, this behavior is in discussion because sending GET/HEAD requests twice
to object servers can increase the latency of user's GET object request,
especially in the early phase of changing policies.
Daemons' changes
----------------
1. container-replicator
^^^^^^^^^^^^^^^^^^^^^^^
To enqueue objects to the list for changing policies, some process must watch
what a container is requested for changing its policy. Adding this task to
container-replicator seems best way because container-replicator originally
has a role to seek all container DBs for sanity check of Swift cluster.
Therefore, this can minimize extra time to lock container DBs for adding this
new feature.
Container-replicator will check if a container has
``X-Container-Sysmeta-Objects-Queued`` sysmeta and its value is False. Objects
in that container should be enqueued to the object list of a special container
for changing policies. That special container is created under the special
account ``.change_policy``. The name of a special container should be unique
and one-to-one relationship with a container to which policy changing is
requested. The name of a special container is simply defined as
``<account_name>:<container_name>``. This special account and containers are
accessed by the new daemon ``object-transferrer``, which really transfers
objects from the old policy to the new policy.
2. object-transferrer
^^^^^^^^^^^^^^^^^^^^^
Object-transferrer is newly introduced daemon process for changing policies.
Object-transferrer reads lists of special containers from the account
``.change_policy`` and reads lists of objects from each special container.
Object-transferrer transfers those objects from the old policy to the new
policy by using internal client. After an object is successfully transferred
to the new policy, an object in the old policy will be deleted by DELETE
method.
If transferrer finishes to transfer all objects in a special container, it
deletes a special container and deletes sysmetas
``X-Container-Sysmeta-Prev-Index`` and ``X-Container-Sysmeta-Objects-Queued``
from a container to change that container's status from IN-CHANGING to normal
(POLICY CHANGE COMPLETED).
Example
-------
.. list-table:: Table 4: Example of data transition during changing policies
:widths: 1 4 2 4 2
:header-rows: 1
* - Step
- Description
- Container /a/c
objects
- Container /a/c/ metadata
- Container /.change_policy/a:c
objects
* - | 0
- | Init.
- | ('o1', 1)
| ('o2', 1)
| ('o3', 1)
- | X-Backend-Storage-Policy-Index: 1
- | N/A
* - | 1
- | POST /a/c X-Forced-Change-Storage-Policy: Pol-2
- | ('o1', 1)
| ('o2', 1)
| ('o3', 1)
- | X-Backend-Storage-Policy-Index: 2
| X-Container-Sysmeta-Prev-Policy-Index: 1
| X-Container-Sysmeta-Objects-Queued: False
- | N/A
* - | 2
- | container-replicator seeks policy changing containers
- | ('o1', 1)
| ('o2', 1)
| ('o3', 1)
- | X-Backend-Storage-Policy-Index: 2
| X-Container-Sysmeta-Prev-Policy-Index: 1
| X-Container-Sysmeta-Objects-Queued: True
- | ('o1', 0, 'application/x-transfer-1-to-2')
| ('o2', 0, 'application/x-transfer-1-to-2')
| ('o3', 0, 'application/x-transfer-1-to-2')
* - | 3
- | object-transferrer transfers 'o1' and 'o3'
- | ('o1', 2)
| ('o2', 1)
| ('o3', 2)
- | X-Backend-Storage-Policy-Index: 2
| X-Container-Sysmeta-Prev-Policy-Index: 1
| X-Container-Sysmeta-Objects-Queued: True
- | ('o2', 0, 'application/x-transfer-1-to-2')
* - | 4
- | object-transferrer transfers 'o2'
- | ('o1', 2)
| ('o2', 2)
| ('o3', 2)
- | X-Backend-Storage-Policy-Index: 2
| X-Container-Sysmeta-Prev-Policy-Index: 1
| X-Container-Sysmeta-Objects-Queued: True
- | Empty
* - | 5
- | object-transferrer deletes a special container and metadatas from
container /a/c
- | ('o1', 2)
| ('o2', 2)
| ('o3', 2)
- | X-Backend-Storage-Policy-Index: 2
- | N/A
Above table focuses data transition of a container in changing a storage policy
and a corresponding special container. A tuple indicates object info, first
element is an object name, second one is a policy index and third one, if
available, is a value of content-type, which is defined for policy changing.
Given that three objects are stored in the container ``/a/c`` as policy-1
(Step 0). When the request to change this container's
policy to policy-2 is accepted (Step 1), a backend policy index will be
changed to 2 and two sysmetas are stored in this container. In the periodical
container-replicator process, replicator finds a container with policy change
sysmetas and then creates a special container ``/.change_policy/a:c`` with
a list of objects (Step 2). Those objects have info of old policy and new policy
with the field of content-type. When object-transferrer finds this special
container from ``.change_policy`` account, it gets some objects from the old
policy (usually from a local device) and puts them to the new policy's storage
nodes (Step 3 and 4). If the special container becomes empty (Step 5), it
indicates policy changing for that container finished so the special container
is deleted and policy changing metadatas of an original container are also
deleted.
Alternatives: As Sub-Function of Container-Reconciler
-----------------------------------------------------
Container-reconciler is a daemon process which restores objects registered in
an incorrect policy into a correct policy. Therefore, the reconciling procedure
satisfies almost all of functional requirements for policy changing. The
advantage of using container-reconciler for policy changing is that we need to
modify a very few points of existing Swift sources. However, there is a big
problem to use container-reconciler. This problem is that container-reconciler
has no function to determine the completeness of changing policy of objects
contained in a specific container. As a result, this problem makes it
complicated to handle GET/HEAD object from the previous policy and to allow
the next storage policy change request. Based on discussion in Swift hack-a-thon
(held in Feb. 2015) and Tokyo Summit (held in Oct. 2015), we decided to add
object-transferrer to change container's policy.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Daisuke Morita (dmorita)
Milestones
----------
Target Milestone for completion:
Mitaka
Work Items
----------
* Add API for Policy Changing
* Add a middleware 'policy-change' to process Container POST request with
"X-Forced-Change-Storage-Policy" header. This middleware stores sysmeta
headers to target container DB for policy changing.
* Modify container-server to add response headers for Container GET/HEAD
request to show the progress of changing policies by exposing all the info
from policy_stat table
* Modify proxy-server (or add a feature to new middleware) to get object for
referring both new and old policy index to allow users' object read during
changing policy
* Add daemon process among storage nodes for policy changing
* Modify container-replicator to watch a container if it should be initialized
(creation of a corresponding special container) for changing policies
* Write object-transferrer code
* Daemonize object-transferrer
* Add unit, functional and probe tests to check that new code works
intentionally and that it is OK for splitted brain cases