Changing Policies spec
This is the proposal to give swift users power to change storage policies of containers after creating them. Implements: blueprint changing-policies Change-Id: Ia4b3f8471e9b8347439dc2f6c41df15c5d84db8d
::

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Changing Policy of Containers
=============================

Our proposal is to give Swift users the ability to change the storage policy
of a container and of the objects stored in that container.

Problem description
===================

Swift currently prohibits users from changing a container's storage policy
after the container has been created. This constraint raises at least two
problems.

One problem is flexibility. For example, suppose an organization uses Swift as
backup storage for office data, and all data is archived monthly in a
container named after the date, such as 'backup-201502'. Older archives become
less important, so users want to reduce the capacity consumed to store them.
Users will then try to change the container's storage policy to a cheaper one,
such as a '2-replica policy' or an 'EC policy', only to find that the policy
of a container cannot be changed once the container is created. The workaround
is to create a new container with a different storage policy and copy all
objects from the existing container into it, but this workaround raises
another problem.

The other problem is reachability. Copying all objects to another container
changes the URLs of all of those objects, which confuses and frustrates users.
The workaround is to copy all objects to a new container, delete the old
container, create a container with the same name and a different storage
policy, and then copy all objects back into it. However, this involves two
full copies, so it takes roughly twice the workload and time of a single copy.

Proposed change
===============

The ring normally differs from one policy to another, so the object 'a/c/o' in
policy 1 is likely to be placed on devices of different nodes from the object
'a/c/o' in policy 0. Therefore, relocating objects for a policy change takes a
long time and generates heavy internal traffic. For this reason, a user
request to change a policy must be translated into an asynchronous transfer of
objects among storage nodes, driven by background daemons. Swift must not
suspend any user requests to store or retrieve data while a policy change is
in progress.

We need to add or modify the behavior of Swift servers and daemons as follows:

**Servers' changes**

1. Add a POST container API to request a change of a container's storage
   policy
#. Add response headers to the GET/HEAD container API to report how many
   objects are placed in the new policy and how many are still in the old
   policy
#. Modify the GET/HEAD object API so that an object can be retrieved whether
   its replicas are placed in the new policy or in the old policy

**Daemons' changes**

1. Add a behavior to container-replicator to watch for containers that have
   been requested to change their storage policy
#. Add a new background daemon that transfers objects among storage nodes
   from the old policy to the new policy

Servers' changes
----------------

1. Add New Behavior for POST Container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, Swift returns "204 No Content" for a user's POST container request
with an X-Storage-Policy header, which indicates that nothing was done. To
maintain backward compatibility and avoid accidental execution, we prefer to
keep this behavior unchanged. Therefore, we propose introducing a new header
to 'forcibly' execute a policy change, as follows.

.. list-table:: Table 1: New Request Header to change Storage Policy
   :widths: 30 8 12 50
   :header-rows: 1

   * - Parameter
     - Style
     - Type
     - Description
   * - X-Forced-Change-Storage-Policy: <policy_name> (Optional)
     - header
     - xsd:string
     - Change the storage policy of a container to the policy specified by
       'policy_name'. This change is accompanied by an asynchronous
       background process that transfers objects.

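As a rough client-side illustration of Table 1, the sketch below builds (but
does not send) a container POST carrying the proposed header. Only the header
name ``X-Forced-Change-Storage-Policy`` comes from this proposal; the helper
``build_policy_change_request``, the endpoint URL, and the token are
hypothetical.

```python
import urllib.request


def build_policy_change_request(container_url, token, policy_name):
    """Build a container POST that asks for a storage policy change.

    Hypothetical helper: the request is only constructed, never sent.
    """
    headers = {
        "X-Auth-Token": token,
        # Header proposed in Table 1; the value is the target policy name.
        "X-Forced-Change-Storage-Policy": policy_name,
    }
    return urllib.request.Request(container_url, headers=headers, method="POST")


# Placeholder endpoint and token, for illustration only.
req = build_policy_change_request(
    "http://swift.example.com/v1/AUTH_test/backup-201502", "tkn", "ec42")
```

Passing the request to ``urllib.request.urlopen`` would send it to a cluster
that implements this proposal.
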
Possible responses for this API are as follows.

.. list-table:: Table 2: Possible Response Codes for the New Request
   :widths: 2 8
   :header-rows: 1

   * - Code
     - Notes
   * - 202 Accepted
     - The request is accepted and preparation for relocating objects starts.
   * - 400 Bad Request
     - The request is rejected because the specified policy is deprecated or
       is not defined in the configuration file.
   * - 409 Conflict
     - The request is rejected because another policy change is not
       completed yet (relating to 3-c change)

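The decision in Table 2 can be sketched as a small function. This is only an
illustration: the shape of ``policies`` (a name mapped to a ``deprecated``
flag), the function name, and the order in which the checks run are
assumptions, not part of the proposal or of Swift's policy API.

```python
from http import HTTPStatus


def policy_change_status(policy_name, policies, change_in_progress):
    """Pick the Table 2 response code for a policy change request.

    `policies` maps a policy name to {"deprecated": bool}; this shape is
    assumed for the sketch.
    """
    policy = policies.get(policy_name)
    if policy is None or policy["deprecated"]:
        return HTTPStatus.BAD_REQUEST   # undefined or deprecated policy
    if change_in_progress:
        return HTTPStatus.CONFLICT      # a previous change is not finished
    return HTTPStatus.ACCEPTED          # objects will be moved in background


policies = {"gold": {"deprecated": False}, "silver": {"deprecated": True}}
status = policy_change_status("gold", policies, change_in_progress=False)
```
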
When a policy change request is accepted (response code 202), the target
container stores the following two sysmetas.

.. list-table:: Table 3: Container Sysmetas for Changing Policies
   :widths: 2 8
   :header-rows: 1

   * - Sysmeta
     - Notes
   * - X-Container-Sysmeta-Prev-Index: <int>
     - "Pre-change" policy index. It is used to GET or DELETE objects
       that have not yet been transferred to the new policy.
   * - X-Container-Sysmeta-Objects-Queued: <bool>
     - Used by daemon processes to determine the status of a policy change.
       If False, the policy change request has been accepted but objects are
       not yet ready for transfer. If True, objects have been queued in the
       special container for policy changing and are ready for transfer. If
       undefined, no policy change has been requested for the container.

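The three states encoded by ``X-Container-Sysmeta-Objects-Queued`` can be read
back as in the following sketch. The state labels and the string spelling of
the boolean are assumptions made for illustration.

```python
def policy_change_state(container_sysmeta):
    """Classify a container by its Objects-Queued sysmeta (Table 3)."""
    queued = container_sysmeta.get("X-Container-Sysmeta-Objects-Queued")
    if queued is None:
        return "not-requested"   # no policy change was requested
    # Metadata values are assumed to arrive as strings such as "True".
    if str(queued).lower() == "true":
        return "queued"          # objects are ready to be transferred
    return "accepted"            # accepted, objects not enqueued yet


state = policy_change_state({"X-Container-Sysmeta-Objects-Queued": "False"})
```
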
This feature should be implemented as a middleware, 'change-policy', for the
following two reasons:

1. This operation should probably be authorized only for a limited group
   (e.g., the Swift cluster's admin (reseller_admin)) because it causes heavy
   internal traffic. Therefore, authorization for this operation should be
   managed at the middleware level.
#. This operation needs to POST sysmetas to the container. According to
   Swift's design principles, sysmeta must be managed at the middleware level.

2. Add Response Headers for GET/HEAD Container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objects will be transferred gradually by backend processes. For Swift
operators, it is important to know the progress of a policy change, that is,
how many objects have already been transferred and how many remain
untransferred. This can be accomplished by simply exposing the policy_stat
table of the container DB file for each storage policy. Each policy's stats
will be exposed through the
``X-Container-Storage-Policy-<Policy_name>-Bytes-Used`` and
``X-Container-Storage-Policy-<Policy_name>-Object-Count`` headers as follows::

  $ curl -v -X HEAD -H "X-Auth-Token: tkn" http://<host>/v1/AUTH_test/container
  < HTTP/1.1 200 OK
  < X-Container-Storage-Policy-Gold-Object-Count: 3
  < X-Container-Storage-Policy-Gold-Bytes-Used: 12
  < X-Container-Storage-Policy-Ec42-Object-Count: 7
  < X-Container-Storage-Policy-Ec42-Bytes-Used: 28
  < X-Container-Object-Count: 10
  < X-Container-Bytes-Used: 40
  < Accept-Ranges: bytes
  < X-Storage-Policy: ec42
  < ...

The response above indicates that 70% of the objects have been transferred:
7 of the 10 objects are already in the new policy, ec42.

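The progress figure can be derived from the per-policy headers as in this
sketch. It assumes the header names proposed in this section and that the
policy name embedded in a header matches ``X-Storage-Policy`` case-insensitively;
the function name and the handling of an empty container are illustrative
choices.

```python
import re

# Matches the proposed per-policy object count header.
PER_POLICY_COUNT = re.compile(
    r"^X-Container-Storage-Policy-(?P<policy>.+)-Object-Count$",
    re.IGNORECASE)


def transfer_progress(headers):
    """Return the fraction of objects already in the container's policy."""
    current = headers["X-Storage-Policy"].lower()
    counts = {}
    for name, value in headers.items():
        match = PER_POLICY_COUNT.match(name)
        if match:
            counts[match.group("policy").lower()] = int(value)
    total = sum(counts.values())
    # An empty container has nothing left to transfer (assumption).
    return counts.get(current, 0) / total if total else 1.0


headers = {
    "X-Container-Storage-Policy-Gold-Object-Count": "3",
    "X-Container-Storage-Policy-Gold-Bytes-Used": "12",
    "X-Container-Storage-Policy-Ec42-Object-Count": "7",
    "X-Container-Storage-Policy-Ec42-Bytes-Used": "28",
    "X-Container-Object-Count": "10",
    "X-Storage-Policy": "ec42",
}
progress = transfer_progress(headers)
```

For the sample response, 7 of 10 objects are in ``ec42``, so ``progress`` is
0.7.
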
3. Modify Behavior of GET/HEAD object API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the current design, an object PUT should go only to the new policy. This
does not affect any object in the previous policy, which keeps the policy
change process simple. Therefore, the best way to get an object is to first
send a GET request to the object servers according to the new policy's ring
and, if the response code is 404 Not Found, to have the proxy resend the GET
request to the previous policy's object servers.

However, this behavior is still under discussion because sending GET/HEAD
requests twice to object servers can increase the latency of a user's GET
object request, especially in the early phase of a policy change.

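The lookup order described above can be sketched as follows. The callable
``fetch(policy_index, path)`` is a hypothetical stand-in for the proxy's
backend requests, and the in-memory ``store`` exists only to make the example
self-contained.

```python
def get_with_fallback(fetch, path, new_index, prev_index=None):
    """GET from the new policy first, then fall back to the previous one.

    `fetch(policy_index, path)` is a hypothetical callable that returns
    a (status, body) tuple.
    """
    status, body = fetch(new_index, path)
    if status == 404 and prev_index is not None:
        # Not transferred yet: retry against the pre-change policy.
        status, body = fetch(prev_index, path)
    return status, body


# In-memory stand-in: o1 is already in policy 2, o2 is still in policy 1.
store = {(2, "/a/c/o1"): b"new", (1, "/a/c/o2"): b"old"}


def fetch(policy_index, path):
    body = store.get((policy_index, path))
    return (200, body) if body is not None else (404, b"")


result = get_with_fallback(fetch, "/a/c/o2", new_index=2, prev_index=1)
```

The second backend request is issued only after a 404, which is the source of
the extra latency mentioned above.
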
Daemons' changes
----------------

1. container-replicator
^^^^^^^^^^^^^^^^^^^^^^^

To enqueue objects to the list for changing policies, some process must watch
for containers that have been requested to change their policy. Adding this
task to container-replicator seems the best approach because
container-replicator already scans all container DBs as a sanity check of the
Swift cluster. Therefore, this minimizes the extra time container DBs are
locked for this new feature.

Container-replicator checks whether a container has the
``X-Container-Sysmeta-Objects-Queued`` sysmeta with a value of False. If so,
the objects in that container should be enqueued to the object list of a
special container for changing policies. That special container is created
under the special account ``.change_policy``. The name of a special container
should be unique and have a one-to-one relationship with the container for
which the policy change is requested. The name of a special container is
simply defined as ``<account_name>:<container_name>``. This special account
and its containers are accessed by the new daemon ``object-transferrer``,
which actually transfers objects from the old policy to the new policy.

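A minimal sketch of the enqueue step, using plain Python data in place of
container DBs. The special container naming and the
``application/x-transfer-<old>-to-<new>`` content type follow this spec (see
Table 4); the function names, the row layout as tuples, and the string
encoding of the sysmeta values are assumptions.

```python
def special_container_name(account, container):
    """Name of the special container: <account_name>:<container_name>."""
    return "%s:%s" % (account, container)


def enqueue_for_transfer(account, container, sysmeta, object_names, new_index):
    """Return (special container path, rows), or None if nothing to do."""
    # Only containers whose change is accepted but not yet queued qualify.
    if sysmeta.get("X-Container-Sysmeta-Objects-Queued") != "False":
        return None
    prev_index = int(sysmeta["X-Container-Sysmeta-Prev-Index"])
    content_type = "application/x-transfer-%d-to-%d" % (prev_index, new_index)
    # Rows mirror Table 4: (object name, policy index 0, content type).
    rows = [(name, 0, content_type) for name in object_names]
    sysmeta["X-Container-Sysmeta-Objects-Queued"] = "True"
    path = "/.change_policy/" + special_container_name(account, container)
    return path, rows


sysmeta = {
    "X-Container-Sysmeta-Prev-Index": "1",
    "X-Container-Sysmeta-Objects-Queued": "False",
}
queued = enqueue_for_transfer("a", "c", sysmeta, ["o1", "o2", "o3"], 2)
```
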
2. object-transferrer
^^^^^^^^^^^^^^^^^^^^^

Object-transferrer is a newly introduced daemon process for changing policies.
It reads the list of special containers from the account ``.change_policy``
and reads the list of objects from each special container. It transfers those
objects from the old policy to the new policy by using the internal client.
After an object is successfully transferred to the new policy, the object in
the old policy is deleted with the DELETE method.

When the transferrer finishes transferring all objects in a special container,
it deletes the special container and deletes the sysmetas
``X-Container-Sysmeta-Prev-Index`` and ``X-Container-Sysmeta-Objects-Queued``
from the original container, changing that container's status from
IN-CHANGING to normal (POLICY CHANGE COMPLETED).

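The transfer loop can be sketched with dictionaries standing in for the two
policies' object stores. This is not the daemon's implementation: the real
transfer would go through the internal client, and ``transfer_all`` and its
arguments are hypothetical.

```python
def transfer_all(queue, old_store, new_store, sysmeta):
    """Drain one special container's queue (in-memory stand-in).

    Returns True once the queue is empty and the markers are removed.
    """
    for name in list(queue):
        if name in old_store:
            new_store[name] = old_store[name]  # PUT to the new policy
            del old_store[name]                # then DELETE from the old one
        queue.remove(name)                     # drop the queue entry
    # Queue drained: remove the policy change sysmetas from the container.
    sysmeta.pop("X-Container-Sysmeta-Prev-Index", None)
    sysmeta.pop("X-Container-Sysmeta-Objects-Queued", None)
    return not queue


queue = ["o1", "o2", "o3"]
old_store = {"o1": b"1", "o2": b"2", "o3": b"3"}
new_store = {}
sysmeta = {
    "X-Container-Sysmeta-Prev-Index": "1",
    "X-Container-Sysmeta-Objects-Queued": "True",
}
done = transfer_all(queue, old_store, new_store, sysmeta)
```

Copying before deleting means an object is always readable from at least one
policy, which the GET fallback described earlier relies on.
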
Example
-------

.. list-table:: Table 4: Example of data transition during changing policies
   :widths: 1 4 2 4 2
   :header-rows: 1

   * - Step
     - Description
     - Container /a/c
       objects
     - Container /a/c/ metadata
     - Container /.change_policy/a:c
       objects
   * - | 0
     - | Init.
     - | ('o1', 1)
       | ('o2', 1)
       | ('o3', 1)
     - | X-Backend-Storage-Policy-Index: 1
     - | N/A
   * - | 1
     - | POST /a/c X-Forced-Change-Storage-Policy: Pol-2
     - | ('o1', 1)
       | ('o2', 1)
       | ('o3', 1)
     - | X-Backend-Storage-Policy-Index: 2
       | X-Container-Sysmeta-Prev-Policy-Index: 1
       | X-Container-Sysmeta-Objects-Queued: False
     - | N/A
   * - | 2
     - | container-replicator seeks policy changing containers
     - | ('o1', 1)
       | ('o2', 1)
       | ('o3', 1)
     - | X-Backend-Storage-Policy-Index: 2
       | X-Container-Sysmeta-Prev-Policy-Index: 1
       | X-Container-Sysmeta-Objects-Queued: True
     - | ('o1', 0, 'application/x-transfer-1-to-2')
       | ('o2', 0, 'application/x-transfer-1-to-2')
       | ('o3', 0, 'application/x-transfer-1-to-2')
   * - | 3
     - | object-transferrer transfers 'o1' and 'o3'
     - | ('o1', 2)
       | ('o2', 1)
       | ('o3', 2)
     - | X-Backend-Storage-Policy-Index: 2
       | X-Container-Sysmeta-Prev-Policy-Index: 1
       | X-Container-Sysmeta-Objects-Queued: True
     - | ('o2', 0, 'application/x-transfer-1-to-2')
   * - | 4
     - | object-transferrer transfers 'o2'
     - | ('o1', 2)
       | ('o2', 2)
       | ('o3', 2)
     - | X-Backend-Storage-Policy-Index: 2
       | X-Container-Sysmeta-Prev-Policy-Index: 1
       | X-Container-Sysmeta-Objects-Queued: True
     - | Empty
   * - | 5
     - | object-transferrer deletes the special container and the policy
         change metadata from container /a/c
     - | ('o1', 2)
       | ('o2', 2)
       | ('o3', 2)
     - | X-Backend-Storage-Policy-Index: 2
     - | N/A

The table above focuses on the data transition of a container whose storage
policy is being changed and of the corresponding special container. Each tuple
holds object info: the first element is the object name, the second is the
policy index, and the third, if present, is the content-type value, which is
defined for policy changing.

Suppose that three objects are stored in the container ``/a/c`` with policy-1
(Step 0). When the request to change this container's policy to policy-2 is
accepted (Step 1), the backend policy index is changed to 2 and two sysmetas
are stored in the container. During its periodic pass, container-replicator
finds the container with the policy change sysmetas and creates a special
container ``/.change_policy/a:c`` with a list of objects (Step 2). The
content-type field of each of those objects records the old policy and the new
policy. When object-transferrer finds this special container in the
``.change_policy`` account, it gets some objects from the old policy (usually
from a local device) and puts them to the new policy's storage nodes (Steps 3
and 4). Once the special container is empty, the policy change for that
container is finished, so the special container and the policy change metadata
of the original container are deleted (Step 5).

Alternatives: As Sub-Function of Container-Reconciler
-----------------------------------------------------

Container-reconciler is a daemon process that moves objects registered in an
incorrect policy into the correct policy. Therefore, the reconciling procedure
satisfies almost all of the functional requirements for policy changing. The
advantage of using container-reconciler for policy changing is that only a few
points of the existing Swift source need to be modified. However, there is a
big problem with using container-reconciler: it has no function to determine
whether the policy change is complete for all objects contained in a specific
container. As a result, it becomes complicated to handle GET/HEAD object
requests from the previous policy and to allow the next storage policy change
request. Based on discussions at the Swift hackathon (held in Feb. 2015) and
the Tokyo Summit (held in Oct. 2015), we decided to add object-transferrer to
change a container's policy.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Daisuke Morita (dmorita)

Milestones
----------

Target Milestone for completion:
  Mitaka

Work Items
----------

* Add API for Policy Changing

  * Add a middleware 'policy-change' to process Container POST requests with
    the "X-Forced-Change-Storage-Policy" header. This middleware stores
    sysmeta headers in the target container DB for policy changing.
  * Modify container-server to add response headers to Container GET/HEAD
    requests that show the progress of a policy change by exposing all the
    info from the policy_stat table
  * Modify proxy-server (or add a feature to the new middleware) so that an
    object GET refers to both the new and the old policy index, allowing
    users to read objects during a policy change

* Add daemon process among storage nodes for policy changing

  * Modify container-replicator to check whether a container should be
    initialized (creation of a corresponding special container) for changing
    policies
  * Write object-transferrer code
  * Daemonize object-transferrer

* Add unit, functional and probe tests to check that the new code works as
  intended and that it handles split-brain cases correctly