From 0d00d362b7e4911823786876703b7536d7fbc599 Mon Sep 17 00:00:00 2001 From: Daisuke Morita Date: Fri, 27 Mar 2015 15:51:10 +0900 Subject: [PATCH] Changing Policies spec This is the proposal to give swift users power to change storage policies of containers after creating them. Implements: blueprint changing-policies Change-Id: Ia4b3f8471e9b8347439dc2f6c41df15c5d84db8d --- specs/in_progress/changing_policies.rst | 368 ++++++++++++++++++++++++ 1 file changed, 368 insertions(+) create mode 100644 specs/in_progress/changing_policies.rst diff --git a/specs/in_progress/changing_policies.rst b/specs/in_progress/changing_policies.rst new file mode 100644 index 0000000..fb4cf76 --- /dev/null +++ b/specs/in_progress/changing_policies.rst @@ -0,0 +1,368 @@ +:: + + This work is licensed under a Creative Commons Attribution 3.0 + Unported License. + http://creativecommons.org/licenses/by/3.0/legalcode + +============================= +Changing Policy of Containers +============================= + +Our proposal is to give swift users power to change storage policies of +containers and objects which are contained in those containers. + +Problem description +=================== + +Swift currently prohibits users from changing containers' storage policies so +this constraint raises at least two problems. + +One problem is the flexibility. For example, there is an organization using +Swift as a backup storage of office data and all data is archived monthly in a +container named after date like 'backup-201502'. Older archive becomes less +important so users want to reduce the consumed capacity to store it. Then Swift +users will try to change the storage policy of the container into cheaper one +like '2-replica policy' or 'EC policy' but they will be strongly +disappointed to find out that they cannot change the policy of the container +once created. 
The workaround for this problem is to create a new container +with a different storage policy and then copy all objects from the existing +container to it, but this workaround raises another problem. + +Another problem is the reachability. Copying all files to another container +changes the URLs of all files. That makes users confused and +frustrated. The workaround for this problem is that, after copying all files to +the new container, users delete the old container, create a container with the +same name again with the other storage policy, and then copy all objects back +to the container with the original name. However, this obviously involves twice +the workload and twice the time of a single copy. + +Proposed change +=============== + +The ring normally differs from one policy to another so 'a/c/o' object of +policy 1 is likely to be placed in devices of different nodes from 'a/c/o' +object of policy 0. Therefore, relocating objects as part of a policy +change takes a very long time and causes heavy internal traffic. For this +reason, a user request to change a policy must be translated +into an asynchronous transfer of objects among storage nodes, which is +driven by background daemons. Obviously, Swift must not suspend any +user's requests to store or get information while a policy is being changed. + +We need to add or modify Swift servers' and daemons' behaviors as follows: + +**Servers' changes** + +1. Adding POST container API to send a request for changing a storage policy + of a container +#. Adding response headers for GET/HEAD container API to report how many + objects are placed in the new policy or are still in the old policy +#. Modifying GET/HEAD object API to get an object whether its replicas are + placed in the new policy or in the old policy + +**Daemons' changes** + +1. Adding to container-replicator a behavior to watch for a container which is + requested to change its storage policy +#.
Adding a new background daemon which transfers objects among storage nodes + from an old policy to a new policy + +Servers' changes +---------------- + +1. Add New Behavior for POST Container +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Currently, Swift returns "204 No Content" for the user POST container request +with X-Storage-Policy header. This indicates "nothing done." For the purpose +of maintaining backward compatibility and avoiding accidental execution, we +prefer to remain this behavior unchanged. Therefore, we propose introducing the +new header to 'forcibly' execute policy changing as follows. + +.. list-table:: Table 1: New Request Header to change Storage Policy + :widths: 30 8 12 50 + :header-rows: 1 + + * - Parameter + - Style + - Type + - Description + * - X-Forced-Change-Storage-Policy: (Optional) + - header + - xsd:string + - Change a storage policy of a container to the policy specified by + 'policy_name'. This change accompanies asynchronous background process + to transfer objects. + +Possible responses for this API are as follows. + +.. list-table:: Table 2: Possible Response Codes for the New Request + :widths: 2 8 + :header-rows: 1 + + * - Code + - Notes + * - 202 Accepted + - Accept the request properly and start to prepare objects replacement. + * - 400 Bad Request + - Reject the request with a policy which is deprecated or is not defined + in a configuration file. + * - 409 Conflict + - Reject the request because another changing policy process is not + completed yet (relating to 3-c change) + +When a request of changing policies is accepted (response code is 202), a +target container stores following two sysmetas. + +.. list-table:: Table 3: Container Sysmetas for Changing Policies + :widths: 2 8 + :header-rows: 1 + + * - Sysmeta + - Notes + * - X-Container-Sysmeta-Prev-Index: + - "Pre-change" policy index. It will be used for GET or DELETE objects + which are not transferred to the new policy yet. 
+ * - X-Container-Sysmeta-Objects-Queued: + - This will be used by daemon processes to determine the status of a + policy change. If False, the policy change request is accepted but + objects are not yet ready to be transferred. If True, objects have been + queued to the special container for policy changing so they are ready + for transferring. If undefined, no policy change is requested for that + container. + +This feature should be implemented as middleware 'change-policy' for +the following two reasons: + +1. This operation probably should be authorized only for a limited group + (e.g., swift cluster's admin (reseller_admin)) because this operation + causes heavy internal traffic. + Therefore, authorization of this operation should be managed at the + middleware level. +#. This operation needs to POST sysmetas to the container. Sysmeta must be + managed at the middleware level according to Swift's design principle + +2. Add Response Headers for GET/HEAD Container +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Objects will be transferred gradually by backend processes. From the viewpoint +of Swift operators, it is important to know the progress of a policy change, +that is, how many objects are already transferred and how many still remain +untransferred. This can be accomplished by simply exposing the policy_stat +table of the container DB file for each storage policy. Each policy's stat will +be exposed by ``X-Container-Storage-Policy--Bytes-Used`` and +``X-Container-Storage-Policy--Object-Count`` headers as follows:: + + $ curl -v -X HEAD -H "X-Auth-Token: tkn" http:///v1/AUTH_test/container + < HTTP/1.1 200 OK + < X-Container-Storage-Policy-Gold-Object-Count: 3 + < X-Container-Storage-Policy-Gold-Bytes-Used: 12 + < X-Container-Storage-Policy-Ec42-Object-Count: 7 + < X-Container-Storage-Policy-Ec42-Bytes-Used: 28 + < X-Container-Object-Count: 10 + < X-Container-Bytes-Used: 40 + < Accept-Ranges: bytes + < X-Storage-Policy: ec42 + < ...
+ +The response above indicates that 70% of the objects have been transferred. + +3. Modify Behavior of GET/HEAD object API +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In the current design, object PUT requests go only to the new policy. +This does not affect any object in the previous policy, which keeps the +process of changing policies simple. +Therefore, the best way to get an object is to first send a GET request to +the object servers according to the new policy's ring and, if the response code +is 404 Not Found, to have the proxy resend the GET request to the previous +policy's object servers. + +However, this behavior is still under discussion because sending GET/HEAD +requests twice to object servers can increase the latency of users' GET object +requests, especially in the early phase of a policy change. + +Daemons' changes +---------------- + +1. container-replicator +^^^^^^^^^^^^^^^^^^^^^^^ + +To enqueue objects to the list for changing policies, some process must watch +for containers whose policy has been requested to change. Adding this task to +container-replicator seems the best way because container-replicator already +scans all container DBs as part of its sanity checks of the Swift cluster. +Therefore, this minimizes the extra time that container DBs are locked for +this new feature. + +Container-replicator will check whether a container has the +``X-Container-Sysmeta-Objects-Queued`` sysmeta with a value of False. Objects +in such a container should be enqueued to the object list of a special container +for changing policies. That special container is created under the special +account ``.change_policy``. The name of a special container should be unique +and have a one-to-one relationship with the container whose policy change is +requested. The name of a special container is simply defined as +``:``. This special account and its containers are +accessed by the new daemon ``object-transferrer``, which actually transfers +objects from the old policy to the new policy. + +2.
object-transferrer +^^^^^^^^^^^^^^^^^^^^^ + +Object-transferrer is newly introduced daemon process for changing policies. +Object-transferrer reads lists of special containers from the account +``.change_policy`` and reads lists of objects from each special container. +Object-transferrer transfers those objects from the old policy to the new +policy by using internal client. After an object is successfully transferred +to the new policy, an object in the old policy will be deleted by DELETE +method. + +If transferrer finishes to transfer all objects in a special container, it +deletes a special container and deletes sysmetas +``X-Container-Sysmeta-Prev-Index`` and ``X-Container-Sysmeta-Objects-Queued`` +from a container to change that container's status from IN-CHANGING to normal +(POLICY CHANGE COMPLETED). + +Example +------- + +.. list-table:: Table 4: Example of data transition during changing policies + :widths: 1 4 2 4 2 + :header-rows: 1 + + * - Step + - Description + - Container /a/c + objects + - Container /a/c/ metadata + - Container /.change_policy/a:c + objects + * - | 0 + - | Init. 
+ - | ('o1', 1) + | ('o2', 1) + | ('o3', 1) + - | X-Backend-Storage-Policy-Index: 1 + - | N/A + * - | 1 + - | POST /a/c X-Forced-Change-Storage-Policy: Pol-2 + - | ('o1', 1) + | ('o2', 1) + | ('o3', 1) + - | X-Backend-Storage-Policy-Index: 2 + | X-Container-Sysmeta-Prev-Policy-Index: 1 + | X-Container-Sysmeta-Objects-Queued: False + - | N/A + * - | 2 + - | container-replicator seeks policy changing containers + - | ('o1', 1) + | ('o2', 1) + | ('o3', 1) + - | X-Backend-Storage-Policy-Index: 2 + | X-Container-Sysmeta-Prev-Policy-Index: 1 + | X-Container-Sysmeta-Objects-Queued: True + - | ('o1', 0, 'application/x-transfer-1-to-2') + | ('o2', 0, 'application/x-transfer-1-to-2') + | ('o3', 0, 'application/x-transfer-1-to-2') + * - | 3 + - | object-transferrer transfers 'o1' and 'o3' + - | ('o1', 2) + | ('o2', 1) + | ('o3', 2) + - | X-Backend-Storage-Policy-Index: 2 + | X-Container-Sysmeta-Prev-Policy-Index: 1 + | X-Container-Sysmeta-Objects-Queued: True + - | ('o2', 0, 'application/x-transfer-1-to-2') + * - | 4 + - | object-transferrer transfers 'o2' + - | ('o1', 2) + | ('o2', 2) + | ('o3', 2) + - | X-Backend-Storage-Policy-Index: 2 + | X-Container-Sysmeta-Prev-Policy-Index: 1 + | X-Container-Sysmeta-Objects-Queued: True + - | Empty + * - | 5 + - | object-transferrer deletes a special container and metadatas from + container /a/c + - | ('o1', 2) + | ('o2', 2) + | ('o3', 2) + - | X-Backend-Storage-Policy-Index: 2 + - | N/A + +Above table focuses data transition of a container in changing a storage policy +and a corresponding special container. A tuple indicates object info, first +element is an object name, second one is a policy index and third one, if +available, is a value of content-type, which is defined for policy changing. + +Given that three objects are stored in the container ``/a/c`` as policy-1 +(Step 0). 
When the request to change this container's +policy to policy-2 is accepted (Step 1), the backend policy index is +changed to 2 and two sysmetas are stored in this container. During its periodic +pass, container-replicator finds a container with policy change +sysmetas and then creates a special container ``/.change_policy/a:c`` with +a list of objects (Step 2). Each of those entries records the old policy and +the new policy in its content-type field. When object-transferrer finds this +special container in the ``.change_policy`` account, it gets some objects from +the old policy (usually from a local device) and puts them to the new policy's +storage nodes (Step 3 and 4). If the special container becomes empty (Step 5), +the policy change for that container is finished, so the special container +is deleted and the policy change metadata of the original container is also +deleted. + +Alternatives: As Sub-Function of Container-Reconciler +----------------------------------------------------- + +Container-reconciler is a daemon process which moves objects registered in +an incorrect policy into the correct policy. Therefore, the reconciling +procedure satisfies almost all of the functional requirements for policy +changing. The advantage of using container-reconciler for policy changing is +that only a few points of the existing Swift sources need to be modified. +However, using container-reconciler has one big problem: it +has no function to determine whether the policy change of all objects +contained in a specific container is complete. As a result, it becomes +complicated to handle GET/HEAD object requests against the previous policy and +to allow the next storage policy change request. Based on discussion in Swift +hack-a-thon (held in Feb. 2015) and Tokyo Summit (held in Oct. 2015), we +decided to add object-transferrer to change a container's policy.
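The walk-through of Steps 0 to 5 in the Example section can be traced with a small in-memory model. This is only an illustrative sketch, not Swift code: the dictionary layout and the function names ``request_policy_change``, ``enqueue_objects`` and ``transfer`` are hypothetical, and Steps 4 and 5 are collapsed into a single cleanup once the queue is empty.

```python
# Hypothetical in-memory model of the Table 4 walk-through; not Swift code.


def request_policy_change(container, new_index):
    """Step 1: switch the container's index and record the change state."""
    if "prev_policy_index" in container["sysmeta"]:
        # Mirrors the proposed 409 Conflict while a change is in progress.
        raise RuntimeError("409 Conflict: policy change already in progress")
    container["sysmeta"]["prev_policy_index"] = container["policy_index"]
    container["sysmeta"]["objects_queued"] = False
    container["policy_index"] = new_index


def enqueue_objects(container, queues, name):
    """Step 2: container-replicator fills the special container."""
    if container["sysmeta"].get("objects_queued") is not False:
        return  # no change requested, or objects already queued
    old = container["sysmeta"]["prev_policy_index"]
    new = container["policy_index"]
    marker = "application/x-transfer-%d-to-%d" % (old, new)
    # Only objects still stored under the old policy need a transfer.
    queues[name] = {
        obj: marker for obj, idx in container["objects"].items() if idx == old
    }
    container["sysmeta"]["objects_queued"] = True


def transfer(container, queues, name, batch):
    """Steps 3 to 5: move a batch of objects, then clean up when done."""
    queue = queues.get(name)
    if queue is None:
        return
    for obj in batch:
        if obj in queue:
            container["objects"][obj] = container["policy_index"]
            del queue[obj]
    if not queue:
        # Empty special container: delete it and the policy change sysmetas.
        del queues[name]
        container["sysmeta"].clear()


container = {
    "policy_index": 1,
    "sysmeta": {},
    "objects": {"o1": 1, "o2": 1, "o3": 1},
}
queues = {}
request_policy_change(container, 2)
enqueue_objects(container, queues, "a:c")
transfer(container, queues, "a:c", ["o1", "o3"])
transfer(container, queues, "a:c", ["o2"])
```

After the last call every object is recorded under policy index 2, the special container ``a:c`` is gone, and the container carries no policy change sysmeta, which matches Step 5 of the table.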
+ +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + Daisuke Morita (dmorita) + +Milestones +---------- + +Target Milestone for completion: + Mitaka + +Work Items +---------- + +* Add API for Policy Changing + + * Add a middleware 'policy-change' to process Container POST requests with + the "X-Forced-Change-Storage-Policy" header. This middleware stores sysmeta + headers in the target container DB for policy changing. + * Modify container-server to add response headers for Container GET/HEAD + requests to show the progress of changing policies by exposing all the info + from the policy_stat table + * Modify proxy-server (or add a feature to the new middleware) to look up + objects under both the new and the old policy index so that users can read + objects while a policy change is in progress + +* Add daemon process among storage nodes for policy changing + + * Modify container-replicator to watch whether a container should be + initialized (creation of a corresponding special container) for changing + policies + * Write object-transferrer code + * Daemonize object-transferrer + +* Add unit, functional and probe tests to check that the new code works + as intended and that it handles split-brain cases correctly +
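Two of the server-side work items above, the read path that consults both policies and the progress headers, can be sketched as follows. This is a minimal illustration under stated assumptions, not proxy-server code: the per-policy stores are plain dictionaries, and the helper names ``fetch_with_fallback`` and ``transfer_progress`` are hypothetical. Header names are compared case-insensitively, and the sample values are the ones from the HEAD example earlier in this spec.

```python
# Hypothetical helpers; plain dicts stand in for object servers and headers.


def fetch_with_fallback(name, new_policy_objects, prev_policy_objects=None):
    """Try the new policy first, then the previous policy if one is recorded."""
    if name in new_policy_objects:
        return new_policy_objects[name]
    if prev_policy_objects is not None and name in prev_policy_objects:
        return prev_policy_objects[name]
    raise KeyError(name)  # stands in for a final 404 Not Found


def transfer_progress(headers, new_policy):
    """Fraction of the container's objects already stored in the new policy."""
    lowered = {key.lower(): value for key, value in headers.items()}
    total = int(lowered["x-container-object-count"])
    if total == 0:
        return 1.0
    key = "x-container-storage-policy-%s-object-count" % new_policy.lower()
    return int(lowered.get(key, 0)) / total


new_store = {"o1": b"moved"}
old_store = {"o2": b"not moved yet"}
first = fetch_with_fallback("o1", new_store, old_store)
second = fetch_with_fallback("o2", new_store, old_store)

head_response = {
    "X-Container-Storage-Policy-Gold-Object-Count": "3",
    "X-Container-Storage-Policy-Ec42-Object-Count": "7",
    "X-Container-Object-Count": "10",
    "X-Storage-Policy": "ec42",
}
progress = transfer_progress(head_response, head_response["X-Storage-Policy"])
```

The second lookup costs two requests, which is the latency concern noted in the discussion of the GET/HEAD object change; the fallback is only needed while the previous policy index is still recorded on the container.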