..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Volume Replication
==========================================

https://blueprints.launchpad.net/cinder/+spec/volume-replication

Volume replication is a key storage feature and a requirement for
features such as high availability and disaster recovery of applications
running on top of OpenStack clouds.
This blueprint is an attempt to add initial support for volume replication
in Cinder, and is considered a first take which will include support for:

* Replicate volumes (primary to secondary approach)
* Promote a secondary to primary (and stop replication)
* Re-enable replication
* Test that replication is running properly

It is important to note that this is a first pass at volume replication.
The process of implementing replication for drivers has uncovered a
number of challenges that will be tackled in a future revision of
replication, which will address the ability to have different replication
types and the ability to replicate across multiple backends.

While this blueprint focuses on volume replication, a related blueprint
focuses on consistency groups, and replication will be extended to
support it.

Use Cases
=========

Problem description
===================

The main use of volume replication is resiliency in the presence of
failures. Examples of possible failures are:

* Storage system failure
* Rack(s) level failure
* Datacenter level failure

Here we specifically exclude failures like media failures, disk failures,
etc. Such failures are typically addressed by local resiliency schemes.

Replication can be implemented in the following ways:

* Host-based - requires Nova integration

* Storage-based

  - Typical block-based approach - replication is specified between two
    existing volumes (or groups of volumes) on the controllers.
  - Typical file-system-based approach - a file
    (in the Cinder context, the file representing a block device) placed in a
    directory (or group, fileset, etc.) is automatically copied to a
    specified remote location.

Assumptions:

* Replication should be transparent to the end-user; failover, failback
  and test will be executed by the cloud admin.
  However, to test that the application is working, the end-user may be
  involved, as they will be required to verify that their application is
  working with the volume replica.

* The storage admin will provide the setup and configuration to enable the
  actual replication between the storage systems. This could be performed
  at the storage back-end or storage driver level depending on the storage
  back-end. Specifically, storage drivers are expected to determine with
  whom they can replicate and report this to the scheduler.

* The cloud admin will enable the replication feature through the use of
  volume types.

* The end-user will not be directly exposed to the replication feature.
  Selecting a volume-type will determine if the volume will be replicated,
  based on the actual extra-spec definition of the volume type (defined by
  the cloud admin).

* Quota management: quotas are consumed at 2x, as two volumes are
  created and the consumed space is doubled.
  We can re-examine this mechanism after we get comments from deployers.

Proposed change
===============

Introduction:

The proposed design provides just a framework in Cinder for backend volume
drivers to implement volume replication using the facilities in the storage
backend. As such, this spec provides guidance as to how volume replication
should be implemented, but the actual implementation will vary depending
upon the backend in question.

The key to enabling replication starts with adding an extra spec to the
volume type to indicate that replication is desired. That extra spec is
then used in the volume driver to enable the set-up and control of
replication on the storage backend in each of the different functions
documented below.

Since Cinder is just providing the framework for backend volume drivers
to implement replication, the details of the replication implementation
are left to the backend. The backend driver developer will need to
decide for their storage backend the best way to enable replication. For
instance, one storage provider may feel that implementing synchronous
replication is the best choice while another storage provider may choose
asynchronous. A provider could also choose to make it a configurable
option. Implementing volume replication in Cinder in this manner allows
the greatest flexibility to the backend developer.

It is also important to note that the developer documentation must provide
examples of how this is implemented in the Storwize driver. This is
important because it is not currently possible to demonstrate volume
replication in Cinder's reference implementation, LVM. Therefore the
developer documentation will have to serve as the reference.

Add extra-specs in the volume type to indicate replication:

* capabilities:replication <is> True - if True, the volume is to be
  replicated, if supported, by the backend driver. If the option is not
  specified or is False, then replication is not enabled. This option is
  required to enable replication.
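
For illustration, a cloud admin could define such a volume type with the
cinder client as follows (the type name 'replicated-vols' is only an
example)::

    cinder type-create replicated-vols
    cinder type-key replicated-vols set capabilities:replication="<is> True"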

Create volume with replication enabled:

* Backend drivers that wish to enable replication will need to update their
  create_volume() function to check for the
  'capabilities:replication <is> True' extra spec. It is up to the backend
  driver developers to implement replication in a manner that is compatible
  with their storage backend.

When a replicated volume is created it is expected that the volume dictionary
will be populated as follows:

** volume['replication_status'] = 'copying'
** volume['replication_extended_status'] = <driver specific value>
** volume['driver_data'] = <driver specific value>

The replica volume is hidden from the end user, as the end user will
never need to directly interact with the replica volume. Any interaction
with the replica happens through the primary volume.

Further details about the dictionary fields above may be found in the
"Data Model Impact" section below.

Create Volume from Snapshot:

If the volume type extra specs include 'capabilities:replication <is> True'
for the new volume, the driver needs to create a volume replica at volume
creation time and set up replication between the newly created volume and its
associated replica. The volume dictionary should be populated in the same
manner as create volume.

Create Cloned Volume:

If the volume type extra specs include 'capabilities:replication <is> True'
for the new volume, the driver needs to create a volume replica at clone
creation time and set up replication between the newly created volume and its
associated replica. The volume dictionary should be populated in the same
manner as create volume.

Create Replica Test Volume:

Create a clone of the replica (secondary) volume. This clone can then be
used for testing replication to ensure that fail-over can be executed when
necessary. It is important to note that this doesn't actually execute the
promote path, as the intention is not to promote the replica; rather, it
gives a method to ensure that the replica contains data and would be useful
if it had to be promoted.

The administrator is able to access this functionality using the
--source-replica option when creating a volume.
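
For example, a cloud admin could create a test clone of a volume's replica
as follows (the uuid and size are placeholders)::

    cinder create --source-replica <primary-volume-uuid> 10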

Delete volume:

For volumes with replication enabled, the replica needs to be deleted
along with the primary copy. So, if a volume type has
'capabilities:replication <is> True' set, the driver will need to do the
additional deletion.

Get Volume Stats:

If the storage backend driver supports replication, the following capability
should be reported:

* replication = True (None or False disables replication)
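
A minimal sketch of how a driver could advertise this capability in its
stats refresh (the surrounding fields and helper names are illustrative)::

    def _update_volume_stats(self):
        """Report backend capabilities to the scheduler."""
        self._stats = {
            'volume_backend_name': self.backend_name,
            'storage_protocol': 'iSCSI',
            'total_capacity_gb': self._get_total_gb(),  # hypothetical
            'free_capacity_gb': self._get_free_gb(),    # hypothetical
            # Lets the scheduler match volume types that carry
            # 'capabilities:replication <is> True'.
            'replication': True,
        }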

Re-type volume:

Changing the volume-type is the mechanism an admin can use to make an
existing volume replicated, or to disable replication for a volume. Changing
the volume-type of a volume to a volume-type that includes
'capabilities:replication <is> True' (when it didn't have it before) should
result in adding a secondary copy to the volume. Changing the volume-type of
a volume to a volume-type that no longer includes
'capabilities:replication <is> True' should result in removing the secondary
copy while preserving the primary copy.

Returns either:

* A boolean indicating whether the retype occurred, or
* A tuple (retyped, model_update) where retyped is a boolean
  indicating if the retype occurred, and the model_update includes
  changes for the volume db.

The steps to implement this would look as follows:

* Do a diff['extra_specs'] and see if 'replication' is included.
* If replication was enabled for the original volume_type but is not
  enabled for the new volume_type, then replication should be disabled:

  * The replica should be deleted.
  * The volume dictionary should be updated as follows:

    ** volume['replication_status'] = 'disabled'
    ** volume['replication_extended_status'] = None
    ** volume['driver_data'] = None

* If replication was not enabled for the original volume_type but is
  enabled for the new volume_type, then replication should be enabled:

  * A volume replica should be created and replication should
    be set up between the volume and the newly created replica.
  * The volume dictionary should be updated as follows:

    ** volume['replication_status'] = 'copying'
    ** volume['replication_extended_status'] = <driver specific value>
    ** volume['driver_data'] = <driver specific value>
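
A condensed sketch of the retype flow above (in Cinder, diff['extra_specs']
maps each changed key to an (old, new) tuple; the replica helpers here are
hypothetical)::

    def retype(self, context, volume, new_type, diff, host):
        """Enable or disable replication when the type change asks for it."""
        enabled = '<is> True'
        old, new = diff['extra_specs'].get('capabilities:replication',
                                           (None, None))

        if old == enabled and new != enabled:
            self._delete_replica(volume)  # hypothetical helper
            return True, {'replication_status': 'disabled',
                          'replication_extended_status': None,
                          'driver_data': None}

        if old != enabled and new == enabled:
            replica = self._create_replica(volume)  # hypothetical helper
            return True, {'replication_status': 'copying',
                          'replication_extended_status':
                              replica['extended_status'],
                          'driver_data': replica['driver_data']}

        return True, None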

Get Replication Status:

This will be used to update the status of replication between the primary and
secondary volumes.

This function is called by the "_update_replication_relationship_status"
function in 'manager.py' and is the mechanism to update the status of
replication between the primary and secondary copies.

The actual state of the replication, as the storage backend is aware of it,
should be returned, and the Cinder database should be updated to reflect the
status reported from the storage backend.

It is expected that the following model update for the volume will
happen:

* volume['replication_status'] = <error | copying | active | active-stopped |
  inactive>

** 'error' if an error occurred with replication.
** 'copying' replication copying data to secondary (inconsistent)
** 'active' replication copying data to secondary (consistent)
** 'active-stopped' replication data copy on hold (consistent)
** 'inactive' if replication data copy is stopped (inconsistent)

* volume['replication_extended_status'] = <driver specific value>
* volume['driver_data'] = <driver specific value>

Note that for get replication status, the replication_extended_status and
driver_data fields may not need to be updated.
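
A sketch of a driver's status poll under these expectations (the vendor
state names and the mapping are invented for illustration)::

    # Hypothetical vendor relationship states mapped onto the
    # replication_status values defined in this spec.
    VENDOR_STATE_MAP = {
        'inconsistent_copying': 'copying',
        'consistent_synchronized': 'active',
        'consistent_stopped': 'active-stopped',
        'idling': 'inactive',
    }

    def get_replication_status(self, context, volume):
        """Return a model update reflecting the backend's view."""
        relationship = self._get_relationship(volume)  # hypothetical helper
        status = VENDOR_STATE_MAP.get(relationship['state'], 'error')
        # extended status and driver data often need no update here.
        return {'replication_status': status}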

Promote replica:

Promotion of a replica means that the secondary volume will take over
for the primary volume. This can be thought of as a 'fail over' operation.
Once promotion has happened, replication between the two volumes, at the
storage level, should be stopped, the replica should be available to be
attached, and the replication status should be changed to 'inactive' if the
change is successful; otherwise it should be 'error'.

A model update for the volume is returned.

As with the functions above, the volume driver is expected to update the
volume dictionary as follows:

* volume['replication_status'] = <error | inactive>

** 'error' if an error occurred with replication.
** 'inactive' if the replication data copy is stopped (inconsistent)

* volume['replication_extended_status'] = <driver specific value>
* volume['driver_data'] = <driver specific value>

Re-enable replication:

Re-enabling replication would be used to fix the replication between
the primary and secondary volumes. Replication would need to be
re-enabled as part of the fail-back process to make the promoted
volume and the old primary volume consistent again.

The volume driver returns a model update to reflect the actions taken.

The backend driver is expected to update the following volume dictionary
entries:

* volume['replication_status'] = <error | copying | active | active-stopped |
  inactive>

** 'error' if an error occurred with replication.
** 'copying' replication copying data to secondary (inconsistent)
** 'active' replication copying data to secondary (consistent)
** 'active-stopped' replication data copy on hold (consistent)
** 'inactive' if replication data copy is stopped (inconsistent)

* volume['replication_extended_status'] = <driver specific value>
* volume['driver_data'] = <driver specific value>

Notes:

The replication_extended_status should be used to store information that
the backend driver will need to track replication status. For instance,
the Storwize driver will use the replication_extended_status to track
the primary copy status and synchronization status for the primary volume,
and the copy status, synchronization status and synchronization progress for
the replica (secondary) volume.

The driver_data field may optionally be used to contain any additional data
that the backend driver may require. Some backend drivers may not need to
use the driver_data field.

Driver API:

* promote: Promotes a replica that is in active or active-stopped state to
  be the primary.
* reenable: Re-enables replication on a volume that is in inactive,
  active-stopped or error status.
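
Putting the two entry points together, a driver opting into replication
could implement them along these lines (the signatures follow the pattern
of other Cinder driver methods but are illustrative here, as are the
helpers)::

    def promote_replica(self, context, volume):
        """Stop replication and make the secondary the primary."""
        self._stop_relationship(volume)  # hypothetical helper
        self._swap_primary(volume)       # hypothetical helper
        return {'replication_status': 'inactive'}

    def reenable_replication(self, context, volume):
        """Resynchronize the copies, typically after a promote."""
        self._restart_relationship(volume)  # hypothetical helper
        return {'replication_status': 'copying'}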


Alternatives
------------

Replication can be performed outside of Cinder, and OpenStack can be
unaware of it. However, this requires vendor-specific scripts, and
is not visible to the cloud admin, as only the storage system admin
will see the replica and the state of the replication.
Also, all recovery actions (failover, failback) will require the
storage and cloud admins to work together.
Replication in Cinder, by contrast, reduces the role of the storage admin
to the setup phase only; the cloud admin is responsible for failover
and failback with (typically) no need for intervention from the storage
admin.

Data model impact
-----------------

* The volumes table will be updated:

** Add replication_status column (string) for indicating the status of
   replication for a given volume. Possible values are:

*** 'copying' - Data is being copied between volumes, the secondary is
    inconsistent.
*** 'disabled' - Volume replication is disabled.
*** 'error' - Replication is in an error state.
*** 'active' - Data is being copied to the secondary and the secondary is
    consistent.
*** 'active-stopped' - Data is not being copied to the secondary (on hold),
    the secondary volume is consistent.
*** 'inactive' - Data is not being copied to the secondary, the secondary
    copy is inconsistent.

** Add replication_extended_status column to contain details with regard
   to the replication status of the primary and secondary volumes.
** Add replication_driver_data column to contain additional details that
   may be needed by a vendor's driver to implement replication on a backend.
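
A sketch of the corresponding sqlalchemy-migrate migration (column lengths
are placeholders)::

    from sqlalchemy import Column, MetaData, String, Table


    def upgrade(migrate_engine):
        """Add the replication columns to the volumes table."""
        meta = MetaData(bind=migrate_engine)
        volumes = Table('volumes', meta, autoload=True)

        for name in ('replication_status',
                     'replication_extended_status',
                     'replication_driver_data'):
            volumes.create_column(Column(name, String(255)))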


State diagram for replication (status)

::

                   <start>
                                  any error
                                  condition    +-------+
     Create volume -----+      +-------------> | error |
                        |      |               +---+---+
                        |      |                   | Storage admin to
                        |      |                   | fix, and status
                        |      |                   | check will update
                  +-----+-----+|                   |
     +----------> |  copying  |<--- any state <----+
     |            +-----+-----+
     |                  |
     |   status         |
     |   check          | status check
     |            +-----+----+  +------> +----------------+
     |            |  active  |           | active-stopped |
     |            +-----+----+  <------+ +----------------+
     |                  |  status check
     |                  |
     |                  |  promote to primary
     |                  |
     |  re-enable +-----+----+
     +------------+ inactive |
                  +----------+

                   <end>

REST API impact
---------------

Create volume API will have "source-replica" added::

    {
        "volume":
        {
            "source-replica": "Volume uuid of primary to clone",
        }
    }

* Promote volume to be the primary volume

  * Promote the secondary copy to be primary. The primary will become
    secondary and replication should become inactive.
  * Method type: POST
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 500: Replication is not enabled for volume
    * 500: Replication status for volume must be active or active-stopped,
      but current status is: <status>
    * 500: Volume status for volume must be available, but current status
      is: <status>

  * /v2/<tenant id>/volumes/os-promote-replica/<volume uuid>
  * This API has no body

* Re-enable replication between the primary and secondary volume

  * Re-enable the replication between the primary and secondary volume.
    Typically follows a promote operation on the replication.
  * Method type: POST
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 500: Replication is not enabled
    * 500: Replication status for volume must be inactive, active-stopped,
      or error, but current status is: <status>

  * /v2/<tenant id>/volumes/os-reenable-replica/<volume uuid>
  * This API has no body
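
Illustrative calls to both endpoints, following the URL forms above (host,
port, tenant id, volume uuid and token are placeholders)::

    curl -X POST -H "X-Auth-Token: $TOKEN" \
        http://<host>:8776/v2/<tenant-id>/volumes/os-promote-replica/<volume-uuid>

    curl -X POST -H "X-Auth-Token: $TOKEN" \
        http://<host>:8776/v2/<tenant-id>/volumes/os-reenable-replica/<volume-uuid>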

Security impact
---------------

* Does this change touch sensitive data such as tokens, keys, or user data?
  *No*.

* Does this change alter the API in a way that may impact security, such as
  a new way to access sensitive information or a new way to login?
  *No*.

* Does this change involve cryptography or hashing?
  *No*.

* Does this change require the use of sudo or any elevated privileges?
  *No*.

* Does this change involve using or parsing user-provided data? This could
  be directly at the API level or indirectly such as changes to a cache layer.
  *No*.

* Can this change enable a resource exhaustion attack, such as allowing a
  single API interaction to consume significant server resources? Some
  examples of this include launching subprocesses for each connection, or
  entity expansion attacks in XML.
  *Yes*, enabling replication consumes cloud and storage resources.

Notifications impact
--------------------

Notifications will be added for promoting and re-enabling replication for
volumes.

Other end user impact
---------------------

* End-users use volume types to enable replication.

* Cloud admins use the *replication-promote*, *replication-reenable* and
  *create --source-replica* commands in the python-cinderclient to execute
  failover, failback and test, as shown below.
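
For example (the command names follow this spec; the volume uuid is a
placeholder)::

    # Failover: promote the secondary to be the primary
    cinder replication-promote <volume-uuid>

    # Failback step: resynchronize the two copies
    cinder replication-reenable <volume-uuid>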

Performance Impact
------------------

* Extra db calls to identify whether replication exists are added to retype,
  snapshot and other operations, which will add a small latency to these
  functions.

Other deployer impact
---------------------

* Added options for volume types (see above).

* New driver capabilities are added; these need to be supported by the volume
  drivers, which may imply changes to the driver configuration options.

* This change will require explicit enablement (to be used by users)
  from the cloud administrator.

Developer impact
----------------

* Changes to the driver API are noted above. Third party backends that wish
  to enable replication will need to add replication support to their driver.

* The API will expand to include consistency groups following the merge of
  consistency group support into Cinder.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ronenkat

Other contributors:
  Jay Bryant - E-Mail: jsbryant@us.ibm.com IRC: jungleboyj

Work Items
----------

* Cinder public (admin) APIs for replication
* DB schema updates for replication
* Cinder driver API additions for replication
* Cinder manager updates for replication
* Testing


Dependencies
============

* Related blueprint: Consistency groups
  https://blueprints.launchpad.net/cinder/+spec/consistency-groups

* LVM to support replication using DRBD, in a separate contribution.


Testing
=======

* Testing in the gate is not supported due to the following considerations:

  * LVM has no replication support; this is to be addressed using DRBD in a
    separate contribution.
  * Testing requires setting up at least two nodes using DRBD.
  * Gate testing should be discussed/addressed as support for LVM is added.

* 3rd party driver CI will be expected to test replication.


Documentation Impact
====================

* Public (admin) API changes.
* Details of how replication is used by leveraging volume types.
* Driver docs explaining how replication is set up for each driver.
* Provide examples of the volume replication implementation for
  the Storwize backend.

References
==========

Etherpad on improvements needed in documentation:
https://etherpad.openstack.org/p/cinder-replication-redoc