Merge "Add volume replication support"
specs/juno/volume-replication.rst

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Volume Replication
==========================================

https://blueprints.launchpad.net/cinder/+spec/volume-replication

Volume replication is a key storage feature and a requirement for
features such as high availability and disaster recovery of applications
running on top of OpenStack clouds.
This blueprint adds initial support for volume replication to Cinder.
It is considered a first take, and will include support for:

* Replicating volumes (primary-to-secondary approach)
* Promoting a secondary to primary (and stopping replication)
* Synchronizing replication with direction

This will be further enhanced in the future.

While this blueprint focuses on volume replication, a related blueprint
focuses on consistency groups; replication will be extended to
support them.

Problem description
===================

The main use of volume replication is resiliency in the presence of
failures. Examples of possible failures are:

* Storage system failure
* Rack-level failure
* Datacenter-level failure

Here we specifically exclude failures such as media failures, disk
failures, etc. Such failures are typically addressed by local resiliency
schemes.

Replication can be implemented in the following ways:

* Host-based - requires Nova integration

* Storage-based

  - Typical block-based approach - replication is specified between two
    existing volumes (or groups of volumes) on the controllers.
  - Typical file-system-based approach - a file
    (in the Cinder context, the file representing a block device) placed
    in a directory (or group, fileset, etc.) is automatically copied to a
    specified remote location.

Assumptions:

* Replication should be transparent to the end-user; failover, failback
  and test will be executed by the cloud admin.
  However, to test that the application is working, the end-user may be
  involved, as they will be required to verify that their application is
  working with the volume replica.

* The storage admin will provide the setup and configuration to enable the
  actual replication between the storage systems. This could be performed
  at the storage back-end or storage driver level, depending on the
  storage back-end. Specifically, storage drivers are expected to
  determine with whom they can replicate and report this to the scheduler.

* The cloud admin will enable the replication feature through the use of
  volume types.

* The end-user will not be directly exposed to the replication feature.
  Selecting a volume type will determine whether the volume is replicated,
  based on the actual extra-spec definition of the volume type (defined by
  the cloud admin).

* Quota management: quotas are consumed at 2x, as two volumes are
  created and the consumed space is double.
  We can re-examine this mechanism after we get comments from deployers.

Proposed change
===============

Each Cinder host will report its replication capabilities (illustrated
below):

* replication_support: indicates whether replication is enabled for this
  driver instance
* replication_unit_id: device-specific id used for replication
* replication_partners: list of device-specific ids that this node can
  replicate with
* replication_rpo_range: RPO range supported by this driver instance
  <min, max>
* replication_supported_methods: list of methods supported by the back-end
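
As an illustration only, a back-end driver might include these
capabilities in its periodic stats report roughly as follows (a minimal
sketch in Python; the class name and all values shown are hypothetical
examples, not part of this proposal)::

    class ExampleReplicationDriver(object):
        # Hypothetical driver class; only the stats report is sketched.
        def get_volume_stats(self, refresh=False):
            # Usual stats keys, plus the replication capabilities
            # proposed above; the values are made-up examples.
            return {
                'volume_backend_name': 'backend_a',
                'storage_protocol': 'iSCSI',
                'replication_support': True,
                'replication_unit_id': 'unit-0001',
                'replication_partners': ['unit-0002', 'unit-0003'],
                'replication_rpo_range': (5, 60),  # minutes: <min, max>
                'replication_supported_methods': ['async', 'sync'],
            }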

Add extra-specs in the volume type to indicate replication (example
below):

* replication_enabled - if this extra-spec exists and is True, the volume
  will be replicated. If the option is not specified or is False,
  replication is not enabled. This option is required to enable
  replication.
* replica_same_az - (optional) indicates whether the replica should be in
  the same AZ
* replica_volume_backend_name - (optional) specifies the back-end to be
  used as the target
* replication_target_rpo - (optional) requested RPO (numeric, minutes) for
  the volume
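
For example, a cloud admin could define a replicated volume type along
these lines (a sketch using python-cinderclient; the type name, back-end
name, credentials and endpoint are placeholders)::

    from cinderclient import client

    cinder = client.Client('2', 'admin', 'password', 'admin',
                           'http://keystone:5000/v2.0')

    # Create the type and attach the extra-specs proposed above.
    vtype = cinder.volume_types.create('replicated')
    vtype.set_keys({
        'replication_enabled': 'True',               # required
        'replica_same_az': 'False',                  # optional
        'replica_volume_backend_name': 'backend_b',  # optional
        'replication_target_rpo': '15',              # optional, minutes
    })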

Create volume with replication enabled (a rough sequence sketch follows
this list):

* The scheduler selects two hosts for volume placement and sets up the
  replication DB entry
* The manager on the primary creates the primary volume (as is done today)
* The manager on the secondary creates the replica volume
* The manager on the primary sets up the replication
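
A rough end-to-end sketch of that flow (the helper objects and method
names here are illustrative assumptions, not the actual Cinder RPC API)::

    def create_replicated_volume(scheduler, db, volume_spec):
        # Scheduler picks a (primary, secondary) pair and records the
        # relationship in the new replication table.
        primary_host, secondary_host = scheduler.select_hosts(
            volume_spec, count=2)
        relationship = db.replication_relationship_create(
            primary_host, secondary_host)

        # The primary manager creates the volume as is done today; the
        # secondary manager creates the replica; the primary sets up
        # the replication between them.
        volume = primary_host.create_volume(volume_spec)
        replica = secondary_host.create_replica(volume)
        primary_host.enable_replica(volume, replica)
        return volume, relationship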

Re-type volume (sketched below):

* replication_enabled: True->False:
  drop the replication and continue with the regular retype logic.
* replication_enabled: False->True:
  run the regular retype logic; after it selects back-ends (scheduler),
  enable replication.
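
A minimal sketch of that retype decision; ``drop_replication``,
``regular_retype`` and ``enable_replication`` are illustrative names
passed in as helpers, not existing Cinder calls::

    def retype_with_replication(volume, old_specs, new_specs,
                                drop_replication, regular_retype,
                                enable_replication):
        old = old_specs.get('replication_enabled') == 'True'
        new = new_specs.get('replication_enabled') == 'True'
        if old and not new:
            # True -> False: drop the replication, then retype as usual.
            drop_replication(volume)
        regular_retype(volume, new_specs)  # scheduler selects back-ends
        if new and not old:
            # False -> True: enable replication once back-ends are chosen.
            enable_replication(volume)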

Promote to primary:

* The manager on the secondary stops the replication.
* Switch the volume ids of the primary and secondary
  (the user sees no change in volume ids).

Sync replication:

* The manager on the primary restarts the replication.

Test:

* Create a clone of the secondary volume.

Delete volume:

* Disable the replication
* Delete the secondary volume
* Delete the primary volume (as is done today)

Cloning a volume:

* Since the replica is added after the primary is created, if we
  clone a volume and keep the volume type, it will be replicated.

Snapshots:

* A snapshot of the primary volume works as it does today, and creates
  a snapshot on the primary. No snapshot is taken of the replica.
* A snapshot of the replica (secondary) volume will fail.

Notes:

* The manager acts via the driver for back-end-specific replication
  functions.
* Failover is "promote to primary" as described above.
* Failback is "sync replication" + "promote to primary".

Driver API (sketched below):

* create_replica: to be run on the secondary to create the volume
* enable_replica: to be run on the primary to start replication
* disable_replica: to be run on the primary, stops the replication
* delete_replica: to be run on the secondary, deletes the replica target
  volume
* replication_status_check: to be run on all hosts, updating the
  replication status as observed from the back-end perspective
* promote_replica: to be run on the secondary, makes the secondary the
  primary
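
As an illustration, these could take shape as driver methods roughly as
follows (a sketch only; the class name and signatures are assumptions,
not the final interface)::

    class ReplicationDriverMixin(object):
        """Hypothetical sketch of the proposed replication driver API."""

        def create_replica(self, context, primary_volume, replica_volume):
            """Run on the secondary: create the replica target volume."""
            raise NotImplementedError()

        def enable_replica(self, context, volume, replica):
            """Run on the primary: start the replication."""
            raise NotImplementedError()

        def disable_replica(self, context, volume, replica):
            """Run on the primary: stop the replication."""
            raise NotImplementedError()

        def delete_replica(self, context, replica):
            """Run on the secondary: delete the replica target volume."""
            raise NotImplementedError()

        def replication_status_check(self, context, relationships):
            """Run on all hosts: report the replication status as
            observed from the back-end."""
            raise NotImplementedError()

        def promote_replica(self, context, replica):
            """Run on the secondary: make the secondary the primary."""
            raise NotImplementedError()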

Alternatives
------------

Replication can be performed outside of Cinder, with OpenStack
unaware of it. However, this requires vendor-specific scripts, and
is not visible to the cloud admin, as only the storage system admin
will see the replica and the state of the replication.
Also, all recovery actions (failover, failback) will require the
storage and cloud admins to work together.
Replication in Cinder, in contrast, reduces the role of the storage admin
to the setup phase only; the cloud admin is responsible for failover
and failback, with (typically) no need for intervention from the storage
admin.

Data model impact
-----------------

* A new replication relationship table will be created
  (with its database migration support).

* On promote to primary, the ids of the primary and secondary volume
  entries will change (switch).

Replication relationship db table (see the model sketch below):

* id = Column(String(36), primary_key=True)
* deleted = Column(Boolean, default=False)
* primary_id = Column(String(36), ForeignKey('volumes.id'),
  nullable=False)
* secondary_id = Column(String(36), ForeignKey('volumes.id'),
  nullable=False)
* primary_replication_unit_id = Column(String(255))
* secondary_replication_unit_id = Column(String(255))
* status = Column(Enum('error', 'creating', 'copying', 'active',
  'active-stopped', 'stopping', 'deleting', 'deleted', 'inactive',
  name='replicationrelationship_status'))
* extended_status = Column(String(255))
* driver_data = Column(String(255))
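
Assembled as a SQLAlchemy model, the table could look roughly like this
(a sketch; the class and table names are assumptions, the columns follow
the list above)::

    from sqlalchemy import Boolean, Column, Enum, ForeignKey, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class ReplicationRelationship(Base):
        # Hypothetical class/table names; columns as proposed above.
        __tablename__ = 'replication_relationships'

        id = Column(String(36), primary_key=True)
        deleted = Column(Boolean, default=False)
        primary_id = Column(String(36), ForeignKey('volumes.id'),
                            nullable=False)
        secondary_id = Column(String(36), ForeignKey('volumes.id'),
                              nullable=False)
        primary_replication_unit_id = Column(String(255))
        secondary_replication_unit_id = Column(String(255))
        status = Column(Enum('error', 'creating', 'copying', 'active',
                             'active-stopped', 'stopping', 'deleting',
                             'deleted', 'inactive',
                             name='replicationrelationship_status'))
        extended_status = Column(String(255))
        driver_data = Column(String(255))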

State diagram for replication (status)::

    <start>
                                              any error
      Create replica     +----------+         condition        +-------+
    +------------------> | creating | +----------------------> | error |
                         +----+-----+                          +---+---+
                              |                                     | Storage admin to
                              | enable replication                  | fix, and status
                              |                                     | check will update
                         +----+-----+                               |
    +------------------> | copying  |               any state <-----+
    |                    +----+-----+
    |                         |
    |                  status |
    |                  check  |          status check
    |                    +----+-----+  +----->  +----------------+
    |                    |  active  |           | active-stopped |
    |                    +----+-----+  <-----+  +----------------+
    |                         |  status check
    |                         |
    |                         | promote to primary
    |                         |
    |         sync       +----+-----+
    +--------------------+ inactive |
                         +----------+
    <end>

REST API impact
---------------

* Show replication relationship

  * Show information about a volume replication relationship.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'status': 'status of relationship',
                'links': { ... }
            }
        }

* Show replication relationship with details

  * Show detailed information about a volume replication relationship.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>/detail
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'secondary_id': 'secondary volume uuid',
                'status': 'status of relationship',
                'extended_status': 'extended status',
                'links': { ... }
            }
        }

* List replication relationships with details

  * List detailed information about volume replication relationships.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * TBD

  * /v2/<tenant id>/os-volume-replication/detail
  * Parameters:

    *status*
        Filter by replication relationship status
    *primary_id*
        Filter by primary volume id
    *secondary_id*
        Filter by secondary volume id

  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'secondary_id': 'secondary volume uuid',
                'status': 'status of relationship',
                'extended_status': 'extended status',
                'links': { ... }
            }
        }

* Promote volume to be the primary volume (client sketch below)

  * Switch the uuids of the primary and secondary volumes, and
    make the secondary volume the primary volume.
  * Method type: PUT
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the body data::

        {
            'relationship':
            {
                'promote': None
            }
        }
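
As a client-side illustration, the promote call could be issued roughly
as follows (a sketch; the endpoint, tenant id, relationship uuid and
token are placeholders)::

    import json

    import requests

    tenant_id = 'TENANT_ID'               # placeholder
    relationship_id = 'REPLICATION_UUID'  # placeholder
    url = ('http://cinder-api:8776/v2/%s/os-volume-replication/%s'
           % (tenant_id, relationship_id))

    resp = requests.put(
        url,
        data=json.dumps({'relationship': {'promote': None}}),
        headers={'X-Auth-Token': 'TOKEN',  # placeholder
                 'Content-Type': 'application/json'})
    assert resp.status_code == 202  # normal response code per this spec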

* Sync between the primary and secondary volume

  * Resync the replication between the primary and secondary volume.
    Typically follows a promote operation on the replication.
  * Method type: PUT
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the body data::

        {
            'relationship':
            {
                'sync': None
            }
        }

* Test replication by making a copy of the secondary volume available

  * Test the volume replication. Create a clone of the secondary volume
    and make it accessible, so the promote process can be tested.
  * Method type: POST
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>/test
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'volume_id': 'volume id of the cloned secondary'
            }
        }

Security impact
---------------

* Does this change touch sensitive data such as tokens, keys, or user
  data?

  *No*.

* Does this change alter the API in a way that may impact security, such
  as a new way to access sensitive information or a new way to login?

  *No*.

* Does this change involve cryptography or hashing?

  *No*.

* Does this change require the use of sudo or any elevated privileges?

  *No*.

* Does this change involve using or parsing user-provided data? This could
  be directly at the API level or indirectly such as changes to a cache
  layer.

  *No*.

* Can this change enable a resource exhaustion attack, such as allowing a
  single API interaction to consume significant server resources? Some
  examples of this include launching subprocesses for each connection, or
  entity expansion attacks in XML.

  *Yes*, enabling replication consumes cloud and storage resources.
Notifications impact
--------------------

Notifications will be added for enabling replication, promoting, syncing
and dropping replication.


Other end user impact
---------------------

* The end-user uses volume types to enable/disable replication.

* The cloud admin uses the *promote*, *sync* and *test* commands
  in the python-cinderclient to execute failover, failback and test.


Performance Impact
------------------

* The scheduler now needs to choose two hosts instead of one, based on
  additional input from the driver and volume type.

* The periodic task will query the driver and back-end for the status
  of all replicated volumes, running on both the primary and secondary.

* Extra db calls to determine whether replication exists are added to
  retype, snapshot and other operations, adding a small latency to these
  functions.

Other deployer impact
---------------------

* Options are added for volume types (see above).

* New driver capabilities are added; these need to be supported by the
  volume drivers, which may imply changes to the driver configuration
  options.

* This change will require explicit enablement (to be used by users)
  by the cloud administrator.


Developer impact
----------------

* Changes to the driver API are noted above. Basically, new functions are
  needed to support replication.

* The API will expand to include consistency groups once consistency
  group support is merged into Cinder.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ronenkat

Other contributors:
  None

Work Items
----------

* Cinder public (admin) APIs for replication
* DB schema for replication
* Cinder scheduler support for replication
* Cinder driver API additions for replication
* Cinder manager update for replication
* Testing

Note: Code is based on https://review.openstack.org/#/c/64026/ which was
submitted in the Icehouse development cycle.

Dependencies
============

* Related blueprint: consistency groups
  https://blueprints.launchpad.net/cinder/+spec/consistency-groups

* LVM to support replication using DRBD, in a separate contribution.


Testing
=======

* Testing in the gate is not supported, due to the following
  considerations:

  * LVM has no replication support; this is to be addressed using DRBD
    in a separate contribution.
  * Gate testing requires setting up at least two nodes using DRBD.

* This should be discussed/addressed as support for LVM is added.

* 3rd-party driver CI will be expected to test replication.

Documentation Impact
====================

* Public (admin) API changes.
* Details on how replication is used by leveraging volume types.
* Driver docs explaining how replication is set up for each driver.


References
==========

* Volume replication design session:
  https://etherpad.openstack.org/p/juno-cinder-volume-replication