Merge "Add volume replication support"
specs/juno/volume-replication.rst

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Volume Replication
==========================================

https://blueprints.launchpad.net/cinder/+spec/volume-replication

Volume replication is a key storage feature and a requirement for
features such as high availability and disaster recovery of applications
running on top of OpenStack clouds.
This blueprint adds initial support for volume replication to Cinder.
It is considered a first take, and will include support for:

* Replicating volumes (primary-to-secondary approach)
* Promoting a secondary to primary (and stopping replication)
* Synchronizing replication with direction

This will be further enhanced in the future.

While this blueprint focuses on volume replication, a related blueprint
focuses on consistency groups; replication will be extended to
support them.

Problem description
===================

The main use of volume replication is resiliency in the presence of
failures. Examples of possible failures are:

* Storage system failure
* Rack-level failure
* Datacenter-level failure

Here we specifically exclude failures such as media failures, disk
failures, etc. Such failures are typically addressed by local resiliency
schemes.

Replication can be implemented in the following ways:

* Host-based - requires Nova integration

* Storage-based

  - Typical block-based approach - replication is specified between two
    existing volumes (or groups of volumes) on the controllers.
  - Typical file-system-based approach - a file
    (in the Cinder context, the file representing a block device) placed
    in a directory (or group, fileset, etc.) is automatically copied to a
    specified remote location.

Assumptions:

* Replication should be transparent to the end-user; failover, failback
  and test will be executed by the cloud admin.
  However, to test that the application is working, the end-user may be
  involved, as they will be required to verify that their application is
  working with the volume replica.

* The storage admin will provide the setup and configuration to enable the
  actual replication between the storage systems. This could be performed
  at the storage back-end or storage driver level, depending on the
  storage back-end. Specifically, storage drivers are expected to
  determine with whom they can replicate and report this to the scheduler.

* The cloud admin will enable the replication feature through the use of
  volume types.

* The end-user will not be directly exposed to the replication feature.
  Selecting a volume type will determine whether the volume is replicated,
  based on the actual extra-spec definition of the volume type (defined by
  the cloud admin).

* Quota management: quotas are consumed at 2x, as two volumes are
  created and the consumed space is double.
  We can re-examine this mechanism after we get comments from deployers.

Proposed change
===============

Each Cinder host will report its replication capabilities (illustrated
below):

* replication_support: indicates whether replication is enabled for this
  driver instance
* replication_unit_id: device-specific id used for replication
* replication_partners: list of device-specific ids that this node can
  replicate with
* replication_rpo_range: RPO range supported by this driver instance
  <min, max>
* replication_supported_methods: list of methods supported by the back-end
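
As an illustration only, a back-end driver might include these
capabilities in its periodic stats report roughly as follows (a minimal
sketch in Python; the class name and all values shown are hypothetical
examples, not part of this proposal)::

    class ExampleReplicationDriver(object):
        # Hypothetical driver class; only the stats report is sketched.
        def get_volume_stats(self, refresh=False):
            # Usual stats keys, plus the replication capabilities
            # proposed above; the values are made-up examples.
            return {
                'volume_backend_name': 'backend_a',
                'storage_protocol': 'iSCSI',
                'replication_support': True,
                'replication_unit_id': 'unit-0001',
                'replication_partners': ['unit-0002', 'unit-0003'],
                'replication_rpo_range': (5, 60),  # minutes: <min, max>
                'replication_supported_methods': ['async', 'sync'],
            }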

Add extra-specs in the volume type to indicate replication (example
below):

* replication_enabled - if this extra-spec exists and is True, the volume
  will be replicated. If the option is not specified or is False,
  replication is not enabled. This option is required to enable
  replication.
* replica_same_az - (optional) indicates whether the replica should be in
  the same AZ
* replica_volume_backend_name - (optional) specifies the back-end to be
  used as the target
* replication_target_rpo - (optional) requested RPO (numeric, minutes) for
  the volume
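
For example, a cloud admin could define a replicated volume type along
these lines (a sketch using python-cinderclient; the type name, back-end
name, credentials and endpoint are placeholders)::

    from cinderclient import client

    cinder = client.Client('2', 'admin', 'password', 'admin',
                           'http://keystone:5000/v2.0')

    # Create the type and attach the extra-specs proposed above.
    vtype = cinder.volume_types.create('replicated')
    vtype.set_keys({
        'replication_enabled': 'True',               # required
        'replica_same_az': 'False',                  # optional
        'replica_volume_backend_name': 'backend_b',  # optional
        'replication_target_rpo': '15',              # optional, minutes
    })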

Create volume with replication enabled (a rough sequence sketch follows
this list):

* The scheduler selects two hosts for volume placement and sets up the
  replication DB entry
* The manager on the primary creates the primary volume (as is done today)
* The manager on the secondary creates the replica volume
* The manager on the primary sets up the replication
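
A rough end-to-end sketch of that flow (the helper objects and method
names here are illustrative assumptions, not the actual Cinder RPC API)::

    def create_replicated_volume(scheduler, db, volume_spec):
        # Scheduler picks a (primary, secondary) pair and records the
        # relationship in the new replication table.
        primary_host, secondary_host = scheduler.select_hosts(
            volume_spec, count=2)
        relationship = db.replication_relationship_create(
            primary_host, secondary_host)

        # The primary manager creates the volume as is done today; the
        # secondary manager creates the replica; the primary sets up
        # the replication between them.
        volume = primary_host.create_volume(volume_spec)
        replica = secondary_host.create_replica(volume)
        primary_host.enable_replica(volume, replica)
        return volume, relationship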

Re-type volume (sketched below):

* replication_enabled: True->False:
  drop the replication and continue with the regular retype logic.
* replication_enabled: False->True:
  run the regular retype logic; after it selects back-ends (scheduler),
  enable replication.
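
A minimal sketch of that retype decision; ``drop_replication``,
``regular_retype`` and ``enable_replication`` are illustrative names
passed in as helpers, not existing Cinder calls::

    def retype_with_replication(volume, old_specs, new_specs,
                                drop_replication, regular_retype,
                                enable_replication):
        old = old_specs.get('replication_enabled') == 'True'
        new = new_specs.get('replication_enabled') == 'True'
        if old and not new:
            # True -> False: drop the replication, then retype as usual.
            drop_replication(volume)
        regular_retype(volume, new_specs)  # scheduler selects back-ends
        if new and not old:
            # False -> True: enable replication once back-ends are chosen.
            enable_replication(volume)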

Promote to primary:

* The manager on the secondary stops the replication.
* Switch the volume ids of the primary and secondary
  (the user sees no change in volume ids).

Sync replication:

* The manager on the primary restarts the replication.

Test:

* Create a clone of the secondary volume.

Delete volume:

* Disable the replication
* Delete the secondary volume
* Delete the primary volume (as is done today)

Cloning a volume:

* Since the replica is added after the primary is created, if we
  clone a volume and keep the volume type, it will be replicated.

Snapshots:

* A snapshot of the primary volume works as it does today, and creates
  a snapshot on the primary. No snapshot is taken of the replica.
* A snapshot of the replica (secondary) volume will fail.

Notes:

* The manager acts via the driver for back-end-specific replication
  functions.
* Failover is "promote to primary" as described above.
* Failback is "sync replication" + "promote to primary".

Driver API (sketched below):

* create_replica: to be run on the secondary to create the volume
* enable_replica: to be run on the primary to start replication
* disable_replica: to be run on the primary, stops the replication
* delete_replica: to be run on the secondary, deletes the replica target
  volume
* replication_status_check: to be run on all hosts, updating the
  replication status as observed from the back-end perspective
* promote_replica: to be run on the secondary, makes the secondary the
  primary
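
As an illustration, these could take shape as driver methods roughly as
follows (a sketch only; the class name and signatures are assumptions,
not the final interface)::

    class ReplicationDriverMixin(object):
        """Hypothetical sketch of the proposed replication driver API."""

        def create_replica(self, context, primary_volume, replica_volume):
            """Run on the secondary: create the replica target volume."""
            raise NotImplementedError()

        def enable_replica(self, context, volume, replica):
            """Run on the primary: start the replication."""
            raise NotImplementedError()

        def disable_replica(self, context, volume, replica):
            """Run on the primary: stop the replication."""
            raise NotImplementedError()

        def delete_replica(self, context, replica):
            """Run on the secondary: delete the replica target volume."""
            raise NotImplementedError()

        def replication_status_check(self, context, relationships):
            """Run on all hosts: report the replication status as
            observed from the back-end."""
            raise NotImplementedError()

        def promote_replica(self, context, replica):
            """Run on the secondary: make the secondary the primary."""
            raise NotImplementedError()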

Alternatives
------------

Replication can be performed outside of Cinder, with OpenStack
unaware of it. However, this requires vendor-specific scripts, and
is not visible to the cloud admin, as only the storage system admin
will see the replica and the state of the replication.
Also, all recovery actions (failover, failback) will require the
storage and cloud admins to work together.
Replication in Cinder, in contrast, reduces the role of the storage admin
to the setup phase only; the cloud admin is responsible for failover
and failback, with (typically) no need for intervention from the storage
admin.

Data model impact
-----------------

* A new replication relationship table will be created
  (with its database migration support).

* On promote to primary, the ids of the primary and secondary volume
  entries will change (switch).

Replication relationship db table (see the model sketch below):

* id = Column(String(36), primary_key=True)
* deleted = Column(Boolean, default=False)
* primary_id = Column(String(36), ForeignKey('volumes.id'),
  nullable=False)
* secondary_id = Column(String(36), ForeignKey('volumes.id'),
  nullable=False)
* primary_replication_unit_id = Column(String(255))
* secondary_replication_unit_id = Column(String(255))
* status = Column(Enum('error', 'creating', 'copying', 'active',
  'active-stopped', 'stopping', 'deleting', 'deleted', 'inactive',
  name='replicationrelationship_status'))
* extended_status = Column(String(255))
* driver_data = Column(String(255))
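
Assembled as a SQLAlchemy model, the table could look roughly like this
(a sketch; the class and table names are assumptions, the columns follow
the list above)::

    from sqlalchemy import Boolean, Column, Enum, ForeignKey, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class ReplicationRelationship(Base):
        # Hypothetical class/table names; columns as proposed above.
        __tablename__ = 'replication_relationships'

        id = Column(String(36), primary_key=True)
        deleted = Column(Boolean, default=False)
        primary_id = Column(String(36), ForeignKey('volumes.id'),
                            nullable=False)
        secondary_id = Column(String(36), ForeignKey('volumes.id'),
                              nullable=False)
        primary_replication_unit_id = Column(String(255))
        secondary_replication_unit_id = Column(String(255))
        status = Column(Enum('error', 'creating', 'copying', 'active',
                             'active-stopped', 'stopping', 'deleting',
                             'deleted', 'inactive',
                             name='replicationrelationship_status'))
        extended_status = Column(String(255))
        driver_data = Column(String(255))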

State diagram for replication (status)::

    <start>
                                              any error
      Create replica     +----------+         condition        +-------+
    +------------------> | creating | +----------------------> | error |
                         +----+-----+                          +---+---+
                              |                                     | Storage admin to
                              | enable replication                  | fix, and status
                              |                                     | check will update
                         +----+-----+                               |
    +------------------> | copying  |               any state <-----+
    |                    +----+-----+
    |                         |
    |                  status |
    |                  check  |          status check
    |                    +----+-----+  +----->  +----------------+
    |                    |  active  |           | active-stopped |
    |                    +----+-----+  <-----+  +----------------+
    |                         |  status check
    |                         |
    |                         | promote to primary
    |                         |
    |         sync       +----+-----+
    +--------------------+ inactive |
                         +----------+
    <end>

REST API impact
---------------

* Show replication relationship

  * Show information about a volume replication relationship.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'status': 'status of relationship',
                'links': { ... }
            }
        }

* Show replication relationship with details

  * Show detailed information about a volume replication relationship.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>/detail
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'secondary_id': 'secondary volume uuid',
                'status': 'status of relationship',
                'extended_status': 'extended status',
                'links': { ... }
            }
        }

* List replication relationships with details

  * List detailed information about volume replication relationships.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * TBD

  * /v2/<tenant id>/os-volume-replication/detail
  * Parameters:

    *status*
        Filter by replication relationship status
    *primary_id*
        Filter by primary volume id
    *secondary_id*
        Filter by secondary volume id

  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'secondary_id': 'secondary volume uuid',
                'status': 'status of relationship',
                'extended_status': 'extended status',
                'links': { ... }
            }
        }

* Promote volume to be the primary volume (client sketch below)

  * Switch the uuids of the primary and secondary volumes, and
    make the secondary volume the primary volume.
  * Method type: PUT
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the body data::

        {
            'relationship':
            {
                'promote': None
            }
        }
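
As a client-side illustration, the promote call could be issued roughly
as follows (a sketch; the endpoint, tenant id, relationship uuid and
token are placeholders)::

    import json

    import requests

    tenant_id = 'TENANT_ID'               # placeholder
    relationship_id = 'REPLICATION_UUID'  # placeholder
    url = ('http://cinder-api:8776/v2/%s/os-volume-replication/%s'
           % (tenant_id, relationship_id))

    resp = requests.put(
        url,
        data=json.dumps({'relationship': {'promote': None}}),
        headers={'X-Auth-Token': 'TOKEN',  # placeholder
                 'Content-Type': 'application/json'})
    assert resp.status_code == 202  # normal response code per this spec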

* Sync between the primary and secondary volume

  * Resync the replication between the primary and secondary volume.
    Typically follows a promote operation on the replication.
  * Method type: PUT
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the body data::

        {
            'relationship':
            {
                'sync': None
            }
        }

* Test replication by making a copy of the secondary volume available

  * Test the volume replication. Create a clone of the secondary volume
    and make it accessible, so the promote process can be tested.
  * Method type: POST
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>/test
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'volume_id': 'volume id of the cloned secondary'
            }
        }

Security impact
---------------

* Does this change touch sensitive data such as tokens, keys, or user
  data?

  *No*.

* Does this change alter the API in a way that may impact security, such
  as a new way to access sensitive information or a new way to login?

  *No*.

* Does this change involve cryptography or hashing?

  *No*.

* Does this change require the use of sudo or any elevated privileges?

  *No*.

* Does this change involve using or parsing user-provided data? This could
  be directly at the API level or indirectly such as changes to a cache
  layer.

  *No*.

* Can this change enable a resource exhaustion attack, such as allowing a
  single API interaction to consume significant server resources? Some
  examples of this include launching subprocesses for each connection, or
  entity expansion attacks in XML.

  *Yes*, enabling replication consumes cloud and storage resources.
Notifications impact
--------------------

Notifications will be added for enabling replication, promoting, syncing
and dropping replication.


Other end user impact
---------------------

* The end-user uses volume types to enable/disable replication.

* The cloud admin uses the *promote*, *sync* and *test* commands
  in the python-cinderclient to execute failover, failback and test.


Performance Impact
------------------

* The scheduler now needs to choose two hosts instead of one, based on
  additional input from the driver and volume type.

* The periodic task will query the driver and back-end for the status
  of all replicated volumes, running on both the primary and secondary.

* Extra db calls to determine whether replication exists are added to
  retype, snapshot and other operations, adding a small latency to these
  functions.

Other deployer impact
---------------------

* Options are added for volume types (see above).

* New driver capabilities are added; these need to be supported by the
  volume drivers, which may imply changes to the driver configuration
  options.

* This change will require explicit enablement (to be used by users)
  by the cloud administrator.


Developer impact
----------------

* Changes to the driver API are noted above. Basically, new functions are
  needed to support replication.

* The API will expand to include consistency groups once consistency
  group support is merged into Cinder.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ronenkat

Other contributors:
  None

Work Items
----------

* Cinder public (admin) APIs for replication
* DB schema for replication
* Cinder scheduler support for replication
* Cinder driver API additions for replication
* Cinder manager update for replication
* Testing

Note: Code is based on https://review.openstack.org/#/c/64026/ which was
submitted in the Icehouse development cycle.

Dependencies
============

* Related blueprint: consistency groups
  https://blueprints.launchpad.net/cinder/+spec/consistency-groups

* LVM to support replication using DRBD, in a separate contribution.


Testing
=======

* Testing in the gate is not supported, due to the following
  considerations:

  * LVM has no replication support; this is to be addressed using DRBD
    in a separate contribution.
  * Gate testing requires setting up at least two nodes using DRBD.

* This should be discussed/addressed as support for LVM is added.

* 3rd-party driver CI will be expected to test replication.

Documentation Impact
====================

* Public (admin) API changes.
* Details on how replication is used by leveraging volume types.
* Driver docs explaining how replication is set up for each driver.


References
==========

* Volume replication design session:
  https://etherpad.openstack.org/p/juno-cinder-volume-replication