Add Cinder volume replication

Add new page cinder-volume-replication.rst and the
accompanying cinder-volume-replication-overlay.rst

Separate the disaster scenario from the main body as
per team consensus

Related-Bug: #1925035

Change-Id: Id9e4c8fff27a678d78aa0b606ec9e8a00208a894
Peter Matulis 2021-04-19 15:04:48 -04:00
parent cbe50300e5
commit c680f1bf70
4 changed files with 990 additions and 0 deletions


@@ -0,0 +1,275 @@
:orphan:
.. _cinder_volume_replication_dr:
=============================================
Cinder volume replication - Disaster recovery
=============================================
Overview
--------
This is the disaster recovery scenario of a Cinder volume replication
deployment. It should be read in conjunction with the :doc:`Cinder volume
replication <cinder-volume-replication>` page.
Scenario description
--------------------
Disaster recovery involves an uncontrolled failover to the secondary site.
Site-b takes over from a troubled site-a and becomes the de facto primary site,
which includes accepting writes to its images. Control is passed back to site-a
once it is repaired.
.. warning::
The charms support the underlying OpenStack services in their native ability
to fail over and fail back. However, a significant degree of administrative
care is still needed in order to ensure a successful recovery.
For example:
* primary volume images that are currently in use may experience difficulty
during their demotion to secondary status
* running VMs will lose connectivity to their volumes
* subsequent image resyncs may not be straightforward
Any work necessary to rectify data issues resulting from an uncontrolled
failover is beyond the scope of the OpenStack charms and this document.
Simulation
----------
To illustrate some of the rudimentary aspects involved in disaster recovery, a
simulation is provided.
Preparation
~~~~~~~~~~~
Create the replicated data volume and confirm it is available:
.. code-block:: none
openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data
openstack volume list
Simulate a failure in site-a by turning off all of its Ceph MON daemons:
.. code-block:: none
juju ssh site-a-ceph-mon/0 sudo systemctl stop ceph-mon.target
juju ssh site-a-ceph-mon/1 sudo systemctl stop ceph-mon.target
juju ssh site-a-ceph-mon/2 sudo systemctl stop ceph-mon.target
Modify timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a Ceph cluster fails, communication between Cinder and the failed cluster
is interrupted, and the RBD driver will compensate with retries and timeouts.
To accelerate the failover mechanism, timeout and retry settings on the
cinder-ceph unit in site-a can be modified:
.. code-block:: none
juju ssh cinder-ceph-a/0
> sudo apt install -y crudini
> sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout 1
> sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries 1
> sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval 0
> sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout 1
> sudo systemctl restart cinder-volume
> exit
These configuration changes are only intended to be in effect during the
failover transition period. They should be reverted afterwards since the
default values are fine for normal operations.
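While the overrides are in place, they can optionally be read back with the
same ``crudini`` tool to confirm what is currently in effect (shown here for
one of the modified options):
.. code-block:: none
juju ssh cinder-ceph-a/0
> sudo crudini --get /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout
> exit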
Failover
~~~~~~~~
Perform the failover of site-a, then confirm that its cinder-volume host is
disabled and that the volume remains available:
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a
cinder service-list
openstack volume list
Confirm that the Cinder log file (``/var/log/cinder/cinder-volume.log``) on
unit ``cinder/0`` contains the successful failover message: ``Failed over to
replication target successfully.``.
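One way to do this is to search the log from the unit itself (a convenience
sketch; adjust the unit name if your deployment differs):
.. code-block:: none
juju ssh cinder/0
> sudo grep 'Failed over to replication target successfully' /var/log/cinder/cinder-volume.log
> exit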
Revert timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Revert the configuration changes made to the cinder-ceph backend:
.. code-block:: none
juju ssh cinder-ceph-a/0
> sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout
> sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries
> sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval
> sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout
> sudo systemctl restart cinder-volume
> exit
Write to the volume
~~~~~~~~~~~~~~~~~~~
Create a VM (named 'vm-with-data-volume'):
.. code-block:: none
openstack server create --image focal-amd64 --flavor m1.tiny \
--key-name mykey --network int_net vm-with-data-volume
FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
openstack server add floating ip vm-with-data-volume $FLOATING_IP
Attach the volume to the VM, write some data to it, and detach it:
.. code-block:: none
openstack server add volume vm-with-data-volume vol-site-a-repl-data
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
> sudo mkfs.ext4 /dev/vdc
> mkdir data
> sudo mount /dev/vdc data
> sudo chown ubuntu: data
> echo "This is a test." > data/test.txt
> sync
> sudo umount /dev/vdc
> exit
openstack server remove volume vm-with-data-volume vol-site-a-repl-data
Repair site-a
~~~~~~~~~~~~~
In the current example, site-a is repaired by starting the Ceph MON daemons:
.. code-block:: none
juju ssh site-a-ceph-mon/0 sudo systemctl start ceph-mon.target
juju ssh site-a-ceph-mon/1 sudo systemctl start ceph-mon.target
juju ssh site-a-ceph-mon/2 sudo systemctl start ceph-mon.target
Confirm that the MON cluster is now healthy (it may take a while):
.. code-block:: none
juju status site-a-ceph-mon
Unit Workload Agent Machine Public address Ports Message
site-a-ceph-mon/0 active idle 14 10.5.0.15 Unit is ready and clustered
site-a-ceph-mon/1* active idle 15 10.5.0.31 Unit is ready and clustered
site-a-ceph-mon/2 active idle 16 10.5.0.11 Unit is ready and clustered
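The health of the Ceph cluster itself can also be inspected directly. This is
an optional check; ``HEALTH_OK`` is expected once all MON daemons have
rejoined:
.. code-block:: none
juju ssh site-a-ceph-mon/0 sudo ceph status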
Image resync
~~~~~~~~~~~~
Putting site-a back online at this point will lead to two primary images for
each replicated volume. This is a split-brain condition that cannot be resolved
by the RBD mirror daemon. Hence, before failback is invoked, each replicated
volume will need a resync of its images (the site-b images are more recent than
the site-a images).
The image resync is a two-step process that is initiated on the ceph-rbd-mirror
unit in site-a:
First, demote the site-a images with the ``demote`` action:
.. code-block:: none
juju run-action --wait site-a-ceph-rbd-mirror/0 demote pools=cinder-ceph-a
Second, flag the site-a images for a resync with the ``resync-pools`` action.
The ``pools`` argument should point to the corresponding site's pool, which by
default is the name of the cinder-ceph application for the site (here
'cinder-ceph-a'):
.. code-block:: none
juju run-action --wait site-a-ceph-rbd-mirror/0 resync-pools i-really-mean-it=true pools=cinder-ceph-a
The Ceph RBD mirror daemon will perform the resync in the background.
Failback
~~~~~~~~
Prior to failback, confirm that the images of all replicated volumes in site-a
are fully synchronised. Perform a check with the ceph-rbd-mirror charm's
``status`` action as per :ref:`RBD image status <rbd_image_status>`:
.. code-block:: none
juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
This will take a while.
The state and description for the site-a images will first show:
.. code-block:: console
state: up+syncing
description: bootstrapping, IMAGE_SYNC/CREATE_SYNC_POINT
The intermediate values will look like:
.. code-block:: console
state: up+replaying
description: replaying, {"bytes_per_second":110318.93,"entries_behind_primary":4712.....
The final values, as expected, will become:
.. code-block:: console
state: up+replaying
description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0.....
The failback of site-a can now proceed:
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a --backend_id default
Confirm the original health of Cinder services (as per :ref:`Cinder service
list <cinder_service_list>`):
.. code-block:: none
cinder service-list
Verification
~~~~~~~~~~~~
Re-attach the volume to the VM and verify that the secondary device contains
the expected data:
.. code-block:: none
openstack server add volume vm-with-data-volume vol-site-a-repl-data
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
> sudo mount /dev/vdc data
> cat data/test.txt
This is a test.
We can also check the status of the image as per :ref:`RBD image status
<rbd_image_status>` to verify that the primary indeed resides in site-a again:
.. code-block:: none
juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
global_id: 3a4aa755-c9ee-4319-8ba4-fc494d20d783
state: up+stopped
description: local image is primary


@@ -0,0 +1,138 @@
:orphan:
.. _cinder_volume_replication_custom_overlay:
========================================
Cinder volume replication custom overlay
========================================
The below bundle overlay is used in the instructions given on the :doc:`Cinder
volume replication <cinder-volume-replication>` page.
.. code-block:: yaml
series: focal
# Change these variables according to the local environment, 'osd-devices'
# and 'data-port' in particular.
variables:
openstack-origin: &openstack-origin cloud:focal-victoria
osd-devices: &osd-devices /dev/sdb /dev/vdb
expected-osd-count: &expected-osd-count 3
expected-mon-count: &expected-mon-count 3
data-port: &data-port br-ex:ens7
relations:
- - cinder-ceph-a:storage-backend
- cinder:storage-backend
- - cinder-ceph-b:storage-backend
- cinder:storage-backend
- - site-a-ceph-osd:mon
- site-a-ceph-mon:osd
- - site-b-ceph-osd:mon
- site-b-ceph-mon:osd
- - site-a-ceph-mon:client
- nova-compute:ceph
- - site-b-ceph-mon:client
- nova-compute:ceph
- - site-a-ceph-mon:client
- cinder-ceph-a:ceph
- - site-b-ceph-mon:client
- cinder-ceph-b:ceph
- - nova-compute:ceph-access
- cinder-ceph-a:ceph-access
- - nova-compute:ceph-access
- cinder-ceph-b:ceph-access
- - site-a-ceph-mon:client
- glance:ceph
- - site-a-ceph-mon:rbd-mirror
- site-a-ceph-rbd-mirror:ceph-local
- - site-b-ceph-mon:rbd-mirror
- site-b-ceph-rbd-mirror:ceph-local
- - site-a-ceph-mon
- site-b-ceph-rbd-mirror:ceph-remote
- - site-b-ceph-mon
- site-a-ceph-rbd-mirror:ceph-remote
- - site-a-ceph-mon:client
- cinder-ceph-b:ceph-replication-device
- - site-b-ceph-mon:client
- cinder-ceph-a:ceph-replication-device
applications:
# Prevent some applications in the main bundle from being deployed.
ceph-radosgw:
ceph-osd:
ceph-mon:
cinder-ceph:
# Deploy ceph-osd applications with the appropriate names.
site-a-ceph-osd:
charm: cs:ceph-osd
num_units: 3
options:
osd-devices: *osd-devices
source: *openstack-origin
site-b-ceph-osd:
charm: cs:ceph-osd
num_units: 3
options:
osd-devices: *osd-devices
source: *openstack-origin
# Deploy ceph-mon applications with the appropriate names.
site-a-ceph-mon:
charm: cs:ceph-mon
num_units: 3
options:
expected-osd-count: *expected-osd-count
monitor-count: *expected-mon-count
source: *openstack-origin
site-b-ceph-mon:
charm: cs:ceph-mon
num_units: 3
options:
expected-osd-count: *expected-osd-count
monitor-count: *expected-mon-count
source: *openstack-origin
# Deploy cinder-ceph applications with the appropriate names.
cinder-ceph-a:
charm: cs:cinder-ceph
num_units: 0
options:
rbd-mirroring-mode: image
cinder-ceph-b:
charm: cs:cinder-ceph
num_units: 0
options:
rbd-mirroring-mode: image
# Deploy ceph-rbd-mirror applications with the appropriate names.
site-a-ceph-rbd-mirror:
charm: cs:ceph-rbd-mirror
num_units: 1
options:
source: *openstack-origin
site-b-ceph-rbd-mirror:
charm: cs:ceph-rbd-mirror
num_units: 1
options:
source: *openstack-origin
# Configure for the local environment.
ovn-chassis:
options:
bridge-interface-mappings: *data-port
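For reference, the overlay is applied together with the base bundle at deploy
time, using the same command given on the main page:
.. code-block:: none
juju deploy ./bundle.yaml --overlay ./cinder-volume-replication-overlay.yaml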


@@ -0,0 +1,576 @@
=========================
Cinder volume replication
=========================
Overview
--------
Cinder volume replication is a primary/secondary failover solution based on
two-way `Ceph RBD mirroring`_.
Deployment
----------
The cloud deployment in this document is based on the stable `openstack-base`_
bundle in the `openstack-bundles`_ repository. The necessary documentation is
found in the `bundle README`_.
A custom overlay bundle (`cinder-volume-replication-overlay`_) is used to
extend the base cloud in order to implement volume replication.
.. note::
The key elements for adding volume replication to Ceph RBD mirroring are the
relation between cinder-ceph in one site and ceph-mon in the other (using the
``ceph-replication-device`` endpoint) and the cinder-ceph charm
configuration option ``rbd-mirroring-mode=image``.
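These elements are declared in the `cinder-volume-replication-overlay`_ bundle
used below. As a rough sketch only, the equivalent could be expressed with Juju
directly, using the application and endpoint names from this document; the
remainder of this page assumes the overlay approach:
.. code-block:: none
juju config cinder-ceph-a rbd-mirroring-mode=image
juju config cinder-ceph-b rbd-mirroring-mode=image
juju add-relation cinder-ceph-a:ceph-replication-device site-b-ceph-mon:client
juju add-relation cinder-ceph-b:ceph-replication-device site-a-ceph-mon:client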
Cloud notes:
* The cloud used in these instructions is based on Ubuntu 20.04 LTS (Focal) and
OpenStack Victoria. The openstack-base bundle may have been updated since.
* The two Ceph clusters are named 'site-a' and 'site-b' and are placed in the
same Juju model.
* A site's pool is named after its corresponding cinder-ceph application (e.g.
'cinder-ceph-a' for site-a) and is mirrored to the other site. Each site will
therefore have two pools: 'cinder-ceph-a' and 'cinder-ceph-b'.
* Glance is only backed by site-a.
To deploy:
.. code-block:: none
juju deploy ./bundle.yaml --overlay ./cinder-volume-replication-overlay.yaml
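Once the deployment has settled, the pool layout described in the notes above
can be confirmed from a MON in each site. This is an optional check; both
clusters should list the 'cinder-ceph-a' and 'cinder-ceph-b' pools:
.. code-block:: none
juju ssh site-a-ceph-mon/0 sudo ceph osd lspools
juju ssh site-b-ceph-mon/0 sudo ceph osd lspools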
Configuration and verification of the base cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Configure the base cloud as per the referenced documentation.
Before proceeding, verify the base cloud by creating a VM and connecting to it
over SSH. See the main bundle's README for guidance.
.. important::
A known issue affecting the interaction of the ceph-rbd-mirror charm and
Ceph itself gives the impression of a fatal error. The symptom is messaging
that appears in :command:`juju status` command output: ``Pools WARNING (1)
OK (1) Images unknown (1)``. This remains a cosmetic issue however. See bug
`LP #1892201`_ for details.
Cinder volume types
-------------------
For each site, create replicated and non-replicated Cinder volume types. A
type is referenced at volume-creation time in order to specify whether the
volume is replicated (or not) and what pool it will reside in.
Type 'site-a-repl' denotes replication in site-a:
.. code-block:: none
openstack volume type create site-a-repl \
--property volume_backend_name=cinder-ceph-a \
--property replication_enabled='<is> True'
Type 'site-a-local' denotes non-replication in site-a:
.. code-block:: none
openstack volume type create site-a-local \
--property volume_backend_name=cinder-ceph-a
Type 'site-b-repl' denotes replication in site-b:
.. code-block:: none
openstack volume type create site-b-repl \
--property volume_backend_name=cinder-ceph-b \
--property replication_enabled='<is> True'
Type 'site-b-local' denotes non-replication in site-b:
.. code-block:: none
openstack volume type create site-b-local \
--property volume_backend_name=cinder-ceph-b
List the volume types:
.. code-block:: none
openstack volume type list
+--------------------------------------+--------------+-----------+
| ID | Name | Is Public |
+--------------------------------------+--------------+-----------+
| ee70dfd9-7b97-407d-a860-868e0209b93b | site-b-local | True |
| b0f6d6b5-9c76-4967-9eb4-d488a6690712 | site-b-repl | True |
| fc89ca9b-d75a-443e-9025-6710afdbfd5c | site-a-local | True |
| 780980dc-1357-4fbd-9714-e16a79df252a | site-a-repl | True |
| d57df78d-ff27-4cf0-9959-0ada21ce86ad | __DEFAULT__ | True |
+--------------------------------------+--------------+-----------+
.. note::
In this document, site-b volume types will not be used. They are created
here for the more generalised case where new volumes may be needed while
site-a is in a failover state. In such a circumstance, any volumes created
in site-b will naturally not be replicated (in site-a).
.. _rbd_image_status:
RBD image status
----------------
The status of the two RBD images associated with a replicated volume can be
queried using the ``status`` action of the ceph-rbd-mirror unit for each site.
A state of ``up+replaying`` in combination with the presence of
``"entries_behind_primary":0`` in the image description means the image in one
site is in sync with its counterpart in the other site.
A state of ``up+syncing`` indicates that the sync process is still underway.
A description of ``local image is primary`` means that the image is the
primary.
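For example, to scan all images at once for replication lag, the same
``status`` action output can be filtered on this counter (a convenience
variation of the commands used throughout this page):
.. code-block:: none
juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep entries_behind_primary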
Consider the volume below that is created and given the volume type of
'site-a-repl'. Its primary will be in site-a and its non-primary (secondary)
will be in site-b:
.. code-block:: none
openstack volume create --size 5 --type site-a-repl vol-site-a-repl
Their statuses can be queried in each site as shown below.
Site-a (primary):
.. code-block:: none
juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86
state: up+stopped
description: local image is primary
Site-b (the secondary, in sync with the primary):
.. code-block:: none
juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86
state: up+replaying
description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,.....
.. _cinder_service_list:
Cinder service list
-------------------
To verify the state of Cinder services the ``cinder service-list`` command is
used:
.. code-block:: none
cinder service-list
+------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State |
+------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
| cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T15:59:25.000000 | - | - | |
| cinder-volume | cinder@cinder-ceph-a | nova | enabled | up | 2021-04-08T15:59:24.000000 | - | - | up |
| cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T15:59:25.000000 | - | - | up |
+------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
Each of the examples below ends with a failback to site-a. The above output is
the desired end state.
Failing over a particular site entails referencing its corresponding
cinder-volume service host (e.g. ``cinder@cinder-ceph-a`` for site-a). We'll
see how to do this later on.
.. note::
'cinder-ceph-a' and 'cinder-ceph-b' correspond to the two applications
deployed via the `cinder-ceph`_ charm. The express purpose of this charm is
to connect Cinder to a Ceph cluster. See the
`cinder-volume-replication-overlay`_ bundle for details.
Failover, volumes, images, and pools
------------------------------------
This section will show the basics of failover/failback, non-replicated vs
replicated volumes, and what pools are used for the volume images.
In site-a, create one non-replicated and one replicated data volume and list
them:
.. code-block:: none
openstack volume create --size 5 --type site-a-local vol-site-a-local
openstack volume create --size 5 --type site-a-repl vol-site-a-repl
openstack volume list
+--------------------------------------+------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+------------------+-----------+------+-------------+
| fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | available | 5 | |
| c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl | available | 5 | |
+--------------------------------------+------------------+-----------+------+-------------+
Pools and images
~~~~~~~~~~~~~~~~
For 'vol-site-a-local' there should be one image in the 'cinder-ceph-a' pool of
site-a.
For 'vol-site-a-repl' there should be two images: one in the 'cinder-ceph-a'
pool of site-a and one in the 'cinder-ceph-a' pool of site-b.
This can all be confirmed by querying a Ceph MON in each site:
.. code-block:: none
juju ssh site-a-ceph-mon/0 sudo rbd ls -p cinder-ceph-a
volume-fba13395-62d1-468e-9b9a-40bebd0373e8
volume-c21a539e-d524-4f4d-991b-9b9476d4f930
juju ssh site-b-ceph-mon/0 sudo rbd ls -p cinder-ceph-a
volume-c21a539e-d524-4f4d-991b-9b9476d4f930
Failover
~~~~~~~~
Perform the failover of site-a:
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a
Wait until the failover is complete:
.. code-block:: none
cinder service-list
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State |
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
| cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T17:11:56.000000 | - | - | |
| cinder-volume | cinder@cinder-ceph-a | nova | disabled | up | 2021-04-08T17:11:56.000000 | - | failed-over | - |
| cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T17:11:56.000000 | - | - | up |
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
A failover triggers the promotion of one site and the demotion of the other
(site-b and site-a respectively in this example). Working communication between
Cinder and each Ceph cluster, as is the case in this example, is therefore
ideal.
Inspection
~~~~~~~~~~
By consulting the volume list we see that the replicated volume is still
available but that the non-replicated volume has errored:
.. code-block:: none
openstack volume list
+--------------------------------------+------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+------------------+-----------+------+-------------+
| fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | error | 5 | |
| c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl | available | 5 | |
+--------------------------------------+------------------+-----------+------+-------------+
Generally, a failover indicates a significant loss of confidence in the
primary site, site-a in this case. Once a **local** volume goes into an error
state due to a failover, it is not expected to recover after failback. Errored
local volumes should normally be discarded (deleted).
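For instance, once it is confirmed to be unneeded, the errored
'vol-site-a-local' volume from this walkthrough could simply be deleted (shown
for illustration only; depending on its state the ``--force`` flag may be
required):
.. code-block:: none
openstack volume delete vol-site-a-local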
Failback
~~~~~~~~
Failback site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a --backend_id default
cinder service-list
Examples
--------
The following two examples will be considered. They will both use replication
and involve the failing over of site-a to site-b:
#. `Data volume used by a VM`_
#. `Bootable volume used by a VM`_
Data volume used by a VM
~~~~~~~~~~~~~~~~~~~~~~~~
In this example, a replicated data volume will be created in site-a and
attached to a VM. The volume's block device will then have some test data
written to it. This will allow for verification of the replicated data once
failover has occurred and the volume is re-attached to the VM.
Preparation
^^^^^^^^^^^
Create the replicated data volume:
.. code-block:: none
openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data
openstack volume list
+--------------------------------------+---------------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+---------------------------+-----------+------+-------------+
| f23732c1-3257-4e58-a214-085c460abf56 | vol-site-a-repl-data | available | 5 | |
+--------------------------------------+---------------------------+-----------+------+-------------+
Create the VM (named 'vm-with-data-volume'):
.. code-block:: none
openstack server create --image focal-amd64 --flavor m1.tiny \
--key-name mykey --network int_net vm-with-data-volume
FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
openstack server add floating ip vm-with-data-volume $FLOATING_IP
openstack server list
+--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+
| fbe07fea-731e-4973-8455-c8466be72293 | vm-with-data-volume | ACTIVE | int_net=192.168.0.38, 10.5.1.28 | focal-amd64 | m1.tiny |
+--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+
Attach the data volume to the VM:
.. code-block:: none
openstack server add volume vm-with-data-volume vol-site-a-repl-data
Prepare the block device and write the test data to it:
.. code-block:: none
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
> sudo mkfs.ext4 /dev/vdc
> mkdir data
> sudo mount /dev/vdc data
> sudo chown ubuntu: data
> echo "This is a test." > data/test.txt
> sync
> exit
Failover
^^^^^^^^
When both sites are online, as is the case here, it is not recommended to
perform a failover while volumes are in use. This is because Cinder will try to
demote the Ceph image from the primary site, and if there is an active
connection to it the operation may fail (i.e. the volume will transition to an
error state).
Here we ensure the volume is not in use by unmounting the block device and
removing it from the VM:
.. code-block:: none
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP sudo umount /dev/vdc
openstack server remove volume vm-with-data-volume vol-site-a-repl-data
Prior to failover the images of all replicated volumes must be fully
synchronised. Perform a check with the ceph-rbd-mirror charm's ``status``
action as per `RBD image status`_. If the volumes were created in site-a then
the ceph-rbd-mirror unit in site-b is the target:
.. code-block:: none
juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
If all images look good, perform the failover of site-a:
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a
cinder service-list
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State |
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
| cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T19:30:29.000000 | - | - | |
| cinder-volume | cinder@cinder-ceph-a | nova | disabled | up | 2021-04-08T19:30:28.000000 | - | failed-over | - |
| cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T19:30:28.000000 | - | - | up |
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
Verification
^^^^^^^^^^^^
Re-attach the volume to the VM:
.. code-block:: none
openstack server add volume vm-with-data-volume vol-site-a-repl-data
Verify that the secondary device contains the expected data:
.. code-block:: none
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
> sudo mount /dev/vdc data
> cat data/test.txt
This is a test.
Failback
^^^^^^^^
Failback site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a --backend_id default
cinder service-list
Bootable volume used by a VM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this example, a bootable volume will be created in site-a and a
newly-created VM will use that volume as its root device. As in the previous
example, the volume's block device will have test data written to it for
verification purposes.
Preparation
^^^^^^^^^^^
Create the replicated bootable volume:
.. code-block:: none
openstack volume create --size 5 --type site-a-repl --image focal-amd64 --bootable vol-site-a-repl-boot
Wait for the volume to become available (it may take a while):
.. code-block:: none
openstack volume list
+--------------------------------------+----------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+----------------------+-----------+------+-------------+
| c44d4d20-6ede-422a-903d-588d1b0d51b0 | vol-site-a-repl-boot | available | 5 | |
+--------------------------------------+----------------------+-----------+------+-------------+
Create a VM (named 'vm-with-boot-volume') by specifying the newly-created
bootable volume:
.. code-block:: none
openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
--key-name mykey --network int_net vm-with-boot-volume
FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
openstack server add floating ip vm-with-boot-volume $FLOATING_IP
openstack server list
+--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
| c0a152d7-376b-4500-95d4-7c768a3ff280 | vm-with-boot-volume | ACTIVE | int_net=192.168.0.75, 10.5.1.53 | N/A (booted from volume) | m1.tiny |
+--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
Write the test data to the block device:
.. code-block:: none
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
> echo "This is a test." > test.txt
> sync
> exit
Failover
^^^^^^^^
As explained previously, when both sites are functional, the replicated volume
should not be in use prior to failover. Since the testing of the replicated
boot volume requires the VM to be rebuilt anyway (Cinder needs to give the
updated Ceph connection credentials to Nova), the easiest way forward is to
simply delete the VM:
.. code-block:: none
openstack server delete vm-with-boot-volume
Like before, prior to failover, confirm that the images of all replicated
volumes in site-b are fully synchronised. Perform a check with the
ceph-rbd-mirror charm's ``status`` action as per `RBD image status`_:
.. code-block:: none
juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
If all images look good, perform the failover of site-a:
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a
cinder service-list
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State |
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
| cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T21:29:12.000000 | - | - | |
| cinder-volume | cinder@cinder-ceph-a | nova | disabled | up | 2021-04-08T21:29:12.000000 | - | failed-over | - |
| cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T21:29:11.000000 | - | - | up |
+------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
Verification
^^^^^^^^^^^^
Re-create the VM:
.. code-block:: none
openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
--key-name mykey --network int_net vm-with-boot-volume
FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
openstack server add floating ip vm-with-boot-volume $FLOATING_IP
Verify that the root device contains the expected data:
.. code-block:: none
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
> cat test.txt
This is a test.
> exit
Failback
^^^^^^^^
Failback site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):
.. code-block:: none
cinder failover-host cinder@cinder-ceph-a --backend_id default
cinder service-list
Disaster recovery
-----------------
An uncontrolled failover is known as the disaster recovery scenario. It is
characterised by the sudden failure of the primary Ceph cluster. See the
:ref:`Cinder volume replication - Disaster recovery
<cinder_volume_replication_dr>` page for more information.
.. LINKS
.. _Ceph RBD mirroring: app-ceph-rbd-mirror.html
.. _openstack-base: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/bundle.yaml
.. _openstack-bundles: https://github.com/openstack-charmers/openstack-bundles/
.. _bundle README: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/README.md
.. _cinder-volume-replication-overlay: cinder-volume-replication-overlay.html
.. _cinder-ceph: https://jaas.ai/cinder-ceph
.. _LP #1892201: https://bugs.launchpad.net/charm-ceph-rbd-mirror/+bug/1892201


@@ -85,6 +85,7 @@ OpenStack Charms usage. To help improve it you can `file an issue`_ or
app-erasure-coding
app-rgw-multisite
app-ceph-rbd-mirror
cinder-volume-replication
app-manila-ganesha
app-swift