Add Cinder volume replication
Add new page cinder-volume-replication.rst and the accompanying
cinder-volume-replication-overlay.rst. Separate the disaster scenario from
the main body as per team consensus.

Related-Bug: #1925035
Change-Id: Id9e4c8fff27a678d78aa0b606ec9e8a00208a894

:orphan:

.. _cinder_volume_replication_dr:

=============================================
Cinder volume replication - Disaster recovery
=============================================

Overview
--------

This is the disaster recovery scenario of a Cinder volume replication
deployment. It should be read in conjunction with the :doc:`Cinder volume
replication <cinder-volume-replication>` page.

Scenario description
--------------------

Disaster recovery involves an uncontrolled failover to the secondary site.
Site-b takes over from a troubled site-a and becomes the de facto primary
site, which includes writes to its images. Control is passed back to site-a
once it is repaired.

.. warning::

   The charms support the underlying OpenStack services in their native
   ability to failover and failback. However, a significant degree of
   administrative care is still needed in order to ensure a successful
   recovery.

   For example:

   * primary volume images that are currently in use may experience
     difficulty during their demotion to secondary status

   * running VMs will lose connectivity to their volumes

   * subsequent image resyncs may not be straightforward

   Any work necessary to rectify data issues resulting from an uncontrolled
   failover is beyond the scope of the OpenStack charms and this document.

Simulation
----------

To illustrate some of the rudimentary aspects involved in disaster recovery,
a simulation is provided below.

Preparation
~~~~~~~~~~~

Create the replicated data volume and confirm it is available:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data
   openstack volume list

Simulate a failure in site-a by turning off all of its Ceph MON daemons:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo systemctl stop ceph-mon.target
   juju ssh site-a-ceph-mon/1 sudo systemctl stop ceph-mon.target
   juju ssh site-a-ceph-mon/2 sudo systemctl stop ceph-mon.target

Modify timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a Ceph cluster fails, communication between Cinder and the failed
cluster will be interrupted, and the RBD driver will accommodate with
retries and timeouts.

To accelerate the failover mechanism, timeout and retry settings on the
cinder-ceph unit in site-a can be modified:

.. code-block:: none

   juju ssh cinder-ceph-a/0
   > sudo apt install -y crudini
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout 1
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries 1
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval 0
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout 1
   > sudo systemctl restart cinder-volume
   > exit

These configuration changes are only intended to be in effect during the
failover transition period. They should be reverted afterwards since the
default values are fine for normal operations.
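For reference, after the above commands the backend section of
``/etc/cinder/cinder.conf`` will contain the following keys (a sketch showing
only the keys set here; the section's pre-existing keys are unaffected):

.. code-block:: ini

   [cinder-ceph-a]
   rados_connect_timeout = 1
   rados_connection_retries = 1
   rados_connection_interval = 0
   replication_connect_timeout = 1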

Failover
~~~~~~~~

Perform the failover of site-a, confirm its cinder-volume host is disabled,
and that the volume remains available:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a
   cinder service-list
   openstack volume list

Confirm that the Cinder log file (``/var/log/cinder/cinder-volume.log``) on
unit ``cinder/0`` contains the successful failover message: ``Failed over to
replication target successfully.``.

Revert timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Revert the configuration changes made to the cinder-ceph backend:

.. code-block:: none

   juju ssh cinder-ceph-a/0
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout
   > sudo systemctl restart cinder-volume
   > exit

Write to the volume
~~~~~~~~~~~~~~~~~~~

Create a VM (named 'vm-with-data-volume'):

.. code-block:: none

   openstack server create --image focal-amd64 --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-data-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-data-volume $FLOATING_IP

Attach the volume to the VM, write some data to it, and detach it:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mkfs.ext4 /dev/vdc
   > mkdir data
   > sudo mount /dev/vdc data
   > sudo chown ubuntu: data
   > echo "This is a test." > data/test.txt
   > sync
   > sudo umount /dev/vdc
   > exit

   openstack server remove volume vm-with-data-volume vol-site-a-repl-data

Repair site-a
~~~~~~~~~~~~~

In the current example, site-a is repaired by starting the Ceph MON daemons:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo systemctl start ceph-mon.target
   juju ssh site-a-ceph-mon/1 sudo systemctl start ceph-mon.target
   juju ssh site-a-ceph-mon/2 sudo systemctl start ceph-mon.target

Confirm that the MON cluster is now healthy (it may take a while):

.. code-block:: none

   juju status site-a-ceph-mon

   Unit                Workload  Agent  Machine  Public address  Ports  Message
   site-a-ceph-mon/0   active    idle   14       10.5.0.15              Unit is ready and clustered
   site-a-ceph-mon/1*  active    idle   15       10.5.0.31              Unit is ready and clustered
   site-a-ceph-mon/2   active    idle   16       10.5.0.11              Unit is ready and clustered

Image resync
~~~~~~~~~~~~

Putting site-a back online at this point will lead to two primary images for
each replicated volume. This is a split-brain condition that cannot be
resolved by the RBD mirror daemon. Hence, before failback is invoked each
replicated volume will need a resync of its images (site-b images are more
recent than the site-a images).

The image resync is a two-step process that is initiated on the
ceph-rbd-mirror unit in site-a.

Demote the site-a images with the ``demote`` action:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 demote pools=cinder-ceph-a

Flag the site-a images for a resync with the ``resync-pools`` action. The
``pools`` argument should point to the corresponding site's pool, which by
default is the name of the cinder-ceph application for the site (here
'cinder-ceph-a'):

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 resync-pools i-really-mean-it=true pools=cinder-ceph-a

The Ceph RBD mirror daemon will perform the resync in the background.

Failback
~~~~~~~~

Prior to failback, confirm that the images of all replicated volumes in
site-a are fully synchronised. Perform a check with the ceph-rbd-mirror
charm's ``status`` action as per :ref:`RBD image status <rbd_image_status>`:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

This will take a while.

The state and description for site-a images will transition to:

.. code-block:: console

   state: up+syncing
   description: bootstrapping, IMAGE_SYNC/CREATE_SYNC_POINT

The intermediate values will look like:

.. code-block:: console

   state: up+replaying
   description: replaying, {"bytes_per_second":110318.93,"entries_behind_primary":4712.....

The final values, as expected, will become:

.. code-block:: console

   state: up+replaying
   description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0.....

The failback of site-a can now proceed:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default

Confirm the original health of Cinder services (as per :ref:`Cinder service
list <cinder_service_list>`):

.. code-block:: none

   cinder service-list

Verification
~~~~~~~~~~~~

Re-attach the volume to the VM and verify that the secondary device contains
the expected data:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data
   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mount /dev/vdc data
   > cat data/test.txt
   This is a test.

We can also check the status of the image as per :ref:`RBD image status
<rbd_image_status>` to verify that the primary indeed resides in site-a
again:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id: 3a4aa755-c9ee-4319-8ba4-fc494d20d783
     state: up+stopped
     description: local image is primary

:orphan:

.. _cinder_volume_replication_custom_overlay:

========================================
Cinder volume replication custom overlay
========================================

The below bundle overlay is used in the instructions given on the
:doc:`Cinder volume replication <cinder-volume-replication>` page.

.. code-block:: yaml

   series: focal

   # Change these variables according to the local environment, 'osd-devices'
   # and 'data-port' in particular.
   variables:
     openstack-origin: &openstack-origin cloud:focal-victoria
     osd-devices: &osd-devices /dev/sdb /dev/vdb
     expected-osd-count: &expected-osd-count 3
     expected-mon-count: &expected-mon-count 3
     data-port: &data-port br-ex:ens7

   relations:
     - - cinder-ceph-a:storage-backend
       - cinder:storage-backend
     - - cinder-ceph-b:storage-backend
       - cinder:storage-backend

     - - site-a-ceph-osd:mon
       - site-a-ceph-mon:osd
     - - site-b-ceph-osd:mon
       - site-b-ceph-mon:osd

     - - site-a-ceph-mon:client
       - nova-compute:ceph
     - - site-b-ceph-mon:client
       - nova-compute:ceph

     - - site-a-ceph-mon:client
       - cinder-ceph-a:ceph
     - - site-b-ceph-mon:client
       - cinder-ceph-b:ceph

     - - nova-compute:ceph-access
       - cinder-ceph-a:ceph-access
     - - nova-compute:ceph-access
       - cinder-ceph-b:ceph-access

     - - site-a-ceph-mon:client
       - glance:ceph

     - - site-a-ceph-mon:rbd-mirror
       - site-a-ceph-rbd-mirror:ceph-local
     - - site-b-ceph-mon:rbd-mirror
       - site-b-ceph-rbd-mirror:ceph-local

     - - site-a-ceph-mon
       - site-b-ceph-rbd-mirror:ceph-remote
     - - site-b-ceph-mon
       - site-a-ceph-rbd-mirror:ceph-remote

     - - site-a-ceph-mon:client
       - cinder-ceph-b:ceph-replication-device
     - - site-b-ceph-mon:client
       - cinder-ceph-a:ceph-replication-device

   applications:

     # Prevent some applications in the main bundle from being deployed.
     ceph-radosgw:
     ceph-osd:
     ceph-mon:
     cinder-ceph:

     # Deploy ceph-osd applications with the appropriate names.
     site-a-ceph-osd:
       charm: cs:ceph-osd
       num_units: 3
       options:
         osd-devices: *osd-devices
         source: *openstack-origin

     site-b-ceph-osd:
       charm: cs:ceph-osd
       num_units: 3
       options:
         osd-devices: *osd-devices
         source: *openstack-origin

     # Deploy ceph-mon applications with the appropriate names.
     site-a-ceph-mon:
       charm: cs:ceph-mon
       num_units: 3
       options:
         expected-osd-count: *expected-osd-count
         monitor-count: *expected-mon-count
         source: *openstack-origin

     site-b-ceph-mon:
       charm: cs:ceph-mon
       num_units: 3
       options:
         expected-osd-count: *expected-osd-count
         monitor-count: *expected-mon-count
         source: *openstack-origin

     # Deploy cinder-ceph applications with the appropriate names.
     cinder-ceph-a:
       charm: cs:cinder-ceph
       num_units: 0
       options:
         rbd-mirroring-mode: image

     cinder-ceph-b:
       charm: cs:cinder-ceph
       num_units: 0
       options:
         rbd-mirroring-mode: image

     # Deploy ceph-rbd-mirror applications with the appropriate names.
     site-a-ceph-rbd-mirror:
       charm: cs:ceph-rbd-mirror
       num_units: 1
       options:
         source: *openstack-origin

     site-b-ceph-rbd-mirror:
       charm: cs:ceph-rbd-mirror
       num_units: 1
       options:
         source: *openstack-origin

     # Configure for the local environment.
     ovn-chassis:
       options:
         bridge-interface-mappings: *data-port

=========================
Cinder volume replication
=========================

Overview
--------

Cinder volume replication is a primary/secondary failover solution based on
two-way `Ceph RBD mirroring`_.

Deployment
----------

The cloud deployment in this document is based on the stable
`openstack-base`_ bundle in the `openstack-bundles`_ repository. The
necessary documentation is found in the `bundle README`_.

A custom overlay bundle (`cinder-volume-replication-overlay`_) is used to
extend the base cloud in order to implement volume replication.

.. note::

   The key elements for adding volume replication to Ceph RBD mirroring are
   the relation between cinder-ceph in one site and ceph-mon in the other
   (using the ``ceph-replication-device`` endpoint) and the cinder-ceph
   charm configuration option ``rbd-mirroring-mode=image``.
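These elements appear in the `cinder-volume-replication-overlay`_ bundle
roughly as follows (an abridged excerpt; see the overlay itself for the full
context):

.. code-block:: yaml

   relations:
     - - site-a-ceph-mon:client
       - cinder-ceph-b:ceph-replication-device
     - - site-b-ceph-mon:client
       - cinder-ceph-a:ceph-replication-device

   applications:
     cinder-ceph-a:
       options:
         rbd-mirroring-mode: image
     cinder-ceph-b:
       options:
         rbd-mirroring-mode: image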

Cloud notes:

* The cloud used in these instructions is based on Ubuntu 20.04 LTS (Focal)
  and OpenStack Victoria. The openstack-base bundle may have been updated
  since.

* The two Ceph clusters are named 'site-a' and 'site-b' and are placed in
  the same Juju model.

* A site's pool is named after its corresponding cinder-ceph application
  (e.g. 'cinder-ceph-a' for site-a) and is mirrored to the other site. Each
  site will therefore have two pools: 'cinder-ceph-a' and 'cinder-ceph-b'.

* Glance is only backed by site-a.

To deploy:

.. code-block:: none

   juju deploy ./bundle.yaml --overlay ./cinder-volume-replication-overlay.yaml

Configuration and verification of the base cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Configure the base cloud as per the referenced documentation.

Before proceeding, verify the base cloud by creating a VM and connecting to
it over SSH. See the main bundle's README for guidance.

.. important::

   A known issue affecting the interaction of the ceph-rbd-mirror charm and
   Ceph itself gives the impression of a fatal error. The symptom is
   messaging that appears in :command:`juju status` command output: ``Pools
   WARNING (1) OK (1) Images unknown (1)``. This remains a cosmetic issue,
   however. See bug `LP #1892201`_ for details.

Cinder volume types
-------------------

For each site, create replicated and non-replicated Cinder volume types. A
type is referenced at volume-creation time in order to specify whether the
volume is replicated (or not) and what pool it will reside in.

Type 'site-a-repl' denotes replication in site-a:

.. code-block:: none

   openstack volume type create site-a-repl \
      --property volume_backend_name=cinder-ceph-a \
      --property replication_enabled='<is> True'

Type 'site-a-local' denotes non-replication in site-a:

.. code-block:: none

   openstack volume type create site-a-local \
      --property volume_backend_name=cinder-ceph-a

Type 'site-b-repl' denotes replication in site-b:

.. code-block:: none

   openstack volume type create site-b-repl \
      --property volume_backend_name=cinder-ceph-b \
      --property replication_enabled='<is> True'

Type 'site-b-local' denotes non-replication in site-b:

.. code-block:: none

   openstack volume type create site-b-local \
      --property volume_backend_name=cinder-ceph-b

List the volume types:

.. code-block:: none

   openstack volume type list
   +--------------------------------------+--------------+-----------+
   | ID                                   | Name         | Is Public |
   +--------------------------------------+--------------+-----------+
   | ee70dfd9-7b97-407d-a860-868e0209b93b | site-b-local | True      |
   | b0f6d6b5-9c76-4967-9eb4-d488a6690712 | site-b-repl  | True      |
   | fc89ca9b-d75a-443e-9025-6710afdbfd5c | site-a-local | True      |
   | 780980dc-1357-4fbd-9714-e16a79df252a | site-a-repl  | True      |
   | d57df78d-ff27-4cf0-9959-0ada21ce86ad | __DEFAULT__  | True      |
   +--------------------------------------+--------------+-----------+

.. note::

   In this document, site-b volume types will not be used. They are created
   here for the more generalised case where new volumes may be needed while
   site-a is in a failover state. In such a circumstance, any volumes
   created in site-b will naturally not be replicated (in site-a).

.. _rbd_image_status:

RBD image status
----------------

The status of the two RBD images associated with a replicated volume can be
queried using the ``status`` action of the ceph-rbd-mirror unit for each
site.

A state of ``up+replaying`` in combination with the presence of
``"entries_behind_primary":0`` in the image description means the image in
one site is in sync with its counterpart in the other site.

A state of ``up+syncing`` indicates that the sync process is still underway.

A description of ``local image is primary`` means that the image is the
primary.
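These rules can be captured in a small helper. Below is a minimal sketch
(not part of any charm; the function name is our own) that interprets a
state/description pair, assuming the description carries a complete JSON
payload rather than the truncated form shown in the excerpts:

.. code-block:: python

   import json

   def image_in_sync(state, description):
       """Return True when a non-primary image is fully caught up."""
       if state != "up+replaying":
           # "up+syncing" (sync underway) and "up+stopped" (primary)
           # do not indicate a caught-up secondary.
           return False
       # The description is "replaying, " followed by a JSON object.
       _, _, payload = description.partition(", ")
       try:
           stats = json.loads(payload)
       except ValueError:
           return False
       return stats.get("entries_behind_primary") == 0

   print(image_in_sync(
       "up+replaying",
       'replaying, {"bytes_per_second":0.0,"entries_behind_primary":0}'))
   # → True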

Consider the volume below that is created and given the volume type of
'site-a-repl'. Its primary will be in site-a and its non-primary (secondary)
will be in site-b:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl

Their statuses can be queried in each site as shown below.

Site-a (primary):

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86
     state: up+stopped
     description: local image is primary

Site-b (secondary is in sync with the primary):

.. code-block:: none

   juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86
     state: up+replaying
     description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,.....

.. _cinder_service_list:

Cinder service list
-------------------

To verify the state of Cinder services, the ``cinder service-list`` command
is used:

.. code-block:: none

   cinder service-list
   +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status  | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled | up    | 2021-04-08T15:59:25.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | enabled | up    | 2021-04-08T15:59:24.000000 | -       | -               | up            |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled | up    | 2021-04-08T15:59:25.000000 | -       | -               | up            |
   +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+

Each of the below examples ends with a failback to site-a. The above output
is the desired result.

The failover of a particular site entails the referencing of its
corresponding cinder-volume service host (e.g. ``cinder@cinder-ceph-a`` for
site-a). We'll see how to do this later on.

.. note::

   'cinder-ceph-a' and 'cinder-ceph-b' correspond to the two applications
   deployed via the `cinder-ceph`_ charm. The express purpose of this charm
   is to connect Cinder to a Ceph cluster. See the
   `cinder-volume-replication-overlay`_ bundle for details.

Failover, volumes, images, and pools
------------------------------------

This section will show the basics of failover/failback, non-replicated vs
replicated volumes, and what pools are used for the volume images.

In site-a, create one non-replicated and one replicated data volume and list
them:

.. code-block:: none

   openstack volume create --size 5 --type site-a-local vol-site-a-local
   openstack volume create --size 5 --type site-a-repl vol-site-a-repl

   openstack volume list
   +--------------------------------------+------------------+-----------+------+-------------+
   | ID                                   | Name             | Status    | Size | Attached to |
   +--------------------------------------+------------------+-----------+------+-------------+
   | fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | available | 5    |             |
   | c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl  | available | 5    |             |
   +--------------------------------------+------------------+-----------+------+-------------+

Pools and images
~~~~~~~~~~~~~~~~

For 'vol-site-a-local' there should be one image in the 'cinder-ceph-a' pool
of site-a.

For 'vol-site-a-repl' there should be two images: one in the 'cinder-ceph-a'
pool of site-a and one in the 'cinder-ceph-a' pool of site-b.

This can all be confirmed by querying a Ceph MON in each site:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo rbd ls -p cinder-ceph-a

   volume-fba13395-62d1-468e-9b9a-40bebd0373e8
   volume-c21a539e-d524-4f4d-991b-9b9476d4f930

   juju ssh site-b-ceph-mon/0 sudo rbd ls -p cinder-ceph-a

   volume-c21a539e-d524-4f4d-991b-9b9476d4f930

Failover
~~~~~~~~

Perform the failover of site-a:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a

Wait until the failover is complete:

.. code-block:: none

   cinder service-list
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T17:11:56.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T17:11:56.000000 | -       | failed-over     | -             |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T17:11:56.000000 | -       | -               | up            |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+

A failover triggers the promotion of one site and the demotion of the other
(site-b and site-a respectively in this example). Ideally, therefore, Cinder
is able to communicate with both Ceph clusters during the failover, as it
can in this example.

Inspection
~~~~~~~~~~

By consulting the volume list we see that the replicated volume is still
available but that the non-replicated volume has errored:

.. code-block:: none

   openstack volume list
   +--------------------------------------+------------------+-----------+------+-------------+
   | ID                                   | Name             | Status    | Size | Attached to |
   +--------------------------------------+------------------+-----------+------+-------------+
   | fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | error     | 5    |             |
   | c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl  | available | 5    |             |
   +--------------------------------------+------------------+-----------+------+-------------+

Generally, a failover indicates a significant degree of non-confidence in
the primary site, site-a in this case. Once a **local** volume goes into an
error state due to a failover, it is expected not to recover after failback.
The errored local volumes should normally be discarded (deleted).

Failback
~~~~~~~~

Fail back site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default
   cinder service-list

Examples
--------

The following two examples will be considered. They will both use
replication and involve the failing over of site-a to site-b:

#. `Data volume used by a VM`_
#. `Bootable volume used by a VM`_

Data volume used by a VM
~~~~~~~~~~~~~~~~~~~~~~~~

In this example, a replicated data volume will be created in site-a and
attached to a VM. The volume's block device will then have some test data
written to it. This will allow for verification of the replicated data once
failover has occurred and the volume is re-attached to the VM.

Preparation
^^^^^^^^^^^

Create the replicated data volume:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data
   openstack volume list
   +--------------------------------------+---------------------------+-----------+------+-------------+
   | ID                                   | Name                      | Status    | Size | Attached to |
   +--------------------------------------+---------------------------+-----------+------+-------------+
   | f23732c1-3257-4e58-a214-085c460abf56 | vol-site-a-repl-data      | available | 5    |             |
   +--------------------------------------+---------------------------+-----------+------+-------------+
|
||||||
|
|
||||||
|
Create the VM (named 'vm-with-data-volume'):

.. code-block:: none

   openstack server create --image focal-amd64 --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-data-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-data-volume $FLOATING_IP

   openstack server list
   +--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+
   | ID                                   | Name                 | Status | Networks                        | Image                    | Flavor  |
   +--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+
   | fbe07fea-731e-4973-8455-c8466be72293 | vm-with-data-volume  | ACTIVE | int_net=192.168.0.38, 10.5.1.28 | focal-amd64              | m1.tiny |
   +--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+

Attach the data volume to the VM:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

Prepare the block device and write the test data to it:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mkfs.ext4 /dev/vdc
   > mkdir data
   > sudo mount /dev/vdc data
   > sudo chown ubuntu: data
   > echo "This is a test." > data/test.txt
   > sync
   > exit

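For repeat runs, the interactive session above can be collapsed into one
non-interactive SSH call. This is a sketch only; the key path, user, and
device name are carried over from the example:

```shell
# Sketch: run the preparation steps above in a single SSH invocation.
# Key path, user, and device name follow the example; adjust as needed.
prepare_data_volume() {
    local ip="$1" dev="${2:-/dev/vdc}"
    ssh -i ~/cloud-keys/mykey "ubuntu@${ip}" bash -s <<EOF
set -e
sudo mkfs.ext4 ${dev}
mkdir -p data
sudo mount ${dev} data
sudo chown ubuntu: data
echo "This is a test." > data/test.txt
sync
EOF
}

# Usage:
#   prepare_data_volume "$FLOATING_IP"
```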
Failover
^^^^^^^^

When both sites are online, as is the case here, it is not recommended to
perform a failover while volumes are in use. This is because Cinder will try
to demote the Ceph image from the primary site, and if there is an active
connection to it the operation may fail (i.e. the volume will transition to an
error state).

Here we ensure the volume is not in use by unmounting the block device and
removing it from the VM:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP sudo umount /dev/vdc
   openstack server remove volume vm-with-data-volume vol-site-a-repl-data

Prior to failover, the images of all replicated volumes must be fully
synchronised. Perform a check with the ceph-rbd-mirror charm's ``status``
action as per `RBD image status`_. If the volumes were created in site-a then
the ceph-rbd-mirror unit in site-b is the target:

.. code-block:: none

   juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

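Eyeballing each image can also be scripted. The helper below is an
assumption-laden sketch that greps the action output for the healthy
``up+replaying`` mirror state; the exact output format of the ``status``
action may differ between charm releases, so adjust the patterns accordingly:

```shell
# Sketch: fail if any image state line in the status action output is not
# the healthy 'up+replaying' mirror state. The output format is an
# assumption; verify against the charm release in use.
images_synchronised() {
    local out
    out=$(juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true)
    # Every 'state:' line should read up+replaying; any other state fails.
    ! printf '%s\n' "$out" | grep 'state:' | grep -v 'up+replaying' | grep -q .
}
```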
If all images look good, perform the failover of site-a:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a
   cinder service-list
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T19:30:29.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T19:30:28.000000 | -       | failed-over     | -             |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T19:30:28.000000 | -       | -               | up            |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+

Verification
^^^^^^^^^^^^

Re-attach the volume to the VM:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

Verify that the secondary device contains the expected data (the ``data``
mount point created earlier still exists in the VM):

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mount /dev/vdc data
   > cat data/test.txt
   This is a test.

Failback
^^^^^^^^

Failback site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default
   cinder service-list

Bootable volume used by a VM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, a bootable volume will be created in site-a and a
newly-created VM will use that volume as its root device. As in the previous
example, the volume's block device will have test data written to it for
verification purposes.

Preparation
^^^^^^^^^^^

Create the replicated bootable volume:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl --image focal-amd64 --bootable vol-site-a-repl-boot

Wait for the volume to become available (it may take a while):

.. code-block:: none

   openstack volume list
   +--------------------------------------+----------------------+-----------+------+-------------+
   | ID                                   | Name                 | Status    | Size | Attached to |
   +--------------------------------------+----------------------+-----------+------+-------------+
   | c44d4d20-6ede-422a-903d-588d1b0d51b0 | vol-site-a-repl-boot | available | 5    |             |
   +--------------------------------------+----------------------+-----------+------+-------------+

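Rather than re-running ``openstack volume list`` by hand, the wait can be
scripted; a minimal polling sketch (the helper name and timeout are ours, not
an OpenStack client feature):

```shell
# Sketch: poll until a volume reaches the wanted status, or give up.
# Helper name, interval, and retry count are illustrative only.
wait_for_volume_status() {
    local volume="$1" want="${2:-available}" tries="${3:-60}"
    local status i
    for i in $(seq "$tries"); do
        status=$(openstack volume show "$volume" -f value -c status)
        [ "$status" = "$want" ] && return 0
        # Bail out early if the volume has gone into an error state.
        [ "$status" = "error" ] && return 1
        sleep 5
    done
    return 1
}

# Usage:
#   wait_for_volume_status vol-site-a-repl-boot available
```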
Create a VM (named 'vm-with-boot-volume') by specifying the newly-created
bootable volume:

.. code-block:: none

   openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-boot-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-boot-volume $FLOATING_IP

   openstack server list
   +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
   | ID                                   | Name                | Status | Networks                        | Image                    | Flavor  |
   +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
   | c0a152d7-376b-4500-95d4-7c768a3ff280 | vm-with-boot-volume | ACTIVE | int_net=192.168.0.75, 10.5.1.53 | N/A (booted from volume) | m1.tiny |
   +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+

Write the test data to the block device:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > echo "This is a test." > test.txt
   > sync
   > exit

Failover
^^^^^^^^

As explained previously, when both sites are functional the replicated volume
should not be in use at the time of failover. Since testing the replicated
boot volume requires the VM to be rebuilt anyway (Cinder needs to pass the
updated Ceph connection credentials to Nova), the easiest way forward is to
simply delete the VM:

.. code-block:: none

   openstack server delete vm-with-boot-volume

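Server deletion is asynchronous, so it is worth confirming the VM is actually
gone before failing over. A small sketch (the helper name and timeout are
ours):

```shell
# Sketch: wait until a server no longer appears in `openstack server list`.
# Helper name, interval, and retry count are illustrative only.
server_gone() {
    local name="$1" tries="${2:-30}" i
    for i in $(seq "$tries"); do
        openstack server list -f value -c Name | grep -qx "$name" || return 0
        sleep 5
    done
    return 1
}

# Usage:
#   server_gone vm-with-boot-volume
```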
Like before, prior to failover, confirm that the images of all replicated
volumes in site-b are fully synchronised. Perform a check with the
ceph-rbd-mirror charm's ``status`` action as per `RBD image status`_:

.. code-block:: none

   juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

If all images look good, perform the failover of site-a:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a
   cinder service-list
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T21:29:12.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T21:29:12.000000 | -       | failed-over     | -             |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T21:29:11.000000 | -       | -               | up            |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+

Verification
^^^^^^^^^^^^

Re-create the VM:

.. code-block:: none

   openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-boot-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-boot-volume $FLOATING_IP

Verify that the root device contains the expected data:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > cat test.txt
   This is a test.
   > exit

Failback
^^^^^^^^

Failback site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default
   cinder service-list

Disaster recovery
-----------------

An uncontrolled failover is known as the disaster recovery scenario. It is
characterised by the sudden failure of the primary Ceph cluster. See the
:ref:`Cinder volume replication - Disaster recovery
<cinder_volume_replication_dr>` page for more information.

.. LINKS
.. _Ceph RBD mirroring: app-ceph-rbd-mirror.html
.. _openstack-base: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/bundle.yaml
.. _openstack-bundles: https://github.com/openstack-charmers/openstack-bundles/
.. _bundle README: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/README.md
.. _cinder-volume-replication-overlay: cinder-volume-replication-overlay.html
.. _cinder-ceph: https://jaas.ai/cinder-ceph
.. _LP #1892201: https://bugs.launchpad.net/charm-ceph-rbd-mirror/+bug/1892201