:orphan:

.. _cinder_volume_replication_dr:

=============================================
Cinder volume replication - Disaster recovery
=============================================

Overview
--------

This is the disaster recovery scenario of a Cinder volume replication
deployment. It should be read in conjunction with the
:doc:`Cinder volume replication ` page.

Scenario description
--------------------

Disaster recovery involves an uncontrolled failover to the secondary site.
Site-b takes over from a troubled site-a and becomes the de facto primary
site, which includes writes to its images. Control is passed back to site-a
once it is repaired.

.. warning::

   The charms support the underlying OpenStack services in their native
   ability to failover and failback. However, a significant degree of
   administrative care is still needed in order to ensure a successful
   recovery. For example:

   * primary volume images that are currently in use may experience
     difficulty during their demotion to secondary status
   * running VMs will lose connectivity to their volumes
   * subsequent image resyncs may not be straightforward

   Any work necessary to rectify data issues resulting from an uncontrolled
   failover is beyond the scope of the OpenStack charms and this document.

Simulation
----------

For the sake of understanding some of the rudimentary aspects involved in
disaster recovery, a simulation is provided.

Preparation
~~~~~~~~~~~

Create the replicated data volume and confirm it is available:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data

   openstack volume list

Simulate a failure in site-a by turning off all of its Ceph MON daemons:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo systemctl stop ceph-mon.target
   juju ssh site-a-ceph-mon/1 sudo systemctl stop ceph-mon.target
   juju ssh site-a-ceph-mon/2 sudo systemctl stop ceph-mon.target

Modify timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a Ceph cluster fails, communication between Cinder and the failed
cluster will be interrupted and the RBD driver will accommodate with retries
and timeouts. To accelerate the failover mechanism, timeout and retry
settings on the cinder-ceph unit in site-a can be modified:

.. code-block:: none

   juju ssh cinder-ceph-a/0

   > sudo apt install -y crudini
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout 1
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries 1
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval 0
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout 1
   > sudo systemctl restart cinder-volume
   > exit

These configuration changes are only intended to be in effect during the
failover transition period. They should be reverted afterwards, since the
default values are fine for normal operations.

Failover
~~~~~~~~

Perform the failover of site-a, confirm its cinder-volume host is disabled,
and that the volume remains available:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a

   cinder service-list

   openstack volume list

Confirm that the Cinder log file (``/var/log/cinder/cinder-volume.log``) on
unit ``cinder/0`` contains the successful failover message: ``Failed over to
replication target successfully.``.
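One way to do this, for example, is to grep for the message directly on the
unit (a minimal sketch, assuming the ``cinder/0`` unit mentioned above):

.. code-block:: none

   juju ssh cinder/0 'grep "Failed over to replication target" /var/log/cinder/cinder-volume.log'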
Revert timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Revert the configuration changes made to the cinder-ceph backend:

.. code-block:: none

   juju ssh cinder-ceph-a/0

   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout
   > sudo systemctl restart cinder-volume
   > exit

Write to the volume
~~~~~~~~~~~~~~~~~~~

Create a VM (named 'vm-with-data-volume'):

.. code-block:: none

   openstack server create --image focal-amd64 --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-data-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-data-volume $FLOATING_IP

Attach the volume to the VM, write some data to it, and detach it:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP

   > sudo mkfs.ext4 /dev/vdc
   > mkdir data
   > sudo mount /dev/vdc data
   > sudo chown ubuntu: data
   > echo "This is a test." > data/test.txt
   > sync
   > sudo umount /dev/vdc
   > exit

   openstack server remove volume vm-with-data-volume vol-site-a-repl-data

Repair site-a
~~~~~~~~~~~~~

In the current example, site-a is repaired by starting the Ceph MON daemons:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo systemctl start ceph-mon.target
   juju ssh site-a-ceph-mon/1 sudo systemctl start ceph-mon.target
   juju ssh site-a-ceph-mon/2 sudo systemctl start ceph-mon.target

Confirm that the MON cluster is now healthy (it may take a while):

.. code-block:: none

   juju status site-a-ceph-mon

   Unit                Workload  Agent  Machine  Public address  Ports  Message
   site-a-ceph-mon/0   active    idle   14       10.5.0.15              Unit is ready and clustered
   site-a-ceph-mon/1*  active    idle   15       10.5.0.31              Unit is ready and clustered
   site-a-ceph-mon/2   active    idle   16       10.5.0.11              Unit is ready and clustered
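As an optional additional check, the Ceph cluster health can be queried
directly from one of the MON units; the cluster should eventually report
``HEALTH_OK``:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo ceph status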
Image resync
~~~~~~~~~~~~

Putting site-a back online at this point will lead to two primary images for
each replicated volume. This is a split-brain condition that cannot be
resolved by the RBD mirror daemon. Hence, before failback is invoked, each
replicated volume will need a resync of its images (the site-b images are
more recent than the site-a images).

The image resync is a two-step process that is initiated on the
ceph-rbd-mirror unit in site-a.

First, demote the site-a images with the ``demote`` action:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 demote pools=cinder-ceph-a

Second, flag the site-a images for a resync with the ``resync-pools`` action.
The ``pools`` argument should point to the corresponding site's pool, which
by default is the name of the cinder-ceph application for the site (here
'cinder-ceph-a'):

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 resync-pools i-really-mean-it=true pools=cinder-ceph-a

The Ceph RBD mirror daemon will perform the resync in the background.

Failback
~~~~~~~~

Prior to failback, confirm that the images of all replicated volumes in
site-a are fully synchronised. Perform a check with the ceph-rbd-mirror
charm's ``status`` action as per :ref:`RBD image status `:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

This will take a while. The state and description for the site-a images will
initially show:

.. code-block:: console

   state: up+syncing
   description: bootstrapping, IMAGE_SYNC/CREATE_SYNC_POINT

The intermediate values will look like:

.. code-block:: console

   state: up+replaying
   description: replaying, {"bytes_per_second":110318.93,"entries_behind_primary":4712.....

The final values, as expected, will become:

.. code-block:: console

   state: up+replaying
   description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0.....

The failback of site-a can now proceed:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default

Confirm the original health of Cinder services (as per :ref:`Cinder service
list `):

.. code-block:: none

   cinder service-list

Verification
~~~~~~~~~~~~

Re-attach the volume to the VM and verify that the secondary device contains
the expected data:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP

   > sudo mount /dev/vdc data
   > cat data/test.txt
   This is a test.

We can also check the status of the image as per :ref:`RBD image status `
to verify that the primary indeed resides in site-a again:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id:   3a4aa755-c9ee-4319-8ba4-fc494d20d783
     state:       up+stopped
     description: local image is primary
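Optionally, tidy up by unmounting the device inside the VM and detaching the
volume again (a sketch that reuses the names from the earlier steps):

.. code-block:: none

   > sudo umount /dev/vdc
   > exit

   openstack server remove volume vm-with-data-volume vol-site-a-repl-data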