From c680f1bf7096f3cc510cc345e77e960a2555e717 Mon Sep 17 00:00:00 2001
From: Peter Matulis
Date: Mon, 19 Apr 2021 15:04:48 -0400
Subject: [PATCH] Add Cinder volume replication

Add new page cinder-volume-replication.rst and the accompanying
cinder-volume-replication-overlay.rst

Separate the disaster scenario from the main body as per team consensus

Related-Bug: #1925035
Change-Id: Id9e4c8fff27a678d78aa0b606ec9e8a00208a894
---
 .../source/cinder-volume-replication-dr.rst   | 275 +++++++++
 .../cinder-volume-replication-overlay.rst     | 138 +++++
 .../source/cinder-volume-replication.rst      | 576 ++++++++++++++++++
 deploy-guide/source/index.rst                 |   1 +
 4 files changed, 990 insertions(+)
 create mode 100644 deploy-guide/source/cinder-volume-replication-dr.rst
 create mode 100644 deploy-guide/source/cinder-volume-replication-overlay.rst
 create mode 100644 deploy-guide/source/cinder-volume-replication.rst

diff --git a/deploy-guide/source/cinder-volume-replication-dr.rst b/deploy-guide/source/cinder-volume-replication-dr.rst
new file mode 100644
index 0000000..1bf9729
--- /dev/null
+++ b/deploy-guide/source/cinder-volume-replication-dr.rst
@@ -0,0 +1,275 @@
+:orphan:
+
+.. _cinder_volume_replication_dr:
+
+=============================================
+Cinder volume replication - Disaster recovery
+=============================================
+
+Overview
+--------
+
+This is the disaster recovery scenario of a Cinder volume replication
+deployment. It should be read in conjunction with the :doc:`Cinder volume
+replication <cinder-volume-replication>` page.
+
+Scenario description
+--------------------
+
+Disaster recovery involves an uncontrolled failover to the secondary site.
+Site-b takes over from a troubled site-a and becomes the de facto primary site,
+which includes accepting writes to its images. Control is passed back to site-a
+once it is repaired.
+
+.. warning::
+
+   The charms support the underlying OpenStack services' native ability to
+   fail over and fail back. However, a significant degree of administrative
+   care is still needed in order to ensure a successful recovery.
+
+   For example,
+
+   * primary volume images that are currently in use may experience difficulty
+     during their demotion to secondary status
+
+   * running VMs will lose connectivity to their volumes
+
+   * subsequent image resyncs may not be straightforward
+
+   Any work necessary to rectify data issues resulting from an uncontrolled
+   failover is beyond the scope of the OpenStack charms and this document.
+
+Simulation
+----------
+
+To aid in understanding some of the rudimentary aspects involved in disaster
+recovery, a simulation is provided.
+
+Preparation
+~~~~~~~~~~~
+
+Create the replicated data volume and confirm it is available:
+
+.. code-block:: none
+
+   openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data
+   openstack volume list
+
+Simulate a failure in site-a by turning off all of its Ceph MON daemons:
+
+.. code-block:: none
+
+   juju ssh site-a-ceph-mon/0 sudo systemctl stop ceph-mon.target
+   juju ssh site-a-ceph-mon/1 sudo systemctl stop ceph-mon.target
+   juju ssh site-a-ceph-mon/2 sudo systemctl stop ceph-mon.target
+
+Modify timeout and retry settings
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a Ceph cluster fails, communication between Cinder and the failed cluster
+is interrupted, and the RBD driver compensates with retries and timeouts.
+
+To accelerate the failover mechanism, timeout and retry settings on the
+cinder-ceph unit in site-a can be modified:
+
+.. 
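code-block:: bash

+   # Optional sketch: before modifying anything, keep a copy of the current
+   # [cinder-ceph-a] backend section of cinder.conf for comparison when the
+   # settings are reverted later. The unit name matches this example
+   # deployment; the backup path is illustrative.
+   juju ssh cinder-ceph-a/0 "sudo cp /etc/cinder/cinder.conf /tmp/cinder.conf.pre-failover"
+   juju ssh cinder-ceph-a/0 "sudo grep -A 20 '^\[cinder-ceph-a\]' /etc/cinder/cinder.conf"
+
+The settings themselves are changed as follows:
+
+.. 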
code-block:: none + + juju ssh cinder-ceph-a/0 + > sudo apt install -y crudini + > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout 1 + > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries 1 + > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval 0 + > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout 1 + > sudo systemctl restart cinder-volume + > exit + +These configuration changes are only intended to be in effect during the +failover transition period. They should be reverted afterwards since the +default values are fine for normal operations. + +Failover +~~~~~~~~ + +Perform the failover of site-a, confirm its cinder-volume host is disabled, and +that the volume remains available: + +.. code-block:: none + + cinder failover-host cinder@cinder-ceph-a + cinder service-list + openstack volume list + +Confirm that the Cinder log file (``/var/log/cinder/cinder-volume.log``) on +unit ``cinder/0`` contains the successful failover message: ``Failed over to +replication target successfully.``. + +Revert timeout and retry settings +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Revert the configuration changes made to the cinder-ceph backend: + +.. code-block:: none + + juju ssh cinder-ceph-a/0 + > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout + > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries + > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval + > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout + > sudo systemctl restart cinder-volume + > exit + +Write to the volume +~~~~~~~~~~~~~~~~~~~ + +Create a VM (named 'vm-with-data-volume'): + +.. code-block:: none + + openstack server create --image focal-amd64 --flavor m1.tiny \ + --key-name mykey --network int_net vm-with-data-volume + + FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net) + openstack server add floating ip vm-with-data-volume $FLOATING_IP + +Attach the volume to the VM, write some data to it, and detach it: + +.. code-block:: none + + openstack server add volume vm-with-data-volume vol-site-a-repl-data + + ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP + > sudo mkfs.ext4 /dev/vdc + > mkdir data + > sudo mount /dev/vdc data + > sudo chown ubuntu: data + > echo "This is a test." > data/test.txt + > sync + > sudo umount /dev/vdc + > exit + + openstack server remove volume vm-with-data-volume vol-site-a-repl-data + +Repair site-a +~~~~~~~~~~~~~ + +In the current example, site-a is repaired by starting the Ceph MON daemons: + +.. code-block:: none + + juju ssh site-a-ceph-mon/0 sudo systemctl start ceph-mon.target + juju ssh site-a-ceph-mon/1 sudo systemctl start ceph-mon.target + juju ssh site-a-ceph-mon/2 sudo systemctl start ceph-mon.target + +Confirm that the MON cluster is now healthy (it may take a while): + +.. code-block:: none + + juju status site-a-ceph-mon + + Unit Workload Agent Machine Public address Ports Message + site-a-ceph-mon/0 active idle 14 10.5.0.15 Unit is ready and clustered + site-a-ceph-mon/1* active idle 15 10.5.0.31 Unit is ready and clustered + site-a-ceph-mon/2 active idle 16 10.5.0.11 Unit is ready and clustered + +Image resync +~~~~~~~~~~~~ + +Putting site-a back online at this point will lead to two primary images for +each replicated volume. This is a split-brain condition that cannot be resolved +by the RBD mirror daemon. 
Hence, before failback is invoked, each replicated
+volume will need a resync of its images (the site-b images are more recent
+than the site-a images).
+
+The image resync is a two-step process that is initiated on the ceph-rbd-mirror
+unit in site-a:
+
+Demote the site-a images with the ``demote`` action:
+
+.. code-block:: none
+
+   juju run-action --wait site-a-ceph-rbd-mirror/0 demote pools=cinder-ceph-a
+
+Flag the site-a images for a resync with the ``resync-pools`` action. The
+``pools`` argument should point to the corresponding site's pool, which by
+default is the name of the cinder-ceph application for the site (here
+'cinder-ceph-a'):
+
+.. code-block:: none
+
+   juju run-action --wait site-a-ceph-rbd-mirror/0 resync-pools i-really-mean-it=true pools=cinder-ceph-a
+
+The Ceph RBD mirror daemon will perform the resync in the background.
+
+Failback
+~~~~~~~~
+
+Prior to failback, confirm that the images of all replicated volumes in site-a
+are fully synchronised. Perform a check with the ceph-rbd-mirror charm's
+``status`` action as per :ref:`RBD image status <rbd_image_status>`:
+
+.. code-block:: none
+
+   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
+
+This will take a while.
+
+The state and description for the site-a images will initially transition to:
+
+.. code-block:: console
+
+   state: up+syncing
+   description: bootstrapping, IMAGE_SYNC/CREATE_SYNC_POINT
+
+The intermediate values will look like:
+
+.. code-block:: console
+
+   state: up+replaying
+   description: replaying, {"bytes_per_second":110318.93,"entries_behind_primary":4712.....
+
+The final values, as expected, will become:
+
+.. code-block:: console
+
+   state: up+replaying
+   description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0.....
+
+The failback of site-a can now proceed:
+
+.. code-block:: none
+
+   cinder failover-host cinder@cinder-ceph-a --backend_id default
+
+Confirm the original health of Cinder services (as per :ref:`Cinder service
+list <cinder_service_list>`):
+
+.. code-block:: none
+
+   cinder service-list
+
+Verification
+~~~~~~~~~~~~
+
+Re-attach the volume to the VM and verify that the secondary device contains
+the expected data:
+
+.. code-block:: none
+
+   openstack server add volume vm-with-data-volume vol-site-a-repl-data
+   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
+   > sudo mount /dev/vdc data
+   > cat data/test.txt
+   This is a test.
+
+We can also check the status of the image as per :ref:`RBD image status
+<rbd_image_status>` to verify that the primary indeed resides in site-a again:
+
+.. code-block:: none
+
+   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
+
+   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
+     global_id: 3a4aa755-c9ee-4319-8ba4-fc494d20d783
+     state: up+stopped
+     description: local image is primary
diff --git a/deploy-guide/source/cinder-volume-replication-overlay.rst b/deploy-guide/source/cinder-volume-replication-overlay.rst
new file mode 100644
index 0000000..c479541
--- /dev/null
+++ b/deploy-guide/source/cinder-volume-replication-overlay.rst
@@ -0,0 +1,138 @@
+:orphan:
+
+.. _cinder_volume_replication_custom_overlay:
+
+========================================
+Cinder volume replication custom overlay
+========================================
+
+The below bundle overlay is used in the instructions given on the :doc:`Cinder
+volume replication <cinder-volume-replication>` page.
+
+.. code-block:: yaml
+
+   series: focal
+
+   # Change these variables according to the local environment, 'osd-devices'
+   # and 'data-port' in particular.
+ variables: + openstack-origin: &openstack-origin cloud:focal-victoria + osd-devices: &osd-devices /dev/sdb /dev/vdb + expected-osd-count: &expected-osd-count 3 + expected-mon-count: &expected-mon-count 3 + data-port: &data-port br-ex:ens7 + + relations: + - - cinder-ceph-a:storage-backend + - cinder:storage-backend + - - cinder-ceph-b:storage-backend + - cinder:storage-backend + + - - site-a-ceph-osd:mon + - site-a-ceph-mon:osd + - - site-b-ceph-osd:mon + - site-b-ceph-mon:osd + + - - site-a-ceph-mon:client + - nova-compute:ceph + - - site-b-ceph-mon:client + - nova-compute:ceph + + - - site-a-ceph-mon:client + - cinder-ceph-a:ceph + - - site-b-ceph-mon:client + - cinder-ceph-b:ceph + + - - nova-compute:ceph-access + - cinder-ceph-a:ceph-access + - - nova-compute:ceph-access + - cinder-ceph-b:ceph-access + + - - site-a-ceph-mon:client + - glance:ceph + + - - site-a-ceph-mon:rbd-mirror + - site-a-ceph-rbd-mirror:ceph-local + - - site-b-ceph-mon:rbd-mirror + - site-b-ceph-rbd-mirror:ceph-local + + - - site-a-ceph-mon + - site-b-ceph-rbd-mirror:ceph-remote + - - site-b-ceph-mon + - site-a-ceph-rbd-mirror:ceph-remote + + - - site-a-ceph-mon:client + - cinder-ceph-b:ceph-replication-device + - - site-b-ceph-mon:client + - cinder-ceph-a:ceph-replication-device + + applications: + + # Prevent some applications in the main bundle from being deployed. + ceph-radosgw: + ceph-osd: + ceph-mon: + cinder-ceph: + + # Deploy ceph-osd applications with the appropriate names. + site-a-ceph-osd: + charm: cs:ceph-osd + num_units: 3 + options: + osd-devices: *osd-devices + source: *openstack-origin + + site-b-ceph-osd: + charm: cs:ceph-osd + num_units: 3 + options: + osd-devices: *osd-devices + source: *openstack-origin + + # Deploy ceph-mon applications with the appropriate names. + site-a-ceph-mon: + charm: cs:ceph-mon + num_units: 3 + options: + expected-osd-count: *expected-osd-count + monitor-count: *expected-mon-count + source: *openstack-origin + + site-b-ceph-mon: + charm: cs:ceph-mon + num_units: 3 + options: + expected-osd-count: *expected-osd-count + monitor-count: *expected-mon-count + source: *openstack-origin + + # Deploy cinder-ceph applications with the appropriate names. + cinder-ceph-a: + charm: cs:cinder-ceph + num_units: 0 + options: + rbd-mirroring-mode: image + + cinder-ceph-b: + charm: cs:cinder-ceph + num_units: 0 + options: + rbd-mirroring-mode: image + + # Deploy ceph-rbd-mirror applications with the appropriate names. + site-a-ceph-rbd-mirror: + charm: cs:ceph-rbd-mirror + num_units: 1 + options: + source: *openstack-origin + + site-b-ceph-rbd-mirror: + charm: cs:ceph-rbd-mirror + num_units: 1 + options: + source: *openstack-origin + + # Configure for the local environment. + ovn-chassis: + options: + bridge-interface-mappings: *data-port diff --git a/deploy-guide/source/cinder-volume-replication.rst b/deploy-guide/source/cinder-volume-replication.rst new file mode 100644 index 0000000..bbacab1 --- /dev/null +++ b/deploy-guide/source/cinder-volume-replication.rst @@ -0,0 +1,576 @@ +========================= +Cinder volume replication +========================= + +Overview +-------- + +Cinder volume replication is a primary/secondary failover solution based on +two-way `Ceph RBD mirroring`_. + +Deployment +---------- + +The cloud deployment in this document is based on the stable `openstack-base`_ +bundle in the `openstack-bundles`_ repository. The necessary documentation is +found in the `bundle README`_. 
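+
+A possible way to obtain the base bundle locally, so that it can later be
+deployed together with the replication overlay, is sketched below. The clone
+location and local file names are illustrative; the overlay itself should be
+saved from the overlay page as ``cinder-volume-replication-overlay.yaml`` in
+the same directory.
+
+.. code-block:: bash
+
+   # Fetch the stable openstack-base bundle referenced above and place a copy
+   # next to the overlay file under the name used by the deploy command later
+   # on this page.
+   git clone https://github.com/openstack-charmers/openstack-bundles
+   cp openstack-bundles/stable/openstack-base/bundle.yaml ./bundle.yaml
+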
+ +A custom overlay bundle (`cinder-volume-replication-overlay`_) is used to +extend the base cloud in order to implement volume replication. + +.. note:: + + The key elements for adding volume replication to Ceph RBD mirroring is the + relation between cinder-ceph in one site and ceph-mon in the other (using the + ``ceph-replication-device`` endpoint) and the cinder-ceph charm + configuration option ``rbd-mirroring-mode=image``. + +Cloud notes: + +* The cloud used in these instructions is based on Ubuntu 20.04 LTS (Focal) and + OpenStack Victoria. The openstack-base bundle may have been updated since. +* The two Ceph clusters are named 'site-a' and 'site-b' and are placed in the + same Juju model. +* A site's pool is named after its corresponding cinder-ceph application (e.g. + 'cinder-ceph-a' for site-a) and is mirrored to the other site. Each site will + therefore have two pools: 'cinder-ceph-a' and 'cinder-ceph-b'. +* Glance is only backed by site-a. + +To deploy: + +.. code-block:: none + + juju deploy ./bundle.yaml --overlay ./cinder-volume-replication-overlay.yaml + +Configuration and verification of the base cloud +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Configure the base cloud as per the referenced documentation. + +Before proceeding, verify the base cloud by creating a VM and connecting to it +over SSH. See the main bundle's README for guidance. + +.. important:: + + A known issue affecting the interaction of the ceph-rbd-mirror charm and + Ceph itself gives the impression of a fatal error. The symptom is messaging + that appears in :command:`juju status` command output: ``Pools WARNING (1) + OK (1) Images unknown (1)``. This remains a cosmetic issue however. See bug + `LP #1892201`_ for details. + +Cinder volume types +------------------- + +For each site, create replicated and non-replicated Cinder volumes types. A +type is referenced at volume-creation time in order to specify whether the +volume is replicated (or not) and what pool it will reside in. + +Type 'site-a-repl' denotes replication in site-a: + +.. code-block:: none + + openstack volume type create site-a-repl \ + --property volume_backend_name=cinder-ceph-a \ + --property replication_enabled=' True' + +Type 'site-a-local' denotes non-replication in site-a: + +.. code-block:: none + + openstack volume type create site-a-local \ + --property volume_backend_name=cinder-ceph-a + +Type 'site-b-repl' denotes replication in site-b: + +.. code-block:: none + + openstack volume type create site-b-repl \ + --property volume_backend_name=cinder-ceph-b \ + --property replication_enabled=' True' + +Type 'site-b-local' denotes non-replication in site-b: + +.. code-block:: none + + openstack volume type create site-b-local \ + --property volume_backend_name=cinder-ceph-b + +List the volume types: + +.. code-block:: none + + openstack volume type list + +--------------------------------------+--------------+-----------+ + | ID | Name | Is Public | + +--------------------------------------+--------------+-----------+ + | ee70dfd9-7b97-407d-a860-868e0209b93b | site-b-local | True | + | b0f6d6b5-9c76-4967-9eb4-d488a6690712 | site-b-repl | True | + | fc89ca9b-d75a-443e-9025-6710afdbfd5c | site-a-local | True | + | 780980dc-1357-4fbd-9714-e16a79df252a | site-a-repl | True | + | d57df78d-ff27-4cf0-9959-0ada21ce86ad | __DEFAULT__ | True | + +--------------------------------------+--------------+-----------+ + +.. note:: + + In this document, site-b volume types will not be used. 
They are created + here for the more generalised case where new volumes may be needed while + site-a is in a failover state. In such a circumstance, any volumes created + in site-b will naturally not be replicated (in site-a). + +.. _rbd_image_status: + +RBD image status +---------------- + +The status of the two RBD images associated with a replicated volume can be +queried using the ``status`` action of the ceph-rbd-mirror unit for each site. + +A state of ``up+replaying`` in combination with the presence of +``"entries_behind_primary":0`` in the image description means the image in one +site is in sync with its counterpart in the other site. + +A state of ``up+syncing`` indicates that the sync process is still underway. + +A description of ``local image is primary`` means that the image is the +primary. + +Consider the volume below that is created and given the volume type of +'site-a-repl'. Its primary will be in site-a and its non-primary (secondary) +will be in site-b: + +.. code-block:: none + + openstack volume create --size 5 --type site-a-repl vol-site-a-repl + +Their statuses can be queried in each site as shown: + +Site a (primary), + +.. code-block:: none + + juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume- + volume-c44d4d20-6ede-422a-903d-588d1b0d51b0: + global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86 + state: up+stopped + description: local image is primary + +Site b (secondary is in sync with the primary), + +.. code-block:: none + + juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume- + volume-c44d4d20-6ede-422a-903d-588d1b0d51b0: + global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86 + state: up+replaying + description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,..... + +.. _cinder_service_list: + +Cinder service list +------------------- + +To verify the state of Cinder services the ``cinder service-list`` command is +used: + +.. code-block:: none + + cinder service-list + +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+ + | Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State | + +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+ + | cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T15:59:25.000000 | - | - | | + | cinder-volume | cinder@cinder-ceph-a | nova | enabled | up | 2021-04-08T15:59:24.000000 | - | - | up | + | cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T15:59:25.000000 | - | - | up | + +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+ + +Each of the below examples ends with a failback to site-a. The above output is +the desired result. + +The failover of a particular site entails the referencing of its corresponding +cinder-volume service host (e.g. ``cinder@cinder-ceph-a`` for site-a). We'll +see how to do this later on. + +.. note:: + + 'cinder-ceph-a' and 'cinder-ceph-b' correspond to the two applications + deployed via the `cinder-ceph`_ charm. The express purpose of this charm is + to connect Cinder to a Ceph cluster. See the + `cinder-volume-replication-overlay`_ bundle for details. 
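+
+The checks described in the two previous sections lend themselves to a small
+helper. The following is a minimal sketch only, assuming the bash shell and
+the unit names used in this example deployment:
+
+.. code-block:: bash
+
+   # Summarise the mirror state of the replicated images in one site and the
+   # health of the Cinder services, using the same commands shown above.
+   rbd_mirror_summary() {
+       local unit="$1"
+       juju run-action --wait "$unit" status verbose=true \
+           | grep -A3 volume- \
+           | grep -E 'volume-|state:|description:'
+   }
+
+   rbd_mirror_summary site-b-ceph-rbd-mirror/0
+   cinder service-list
+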
+ +Failover, volumes, images, and pools +------------------------------------ + +This section will show the basics of failover/failback, non-replicated vs +replicated volumes, and what pools are used for the volume images. + +In site-a, create one non-replicated and one replicated data volume and list +them: + +.. code-block:: none + + openstack volume create --size 5 --type site-a-local vol-site-a-local + openstack volume create --size 5 --type site-a-repl vol-site-a-repl + + openstack volume list + +--------------------------------------+------------------+-----------+------+-------------+ + | ID | Name | Status | Size | Attached to | + +--------------------------------------+------------------+-----------+------+-------------+ + | fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | available | 5 | | + | c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl | available | 5 | | + +--------------------------------------+------------------+-----------+------+-------------+ + +Pools and images +~~~~~~~~~~~~~~~~ + +For 'vol-site-a-local' there should be one image in the 'cinder-ceph-a' pool of +site-a. + +For 'vol-site-a-repl' there should be two images: one in the 'cinder-ceph-a' +pool of site-a and one in the 'cinder-ceph-a' pool of site-b: + +This can all be confirmed by querying a Ceph MON in each site: + +.. code-block:: none + + juju ssh site-a-ceph-mon/0 sudo rbd ls -p cinder-ceph-a + + volume-fba13395-62d1-468e-9b9a-40bebd0373e8 + volume-c21a539e-d524-4f4d-991b-9b9476d4f930 + + juju ssh site-b-ceph-mon/0 sudo rbd ls -p cinder-ceph-a + + volume-c21a539e-d524-4f4d-991b-9b9476d4f930 + +Failover +~~~~~~~~ + +Perform the failover of site-a: + +.. code-block:: none + + cinder failover-host cinder@cinder-ceph-a + +Wait until the failover is complete: + +.. code-block:: none + + cinder service-list + +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+ + | Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State | + +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+ + | cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T17:11:56.000000 | - | - | | + | cinder-volume | cinder@cinder-ceph-a | nova | disabled | up | 2021-04-08T17:11:56.000000 | - | failed-over | - | + | cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T17:11:56.000000 | - | - | up | + +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+ + +A failover triggers the promotion of one site and the demotion of the other +(site-b and site-a respectively in this example). Communication between Cinder +and each Ceph cluster is therefore ideal, as in this example. + +Inspection +~~~~~~~~~~ + +By consulting the volume list we see that the replicated volume is still +available but that the non-replicated volume has errored: + +.. 
code-block:: none + + openstack volume list + +--------------------------------------+------------------+-----------+------+-------------+ + | ID | Name | Status | Size | Attached to | + +--------------------------------------+------------------+-----------+------+-------------+ + | fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | error | 5 | | + | c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl | available | 5 | | + +--------------------------------------+------------------+-----------+------+-------------+ + +Generally a failover indicates a significant degree of non-confidence in the +primary site, site-a in this case. Once a **local** volume goes into an error +state due to a failover it is expected to not recover after failback. The +errored local volumes should normally be discarded (deleted). + +Failback +~~~~~~~~ + +Failback site-a and confirm the original health of Cinder services (as per +`Cinder service list`_): + +.. code-block:: none + + cinder failover-host cinder@cinder-ceph-a --backend_id default + cinder service-list + +Examples +-------- + +The following two examples will be considered. They will both use replication +and involve the failing over of site-a to site-b: + +#. `Data volume used by a VM`_ +#. `Bootable volume used by a VM`_ + +Data volume used by a VM +~~~~~~~~~~~~~~~~~~~~~~~~ + +In this example, a replicated data volume will be created in site-a and +attached to a VM. The volume's block device will then have some test data +written to it. This will allow for verification of the replicated data once +failover has occurred and the volume is re-attached to the VM. + +Preparation +^^^^^^^^^^^ + +Create the replicated data volume: + +.. code-block:: none + + openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data + openstack volume list + +--------------------------------------+---------------------------+-----------+------+-------------+ + | ID | Name | Status | Size | Attached to | + +--------------------------------------+---------------------------+-----------+------+-------------+ + | f23732c1-3257-4e58-a214-085c460abf56 | vol-site-a-repl-data | available | 5 | | + +--------------------------------------+---------------------------+-----------+------+-------------+ + +Create the VM (named 'vm-with-data-volume'): + +.. code-block:: none + + openstack server create --image focal-amd64 --flavor m1.tiny \ + --key-name mykey --network int_net vm-with-data-volume + + FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net) + openstack server add floating ip vm-with-data-volume $FLOATING_IP + + openstack server list + +--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+ + | ID | Name | Status | Networks | Image | Flavor | + +--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+ + | fbe07fea-731e-4973-8455-c8466be72293 | vm-with-data-volume | ACTIVE | int_net=192.168.0.38, 10.5.1.28 | focal-amd64 | m1.tiny | + +--------------------------------------+----------------------+--------+---------------------------------+--------------------------+---------+ + +Attach the data volume to the VM: + +.. code-block:: none + + openstack server add volume vm-with-data-volume vol-site-a-repl-data + +Prepare the block device and write the test data to it: + +.. 
code-block:: none
+
+   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
+   > sudo mkfs.ext4 /dev/vdc
+   > mkdir data
+   > sudo mount /dev/vdc data
+   > sudo chown ubuntu: data
+   > echo "This is a test." > data/test.txt
+   > sync
+   > exit
+
+Failover
+^^^^^^^^
+
+When both sites are online, as is the case here, it is not recommended to
+perform a failover while volumes are in use. This is because Cinder will try
+to demote the Ceph image from the primary site, and if there is an active
+connection to it, the operation may fail (i.e. the volume will transition to
+an error state).
+
+Here we ensure the volume is not in use by unmounting the block device and
+removing it from the VM:
+
+.. code-block:: none
+
+   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP sudo umount /dev/vdc
+   openstack server remove volume vm-with-data-volume vol-site-a-repl-data
+
+Prior to failover, the images of all replicated volumes must be fully
+synchronised. Perform a check with the ceph-rbd-mirror charm's ``status``
+action as per `RBD image status`_. If the volumes were created in site-a, then
+the ceph-rbd-mirror unit in site-b is the target:
+
+.. code-block:: none
+
+   juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
+
+If all images look good, perform the failover of site-a:
+
+.. code-block:: none
+
+   cinder failover-host cinder@cinder-ceph-a
+   cinder service-list
+   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
+   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
+   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
+   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T19:30:29.000000 | -       | -               |               |
+   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T19:30:28.000000 | -       | failed-over     | -             |
+   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T19:30:28.000000 | -       | -               | up            |
+   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
+
+Verification
+^^^^^^^^^^^^
+
+Re-attach the volume to the VM:
+
+.. code-block:: none
+
+   openstack server add volume vm-with-data-volume vol-site-a-repl-data
+
+Verify that the secondary device contains the expected data:
+
+.. code-block:: none
+
+   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
+   > sudo mount /dev/vdc data
+   > cat data/test.txt
+   This is a test.
+
+Failback
+^^^^^^^^
+
+Failback site-a and confirm the original health of Cinder services (as per
+`Cinder service list`_):
+
+.. code-block:: none
+
+   cinder failover-host cinder@cinder-ceph-a --backend_id default
+   cinder service-list
+
+Bootable volume used by a VM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In this example, a bootable volume will be created in site-a and a
+newly-created VM will use that volume as its root device. As in the previous
+example, the volume's block device will have test data written to it for
+verification purposes.
+
+Preparation
+^^^^^^^^^^^
+
+Create the replicated bootable volume:
+
+.. code-block:: none
+
+   openstack volume create --size 5 --type site-a-repl --image focal-amd64 --bootable vol-site-a-repl-boot
+
+Wait for the volume to become available (it may take a while):
+
+.. 
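code-block:: bash

+   # Optional sketch: script the wait for the volume to become available
+   # instead of re-running the volume listing by hand. The volume name matches
+   # the one created above; the retry count and sleep interval are arbitrary.
+   for i in $(seq 1 60); do
+       status=$(openstack volume show vol-site-a-repl-boot -f value -c status)
+       [ "$status" = "available" ] && break
+       sleep 10
+   done
+   echo "vol-site-a-repl-boot status: $status"
+
+The volume listing will eventually show the volume as available:
+
+.. 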
code-block:: none + + openstack volume list + +--------------------------------------+----------------------+-----------+------+-------------+ + | ID | Name | Status | Size | Attached to | + +--------------------------------------+----------------------+-----------+------+-------------+ + | c44d4d20-6ede-422a-903d-588d1b0d51b0 | vol-site-a-repl-boot | available | 5 | | + +--------------------------------------+----------------------+-----------+------+-------------+ + +Create a VM (named 'vm-with-boot-volume') by specifying the newly-created +bootable volume: + +.. code-block:: none + + openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \ + --key-name mykey --network int_net vm-with-boot-volume + + FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net) + openstack server add floating ip vm-with-boot-volume $FLOATING_IP + + openstack server list + +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+ + | ID | Name | Status | Networks | Image | Flavor | + +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+ + | c0a152d7-376b-4500-95d4-7c768a3ff280 | vm-with-boot-volume | ACTIVE | int_net=192.168.0.75, 10.5.1.53 | N/A (booted from volume) | m1.tiny | + +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+ + +Write the test data to the block device: + +.. code-block:: none + + ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP + > echo "This is a test." > test.txt + > sync + > exit + +Failover +^^^^^^^^ + +As explained previously, when both sites are functional, prior to failover the +replicated volume should not be in use. Since the testing of the replicated +boot volume requires the VM to be rebuilt anyway (Cinder needs to give the +updated Ceph connection credentials to Nova) the easiest way forward is to +simply delete the VM: + +.. code-block:: none + + openstack server delete vm-with-boot-volume + +Like before, prior to failover, confirm that the images of all replicated +volumes in site-b are fully synchronised. Perform a check with the +ceph-rbd-mirror charm's ``status`` action as per `RBD image status`_: + +.. code-block:: none + + juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume- + +If all images look good, perform the failover of site-a: + +.. code-block:: none + + cinder failover-host cinder@cinder-ceph-a + cinder service-list + +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+ + | Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason | Backend State | + +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+ + | cinder-scheduler | cinder | nova | enabled | up | 2021-04-08T21:29:12.000000 | - | - | | + | cinder-volume | cinder@cinder-ceph-a | nova | disabled | up | 2021-04-08T21:29:12.000000 | - | failed-over | - | + | cinder-volume | cinder@cinder-ceph-b | nova | enabled | up | 2021-04-08T21:29:11.000000 | - | - | up | + +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+ + +Verification +^^^^^^^^^^^^ + +Re-create the VM: + +.. 
code-block:: none
+
+   openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
+      --key-name mykey --network int_net vm-with-boot-volume
+
+   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
+   openstack server add floating ip vm-with-boot-volume $FLOATING_IP
+
+Verify that the root device contains the expected data:
+
+.. code-block:: none
+
+   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
+   > cat test.txt
+   This is a test.
+   > exit
+
+Failback
+^^^^^^^^
+
+Failback site-a and confirm the original health of Cinder services (as per
+`Cinder service list`_):
+
+.. code-block:: none
+
+   cinder failover-host cinder@cinder-ceph-a --backend_id default
+   cinder service-list
+
+Disaster recovery
+-----------------
+
+An uncontrolled failover is known as the disaster recovery scenario. It is
+characterised by the sudden failure of the primary Ceph cluster. See the
+:ref:`Cinder volume replication - Disaster recovery
+<cinder_volume_replication_dr>` page for more information.
+
+.. LINKS
+.. _Ceph RBD mirroring: app-ceph-rbd-mirror.html
+.. _openstack-base: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/bundle.yaml
+.. _openstack-bundles: https://github.com/openstack-charmers/openstack-bundles/
+.. _bundle README: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/README.md
+.. _cinder-volume-replication-overlay: cinder-volume-replication-overlay.html
+.. _cinder-ceph: https://jaas.ai/cinder-ceph
+.. _LP #1892201: https://bugs.launchpad.net/charm-ceph-rbd-mirror/+bug/1892201
diff --git a/deploy-guide/source/index.rst b/deploy-guide/source/index.rst
index 485a6d8..7ee0b11 100644
--- a/deploy-guide/source/index.rst
+++ b/deploy-guide/source/index.rst
@@ -85,6 +85,7 @@ OpenStack Charms usage. To help improve it you can `file an issue`_ or
    app-erasure-coding
    app-rgw-multisite
    app-ceph-rbd-mirror
+   cinder-volume-replication
    app-manila-ganesha
    app-swift