Add Distributed Cloud GEO Redundancy docs (r9, dsr8MR3)

- Overview of the feature
- Procedure for configuring the feature

Story: 2010852
Task: 48493

Change-Id: If5fd6792adbb7e77ab2e92f29527c951be0134ee
Signed-off-by: Litao Gao <litao.gao@windriver.com>
Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>
(cherry picked from commit 2a75cb0a7a)


@@ -23,6 +23,8 @@
.. |os-prod-hor| replace:: OpenStack |prod-hor|
.. |prod-img| replace:: https://mirror.starlingx.windriver.com/mirror/starlingx/
.. |prod-abbr| replace:: StX
.. |prod-dc-geo-red| replace:: Distributed Cloud Geo Redundancy
.. |prod-dc-geo-red-long| replace:: Distributed Cloud System Controller Geographic Redundancy
.. Guide names; will be formatted in italics by default.
.. |node-doc| replace:: :title:`StarlingX Node Configuration and Management`


@@ -16,6 +16,12 @@ system data backup file has been generated on the subcloud, it will be
transferred to the system controller and stored at a dedicated central location
``/opt/dc-vault/backups/<subcloud-name>/<release-version>``.
.. note::
Enabling the GEO Redundancy function will affect some of the subcloud
backup functions. For more information on GEO Redundancy and its
restrictions, see :ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`.
Backup data creation requires the subcloud to be online, managed, and in a
healthy state.


@@ -0,0 +1,617 @@
.. _configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662:
============================================================
Configure Distributed Cloud System Controller GEO Redundancy
============================================================
.. rubric:: |context|
You can configure Distributed Cloud System Controller GEO Redundancy
using dcmanager |CLI| commands.
System administrators can follow the procedures below to enable and
disable the GEO Redundancy feature.
.. note::
In this release, the GEO Redundancy feature supports only two
distributed clouds in one protection group.
.. contents::
:local:
:depth: 1
---------------------
Enable GEO Redundancy
---------------------
Set up a protection group for two distributed clouds, making them
operational in 1+1 active-active GEO Redundancy mode.
For example, let us assume we have two distributed clouds, site A and site B.
When the operation is performed on site A, the local site is site A and the
peer site is site B. When the operation is performed on site B, the local
site is site B and the peer site is site A.
.. rubric:: |prereq|
The peer system controllers' |OAM| networks must be reachable from each other,
and each system controller must be able to access the subclouds via both the
|OAM| and management networks.
To secure a production system, it is important to ensure that peer site queries
are authenticated and protected. This requires an HTTPS-based system API backed
by a well-known and trusted |CA|, so that HTTPS communication between the peers
is secure.
If you are using an internally trusted |CA|, ensure that the system trusts the |CA| by installing
its certificate with the following command.
.. code-block:: none
~(keystone_admin)]$ system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>
where:
``<trusted-ca-bundle-pem-file>``
is the path to the intermediate or Root |CA| certificate associated
with the |prod| REST API's Intermediate or Root |CA|-signed certificate.
.. rubric:: |proc|
You can enable the GEO Redundancy feature between site A and site B from the
command line. In this procedure, the subclouds managed by site A are configured
to be managed by a GEO Redundancy protection group that consists of site
A and site B. When site A goes offline for some reason, an alarm notifies the
administrator, who then initiates the group-based batch migration
to rehome the subclouds of site A to site B for centralized management.
Similarly, you can configure the subclouds managed by site B to be
taken over by site A when site B is offline, by following the same procedure
with site B as the local site and site A as the peer site.
#. Log in to the active controller node of site B and get the required
information about site B to create a protection group.
* Unique |UUID| of the central cloud of the peer system controller
* URI of Keystone endpoint of peer system controller
* Gateway IP address of the management network of peer system controller
For example:
.. code-block:: bash
# On site B
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
| uuid | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a |
~(keystone_admin)]$ openstack endpoint list --service keystone \
--interface public --region RegionOne -c URL
+-----------------------------+
| URL |
+-----------------------------+
| http://10.10.10.2:5000 |
+-----------------------------+
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
gateway
10.10.27.1
#. Log in to the active controller node of the central cloud of site A. Create
a System Peer instance of site B on site A so that site A can access information of
site B.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 223fcb30-909d-4edf-8c36-1aebc8e9bd4a \
--peer-name siteB \
--manager-endpoint http://10.10.10.2:5000 \
--peer-controller-gateway-address 10.10.27.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
+----+--------------------------------------+-----------+-----------------------------+----------------------------+
| id | peer uuid | peer name | manager endpoint | controller gateway address |
+----+--------------------------------------+-----------+-----------------------------+----------------------------+
| 2 | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a | siteB | http://10.10.10.2:5000 | 10.10.27.1 |
+----+--------------------------------------+-----------+-----------------------------+----------------------------+
#. Collect the information from site A.
.. code-block:: bash
# On site A
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
#. Log in to the active controller node of the central cloud of site B. Create
a System Peer instance of site A on site B so that site B has information about site A.
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 3963cb21-c01a-49cc-85dd-ebc1d142a41d \
--peer-name siteA \
--manager-endpoint http://10.10.11.2:5000 \
--peer-controller-gateway-address 10.10.25.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
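You can optionally confirm that the system peer records exist on each site
before continuing. This is only a sanity check; it assumes the
``system-peer list`` subcommand is available in your release.
.. code-block:: bash
# On site A (and similarly on site B)
~(keystone_admin)]$ dcmanager system-peer list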
#. Create a |SPG| for site A.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1
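To confirm that the |SPG| was created, you can list the peer groups on site A;
this assumes the ``subcloud-peer-group list`` subcommand is available in your
release.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group list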
#. Add the subclouds needed for redundancy protection on site A.
Ensure that each subcloud's bootstrap data is up to date. The bootstrap data is
the data used to bootstrap the subcloud, and includes the |OAM| and
management network information, the system controller gateway information, and
the Docker registry information needed to pull the images required to bootstrap
the system.
For an example of a typical bootstrap file, see :ref:`installing-and-provisioning-a-subcloud`.
#. Update the subcloud information with the bootstrap values.
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud update subcloud1 \
--bootstrap-address <Subcloud_OAM_IP_Address> \
--bootstrap-values <Path_of_Bootstrap-Value-File>
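Optionally, you can confirm that the update was recorded for the subcloud. The
exact fields displayed vary by release; this assumes the ``subcloud show``
subcommand is available.
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud show subcloud1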
#. Update the subcloud information with the |SPG| created locally.
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud1-Name> \
--peer-group <SiteA-Subcloud-Peer-Group-ID-or-Name>
For example,
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1
#. If you want to remove a subcloud from the |SPG|, run the
following command:
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud-Name> --peer-group none
For example,
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group none
#. Check the subclouds that are under the |SPG|.
.. code-block:: bash
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-ID-or-Name>
#. Create an association between the System Peer and |SPG|.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager peer-group-association add \
--system-peer-id <SiteB-System-Peer-ID> \
--peer-group-id <SiteA-System-Peer-Group1> \
--peer-group-priority <priority>
The ``peer-group-priority`` parameter accepts an integer value greater
than 0. It sets the priority of the |SPG| that is created in the peer site,
using the peer site's dcmanager API, during association synchronization.
* The default priority in the |SPG| is 0 when it is created
in the local site.
* The smallest integer has the highest priority.
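For example, using the illustrative IDs shown earlier in this procedure
(system peer ID ``2`` for site B and |SPG| ID ``1``; the IDs will differ in
your environment), a priority of ``2`` could be assigned as follows:
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager peer-group-association add \
--system-peer-id 2 \
--peer-group-id 1 \
--peer-group-priority 2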
During the association creation, the |SPG| in the association, together with
the subclouds belonging to it, is synchronized from the local site to the peer
site.
Confirm that the local |SPG| and its subclouds have been synchronized
to site B under the same names.
* Show the association information just created in site A and ensure that
``sync_status`` is ``in-sync``.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager peer-group-association list
+----+---------------+----------------+---------+-----------------+---------------------+
| id | peer_group_id | system_peer_id | type | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-----------------+---------------------+
| 1 | 1 | 2 | primary | in-sync | 2 |
+----+---------------+----------------+---------+-----------------+---------------------+
* Show the |SPG| on site B and ensure that it has been created.
* List the subclouds in the |SPG| on site B and ensure that all of them have
been synchronized as secondary subclouds.
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group show <SiteA-Subcloud-Peer-Group-Name>
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-Name>
When you create the primary association on site A, a non-primary association
on site B will automatically be created to associate the synchronized |SPG|
from site A and the system peer pointing to site A.
You can check the association list to confirm if the non-primary association
was created on site B.
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager peer-group-association list
+----+---------------+----------------+-------------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type | sync_status | peer_group_priority |
+----+---------------+----------------+-------------+-------------+---------------------+
| 2 | 26 | 1 | non-primary | in-sync | None |
+----+---------------+----------------+-------------+-------------+---------------------+
#. (Optional) Update the protection group related configuration.
After the peer group association has been created, you can still update the
related resources configured in the protection group:
* Update subcloud with bootstrap values
* Add subcloud(s) into the |SPG|
* Remove subcloud(s) from the |SPG|
After any of the above operations, ``sync_status`` changes to ``out-of-sync``.
After the update is complete, use the :command:`sync`
command to push the |SPG| changes to the peer site so that the |SPG| remains
consistent on both sites.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager peer-group-association sync <SiteA-Peer-Group-Association1-ID>
.. warning::
The :command:`dcmanager peer-group-association sync` command must be run
after any of the following changes:
- A subcloud is removed from the |SPG| in order to change the subcloud name.
- A subcloud is removed from the |SPG| in order to reconfigure the subcloud
management network.
- A subcloud is updated with one or both of the ``--bootstrap-address``
and ``--bootstrap-values`` parameters.
Similarly, verify that the information has been synchronized by showing the
association information on site A and ensuring that ``sync_status`` is
``in-sync``.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-----------------+---------------------+
| id | peer_group_id | system_peer_id | type | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-----------------+---------------------+
| 1 | 1 | 2 | primary | in-sync | 2 |
+----+---------------+----------------+---------+-----------------+---------------------+
.. rubric:: |result|
You have configured a GEO Redundancy protection group between site A and site B.
If site A is offline, the subclouds configured in the |SPG| can be manually
migrated in a batch to site B for centralized management.
----------------------------
Health Monitor and Migration
----------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Peer monitoring and alarming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After the peer protection group is formed, if site B cannot reach
site A, an alarm is raised on site B.
For example:
.. code-block:: bash
# On site B
~(keystone_admin)]$ fm alarm-list
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| 280.004 | Peer siteA is in disconnected state. Following subcloud peer groups are impacted: group1. | peer=223fcb30-909d-4edf- | major | 2023-08-18T10:25:29. |
| | | 8c36-1aebc8e9bd4a | | 670977 |
| | | | | |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
The administrator can suppress the alarm with the following command:
.. code-block:: bash
# On site B
~(keystone_admin)]$ fm event-suppress --alarm_id 280.004
+----------+------------+
| Event ID | Status |
+----------+------------+
| 280.004 | suppressed |
+----------+------------+
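If needed, the suppressed alarm can be shown again later. The following assumes
the ``fm event-unsuppress`` command is available in your release.
.. code-block:: bash
# On site B
~(keystone_admin)]$ fm event-unsuppress --alarm_id 280.004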
---------
Migration
---------
If site A is down, after receiving the alarm the administrator
can choose to perform the migration on site B, which migrates the
subclouds in the |SPG| from site A to site B.
.. note::
Before initiating the migration operation, ensure that ``sync_status`` of the
peer group association is ``in-sync`` so that the latest updates from site A
have been successfully synchronized to site B. If ``sync_status`` is not
``in-sync``, the migration may fail.
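One way to confirm this from site B is to list the peer group associations, as
shown earlier, and check the ``sync_status`` column:
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager peer-group-association list
Once the association shows ``in-sync``, start the batch migration: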
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
During the batch migration, you can check the status of the migration of each
subcloud in the |SPG| by showing the details of the |SPG| being migrated.
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group status <Subcloud-Peer-Group-ID-or-Name>
After successful migration, the subcloud(s) should be in
``managed/online/complete`` status on site B.
For example:
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager subcloud list
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| 45 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 46 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
--------------
Post Migration
--------------
When site A is restored, its subclouds are adjusted to
``unmanaged/secondary`` status on site A. The administrator receives an
alarm on site A indicating that the |SPG| is managed by a peer site (site
B), because the |SPG| on site A has the higher priority.
.. code-block:: bash
~(keystone_admin)]$ fm alarm-list
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| 280.005 | Subcloud peer group (peer_group_name=group1) is managed by remote system | subcloud_peer_group=7 | warning | 2023-09-04T04:51:58. |
| | (peer_uuid=223fcb30-909d-4edf-8c36-1aebc8e9bd4a) with lower priority. | | | 435539 |
| | | | | |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
Then, the administrator can decide if and when to migrate the subcloud(s) back.
.. code-block:: bash
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
After a successful migration, the subclouds should be back in the
``managed/online/complete`` status on site A.
For example:
.. code-block:: bash
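# On site A
~(keystone_admin)]$ dcmanager subcloud list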
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| 33 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 34 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
The alarm mentioned above is also cleared after the subclouds are migrated back.
.. code-block:: bash
~(keystone_admin)]$ fm alarm-list
----------------------
Disable GEO Redundancy
----------------------
You can disable the GEO Redundancy feature from the command line.
Before disabling the GEO Redundancy feature, ensure that the environment is
stable and that the subclouds are managed by the expected site.
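For example, you can run the following on each site and confirm that every
subcloud is listed with the expected management state before tearing down the
protection group:
.. code-block:: bash
# On site A and on site B
~(keystone_admin)]$ dcmanager subcloud list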
.. rubric:: |proc|
#. Delete the primary association on both sites.
.. code-block:: bash
# site A
~(keystone_admin)]$ dcmanager peer-group-association delete <SiteA-Peer-Group-Association1-ID>
#. Delete the |SPG|.
.. code-block:: bash
# site A
~(keystone_admin)]$ dcmanager subcloud-peer-group delete group1
#. Delete the system peer.
.. code-block:: bash
# site A
~(keystone_admin)]$ dcmanager system-peer delete siteB
# site B
~(keystone_admin)]$ dcmanager system-peer delete siteA
.. rubric:: |result|
You have torn down the protection group between site A and site B.
---------------------------
Backup and Restore Subcloud
---------------------------
You can back up and restore a subcloud in a distributed cloud environment.
However, GEO Redundancy does not support the replication of subcloud backup
files from one site to another.
A subcloud backup is valid only for the current system controller. When a
subcloud is migrated from site A to site B, the existing backup becomes
unavailable. In this case, you can create a new backup of that subcloud on site
B. Subsequently, you can restore the subcloud from this newly created backup
when it is managed under site B.
For information on how to back up and restore a subcloud, see
:ref:`backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42`
and :ref:`restore-a-subcloud-group-of-subclouds-from-backup-data-using-dcmanager-cli-f10c1b63a95e`.
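As a minimal sketch, a new backup of a migrated subcloud could be created on
site B with the ``subcloud-backup`` command; the subcloud name shown here is
illustrative, and the full set of options (for example, passwords and
local-only storage) is deployment specific, so consult the backup documentation
linked above.
.. code-block:: bash
# On site B
~(keystone_admin)]$ dcmanager subcloud-backup create --subcloud subcloud1-node6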
-------------------------------------------
Operations Performed by Protected Subclouds
-------------------------------------------
The table below lists the operations that can/cannot be performed on the protected subclouds.
**Primary site**: The site where the |SPG| was created.
**Secondary site**: The peer site where the subclouds in the |SPG| can be migrated to.
**Protected subcloud**: The subcloud that belongs to a |SPG|.
**Local/Unprotected subcloud**: The subcloud that does not belong to any |SPG|.
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Operation | Allow (Y/N/Maybe) | Note |
+==========================================+==================================+=================================================================================================+
| Unmanage | N | Subcloud must be removed from the |SPG| before it can be manually unmanaged. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Manage | N | Subcloud must be removed from the |SPG| before it can be manually managed. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Delete | N | Subcloud must be removed from the |SPG| before it can be manually unmanaged |
| | | and deleted. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Update | Maybe | Subcloud can only be updated while it is managed in the primary site because the sync command |
| | | can only be issued from the system controller where the |SPG| was created. |
| | | |
| | | .. warning:: |
| | | |
| | | The subcloud network cannot be reconfigured while it is being managed by the secondary |
| | | site. If this operation is necessary, perform the following steps: |
| | | |
| | | #. Remove the subcloud from the |SPG| to make it a local/unprotected |
| | | subcloud. |
| | | #. Update the subcloud. |
| | | #. (Optional) Manually rehome the subcloud to the primary site after it is restored. |
| | | #. (Optional) Re-add the subcloud to the |SPG|. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Rename | Yes | - If the subcloud in the primary site is already a part of |SPG|, we need to remove it from the |
| | | |SPG| and then unmanage, rename, and manage the subcloud, and add it back to |SPG| and perform|
| | | the sync operation. |
| | | |
| | | - If the subcloud is in the secondary site, perform the following steps: |
| | | |
| | | #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud. |
| | | |
|                                          |                                  |    #. Unmanage the subcloud.                                                                    |
| | | |
| | | #. Rename the subcloud. |
| | | |
| | | #. (Optional) Manually rehome the subcloud to the primary site after it is restored. |
| | | |
| | | #. (Optional) Re-add the subcloud to the |SPG|. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Patch | Y | .. warning:: |
| | | |
| | | There may be a patch out-of-sync alarm when the subcloud is migrated to another site. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Upgrade | Y | All the system controllers in the protection group must be upgraded first before upgrading |
| | | any of the subclouds. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Rehome | N | Subcloud cannot be manually rehomed while being part of the |SPG| |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Backup | Y | |
| | | |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Restore | Maybe | - If the subcloud in the primary site is already a part of |SPG|, we need to remove it from the |
| | | |SPG| and then unmanage and restore the subcloud, and add it back to |SPG| and perform |
| | | the sync operation. |
| | | |
| | | - If the subcloud is in the secondary site, perform the following steps: |
| | | |
| | | #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud. |
| | | |
|                                          |                                  |    #. Unmanage the subcloud.                                                                    |
| | | |
| | | #. Restore the subcloud from the backup. |
| | | |
| | | #. (Optional) Manually rehome the subcloud to the primary site after it is restored. |
| | | |
| | | #. (Optional) Re-add the subcloud to the |SPG|. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Prestage | Y | .. warning:: |
| | | |
| | | The prestage data will get overwritten because it is not guaranteed that both the system |
| | | controllers always run on the same patch level (ostree repo) and/or have the same images |
| | | list. |
| | | |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Reinstall | Y | |
| | | |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Remove from |SPG| | Maybe | Subcloud can be removed from the |SPG| in the primary site. Subcloud can |
| | | only be removed from the |SPG| in the secondary site if the primary site is |
| | | currently down. |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
| Add to |SPG| | Maybe | Subcloud can only be added to the |SPG| in the primary site as manual sync is required. |
| | | |
| | | |
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+

Binary file not shown (new image, 34 KiB).


@@ -175,6 +175,16 @@ Upgrade Orchestration for Distributed Cloud SubClouds
failure-prior-to-the-installation-of-n-plus-1-load-on-a-subcloud
failure-during-the-installation-or-data-migration-of-n-plus-1-load-on-a-subcloud
--------------------------------------------------
Distributed Cloud System Controller GEO Redundancy
--------------------------------------------------
.. toctree::
:maxdepth: 1
overview-of-distributed-cloud-geo-redundancy
configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662
--------
Appendix
--------


@@ -0,0 +1,118 @@
.. eho1558617205547
.. _overview-of-distributed-cloud-geo-redundancy:
============================================
Overview of Distributed Cloud GEO Redundancy
============================================
|prod-long| |prod-dc-geo-red| configuration supports the ability to recover from
a catastrophic event that requires subclouds to be rehomed away from the failed
system controller site to the available site(s) that have enough spare capacity.
This way, even if the failed site cannot be restored in a short time, the subclouds
can still be rehomed to available peer system controller(s) for centralized
management.
In this configuration, the following items are addressed:
* 1+1 GEO redundancy
- Active-Active redundancy model
- Total number of subclouds should not exceed 1000
* Automated operations
- Synchronization and liveness check between peer systems
- Alarm generation if peer system controller is down
* Manual operations
- Batch rehoming initiated from the surviving peer system controller
---------------------------------------------
Distributed Cloud GEO Redundancy Architecture
---------------------------------------------
The 1+1 Distributed Cloud GEO Redundancy architecture consists of two local
high-availability Distributed Cloud clusters. They are mutual peers that form a
protection group, as illustrated in the figure below:
.. image:: figures/dcg1695034653874.png
The architecture features a synchronized distributed control plane for
geographic redundancy, where a system peer instance is created in each local
Distributed Cloud cluster, pointing to the other cluster via its Keystone
endpoint, to form a system protection group.
If the administrator wants the peer site to take over the subclouds when the
local system controller fails, a |SPG| must be created and subclouds must be
assigned to it. Then, a peer group association must be created to link the
system peer and the |SPG| together. The |SPG| information and the subclouds in
it are synchronized to the peer site via the endpoint information stored in the
system peer instance.
The peer sites perform health checks using the endpoint information stored in
the system peer instance. If the local site detects that the peer site is not
reachable, it raises an alarm to alert the administrator.
If the failed site cannot be restored quickly, the administrator needs to
initiate a batch subcloud migration by performing the migration of the |SPG|
from the healthy peer of the failed site.
When the failed site has been restored and is ready for service, the
administrator can initiate the batch subcloud migration from the restored site
to migrate all the subclouds in the |SPG| back for geographic proximity.
**Protection Group**
A group of peer sites that are configured to monitor each other and to decide
how to take over the subclouds (based on predefined |SPGs|) if any peer in the
group fails.
**System Peer**
A logical entity created in a system controller site. The system controller
site uses the information (Keystone endpoint, credentials) stored in the system
peer for health checks and data synchronization.
**Subcloud Secondary Deploy State**
A newly introduced state for a subcloud. If a subcloud is in the secondary
deploy state, the subcloud instance is only a placeholder holding the
configuration parameters that can be used to migrate the corresponding subcloud
from the peer site. After rehoming, the subcloud's state changes from secondary
to complete and the subcloud is managed by the local site, while the subcloud
instance on the peer site changes to secondary.
**Subcloud Peer Group**
A group of locally managed subclouds that is duplicated into a peer site as
secondary subclouds. The |SPG| instance is also created in the peer site and
contains all the duplicated secondary subclouds.
Multiple |SPGs| are supported and the membership of each |SPG| is decided by
the administrator. This way, the administrator can divide local subclouds into
different groups.
A |SPG| can be used to initiate subcloud batch migration. For example, when the
peer site has been detected to be down, and the local site is supposed to take
over the management of the subclouds in the failed peer site, the administrator
can perform a |SPG| migration to migrate all the subclouds in the |SPG| to the
local site for centralized management.
**Subcloud Peer Group Priority**
The priority is an attribute of the |SPG| instance; the |SPG| is designed to be
synchronized to each peer site in the protection group with a different
priority value.
In a protection group, there can be multiple system peers. The site that owns
the |SPG| with the highest priority (smallest value) is the leader site, which
needs to initiate the batch migration to take over the subclouds grouped by the
|SPG|.
**Subcloud Peer Group and System Peer Association**
An association refers to the binding relationship between a |SPG| and a system
peer. When the association between a |SPG| and a system peer is created on the
local site, the |SPG| and the subclouds in the group are duplicated to the peer
site to which the system peer in this association is pointing. This way, when
the local site is down, the peer site has enough information to initiate the
|SPG|-based batch migration and take over the centralized management of the
subclouds previously managed by the failed site.
One system peer can be associated with multiple |SPGs|. One |SPG| can be
associated with multiple system peers, each with a specified priority. This
priority is used to decide which site takes over the subclouds when a batch
migration is performed.


@@ -17,6 +17,12 @@ controller using the rehoming playbook.
The rehoming playbook does not work with freshly installed/bootstrapped
subclouds.
.. note::
Manual rehoming is not possible if a subcloud is included in an |SPG|.
Use the :command:`dcmanager subcloud-peer-group migrate` command for automatic
rehoming. For more information, see :ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`.
.. note::
The system time should be accurately configured on the system controllers