Add Distributed Cloud GEO Redundancy docs (r9, dsr8MR3)

- Overview of the feature
- Procedure of the feature configuration

Story: 2010852
Task: 48493

Change-Id: If5fd6792adbb7e77ab2e92f29527c951be0134ee
Signed-off-by: Litao Gao <litao.gao@windriver.com>
Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>

@@ -23,6 +23,8 @@
 .. |os-prod-hor| replace:: OpenStack |prod-hor|
 .. |prod-img| replace:: https://mirror.starlingx.windriver.com/mirror/starlingx/
 .. |prod-abbr| replace:: StX
+.. |prod-dc-geo-red| replace:: Distributed Cloud Geo Redundancy
+.. |prod-dc-geo-red-long| replace:: Distributed Cloud System controller Geographic Redundancy

 .. Guide names; will be formatted in italics by default.
 .. |node-doc| replace:: :title:`StarlingX Node Configuration and Management`

@@ -16,6 +16,12 @@ system data backup file has been generated on the subcloud, it will be
 transferred to the system controller and stored at a dedicated central location
 ``/opt/dc-vault/backups/<subcloud-name>/<release-version>``.

+.. note::
+
+   Enabling the GEO Redundancy function will affect some of the subcloud
+   backup functions. For more information on GEO Redundancy and its
+   restrictions, see :ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`.
+
 Backup data creation requires the subcloud to be online, managed, and in
 healthy state.

@@ -0,0 +1,617 @@

.. _configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662:

============================================================
Configure Distributed Cloud System Controller GEO Redundancy
============================================================

.. rubric:: |context|

You can configure distributed cloud System Controller GEO Redundancy using
dcmanager |CLI| commands.

System administrators can follow the procedures below to enable and disable
the GEO Redundancy feature.

.. note::

   In this release, the GEO Redundancy feature supports only two distributed
   clouds in one protection group.

.. contents::
   :local:
   :depth: 1

---------------------
Enable GEO Redundancy
---------------------

Set up a protection group for two distributed clouds, making these two
distributed clouds operational in 1+1 active GEO Redundancy mode.

For example, let us assume we have two distributed clouds, site A and site B.
When the operation is performed on site A, the local site is site A and the
peer site is site B. When the operation is performed on site B, the local
site is site B and the peer site is site A.

.. rubric:: |prereq|

The peer system controllers' |OAM| networks must be accessible to each other,
and each system controller must be able to access the subclouds via both the
|OAM| and management networks. A quick reachability check is sketched below.
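
For example, a minimal reachability check from each system controller toward
the peer's Keystone endpoint might look like the following. This is an
illustrative sketch only; it assumes ``curl`` is available on the controller
and uses the sample site B endpoint address from this procedure:

.. code-block:: bash

   # On site A, confirm the site B Keystone endpoint is reachable over OAM
   ~(keystone_admin)]$ curl -sS http://10.10.10.2:5000/v3
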

For the security of a production system, it is important that queries from the
peer site can be authenticated and protected. To meet this objective, it is
essential to have an HTTPS-based system API in place. This necessitates the
presence of a well-known and trusted |CA| to enable secure HTTPS communication
between peers. If you are using an internally trusted |CA|, ensure that the
system trusts the |CA| by installing its certificate with the following
command.

.. code-block:: none

   ~(keystone_admin)]$ system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>

where:

``<trusted-ca-bundle-pem-file>``
   is the path to the intermediate or Root |CA| certificate associated
   with the |prod| REST API's Intermediate or Root |CA|-signed certificate.

.. rubric:: |proc|

You can enable the GEO Redundancy feature between site A and site B from the
command line. In this procedure, the subclouds managed by site A are
configured to be managed by a GEO Redundancy protection group that consists of
site A and site B. When site A goes offline for some reason, an alarm notifies
the administrator, who initiates the group-based batch migration to rehome the
subclouds of site A to site B for centralized management.

Similarly, you can configure the subclouds managed by site B to be taken over
by site A when site B is offline by following the same procedure, with site B
as the local site and site A as the peer site.

#. Log in to the active controller node of site B and get the required
   information about site B to create a protection group:

   * Unique |UUID| of the central cloud of the peer system controller
   * URI of the Keystone endpoint of the peer system controller
   * Gateway IP address of the management network of the peer system controller

   For example:

   .. code-block:: bash

      # On site B
      sysadmin@controller-0:~$ source /etc/platform/openrc
      ~(keystone_admin)]$ system show | grep -i uuid
      | uuid | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a |

      ~(keystone_admin)]$ openstack endpoint list --service keystone \
      --interface public --region RegionOne -c URL
      +------------------------+
      | URL                    |
      +------------------------+
      | http://10.10.10.2:5000 |
      +------------------------+

      ~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
      gateway
      10.10.27.1

#. Log in to the active controller node of the central cloud of site A. Create
   a System Peer instance of site B on site A so that site A can access
   information about site B.

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ dcmanager system-peer add \
      --peer-uuid 223fcb30-909d-4edf-8c36-1aebc8e9bd4a \
      --peer-name siteB \
      --manager-endpoint http://10.10.10.2:5000 \
      --peer-controller-gateway-address 10.10.27.1
      Enter the admin password for the system peer:
      Re-enter admin password to confirm:

      +----+--------------------------------------+-----------+------------------------+----------------------------+
      | id | peer uuid                            | peer name | manager endpoint       | controller gateway address |
      +----+--------------------------------------+-----------+------------------------+----------------------------+
      | 2  | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a | siteB     | http://10.10.10.2:5000 | 10.10.27.1                 |
      +----+--------------------------------------+-----------+------------------------+----------------------------+

#. Collect the same information from site A. The output shown after these
   commands illustrates the site A values used in the rest of this procedure.

   .. code-block:: bash

      # On site A
      sysadmin@controller-0:~$ source /etc/platform/openrc
      ~(keystone_admin)]$ system show | grep -i uuid
      ~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
      ~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
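
   For example, the commands on site A might return output like the following.
   The |UUID|, endpoint, and gateway values are the illustrative site A values
   used in the next step; the output layout mirrors the site B example above.

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ system show | grep -i uuid
      | uuid | 3963cb21-c01a-49cc-85dd-ebc1d142a41d |

      ~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
      +------------------------+
      | URL                    |
      +------------------------+
      | http://10.10.11.2:5000 |
      +------------------------+

      ~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
      gateway
      10.10.25.1
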

#. Log in to the active controller node of the central cloud of site B. Create
   a System Peer instance of site A on site B so that site B has information
   about site A.

   .. code-block:: bash

      # On site B
      ~(keystone_admin)]$ dcmanager system-peer add \
      --peer-uuid 3963cb21-c01a-49cc-85dd-ebc1d142a41d \
      --peer-name siteA \
      --manager-endpoint http://10.10.11.2:5000 \
      --peer-controller-gateway-address 10.10.25.1
      Enter the admin password for the system peer:
      Re-enter admin password to confirm:
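
   The command prints a summary table similar to the one shown for site A. For
   example (the ``id`` value is illustrative and matches the ``system_peer_id``
   reported later in the non-primary association on site B):

   .. code-block:: bash

      +----+--------------------------------------+-----------+------------------------+----------------------------+
      | id | peer uuid                            | peer name | manager endpoint       | controller gateway address |
      +----+--------------------------------------+-----------+------------------------+----------------------------+
      | 1  | 3963cb21-c01a-49cc-85dd-ebc1d142a41d | siteA     | http://10.10.11.2:5000 | 10.10.25.1                 |
      +----+--------------------------------------+-----------+------------------------+----------------------------+
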

#. Create a |SPG| for site A.

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1

#. Add the subclouds needed for redundancy protection on site A.

   Ensure that the subclouds' bootstrap data is up to date. The bootstrap data
   is the data used to bootstrap the subcloud, which includes the |OAM| and
   management network information, the system controller gateway information,
   and the Docker registry information needed to pull the images required to
   bootstrap the system.

   For an example of a typical bootstrap file, see
   :ref:`installing-and-provisioning-a-subcloud`; an abbreviated sketch is also
   shown below.
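
   The following is an abbreviated, illustrative sketch of a bootstrap values
   file. The field names and addresses are examples only and may not match
   your deployment; treat the page referenced above as the authoritative
   format.

   .. code-block:: yaml

      # Illustrative subcloud bootstrap values (sketch only)
      name: subcloud1
      description: "Subcloud protected by GEO Redundancy"
      management_subnet: 192.168.101.0/24
      management_start_address: 192.168.101.2
      management_end_address: 192.168.101.50
      management_gateway_address: 192.168.101.1
      external_oam_subnet: 10.10.20.0/24
      external_oam_gateway_address: 10.10.20.1
      external_oam_floating_address: 10.10.20.12
      systemcontroller_gateway_address: 192.168.204.101
      docker_registries:
        defaults:
          url: registry.local:9001   # example registry override
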

#. Update the subcloud information with the bootstrap values.

   .. code-block:: bash

      ~(keystone_admin)]$ dcmanager subcloud update subcloud1 \
      --bootstrap-address <Subcloud_OAM_IP_Address> \
      --bootstrap-values <Path_of_Bootstrap-Value-File>

#. Update the subcloud information with the |SPG| created locally.

   .. code-block:: bash

      ~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud1-Name> \
      --peer-group <SiteA-Subcloud-Peer-Group-ID-or-Name>

   For example:

   .. code-block:: bash

      ~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1

#. If you want to remove a subcloud from the |SPG|, run the following command:

   .. code-block:: bash

      ~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud-Name> --peer-group none

   For example:

   .. code-block:: bash

      ~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group none

#. Check the subclouds that are under the |SPG|.

   .. code-block:: bash

      ~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-ID-or-Name>

#. Create an association between the System Peer and the |SPG|.

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ dcmanager peer-group-association add \
      --system-peer-id <SiteB-System-Peer-ID> \
      --peer-group-id <SiteA-System-Peer-Group1> \
      --peer-group-priority <priority>

   The ``peer-group-priority`` parameter accepts an integer value greater than
   0. It sets the priority of the copy of the |SPG| that is created on the peer
   site, using the peer site's dcmanager API, during association
   synchronization. A concrete invocation is shown after the notes below.

   * The default priority of the |SPG| is 0 when it is created on the local
     site.

   * The smallest integer has the highest priority.
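
   For example, using the site B system peer (ID ``2``) and ``group1`` (peer
   group ID ``1``) created earlier in this procedure, with priority ``2``:

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ dcmanager peer-group-association add \
      --system-peer-id 2 \
      --peer-group-id 1 \
      --peer-group-priority 2
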

   During the association creation, the |SPG| in the association, and the
   subclouds belonging to it, are synchronized from the local site to the peer
   site.

   Confirm that the local |SPG| and its subclouds have been synchronized to
   site B with the same name.

   * Show the association information just created on site A and ensure that
     ``sync_status`` is ``in-sync``.

     .. code-block:: bash

        # On site A
        ~(keystone_admin)]$ dcmanager peer-group-association list <Association-ID>

        +----+---------------+----------------+---------+-------------+---------------------+
        | id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
        +----+---------------+----------------+---------+-------------+---------------------+
        | 1  | 1             | 2              | primary | in-sync     | 2                   |
        +----+---------------+----------------+---------+-------------+---------------------+

   * Show the ``subcloud-peer-group`` on site B and ensure that it has been
     created.

   * List the subclouds in the ``subcloud-peer-group`` on site B and ensure
     that all of them have been synchronized as secondary subclouds.

     .. code-block:: bash

        # On site B
        ~(keystone_admin)]$ dcmanager subcloud-peer-group show <SiteA-Subcloud-Peer-Group-Name>
        ~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-Name>

   When you create the primary association on site A, a non-primary association
   is automatically created on site B to associate the synchronized |SPG| from
   site A with the system peer pointing to site A.

   You can check the association list to confirm that the non-primary
   association was created on site B.

   .. code-block:: bash

      # On site B
      ~(keystone_admin)]$ dcmanager peer-group-association list
      +----+---------------+----------------+-------------+-------------+---------------------+
      | id | peer_group_id | system_peer_id | type        | sync_status | peer_group_priority |
      +----+---------------+----------------+-------------+-------------+---------------------+
      | 2  | 26            | 1              | non-primary | in-sync     | None                |
      +----+---------------+----------------+-------------+-------------+---------------------+

#. (Optional) Update the protection group related configuration.

   After the peer group association has been created, you can still update the
   related resources configured in the protection group:

   * Update a subcloud with bootstrap values
   * Add subcloud(s) to the |SPG|
   * Remove subcloud(s) from the |SPG|

   After any of the above operations, ``sync_status`` changes to
   ``out-of-sync``.

   After the update has been completed, use the :command:`sync` command to push
   the |SPG| changes to the peer site so that the |SPG| stays in the same state
   on both sites.

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ dcmanager peer-group-association sync <SiteA-Peer-Group-Association1-ID>

   .. warning::

      The :command:`dcmanager peer-group-association sync` command must be run
      after any of the following changes:

      - A subcloud is removed from the |SPG| for a subcloud name change.

      - A subcloud is removed from the |SPG| for a subcloud management network
        reconfiguration.

      - A subcloud is updated with one or both of the ``--bootstrap-address``
        and ``--bootstrap-values`` parameters.

   Similarly, verify that the information has been synchronized by showing the
   association information just created on site A and ensuring that
   ``sync_status`` is ``in-sync``.

   .. code-block:: bash

      # On site A
      ~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>

      +----+---------------+----------------+---------+-------------+---------------------+
      | id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
      +----+---------------+----------------+---------+-------------+---------------------+
      | 1  | 1             | 2              | primary | in-sync     | 2                   |
      +----+---------------+----------------+---------+-------------+---------------------+

.. rubric:: |result|

You have configured a GEO Redundancy protection group between site A and
site B. If site A goes offline, the subclouds configured in the |SPG| can be
manually migrated in a batch to site B for centralized management.

----------------------------
Health Monitor and Migration
----------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Peer monitoring and alarming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the peer protection group is formed, if site B loses connectivity to
site A, an alarm is raised on site B.

For example:

.. code-block:: bash

   # On site B
   ~(keystone_admin)]$ fm alarm-list
   +----------+------------------------------------------------+---------------------------------------------+----------+----------------------------+
   | Alarm ID | Reason Text                                    | Entity ID                                   | Severity | Time Stamp                 |
   +----------+------------------------------------------------+---------------------------------------------+----------+----------------------------+
   | 280.004  | Peer siteA is in disconnected state. Following | peer=223fcb30-909d-4edf-8c36-1aebc8e9bd4a   | major    | 2023-08-18T10:25:29.670977 |
   |          | subcloud peer groups are impacted: group1.     |                                             |          |                            |
   +----------+------------------------------------------------+---------------------------------------------+----------+----------------------------+

The administrator can suppress the alarm with the following command:

.. code-block:: bash

   # On site B
   ~(keystone_admin)]$ fm event-suppress --alarm_id 280.004
   +----------+------------+
   | Event ID | Status     |
   +----------+------------+
   | 280.004  | suppressed |
   +----------+------------+

---------
Migration
---------

If site A is down, after receiving the alarm the administrator can choose to
perform the migration on site B, which migrates the subclouds under the |SPG|
from site A to site B.

.. note::

   Before initiating the migration operation, ensure that ``sync_status`` of
   the peer group association is ``in-sync``, so that the latest updates from
   site A have been successfully synchronized to site B. If ``sync_status`` is
   not ``in-sync``, the migration may fail.

.. code-block:: bash

   # On site B
   ~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>

   # For example:
   ~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1

During the batch migration, you can check the status of the migration of each
subcloud in the |SPG| by showing the details of the |SPG| being migrated.

.. code-block:: bash

   # On site B
   ~(keystone_admin)]$ dcmanager subcloud-peer-group status <Subcloud-Peer-Group-ID-or-Name>

After successful migration, the subcloud(s) should be in
``managed/online/complete`` status on site B.

For example:

.. code-block:: bash

   # On site B
   ~(keystone_admin)]$ dcmanager subcloud list
   +----+-----------------+------------+--------------+---------------+---------+---------------+-----------------+
   | id | name            | management | availability | deploy status | sync    | backup status | backup datetime |
   +----+-----------------+------------+--------------+---------------+---------+---------------+-----------------+
   | 45 | subcloud3-node2 | managed    | online       | complete      | in-sync | None          | None            |
   | 46 | subcloud1-node6 | managed    | online       | complete      | in-sync | None          | None            |
   +----+-----------------+------------+--------------+---------------+---------+---------------+-----------------+

--------------
Post Migration
--------------

If site A is restored, the subcloud(s) should be adjusted to
``unmanaged/secondary`` status on site A. The administrator receives an alarm
on site A indicating that the |SPG| is managed by a peer site (site B), because
this |SPG| on site A has the higher priority.

.. code-block:: bash

   ~(keystone_admin)]$ fm alarm-list
   +----------+----------------------------------------------------------------------------+------------------------+----------+----------------------------+
   | Alarm ID | Reason Text                                                                | Entity ID              | Severity | Time Stamp                 |
   +----------+----------------------------------------------------------------------------+------------------------+----------+----------------------------+
   | 280.005  | Subcloud peer group (peer_group_name=group1) is managed by remote system  | subcloud_peer_group=7  | warning  | 2023-09-04T04:51:58.435539 |
   |          | (peer_uuid=223fcb30-909d-4edf-8c36-1aebc8e9bd4a) with lower priority.     |                        |          |                            |
   +----------+----------------------------------------------------------------------------+------------------------+----------+----------------------------+

Then, the administrator can decide if and when to migrate the subcloud(s) back.

.. code-block:: bash

   # On site A
   ~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>

   # For example:
   ~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1

After successful migration, the subcloud status should be back to
``managed/online/complete``.

For example:

.. code-block:: bash

   +----+-----------------+------------+--------------+---------------+---------+---------------+-----------------+
   | id | name            | management | availability | deploy status | sync    | backup status | backup datetime |
   +----+-----------------+------------+--------------+---------------+---------+---------------+-----------------+
   | 33 | subcloud3-node2 | managed    | online       | complete      | in-sync | None          | None            |
   | 34 | subcloud1-node6 | managed    | online       | complete      | in-sync | None          | None            |
   +----+-----------------+------------+--------------+---------------+---------+---------------+-----------------+

Also, the alarm mentioned above is cleared after migrating back.

.. code-block:: bash

   ~(keystone_admin)]$ fm alarm-list

----------------------
Disable GEO Redundancy
----------------------

You can disable the GEO Redundancy feature from the command line.

Ensure that you have a stable environment before disabling the GEO Redundancy
feature, and that the subclouds are managed by the expected site. The checks
sketched below can help confirm this.
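
For example, before tearing down the configuration, you can review the current
subcloud management state and the existing associations with commands already
used earlier in this guide:

.. code-block:: bash

   # On each site
   ~(keystone_admin)]$ dcmanager subcloud list
   ~(keystone_admin)]$ dcmanager peer-group-association list
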

.. rubric:: |proc|

#. Delete the primary association on both sites.

   .. code-block:: bash

      # site A
      ~(keystone_admin)]$ dcmanager peer-group-association delete <SiteA-Peer-Group-Association1-ID>

#. Delete the |SPG|.

   .. code-block:: bash

      # site A
      ~(keystone_admin)]$ dcmanager subcloud-peer-group delete group1

#. Delete the system peer.

   .. code-block:: bash

      # site A
      ~(keystone_admin)]$ dcmanager system-peer delete siteB

      # site B
      ~(keystone_admin)]$ dcmanager system-peer delete siteA

.. rubric:: |result|

You have torn down the protection group between site A and site B.

---------------------------
Backup and Restore Subcloud
---------------------------

You can back up and restore a subcloud in a distributed cloud environment.
However, GEO Redundancy does not support the replication of subcloud backup
files from one site to another.

A subcloud backup is valid only for the current system controller. When a
subcloud is migrated from site A to site B, the existing backup becomes
unavailable. In this case, you can create a new backup of that subcloud on
site B, as sketched below. Subsequently, you can restore the subcloud from
this newly created backup while it is managed under site B.
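
A minimal sketch of creating such a backup with the dcmanager |CLI| follows.
Additional options, such as the subcloud sysadmin password, may be required
depending on your configuration; see the backup reference below for the full
procedure.

.. code-block:: bash

   # On site B, after the subcloud has been migrated and is managed there
   ~(keystone_admin)]$ dcmanager subcloud-backup create --subcloud <subcloud-name>
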

For information on how to back up and restore a subcloud, see
:ref:`backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42`
and :ref:`restore-a-subcloud-group-of-subclouds-from-backup-data-using-dcmanager-cli-f10c1b63a95e`.

-------------------------------------------
Operations Performed on Protected Subclouds
-------------------------------------------

The table below lists the operations that can and cannot be performed on
protected subclouds.

**Primary site**: The site where the |SPG| was created.

**Secondary site**: The peer site to which the subclouds in the |SPG| can be migrated.

**Protected subcloud**: A subcloud that belongs to a |SPG|.

**Local/Unprotected subcloud**: A subcloud that does not belong to any |SPG|.

.. list-table::
   :widths: 15 10 75
   :header-rows: 1

   * - Operation
     - Allow (Y/N/Maybe)
     - Note
   * - Unmanage
     - N
     - Subcloud must be removed from the |SPG| before it can be manually unmanaged.
   * - Manage
     - N
     - Subcloud must be removed from the |SPG| before it can be manually managed.
   * - Delete
     - N
     - Subcloud must be removed from the |SPG| before it can be manually unmanaged and deleted.
   * - Update
     - Maybe
     - Subcloud can only be updated while it is managed in the primary site because the sync
       command can only be issued from the system controller where the |SPG| was created.

       .. warning::

          The subcloud network cannot be reconfigured while the subcloud is being managed by
          the secondary site. If this operation is necessary, perform the following steps:

          #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud.
          #. Update the subcloud.
          #. (Optional) Manually rehome the subcloud to the primary site after it is restored.
          #. (Optional) Re-add the subcloud to the |SPG|.
   * - Rename
     - Y
     - - If the subcloud in the primary site is already part of a |SPG|, remove it from the
         |SPG|, then unmanage, rename, and manage the subcloud, add it back to the |SPG|, and
         perform the sync operation.

       - If the subcloud is in the secondary site, perform the following steps:

         #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud.
         #. Unmanage the subcloud.
         #. Rename the subcloud.
         #. (Optional) Manually rehome the subcloud to the primary site after it is restored.
         #. (Optional) Re-add the subcloud to the |SPG|.
   * - Patch
     - Y
     - .. warning::

          There may be a patch out-of-sync alarm when the subcloud is migrated to another site.
   * - Upgrade
     - Y
     - All the system controllers in the protection group must be upgraded before upgrading
       any of the subclouds.
   * - Rehome
     - N
     - Subcloud cannot be manually rehomed while it is part of the |SPG|.
   * - Backup
     - Y
     -
   * - Restore
     - Maybe
     - - If the subcloud in the primary site is already part of a |SPG|, remove it from the
         |SPG|, then unmanage and restore the subcloud, add it back to the |SPG|, and perform
         the sync operation.

       - If the subcloud is in the secondary site, perform the following steps:

         #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud.
         #. Unmanage the subcloud.
         #. Restore the subcloud from the backup.
         #. (Optional) Manually rehome the subcloud to the primary site after it is restored.
         #. (Optional) Re-add the subcloud to the |SPG|.
   * - Prestage
     - Y
     - .. warning::

          The prestage data will get overwritten because it is not guaranteed that both system
          controllers always run on the same patch level (ostree repo) and/or have the same
          images list.
   * - Reinstall
     - Y
     -
   * - Remove from |SPG|
     - Maybe
     - Subcloud can be removed from the |SPG| in the primary site. Subcloud can only be removed
       from the |SPG| in the secondary site if the primary site is currently down.
   * - Add to |SPG|
     - Maybe
     - Subcloud can only be added to the |SPG| in the primary site, as a manual sync is required.

BIN  doc/source/dist_cloud/kubernetes/figures/dcg1695034653874.png (new binary file, 34 KiB; not shown)

@@ -175,6 +175,16 @@ Upgrade Orchestration for Distributed Cloud SubClouds
    failure-prior-to-the-installation-of-n-plus-1-load-on-a-subcloud
    failure-during-the-installation-or-data-migration-of-n-plus-1-load-on-a-subcloud

+--------------------------------------------------
+Distributed Cloud System Controller GEO Redundancy
+--------------------------------------------------
+
+.. toctree::
+   :maxdepth: 1
+
+   overview-of-distributed-cloud-geo-redundancy
+   configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662
+
 --------
 Appendix
 --------

@@ -0,0 +1,118 @@

.. eho1558617205547
.. _overview-of-distributed-cloud-geo-redundancy:

============================================
Overview of Distributed Cloud GEO Redundancy
============================================

|prod-long| |prod-dc-geo-red| configuration supports the ability to recover
from a catastrophic event that requires subclouds to be rehomed away from the
failed system controller site to the available site(s) that have enough spare
capacity. This way, even if the failed site cannot be restored in a short
time, the subclouds can still be rehomed to available peer system
controller(s) for centralized management.

In this configuration, the following items are addressed:

* 1+1 GEO redundancy

  - Active-Active redundancy model
  - Total number of subclouds should not exceed 1000

* Automated operations

  - Synchronization and liveness check between peer systems
  - Alarm generation if the peer system controller is down

* Manual operations

  - Batch rehoming from the surviving peer system controller

---------------------------------------------
Distributed Cloud GEO Redundancy Architecture
---------------------------------------------

The 1+1 Distributed Cloud GEO Redundancy architecture consists of two local
high availability Distributed Cloud clusters. They are the mutual peers that
form a protection group, as illustrated in the figure below:

.. image:: figures/dcg1695034653874.png

The architecture features a synchronized distributed control plane for
geographic redundancy, where a system peer instance is created in each local
Distributed Cloud cluster, pointing to the other via Keystone endpoints, to
form a system protection group.

If the administrator wants the peer site to take over the subclouds when the
local system controller is in a failure state, a |SPG| needs to be created and
subclouds need to be assigned to it. Then, a Peer Group Association needs to
be created to link the system peer and the |SPG| together. The |SPG|
information and the subclouds in it are synchronized to the peer site via the
endpoint information stored in the system peer instance. The corresponding
dcmanager commands are sketched below.
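
A minimal sketch of that workflow, using illustrative names and placeholder
values (see
:ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`
for the full procedure):

.. code-block:: bash

   # On the local site: register the peer system controller
   ~(keystone_admin)]$ dcmanager system-peer add --peer-uuid <peer-uuid> \
   --peer-name siteB --manager-endpoint http://<peer-oam-ip>:5000 \
   --peer-controller-gateway-address <peer-mgmt-gateway>

   # Group the subclouds to be protected and assign them to the group
   ~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1
   ~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1

   # Associate the group with the system peer to trigger synchronization
   ~(keystone_admin)]$ dcmanager peer-group-association add \
   --system-peer-id <system-peer-id> --peer-group-id <peer-group-id> \
   --peer-group-priority <priority>
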

The peer sites perform health checks via the endpoint information stored in
the system peer instance. If the local site detects that the peer site is not
reachable, it raises an alarm to alert the administrator.

If the failed site cannot be restored quickly, the administrator needs to
initiate batch subcloud migration by performing a migration of the |SPG| from
the healthy peer of the failed site.

When the failed site has been restored and is ready for service, the
administrator can initiate the batch subcloud migration from the restored site
to migrate back all the subclouds in the |SPG| for geographic proximity.

**Protection Group**
    A group of peer sites that is configured to monitor each other and decide
    how to take over the subclouds (based on a predefined |SPG|) if any peer
    in the group fails.

**System Peer**
    A logical entity created in a system controller site. The system
    controller site uses the information (Keystone endpoint, credentials)
    stored in the system peer for health checks and data synchronization.

**Subcloud Secondary Deploy State**
    A newly introduced state for a subcloud. If a subcloud is in the secondary
    deploy state, the subcloud instance is only a placeholder holding the
    configuration parameters, which can be used to migrate the corresponding
    subcloud from the peer site. After rehoming, the subcloud's state changes
    from secondary to complete, and the subcloud is managed by the local site.
    The subcloud instance on the peer site is changed to secondary.

**Subcloud Peer Group**
    A group of locally managed subclouds that is duplicated into a peer site
    as secondary subclouds. The |SPG| instance is also created in the peer
    site, and it contains all the secondary subclouds just duplicated.

    Multiple |SPGs| are supported, and the membership of a |SPG| is decided by
    the administrator. This way, the administrator can divide local subclouds
    into different groups.

    A |SPG| can be used to initiate subcloud batch migration. For example,
    when the peer site has been detected to be down and the local site is
    supposed to take over the management of the subclouds in the failed peer
    site, the administrator can perform a |SPG| migration to migrate all the
    subclouds in the |SPG| to the local site for centralized management.

**Subcloud Peer Group Priority**
    The priority is an attribute of the |SPG| instance, and the |SPG| is
    designed to be synchronized to each peer site in the protection group with
    a different priority value.

    In a Protection Group, there can be multiple System Peers. The site that
    owns the |SPG| with the highest priority (smallest value) is the leader
    site, which needs to initiate the batch migration to take over the
    subclouds grouped by the |SPG|.

**Subcloud Peer Group and System Peer Association**
    Association refers to the binding relationship between a |SPG| and a
    system peer. When the association between a |SPG| and a system peer is
    created on the local site, the |SPG| and the subclouds in the group are
    duplicated to the peer site to which the system peer in this association
    is pointing. This way, when the local site is down, the peer site has
    enough information to initiate the |SPG| based batch migration to take
    over the centralized management of the subclouds previously managed by the
    failed site.

    One system peer can be associated with multiple |SPGs|. One |SPG| can be
    associated with multiple system peers, with a priority specified for each.
    This priority is used to decide which peer site takes over the subclouds
    when a batch migration must be performed.

@@ -17,6 +17,12 @@ controller using the rehoming playbook.
 The rehoming playbook does not work with freshly installed/bootstrapped
 subclouds.

+.. note::
+
+   Manual rehoming is not possible if a subcloud is included in an |SPG|.
+   Use the :command:`dcmanager subcloud-peer-group migrate` command for
+   automatic rehoming. For more information, see :ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`.
+
 .. note::

    The system time should be accurately configured on the system controllers
@@ -27,7 +33,7 @@ controller using the rehoming playbook.
 Do not rehome a subcloud if the RECONCILED status on the system resource or
 any host resource of the subcloud is FALSE. To check the RECONCILED status,
 run the :command:`kubectl -n deployment get system` and :command:`kubectl -n deployment get hosts` commands.

 Use the following procedure to enable subcloud rehoming and to update the new
 subcloud configuration (networking parameters, passwords, etc.) to be
 compatible with the new system controller.