From 7fcec920d2974120fef97f765c2dba4c7353d9d9 Mon Sep 17 00:00:00 2001 From: Juanita-Balaraj Date: Fri, 23 Apr 2021 16:59:39 -0400 Subject: [PATCH] Remote Redfish Subcloud Restore Fixed Merge conflicts Fixed review comments for patchset 8 Fixed review comments for patchset 7 Fixed review comments for Patchset 4 Moved restoring-subclouds-from-backupdata-using-dcmanager to the Distributed Cloud Guide Added missing files. Story: 2008573 Task: 42332 Signed-off-by: Juanita-Balaraj Change-Id: Ife0319125df38c54fb0baa79ac32070446a0d605 Signed-off-by: Juanita-Balaraj (cherry picked from commit e2e42814e6a18eb7cf61f047ec07628707a1ac33) Signed-off-by: Ron Stone --- ...n-aiosx-subcloud-to-an-aiodx-subcloud.rest | 0 doc/source/backup/.vscode/settings.json | 3 + ...ring-starlingx-system-data-and-storage.rst | 22 +- ...ore-playbook-locally-on-the-controller.rst | 25 ++- ...ning-ansible-restore-playbook-remotely.rst | 17 +- doc/source/dist_cloud/index.rst | 3 + ...an-aiosx-subcloud-to-an-aiodx-subcloud.rst | 196 ++++++++++++++++++ ...clouds-from-backupdata-using-dcmanager.rst | 113 ++++++++++ 8 files changed, 361 insertions(+), 18 deletions(-) create mode 100644 doc/source/_includes/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rest create mode 100644 doc/source/backup/.vscode/settings.json create mode 100644 doc/source/dist_cloud/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rst create mode 100644 doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst diff --git a/doc/source/_includes/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rest b/doc/source/_includes/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rest new file mode 100644 index 000000000..e69de29bb diff --git a/doc/source/backup/.vscode/settings.json b/doc/source/backup/.vscode/settings.json new file mode 100644 index 000000000..3cce948f6 --- /dev/null +++ b/doc/source/backup/.vscode/settings.json @@ -0,0 +1,3 @@ +{ + "restructuredtext.confPath": "" +} \ No newline at end of file diff --git a/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst b/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst index 8dce62318..836f89db6 100644 --- a/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst +++ b/doc/source/backup/kubernetes/restoring-starlingx-system-data-and-storage.rst @@ -28,24 +28,34 @@ specific applications must be re-applied once a storage cluster is configured. To restore the data, use the same version of the boot image \(ISO\) that was used at the time of the original installation. -The |prod| restore supports two modes: +The |prod| restore supports the following optional modes: .. _restoring-starlingx-system-data-and-storage-ol-tw4-kvc-4jb: -#. To keep the Ceph cluster data intact \(false - default option\), use the - following syntax, when passing the extra arguments to the Ansible Restore +- To keep the Ceph cluster data intact \(false - default option\), use the + following parameter, when passing the extra arguments to the Ansible Restore playbook command: .. code-block:: none wipe_ceph_osds=false -#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will - need to be recreated, use the following syntax: +- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will + need to be recreated, use the following parameter: .. 
code-block:: none - wipe_ceph_osds=true + wipe_ceph_osds=true + +- To indicate that the backup data file is under /opt/platform-backup + directory on the local machine, use the following parameter: + + .. code-block:: none + + on_box_data=true + + If this parameter is set to **false**, the Ansible Restore playbook expects + both the **initial_backup_dir** and **backup_filename** to be specified. Restoring a |prod| cluster from a backup file is done by re-installing the ISO on controller-0, running the Ansible Restore Playbook, applying updates diff --git a/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst b/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst index 8834c3f3a..23a70aa7a 100644 --- a/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst +++ b/doc/source/backup/kubernetes/running-restore-playbook-locally-on-the-controller.rst @@ -18,22 +18,20 @@ following command to run the Ansible Restore playbook: ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir= admin_password= wipe_ceph_osds=" -The |prod| restore supports two optional modes, keeping the Ceph cluster data -intact or wiping the Ceph cluster. - -.. rubric:: |proc| +The |prod| restore supports the following optional modes, keeping the Ceph +cluster data intact or wiping the Ceph cluster. .. _running-restore-playbook-locally-on-the-controller-steps-usl-2c3-pmb: -#. To keep the Ceph cluster data intact \(false - default option\), use the - following command: +- To keep the Ceph cluster data intact \(false - default option\), use the + following parameter: .. code-block:: none wipe_ceph_osds=false -#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will - need to be recreated, use the following command: +- To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will + need to be recreated, use the following parameter: .. code-block:: none @@ -50,12 +48,23 @@ intact or wiping the Ceph cluster. the patches and prompt you to reboot the system. Then you will need to re-run Ansible Restore playbook. +- To indicate that the backup data file is under /opt/platform-backup + directory on the local machine, use the following parameter: + + .. code-block:: none + + on_box_data=true + + If this parameter is set to **false**, the Ansible Restore playbook expects + both the **initial_backup_dir** and **backup_filename** to be specified. + .. rubric:: |postreq| After running restore\_platform.yml playbook, you can restore the local registry images. .. note:: + The backup file of the local registry images may be large. Restore the backed up file on the controller, where there is sufficient space. diff --git a/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst b/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst index ca19932e6..748c85b8e 100644 --- a/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst +++ b/doc/source/backup/kubernetes/system-backup-running-ansible-restore-playbook-remotely.rst @@ -51,18 +51,27 @@ In this method you can run Ansible Restore playbook and point to controller-0. 
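For illustration, a remote invocation might look like the following sketch. The
inventory file, target host name, password, and backup file name below are
placeholder values, and the playbook path shown is the one used elsewhere in
this guide for on-controller execution; adjust all of these to match your
remote workstation and subcloud before running the command. ``ansible_become_pass``
is a standard Ansible privilege-escalation variable and is shown here only as
an example, not a requirement of this procedure.

.. code-block:: none

   # Hypothetical inventory file, host name, and extra-vars values, shown for illustration only
   $ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml --limit subcloud1-controller-0 -i $HOME/hosts -e "ansible_become_pass=St8rlingX* admin_password=St8rlingX* initial_backup_dir=$HOME/backups backup_filename=subcloud1_platform_backup.tgz wipe_ceph_osds=false on_box_data=false"
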
where optional-extra-vars can be: - - **Optional**: You can select one of the two restore modes: + - **Optional**: You can select one of the following restore modes: - To keep Ceph data intact \(false - default option\), use the - following syntax: + following parameter: :command:`wipe_ceph_osds=false` - - Start with an empty Ceph cluster \(true\), to recreate a new - Ceph cluster, use the following syntax: + - To start with an empty Ceph cluster \(true\), where the Ceph + cluster will need to be recreated, use the following parameter: :command:`wipe_ceph_osds=true` + - To indicate that the backup data file is under /opt/platform-backup + directory on the local machine, use the following parameter: + + :command:`on_box_data=true` + + If this parameter is set to **false**, the Ansible Restore playbook + expects both the **initial_backup_dir** and **backup_filename** + to be specified. + - The backup\_filename is the platform backup tar file. It must be provided using the ``-e`` option on the command line, for example: diff --git a/doc/source/dist_cloud/index.rst b/doc/source/dist_cloud/index.rst index cfb3ddeed..48649f5aa 100644 --- a/doc/source/dist_cloud/index.rst +++ b/doc/source/dist_cloud/index.rst @@ -48,6 +48,9 @@ Operation managing-ldap-linux-user-accounts-on-the-system-controller changing-the-admin-password-on-distributed-cloud updating-docker-registry-credentials-on-a-subcloud + migrate-an-aiosx-subcloud-to-an-aiodx-subcloud + restoring-subclouds-from-backupdata-using-dcmanager + ---------------------- Manage Subcloud Groups diff --git a/doc/source/dist_cloud/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rst b/doc/source/dist_cloud/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rst new file mode 100644 index 000000000..d1abad68a --- /dev/null +++ b/doc/source/dist_cloud/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rst @@ -0,0 +1,196 @@ + +.. _migrate-an-aiosx-subcloud-to-an-aiodx-subcloud: + +--------------------------------------- +Migrate an AIO-SX to an AIO-DX Subcloud +--------------------------------------- + +|release-caveat| + +.. rubric:: |context| + +You can migrate an |AIO-SX| subcloud to an |AIO-DX| subcloud without +reinstallation. This operation involves updating the system mode, adding the +|OAM| unit IP addresses of each controller, and installing the second controller. + +.. rubric:: |prereq| + +A distributed cloud system is setup with at least a system controller and an +|AIO-SX| subcloud. The subcloud must be online and managed by dcmanager. +Both the management network and cluster-host network need to be configured and +cannot be on the loopback interface. + +====================================== +Reconfigure the Cluster-Host Interface +====================================== + +If the cluster-host interface is on the loopback interface, use the following +procedure to reconfigure the cluster-host interface on to a physical interface. + +.. rubric:: |proc| + +#. Lock the active controller. + + .. code-block:: none + + ~(keystone_admin)$ system host-lock controller-0 + +#. Change the class attribute to 'none' for the loopback interface. + + .. code-block:: none + + ~(keystone_admin)$ system host-if-modify controller-0 lo -c none + +#. Delete the current cluster-host interface-network configuration + + .. code-block:: none + + ~(keystone_admin)$ IFNET_UUID=$(system interface-network-list controller-0 | awk '{if ($8 =="cluster-host") print $4;}') + ~(keystone_admin)$ system interface-network-remove $IFNET_UUID + +#. 
Assign the cluster-host network to the new interface. This example assumes + the interface name is mgmt0. + + .. code-block:: none + + ~(keystone_admin)$ system interface-network-assign controller-0 mgmt0 cluster-host + +.. rubric:: |postreq| + +Continue with the |AIO-SX| to |AIO-DX| subcloud migration, using one of the +following procedures: + +Use Ansible Playbook to Migrate a Subcloud from AIO-SX to AIO-DX, or +Manually Migrate a Subcloud from AIO-SX to AIO-DX. + + +.. _use-ansible-playbook-to-migrate-a-subcloud-from-AIO-SX-to-AIO-DX: + +================================================================ +Use Ansible Playbook to Migrate a Subcloud from AIO-SX to AIO-DX +================================================================ + +Use the following procedure to migrate a subcloud from |AIO-SX| to |AIO-DX| +using the ansible playbook. + +.. rubric:: |prereq| + +- the subcloud must be online and managed from the System Controller +- the subcloud's controller-0 may be locked or unlocked; the ansible playbook + will lock the subcloud controller-0 as part of migrating the subcloud + + +.. rubric:: |proc| + +#. Create a configuration file and specify the |OAM| unit IP addresses and + the ansible ssh password in the **migrate-subcloud1-overrides-EXAMPLE.yml** + file. The existing |OAM| IP address of the |AIO-SX| system will be used as + the |OAM| floating IP address of the new |AIO-DX| system. + + In the following example, 10.10.10.13 and 10.10.10.14 are the new |OAM| unit + IP addresses for controller-0 and controller-1 respectively. + + .. code-block:: none + + { + "ansible_ssh_pass": "St8rlingX*", + "external_oam_node_0_address": "10.10.10.13", + "external_oam_node_1_address": "10.10.10.14", + } + +#. On the system controller, run the ansible playbook to migrate the |AIO-SX| + subcloud to an |AIO-DX|. + + For example, if the subcloud name is 'subcloud1', enter: + + .. code-block:: none + + ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/migrate_sx_to_dx.yml -e @migrate-subcloud1-overrides-EXAMPLE.yml -i subcloud1, -v + + The ansible playbook will lock the subcloud's controller-0, if it not + already locked, apply the configuration changes to convert the subcloud to + an |AIO-DX| system with a single controller, and unlock controller-0. + Wait for the controller to reset and come back up to an operational state. + +#. Software install and configure the second controller for the subcloud. + + From the System Controller, reconfigure the subcloud, using dcmanager. + Specify the sysadmin password and the deployment configuration file, using + the :command:`dcmanager subcloud reconfig` command. + + .. code-block:: none + + ~(keystone_admin)$ dcmanager subcloud reconfig --sysadmin-password --deploy-config deployment-config-subcloud1-duplex.yaml + + where ** is assumed to be the login password and + ** is the name of the subcloud + + .. note:: + + ``--deploy-config`` must reference a deployment configuration file for + a |AIO-DX| subcloud. + + For example, **deployment-config-subcloud1-duplex.yaml** should only + include changes for controller-1 as changing fields for other nodes/ + resources may cause them to go out of sync. + +.. only:: partner + + .. include:: /_includes/migrate-an-aiosx-subcloud-to-an-aiodx-subcloud.rest + + +.. 
_manually-migrate-a-subcloud-from-AIO-SX-to-AIO-DX: + +================================================= +Manually Migrate a Subcloud from AIO-SX to AIO-DX +================================================= + +As an alternative to using the Ansible playbook, use the following procedure +to manually migrate a subcloud from |AIO-SX| to |AIO-DX|. Perform the following +commands on the |AIO-SX| subcloud. + +.. rubric:: |proc| + +#. If not already locked, lock the active controller. + + .. code-block:: none + + ~(keystone_admin)$ system host-lock controller-0 + +#. Change the system mode to 'duplex'. + + .. code-block:: none + + ~(keystone_admin)$ system modify --system_mode=duplex + +#. Add the |OAM| unit IP addresses of controller-0 and controller-1. + + For example, the |OAM| subnet is 10.10.10.0/24 and uses 10.10.10.13 and + 10.10.10.14 for the unit IP addresses of controller-0 and controller-1 + respectively. The existing |OAM| IP address of the |AIO-SX| system will be + used as the OAM floating IP address of the new |AIO-DX| system. + + .. note:: + + Only specifying oam_c0_ip and oam_c1_ip is necessary to configure the + OAM unit IPs to transition to Duplex. However, oam_c0_ip and oam_c1_ip + cannot equal the current or specified value for oam_floating_ip. + + .. code-block:: none + + ~(keystone_admin)$ system oam-modify oam_subnet=10.10.10.0/24 oam_gateway_ip=10.10.10.1 oam_floating_ip=10.10.10.12 oam_c0_ip=10.10.10.13 oam_c1_ip=10.10.10.14 + +#. Unlock the controller. + + .. code-block:: none + + ~(keystone_admin)$ system host-unlock controller-0 + + Wait for the controller to reset and come back up to an operational state. + +#. Software install and configure the second controller for the subcloud. + + For instructions on installing and configuring controller-1 in an + |AIO-DX| setup to continue with the migration, see |inst-doc|. + + diff --git a/doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst b/doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst new file mode 100644 index 000000000..a4d3f109f --- /dev/null +++ b/doc/source/dist_cloud/restoring-subclouds-from-backupdata-using-dcmanager.rst @@ -0,0 +1,113 @@ + +.. _restoring-subclouds-from-backupdata-using-dcmanager: + +========================================================= +Restoring a Subcloud From Backup Data Using DCManager CLI +========================================================= + +For subclouds with servers that support Redfish Virtual Media Service +(version 1.2 or higher), you can use the Central Cloud's CLI to restore the +subcloud from data that was backed up previously. + +.. rubric:: |context| + +The CLI command :command:`dcmanager subcloud restore` can be used to restore a +subcloud from available system data and bring it back to the operational state +it was in when the backup procedure took place. The subcloud restore has three +phases: + +- Re-install the controller-0 of the subcloud with the current active load + running in the SystemController. For subcloud servers that support + Redfish Virtual Media Service, this phase can be carried out remotely + as part of the CLI. + +- Run Ansible Platform Restore to restore |prod|, from a previous backup on + the controller-0 of the subcloud. This phase is also carried out as part + of the CLI. + +- Unlock the controller-0 of the subcloud and continue with the steps to + restore the remaining nodes of the subcloud where applicable. 
  This phase is carried out by the system administrator; see
  :ref:`Restoring Platform System Data and Storage `.

.. rubric:: |prereq|

- The SystemController is healthy and ready to accept **dcmanager**-related
  commands.

- The subcloud is unmanaged and not in the process of installation,
  bootstrap, or deployment.

- The platform backup tar file is already in the /opt/platform-backup
  directory on the subcloud or has been transferred to the
  SystemController.

- The subcloud install values have been saved in the **dcmanager** database,
  that is, the subcloud has been installed remotely as part of
  :command:`dcmanager subcloud add`.

.. rubric:: |proc|

#. Create the restore_values.yaml file that will be passed to the
   :command:`dcmanager subcloud restore` command using the ``--restore-values``
   option. This file contains the parameters that are used during the platform
   restore phase. At a minimum, the **backup_filename** parameter, which
   identifies the file containing a previous backup of the subcloud, must be
   specified in the yaml file. See
   :ref:`Run Ansible Restore Playbook Remotely ` and
   :ref:`Run Restore Playbook Locally on the Controller `
   for the supported restore parameters.

#. Restore the subcloud using the dcmanager CLI command
   :command:`subcloud restore`, specifying the restore values, the
   ``--with-install`` option, and the subcloud's sysadmin password.

   .. code-block:: none

      ~(keystone_admin)]$ dcmanager subcloud restore --restore-values /home/sysadmin/subcloud1-restore.yaml --with-install --sysadmin-password subcloud-name-or-id

   Where:

   - ``--restore-values`` must reference the restore values yaml file
     mentioned in step 1 of this procedure.

   - ``--with-install`` indicates that a re-install of controller-0 of the
     subcloud should be done remotely using the Redfish Virtual Media Service.

   If the ``--sysadmin-password`` option is not specified, the system
   administrator is prompted for the password. The password is masked as it
   is entered. Enter the sysadmin password for the subcloud.
   The :command:`dcmanager subcloud restore` command can take up to 30 minutes
   to reinstall and restore the platform on controller-0 of the subcloud.

#. On the Central Cloud (SystemController), monitor the progress of the
   subcloud reinstall and restore using the deploy status field of the
   :command:`dcmanager subcloud list` command.

   .. code-block:: none

      ~(keystone_admin)]$ dcmanager subcloud list

      +----+-----------+------------+--------------+---------------+---------+
      | id | name      | management | availability | deploy status | sync    |
      +----+-----------+------------+--------------+---------------+---------+
      | 1  | subcloud1 | unmanaged  | online       | installing    | unknown |
      +----+-----------+------------+--------------+---------------+---------+

#. In case of a failure, check the Ansible log for the corresponding subcloud
   under the /var/log/dcmanager/ansible directory.

#. When the subcloud deploy status changes to "complete", controller-0 is
   ready to be unlocked. Log in to controller-0 of the subcloud using its
   bootstrap IP address and unlock the host using the following command.

   .. code-block:: none

      ~(keystone_admin)]$ system host-unlock controller-0

#. For |AIO|-DX and Standard subclouds, follow the procedure in
   :ref:`Restoring Platform System Data and Storage ` to restore the
   remaining nodes of the subcloud.

#. To resume the subcloud audit, use the following command.

   .. code-block:: none

      ~(keystone_admin)]$ dcmanager subcloud manage subcloud-name-or-id
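For reference, a minimal restore values file for the
:command:`dcmanager subcloud restore` procedure above might look like the
following sketch. The file name and values are placeholders; only
**backup_filename** is strictly required, and the remaining entries simply
reuse the restore parameters documented earlier in this guide.

.. code-block:: none

   # Hypothetical example values; backup_filename is the only required parameter
   backup_filename: subcloud1_platform_backup_2021_04_23.tgz
   # Because on_box_data is false here, initial_backup_dir must also be given,
   # as described for the restore parameters above; it is the directory that
   # holds the backup file
   on_box_data: false
   initial_backup_dir: /opt/platform-backup
   wipe_ceph_osds: false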