diff --git a/doc/source/_includes/software-upload-output.rest b/doc/source/_includes/software-upload-output.rest new file mode 100644 index 000000000..756696adb --- /dev/null +++ b/doc/source/_includes/software-upload-output.rest @@ -0,0 +1,10 @@ +.. software-upload-begin +.. software-upload-end + +.. software-load-begin +.. software-load-end + +.. software-upload-precheck-begin +.. software-upload-precheck-end + + diff --git a/doc/source/_includes/upgrading-the-systemcontroller-using-the-cli.rest b/doc/source/_includes/upgrading-the-systemcontroller-using-the-cli.rest index b616af13e..7caa92d7d 100644 --- a/doc/source/_includes/upgrading-the-systemcontroller-using-the-cli.rest +++ b/doc/source/_includes/upgrading-the-systemcontroller-using-the-cli.rest @@ -14,3 +14,9 @@ .. deploymentmanager-begin .. deploymentmanager-end +.. manualupgrade1-begin +.. manualupgrade1-end + +.. manualupgrade2-begin +.. manualupgrade2-end + diff --git a/doc/source/dist_cloud/kubernetes/upgrading-the-systemcontroller-using-the-cli.rst b/doc/source/dist_cloud/kubernetes/upgrading-the-systemcontroller-using-the-cli.rst index 3e5a74ff7..b15308b52 100644 --- a/doc/source/dist_cloud/kubernetes/upgrading-the-systemcontroller-using-the-cli.rst +++ b/doc/source/dist_cloud/kubernetes/upgrading-the-systemcontroller-using-the-cli.rst @@ -5,10 +5,17 @@ Upgrade the System Controller Using the CLI =========================================== -You can upload and apply upgrades to the system controller in order to upgrade -the central repository, from the CLI. The system controller can be upgraded -using either a manual software upgrade procedure or by using the -non-distributed systems :command:`sw-manager` orchestration procedure. +You can upload and apply a software upgrade (deploy a major release or a patched +major release) to the system controller using the CLI. 
The software upgrade +upgrades not only the system controller's software but also the software +in the system controller's |prod-dc| vault and the central container image +repository, in support of subsequent subcloud upgrades. + +The system controller can be upgraded using either a :ref:`manual software +upgrade ` or the +standalone cloud :ref:`orchestrated software upgrade procedure +` with +:command:`sw-manager`. .. rubric:: |context| @@ -16,9 +23,54 @@ Follow the steps below to manually upgrade the system controller: .. rubric:: |prereq| -- Validate the list of new images with the target release. If you are using a - private registry for installs/upgrades, you must populate your private - registry with the new images prior to bootstrap and/or patch application. +.. only:: starlingx + + - Transfer the ISO and signature files for the new major release (or new + patched major release) from the |prod-long| mirror + https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/iso/ + to controller-0 (active controller). + + - Upgrade to a patched major release (patched ISO). + +.. only:: partner + + .. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest + :start-after: manualupgrade1-begin + :end-before: manualupgrade1-end + +.. only:: starlingx + + - If you are using a private registry (see the ``docker / *-registry`` + sections of :command:`system service-parameter-list`), transfer the container + image versions associated with the new major release (or new patched + major release) using the list from the |prod-long| mirror + https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/docker-images/ + from docker.io to the private registry. + +.. only:: partner + + .. 
include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest + :start-after: manualupgrade2-begin + :end-before: manualupgrade2-end + +- The platform issuer (system-local-ca) is required to have an RSA + certificate/private key pair before upgrading. If ``system-local-ca`` was + configured with a different type of certificate/private key, the deploy + precheck will fail with an informative message. In this case, the + :ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d` + procedure needs to be executed to reconfigure ``system-local-ca`` with the + RSA certificate/private key targeting the ``SystemController`` and all + subclouds. + +- If there are software updates for your current |prod| software release that + are required in order to upgrade to the new software release, these + patches/updates should be applied in a separate software deploy of the + patch release(s) (see :ref:`manual-host-software-deployment-ee17ec6f71a4`) + on the system controller. These patches/updates should also be applied in + an orchestrated software deploy of the subclouds (see + :ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`) so + that all systems are patch current before starting the upgrade + to the new major release on the |prod-dc| system. .. rubric:: |proc| @@ -37,39 +89,35 @@ Follow the steps below to manually upgrade the system controller: :start-after: license-begin :end-before: license-end -#. Transfer iso and signature files to controller-0 (active controller) and import the load. +#. Upload the load. - .. code-block:: none + .. 
only:: starlingx - ~(keystone_admin)]$ software --os-region-name SystemController upload --local .iso .sig - +-------------------------------+-------------------+ - | Uploaded File | Release | - +-------------------------------+-------------------+ - | starlingx-intel-x86-64-cd.iso | starlingx-24.09.0 | - +-------------------------------+-------------------+ + .. 
parsed-literal:: + + ~(keystone_admin)]$ software upload --local /full_path/.iso /full_path/.sig + +-------------------------------+--------------------------+ + | Uploaded File | Release | + +-------------------------------+--------------------------+ + | starlingx-intel-x86-64-cd.iso | stx-10.0.0 | + +-------------------------------+--------------------------+ + + .. only:: partner + + .. include:: /_includes/software-upload-output.rest + :start-after: software-upload-begin + :end-before: software-upload-end + + .. note:: + + Do not use the ``--os-region-name SystemController`` proxy at this point. + The upload for subcloud deployment is performed once the system + controller deploy is complete. .. note:: If you face any issue while importing the load, go to ``/var/log/software.log`` and examine the error messages. - .. note:: - This can take several minutes. After the system controller is successfully - upgraded, the old load (which is in imported state) should not be deleted - from load list otherwise the subcloud upgrade orchestration will fail - with an error. - -#. Apply any required software updates. After the update is installed ensure - controller-0 is active. - - The system controller as well as the subclouds must be 'patch current'. All - software updates related to your current |prod| software release must be - uploaded, applied, and installed. - - All software updates to the new |prod| release, only need to be uploaded - and applied. The install of these software updates will occur automatically - during the software upgrade procedure as the hosts are reset to load the - new release of software. - .. only:: partner .. 
include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest @@ -81,8 +129,7 @@ Follow the steps below to manually upgrade the system controller: Check the current system health status, resolve any alarms and other issues reported by the :command:`software deploy precheck ` command then recheck the system health status to confirm that all **System Health** - fields are set to **OK**. "If the upgrade health query fails 'Boot Device - and Root file system Device' check as seen below:" + fields are set to **OK**. .. code-block:: none @@ -97,32 +144,29 @@ Follow the steps below to manually upgrade the system controller: All kubernetes control plane pods are ready: [OK] All kubernetes applications are in a valid state: [OK] All hosts are patch current: [OK] + Active kubernetes version [vX.XX.X] is a valid supported version: [OK] + Active controller is controller-0: [OK] + Installed license is valid: [OK] Valid upgrade path from release 22.12 to 24.09: [OK] Required patches are applied: [OK] - Where ```` is ``starlingx-24.09.0`` for above software upload - example, or it can be found out by running :command:`software list`. - The platform issuer (system-local-ca) is required to have an RSA - certificate/private key pair before upgrading. If ``system-local-ca`` was - configured with a different type of certificate/private key, the upgrade - pre check will fail with an informative message. In this case, the - :ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d` procedure - needs to be executed to reconfigure ``system-local-ca`` with the RSA - certificate/private key targeting the ``SystemController`` and all subclouds. + .. only:: starlingx - By default, the upgrade process cannot run and is not recommended to run + Where ```` is stx-10.0.0 for above software upload + example, or it can be found out by running :command:`software list`. + + .. only:: partner + + .. 
include:: /_includes/software-upload-output.rest + :start-after: software-upload-precheck-begin + :end-before: software-upload-precheck-end + + By default, the deploy process cannot run and is not recommended to run with active alarms present. It is strongly recommended that you clear your - system of all alarms before doing an upgrade. + system of all alarms before doing a deploy. - .. note:: - - Use the command :command:`system upgrade-start --force` to force the - upgrade process to start and ignore non-management-affecting alarms. - This should ONLY be done if these alarms do not cause an issue for the - upgrades process. - -#. Start the upgrade from controller-0. +#. Begin the deploy from controller-0. Make sure that controller-0 is the active controller, and you are logged into controller-0 as **sysadmin** and your present working directory is @@ -134,54 +178,34 @@ Follow the steps below to manually upgrade the system controller: +--------------+------------+------+--------------+ | From Release | To Release | RR | State | +--------------+------------+------+--------------+ - | 22.12.0 | 24.09.0 | True | deploy-start | + | 22.12.0 | 24.09.100 | True | deploy-start | +--------------+------------+------+--------------+ - When ``deploy start`` is complete: - - .. code-block:: none - - +--------------+------------+------+-------------------+ - | From Release | To Release | RR | State | - +--------------+------------+------+-------------------+ - | 22.12.0 | 24.09.0 | True | deploy-start-done | - +--------------+------------+------+-------------------+ - - This will make a copy of the system data to be used in the upgrade. - Configuration changes must not be made after this point, until the - upgrade is completed. - - The following upgrade state applies once this command is executed. Run the - :command:`system upgrade-show` command to verify the status of the upgrade. - - - - started: - - - State entered after :command:`system upgrade-start` completes. 
- - - Release system data (for example, postgres databases) has - been exported to be used in the upgrade. - - As part of the upgrade, the upgrade process checks the health of the system - and validates that the system is ready for an upgrade. - - The upgrade process checks that no alarms are active before starting an - upgrade. - .. note:: - Use the command :command:`system upgrade-start --force` to force the - upgrades process to start and to ignore management affecting alarms. - This should only be done if these alarms do not cause an issue for the - upgrades process. + It is recommended to run the :command:`software deploy precheck` + command before running :command:`software deploy start`. However, the + :command:`software deploy start` command will automatically run + the precheck command even if the precheck command has not been run + before. - The ``fm alarm-list --mgmt_affecting`` option provides specific alarms - which may be blocking an orchestrated upgrade. + Wait for :command:`software deploy start ` to complete by monitoring the + status of the deploy. - On systems with Ceph storage, it also checks that the Ceph cluster is - healthy. + .. code-block:: none -#. Upgrade controller-1. + ~(keystone_admin)]$ software deploy show + +--------------+------------+------+-------------------+ + | From Release | To Release | RR | State | + +--------------+------------+------+-------------------+ + | 22.12.0 | 24.09.100 | True | deploy-start-done | + +--------------+------------+------+-------------------+ + + :command:`software deploy start ` will migrate configuration + data to the new release's data model. Configuration must not be changed + after this point, until the deploy is completed. + +#. Software deploy controller-1. #. Lock controller-1. @@ -190,10 +214,7 @@ Follow the steps below to manually upgrade the system controller: ~(keystone_admin)]$ system host-lock controller-1 - #. Start the upgrade on controller-1. 
- - Controller-1 installs the update and reboots, then performs data - migration. + #. Begin the deploy on controller-1. .. code-block:: none @@ -211,29 +232,31 @@ Follow the steps below to manually upgrade the system controller: the DRBD sync **400.001** Services-related alarm has been raised and then cleared. - The **upgrading-controllers** state applies when this command is - run. This state is entered after controller-1 has been upgraded to - release nn.nn and data migration is successfully completed. + When the first :command:`software deploy host ` command is + issued after the deploy state becomes ``deploy-start-done``, the + state reported by :command:`software deploy show` changes to + ``deploy-host``. When the software is deployed to all the hosts, that is, + when the :command:`software deploy host ` successfully completes + against the last host, the reported state changes to + ``deploy-host-done``. - where *nn.nn* in the update file name is the |prod| release number. - - If it transitions to **unlocked-disabled-failed**, check the issue - before proceeding to the next step. The alarms may indicate a - configuration error. Check the result of the configuration logs on - controller-1, (for example, Error logs in - controller1:``/var/log/puppet``). + If controller-1 transitions to + **unlocked-disabled-failed**, investigate the issue before proceeding to the + next step. The alarms may indicate a configuration error. Check the + configuration logs on controller-1 (for example, error + logs in controller-1:``/var/log/puppet``). #. Run the :command:`system application-list` and :command:`software deploy host-list` commands to view the current progress. - After controller-1 is unlocked/enabled/available, insert step to check + After controller-1 is unlocked/enabled/available, run the following command to check that controller-1 is running the new release: .. code-block:: none ~(keystone_admin)]$ system host-show controller-1 -#. 
Set controller-1 as the active controller. Swact to controller-1. +#. Set controller-1 as the active controller. Swact away from controller-0. .. code-block:: none @@ -243,12 +266,7 @@ Follow the steps below to manually upgrade the system controller: proceeding to the next step. When all services on controller-1 are enabled-active, the swact is complete. - .. note:: - - Continue the remaining steps below to manually upgrade or use upgrade - orchestration to upgrade the remaining nodes. - -#. Upgrade controller-0. +#. Software deploy controller-0. For more information, see :ref:`introduction-platform-software-updates-upgrades-06d6de90bbd0`. @@ -259,57 +277,28 @@ Follow the steps below to manually upgrade the system controller: ~(keystone_admin)]$ system host-lock controller-0 - #. Upgrade controller-0. + #. Begin the deploy on controller-0. .. code-block:: none ~(keystone_admin)]$ software deploy host controller-0 - - .. note:: - - controller-0 must pxe-boot over the management network and its load - must be served from controller-1, and not from any external - pxe-boot server attached to the |OAM| network. To ensure this, - check that the network boot list/order of BIOS |NIC| is correct. + Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id= #. Unlock controller-0. - .. code-block:: none + .. code-block:: none ~(keystone_admin)]$ system host-unlock controller-0 - .. code-block:: none - - ~(keystone_admin)]$ software deploy host controller-0 - - You may encounter the following error message: - - .. code-block:: none - - Expecting number of interface sriov_numvfs=16. Please wait a few - minutes for inventory update and retry host-unlock. - - If you see this error message, you need to retry after 5 minutes. - - Wait until the DRBD sync **400.001** Services-related alarm has been raised - and then cleared before proceeding to the next step. 
- - - - upgrading-hosts: - - - State entered when both controllers are running release - software. - - #. Check the system health to ensure that there are no unexpected alarms. .. code-block:: none ~(keystone_admin)]$ fm alarm-list - Clear all alarms unrelated to the upgrade process. + Clear all alarms unrelated to the deploy process. -#. If using Ceph storage backend, upgrade the storage nodes one at a time. +#. If using Ceph storage backend, deploy the storage nodes one at a time. The storage node must be locked and all |OSDs| must be down in order to do the upgrade. @@ -323,16 +312,32 @@ Follow the steps below to manually upgrade the system controller: #. Verify that the |OSDs| are down after the storage node is locked. - In the Horizon interface, navigate to **Admin** \> **Platform** \> - **Storage Overview** to view the status of the |OSDs|. + .. code-block:: none - #. Upgrade storage-0. + ~(keystone_admin)]$ ceph osd tree + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | ID | CLASS | WEIGHT | TYPE | NAME | STATUS | REWEIGHT | PRI-AFF | + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | -1 | | 0.01700 | root | storage-tier | | | | + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | -2 | | 0.01700 | chassis | group-0 | | | | + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | -4 | | 0.00850 | host | controller-0 | | | | + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | 0 | hdd | 0.00850 | | osd.0 | up | 1.00000 | 1.00000 | + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | -3 | | 0.00850 | host | controller-1 | | | | + 
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + | 1 | hdd | 0.00850 | | osd.1 | down | 1.00000 | 1.00000 | + +----+---------+------------+---------+-------------------+-------------+------------------+-------------+ + + #. Begin the deploy on storage-0. .. code-block:: none ~(keystone_admin)]$ software deploy host storage-0 - The upgrade is complete when the node comes online, and at that point, + The deploy is complete when the node comes online, and at that point, you can safely unlock the node. After upgrading a storage node, but before unlocking, there are Ceph @@ -341,7 +346,7 @@ Follow the steps below to manually upgrade the system controller: (since the infrastructure network interface configuration has not been applied to the storage node yet, as it has not been unlocked). - Unlock the node as soon as the upgraded storage node comes online. + Unlock the node as soon as the deployed storage node comes online. #. Unlock storage-0. @@ -350,17 +355,17 @@ Follow the steps below to manually upgrade the system controller: ~(keystone_admin)]$ system host-unlock storage-0 Wait for all alarms to clear after the unlock before proceeding to - upgrade the next storage host. + deploy the next storage host. #. Repeat the above steps for each storage host. .. note:: - After upgrading the first storage node you can expect alarm + After deploying the first storage node you can expect alarm **800.003**. The alarm is cleared after all storage nodes are - upgraded. + deployed. -#. If worker nodes are present, upgrade worker hosts, serially or in parallel, +#. If worker nodes are present, deploy worker hosts, serially or in parallel, if any. @@ -370,7 +375,7 @@ Follow the steps below to manually upgrade the system controller: ~(keystone_admin)]$ system host-lock worker-0 - #. Upgrade worker-0. + #. Deploy worker-0. .. 
code-block:: none @@ -391,7 +396,7 @@ Follow the steps below to manually upgrade the system controller: #. Repeat the above steps for each worker host. -#. Set controller-0 as the active controller. Swact to controller-0. +#. Set controller-0 as the active controller. Swact away from controller-1. .. code-block:: none @@ -401,7 +406,7 @@ Follow the steps below to manually upgrade the system controller: proceeding to the next step. When all services on controller-0 are enabled-active, the swact is complete. -#. Activate the upgrade. +#. Activate the deploy. .. code-block:: none @@ -410,30 +415,32 @@ Follow the steps below to manually upgrade the system controller: Check deploy state: - .. code-block:: none + .. code-block:: none ~(keystone_admin)]$ software deploy show +--------------+------------+------+-----------------+ | From Release | To Release | RR | State | +--------------+------------+------+-----------------+ - | 22.12.0 | 24.09.0 | True | deploy-activate | + | 22.12.0 | 24.09.100 | True | deploy-activate | +--------------+------------+------+-----------------+ - When activate is complete: + Wait for :command:`software deploy activate` to complete by monitoring the + status of the deploy. - .. code-block:: none + .. code-block:: none + ~(keystone_admin)]$ software deploy show +--------------+------------+------+----------------------+ | From Release | To Release | RR | State | +--------------+------------+------+----------------------+ - | 22.12.0 | 24.09.0 | True | deploy-activate-done | + | 22.12.0 | 24.09.100 | True | deploy-activate-done | +--------------+------------+------+----------------------+ - During the running of the :command:`upgrade-activate` command, new + During the running of the :command:`software deploy activate` command, new configurations are applied to the controller. 250.001 (**hostname Configuration is out-of-date**) alarms are raised and are cleared as the - configuration is applied. 
The upgrade state goes from **activating** to - **activation-complete** once this is done. + configuration is applied. The deploy state goes from ``deploy-activate`` to + ``deploy-activate-done`` once this is done. .. only:: partner @@ -443,43 +450,19 @@ Follow the steps below to manually upgrade the system controller: The following states apply when this command is executed. - **activation-requested** - State entered when :command:`system upgrade-activate` is executed. + **deploy-activate** + State entered when deploy is being activated. - **activating** - State entered when we have started activating the upgrade by - applying new configurations to the controller and compute hosts. - - **activating-hosts** - State entered when applying host-specific configurations. This state is - entered only if needed. - - **activation-complete** - State entered when new configurations have been applied to all - controller and compute hosts. - - #. Check the status of the upgrade again to see it has reached - **activation-complete**, for example. - - .. code-block:: none - - ~(keystone_admin)]$ system upgrade-show - +--------------+--------------------------------------+ - | Property | Value | - +--------------+--------------------------------------+ - | uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 | - | state | activation-complete | - | from_release | nn.nn | - | to_release | nn.nn | - +--------------+--------------------------------------+ + **deploy-activate-done** + State entered when the deploy-activate completes successfully. .. note:: - This can take more than half an hour to complete. + This can take more than 15 minutes to complete. .. note:: - Alarms are generated as the subcloud load sync_status is "out-of-sync". + Alarms are generated as the subcloud software sync_status is "out-of-sync". #. Complete the upgrade. @@ -492,33 +475,62 @@ Follow the steps below to manually upgrade the system controller: .. 
code-block:: none - ~(keystone_admin)]$ software deploy show, - +--------------+------------+------+------------------+ - | From Release | To Release | RR | State | - +--------------+------------+------+------------------+ - | 22.12.0 | 24.09.0 | True | deploy-completed | - +--------------+------------+------+------------------+ + ~(keystone_admin)]$ software deploy show + +--------------+------------+------+-----------------------+ + | From Release | To Release | RR | State | + +--------------+------------+------+-----------------------+ + | 22.12.0 | 24.09.100 | True | deploy-completed | + +--------------+------------+------+-----------------------+ - Run the :command:`system upgrade-show` command, and the status will display - "no upgrade in progress". The subclouds will be out-of-sync. +#. Upgrade Kubernetes after the platform deploy is completed. To upgrade + Kubernetes on a standalone system, see :ref:`index-updates-kub-03d4d10fa0be`. -#. Upgrade Kubernetes, after deploy is completed. When Kubernetes upgrade - completes, conclude the deploy by deleting it. +#. When the Kubernetes upgrade completes, conclude the platform deploy by deleting + it. .. code-block:: none - ~(keystone_admin)]$ software deploy delete, output + ~(keystone_admin)]$ software deploy delete Deploy deleted with success Verify deploy state: - - .. code-block:: none - - ~(keystone_admin)]$ software deploy show, output + + .. code-block:: none + + ~(keystone_admin)]$ software deploy show No deploy in progress +#. Upload the load for subcloud deployment. + + .. only:: starlingx + + .. parsed-literal:: + + ~(keystone_admin)]$ software --os-region-name SystemController upload --local /full_path/.iso /full_path/.sig + +-------------------------------+--------------------------+ + | Uploaded File | Release | + +-------------------------------+--------------------------+ + | starlingx-intel-x86-64-cd.iso | stx-10.0.0 | + +-------------------------------+--------------------------+ + + .. 
only:: partner + + .. include:: /_includes/software-upload-output.rest + :start-after: software-load-begin + :end-before: software-load-end + +.. note:: + This can take a few minutes. After the system controller is successfully + deployed, do not delete the old load (which is in the imported state) from + the load list, because this load is required to manage the subclouds that + are still running the previous load. + .. only:: partner .. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest :start-after: DMupgrades-begin :end-before: DMupgrades-end + +.. rubric:: |postreq| + +After the upgrade to the major release, apply any patches separately.
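
Several steps in this procedure wait for :command:`software deploy show` to report a particular state (for example ``deploy-start-done`` or ``deploy-activate-done``). As a reviewer aid only, the following is a minimal sketch of how that check could be scripted. It assumes the table layout shown in the examples in this change. The helper name ``parse_deploy_state`` is hypothetical and is not part of the ``software`` CLI, and real CLI output may differ.

```python
def parse_deploy_state(table_text):
    """Return the State column from `software deploy show` style output.

    Hypothetical helper: assumes the ASCII table layout shown in this
    document. Returns None when no table is present (for example, when
    the output is "No deploy in progress").
    """
    # Keep only rows that start with "|"; border rows start with "+".
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table_text.splitlines()
        if line.strip().startswith("|")
    ]
    # Expect a header row followed by at least one data row.
    if len(rows) < 2 or "State" not in rows[0]:
        return None
    return rows[1][rows[0].index("State")]


# Sample text copied from the example output in this procedure.
SAMPLE = """
+--------------+------------+------+-------------------+
| From Release | To Release | RR   | State             |
+--------------+------------+------+-------------------+
| 22.12.0      | 24.09.100  | True | deploy-start-done |
+--------------+------------+------+-------------------+
"""

print(parse_deploy_state(SAMPLE))                   # deploy-start-done
print(parse_deploy_state("No deploy in progress"))  # None
```

In practice the text would come from capturing the command output on the active controller. Looking up the ``State`` column by header name rather than by position keeps the sketch tolerant of column reordering.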