docs/doc/source/updates/kubernetes/software-upgrades.rst

Software Upgrades

Software upgrades enable you to move the system software from one release to the next release.

A software upgrade is a multi-step, rolling-upgrade process in which hosts are upgraded one at a time while the system continues to provide hosting services to its hosted applications. An upgrade can be performed manually or by using Upgrade Orchestration, which automates much of the upgrade procedure and leaves only a few manual steps, reducing the risk of operator oversight. For more information on manual upgrades, see Manual Platform Components Upgrade <manual-upgrade-overview>. For more information on upgrade orchestration, see Orchestrated Platform Component Upgrade <orchestration-upgrade-overview>.

Warning

Do NOT use information in this guide for orchestrated software upgrades of subclouds. If information in this document is used for an orchestrated upgrade, the upgrade will fail, resulting in an outage. The Upgrade Orchestrator automates a recursive rolling upgrade of all subclouds and all hosts within the subclouds.

Before starting the upgrade process:

  • The system must be 'patch current'.
  • There must be no management-affecting alarms present on the system.
  • Ensure that any certificates managed by cert manager will not be renewed during the upgrade process.
  • The new software load must be imported.
  • A valid license file for the new software release must be installed.
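The prerequisites above can be read as a checklist that must pass in full before the upgrade starts. The following is a minimal sketch of that idea only. The `SystemState` class, its field names, and `unmet_prerequisites` are hypothetical names chosen for illustration; they are not part of the product CLI or API, and the real checks are performed by the platform itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemState:
    """Hypothetical snapshot of the facts listed in the prerequisites."""
    patch_current: bool
    management_affecting_alarms: int
    cert_renewal_due_during_upgrade: bool
    new_load_imported: bool
    new_release_license_installed: bool


def unmet_prerequisites(state: SystemState) -> List[str]:
    """Return a description of every prerequisite that is not satisfied."""
    problems = []
    if not state.patch_current:
        problems.append("system is not patch current")
    if state.management_affecting_alarms > 0:
        problems.append("management-affecting alarms are present")
    if state.cert_renewal_due_during_upgrade:
        problems.append("a managed certificate would be renewed during the upgrade")
    if not state.new_load_imported:
        problems.append("new software load is not imported")
    if not state.new_release_license_installed:
        problems.append("license for the new release is not installed")
    return problems


if __name__ == "__main__":
    state = SystemState(True, 1, False, True, False)
    for problem in unmet_prerequisites(state):
        print("BLOCKED:", problem)
```

An empty result means that every listed prerequisite is met; any entry in the list should be resolved before the upgrade is started.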

The upgrade process starts by upgrading the controllers. The standby controller is upgraded first; this involves loading it with the new release of software and migrating all of the controller services' databases to the new release. Activity is then switched to the upgraded controller, which runs in a 'compatibility' mode where all inter-node messages use message formats from the old release of software. Before upgrading the second controller, you reach the "point-of-no-return for an in-service abort" of the upgrade process. The second controller is then loaded with the new release of software and becomes the new standby controller. For more information on manual upgrades, see Manual Platform Components Upgrade <manual-upgrade-overview>.

If present, storage nodes are locked, upgraded and unlocked one at a time in order to respect the redundancy model of storage nodes. Storage nodes can be upgraded in parallel if using upgrade orchestration.

Worker nodes are then upgraded. A worker node is tainted when it is locked, so that Kubernetes shuts down any pods on that node and restarts them on another worker node. During the upgrade, the worker node network-boots and installs the new software from the active controller. After the worker node is unlocked, its worker services run in a 'compatibility' mode where all inter-node messages use message formats from the old release of software. Note that worker nodes can be upgraded in parallel only if using upgrade orchestration.
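The difference between a manual upgrade (one host at a time) and an orchestrated upgrade (hosts may be handled in parallel) can be illustrated by how hosts are grouped into batches. This is a sketch under stated assumptions: `upgrade_batches` and its `max_parallel` parameter are hypothetical, and the grouping shown here ignores the redundancy constraints that real orchestration would have to respect for storage nodes.

```python
from typing import List


def upgrade_batches(hosts: List[str], orchestrated: bool,
                    max_parallel: int = 2) -> List[List[str]]:
    """Group hosts into batches that are upgraded together.

    Manual upgrades always use batches of one host. With orchestration,
    up to `max_parallel` hosts (a hypothetical setting) share a batch.
    """
    if not orchestrated:
        return [[host] for host in hosts]
    size = max(1, max_parallel)
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


if __name__ == "__main__":
    workers = ["worker-0", "worker-1", "worker-2"]
    print(upgrade_batches(workers, orchestrated=False))
    print(upgrade_batches(workers, orchestrated=True))
```

Each inner list represents hosts that are locked, upgraded, and unlocked together before the next batch begins.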

The final step of the upgrade process is to activate and complete the upgrade. This involves disabling 'compatibility' modes on all hosts and clearing the Upgrade Alarm.
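The overall ordering described above (standby controller, switch of activity, second controller, storage nodes, worker nodes, then activation and completion) can be summarized as an ordered plan. The sketch below only models that ordering for a manual upgrade. The function name and the step strings are illustrative assumptions, not commands or output of the product.

```python
from typing import List

POINT_OF_NO_RETURN = "point of no return for an in-service abort"


def upgrade_plan(standby_controller: str, active_controller: str,
                 storage_nodes: List[str], worker_nodes: List[str]) -> List[str]:
    """Return the upgrade steps in the order described in the text."""
    steps = [
        f"upgrade {standby_controller}",
        f"switch activity to {standby_controller}",
        POINT_OF_NO_RETURN,
        f"upgrade {active_controller}",
    ]
    # Storage nodes (if present) come next, one at a time.
    steps += [f"upgrade {node}" for node in storage_nodes]
    # Worker nodes follow the storage nodes.
    steps += [f"upgrade {node}" for node in worker_nodes]
    # Activation disables the compatibility modes; completion ends the upgrade.
    steps += ["activate upgrade", "complete upgrade"]
    return steps


if __name__ == "__main__":
    plan = upgrade_plan("controller-1", "controller-0",
                        ["storage-0"], ["worker-0", "worker-1"])
    for number, step in enumerate(plan, start=1):
        print(number, step)
```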

Rolling Back / Aborting an Upgrade

In general, any issues encountered during an upgrade should be addressed during the upgrade with the intention of completing the upgrade after the issues are resolved. Issues specific to a storage or worker host can be addressed by temporarily downgrading the host, addressing the issues and then upgrading the host again, or in some cases by replacing the node.

In extremely rare cases, it may be necessary to abort an upgrade. This is a last resort and should only be done if there is no other way to address the issue within the context of the upgrade. There are two scenarios for doing such an abort:

  • Before controller-0 has been upgraded (that is, only controller-1 has been upgraded): In this case, the upgrade can be aborted and the system remains in service during the abort. See Rolling Back a Software Upgrade Before the Second Controller Upgrade <rolling-back-a-software-upgrade-before-the-second-controller-upgrade>.
  • After controller-0 has been upgraded (that is, both controllers have been upgraded): In this case, the upgrade can only be aborted with a complete outage and a reinstall of all hosts. This should only be done as a last resort, if there is absolutely no other way to recover the system. See Rolling Back a Software Upgrade After the Second Controller Upgrade <rolling-back-a-software-upgrade-after-the-second-controller-upgrade>.
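The two abort scenarios above reduce to a single question: has controller-0 (the second controller) been upgraded yet? A minimal sketch of that decision follows. The `AbortMode` enum and the `abort_mode` function are hypothetical names used only for illustration.

```python
from enum import Enum


class AbortMode(Enum):
    IN_SERVICE = "abort while the system remains in service"
    FULL_REINSTALL = "abort with a complete outage and reinstall of all hosts"


def abort_mode(controller_0_upgraded: bool) -> AbortMode:
    """Pick the abort scenario based on whether controller-0 is upgraded."""
    if controller_0_upgraded:
        # Both controllers run the new release: past the point of no return.
        return AbortMode.FULL_REINSTALL
    # Only controller-1 has been upgraded: an in-service abort is possible.
    return AbortMode.IN_SERVICE


if __name__ == "__main__":
    print(abort_mode(controller_0_upgraded=False).value)
    print(abort_mode(controller_0_upgraded=True).value)
```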