From a655378cca96f8f2e518eca6fa5a32cf88a4626b Mon Sep 17 00:00:00 2001 From: Peter Matulis Date: Tue, 16 Nov 2021 16:26:43 -0500 Subject: [PATCH] Review series upgrades Review and correct the upgrade-series page. Make miscellaneous improvements to various places for the sake of consistency. Also closes a doc bug and reverts changes made due to a second doc bug. Closes-Bug: #1838041 Related-Bug: #1934764 Change-Id: I88692573e8ae50cd77d3872b40361b464b8e0f19 --- .../source/upgrade-series-openstack.rst | 917 ++++++++++++------ 1 file changed, 614 insertions(+), 303 deletions(-) diff --git a/deploy-guide/source/upgrade-series-openstack.rst b/deploy-guide/source/upgrade-series-openstack.rst index 566eebc..8dc3c9b 100644 --- a/deploy-guide/source/upgrade-series-openstack.rst +++ b/deploy-guide/source/upgrade-series-openstack.rst @@ -24,7 +24,7 @@ cloud will nonetheless result in some level of downtime for the control plane. When the machines associated with stateful applications such as percona-cluster and rabbitmq-server undergo a series upgrade all cloud APIs will experience -downtime, in addition to the stateful application itself. +downtime, in addition to the stateful applications themselves. When machines associated with a single API application undergo a series upgrade that individual API will also experience downtime. This is because it is @@ -47,8 +47,8 @@ The topology is defined in this way: * Only compute and storage charms (and their subordinates) may be co-located. -* Third-party charms either do not exist or have been thoroughly tested - for a series upgrade. +* Third-party charms either do not exist or have been thoroughly tested for a + series upgrade. * The following are containerised: @@ -60,9 +60,9 @@ The topology is defined in this way: * The ceph-mon application -Storage charms are charms that manage physical disks. For example, ceph-osd and -swift-storage. Example OpenStack subordinate charms are networking SDN charms -for the nova-compute charm, or monitoring charms for compute or storage charms. +* All applications, where possible, are under high availability, whether + natively (e.g. ceph-mon, rabbitmq-server) or via hacluster (e.g. + percona-cluster). .. caution:: @@ -76,7 +76,14 @@ for the nova-compute charm, or monitoring charms for compute or storage charms. * does not affect containers hosted on the target machine - * an application's leader should be upgraded before its non-leaders +Notes +~~~~~ + +Storage charms are charms that manage physical disks. For example, ceph-osd and +swift-storage. + +Example OpenStack subordinate charms are networking SDN charms for the +nova-compute charm, or monitoring charms for compute or storage charms. Generalised OpenStack series upgrade ------------------------------------ @@ -85,41 +92,44 @@ This section will summarise the series upgrade steps in the context of specific OpenStack applications. It is an enhancement of the :ref:`Generic series upgrade ` section in the companion document. +Generally, this summary is well-suited to API applications (e.g. neutron-api, +keystone, nova-cloud-controller). + Applications for which this summary does **not** apply include: -* nova-compute -* ceph-mon -* ceph-osd +#. those that do not require the pausing of units and where application + leadership is irrelevant: -This is because the above applications do not require the pausing of units and -application leadership is irrelevant for them. 
+   * nova-compute
+   * ceph-mon
+   * ceph-osd

-However, this summary does apply to all API applications (e.g. neutron-api,
-keystone, nova-cloud-controller), as well as percona-cluster, and
-rabbitmq-server.
+#. those that require a special upgrade workflow due to payload/upstream
+   requirements:

-.. important::
+   * percona-cluster
+   * rabbitmq-server

-   The first machines to be upgraded are always associated with the non-leaders
-   of the principal application. Let these machines be called the "principal
-   non-leader machines" and their units the "principal non-leader units".
+.. note::

-   The last machine to be upgraded is always associated with the leader
-   of the principal application. Let this machine be called the "principal
-   leader machine" and its unit the "principal leader unit".
+   Let the machine associated with the leader of the principal application be
+   called the "principal leader machine" and its unit the "principal leader
+   unit".
+
+   Let the machines associated with the non-leaders of the principal
+   application be called the "principal non-leader machines" and their units
+   the "principal non-leader units".

The steps are as follows:

-#. Set the default series for the principal application and ensure the same has
-   been done to the model.
+#. Set the default series for the principal application.

#. If hacluster is used, pause the hacluster units not associated with the
   principal leader machine.

#. Pause the principal non-leader units.

-#. Perform a series upgrade on one of the (now paused) principal non-leader
-   machines:
+#. Perform a series upgrade on each of the paused machines:

   #. Disable :ref:`Unattended upgrades `.

   #. Invoke the :command:`prepare` sub-command.

   #. Perform any workload maintenance pre-upgrade steps.

   #. Upgrade the operating system (APT commands).

-   #. Perform any post-upgrade workload maintenance tasks.
+   #. Perform any post-upgrade tasks at the machine/unit level.

   #. Re-enable Unattended upgrades.

   #. Reboot the machine.

   #. Invoke the :command:`complete` sub-command.

-#. Repeat step 4 for the remaining principal non-leader machines.
-
#. Pause the principal leader unit.

-#. Repeat step 4 but for the principal leader machine.
+#. Repeat step 4 for the paused principal leader machine.

-#. Set the value of the (application-dependent) ``openstack-origin`` or the
-   ``source`` configuration option to 'distro' (new operating system).
+#. Perform any remaining post-upgrade tasks.

-#. Perform any possible cluster completed upgrade tasks once all machines have
-   had their series upgraded.
-
-   .. note::
-
-      Here is a non-extensive list of the most common post-upgrade tasks for
-      OpenStack and supporting charms:
-
-      * percona-cluster: run action ``complete-cluster-series-upgrade`` on the
-        leader unit.
-      * rabbitmq-server: run action ``complete-cluster-series-upgrade`` on the
-        leader unit.
-      * ceilometer: run action ``ceilometer-upgrade`` on the leader unit.
-      * vault: Each vault unit will need to be unsealed after its machine is
-        rebooted.
+#. Update the software sources for the principal application's machines.

Procedures
----------

The procedures are categorised based on application types. The example scenario
used throughout is a 'xenial' to 'bionic' series upgrade, within an OpenStack
-release of Queens (i.e. the starting point is a cloud archive pocket of
+release of Queens (i.e. the starting point is a UCA release of
'xenial-queens').
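+
+As a sanity check before proceeding, the current series of the model's machines
+can be confirmed with a Juju 2.x client (the ``Series`` column is expected to
+read 'xenial' at this stage):
+
+.. code-block:: none
+
+   juju machines
+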
+New default series for the model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Ensure that any newly-created application units are based on the next series by +setting the model's default series appropriately: + +.. code-block:: none + + juju model-config default-series=bionic + Stateful applications ~~~~~~~~~~~~~~~~~~~~~ @@ -182,78 +185,228 @@ applications. These include: * rabbitmq-server A stateful application is one that maintains the state of various aspects of -the cloud. Clustered stateful applications, such as all the ones given above, -also require a quorum to function properly. Because of these reasons a stateful -application should not have all of its units restarted simultaneously; it must -have the series of its corresponding machines upgraded sequentially. +the cloud. Clustered stateful applications, such as the ones given above, +require a quorum to function properly. Therefore, a stateful application should +not have all of its units restarted simultaneously; it must have the series of +its corresponding machines upgraded sequentially. -.. note:: - - The concurrent upgrade approach is theoretically possible, although to use - it all cloud workloads will need to be stopped in order to ensure - consistency. This is not recommended. - -The example procedure will be based on the percona-cluster application. - -.. warning:: - - The eoan series is the last series supported by the percona-cluster charm. - It is replaced by the `mysql-innodb-cluster`_ and `mysql-router`_ charms in the - focal series. The migration steps are documented in `percona-cluster charm - - series upgrade to focal`_. - - Do not upgrade the machines hosting percona-cluster units to the focal - series. To be clear, if percona-cluster is containerised then it is the LXD - container that must not be upgraded. +ceph-mon +^^^^^^^^ .. important:: - Unlike percona-cluster, the ceph-mon and rabbitmq-server applications do not - use hacluster to achieve HA, nor do they need backups. Disregard therefore - the hacluster and backup steps for these two applications. + During this upgrade there will NOT be a Ceph service outage. - The ceph-mon charm will maintain the MON cluster during a series upgrade, so - ceph-mon units do not need to be paused. + The MON cluster will be maintained during the upgrade by the ceph-mon charm, + rendering application leadership irrelevant. Notably, ceph-mon units do not + need to be paused. This scenario is represented by the following partial :command:`juju status` command output: .. 
code-block:: console - Model Controller Cloud/Region Version SLA Timestamp - upgrade maas-controller mymaas/default 2.7.6 unsupported 18:26:57Z + App Version Status Scale Charm Store Channel Rev OS Message + ceph-mon 12.2.13 active 3 ceph-mon charmstore stable 483 ubuntu Unit is ready and clustered - App Version Status Scale Charm Store Rev OS Notes - percona-cluster 5.6.37 active 3 percona-cluster jujucharms 286 ubuntu - percona-cluster-hacluster active 3 hacluster jujucharms 66 ubuntu + Unit Workload Agent Machine Public address Ports Message + ceph-mon/0 active idle 0/lxd/0 10.246.114.57 Unit is ready and clustered + ceph-mon/1 active idle 1/lxd/0 10.246.114.56 Unit is ready and clustered + ceph-mon/2* active idle 2/lxd/0 10.246.114.26 Unit is ready and clustered - Unit Workload Agent Machine Public address Ports Message - percona-cluster/0 active idle 0/lxd/0 10.0.0.47 3306/tcp Unit is ready - percona-cluster-hacluster/0* active idle 10.0.0.47 Unit is ready and clustered - percona-cluster/1* active idle 1/lxd/0 10.0.0.48 3306/tcp Unit is ready - percona-cluster-hacluster/2 active idle 10.0.0.48 Unit is ready and clustered - percona-cluster/2 active idle 2/lxd/0 10.0.0.49 3306/tcp Unit is ready - percona-cluster-hacluster/1 active idle 10.0.0.49 Unit is ready and clustered +#. Perform any workload maintenance pre-upgrade steps. -In summary, the principal leader unit is percona-cluster/1 and is deployed on -machine 1/lxd/0 (the principal leader machine). + For ceph-mon, there are no recommended steps to take. -.. warning:: - - During this upgrade, there will be a MySQL service outage. The HA resources - provided by hacluster will **not** be monitored during the series upgrade - due to the pausing of units. - -#. Perform any workload maintenance pre-upgrade steps. For percona-cluster, - take a backup and transfer it to a secure location: +#. Set the default series for the principal application: .. code-block:: none - juju run-action --wait percona-cluster/1 backup - juju scp -- -r percona-cluster/1:/opt/backups/mysql /path/to/local/directory + juju set-series ceph-mon bionic - Permissions will need to be altered on the remote machine, and note that the - last command transfers **all** existing backups. +#. Perform a series upgrade of the machines in any order: + + .. code-block:: none + + juju upgrade-series 0/lxd/0 prepare bionic + juju ssh 0/lxd/0 sudo apt update + juju ssh 0/lxd/0 sudo apt full-upgrade + juju ssh 0/lxd/0 sudo do-release-upgrade + + For ceph-mon, there are no post-upgrade steps; the prompt to reboot can be + answered in the affirmative. + + Invoke the :command:`complete` sub-command: + + .. code-block:: none + + juju upgrade-series 0/lxd/0 complete + +#. Repeat step 4 for each of the remaining machines: + + .. code-block:: none + + juju upgrade-series 1/lxd/0 prepare bionic + juju ssh 1/lxd/0 sudo apt update + juju ssh 1/lxd/0 sudo apt full-upgrade + juju ssh 1/lxd/0 sudo do-release-upgrade # and reboot + juju upgrade-series 1/lxd/0 complete + + .. code-block:: none + + juju upgrade-series 2/lxd/0 prepare bionic + juju ssh 2/lxd/0 sudo apt update + juju ssh 2/lxd/0 sudo apt full-upgrade + juju ssh 2/lxd/0 sudo do-release-upgrade # and reboot + juju upgrade-series 2/lxd/0 complete + +#. Perform any remaining post-upgrade tasks. + + For ceph-mon, there are no remaining post-upgrade steps. + +#. Update the software sources for the application's machines. + + For ceph-mon, set the value of the ``source`` configuration option to + 'distro': + + .. 
code-block:: none + + juju config ceph-mon source=distro + +The final partial :command:`juju status` output looks like this: + +.. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + ceph-mon 12.2.13 active 3 ceph-mon charmstore stable 483 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + ceph-mon/0 active idle 0/lxd/0 10.246.114.57 Unit is ready and clustered + ceph-mon/1 active idle 1/lxd/0 10.246.114.56 Unit is ready and clustered + ceph-mon/2* active idle 2/lxd/0 10.246.114.26 Unit is ready and clustered + +Note that the version of Ceph has not been upgraded (from 12.2.13 - Luminous) +since the OpenStack release (of Queens) remains unchanged. + +rabbitmq-server +^^^^^^^^^^^^^^^ + +To ensure proper cluster health, the RabbitMQ cluster is not reformed until all +rabbitmq-server units are series upgraded. An action is then used to complete +the upgrade by bringing the cluster back online. + +.. warning:: + + During this upgrade there will be a RabbitMQ service outage. + +This scenario is represented by the following partial :command:`juju status` +command output: + +.. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + rabbitmq-server 3.5.7 active 3 rabbitmq-server charmstore stable 118 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + rabbitmq-server/0* active idle 0/lxd/0 10.0.0.162 5672/tcp Unit is ready and clustered + rabbitmq-server/1 active idle 1/lxd/0 10.0.0.164 5672/tcp Unit is ready and clustered + rabbitmq-server/2 active idle 2/lxd/0 10.0.0.163 5672/tcp Unit is ready and clustered + +In summary, the principal leader unit is rabbitmq-server/0 and is deployed on +machine 0/lxd/0 (the principal leader machine). + +#. Perform any workload maintenance pre-upgrade steps. + + For rabbitmq-server, there are no recommended steps to take. + +#. Set the default series for the principal application: + + .. code-block:: none + + juju set-series rabbitmq-server bionic + +#. Pause the principal non-leader units: + + .. code-block:: none + + juju run-action --wait rabbitmq-server/1 pause + juju run-action --wait rabbitmq-server/2 pause + +#. Perform a series upgrade of the principal leader machine: + + .. code-block:: none + + juju upgrade-series 0/lxd/0 prepare bionic + juju ssh 0/lxd/0 sudo apt update + juju ssh 0/lxd/0 sudo apt full-upgrade + juju ssh 0/lxd/0 sudo do-release-upgrade + + For rabbitmq-server, there are no post-upgrade steps; the prompt to reboot + can be answered in the affirmative. + + Invoke the :command:`complete` sub-command: + + .. code-block:: none + + juju upgrade-series 0/lxd/0 complete + +#. Repeat step 4 for each of the principal non-leader machines: + + .. code-block:: none + + juju upgrade-series 1/lxd/0 prepare bionic + juju ssh 1/lxd/0 sudo apt update + juju ssh 1/lxd/0 sudo apt full-upgrade + juju ssh 1/lxd/0 sudo do-release-upgrade # and reboot + juju upgrade-series 1/lxd/0 complete + + .. code-block:: none + + juju upgrade-series 2/lxd/0 prepare bionic + juju ssh 2/lxd/0 sudo apt update + juju ssh 2/lxd/0 sudo apt full-upgrade + juju ssh 2/lxd/0 sudo do-release-upgrade # and reboot + juju upgrade-series 2/lxd/0 complete + +#. Perform any remaining post-upgrade tasks. + + For rabbitmq-server, run an action: + + .. code-block:: none + + juju run-action --wait rabbitmq-server/leader complete-cluster-series-upgrade + +#. Update the software sources for the application's machines. 
+ + For rabbitmq-server, set the value of the ``source`` configuration option to + 'distro': + + .. code-block:: none + + juju config rabbitmq-server source=distro + +The final partial :command:`juju status` output looks like this: + +.. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + rabbitmq-server 3.6.10 active 3 rabbitmq-server charmstore stable 118 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + rabbitmq-server/0* active idle 0/lxd/0 10.0.0.162 5672/tcp Unit is ready and clustered + rabbitmq-server/1 active idle 1/lxd/0 10.0.0.164 5672/tcp Unit is ready and clustered + rabbitmq-server/2 active idle 2/lxd/0 10.0.0.163 5672/tcp Unit is ready and clustered + +Note that the version of RabbitMQ has been upgraded (from 3.5.7 to 3.6.10) +since more recent software has been found in the Ubuntu package archive for +Bionic. + +percona-cluster +^^^^^^^^^^^^^^^ + +.. warning:: + + During this upgrade there will be a MySQL service outage. .. note:: @@ -263,11 +416,57 @@ machine 1/lxd/0 (the principal leader machine). * `Percona XtraDB Cluster In-Place Upgrading Guide From 5.5 to 5.6`_ * `Galera replication - how to recover a PXC cluster`_ -#. Set the default series for both the model and the principal application: +To ensure proper cluster health, the Percona cluster is not reformed until all +percona-cluster units are series upgraded. An action is then used to complete +the upgrade by bringing the cluster back online. + +.. warning:: + + The eoan series is the last series supported by the percona-cluster charm. + It is replaced by the `mysql-innodb-cluster`_ and `mysql-router`_ charms in + the focal series. The migration steps are documented in `percona-cluster + charm - series upgrade to focal`_. + + Do not upgrade the machines hosting percona-cluster units to the focal + series. To be clear, if percona-cluster is containerised then it is the LXD + container that must not be upgraded. + +This scenario is represented by the following partial :command:`juju status` +command output: + +.. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + percona-cluster 5.6.37 active 3 percona-cluster charmstore stable 302 ubuntu Unit is ready + percona-cluster-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + percona-cluster/0* active idle 0/lxd/1 10.0.0.165 3306/tcp Unit is ready + percona-cluster-hacluster/2 active idle 10.0.0.165 Unit is ready and clustered + percona-cluster/1 active idle 1/lxd/1 10.0.0.166 3306/tcp Unit is ready + percona-cluster-hacluster/0* active idle 10.0.0.166 Unit is ready and clustered + percona-cluster/2 active idle 2/lxd/1 10.0.0.167 3306/tcp Unit is ready + percona-cluster-hacluster/1 active idle 10.0.0.167 Unit is ready and clustered + +In summary, the principal leader unit is percona-cluster/0 and is deployed on +machine 0/lxd/1 (the principal leader machine). + +#. Perform any workload maintenance pre-upgrade steps. + + For percona-cluster, take a backup and transfer it to a secure location: + + .. code-block:: none + + juju run-action --wait percona-cluster/leader backup + juju scp -- -r percona-cluster/leader:/opt/backups/mysql /path/to/local/directory + + Permissions will need to be altered on the remote machine, and note that the + :command:`scp` command transfers **all** existing backups. + +#. Set the default series for the principal application: .. 
code-block:: none - juju model-config default-series=bionic juju set-series percona-cluster bionic #. Pause the hacluster units not associated with the principal leader machine: @@ -281,80 +480,103 @@ machine 1/lxd/0 (the principal leader machine). .. code-block:: none - juju run-action --wait percona-cluster/0 pause + juju run-action --wait percona-cluster/1 pause juju run-action --wait percona-cluster/2 pause - For percona-cluster, leaving the principal leader unit up will ensure it - has the latest MySQL sequence number; it will be considered the most up to - date cluster member. + Leaving the principal leader unit up will ensure it has the latest MySQL + sequence number; it will be considered the most up to date cluster member. -#. Perform a series upgrade on the principal leader machine: + At this point the partial :command:`juju status` output looks like this: + + .. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + percona-cluster 5.6.37 maintenance 3 percona-cluster charmstore stable 302 ubuntu Paused. Use 'resume' action to resume normal service. + percona-cluster-hacluster maintenance 3 hacluster charmstore stable 81 ubuntu Paused. Use 'resume' action to resume normal service. + + Unit Workload Agent Machine Public address Ports Message + percona-cluster/0* active idle 0/lxd/1 10.0.0.165 3306/tcp Unit is ready + percona-cluster-hacluster/2 active idle 10.0.0.165 Unit is ready and clustered + percona-cluster/1 maintenance idle 1/lxd/1 10.0.0.166 3306/tcp Paused. Use 'resume' action to resume normal service. + percona-cluster-hacluster/0* maintenance idle 10.0.0.166 Paused. Use 'resume' action to resume normal service. + percona-cluster/2 maintenance idle 2/lxd/1 10.0.0.167 3306/tcp Paused. Use 'resume' action to resume normal service. + percona-cluster-hacluster/1 maintenance idle 10.0.0.167 Paused. Use 'resume' action to resume normal service. + +#. Perform a series upgrade of the principal leader machine: .. code-block:: none - juju upgrade-series 1/lxd/0 prepare bionic - juju run --machine=1/lxd/0 -- sudo apt update - juju ssh 1/lxd/0 sudo apt full-upgrade - juju ssh 1/lxd/0 sudo do-release-upgrade + juju upgrade-series 0/lxd/1 prepare bionic + juju ssh 0/lxd/1 sudo apt update + juju ssh 0/lxd/1 sudo apt full-upgrade + juju ssh 0/lxd/1 sudo do-release-upgrade For percona-cluster, there are no post-upgrade steps; the prompt to reboot can be answered in the affirmative. -#. Set the value of the ``source`` configuration option to 'distro': + Invoke the :command:`complete` sub-command: .. code-block:: none - juju config percona-cluster source=distro + juju upgrade-series 0/lxd/1 complete -#. Invoke the :command:`complete` sub-command on the principal leader machine: +#. Repeat step 4 for each of the principal non-leader machines: .. code-block:: none - juju upgrade-series 1/lxd/0 complete + juju upgrade-series 1/lxd/1 prepare bionic + juju ssh 1/lxd/1 sudo apt update + juju ssh 1/lxd/1 sudo apt full-upgrade + juju ssh 1/lxd/1 sudo do-release-upgrade # and reboot + juju upgrade-series 1/lxd/1 complete - At this point the :command:`juju status` output looks like this: + .. code-block:: none - .. 
code-block:: console + juju upgrade-series 2/lxd/1 prepare bionic + juju ssh 2/lxd/1 sudo apt update + juju ssh 2/lxd/1 sudo apt full-upgrade + juju ssh 2/lxd/1 sudo do-release-upgrade # and reboot + juju upgrade-series 2/lxd/1 complete - Model Controller Cloud/Region Version SLA Timestamp - upgrade maas-controller mymaas/default 2.7.6 unsupported 19:51:52Z +#. Perform any remaining post-upgrade tasks. - App Version Status Scale Charm Store Rev OS Notes - percona-cluster 5.7.20 maintenance 3 percona-cluster jujucharms 286 ubuntu - percona-cluster-hacluster blocked 3 hacluster jujucharms 66 ubuntu - - Unit Workload Agent Machine Public address Ports Message - percona-cluster/0 maintenance idle 0/lxd/0 10.0.0.47 3306/tcp Paused. Use 'resume' action to resume normal service. - percona-cluster-hacluster/0* maintenance idle 10.0.0.47 Paused. Use 'resume' action to resume normal service. - percona-cluster/1* active idle 1/lxd/0 10.0.0.48 3306/tcp Unit is ready - percona-cluster-hacluster/2 blocked idle 10.0.0.48 Resource: res_mysql_11810cc_vip not running - percona-cluster/2 maintenance idle 2/lxd/0 10.0.0.49 3306/tcp Paused. Use 'resume' action to resume normal service. - percona-cluster-hacluster/1 maintenance idle 10.0.0.49 Paused. Use 'resume' action to resume normal service. - - Machine State DNS Inst id Series AZ Message - 0 started 10.0.0.44 node1 xenial zone1 Deployed - 0/lxd/0 started 10.0.0.47 juju-f83fcd-0-lxd-0 xenial zone1 Container started - 1 started 10.0.0.45 node2 xenial zone2 Deployed - 1/lxd/0 started 10.0.0.48 juju-f83fcd-1-lxd-0 bionic zone2 Running - 2 started 10.0.0.46 node3 xenial zone3 Deployed - 2/lxd/0 started 10.0.0.49 juju-f83fcd-2-lxd-0 xenial zone3 Container started - -#. For percona-cluster, a sanity check should be done on the leader unit's + For percona-cluster, a sanity check should be performed on the leader unit's databases and data. -#. Repeat steps 5 and 7 for the principal non-leader machines. - -#. Perform any possible cluster completed upgrade tasks once all machines have - had their series upgraded: + Also, an action must be run: .. code-block:: none juju run-action --wait percona-cluster/leader complete-cluster-series-upgrade - For percona-cluster (and rabbitmq-server), the above action is performed on - the leader unit. It informs each cluster node that the upgrade process is - complete cluster-wide. This also updates MySQL configuration with all peers - in the cluster. +#. Update the software sources for the application's machines. + + For percona-cluster, set the value of the ``source`` configuration option to + 'distro': + + .. code-block:: none + + juju config percona-cluster source=distro + +The final partial :command:`juju status` output looks like this: + +.. 
code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + percona-cluster 5.7.20 active 3 percona-cluster charmstore stable 302 ubuntu Unit is ready + percona-cluster-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + percona-cluster/0* active idle 0/lxd/1 10.0.0.165 3306/tcp Unit is ready + percona-cluster-hacluster/2 active idle 10.0.0.165 Unit is ready and clustered + percona-cluster/1 active idle 1/lxd/1 10.0.0.166 3306/tcp Unit is ready + percona-cluster-hacluster/0* active idle 10.0.0.166 Unit is ready and clustered + percona-cluster/2 active idle 2/lxd/1 10.0.0.167 3306/tcp Unit is ready + percona-cluster-hacluster/1 active idle 10.0.0.167 Unit is ready and clustered + +Note that the version of Percona has been upgraded (from 5.6.37 to 5.7.20) +since more recent software has been found in the Ubuntu package archive for +Bionic. API applications ~~~~~~~~~~~~~~~~ @@ -387,37 +609,33 @@ command output: .. code-block:: console - Model Controller Cloud/Region Version SLA Timestamp - upgrade maas-controller mymaas/default 2.7.6 unsupported 22:48:41Z - - App Version Status Scale Charm Store Rev OS Notes - keystone 13.0.2 active 3 keystone jujucharms 312 ubuntu - keystone-hacluster active 3 hacluster jujucharms 66 ubuntu + App Version Status Scale Charm Store Channel Rev OS Message + keystone 13.0.4 active 3 keystone charmstore stable 330 ubuntu Application Ready + keystone-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered Unit Workload Agent Machine Public address Ports Message - keystone/0* active idle 0/lxd/0 10.0.0.70 5000/tcp Unit is ready - keystone-hacluster/0* active idle 10.0.0.70 Unit is ready and clustered - keystone/1 active idle 1/lxd/0 10.0.0.71 5000/tcp Unit is ready - keystone-hacluster/2 active idle 10.0.0.71 Unit is ready and clustered - keystone/2 active idle 2/lxd/0 10.0.0.72 5000/tcp Unit is ready - keystone-hacluster/1 active idle 10.0.0.72 Unit is ready and clustered + keystone/0* active idle 0/lxd/0 10.0.0.198 5000/tcp Unit is ready + keystone-hacluster/2 active idle 10.0.0.198 Unit is ready and clustered + keystone/1 active idle 1/lxd/0 10.0.0.196 5000/tcp Unit is ready + keystone-hacluster/0* active idle 10.0.0.196 Unit is ready and clustered + keystone/2 active idle 2/lxd/0 10.0.0.197 5000/tcp Unit is ready + keystone-hacluster/1 active idle 10.0.0.197 Unit is ready and clustered In summary, the principal leader unit is keystone/0 and is deployed on machine 0/lxd/0 (the principal leader machine). -#. Set the default series for both the model and the principal application: +#. Set the default series for the principal application: .. code-block:: none - juju model-config default-series=bionic juju set-series keystone bionic #. Pause the hacluster units not associated with the principal leader machine: .. code-block:: none + juju run-action --wait keystone-hacluster/0 pause juju run-action --wait keystone-hacluster/1 pause - juju run-action --wait keystone-hacluster/2 pause #. Pause the principal non-leader units: @@ -442,20 +660,17 @@ In summary, the principal leader unit is keystone/0 and is deployed on machine .. 
code-block:: console - Model Controller Cloud/Region Version SLA Timestamp - upgrade maas-controller mymaas/default 2.7.6 unsupported 23:11:01Z - - App Version Status Scale Charm Store Rev OS Notes - keystone 13.0.2 blocked 3 keystone jujucharms 312 ubuntu - keystone-hacluster blocked 3 hacluster jujucharms 66 ubuntu + App Version Status Scale Charm Store Channel Rev OS Message + keystone 13.0.4 blocked 3 keystone charmstore stable 330 ubuntu Unit paused. + keystone-hacluster blocked 3 hacluster charmstore stable 81 ubuntu Ready for do-release-upgrade. Set complete when finished Unit Workload Agent Machine Public address Ports Message - keystone/0* blocked idle 0/lxd/0 10.0.0.70 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished. - keystone-hacluster/0* blocked idle 10.0.0.70 Ready for do-release-upgrade. Set complete when finished - keystone/1 blocked idle 1/lxd/0 10.0.0.71 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished. - keystone-hacluster/2 blocked idle 10.0.0.71 Ready for do-release-upgrade. Set complete when finished - keystone/2 blocked idle 2/lxd/0 10.0.0.72 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished. - keystone-hacluster/1 blocked idle 10.0.0.72 Ready for do-release-upgrade. Set complete when finished + keystone/0* blocked idle 0/lxd/0 10.0.0.198 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished., Unit paused. + keystone-hacluster/2 blocked idle 10.0.0.198 Ready for do-release-upgrade. Set complete when finished + keystone/1 blocked idle 1/lxd/0 10.0.0.196 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished., Unit paused. + keystone-hacluster/0* blocked idle 10.0.0.196 Ready for do-release-upgrade. Set complete when finished + keystone/2 blocked idle 2/lxd/0 10.0.0.197 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished., Unit paused. + keystone-hacluster/1 blocked idle 10.0.0.197 Ready for do-release-upgrade. Set complete when finished #. Upgrade the operating system on all machines. The non-interactive method is used here: @@ -464,10 +679,12 @@ In summary, the principal leader unit is keystone/0 and is deployed on machine juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=10m \ -- sudo apt-get update + juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=60m \ -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \ -o "Dpkg::Options::=--force-confdef" \ -o "Dpkg::Options::=--force-confold" dist-upgrade + juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=120m \ -- sudo DEBIAN_FRONTEND=noninteractive \ do-release-upgrade -f DistUpgradeViewNonInteractive @@ -477,8 +694,9 @@ In summary, the principal leader unit is keystone/0 and is deployed on machine Choose values for the ``--timeout`` option that are appropriate for the task at hand. -#. Perform any workload maintenance post-upgrade steps on all machines. There - are no keystone-specific steps to perform. +#. Perform any post-upgrade tasks. + + For keystone, there are no specific steps to perform. #. Reboot all machines: @@ -486,12 +704,6 @@ In summary, the principal leader unit is keystone/0 and is deployed on machine juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 -- sudo reboot -#. Set the value of the ``openstack-origin`` configuration option to 'distro': - - .. code-block:: none - - juju config keystone openstack-origin=distro - #. Invoke the :command:`complete` sub-command on all machines: .. 
code-block:: none @@ -500,6 +712,38 @@ In summary, the principal leader unit is keystone/0 and is deployed on machine juju upgrade-series 1/lxd/0 complete juju upgrade-series 2/lxd/0 complete +#. Perform any remaining post-upgrade tasks. + + For keystone, there are no remaining post-upgrade steps. + +#. Update the software sources for the application's machines. + + For keystone, set the value of the ``openstack-origin`` configuration option + to 'distro': + + .. code-block:: none + + juju config keystone openstack-origin=distro + +The final partial :command:`juju status` output looks like this: + +.. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + keystone 13.0.4 active 3 keystone charmstore stable 330 ubuntu Application Ready + keystone-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + keystone/0* active idle 0/lxd/0 10.0.0.198 5000/tcp Unit is ready + keystone-hacluster/2 active idle 10.0.0.198 Unit is ready and clustered + keystone/1 active idle 1/lxd/0 10.0.0.196 5000/tcp Unit is ready + keystone-hacluster/0* active idle 10.0.0.196 Unit is ready and clustered + keystone/2 active idle 2/lxd/0 10.0.0.197 5000/tcp Unit is ready + keystone-hacluster/1 active idle 10.0.0.197 Unit is ready and clustered + +Note that the version of Keystone has not been upgraded (from 13.0.4) since the +OpenStack release (of Queens) remains unchanged. + Upgrading multiple API applications concurrently ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -511,45 +755,38 @@ command output: .. code-block:: console - Model Controller Cloud/Region Version SLA Timestamp - upgrade maas-controller mymaas/default 2.7.6 unsupported 19:23:41Z + App Version Status Scale Charm Store Channel Rev OS Message + glance 16.0.1 active 3 glance charmstore stable 484 ubuntu Unit is ready + glance-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + nova-cloud-controller 17.0.13 active 3 nova-cloud-controller charmstore stable 555 ubuntu Unit is ready + nova-cloud-controller-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered - App Version Status Scale Charm Store Rev OS Notes - glance 16.0.1 active 3 glance jujucharms 295 ubuntu - glance-hacluster active 3 hacluster jujucharms 66 ubuntu - nova-cc-hacluster active 3 hacluster jujucharms 66 ubuntu - nova-cloud-controller 17.0.12 active 3 nova-cloud-controller jujucharms 343 ubuntu - - Unit Workload Agent Machine Public address Ports Message - glance/0* active idle 0/lxd/0 10.246.114.39 9292/tcp Unit is ready - glance-hacluster/0* active idle 10.246.114.39 Unit is ready and clustered - glance/1 active idle 1/lxd/0 10.246.114.40 9292/tcp Unit is ready - glance-hacluster/1 active idle 10.246.114.40 Unit is ready and clustered - glance/2 active idle 2/lxd/0 10.246.114.41 9292/tcp Unit is ready - glance-hacluster/2 active idle 10.246.114.41 Unit is ready and clustered - nova-cloud-controller/0 active idle 3/lxd/0 10.246.114.48 8774/tcp,8778/tcp Unit is ready - nova-cc-hacluster/2 active idle 10.246.114.48 Unit is ready and clustered - nova-cloud-controller/1* active idle 4/lxd/0 10.246.114.43 8774/tcp,8778/tcp Unit is ready - nova-cc-hacluster/0* active idle 10.246.114.43 Unit is ready and clustered - nova-cloud-controller/2 active idle 5/lxd/0 10.246.114.47 8774/tcp,8778/tcp Unit is ready - nova-cc-hacluster/1 active idle 10.246.114.47 Unit is ready and clustered + Unit Workload Agent 
Machine Public address Ports Message + glance/0* active idle 2/lxd/1 10.246.114.27 9292/tcp Unit is ready + glance-hacluster/0* active idle 10.246.114.27 Unit is ready and clustered + glance/1 active idle 2/lxd/3 10.246.114.64 9292/tcp Unit is ready + glance-hacluster/2 active idle 10.246.114.64 Unit is ready and clustered + glance/2 active idle 1/lxd/4 10.246.114.65 9292/tcp Unit is ready + glance-hacluster/1 active idle 10.246.114.65 Unit is ready and clustered + nova-cloud-controller/0* active idle 2/lxd/2 10.246.114.25 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/0* active idle 10.246.114.25 Unit is ready and clustered + nova-cloud-controller/1 active idle 1/lxd/2 10.246.114.61 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/1 active idle 10.246.114.61 Unit is ready and clustered + nova-cloud-controller/2 active idle 0/lxd/4 10.246.114.62 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/2 active idle 10.246.114.62 Unit is ready and clustered In summary, * The glance principal leader unit is glance/0 and is deployed on machine - 0/lxd/0 (the glance principal leader machine). -* The nova-cloud-controller principal leader unit is nova-cloud-controller/1 - and is deployed on machine 4/lxd/0 (the nova-cloud-controller principal + 2/lxd/1 (the glance principal leader machine). +* The nova-cloud-controller principal leader unit is nova-cloud-controller/0 + and is deployed on machine 2/lxd/2 (the nova-cloud-controller principal leader machine). -The procedure has been expedited slightly by adding the ``--yes`` confirmation -option to the :command:`prepare` sub-command. - -#. Set the default series for both the model and the principal applications: +#. Set the default series for the principal applications: .. code-block:: none - juju model-config default-series=bionic juju set-series glance bionic juju set-series nova-cloud-controller bionic @@ -560,8 +797,8 @@ option to the :command:`prepare` sub-command. juju run-action --wait glance-hacluster/1 pause juju run-action --wait glance-hacluster/2 pause - juju run-action --wait nova-cc-hacluster/1 pause - juju run-action --wait nova-cc-hacluster/2 pause + juju run-action --wait nova-cloud-controller-hacluster/1 pause + juju run-action --wait nova-cloud-controller-hacluster/2 pause #. Pause the principal non-leader units: @@ -569,37 +806,40 @@ option to the :command:`prepare` sub-command. juju run-action --wait glance/1 pause juju run-action --wait glance/2 pause - juju run-action --wait nova-cloud-controller/0 pause + juju run-action --wait nova-cloud-controller/1 pause juju run-action --wait nova-cloud-controller/2 pause #. Perform any workload maintenance pre-upgrade steps on all machines. There - are no glance-specific or nova-cloud-controller-specific steps to perform. + are no glance-specific nor nova-cloud-controller-specific steps to perform. #. Invoke the :command:`prepare` sub-command on all machines, **starting with - the principal leader machines**: + the principal leader machines**. The procedure has been expedited slightly + by adding the ``--yes`` confirmation option: .. 
code-block:: none - juju upgrade-series --yes 0/lxd/0 prepare bionic - juju upgrade-series --yes 4/lxd/0 prepare bionic - juju upgrade-series --yes 1/lxd/0 prepare bionic - juju upgrade-series --yes 2/lxd/0 prepare bionic - juju upgrade-series --yes 3/lxd/0 prepare bionic - juju upgrade-series --yes 5/lxd/0 prepare bionic + juju upgrade-series --yes 2/lxd/1 prepare bionic + juju upgrade-series --yes 2/lxd/2 prepare bionic + juju upgrade-series --yes 2/lxd/3 prepare bionic + juju upgrade-series --yes 1/lxd/4 prepare bionic + juju upgrade-series --yes 1/lxd/2 prepare bionic + juju upgrade-series --yes 0/lxd/4 prepare bionic #. Upgrade the operating system on all machines. The non-interactive method is used here: .. code-block:: none - juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \ + juju run --machine=2/lxd/1,2/lxd/2,2/lxd/3,1/lxd/4,1/lxd/2,0/lxd/4 \ --timeout=20m -- sudo apt-get update - juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \ + + juju run --machine=2/lxd/1,2/lxd/2,2/lxd/3,1/lxd/4,1/lxd/2,0/lxd/4 \ --timeout=120m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \ -o "Dpkg::Options::=--force-confdef" \ -o "Dpkg::Options::=--force-confold" dist-upgrade - juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \ - --timeout=200m -- sudo DEBIAN_FRONTEND=noninteractive \ + + juju run --machine=2/lxd/1,2/lxd/2,2/lxd/3,1/lxd/4,1/lxd/2,0/lxd/4 \ + --timeout=240m -- sudo DEBIAN_FRONTEND=noninteractive \ do-release-upgrade -f DistUpgradeViewNonInteractive #. Perform any workload maintenance post-upgrade steps on all machines. There @@ -609,134 +849,205 @@ option to the :command:`prepare` sub-command. .. code-block:: none - juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 -- sudo reboot + juju run --machine=2/lxd/1,2/lxd/2,2/lxd/3,1/lxd/4,1/lxd/2,0/lxd/4 \ + -- sudo reboot -#. Set the value of the ``openstack-origin`` configuration option to 'distro': +#. Invoke the :command:`complete` sub-command on all machines: + + .. code-block:: none + + juju upgrade-series 2/lxd/1 complete + juju upgrade-series 2/lxd/2 complete + juju upgrade-series 2/lxd/3 complete + juju upgrade-series 1/lxd/4 complete + juju upgrade-series 1/lxd/2 complete + juju upgrade-series 0/lxd/4 complete + +#. Update the software sources for the application's machines. + + For glance and nova-cloud-controller, set the value of the + ``openstack-origin`` configuration option to 'distro': .. code-block:: none juju config glance openstack-origin=distro juju config nova-cloud-controller openstack-origin=distro -#. Invoke the :command:`complete` sub-command on all machines: +The final partial :command:`juju status` output looks like this: - .. code-block:: none +.. 
code-block:: console - juju upgrade-series 0/lxd/0 complete - juju upgrade-series 1/lxd/0 complete - juju upgrade-series 2/lxd/0 complete - juju upgrade-series 3/lxd/0 complete - juju upgrade-series 4/lxd/0 complete - juju upgrade-series 5/lxd/0 complete + App Version Status Scale Charm Store Channel Rev OS Message + glance 16.0.1 active 3 glance charmstore stable 484 ubuntu Unit is ready + glance-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + nova-cloud-controller 17.0.13 active 3 nova-cloud-controller charmstore stable 555 ubuntu Unit is ready + nova-cloud-controller-hacluster active 3 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + + Unit Workload Agent Machine Public address Ports Message + glance/0* active idle 2/lxd/1 10.246.114.27 9292/tcp Unit is ready + glance-hacluster/0* active idle 10.246.114.27 Unit is ready and clustered + glance/1 active idle 2/lxd/3 10.246.114.64 9292/tcp Unit is ready + glance-hacluster/2 active idle 10.246.114.64 Unit is ready and clustered + glance/2 active idle 1/lxd/4 10.246.114.65 9292/tcp Unit is ready + glance-hacluster/1 active idle 10.246.114.65 Unit is ready and clustered + nova-cloud-controller/0* active idle 2/lxd/2 10.246.114.25 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/0* active idle 10.246.114.25 Unit is ready and clustered + nova-cloud-controller/1 active idle 1/lxd/2 10.246.114.61 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/1 active idle 10.246.114.61 Unit is ready and clustered + nova-cloud-controller/2 active idle 0/lxd/4 10.246.114.62 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/2 active idle 10.246.114.62 Unit is ready and clustered Physical machines ~~~~~~~~~~~~~~~~~ -This section covers series upgrade procedures for applications hosted on -physical machines in particular. These typically include: +This section looks at series upgrades from the standpoint of an individual +(physical) machine. This is different from looking at series upgrades from the +standpoint of applications that happen to be running on certain machines. + +Since the standard topology for Charmed OpenStack is to optimise +containerisation (with one service per container), a physical machine is +expected to directly host only those applications which cannot generally be +containerised. These notably include: * ceph-osd * neutron-gateway * nova-compute +Naturally, when the physical machine is rebooted all containerised applications +will also go offline. + +It is assumed that all affected services, as much as is possible, are under +HA. Note that a hypervisor (nova-compute) cannot be made highly available. + When performing a series upgrade on a physical machine more attention should be -given to any workload maintenance pre-upgrade steps: +accorded to workload maintenance pre-upgrade steps: * For compute nodes migrate all running VMs to another hypervisor. -* For network nodes force HA routers off of the current node. +* For network nodes migrate routers to another cloud node. * Any storage related tasks that may be required. * Any site specific tasks that may be required. -The following two sub-sections will show how to perform a series upgrade -for a single physical machine and for multiple physical machines concurrently. +The following two sub-sections will examine series upgrades for a single +physical machine and, concurrently, for multiple physical machines. 
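+
+With respect to the compute node maintenance mentioned above, for example, the
+hypervisor can be taken out of service and checked for remaining VMs prior to
+the upgrade. This sketch assumes the OpenStack CLI, admin credentials, and a
+hypothetical compute host named 'compute-1':
+
+.. code-block:: none
+
+   openstack compute service set --disable compute-1 nova-compute
+   openstack server list --all-projects --host compute-1
+
+An empty server list indicates that the node has been fully evacuated.
+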
Upgrading a single physical machine ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -This example procedure will be based on the nova-compute and ceph-osd -applications residing on the same physical machine. Since application -leadership does not play a significant role with these two applications, and -because the hacluster application is not present, there will be no units to -pause (as there were in previous scenarios). - This scenario is represented by the following partial :command:`juju status` command output: .. code-block:: console - Model Controller Cloud/Region Version SLA Timestamp - upgrade maas-controller mymaas/default 2.7.6 unsupported 15:23:21Z + App Version Status Scale Charm Store Channel Rev OS Message + ceph-mon 12.2.13 active 1 ceph-mon charmstore stable 483 ubuntu Unit is ready and clustered + ceph-osd 12.2.13 active 1 ceph-osd charmstore stable 502 ubuntu Unit is ready (1 OSD) + glance 16.0.1 active 1 glance charmstore stable 484 ubuntu Unit is ready + glance-hacluster active 0 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + nova-cloud-controller 17.0.13 active 1 nova-cloud-controller charmstore stable 555 ubuntu Unit is ready + nova-cloud-controller-hacluster active 0 hacluster charmstore stable 81 ubuntu Unit is ready and clustered + nova-compute 17.0.13 active 1 nova-compute charmstore stable 578 ubuntu Unit is ready - App Version Status Scale Charm Store Rev OS Notes - ceph-osd 12.2.12 active 1 ceph-osd jujucharms 301 ubuntu - keystone 13.0.2 active 1 keystone jujucharms 312 ubuntu - nova-compute 17.0.12 active 1 nova-compute jujucharms 314 ubuntu + Unit Workload Agent Machine Public address Ports Message + ceph-mon/1 active idle 1/lxd/0 10.246.114.56 Unit is ready and clustered + ceph-osd/1 active idle 1 10.246.114.22 Unit is ready (1 OSD) + glance/2 active idle 1/lxd/4 10.246.114.65 9292/tcp Unit is ready + glance-hacluster/1 active idle 10.246.114.65 Unit is ready and clustered + nova-cloud-controller/1 active idle 1/lxd/2 10.246.114.61 8774/tcp,8778/tcp Unit is ready + nova-cloud-controller-hacluster/1 active idle 10.246.114.61 Unit is ready and clustered + nova-compute/0* active idle 1 10.246.114.22 Unit is ready + neutron-openvswitch/0* active idle 10.246.114.22 Unit is ready - Unit Workload Agent Machine Public address Ports Message - ceph-osd/0* active idle 0 10.0.0.235 Unit is ready (1 OSD) - keystone/0* active idle 0/lxd/0 10.0.0.240 5000/tcp Unit is ready - nova-compute/0* active idle 0 10.0.0.235 Unit is ready + Machine State DNS Inst id Series AZ Message + 1 started 10.246.114.22 node-fontana xenial default Deployed + 1/lxd/0 started 10.246.114.56 juju-0642e9-1-lxd-0 bionic default series upgrade completed: success + 1/lxd/2 started 10.246.114.61 juju-0642e9-1-lxd-2 bionic default series upgrade completed: success + 1/lxd/4 started 10.246.114.65 juju-0642e9-1-lxd-4 bionic default series upgrade completed: success - Machine State DNS Inst id Series AZ Message - 0 started 10.0.0.235 node1 xenial zone1 Deployed - 0/lxd/0 started 10.0.0.240 juju-88b27a-0-lxd-0 xenial zone1 Container started +As is evidenced by the noted series for each Juju machine, only the physical +machine remains to have its series upgraded. This example procedure will +therefore involve the nova-compute and ceph-osd applications. Note however that +the nova-compute application is coupled with the neutron-openvswitch +subordinate application. -In summary, the ceph-osd and nova-compute applications are hosted on machine 0. 
-Recall that container 0/lxd/0 will need to have its series upgraded separately. +Discarding those applications whose machines have already been upgraded we +arrive at the following output: -#. It is recommended to set the Ceph cluster OSDs to 'noout'. This is typically - done at the application level (i.e. not at the unit or machine level): +.. code-block:: console + + App Version Status Scale Charm Store Channel Rev OS Message + ceph-osd 12.2.13 active 1 ceph-osd charmstore stable 502 ubuntu Unit is ready (1 OSD) + neutron-openvswitch 12.1.1 active 0 neutron-openvswitch charmstore stable 454 ubuntu Unit is ready + nova-compute 17.0.13 active 1 nova-compute charmstore stable 578 ubuntu Unit is ready + + Unit Workload Agent Machine Public address Ports Message + ceph-osd/1 active idle 1 10.246.114.22 Unit is ready (1 OSD) + nova-compute/0* active idle 1 10.246.114.22 Unit is ready + neutron-openvswitch/0* active idle 10.246.114.22 Unit is ready + +In summary, the ceph-osd and nova-compute applications are hosted on machine 1. +Since application leadership does not play a significant role with these two +applications, and because the hacluster application is not present, there will +be no units to pause. + +.. important:: + + As was the case for the upgrade procedure involving the ceph-mon + application, during the upgrade involving ceph-osd, there will NOT be a Ceph + service outage. + +#. It is recommended to set the Ceph cluster OSDs to 'noout' to prevent the + rebalancing of data. This is typically done at the application level (i.e. + not at the unit or machine level): .. code-block:: none juju run-action --wait ceph-mon/leader set-noout -#. All running VMs should be migrated to another hypervisor. +#. Perform any workload maintenance pre-upgrade steps. -#. Upgrade the series on machine 0: + All running VMs should be migrated to another hypervisor. See cloud + operation `Live migrate VMs from a running compute node`_. - #. Invoke the :command:`prepare` sub-command: +#. Perform a series upgrade of the machine: - .. code-block:: none + .. code-block:: none - juju upgrade-series 0 prepare bionic + juju upgrade-series 1 prepare bionic + juju ssh 1 sudo apt update + juju ssh 1 sudo apt full-upgrade + juju ssh 1 sudo do-release-upgrade # and reboot + juju upgrade-series 1 complete - #. Upgrade the operating system: +#. Perform any remaining post-upgrade tasks. - .. code-block:: none - - juju run --machine=0 -- sudo apt update - juju ssh 0 sudo apt full-upgrade - juju ssh 0 sudo do-release-upgrade - - #. Reboot (if not already done): - - .. code-block:: none - - juju run --machine=0 -- sudo reboot - - #. Set the value of the ``openstack-origin`` or ``source`` configuration - options to 'distro': - - .. code-block:: none - - juju config nova-compute openstack-origin=distro - juju config ceph-osd source=distro - - #. Invoke the :command:`complete` sub-command on the machine: - - .. code-block:: none - - juju upgrade-series 0 complete - -#. If OSDs were previously set to 'noout' then check up/in status of those - OSDs in ceph status, then unset 'noout' for the cluster: + If OSDs were previously set to 'noout' then verify the up/in status of the + OSDs and then unset 'noout' for the cluster: .. code-block:: none juju run --unit ceph-mon/leader -- ceph status juju run-action --wait ceph-mon/leader unset-noout +#. Update the software sources for the machine. + + .. 
caution:: + + As was done in previous procedures, only set software sources once all + machines for the associated applications have had their series upgraded. + + For the principal applications ceph-osd and nova-compute, set the + appropriate configuration option to 'distro': + + .. code-block:: none + + juju config nova-compute openstack-origin=distro + juju config ceph-osd source=distro + + .. note:: + + Although updating the software sources more than once on the same machine + may appear redundant it is recommended to do so. + Upgrading multiple physical hosts concurrently ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -769,7 +1080,6 @@ This simplified bundle is used to demonstrate the general idea: - 4 keystone: charm: cs:keystone - constraints: mem=1G num_units: 3 options: vip: 10.85.132.200 @@ -800,7 +1110,7 @@ ensuring the availability of services. same placement group is hosted on those machines. For example, if ceph-mon is deployed with ``customize-failure-domain`` set to 'true' and the ceph-osd units are hosted on machines in three or more separate Juju AZs you can - safely reboot ceph-osd machines concurrently in the same zone. See + safely reboot ceph-osd machines simultaneously in the same zone. See :ref:`Ceph AZ ` in :doc:`Infrastructure high availability ` for details. @@ -836,3 +1146,4 @@ and test the series upgrade primitives: .. _mysql-innodb-cluster: https://jaas.ai/mysql-innodb-cluster .. _mysql-router: https://jaas.ai/mysql-router .. _percona-cluster charm - series upgrade to focal: percona-series-upgrade-to-focal.html +.. _Live migrate VMs from a running compute node: https://docs.openstack.org/charm-guide/latest/admin/ops-live-migrate-vms.html