David Ames 24d97adb6f Re-order series upgrade command arguments

The argument list for juju upgrade-series was reordered. Update the docs
to reflect this.

Change-Id: I6cac0f7c427b9dc8e326ff14d5cd70f63eab52ae

2018-11-13 16:32:59 -08:00

Appendix F: Series Upgrade

Introduction

Juju and OpenStack charms provide the primitives to prepare for and respond to an upgrade from one Ubuntu LTS series to another.

Warnings

Upgrading a single machine from one LTS to another is a complex task. Doing so on a running OpenStack cloud is an order of magnitude more complex.

Please read through this document thoroughly before attempting a series upgrade. Please pay particular attention to the Assumptions section and the order of operations.

The series upgrade should be executed by an administrator or team of administrators who are intimately familiar with the cloud undergoing upgrade, with OpenStack in general, and with Juju and the OpenStack charms.

Preparing stateful OpenStack services for series upgrade is not automated and is the responsibility of the administrator. Examples include evacuating a compute node, switching HA routers to another network node, and any storage rebalancing that may be required.

The actual task of executing the do-release-upgrade on an individual machine is not automated. It will be performed by the administrator. Any bespoke preparation for or cleanup after the do-release-upgrade is the responsibility of the administrator.

The series upgrade process requires API downtime. Although the goal is minimal downtime, services must be paused to avoid race condition errors, so each API undergoing upgrade will be unavailable while it is upgraded.

Stateful services that OpenStack depends on, such as percona-cluster and rabbitmq, affect all APIs during their series upgrade and therefore require downtime.

Third-party charms may not have implemented series upgrade yet. Please pay particular attention to SDN and storage charms, which may affect cloud operation.

If the architecture and layout of charms does not match the Assumptions section of this document, great care needs to be taken to avoid problems with application leadership across machines. In other words, if most services are not in LXD containers, it is possible to have the leader of percona-cluster on one host and the leader of rabbitmq on another, causing complications in the procedure for series upgrade.

Test, test, test! The series upgrade process should be tested on a non-production cloud that closely resembles the eventual production environment. Not only does this validate the software involved but it prepares the administrator for the complex task ahead.

Juju

Please read all Juju documentation on the series upgrade feature.

https://docs.jujucharms.com/devel/en/getting-started

Note

The Juju upgrade-series command operates at the machine level. This document focuses on applications, because many require pausing their peers and some subordinates, but it is important to remember that the whole machine is upgraded.

Applications deployed in a LXD container are considered a machine apart from the physical host machine the container is hosted on.

Upgrading the host machine will not upgrade the machines in its LXD containers. However, all the services in those LXD containers will be unavailable during the required post-upgrade reboot of the host machine.

For example, consider a physical host with nova-compute, neutron-openvswitch and ceph-osd colocated that also hosts a keystone unit in a LXD container. When the juju upgrade-series prepare command is executed on the machine, nova-compute, neutron-openvswitch and ceph-osd will execute their pre-series-upgrade hooks but keystone will not. Nor will the operating system in the LXD container be affected by the do-release-upgrade on the host. At reboot, however, the keystone unit will be unavailable for the duration of the reboot. Please plan accordingly.
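The distinction between a host and its containers shows up in the machine IDs. The following is a minimal sketch, assuming Juju's usual container naming (for example `0/lxd/1` for an LXD container on machine 0). The helper name `machine_kind` and the machine IDs are hypothetical, and the loop only prints commands rather than running them.

```shell
# Hypothetical helper: classify a Juju machine ID as a physical host or a container.
# Assumes container IDs look like "<host>/lxd/<n>", e.g. "0/lxd/1".
machine_kind() {
    case "$1" in
        */lxd/*) echo "container" ;;
        *)       echo "host" ;;
    esac
}

# Each machine ID needs its own upgrade-series run; preparing the host (0)
# does not prepare the container (0/lxd/1). Commands are printed, not executed.
for m in 0 0/lxd/1; do
    echo "$(machine_kind "$m"): juju upgrade-series $m prepare xenial"
done
```

The point of the sketch is that the host and each container appear as separate targets, so each one goes through its own prepare and complete cycle.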

Assumptions

This document makes a number of assumptions about the architecture and preparation of the cloud undergoing series upgrade. Please review these and compare to the running cloud before performing the series upgrade.

Preparations

Charms are upgraded to the latest release.

OpenStack is upgraded to the highest version the current LTS supports: Mitaka for Trusty and Queens for Xenial.

The current Ubuntu operating system is up to date prior to do-release-upgrade.

Stateful services have been backed up. Percona-cluster and mongodb should be backed up prior to upgrading.

General cloud health. Confirm the cloud is fully operational before beginning a series upgrade.

OpenStack charms health. No charms are in hook error. Confirm the health of the juju environment before beginning series upgrade.

Per machine preparations. Individual compute nodes are evacuated prior to series upgrade. HA routers are moved to network nodes not undergoing series upgrade.
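The charm health check above can be partly scripted. The following is a minimal sketch, assuming the default tabular layout of `juju status`, where a unit row starts with the unit name followed by the workload and agent columns. The helper name `units_in_error` and the sample rows are hypothetical.

```shell
# Hypothetical helper: list units whose workload or agent status is "error".
# Reads `juju status` style rows on stdin and only looks at rows whose first
# field is a unit name such as "keystone/0" or "keystone/0*" (leader marker).
units_in_error() {
    awk '$1 ~ /^[a-z][a-z0-9-]*\/[0-9]+\*?$/ && ($2 == "error" || $3 == "error") {
        sub(/\*$/, "", $1)   # drop the leader marker before printing
        print $1
    }'
}

# Sample rows for illustration only; on a real cloud pipe in `juju status` instead.
units_in_error <<'EOF'
keystone/0*  active  idle   0/lxd/1  10.0.0.11  5000/tcp  Unit is ready
glance/0*    error   idle   1/lxd/0  10.0.0.12            hook failed: "config-changed"
EOF
```

An empty result is a necessary but not sufficient condition for starting. General cloud health still has to be confirmed separately.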

Hyper-Converged Architecture

Compute, storage and their subordinates may be colocated.

API Services are deployed in LXD containers.

Percona-cluster is deployed in a LXD container.

Rabbitmq is deployed in a LXD container.

Third party charms either do not exist or have been thoroughly tested for series upgrade.

No other non-subordinate charms are colocated on the same machine.

Overview

Note

This overview is not a substitute for understanding the entirety of this document. It is the general case but the individual details matter. Read "where appropriate" at the end of each step.

Evacuate or otherwise prepare the machine

Pause hacluster for non-leader units not undergoing upgrade

Pause non-leader peer units not undergoing upgrade

Juju upgrade-series prepare the leader's machine

Execute do-release-upgrade and any post-upgrade operating system tasks

Reboot

Set openstack-origin or source for new operating system ("distro")

Juju upgrade-series complete the machine

Repeat the steps from prepare to complete for the non-leader machines

Perform any cluster completed upgrade tasks after all units of application have been upgraded.

Juju set-series to the new series for all future units of an application.
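The middle of the overview, from pausing the other units through completing the leader's machine, can be sketched as a command plan. This is an illustration only: the helper name `leader_plan` is hypothetical, the `<application>-hacluster` subordinate name follows the examples used in this document and may differ on a given cloud, and the function prints commands instead of running them.

```shell
# Hypothetical helper: print, without running, the commands for the leader's
# machine of one HA application.
# Usage: leader_plan APP SERIES MACHINE ORIGIN_OPTION NONLEADER_UNIT_NUMBER...
leader_plan() (
    app=$1 series=$2 machine=$3 origin=$4
    shift 4
    # Pause hacluster, then the peer units, on the machines not being upgraded.
    for n in "$@"; do echo "juju run-action ${app}-hacluster/${n} pause"; done
    for n in "$@"; do echo "juju run-action ${app}/${n} pause"; done
    echo "juju upgrade-series ${machine} prepare ${series}"
    echo "# administrator: do-release-upgrade, post-upgrade tasks, reboot"
    echo "juju config ${app} ${origin}=distro"
    echo "juju upgrade-series ${machine} complete"
)

# Example: percona-cluster leader on machine 0, non-leader units 1 and 2.
leader_plan percona-cluster xenial 0 source 1 2
```

Printing the plan first gives the administrator a chance to compare it against the procedures in this document before any command is run by hand.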

Exceptions

This overview describes the general case that includes the API charms, percona-cluster and rabbitmq.

The notable exceptions are nova-compute, ceph-mon and ceph-osd, for which no units need to be paused and unit leadership is irrelevant.

Example as code

Attempting an automated series upgrade on a running production cloud is not recommended. The following example-as-code encapsulates the processes described in this document and is provided solely to illustrate the methods used to develop and test the series upgrade primitives. The example code should not be consumed in any automation outside of its intended use case (charm dev/test gate automation).

https://github.com/openstack-charmers/zaza/blob/master/zaza/charm_tests/series_upgrade/tests.py

https://github.com/openstack-charmers/zaza/blob/master/zaza/utilities/generic.py#L173

Procedures

The following procedures are grouped into categories of charms that follow the same procedure.

Note

Example commands used in this documentation assume a Trusty to Xenial series upgrade. The same approach is used for Xenial to Bionic series upgrades. Unit and machine numbers are examples only and will differ from site to site. For instance, machine number 0 is reused in every procedure purely for illustration.

Physical Host Nodes

This procedure applies to the physical host nodes, which may include nova-compute, neutron-openvswitch and ceph-osd as well as neutron-gateway. Though ceph-mon is most often deployed in LXD containers, it also follows this procedure.

Note

Nova-compute and ceph-osd are used in the commands below for example purposes. In this example, the physical host where nova-compute/0 and ceph-osd/0 are deployed is machine 0.

Evacuate or otherwise prepare the machine

For compute nodes, move all running VMs off the physical host. For network nodes, force HA routers off of the current node. Also perform any storage related or site specific tasks that may be required.
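As one possible starting point for a compute node, the sketch below disables the compute service on the host and lists the instances still running there, using the OpenStack command line client. The hostname is hypothetical, the `run` wrapper is an illustration that prints commands unless `DRY_RUN=0` is set, and the migration of the instances themselves is left to the administrator because the appropriate method depends on the cloud.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical hypervisor hostname, replace with the host behind machine 0.
HOST="compute-0.example.com"

# Stop the scheduler from placing new instances on the host.
run openstack compute service set --disable "$HOST" nova-compute
# List the instances that still have to be moved off the host.
run openstack server list --all-projects --host "$HOST"
```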

Juju upgrade-series prepare the machine

juju upgrade-series 0 prepare xenial

Note

The upgrade-series prepare command causes all the charms on the given machine to run their pre-series-upgrade hook. For most cases with the OpenStack charms this pauses the unit. At the completion of the pre-series-upgrade hook the workload status should be "blocked" with the message "Ready for do-release-upgrade and reboot."
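The check described in the note can be expressed as a small predicate. This is a sketch only: the helper name `ready_for_release_upgrade` is hypothetical, and the status values are hard-coded here rather than read from Juju.

```shell
# Message a unit is expected to show once its pre-series-upgrade hook finishes.
READY_MSG="Ready for do-release-upgrade and reboot."

# Hypothetical helper: succeed only if the workload status ($1) and the
# status message ($2) match the state described in the note above.
ready_for_release_upgrade() {
    [ "$1" = "blocked" ] && [ "$2" = "$READY_MSG" ]
}

# Hard-coded example values; in practice copy them from the output of
# `juju status nova-compute/0`.
if ready_for_release_upgrade "blocked" "Ready for do-release-upgrade and reboot."; then
    echo "safe to run do-release-upgrade"
else
    echo "wait: pre-series-upgrade hook has not finished"
fi
```

Every unit on the machine should reach this state before the operating system upgrade begins.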

Execute do-release-upgrade and any post-upgrade operating system tasks

The do-release-upgrade process is performed by the administrator. Any post do-release-upgrade tasks are also the responsibility of the administrator.

Reboot

The post do-release-upgrade reboot is executed by the administrator.
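For illustration, the two manual steps above might be reached through `juju ssh` as sketched below. The `run` wrapper is hypothetical and prints the commands unless `DRY_RUN=0` is set. Note that do-release-upgrade prompts interactively, so it should be attended by the administrator rather than left to run unattended.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Example machine number, as used throughout this document.
MACHINE=0

# Upgrade the operating system on the machine, answering its prompts by hand.
run juju ssh "$MACHINE" sudo do-release-upgrade
# Reboot once the upgrade and any post-upgrade tasks are finished.
run juju ssh "$MACHINE" sudo reboot
```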

Set openstack-origin or source for new operating system ("distro")

This step is required and should occur before the first node is completed.

juju config nova-compute openstack-origin=distro
juju config ceph-osd source=distro

Juju upgrade-series complete the machine

juju upgrade-series 0 complete

Note

The upgrade-series complete command causes all the charms on the given machine to run their post-series-upgrade hook. For most cases with the OpenStack charms this re-writes configuration files and resumes the unit. At the completion of the post-series-upgrade hook the workload status should be "active" with the message "Unit is ready."

Juju set-series to the new series for all future units of an application.

To guarantee that any future add-unit commands create new instantiations of the application on the correct series, it is necessary to set the series on the application.

juju set-series nova-compute xenial
juju set-series neutron-openvswitch xenial
juju set-series ceph-osd xenial

Repeat the procedure for all physical host nodes.

It is not necessary to repeat the set openstack-origin step.

Stateful Services

This procedure applies to the stateful services deployed in LXD containers. These include percona-cluster and rabbitmq.

Note

While percona-cluster is often deployed with hacluster for HA, rabbitmq is not. Ignore the hacluster steps for rabbitmq. Likewise, no backup of rabbitmq is required. Percona-cluster is used below for example purposes. In this example, percona-cluster/0 is the leader unit and the LXD container it is deployed in is machine 0.
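Because the procedure depends on knowing which unit is the leader, it helps to confirm leadership before pausing anything. The sketch below assumes each unit has been asked with `juju run --unit <unit> is-leader`, which prints True or False. The helper name `leader_from_results` and the sample data are hypothetical.

```shell
# Hypothetical helper: pick the leader from "unit result" lines on stdin,
# where result is the True/False printed by the is-leader hook tool.
leader_from_results() {
    while read -r unit result; do
        if [ "$result" = "True" ]; then
            echo "$unit"
            return 0
        fi
    done
    return 1   # no unit reported itself as leader
}

# Sample data for illustration; on a real cloud each line would come from
#   juju run --unit percona-cluster/N is-leader
printf '%s\n' "percona-cluster/0 True" "percona-cluster/1 False" "percona-cluster/2 False" \
    | leader_from_results
```

The leader is also marked with an asterisk next to the unit name in `juju status`, which is a quicker check when working interactively.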

Prepare the machine

Perform backups of percona-cluster and scp the backup to a secure location.

juju run-action percona-cluster/0 backup
juju scp -- -r percona-cluster/0:/opt/backups/mysql /path/to/local/backup/dir
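Since `juju run-action` returns as soon as the action is queued, the copy can start before the backup has finished. A minimal sketch of one way to avoid that, assuming a Juju 2.x client where `run-action` accepts `--wait`, is shown below. The `run` wrapper and the `BACKUP_DIR` variable are hypothetical, and commands are printed unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Local destination for the backup, same placeholder path as in the text above.
BACKUP_DIR="/path/to/local/backup/dir"

# Block until the backup action has completed and print its result.
run juju run-action --wait percona-cluster/0 backup
# Copy the finished backup off the unit.
run juju scp -- -r percona-cluster/0:/opt/backups/mysql "$BACKUP_DIR"
```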

Pause hacluster for non-leader units not undergoing upgrade

juju run-action percona-cluster-hacluster/1 pause
juju run-action percona-cluster-hacluster/2 pause

Pause non-leader peer units not undergoing upgrade

juju run-action percona-cluster/1 pause
juju run-action percona-cluster/2 pause

Juju upgrade-series prepare the leader's machine

juju upgrade-series 0 prepare xenial

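Note

As on the physical host nodes, the upgrade-series prepare command causes all the charms on the given machine to run their pre-series-upgrade hook. At the completion of the hook the workload status should be "blocked" with the message "Ready for do-release-upgrade and reboot."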

Execute do-release-upgrade and any post-upgrade operating system tasks

The do-release-upgrade process is performed by the administrator. Any post do-release-upgrade tasks are also the responsibility of the administrator.

Reboot

The post do-release-upgrade reboot is executed by the administrator.

Set openstack-origin or source for new operating system ("distro")

This step is required and should occur before the first node is completed but after the other units are paused.

juju config percona-cluster source=distro

Juju upgrade-series complete the machine

juju upgrade-series 0 complete

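Note

As on the physical host nodes, the upgrade-series complete command causes all the charms on the given machine to run their post-series-upgrade hook. At the completion of the hook the workload status should be "active" with the message "Unit is ready."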

Repeat the procedure for non-leader nodes

It is not necessary to repeat the set openstack-origin step.

Perform any cluster completed upgrade tasks after all units of application have been upgraded.

Run the complete-cluster-series-upgrade action on the leader node. This action informs each node of the cluster that the upgrade process is complete cluster-wide. It also updates the mysql configuration with all peers in the cluster.

juju run-action percona-cluster/0 complete-cluster-series-upgrade

Juju set-series to the new series for all future units of an application.

To guarantee that any future add-unit commands create new instantiations of the application on the correct series, it is necessary to set the series on the application.

juju set-series percona-cluster xenial

API Services

This procedure applies to the API services in LXD containers. These include but are not limited to keystone, glance, cinder, neutron-api and nova-cloud-controller. Any subordinates deployed with these applications will be upgraded at the same time.

Note

Keystone is used in the commands below for example purposes. In this example, keystone/0 is the leader unit and the LXD container it is deployed in is machine 0.

Pause hacluster for non-leader units not undergoing upgrade

juju run-action keystone-hacluster/1 pause
juju run-action keystone-hacluster/2 pause

Pause non-leader peer units not undergoing upgrade

juju run-action keystone/1 pause
juju run-action keystone/2 pause

Juju upgrade-series prepare the leader's machine

juju upgrade-series 0 prepare xenial

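Note

As on the physical host nodes, the upgrade-series prepare command causes all the charms on the given machine to run their pre-series-upgrade hook. At the completion of the hook the workload status should be "blocked" with the message "Ready for do-release-upgrade and reboot."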

Execute do-release-upgrade and any post-upgrade operating system tasks

The do-release-upgrade process is performed by the administrator. Any post do-release-upgrade tasks are also the responsibility of the administrator.

Reboot

The post do-release-upgrade reboot is executed by the administrator.

Set openstack-origin or source for new operating system ("distro")

This step is required and should occur before the first node is completed but after the other units are paused.

juju config keystone openstack-origin=distro

Juju upgrade-series complete the machine

juju upgrade-series 0 complete

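Note

As on the physical host nodes, the upgrade-series complete command causes all the charms on the given machine to run their post-series-upgrade hook. At the completion of the hook the workload status should be "active" with the message "Unit is ready."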

Repeat the procedure for non-leader nodes

It is not necessary to repeat the set openstack-origin step.

Juju set-series to the new series for all future units of an application.

To guarantee that any future add-unit commands create new instantiations of the application on the correct series, it is necessary to set the series on the application.

juju set-series keystone xenial
