Merge "Rewrite series upgrade documentation"

Zuul 2020-05-20 16:10:53 +00:00 committed by Gerrit Code Review
commit 67a997f45f
5 changed files with 1237 additions and 586 deletions


@ -142,6 +142,8 @@ client-specified AZ during instance creation, one of these zones will be
scheduled. When 'true', and MAAS is the backing cloud, this option overrides
option ``default-availability-zone``.
.. _ceph_az:
Ceph AZ
^^^^^^^


@ -0,0 +1,796 @@
=====================================
Appendix F2: Series upgrade OpenStack
=====================================
Overview
--------
This document will provide specific steps for how to perform a series upgrade
across the entirety of a Charmed OpenStack cloud.
.. warning::
This document is based upon the foundational knowledge and guidelines set
forth in the more general `Series upgrade`_ appendix. That reference must be
studied in-depth prior to attempting the steps outlined here. In particular,
ensure that the :ref:`Pre-upgrade requirements <pre-upgrade_requirements>`
are satisfied; the :ref:`Specific series upgrade procedures
<series_specific_procedures>` have been reviewed and considered; and that
:ref:`Workload specific preparations <workload_specific_preparations>` have
been addressed during planning.
Downtime
--------
Although the goal is to minimise downtime, the series upgrade process across a
cloud will nonetheless result in some level of downtime for the control plane.

When the machines associated with stateful applications such as percona-cluster
and rabbitmq-server undergo a series upgrade, all cloud APIs will experience
downtime, in addition to the stateful application itself.

When machines associated with a single API application undergo a series
upgrade, that individual API will also experience downtime. This is because it
is necessary to pause services in order to avoid race condition errors.
For those applications working in tandem with hacluster, as will be shown, some
hacluster units will need to be paused before the upgrade. One should assume
that the commencement of an outage coincides with this step (it will cause
cluster quorum heartbeats to fail and the service VIP will consequently go
offline).
Reference cloud topology
------------------------
This section describes a hyperconverged cloud topology that this document will
use for the procedural steps to follow. Hyperconvergence refers to the practice
of co-locating principle applications on the same machine.
The topology is defined in this way:
* Only compute and storage charms (and their subordinates) may be co-located.
* Third-party charms either do not exist or have been thoroughly tested
for a series upgrade.
* The following are containerised:
* All API applications
* The percona-cluster application
* The rabbitmq-server application
* The ceph-mon application
Storage charms are charms that manage physical disks. For example, ceph-osd and
swift-storage. Example OpenStack subordinate charms are networking SDN charms
for the nova-compute charm, or monitoring charms for compute or storage charms.
.. caution::
If your cloud differs from this topology you must adapt the procedural steps
accordingly. In particular, look at the aspects of co-located applications
and containerised applications. Recall that:
* the :command:`upgrade-series` command:
* affects all applications residing on the target machine
* does not affect containers hosted on the target machine
* an application's leader should be upgraded before its non-leaders
Generalised OpenStack series upgrade
------------------------------------
This section will summarise the series upgrade steps in the context of specific
OpenStack applications. It is an enhancement of the :ref:`Generic series
upgrade <generic_series_upgrade>` section in the companion document.
Applications for which this summary does **not** apply include:
* nova-compute
* ceph-mon
* ceph-osd
This is because the above applications do not require the pausing of units and
application leadership is irrelevant for them.
However, this summary does apply to all API applications (e.g. neutron-api,
keystone, nova-cloud-controller), as well as percona-cluster, and
rabbitmq-server.
.. important::
The first machine to be upgraded is always associated with the leader of the
principle application. Let this machine be called the "principle leader
machine" and its unit be called the "principle leader unit".
The steps are as follows:
#. Set the default series for the principle application and ensure the same has
been done to the model.
#. If hacluster is used, pause the hacluster units not associated with the
principle leader machine.
#. Pause the principle non-leader units.
#. Perform a series upgrade on the principle leader machine.
#. Perform any pre-upgrade workload maintenance tasks.
#. Invoke the :command:`prepare` sub-command.
#. Upgrade the operating system (APT commands).
#. Perform any post-upgrade workload maintenance tasks.
#. Reboot.
#. Set the value of the (application-dependent) ``openstack-origin`` or the
``source`` configuration option to 'distro' (new operating system).
#. Invoke the :command:`complete` sub-command on the principle leader machine.
#. Repeat steps 4 and 6 for the application non-leader machines.
#. Perform any remaining cluster-wide post-upgrade tasks once all machines have
had their series upgraded.
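As a condensed sketch of this sequence (placeholder names are shown in angle
brackets; the procedures later in this document give concrete examples):

.. code-block:: none

   juju model-config default-series=bionic
   juju set-series <application> bionic
   # Pause hacluster and principle non-leader units
   juju run-action --wait <application>-hacluster/<non-leader> pause
   juju run-action --wait <application>/<non-leader> pause
   # Series upgrade of the principle leader machine
   juju upgrade-series <machine> prepare bionic
   juju run --machine=<machine> -- sudo apt update
   juju ssh <machine> sudo apt full-upgrade
   juju ssh <machine> sudo do-release-upgrade
   # Point the charm at the new operating system's packages and finish
   juju config <application> openstack-origin=distro   # or source=distro
   juju upgrade-series <machine> complete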
.. note::
Here is a non-exhaustive list of the most common post-upgrade tasks for
OpenStack and supporting charms:
* percona-cluster: run action ``complete-cluster-series-upgrade`` on the
leader unit.
* rabbitmq-server: run action ``complete-cluster-series-upgrade`` on the
leader unit.
* ceilometer: run action ``ceilometer-upgrade`` on the leader unit.
* vault: Each vault unit will need to be unsealed after its machine is
rebooted.
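For illustration, these tasks map to commands along the following lines. The
vault step is only a sketch of the standard unseal workflow with the vault
client; the address and port are assumptions for a typical deployment:

.. code-block:: none

   juju run-action --wait percona-cluster/leader complete-cluster-series-upgrade
   juju run-action --wait rabbitmq-server/leader complete-cluster-series-upgrade
   juju run-action --wait ceilometer/leader ceilometer-upgrade
   # vault: unseal each unit with the vault client once its machine is back up
   export VAULT_ADDR="http://<vault-unit-ip>:8200"
   vault operator unseal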
Procedures
----------
The procedures are categorised based on application types. The example scenario
used throughout is a 'xenial' to 'bionic' series upgrade, within an OpenStack
release of Queens (i.e. the starting point is a cloud archive pocket of
'xenial-queens').
Stateful applications
~~~~~~~~~~~~~~~~~~~~~
This section covers the series upgrade procedure for containerised stateful
applications. These include:
* ceph-mon
* percona-cluster
* rabbitmq-server
A stateful application is one that maintains the state of various aspects of
the cloud. Clustered stateful applications, such as all the ones given above,
also require a quorum to function properly. For these reasons a stateful
application should not have all of its units restarted simultaneously; the
series of its corresponding machines must be upgraded sequentially.
.. note::
The concurrent upgrade approach is theoretically possible, although to use
it all cloud workloads will need to be stopped in order to ensure
consistency. This is not recommended.
The example procedure will be based on the percona-cluster application.
.. important::
Unlike percona-cluster, the ceph-mon and rabbitmq-server applications do not
use hacluster to achieve HA, nor do they need backups. Therefore, disregard
the hacluster and backup steps for these two applications.
The ceph-mon charm will maintain the MON cluster during a series upgrade, so
ceph-mon units do not need to be paused.
This scenario is represented by the following partial :command:`juju status`
command output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 18:26:57Z
App Version Status Scale Charm Store Rev OS Notes
percona-cluster 5.6.37 active 3 percona-cluster jujucharms 286 ubuntu
percona-cluster-hacluster active 3 hacluster jujucharms 66 ubuntu
Unit Workload Agent Machine Public address Ports Message
percona-cluster/0 active idle 0/lxd/0 10.0.0.47 3306/tcp Unit is ready
percona-cluster-hacluster/0* active idle 10.0.0.47 Unit is ready and clustered
percona-cluster/1* active idle 1/lxd/0 10.0.0.48 3306/tcp Unit is ready
percona-cluster-hacluster/2 active idle 10.0.0.48 Unit is ready and clustered
percona-cluster/2 active idle 2/lxd/0 10.0.0.49 3306/tcp Unit is ready
percona-cluster-hacluster/1 active idle 10.0.0.49 Unit is ready and clustered
In summary, the principle leader unit is percona-cluster/1 and is deployed on
machine 1/lxd/0 (the principle leader machine).
.. warning::
During this upgrade, there will be a MySQL service outage. The HA resources
provided by hacluster will **not** be monitored during the series upgrade
due to the pausing of units.
#. Perform a backup of percona-cluster and transfer it to a secure location:
.. code-block:: none
juju run-action --wait percona-cluster/1 backup
juju scp -- -r percona-cluster/1:/opt/backups/mysql /path/to/local/directory
Permissions will need to be altered on the remote machine, and note that the
last command transfers **all** existing backups.
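For example, ownership of the backup directory can be adjusted first so that
the :command:`juju scp` connection (made as the 'ubuntu' user) can read the
files; a minimal sketch:

.. code-block:: none

   juju ssh percona-cluster/1 "sudo chown -R ubuntu: /opt/backups/mysql"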
#. Set the default series for both the model and the principle application:
.. code-block:: none
juju model-config default-series=bionic
juju set-series percona-cluster bionic
#. Pause the hacluster units not associated with the principle leader machine:
.. code-block:: none
juju run-action --wait percona-cluster-hacluster/0 pause
juju run-action --wait percona-cluster-hacluster/1 pause
#. Pause the principle non-leader units:
.. code-block:: none
juju run-action --wait percona-cluster/0 pause
juju run-action --wait percona-cluster/2 pause
#. Perform a series upgrade on the principle leader machine:
.. code-block:: none
# Perform any workload maintenance pre-upgrade steps here
juju upgrade-series 1/lxd/0 prepare bionic
juju run --machine=1/lxd/0 -- sudo apt update
juju ssh 1/lxd/0 sudo apt full-upgrade
juju ssh 1/lxd/0 sudo do-release-upgrade
# Perform any workload maintenance post-upgrade steps here
There are no pre-upgrade or post-upgrade workload maintenance steps to
perform; the prompt to reboot can be answered in the affirmative.
#. Set the value of the ``source`` configuration option to 'distro':
.. code-block:: none
juju config percona-cluster source=distro
#. Invoke the :command:`complete` sub-command on the principle leader machine:
.. code-block:: none
juju upgrade-series 1/lxd/0 complete
At this point the :command:`juju status` output looks like this:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 19:51:52Z
App Version Status Scale Charm Store Rev OS Notes
percona-cluster 5.7.20 maintenance 3 percona-cluster jujucharms 286 ubuntu
percona-cluster-hacluster blocked 3 hacluster jujucharms 66 ubuntu
Unit Workload Agent Machine Public address Ports Message
percona-cluster/0 maintenance idle 0/lxd/0 10.0.0.47 3306/tcp Paused. Use 'resume' action to resume normal service.
percona-cluster-hacluster/0* maintenance idle 10.0.0.47 Paused. Use 'resume' action to resume normal service.
percona-cluster/1* active idle 1/lxd/0 10.0.0.48 3306/tcp Unit is ready
percona-cluster-hacluster/2 blocked idle 10.0.0.48 Resource: res_mysql_11810cc_vip not running
percona-cluster/2 maintenance idle 2/lxd/0 10.0.0.49 3306/tcp Paused. Use 'resume' action to resume normal service.
percona-cluster-hacluster/1 maintenance idle 10.0.0.49 Paused. Use 'resume' action to resume normal service.
Machine State DNS Inst id Series AZ Message
0 started 10.0.0.44 node1 xenial zone1 Deployed
0/lxd/0 started 10.0.0.47 juju-f83fcd-0-lxd-0 xenial zone1 Container started
1 started 10.0.0.45 node2 xenial zone2 Deployed
1/lxd/0 started 10.0.0.48 juju-f83fcd-1-lxd-0 bionic zone2 Running
2 started 10.0.0.46 node3 xenial zone3 Deployed
2/lxd/0 started 10.0.0.49 juju-f83fcd-2-lxd-0 xenial zone3 Container started
#. Repeat steps 5 and 7 for the principle non-leader machines.
#. Perform any remaining cluster-wide post-upgrade tasks once all machines have
had their series upgraded:
.. code-block:: none
juju run-action --wait percona-cluster/leader complete-cluster-series-upgrade
For percona-cluster (and rabbitmq-server), the above action is performed on
the leader unit. It informs each cluster node that the upgrade process is
complete cluster-wide. This also updates MySQL configuration with all peers
in the cluster.
API applications
~~~~~~~~~~~~~~~~
This section covers series upgrade procedures for containerised API
applications. These include, but are not limited to:
* cinder
* glance
* keystone
* neutron-api
* nova-cloud-controller
Machines hosting API applications can have their series upgraded concurrently
because those applications are stateless. This results in a dramatically
reduced downtime for the application. A sequential approach will not reduce
downtime as the HA services will still need to be brought down during the
upgrade associated with the application leader.
The following two sub-sections will show how to perform a series upgrade
concurrently for a single API application and for multiple API applications.
Upgrading a single API application concurrently
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This example procedure will be based on the keystone application.
This scenario is represented by the following partial :command:`juju status`
command output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 22:48:41Z
App Version Status Scale Charm Store Rev OS Notes
keystone 13.0.2 active 3 keystone jujucharms 312 ubuntu
keystone-hacluster active 3 hacluster jujucharms 66 ubuntu
Unit Workload Agent Machine Public address Ports Message
keystone/0* active idle 0/lxd/0 10.0.0.70 5000/tcp Unit is ready
keystone-hacluster/0* active idle 10.0.0.70 Unit is ready and clustered
keystone/1 active idle 1/lxd/0 10.0.0.71 5000/tcp Unit is ready
keystone-hacluster/2 active idle 10.0.0.71 Unit is ready and clustered
keystone/2 active idle 2/lxd/0 10.0.0.72 5000/tcp Unit is ready
keystone-hacluster/1 active idle 10.0.0.72 Unit is ready and clustered
In summary, the principle leader unit is keystone/0 and is deployed on machine
0/lxd/0 (the principle leader machine).
#. Set the default series for both the model and the principle application:
.. code-block:: none
juju model-config default-series=bionic
juju set-series keystone bionic
#. Pause the hacluster units not associated with the principle leader machine:
.. code-block:: none
juju run-action --wait keystone-hacluster/1 pause
juju run-action --wait keystone-hacluster/2 pause
#. Pause the principle non-leader units:
.. code-block:: none
juju run-action --wait keystone/1 pause
juju run-action --wait keystone/2 pause
#. Perform any workload maintenance pre-upgrade steps on all machines. There
are no keystone-specific steps to perform.
#. Invoke the :command:`prepare` sub-command on all machines, **starting with
the principle leader machine**:
.. code-block:: none
juju upgrade-series 0/lxd/0 prepare bionic
juju upgrade-series 1/lxd/0 prepare bionic
juju upgrade-series 2/lxd/0 prepare bionic
At this point the :command:`juju status` output looks like this:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 23:11:01Z
App Version Status Scale Charm Store Rev OS Notes
keystone 13.0.2 blocked 3 keystone jujucharms 312 ubuntu
keystone-hacluster blocked 3 hacluster jujucharms 66 ubuntu
Unit Workload Agent Machine Public address Ports Message
keystone/0* blocked idle 0/lxd/0 10.0.0.70 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished.
keystone-hacluster/0* blocked idle 10.0.0.70 Ready for do-release-upgrade. Set complete when finished
keystone/1 blocked idle 1/lxd/0 10.0.0.71 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished.
keystone-hacluster/2 blocked idle 10.0.0.71 Ready for do-release-upgrade. Set complete when finished
keystone/2 blocked idle 2/lxd/0 10.0.0.72 5000/tcp Ready for do-release-upgrade and reboot. Set complete when finished.
keystone-hacluster/1 blocked idle 10.0.0.72 Ready for do-release-upgrade. Set complete when finished
#. Upgrade the operating system on all machines. The non-interactive method is
used here:
.. code-block:: none
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=10m \
-- sudo apt-get update
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=60m \
-- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
-o "Dpkg::Options::=--force-confdef" \
-o "Dpkg::Options::=--force-confold" dist-upgrade
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=120m \
-- sudo DEBIAN_FRONTEND=noninteractive \
do-release-upgrade -f DistUpgradeViewNonInteractive
.. important::
Choose values for the ``--timeout`` option that are appropriate for the
task at hand.
#. Perform any workload maintenance post-upgrade steps on all machines. There
are no keystone-specific steps to perform.
#. Reboot all machines:
.. code-block:: none
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 -- sudo reboot
#. Set the value of the ``openstack-origin`` configuration option to 'distro':
.. code-block:: none
juju config keystone openstack-origin=distro
#. Invoke the :command:`complete` sub-command on all machines:
.. code-block:: none
juju upgrade-series 0/lxd/0 complete
juju upgrade-series 1/lxd/0 complete
juju upgrade-series 2/lxd/0 complete
Upgrading multiple API applications concurrently
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This example procedure will be based on the nova-cloud-controller and glance
applications.
This scenario is represented by the following partial :command:`juju status`
command output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 19:23:41Z
App Version Status Scale Charm Store Rev OS Notes
glance 16.0.1 active 3 glance jujucharms 295 ubuntu
glance-hacluster active 3 hacluster jujucharms 66 ubuntu
nova-cc-hacluster active 3 hacluster jujucharms 66 ubuntu
nova-cloud-controller 17.0.12 active 3 nova-cloud-controller jujucharms 343 ubuntu
Unit Workload Agent Machine Public address Ports Message
glance/0* active idle 0/lxd/0 10.246.114.39 9292/tcp Unit is ready
glance-hacluster/0* active idle 10.246.114.39 Unit is ready and clustered
glance/1 active idle 1/lxd/0 10.246.114.40 9292/tcp Unit is ready
glance-hacluster/1 active idle 10.246.114.40 Unit is ready and clustered
glance/2 active idle 2/lxd/0 10.246.114.41 9292/tcp Unit is ready
glance-hacluster/2 active idle 10.246.114.41 Unit is ready and clustered
nova-cloud-controller/0 active idle 3/lxd/0 10.246.114.48 8774/tcp,8778/tcp Unit is ready
nova-cc-hacluster/2 active idle 10.246.114.48 Unit is ready and clustered
nova-cloud-controller/1* active idle 4/lxd/0 10.246.114.43 8774/tcp,8778/tcp Unit is ready
nova-cc-hacluster/0* active idle 10.246.114.43 Unit is ready and clustered
nova-cloud-controller/2 active idle 5/lxd/0 10.246.114.47 8774/tcp,8778/tcp Unit is ready
nova-cc-hacluster/1 active idle 10.246.114.47 Unit is ready and clustered
In summary,
* The glance principle leader unit is glance/0 and is deployed on machine
0/lxd/0 (the glance principle leader machine).
* The nova-cloud-controller principle leader unit is nova-cloud-controller/1
and is deployed on machine 4/lxd/0 (the nova-cloud-controller principle
leader machine).
The procedure has been expedited slightly by adding the ``--yes`` confirmation
option to the :command:`prepare` sub-command.
#. Set the default series for both the model and the principle applications:
.. code-block:: none
juju model-config default-series=bionic
juju set-series glance bionic
juju set-series nova-cloud-controller bionic
#. Pause the hacluster units not associated with their principle leader
machines:
.. code-block:: none
juju run-action --wait glance-hacluster/1 pause
juju run-action --wait glance-hacluster/2 pause
juju run-action --wait nova-cc-hacluster/1 pause
juju run-action --wait nova-cc-hacluster/2 pause
#. Pause the principle non-leader units:
.. code-block:: none
juju run-action --wait glance/1 pause
juju run-action --wait glance/2 pause
juju run-action --wait nova-cloud-controller/0 pause
juju run-action --wait nova-cloud-controller/2 pause
#. Perform any workload maintenance pre-upgrade steps on all machines. There
are no glance-specific or nova-cloud-controller-specific steps to perform.
#. Invoke the :command:`prepare` sub-command on all machines, **starting with
the principle leader machines**:
.. code-block:: none
juju upgrade-series --yes 0/lxd/0 prepare bionic
juju upgrade-series --yes 4/lxd/0 prepare bionic
juju upgrade-series --yes 1/lxd/0 prepare bionic
juju upgrade-series --yes 2/lxd/0 prepare bionic
juju upgrade-series --yes 3/lxd/0 prepare bionic
juju upgrade-series --yes 5/lxd/0 prepare bionic
#. Upgrade the operating system on all machines. The non-interactive method is
used here:
.. code-block:: none
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \
--timeout=20m -- sudo apt-get update
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \
--timeout=120m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
-o "Dpkg::Options::=--force-confdef" \
-o "Dpkg::Options::=--force-confold" dist-upgrade
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \
--timeout=200m -- sudo DEBIAN_FRONTEND=noninteractive \
do-release-upgrade -f DistUpgradeViewNonInteractive
#. Perform any workload maintenance post-upgrade steps on all machines. There
are no glance-specific or nova-cloud-controller-specific steps to perform.
#. Reboot all machines:
.. code-block:: none
juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 -- sudo reboot
#. Set the value of the ``openstack-origin`` configuration option to 'distro':
.. code-block:: none
juju config glance openstack-origin=distro
juju config nova-cloud-controller openstack-origin=distro
#. Invoke the :command:`complete` sub-command on all machines:
.. code-block:: none
juju upgrade-series 0/lxd/0 complete
juju upgrade-series 1/lxd/0 complete
juju upgrade-series 2/lxd/0 complete
juju upgrade-series 3/lxd/0 complete
juju upgrade-series 4/lxd/0 complete
juju upgrade-series 5/lxd/0 complete
Physical machines
~~~~~~~~~~~~~~~~~
This section covers series upgrade procedures for applications hosted on
physical machines in particular. These typically include:
* ceph-osd
* neutron-gateway
* nova-compute
When performing a series upgrade on a physical machine, more attention should
be given to any workload maintenance pre-upgrade steps (example commands are
sketched after this list):
* For compute nodes, migrate all running VMs to another hypervisor.
* For network nodes, force HA routers off of the current node.
* Any storage-related tasks that may be required.
* Any site-specific tasks that may be required.
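By way of illustration, a compute node can be drained and an HA router moved
with commands along these lines. Hostnames and UUIDs are placeholders, and the
exact commands depend on the deployment:

.. code-block:: none

   # Compute node: stop scheduling new instances to it, then live migrate
   # the existing ones to other hypervisors.
   openstack compute service set --disable node1.maas nova-compute
   nova host-evacuate-live node1.maas

   # Network node: move HA/L3 routers to an L3 agent on another node.
   openstack network agent list --agent-type l3 --host node1.maas
   openstack network agent remove router --l3 <l3-agent-id> <router-id>
   openstack network agent add router --l3 <other-l3-agent-id> <router-id>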
The following two sub-sections will show how to perform a series upgrade
for a single physical machine and for multiple physical machines concurrently.
Upgrading a single physical machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This example procedure will be based on the nova-compute and ceph-osd
applications residing on the same physical machine. Since application
leadership does not play a significant role with these two applications, and
because the hacluster application is not present, there will be no units to
pause (as there were in previous scenarios).
This scenario is represented by the following partial :command:`juju status`
command output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 15:23:21Z
App Version Status Scale Charm Store Rev OS Notes
ceph-osd 12.2.12 active 1 ceph-osd jujucharms 301 ubuntu
keystone 13.0.2 active 1 keystone jujucharms 312 ubuntu
nova-compute 17.0.12 active 1 nova-compute jujucharms 314 ubuntu
Unit Workload Agent Machine Public address Ports Message
ceph-osd/0* active idle 0 10.0.0.235 Unit is ready (1 OSD)
keystone/0* active idle 0/lxd/0 10.0.0.240 5000/tcp Unit is ready
nova-compute/0* active idle 0 10.0.0.235 Unit is ready
Machine State DNS Inst id Series AZ Message
0 started 10.0.0.235 node1 xenial zone1 Deployed
0/lxd/0 started 10.0.0.240 juju-88b27a-0-lxd-0 xenial zone1 Container started
In summary, the ceph-osd and nova-compute applications are hosted on machine 0.
Recall that container 0/lxd/0 will need to have its series upgraded separately.
#. It is recommended to set the Ceph cluster OSDs to 'noout'. This is typically
done at the application level (i.e. not at the unit or machine level):
.. code-block:: none
juju run-action --wait ceph-mon/leader set-noout
#. All running VMs should be migrated to another hypervisor.
#. Upgrade the series on machine 0:
#. Invoke the :command:`prepare` sub-command:
.. code-block:: none
juju upgrade-series 0 prepare bionic
#. Upgrade the operating system:
.. code-block:: none
juju run --machine=0 -- sudo apt update
juju ssh 0 sudo apt full-upgrade
juju ssh 0 sudo do-release-upgrade
#. Reboot (if not already done):
.. code-block:: none
juju run --machine=0 -- sudo reboot
#. Set the value of the ``openstack-origin`` or ``source`` configuration
options to 'distro':
.. code-block:: none
juju config nova-compute openstack-origin=distro
juju config ceph-osd source=distro
#. Invoke the :command:`complete` sub-command on the machine:
.. code-block:: none
juju upgrade-series 0 complete
#. If OSDs were previously set to 'noout', verify the up/in status of those
OSDs via :command:`ceph status`, then unset 'noout' for the cluster:
.. code-block:: none
juju run --unit ceph-mon/leader -- ceph status
juju run-action --wait ceph-mon/leader unset-noout
Upgrading multiple physical hosts concurrently
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When physical machines have their series upgraded concurrently Availability
Zones need to be taken into account. Machines should be placed into upgrade
groups such that any API services running on them have a maximum of one unit
per group. This is to ensure API availability at the reboot stage.
This simplified bundle is used to demonstrate the general idea:
.. code-block:: yaml
series: xenial
machines:
0: {}
1: {}
2: {}
3: {}
4: {}
5: {}
applications:
nova-compute:
charm: cs:nova-compute
num_units: 3
options:
openstack-origin: cloud:xenial-queens
to:
- 0
- 2
- 4
keystone:
charm: cs:keystone
constraints: mem=1G
num_units: 3
options:
vip: 10.85.132.200
openstack-origin: cloud:xenial-queens
to:
- lxd:1
- lxd:3
- lxd:5
keystone-hacluster:
charm: cs:hacluster
options:
cluster_count: 3
Three upgrade groups could consist of the following machines:
#. Machines 0 and 1
#. Machines 2 and 3
#. Machines 4 and 5
In this way, a less time-consuming series upgrade can be performed while still
ensuring the availability of services.
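One way to see which units are hosted on which machines when forming such
groups:

.. code-block:: none

   juju machines
   juju status --format=oneline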
.. caution::
For the ceph-osd application, ensure that rack-aware replication rules exist
in the CRUSH map if machines are being rebooted together. This is to prevent
significant interruption to running workloads from occurring if the
same placement group is hosted on those machines. For example, if ceph-mon
is deployed with ``customize-failure-domain`` set to 'true' and the ceph-osd
units are hosted on machines in three or more separate Juju AZs you can
safely reboot ceph-osd machines concurrently in the same zone. See
:ref:`Ceph AZ <ceph_az>` in :doc:`OpenStack high availability <app-ha>` for
details.
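The CRUSH topology and replication rules can be inspected via the ceph-mon
leader before rebooting machines concurrently, for example:

.. code-block:: none

   juju run --unit ceph-mon/leader -- ceph osd tree
   juju run --unit ceph-mon/leader -- ceph osd crush rule dump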
Automation
----------
Series upgrades across an OpenStack cloud can be time consuming, even when
using concurrent methods wherever possible. They can also be tedious and thus
susceptible to human error.
The following code examples encapsulate the processes described in this
document. They are provided solely to illustrate the methods used to develop
and test the series upgrade primitives:
* `Parallel tests`_: An example that is used as a functional verification of
a series upgrade in the OpenStack Charms project.
* `Upgrade helpers`_: A set of helpers used in the above upgrade example.
.. caution::
The example code should only be used for its intended use case of
development and testing. Do not attempt to automate a series upgrade on a
production cloud.
.. LINKS
.. _Charm upgrades: app-upgrade-openstack#charm-upgrades
.. _Series upgrade: app-series-upgrade
.. _Parallel tests: https://github.com/openstack-charmers/zaza-openstack-tests/blob/c492ecdcac3b2724833c347e978de97ea2e626d7/zaza/openstack/charm_tests/series_upgrade/parallel_tests.py#L64
.. _Upgrade helpers: https://github.com/openstack-charmers/zaza-openstack-tests/blob/9cec2efabe30fb0709bc098c48ec10bcb85cc9d4/zaza/openstack/utilities/parallel_series_upgrade.py


@ -0,0 +1,182 @@
:orphan:
.. _series_upgrade_specific_procedures:
==================================
Specific series upgrade procedures
==================================
Overview
--------
This page describes procedures that may be required when performing a series
upgrade across a Charmed OpenStack cloud. They relate to specific cloud
workloads. Please read the more general :doc:`Series upgrade
<app-series-upgrade>` appendix before attempting any of the instructions given
here.
.. _percona_series_upgrade_to_focal:
percona-cluster charm: series upgrade to Focal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In Ubuntu 20.04 LTS (Focal) the percona-xtradb-cluster-server package will no
longer be available. It has been replaced by mysql-server-8.0 and mysql-router
in Ubuntu main. Therefore, there is no way to series upgrade percona-cluster to
Focal. Instead the databases hosted by percona-cluster will need to be migrated
to mysql-innodb-cluster and mysql-router will need to be deployed as a
subordinate on the applications that use MySQL as a data store.
.. warning::
Since the DB affects most OpenStack services it is important to have a
sufficient downtime window. The following procedure is written in an attempt
to migrate one service at a time (i.e. keystone, glance, cinder, etc).
However, it may be more practical to migrate all databases at the same time
during an extended downtime window, as there may be unexpected
interdependencies between services.
.. note::
It is possible for percona-cluster to remain on Ubuntu 18.04 LTS while
the rest of the cloud migrates to Ubuntu 20.04 LTS. In fact, this state
will be one step of the migration process.
Procedure
^^^^^^^^^
* Leave all the percona-cluster machines on Bionic and upgrade the series of
the remaining machines in the cloud per this document.
* Deploy a mysql-innodb-cluster on Focal.
.. code-block:: none
juju deploy -n 3 mysql-innodb-cluster --series focal
* Deploy (but do not yet relate) an instance of mysql-router for every
application that requires a data store (i.e. every application that was
related to percona-cluster).
.. code-block:: none
juju deploy mysql-router cinder-mysql-router
juju deploy mysql-router glance-mysql-router
juju deploy mysql-router keystone-mysql-router
...
* Add relations between the mysql-router instances and the
mysql-innodb-cluster.
.. code-block:: none
juju add-relation cinder-mysql-router:db-router mysql-innodb-cluster:db-router
juju add-relation glance-mysql-router:db-router mysql-innodb-cluster:db-router
juju add-relation keystone-mysql-router:db-router mysql-innodb-cluster:db-router
...
On a per-application basis:
* Remove the relation between the application charm and the percona-cluster
charm. You can view existing relations with the :command:`juju status
percona-cluster --relations` command.
.. code-block:: none
juju remove-relation keystone:shared-db percona-cluster:shared-db
* Dump the existing database(s) from percona-cluster.
.. note::
In the following, the percona-cluster/0 and mysql-innodb-cluster/0 units
are used as examples. For percona, any unit of the application may be used,
though all the steps should use the same unit. For mysql-innodb-cluster,
the RW unit should be used. The RW unit of the mysql-innodb-cluster can be
determined from the :command:`juju status mysql-innodb-cluster` command.
* Allow Percona to dump databases. See `Percona strict mode`_ to understand
the implications of this setting.
.. code-block:: none
juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=MASTER
* Dump the specific application's database(s).
.. note::
Depending on downtime restrictions it is possible to dump all databases at
one time: run the ``mysqldump`` action without setting the ``databases``
parameter. Similarly, it is possible to import all the databases into
mysql-innodb-cluster from that single dump file.
.. note::
The database name may or may not match the application name. For example,
while keystone has a DB named keystone, openstack-dashboard has a database
named horizon. Some applications have multiple databases; notably,
nova-cloud-controller has at least nova, nova_api, nova_cell0, and a
nova_cellN for each additional cell. See the upstream documentation for the
respective application to determine the database name.
.. code-block:: none
# Single DB
juju run-action --wait percona-cluster/0 mysqldump databases=keystone
# Multiple DBs
juju run-action --wait percona-cluster/0 mysqldump databases=nova,nova_api,nova_cell0
* Return Percona to enforcing strict mode. See `Percona strict mode`_ to
understand the implications of this setting.
.. code-block:: none
juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=ENFORCING
* Transfer the mysqldump file from the percona-cluster unit to the
mysql-innodb-cluster RW unit. The RW unit of the mysql-innodb-cluster can be
determined with :command:`juju status mysql-innodb-cluster`. Below, we use
mysql-innodb-cluster/0 as an example.
.. code-block:: none
juju scp percona-cluster/0:/var/backups/mysql/mysqldump-keystone-<DATE>.gz .
juju scp mysqldump-keystone-<DATE>.gz mysql-innodb-cluster/0:/home/ubuntu
* Import the database(s) into mysql-innodb-cluster.
.. code-block:: none
juju run-action --wait mysql-innodb-cluster/0 restore-mysqldump dump-file=/home/ubuntu/mysqldump-keystone-<DATE>.gz
* Relate an instance of mysql-router for every application that requires a data
store (i.e. every application that needed percona-cluster):
.. code-block:: none
juju add-relation keystone:shared-db keystone-mysql-router:shared-db
* Repeat for remaining applications.
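For instance, the glance and cinder applications would be related to their
respective mysql-router instances in the same way:

.. code-block:: none

   juju add-relation glance:shared-db glance-mysql-router:shared-db
   juju add-relation cinder:shared-db cinder-mysql-router:shared-db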
An overview of this process can be seen in the OpenStack Charmers team's CI
`Zaza migration code`_.
Post-migration
^^^^^^^^^^^^^^
As noted above, it is possible to run the cloud with percona-cluster remaining
on Bionic indefinitely. Once all databases have been migrated to
mysql-innodb-cluster, all the databases have been backed up, and the cloud has
been verified to be in good working order, the percona-cluster application (and
its hacluster subordinate, if present) may be removed.
.. code-block:: none
juju remove-application percona-cluster-hacluster
juju remove-application percona-cluster
.. LINKS
.. _Zaza migration code: https://github.com/openstack-charmers/zaza-openstack-tests/blob/master/zaza/openstack/charm_tests/mysql/tests.py#L556
.. _Percona strict mode: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html


@ -1,668 +1,338 @@
Appendix F: Series Upgrade
==============================
Introduction
++++++++++++
Juju and OpenStack charms provide the primitives to prepare for and
respond to an upgrade from one Ubuntu LTS series to another.
.. note::
The recommended best practice is that the Juju machines that comprise the
cloud should eventually all be running the same series (e.g. 'xenial' or
'bionic', but not a mix of the two).
Warnings
++++++++
Upgrading a single machine from one LTS to another is a complex task.
Doing so on a running OpenStack cloud is an order of magnitude more
complex.
Please read through this document thoroughly before attempting a series
upgrade. Please pay particular attention to the Assumptions section and
the order of operations.
The series upgrade should be executed by an administrator or team of
administrators who are intimately familiar with the cloud undergoing
upgrade, OpenStack in general, working with Juju and OpenStack charms.
The tasks of preparing stateful OpenStack services for series upgrade are
not automated and are the responsibility of the administrator. For
example: evacuating a compute node, switching HA routers to a network
node, any storage rebalancing that may be required.
The actual task of executing the do-release-upgrade on an individual
machine is not automated. It will be performed by the administrator. Any
bespoke preparation for or cleanup after the do-release-upgrade is the
responsibility of the administrator.
The series upgrade process requires API downtime. Although the goal is
minimal downtime, it is necessary to pause services to avoid race
condition errors. Therefore, the API undergoing upgrade will require
downtime.
Stateful services which OpenStack depends on such as percona-cluster and
rabbitmq will affect all APIs during series upgrade and therefore
require downtime.
Third party charms may not have implemented series upgrade yet. Please
pay particular attention to SDN and storage charms which may affect
cloud operation.
If the architecture and layout of charms does not match the assumptions
section of this document, great care needs to be taken to avoid problems
with application leadership across machines. In other words, if most
services are not in LXD containers, it is possible to have the leader of
percona-cluster on one host and the leader of rabbitmq on another, causing
complications in the procedure for series upgrade.
Test, test, test! The series upgrade process should be tested on a
non-production cloud that closely resembles the eventual production
environment. Not only does this validate the software involved but it
prepares the administrator for the complex task ahead.
Juju
++++
Please read all Juju documentation on the series upgrade feature.
https://docs.jujucharms.com/devel/en/getting-started
.. note::
The Juju upgrade-series command operates on the machine level. This
document will be focused on applications as many require pausing their
peers and some subordinates. But it is important to remember the whole
machine is upgraded.
Applications deployed in a LXD container are considered a machine apart
from the physical host machine the container is hosted on.
Upgrading the host machine will not upgrade the LXD contained machines.
However, when the required post-upgrade reboot of the host machine
occurs all the services contained in LXD containers will be unavailable
during the reboot.
For example, consider a physical host with nova-compute, neutron-openvswitch
and ceph-osd co-located, as well as hosting a keystone unit in a LXD container. When
the juju upgrade-series prepare command is executed on the machine,
nova-compute, neutron-openvswitch and ceph-osd will execute their
pre-series-upgrade hooks but keystone will not. Nor will the LXD
operating system be affected by the do-release-upgrade on the host. At
reboot however, the keystone unit will be unavailable during the
duration of the reboot. Please plan accordingly.
Assumptions
+++++++++++
This document makes a number of assumptions about the architecture and
preparation of the cloud undergoing series upgrade. Please review these
and compare to the running cloud before performing the series upgrade.
Preparations
~~~~~~~~~~~~
The entire suite of charms used to manage the cloud should be upgraded to the
latest stable charm revision before any major change is made to the cloud such
as the current machine series upgrades. See `Charm upgrades`_ for guidance.
OpenStack is upgraded to the highest version the current LTS supports:
Mitaka for Trusty and Queens for Xenial.
The current Ubuntu operating system is up to date prior to do-release-upgrade.
Stateful services have been backed up. Percona-cluster and mongodb
should be backed up prior to upgrading.
General cloud health. Confirm the cloud is fully operational before
beginning a series upgrade.
OpenStack charms health. No charms are in hook error. Confirm the health
of the juju environment before beginning series upgrade.
Per machine preparations. Individual compute nodes are evacuated prior
to series upgrade. HA routers are moved to network nodes not undergoing
series upgrade.
`Automatic Updates aka. Unattended Upgrades <https://help.ubuntu.com/lts/serverguide/automatic-updates.html.en>`_
is enabled by default on Ubuntu Server and must be disabled on all machines
prior to initiating the upgrade procedure. This is imperative to stay in
control of when and where updates are applied throughout the upgrade procedure.
Hyper-Converged Architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compute, storage and their subordinates may be colocated.
API Services are deployed in LXD containers.
Percona-cluster is deployed in a LXD container.
Rabbitmq is deployed in a LXD container.
Third party charms either do not exist or have been thoroughly tested
for series upgrade.
No other non-subordinate charms are colocated on the same machine.
===========================
Appendix F1: Series upgrade
===========================
Overview
++++++++
--------
The purpose of this document is to provide foundational knowledge for preparing
an administrator to perform a series upgrade across a Charmed OpenStack cloud.
This translates to upgrading the operating system of every cloud node to an
entirely new version.
.. note::
This overview is not a substitute for understanding the
entirety of this document. It is the general case but the individual
details matter. Read "where appropriate" at the end of each step.
Evacuate or otherwise prepare the machine
A series upgrade, a charm upgrade, and an OpenStack upgrade are all
conceptually different and involve separate operations.
Pause hacluster for non-leader units not undergoing upgrade
Once this document has been studied the administrator will be ready to graduate
to the :doc:`Series upgrade OpenStack <app-series-upgrade-openstack>` guide
that describes the process in more detail.
Pause non-leader peer units not undergoing upgrade
Concerning the cloud being operated upon, the following is assumed:
Juju upgrade-series prepare the leader's machine
* It is being upgraded from one LTS series to another (e.g. xenial to
bionic, bionic to focal, etc.)
* Its nodes are backed by MAAS.
* Its services are highly available
* It is being upgraded with minimal downtime
Execute do-release-upgrade and any post-upgrade operating system tasks
.. warning::
Reboot
Upgrading a single production machine from one LTS to another is a serious
task. Doing so for every cloud node can be that much harder. Attempting to
do this with minimal cloud downtime is an order of magnitude more complex.
Set openstack-origin or source for new operating system ("distro")
Such an undertaking should be executed by persons who are intimately
familiar with Juju and the currently deployed charms (and their related
applications). It should first be tested on a non-production cloud that
closely resembles the production environment.
Juju upgrade-series complete the machine
The Juju :command:`upgrade-series` command
------------------------------------------
Repeat the steps from prepare to complete for the non-leader machines
The Juju :command:`upgrade-series` command is the cornerstone of the entire
procedure. This command manages an operating system upgrade of a targeted
machine and operates on every application unit hosted on that machine. The
command works in conjunction with either the :command:`prepare` or the
:command:`complete` sub-command.
Perform any cluster completed upgrade tasks after all units of
application have been upgraded.
The basic process is to inform the units on a machine that a series upgrade
is about to commence, to perform the upgrade, and then inform the units that
the upgrade has finished. In most cases with the OpenStack charms, units will
first be paused and be left with a workload status of "blocked" and a message
of "Ready for do-release-upgrade and reboot."
Juju set-series to the new series for all future units of an application.
For example, to inform units on machine '0' that an upgrade (to series
'bionic') is about to occur:
Exceptions
~~~~~~~~~~
.. code-block:: none
This overview describes the general case that includes the API charms,
percona culster and rabbitmq.
juju upgrade-series 0 prepare bionic
The notable exceptions are nova-compute, ceph-mon and ceph-osd which
do not require pausing of any units and unit leadership is irrelevant.
The :command:`prepare` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``pre-series-upgrade`` hook.
The administrator must then perform the traditional steps involved in upgrading
the OS on the targeted machine (in this example, machine '0'). For example,
update/upgrade packages with :command:`apt update && apt full-upgrade`; invoke
the :command:`do-release-upgrade` command; and reboot the machine once
complete.
Example as code
~~~~~~~~~~~~~~~
The :command:`complete` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``post-series-upgrade`` hook. In most
cases with the OpenStack charms, configuration files will be re-written, units
will be resumed automatically (if paused), and be left with a workload status
of "active" and a message of "Unit is ready":
Attempting an automated series upgrade on a running production cloud is
not recommended. The following example-as-code encapsulates the
processes described in this document, and are provided solely to
illustrate the methods used to develop and test the series upgrade
primitives. The example code should not be consumed in an automation
outside of its intended use case (charm dev/test gate automation).
.. code-block:: none
https://github.com/openstack-charmers/zaza/blob/master/zaza/charm_tests/series_upgrade/tests.py
juju upgrade-series 0 complete
https://github.com/openstack-charmers/zaza/blob/master/zaza/utilities/generic.py#L173
Procedures
++++++++++
The following procures are broken up into categories of charms that
follow the same procedure.
At this point the series upgrade on the machine and its charms is now done. In
the :command:`juju status` output the machine's entry under the Series column
will have changed from 'xenial' to 'bionic'.
.. note::
Example commands used in this documentation assume a Trusty to Xenial
series upgrade, the same approach is used for Xenial to Bionic
series upgrades. Unit and machine numbers are examples only they will
differ from site to site. For example the machine number 0 is reused
purely for example purposes.
Charms are not obliged to support the two series upgrade hooks but they do
make for a more intelligent and a less error-prone series upgrade.
Physical Host Nodes
Containers (and their charms) hosted on the target machine remain unaffected by
this command. However, during the required post-upgrade reboot of the host all
containerised services will naturally be unavailable.
See the Juju documentation to learn more about the `series upgrade`_ feature.
.. _pre-upgrade_requirements:
Pre-upgrade requirements
------------------------
This is a list of requirements that apply to any cloud. They must be met before
making any changes.
* All the cloud nodes should be using the same series, be in good working
order, and be updated with the latest stable software packages (APT
upgrades).
* The cloud should be running the latest OpenStack release supported by the
current series (e.g. Mitaka for trusty, Queens for xenial, etc.). See `Ubuntu
OpenStack release cycle`_ and `OpenStack upgrades`_.
* The cloud should be fully operational and error-free.
* All currently deployed charms should be upgraded to the latest stable charm
revision. See `Charm upgrades`_.
* The Juju model comprising the cloud should be error-free (e.g. there should
be no charm hook errors).
* `Automatic package updates`_ should be disabled on the nodes to avoid
potential conflicts with the manual (or scripted) APT steps.
.. _series_specific_procedures:
Specific series upgrade procedures
----------------------------------
Charms belonging to the OpenStack Charms project are designed to accommodate
the next LTS target series wherever possible. However, a new series may
occasionally introduce unavoidable challenges for a deployed charm. For
instance, it could be that a charm is replaced by an entirely new charm on the
new series. This can happen due to development policy concerning the charms
themselves (e.g. the ceph charm is replaced by the ceph-mon and ceph-osd
charms) or due to reasons independent of the charms (e.g. the workload software
is no longer supported on the new operating system). Any core OpenStack charms
affected in this way will be documented below.
* :ref:`percona-cluster charm: series upgrade to Focal <percona_series_upgrade_to_focal>`
.. _workload_specific_preparations:
Workload specific preparations
------------------------------
These are preparations that are specific to the current cloud deployment.
Completing them in advance is an integral part of the upgrade.
Charm upgradability
~~~~~~~~~~~~~~~~~~~
Procedure for the physical host nodes which may include nova-compute,
neutron-openvswitch and ceph-osd as well as neutron-gateway. Though
ceph-mon is most often deployed in LXD containers it follows this
procedure.
Verify the documented series upgrade processes for all currently deployed
charms. Some charms, especially third-party charms, may either not have
implemented series upgrade yet or simply may not work with the target series.
Pay particular attention to SDN (software defined networking) and storage
charms as these play a crucial role in cloud operations.
.. note::
Nova-compute and ceph-osd are used in the commands below for
example purposes. In this example, physical host where
nova-compute/0 and ceph-osd/0 are deployed is machine 0.
Workload maintenance
~~~~~~~~~~~~~~~~~~~~
Evacuate or otherwise prepare the machine
For compute nodes move all running VMs off the physical host.
For network nodes force HA routers off of the current node.
Any storage related tasks that may be required.
Any site specific tasks that may be required.
Any workload-specific pre and post series upgrade maintenance tasks should be
readied in advance. For example, if a node's workload requires a database then
a pre-upgrade backup plan should be drawn up. Similarly, if a workload requires
settings to be adjusted post-upgrade then those changes should be prepared
ahead of time. Pay particular attention to stateful services due to their
importance in cloud operations. Examples include evacuating a compute node,
switching an HA router to another node, and storage rebalancing.
Pre-upgrade tasks are performed before issuing the :command:`prepare`
subcommand, and post-upgrade tasks are done immediately prior to issuing the
:command:`complete` subcommand.
Juju upgrade-series prepare the machine
.. code:: bash
Workflow: sequential vs. concurrent
-----------------------------------
juju upgrade-series 0 prepare xenial
In terms of the workflow there are two approaches:
.. note::
The upgrade-series prepare command causes all the charms on the given
machine to run their pre-series-upgrade hook. For most cases with the
OpenStack charms this pauses the unit. At the completion of the
pre-series-upgrade hook the workload status should be "blocked" with
the message "Ready for do-release-upgrade and reboot."
* Sequential - upgrading one machine at a time
* Concurrent - upgrading a group of machines simultaneously
Execute do-release-upgrade and any post-upgrade operating system tasks
The do-release-upgrade process is performed by the administrator. Any
post do-release-upgrade tasks are also the responsibility of the
administrator.
Normally, it is best to upgrade sequentially as this ensures data reliability
and availability (we've assumed an HA cloud). This approach also minimises
adverse effects to the deployment if something goes wrong.
Reboot
Post do-release-upgrade reboot executed by the administrator.
However, for even moderately sized clouds, an intervention based purely on a
sequential approach can take a very long time to complete. This is where the
concurrent method becomes attractive.
Set openstack-origin or source for new operating system ("distro")
This step is required and should occur before the first node is
completed.
In general, a concurrent approach is a viable option for API applications but
is not an option for stateful applications. During the course of the cloud-wide
series upgrade a hybrid strategy is a reasonable choice.
.. code:: bash
To be clear, the above pertains to upgrading the series on machines associated
with a single application. It is also possible however to employ similar
thinking to multiple applications.
juju config nova-compute openstack-origin=distro
juju config ceph-osd source=distro
Juju upgrade-series complete the machine
.. code:: bash
juju upgrade-series 0 complete
.. note::
The upgrade-series complete command causes all the charms on the given
machine to run their post-series-upgrade hook. For most cases with the
OpenStack charms this re-writes configuration files and resumes the unit.
At the completion of the post-series-upgrade hook the workload status
should be "active" with the message "Unit is ready."
Juju set-series to the new series for all future units of an application.
To guarantee that any future unit-add commands create new
instantiations of the application on the correct series it is necessary
to set the series on the application.
.. code:: bash
juju set-series nova-compute xenial
juju set-series neutron-openvswitch xenial
juju set-series ceph-osd xenial
Repeat the procedure for all physical host nodes.
It is not necessary to repeat the set openstack-origin step.
Stateful Services
~~~~~~~~~~~~~~~~~
Procedure for the stateful services deployed on LXD containers.
These include percona-cluster and rabbitmq.
.. warning::
For Bionic to Focal series upgrades see percona-cluster migration to
mysql-innodb-cluster and mysql-router under Series Specific Procedures.
Application leadership
----------------------
`Application leadership`_ plays an important role in determining the order in
which machines (and their applications) will have their series upgraded. The
guiding principle is that an application's unit leader is acted upon by a
series upgrade before its non-leaders are (the leader is typically used to
coordinate aspects with other services over relations).
.. note::
While percona-cluster is often deployed with hacluster for HA,
rabbitmq is not. Ignore the hacluster steps for rabbitmq.
Likewise no backup is required of rabbitmq. Percona-cluster is used
below for example purposes. In this example, the LXD container the
leader node of percona-cluster/0 is deployed on is machine 0.
Juju will not transfer the leadership of an application (and any
subordinate) to another unit while the application is undergoing a series
upgrade. This allows a charm to make assumptions that will lead to a more
reliable outcome.
Prepare the machine
Perform backups of percona-cluster and scp the backup to a secure
location.
Assuming that a cloud is intended to eventually undergo a series upgrade, this
guideline will generally influence the cloud's topology. Containerisation is an
effective response to this.
.. code:: bash
.. important::
juju run-action percona-cluster/0 backup
juju scp -- -r percona-cluster/0:/opt/backups/mysql /path/to/local/backup/dir
Applications should be co-located on the same machine only if leadership
plays a negligible role. Applications deployed with the compute and storage
charms fall into this category.
.. _generic_series_upgrade:
Pause hacluster for non-leader units not undergoing upgrade
.. code:: bash
Generic series upgrade
----------------------
juju run-action percona-cluster-hacluster/1 pause
juju run-action percona-cluster-hacluster/2 pause
This section contains a generic overview of a series upgrade for three
machines, each hosting a unit of the `ubuntu`_ application. The initial and
target series are xenial and bionic, respectively.
This scenario is represented by the following :command:`juju status` command
output:
Pause non-leader peer units not undergoing upgrade
.. code:: bash
.. code-block:: console
juju run-action percona-cluster/1 pause
juju run-action percona-cluster/2 pause
Juju upgrade-series prepare the leader's machine
  .. code:: bash

     juju upgrade-series 0 prepare xenial

  .. note::

     The upgrade-series prepare command causes all the charms on the given
     machine to run their pre-series-upgrade hook. For most cases with the
     OpenStack charms this pauses the unit. At the completion of the
     pre-series-upgrade hook the workload status should be "blocked" with
     the message "Ready for do-release-upgrade and reboot."
Execute do-release-upgrade and any post-upgrade operating system tasks
  The do-release-upgrade process is performed by the administrator. Any post
  do-release-upgrade tasks are also the responsibility of the administrator.
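
  A minimal sketch of this step for machine 0, mirroring the package and
  release upgrade commands used in the generic example later in this
  document (the exact steps and any workload maintenance are site specific),
  could be:

  .. code:: bash

     juju run --machine=0 -- sudo apt update
     juju ssh 0 sudo apt full-upgrade
     juju ssh 0 sudo do-release-upgrade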
Reboot
  The post do-release-upgrade reboot is executed by the administrator.

Set openstack-origin or source for new operating system ("distro")
  This step is required and should occur before the first node is completed
  but after the other units are paused.

  .. code:: bash

     juju config percona-cluster source=distro
Juju upgrade-series complete the machine
  .. code:: bash

     juju upgrade-series 0 complete

  .. note::

     The upgrade-series complete command causes all the charms on the given
     machine to run their post-series-upgrade hook. For most cases with the
     OpenStack charms this re-writes configuration files and resumes the
     unit. At the completion of the post-series-upgrade hook the workload
     status should be "active" with the message "Unit is ready."
Repeat the procedure for non-leader nodes
  It is not necessary to repeat the set openstack-origin step.

Perform any cluster completed upgrade tasks after all units of the application have been upgraded
  Run the complete-cluster-series-upgrade action on the leader node. This
  action informs each node of the cluster that the upgrade process is
  complete cluster wide. It also updates the mysql configuration with all
  peers in the cluster.

  .. code:: bash

     juju run-action percona-cluster/0 complete-cluster-series-upgrade
Juju set-series to the new series for all future units of an application
  To guarantee that any future unit-add commands create new instantiations
  of the application on the correct series it is necessary to set the series
  on the application.

  .. code:: bash

     juju set-series percona-cluster xenial
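
Before moving on, it is worth confirming that the cluster has settled. A
simple illustrative check is shown below; the expected messages are
paraphrased and may vary by charm version:

.. code:: bash

   # All percona-cluster units (and any hacluster subordinates) should be
   # back to an "active" workload status with the message "Unit is ready".
   juju status percona-cluster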
API Services
~~~~~~~~~~~~

Procedure for the API services in LXD containers. These include but are not
limited to keystone, glance, cinder, neutron-api and nova-cloud-controller.
Any subordinates deployed with these applications will be upgraded at the
same time.

.. note::

   Keystone is used in the commands below for example purposes. In this
   example, the LXD container hosting the leader unit, keystone/0, is
   machine 0.
Pause hacluster for non-leader units not undergoing upgrade
  .. code:: bash

     juju run-action keystone-hacluster/1 pause
     juju run-action keystone-hacluster/2 pause
Pause non-leader peer units not undergoing upgrade
  .. code:: bash

     juju run-action keystone/1 pause
     juju run-action keystone/2 pause
Juju upgrade-series prepare the leader's machine
  .. code:: bash

     juju upgrade-series 0 prepare xenial

  .. note::

     The upgrade-series prepare command causes all the charms on the given
     machine to run their pre-series-upgrade hook. For most cases with the
     OpenStack charms this pauses the unit. At the completion of the
     pre-series-upgrade hook the workload status should be "blocked" with
     the message "Ready for do-release-upgrade and reboot."
Execute do-release-upgrade and any post-upgrade operating system tasks
  The do-release-upgrade process is performed by the administrator. Any post
  do-release-upgrade tasks are also the responsibility of the administrator.

Reboot
  The post do-release-upgrade reboot is executed by the administrator.
Set openstack-origin or source for new operating system ("distro")
  This step is required and should occur before the first node is completed
  but after the other units are paused.

  .. code:: bash

     juju config keystone openstack-origin=distro
Juju upgrade-series complete the machine
  .. code:: bash

     juju upgrade-series 0 complete

  .. note::

     The upgrade-series complete command causes all the charms on the given
     machine to run their post-series-upgrade hook. For most cases with the
     OpenStack charms this re-writes configuration files and resumes the
     unit. At the completion of the post-series-upgrade hook the workload
     status should be "active" with the message "Unit is ready."
Repeat the procedure for non-leader nodes
  It is not necessary to repeat the set openstack-origin step.

Juju set-series to the new series for all future units of an application
  To guarantee that any future unit-add commands create new instantiations
  of the application on the correct series it is necessary to set the series
  on the application.

  .. code:: bash

     juju set-series keystone xenial
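
Once all keystone units report "Unit is ready" it is sensible to verify that
the API is answering requests again. The commands below are an illustrative
check only; the token request assumes admin credentials have been sourced on
a client with the openstack CLI installed:

.. code:: bash

   juju status keystone
   # With admin credentials loaded, a token request exercises the API:
   openstack token issue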
Series Specific Procedures
++++++++++++++++++++++++++
Bionic to Focal
~~~~~~~~~~~~~~~
percona-cluster migration to mysql-innodb-cluster and mysql-router
__________________________________________________________________
In Ubuntu 20.04 LTS (Focal) the percona-xtradb-cluster-server package will no
longer be available. It has been replaced by mysql-server-8.0 and mysql-router
in Ubuntu main. Therefore, there is no way to series upgrade percona-cluster to
Focal. Instead the databases hosted by percona-cluster will need to be migrated
to mysql-innodb-cluster and mysql-router will need to be deployed as a
subordinate on the applications that use MySQL as a data store.
.. warning::

   Since the DB affects most OpenStack services it is important to have a
   sufficient downtime window. The following procedure is written in an
   attempt to migrate one service at a time (i.e. keystone, glance, cinder,
   etc). However, it may be more practical to migrate all databases at the
   same time during an extended downtime window, as there may be unexpected
   interdependencies between services.

.. note::

   It is possible for percona-cluster to remain on Ubuntu 18.04 LTS while
   the rest of the cloud migrates to Ubuntu 20.04 LTS. In fact, this state
   will be one step of the migration process.
Procedure

* Leave all the percona-cluster machines on Bionic and upgrade the series of
  the remaining machines in the cloud per this document.

* Deploy a mysql-innodb-cluster on Focal.

  .. code-block:: none

     juju deploy -n 3 mysql-innodb-cluster --series focal
* Deploy (but do not yet relate) an instance of mysql-router for every
  application that requires a data store (i.e. every application that was
  related to percona-cluster).

  .. code-block:: none

     juju deploy mysql-router cinder-mysql-router
     juju deploy mysql-router glance-mysql-router
     juju deploy mysql-router keystone-mysql-router
     ...

* Add relations between the mysql-router instances and the
  mysql-innodb-cluster.

  .. code-block:: none

     juju add-relation cinder-mysql-router:db-router mysql-innodb-cluster:db-router
     juju add-relation glance-mysql-router:db-router mysql-innodb-cluster:db-router
     juju add-relation keystone-mysql-router:db-router mysql-innodb-cluster:db-router
     ...
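
For a deployment with many API applications these two steps follow a uniform
pattern, so a small loop can reduce repetition. The sketch below is
illustrative only; the application list is an example and must be adjusted to
match the deployment:

.. code-block:: bash

   # Illustrative sketch: deploy one mysql-router per application and
   # relate it to mysql-innodb-cluster. Adjust the list to your cloud.
   for app in cinder glance keystone; do
       juju deploy mysql-router ${app}-mysql-router
       juju add-relation ${app}-mysql-router:db-router mysql-innodb-cluster:db-router
   done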
On a per-application basis:

* Remove the relation between the application charm and the percona-cluster
  charm. You can view existing relations with the :command:`juju status
  percona-cluster --relations` command.

  .. code-block:: none

     juju remove-relation keystone:shared-db percona-cluster:shared-db
* Dump the existing database(s) from percona-cluster.

  .. note::

     In the following, the percona-cluster/0 and mysql-innodb-cluster/0 units
     are used as examples. For percona, any unit of the application may be
     used, though all the steps should use the same unit. For
     mysql-innodb-cluster, the RW unit should be used. The RW unit of the
     mysql-innodb-cluster can be determined from the :command:`juju status
     mysql-innodb-cluster` command.

* Allow Percona to dump databases. See `Percona strict mode`_ to understand
  the implications of this setting.

  .. code-block:: none

     juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=MASTER
* Dump the specific application's database(s).

  .. note::

     Depending on downtime restrictions it is possible to dump all databases
     at one time: run the ``mysqldump`` action without setting the
     ``databases`` parameter. Similarly, it is possible to import all the
     databases into mysql-innodb-cluster from that single dump file.

  .. note::

     The database name may or may not match the application name. For
     example, while keystone has a DB named keystone, openstack-dashboard
     has a database named horizon. Some applications have multiple
     databases. Notably, nova-cloud-controller has at least nova, nova_api,
     nova_cell0 and a nova_cellN for each additional cell. See the upstream
     documentation for the respective application to determine the database
     name.

  .. code-block:: none

     # Single DB
     juju run-action --wait percona-cluster/0 mysqldump databases=keystone

     # Multiple DBs
     juju run-action --wait percona-cluster/0 mysqldump databases=nova,nova_api,nova_cell0
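
  The dump file is written under ``/var/backups/mysql`` on the unit (this is
  the path used in the transfer step below). An optional, illustrative check
  that the expected file was produced before transferring it is:

  .. code-block:: bash

     # Optional: confirm the dump file was created before copying it.
     juju run --unit percona-cluster/0 'ls -lh /var/backups/mysql'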
* Return Percona to enforcing strict mode. See `Percona strict mode`_ to
  understand the implications of this setting.

  .. code-block:: none

     juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=ENFORCING
* Transfer the mysqldump file from the percona-cluster unit to the
  mysql-innodb-cluster RW unit. The RW unit of the mysql-innodb-cluster can
  be determined with the :command:`juju status mysql-innodb-cluster` command.
  Below, mysql-innodb-cluster/0 is used as an example.

  .. code-block:: none

     juju scp percona-cluster/0:/var/backups/mysql/mysqldump-keystone-<DATE>.gz .
     juju scp mysqldump-keystone-<DATE>.gz mysql-innodb-cluster/0:/home/ubuntu
* Import the database(s) into mysql-innodb-cluster.

  .. code-block:: none

     juju run-action --wait mysql-innodb-cluster/0 restore-mysqldump dump-file=/home/ubuntu/mysqldump-keystone-<DATE>.gz
* Relate an instance of mysql-router for every application that requires a
  data store (i.e. every application that needed percona-cluster):

  .. code-block:: none

     juju add-relation keystone:shared-db keystone-mysql-router:shared-db
* Repeat for remaining applications.
An overview of this process can be seen in the OpenStack Charmers team CI
`Zaza migration code`_.
Post-migration

As noted above it is possible to run the cloud with percona-cluster remaining
on Bionic indefinitely. Once all databases have been migrated to
mysql-innodb-cluster, all the databases have been backed up, and the cloud
has been verified to be in good working order the percona-cluster application
(and its probable hacluster subordinates) may be removed.
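
Before removing the application it is prudent to confirm that nothing is
still related to percona-cluster. An illustrative check (the relations view
should show no remaining shared-db consumers) is:

.. code-block:: bash

   # Confirm no applications still consume percona-cluster before removal.
   juju status percona-cluster --relations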
.. code-block:: none

   juju remove-application percona-cluster-hacluster
   juju remove-application percona-cluster
Application leadership
----------------------

`Application leadership`_ plays an important role in determining the order in
which machines (and their applications) will have their series upgraded. The
guiding principle is that an application's unit leader is acted upon by a
series upgrade before its non-leaders are (the leader is typically used to
coordinate aspects with other services over relations).

Juju will not transfer the leadership of an application (and any subordinate)
to another unit while the application is undergoing a series upgrade. This
allows a charm to make assumptions that will lead to a more reliable outcome.

Assuming that a cloud is intended to eventually undergo a series upgrade,
this guideline will generally influence the cloud's topology.
Containerisation is an effective response to this.

.. important::

   Applications should be co-located on the same machine only if leadership
   plays a negligible role. Applications deployed with the compute and
   storage charms fall into this category.

.. _generic_series_upgrade:

Generic series upgrade
----------------------

This section contains a generic overview of a series upgrade for three
machines, each hosting a unit of the `ubuntu`_ application. The initial and
target series are xenial and bionic, respectively.

This scenario is represented by the following :command:`juju status` command
output:

.. code-block:: console

   Model    Controller       Cloud/Region    Version  SLA          Timestamp
   upgrade  maas-controller  mymaas/default  2.7.6    unsupported  18:33:49Z

   App      Version  Status  Scale  Charm   Store       Rev  OS      Notes
   ubuntu1  16.04    active      3  ubuntu  jujucharms   15  ubuntu

   Unit        Workload  Agent  Machine  Public address  Ports  Message
   ubuntu1/0*  active    idle   0        10.0.0.241             ready
   ubuntu1/1   active    idle   1        10.0.0.242             ready
   ubuntu1/2   active    idle   2        10.0.0.243             ready

   Machine  State    DNS         Inst id  Series  AZ     Message
   0        started  10.0.0.241  node2    xenial  zone3  Deployed
   1        started  10.0.0.242  node3    xenial  zone4  Deployed
   2        started  10.0.0.243  node1    xenial  zone5  Deployed

First ensure that any new applications will (by default) use the new series,
in this case bionic. This is done by configuring at the model level:

.. code-block:: none

   juju model-config default-series=bionic

Now do the same at the application level. This will affect any new units of
the existing application, in this case 'ubuntu1':

.. code-block:: none

   juju set-series ubuntu1 bionic

Perform the actual series upgrade. We begin with the machine that houses the
application unit leader, machine 0 (see the asterisk in the Unit column).
Note that :command:`juju run` is preferred over :command:`juju ssh` but the
latter should be used for sessions requiring user interaction:

.. code-block:: none
   :linenos:

   juju upgrade-series 0 prepare bionic
   # Perform any workload maintenance pre-upgrade steps here
   juju run --machine=0 -- sudo apt update
   juju ssh 0 sudo apt full-upgrade
   juju ssh 0 sudo do-release-upgrade
   # Perform any workload maintenance post-upgrade steps here
   # Reboot the machine (if not already done)
   juju upgrade-series 0 complete

In this generic example there are no `workload maintenance`_ steps to
perform. If there were post-upgrade steps then the prompt to reboot the
machine at the end of :command:`do-release-upgrade` should be answered in the
negative and the reboot initiated manually on line 7 (i.e. :command:`sudo
reboot`).

It is possible to invoke the :command:`complete` sub-command before the
upgraded machine is ready to process it. Juju will block until the unit is
ready after being restarted.

In lines 4 and 5 the upgrade proceeds in the usual interactive fashion. If a
non-interactive mode is preferred, those two lines can be replaced with:

.. code-block:: none

   juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
       -o "Dpkg::Options::=--force-confdef" \
       -o "Dpkg::Options::=--force-confold" dist-upgrade
   juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive \
       do-release-upgrade -f DistUpgradeViewNonInteractive

The :command:`apt-get` command is preferred while in non-interactive mode (or
with scripting).

.. caution::

   Performing a series upgrade non-interactively can be risky so the decision
   to do so should be made only after careful deliberation.

Machines 1 and 2 should now be upgraded in the same way (in no particular
order).

It has been reported that a trusty:xenial series upgrade may require an
additional step to ensure a purely non-interactive mode. A file under
``/etc/apt/apt.conf.d`` with a single line as its contents needs to be added
to the target machine pre-upgrade and be removed post-upgrade. It can be
created (here on machine 0) in this way:

.. code-block:: none

   juju run --machine=0 -- "echo 'DPkg::options { "--force-confdef"; "--force-confnew"; }' | sudo tee /etc/apt/apt.conf.d/local"

Next steps
----------

When you are ready to perform a series upgrade across your cloud proceed to
appendix :doc:`Series upgrade OpenStack <app-series-upgrade-openstack>`.

.. LINKS
.. _Charm upgrades: app-upgrade-openstack#charm-upgrades
.. _Zaza migration code: https://github.com/openstack-charmers/zaza-openstack-tests/blob/master/zaza/openstack/charm_tests/mysql/tests.py#L556
.. _Percona strict mode: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html
.. _OpenStack upgrades: app-series-upgrade-openstack
.. _series upgrade: https://juju.is/docs/upgrading-series
.. _automatic package updates: https://help.ubuntu.com/lts/serverguide/automatic-updates.html.en
.. _Ubuntu OpenStack release cycle: https://ubuntu.com/about/release-cycle#ubuntu-openstack-release-cycle
.. _Application leadership: https://juju.is/docs/implementing-leadership
.. _ubuntu: https://jaas.ai/ubuntu
View File
@ -11,6 +11,7 @@ Appendices
app-encryption-at-rest.rst
app-certificate-management.rst
app-series-upgrade.rst
app-series-upgrade-openstack.rst
app-nova-cells.rst
app-octavia.rst
app-pci-passthrough-gpu.rst