=====================================
Appendix F2: Series upgrade OpenStack
=====================================

Overview
--------

This document provides specific steps for performing a series upgrade across
the entirety of a Charmed OpenStack cloud.

.. warning::

   This document is based upon the foundational knowledge and guidelines set
   forth in the more general `Series upgrade`_ appendix. That reference must
   be studied in depth prior to attempting the steps outlined here. In
   particular, ensure that the :ref:`Pre-upgrade requirements
   <pre-upgrade_requirements>` are satisfied, that the :ref:`Specific series
   upgrade procedures <series_specific_procedures>` have been reviewed and
   considered, and that the :ref:`Workload specific preparations
   <workload_specific_preparations>` have been addressed during planning.

Downtime
--------

Although the goal is to minimise downtime, the series upgrade process across a
cloud will nonetheless result in some downtime for the control plane.

When the machines associated with stateful applications such as
percona-cluster and rabbitmq-server undergo a series upgrade, all cloud APIs
will experience downtime, in addition to the stateful applications themselves.

When machines associated with a single API application undergo a series
upgrade, that individual API will also experience downtime. This is because
services must be paused in order to avoid race condition errors.

For those applications working in tandem with hacluster, as will be shown,
some hacluster units will need to be paused before the upgrade. One should
assume that the outage begins with this step: it will cause cluster quorum
heartbeats to fail, and the service VIP will consequently go offline.
Reference cloud topology
------------------------

This section describes the hyperconverged cloud topology that the procedural
steps below assume. Hyperconvergence refers to the practice of co-locating
principal applications on the same machine.

The topology is defined in this way:

* Only compute and storage charms (and their subordinates) may be co-located.

* Third-party charms either do not exist or have been thoroughly tested for a
  series upgrade.

* The following are containerised:

  * all API applications
  * the percona-cluster application
  * the rabbitmq-server application
  * the ceph-mon application

Storage charms are charms that manage physical disks, for example ceph-osd and
swift-storage. Example OpenStack subordinate charms are networking SDN charms
for the nova-compute charm, or monitoring charms for compute or storage
charms.

.. caution::

   If your cloud differs from this topology you must adapt the procedural
   steps accordingly. In particular, look at the aspects of co-located
   applications and containerised applications. Recall that:

   * the :command:`upgrade-series` command:

     * affects all applications residing on the target machine
     * does not affect containers hosted on the target machine

   * an application's leader should be upgraded before its non-leaders

Generalised OpenStack series upgrade
------------------------------------

This section summarises the series upgrade steps in the context of specific
OpenStack applications. It is an enhancement of the :ref:`Generic series
upgrade <generic_series_upgrade>` section in the companion document.

Applications for which this summary does **not** apply include:

* nova-compute
* ceph-mon
* ceph-osd

This is because the above applications do not require the pausing of units,
and application leadership is irrelevant for them.

However, this summary does apply to all API applications (e.g. neutron-api,
keystone, nova-cloud-controller), as well as to percona-cluster and
rabbitmq-server.

.. important::

   The first machine to be upgraded is always associated with the leader of
   the principal application. Let this machine be called the "principal leader
   machine" and its unit the "principal leader unit".

The steps are as follows:

#. Set the default series for the principal application and ensure the same
   has been done to the model.

#. If hacluster is used, pause the hacluster units not associated with the
   principal leader machine.

#. Pause the principal non-leader units.

#. Perform a series upgrade on the principal leader machine:

   #. Perform any pre-upgrade workload maintenance tasks.
   #. Invoke the :command:`prepare` sub-command.
   #. Upgrade the operating system (APT commands).
   #. Perform any post-upgrade workload maintenance tasks.
   #. Reboot.

#. Set the value of the (application-dependent) ``openstack-origin`` or
   ``source`` configuration option to 'distro' (the new operating system).

#. Invoke the :command:`complete` sub-command on the principal leader machine.

#. Repeat steps 4 and 6 for the application non-leader machines.

#. Perform any remaining cluster-wide post-upgrade tasks once all machines
   have had their series upgraded.

.. note::

   Here is a non-exhaustive list of the most common post-upgrade tasks for
   OpenStack and supporting charms:

   * percona-cluster: run action ``complete-cluster-series-upgrade`` on the
     leader unit.
   * rabbitmq-server: run action ``complete-cluster-series-upgrade`` on the
     leader unit.
   * ceilometer: run action ``ceilometer-upgrade`` on the leader unit.
   * vault: each vault unit will need to be unsealed after its machine is
     rebooted.
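
The steps above can be condensed into a runnable sequence. The following is a
minimal sketch only, assuming a hypothetical three-unit API application named
``foo`` (units foo/0, foo/1, foo/2, with foo/0 the leader on machine 3/lxd/0)
fronted by a ``foo-hacluster`` subordinate; unit and machine names will differ
per deployment:

.. code-block:: none

   # Steps 1-3: set the target series, then pause hacluster and principal
   # non-leader units
   juju model-config default-series=bionic
   juju set-series foo bionic
   juju run-action --wait foo-hacluster/1 pause
   juju run-action --wait foo-hacluster/2 pause
   juju run-action --wait foo/1 pause
   juju run-action --wait foo/2 pause

   # Step 4: series upgrade of the principal leader machine
   juju upgrade-series 3/lxd/0 prepare bionic
   juju run --machine=3/lxd/0 -- sudo apt update
   juju ssh 3/lxd/0 sudo apt full-upgrade
   juju ssh 3/lxd/0 sudo do-release-upgrade   # answer 'y' to the reboot prompt

   # Steps 5-6: point the charm at the new distro's packages and finalise
   juju config foo openstack-origin=distro
   juju upgrade-series 3/lxd/0 complete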

Procedures
----------

The procedures are categorised by application type. The example scenario used
throughout is a 'xenial' to 'bionic' series upgrade, within an OpenStack
release of Queens (i.e. the starting point is a cloud archive pocket of
'xenial-queens').

Stateful applications
~~~~~~~~~~~~~~~~~~~~~

This section covers the series upgrade procedure for containerised stateful
applications. These include:

* ceph-mon
* percona-cluster
* rabbitmq-server

A stateful application is one that maintains the state of various aspects of
the cloud. Clustered stateful applications, such as the ones given above, also
require a quorum to function properly. For these reasons a stateful
application should not have all of its units restarted simultaneously; the
series of its corresponding machines must be upgraded sequentially.

.. note::

   A concurrent upgrade approach is theoretically possible, although to use it
   all cloud workloads would need to be stopped in order to ensure
   consistency. This is not recommended.

The example procedure is based on the percona-cluster application.

.. important::

   Unlike percona-cluster, the ceph-mon and rabbitmq-server applications do
   not use hacluster to achieve HA, nor do they need backups. Disregard the
   hacluster and backup steps for those two applications.

   The ceph-mon charm maintains the MON cluster during a series upgrade, so
   ceph-mon units do not need to be paused.

This scenario is represented by the following partial :command:`juju status`
command output:

.. code-block:: console

   Model    Controller       Cloud/Region    Version  SLA          Timestamp
   upgrade  maas-controller  mymaas/default  2.7.6    unsupported  18:26:57Z

   App                        Version  Status  Scale  Charm            Store       Rev  OS      Notes
   percona-cluster            5.6.37   active      3  percona-cluster  jujucharms  286  ubuntu
   percona-cluster-hacluster           active      3  hacluster        jujucharms   66  ubuntu

   Unit                            Workload  Agent  Machine  Public address  Ports     Message
   percona-cluster/0               active    idle   0/lxd/0  10.0.0.47       3306/tcp  Unit is ready
     percona-cluster-hacluster/0*  active    idle            10.0.0.47                 Unit is ready and clustered
   percona-cluster/1*              active    idle   1/lxd/0  10.0.0.48       3306/tcp  Unit is ready
     percona-cluster-hacluster/2   active    idle            10.0.0.48                 Unit is ready and clustered
   percona-cluster/2               active    idle   2/lxd/0  10.0.0.49       3306/tcp  Unit is ready
     percona-cluster-hacluster/1   active    idle            10.0.0.49                 Unit is ready and clustered

In summary, the principal leader unit is percona-cluster/1 and is deployed on
machine 1/lxd/0 (the principal leader machine).

.. warning::

   During this upgrade there will be a MySQL service outage. The HA resources
   provided by hacluster will **not** be monitored during the series upgrade
   due to the pausing of units.

#. Perform a backup of percona-cluster and transfer it to a secure location:

   .. code-block:: none

      juju run-action --wait percona-cluster/1 backup
      juju scp -- -r percona-cluster/1:/opt/backups/mysql /path/to/local/directory

   Permissions will need to be altered on the remote machine, and note that
   the last command transfers **all** existing backups.
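
   The exact permission change depends on the deployment; a minimal sketch,
   assuming the backups are root-owned and the default 'ubuntu' login user, is:

   .. code-block:: none

      juju ssh percona-cluster/1 sudo chown -R ubuntu:ubuntu /opt/backups/mysql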

#. Set the default series for both the model and the principal application:

   .. code-block:: none

      juju model-config default-series=bionic
      juju set-series percona-cluster bionic

#. Pause the hacluster units not associated with the principal leader machine:

   .. code-block:: none

      juju run-action --wait percona-cluster-hacluster/0 pause
      juju run-action --wait percona-cluster-hacluster/1 pause

#. Pause the principal non-leader units:

   .. code-block:: none

      juju run-action --wait percona-cluster/0 pause
      juju run-action --wait percona-cluster/2 pause

#. Perform a series upgrade on the principal leader machine:

   .. code-block:: none

      # Perform any workload maintenance pre-upgrade steps here
      juju upgrade-series 1/lxd/0 prepare bionic
      juju run --machine=1/lxd/0 -- sudo apt update
      juju ssh 1/lxd/0 sudo apt full-upgrade
      juju ssh 1/lxd/0 sudo do-release-upgrade
      # Perform any workload maintenance post-upgrade steps here

   There are no pre-upgrade or post-upgrade workload maintenance steps to
   perform here; the prompt to reboot can be answered in the affirmative.

#. Set the value of the ``source`` configuration option to 'distro':

   .. code-block:: none

      juju config percona-cluster source=distro

#. Invoke the :command:`complete` sub-command on the principal leader machine:

   .. code-block:: none

      juju upgrade-series 1/lxd/0 complete

   At this point the :command:`juju status` output looks like this:

   .. code-block:: console

      Model    Controller       Cloud/Region    Version  SLA          Timestamp
      upgrade  maas-controller  mymaas/default  2.7.6    unsupported  19:51:52Z

      App                        Version  Status       Scale  Charm            Store       Rev  OS      Notes
      percona-cluster            5.7.20   maintenance      3  percona-cluster  jujucharms  286  ubuntu
      percona-cluster-hacluster           blocked          3  hacluster        jujucharms   66  ubuntu

      Unit                            Workload     Agent  Machine  Public address  Ports     Message
      percona-cluster/0               maintenance  idle   0/lxd/0  10.0.0.47       3306/tcp  Paused. Use 'resume' action to resume normal service.
        percona-cluster-hacluster/0*  maintenance  idle            10.0.0.47                 Paused. Use 'resume' action to resume normal service.
      percona-cluster/1*              active       idle   1/lxd/0  10.0.0.48       3306/tcp  Unit is ready
        percona-cluster-hacluster/2   blocked      idle            10.0.0.48                 Resource: res_mysql_11810cc_vip not running
      percona-cluster/2               maintenance  idle   2/lxd/0  10.0.0.49       3306/tcp  Paused. Use 'resume' action to resume normal service.
        percona-cluster-hacluster/1   maintenance  idle            10.0.0.49                 Paused. Use 'resume' action to resume normal service.

      Machine  State    DNS        Inst id              Series  AZ     Message
      0        started  10.0.0.44  node1                xenial  zone1  Deployed
      0/lxd/0  started  10.0.0.47  juju-f83fcd-0-lxd-0  xenial  zone1  Container started
      1        started  10.0.0.45  node2                xenial  zone2  Deployed
      1/lxd/0  started  10.0.0.48  juju-f83fcd-1-lxd-0  bionic  zone2  Running
      2        started  10.0.0.46  node3                xenial  zone3  Deployed
      2/lxd/0  started  10.0.0.49  juju-f83fcd-2-lxd-0  xenial  zone3  Container started

#. Repeat steps 5 and 7 for the principal non-leader machines.

#. Perform any remaining cluster-wide post-upgrade tasks once all machines
   have had their series upgraded:

   .. code-block:: none

      juju run-action --wait percona-cluster/leader complete-cluster-series-upgrade

   For percona-cluster (and rabbitmq-server), the above action is performed on
   the leader unit. It informs each cluster node that the upgrade process is
   complete cluster-wide. It also updates the MySQL configuration with all
   peers in the cluster.

API applications
~~~~~~~~~~~~~~~~

This section covers series upgrade procedures for containerised API
applications. These include, but are not limited to:

* cinder
* glance
* keystone
* neutron-api
* nova-cloud-controller

Machines hosting API applications can have their series upgraded concurrently
because those applications are stateless. This results in dramatically reduced
downtime for the application. A sequential approach would not reduce downtime,
as the HA services would still need to be brought down during the upgrade
associated with the application leader.

The following two sub-sections show how to perform a series upgrade
concurrently for a single API application and for multiple API applications.

Upgrading a single API application concurrently
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This example procedure is based on the keystone application.

This scenario is represented by the following partial :command:`juju status`
command output:

.. code-block:: console

   Model    Controller       Cloud/Region    Version  SLA          Timestamp
   upgrade  maas-controller  mymaas/default  2.7.6    unsupported  22:48:41Z

   App                 Version  Status  Scale  Charm      Store       Rev  OS      Notes
   keystone            13.0.2   active      3  keystone   jujucharms  312  ubuntu
   keystone-hacluster           active      3  hacluster  jujucharms   66  ubuntu

   Unit                     Workload  Agent  Machine  Public address  Ports     Message
   keystone/0*              active    idle   0/lxd/0  10.0.0.70       5000/tcp  Unit is ready
     keystone-hacluster/0*  active    idle            10.0.0.70                 Unit is ready and clustered
   keystone/1               active    idle   1/lxd/0  10.0.0.71       5000/tcp  Unit is ready
     keystone-hacluster/2   active    idle            10.0.0.71                 Unit is ready and clustered
   keystone/2               active    idle   2/lxd/0  10.0.0.72       5000/tcp  Unit is ready
     keystone-hacluster/1   active    idle            10.0.0.72                 Unit is ready and clustered

In summary, the principal leader unit is keystone/0 and is deployed on machine
0/lxd/0 (the principal leader machine).

#. Set the default series for both the model and the principal application:

   .. code-block:: none

      juju model-config default-series=bionic
      juju set-series keystone bionic

#. Pause the hacluster units not associated with the principal leader machine:

   .. code-block:: none

      juju run-action --wait keystone-hacluster/1 pause
      juju run-action --wait keystone-hacluster/2 pause

#. Pause the principal non-leader units:

   .. code-block:: none

      juju run-action --wait keystone/1 pause
      juju run-action --wait keystone/2 pause

#. Perform any workload maintenance pre-upgrade steps on all machines. There
   are no keystone-specific steps to perform.

#. Invoke the :command:`prepare` sub-command on all machines, **starting with
   the principal leader machine**:

   .. code-block:: none

      juju upgrade-series 0/lxd/0 prepare bionic
      juju upgrade-series 1/lxd/0 prepare bionic
      juju upgrade-series 2/lxd/0 prepare bionic

   At this point the :command:`juju status` output looks like this:

   .. code-block:: console

      Model    Controller       Cloud/Region    Version  SLA          Timestamp
      upgrade  maas-controller  mymaas/default  2.7.6    unsupported  23:11:01Z

      App                 Version  Status   Scale  Charm      Store       Rev  OS      Notes
      keystone            13.0.2   blocked      3  keystone   jujucharms  312  ubuntu
      keystone-hacluster           blocked      3  hacluster  jujucharms   66  ubuntu

      Unit                     Workload  Agent  Machine  Public address  Ports     Message
      keystone/0*              blocked   idle   0/lxd/0  10.0.0.70       5000/tcp  Ready for do-release-upgrade and reboot. Set complete when finished.
        keystone-hacluster/0*  blocked   idle            10.0.0.70                 Ready for do-release-upgrade. Set complete when finished
      keystone/1               blocked   idle   1/lxd/0  10.0.0.71       5000/tcp  Ready for do-release-upgrade and reboot. Set complete when finished.
        keystone-hacluster/2   blocked   idle            10.0.0.71                 Ready for do-release-upgrade. Set complete when finished
      keystone/2               blocked   idle   2/lxd/0  10.0.0.72       5000/tcp  Ready for do-release-upgrade and reboot. Set complete when finished.
        keystone-hacluster/1   blocked   idle            10.0.0.72                 Ready for do-release-upgrade. Set complete when finished

#. Upgrade the operating system on all machines. The non-interactive method is
   used here:

   .. code-block:: none

      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=10m \
          -- sudo apt-get update
      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=60m \
          -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
          -o "Dpkg::Options::=--force-confdef" \
          -o "Dpkg::Options::=--force-confold" dist-upgrade
      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 --timeout=120m \
          -- sudo DEBIAN_FRONTEND=noninteractive \
          do-release-upgrade -f DistUpgradeViewNonInteractive

   .. important::

      Choose values for the ``--timeout`` option that are appropriate for the
      task at hand.

#. Perform any workload maintenance post-upgrade steps on all machines. There
   are no keystone-specific steps to perform.

#. Reboot all machines:

   .. code-block:: none

      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0 -- sudo reboot

#. Set the value of the ``openstack-origin`` configuration option to 'distro':

   .. code-block:: none

      juju config keystone openstack-origin=distro

#. Invoke the :command:`complete` sub-command on all machines:

   .. code-block:: none

      juju upgrade-series 0/lxd/0 complete
      juju upgrade-series 1/lxd/0 complete
      juju upgrade-series 2/lxd/0 complete

Upgrading multiple API applications concurrently
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This example procedure is based on the nova-cloud-controller and glance
applications.

This scenario is represented by the following partial :command:`juju status`
command output:

.. code-block:: console

   Model    Controller       Cloud/Region    Version  SLA          Timestamp
   upgrade  maas-controller  mymaas/default  2.7.6    unsupported  19:23:41Z

   App                    Version  Status  Scale  Charm                  Store       Rev  OS      Notes
   glance                 16.0.1   active      3  glance                 jujucharms  295  ubuntu
   glance-hacluster                active      3  hacluster              jujucharms   66  ubuntu
   nova-cc-hacluster               active      3  hacluster              jujucharms   66  ubuntu
   nova-cloud-controller  17.0.12  active      3  nova-cloud-controller  jujucharms  343  ubuntu

   Unit                       Workload  Agent  Machine  Public address  Ports              Message
   glance/0*                  active    idle   0/lxd/0  10.246.114.39   9292/tcp           Unit is ready
     glance-hacluster/0*      active    idle            10.246.114.39                      Unit is ready and clustered
   glance/1                   active    idle   1/lxd/0  10.246.114.40   9292/tcp           Unit is ready
     glance-hacluster/1       active    idle            10.246.114.40                      Unit is ready and clustered
   glance/2                   active    idle   2/lxd/0  10.246.114.41   9292/tcp           Unit is ready
     glance-hacluster/2       active    idle            10.246.114.41                      Unit is ready and clustered
   nova-cloud-controller/0    active    idle   3/lxd/0  10.246.114.48   8774/tcp,8778/tcp  Unit is ready
     nova-cc-hacluster/2      active    idle            10.246.114.48                      Unit is ready and clustered
   nova-cloud-controller/1*   active    idle   4/lxd/0  10.246.114.43   8774/tcp,8778/tcp  Unit is ready
     nova-cc-hacluster/0*     active    idle            10.246.114.43                      Unit is ready and clustered
   nova-cloud-controller/2    active    idle   5/lxd/0  10.246.114.47   8774/tcp,8778/tcp  Unit is ready
     nova-cc-hacluster/1      active    idle            10.246.114.47                      Unit is ready and clustered

In summary,

* The glance principal leader unit is glance/0 and is deployed on machine
  0/lxd/0 (the glance principal leader machine).
* The nova-cloud-controller principal leader unit is nova-cloud-controller/1
  and is deployed on machine 4/lxd/0 (the nova-cloud-controller principal
  leader machine).

The procedure has been expedited slightly by adding the ``--yes`` confirmation
option to the :command:`prepare` sub-command.

#. Set the default series for both the model and the principal applications:

   .. code-block:: none

      juju model-config default-series=bionic
      juju set-series glance bionic
      juju set-series nova-cloud-controller bionic

#. Pause the hacluster units not associated with their principal leader
   machines:

   .. code-block:: none

      juju run-action --wait glance-hacluster/1 pause
      juju run-action --wait glance-hacluster/2 pause
      juju run-action --wait nova-cc-hacluster/1 pause
      juju run-action --wait nova-cc-hacluster/2 pause

#. Pause the principal non-leader units:

   .. code-block:: none

      juju run-action --wait glance/1 pause
      juju run-action --wait glance/2 pause
      juju run-action --wait nova-cloud-controller/0 pause
      juju run-action --wait nova-cloud-controller/2 pause

#. Perform any workload maintenance pre-upgrade steps on all machines. There
   are no glance-specific or nova-cloud-controller-specific steps to perform.

#. Invoke the :command:`prepare` sub-command on all machines, **starting with
   the principal leader machines**:

   .. code-block:: none

      juju upgrade-series --yes 0/lxd/0 prepare bionic
      juju upgrade-series --yes 4/lxd/0 prepare bionic
      juju upgrade-series --yes 1/lxd/0 prepare bionic
      juju upgrade-series --yes 2/lxd/0 prepare bionic
      juju upgrade-series --yes 3/lxd/0 prepare bionic
      juju upgrade-series --yes 5/lxd/0 prepare bionic

#. Upgrade the operating system on all machines. The non-interactive method is
   used here:

   .. code-block:: none

      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \
          --timeout=20m -- sudo apt-get update
      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \
          --timeout=120m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
          -o "Dpkg::Options::=--force-confdef" \
          -o "Dpkg::Options::=--force-confold" dist-upgrade
      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 \
          --timeout=200m -- sudo DEBIAN_FRONTEND=noninteractive \
          do-release-upgrade -f DistUpgradeViewNonInteractive

#. Perform any workload maintenance post-upgrade steps on all machines. There
   are no glance-specific or nova-cloud-controller-specific steps to perform.

#. Reboot all machines:

   .. code-block:: none

      juju run --machine=0/lxd/0,1/lxd/0,2/lxd/0,3/lxd/0,4/lxd/0,5/lxd/0 -- sudo reboot

#. Set the value of the ``openstack-origin`` configuration option to 'distro':

   .. code-block:: none

      juju config glance openstack-origin=distro
      juju config nova-cloud-controller openstack-origin=distro

#. Invoke the :command:`complete` sub-command on all machines:

   .. code-block:: none

      juju upgrade-series 0/lxd/0 complete
      juju upgrade-series 1/lxd/0 complete
      juju upgrade-series 2/lxd/0 complete
      juju upgrade-series 3/lxd/0 complete
      juju upgrade-series 4/lxd/0 complete
      juju upgrade-series 5/lxd/0 complete

Physical machines
~~~~~~~~~~~~~~~~~

This section covers series upgrade procedures for applications hosted on
physical machines in particular. These typically include:

* ceph-osd
* neutron-gateway
* nova-compute

When performing a series upgrade on a physical machine, more attention should
be given to workload maintenance pre-upgrade steps:

* For compute nodes, migrate all running VMs to another hypervisor.
* For network nodes, force HA routers off of the current node.
* Perform any storage related tasks that may be required.
* Perform any site specific tasks that may be required.

The following two sub-sections show how to perform a series upgrade for a
single physical machine and for multiple physical machines concurrently.

Upgrading a single physical machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This example procedure is based on the nova-compute and ceph-osd applications
residing on the same physical machine. Since application leadership does not
play a significant role with these two applications, and because the hacluster
application is not present, there are no units to pause (as there were in
previous scenarios).

This scenario is represented by the following partial :command:`juju status`
command output:

.. code-block:: console

   Model    Controller       Cloud/Region    Version  SLA          Timestamp
   upgrade  maas-controller  mymaas/default  2.7.6    unsupported  15:23:21Z

   App           Version  Status  Scale  Charm         Store       Rev  OS      Notes
   ceph-osd      12.2.12  active      1  ceph-osd      jujucharms  301  ubuntu
   keystone      13.0.2   active      1  keystone      jujucharms  312  ubuntu
   nova-compute  17.0.12  active      1  nova-compute  jujucharms  314  ubuntu

   Unit             Workload  Agent  Machine  Public address  Ports     Message
   ceph-osd/0*      active    idle   0        10.0.0.235                Unit is ready (1 OSD)
   keystone/0*      active    idle   0/lxd/0  10.0.0.240      5000/tcp  Unit is ready
   nova-compute/0*  active    idle   0        10.0.0.235                Unit is ready

   Machine  State    DNS         Inst id              Series  AZ     Message
   0        started  10.0.0.235  node1                xenial  zone1  Deployed
   0/lxd/0  started  10.0.0.240  juju-88b27a-0-lxd-0  xenial  zone1  Container started

In summary, the ceph-osd and nova-compute applications are hosted on machine
0. Recall that container 0/lxd/0 will need to have its series upgraded
separately.

#. It is recommended to set the Ceph cluster OSDs to 'noout'. This is
   typically done at the application level (i.e. not at the unit or machine
   level):

   .. code-block:: none

      juju run-action --wait ceph-mon/leader set-noout

#. All running VMs should be migrated to another hypervisor.
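
   How this is done varies by deployment; a hedged sketch using the OpenStack
   CLI (with a hypothetical hypervisor name of 'node1') might look like this:

   .. code-block:: none

      # Stop the scheduler from placing new VMs on the node
      openstack compute service set --disable node1 nova-compute

      # List the VMs still running on the node, then live migrate each one
      openstack server list --all-projects --host node1
      nova live-migration <instance-id>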

#. Upgrade the series of machine 0:

   #. Invoke the :command:`prepare` sub-command:

      .. code-block:: none

         juju upgrade-series 0 prepare bionic

   #. Upgrade the operating system:

      .. code-block:: none

         juju run --machine=0 -- sudo apt update
         juju ssh 0 sudo apt full-upgrade
         juju ssh 0 sudo do-release-upgrade

   #. Reboot (if not already done):

      .. code-block:: none

         juju run --machine=0 -- sudo reboot

   #. Set the values of the ``openstack-origin`` and ``source`` configuration
      options to 'distro':

      .. code-block:: none

         juju config nova-compute openstack-origin=distro
         juju config ceph-osd source=distro

   #. Invoke the :command:`complete` sub-command on the machine:

      .. code-block:: none

         juju upgrade-series 0 complete

#. If OSDs were previously set to 'noout', check the up/in status of those
   OSDs in the Ceph status output, then unset 'noout' for the cluster:

   .. code-block:: none

      juju run --unit ceph-mon/leader -- ceph status
      juju run-action --wait ceph-mon/leader unset-noout

Upgrading multiple physical hosts concurrently
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When physical machines have their series upgraded concurrently, availability
zones need to be taken into account. Machines should be placed into upgrade
groups such that any API services running on them have a maximum of one unit
per group. This is to ensure API availability at the reboot stage.

This simplified bundle is used to demonstrate the general idea:

.. code-block:: yaml

   series: xenial
   machines:
     0: {}
     1: {}
     2: {}
     3: {}
     4: {}
     5: {}
   applications:
     nova-compute:
       charm: cs:nova-compute
       num_units: 3
       options:
         openstack-origin: cloud:xenial-queens
       to:
         - 0
         - 2
         - 4
     keystone:
       charm: cs:keystone
       constraints: mem=1G
       num_units: 3
       options:
         vip: 10.85.132.200
         openstack-origin: cloud:xenial-queens
       to:
         - lxd:1
         - lxd:3
         - lxd:5
     keystone-hacluster:
       charm: cs:hacluster
       options:
         cluster_count: 3

Three upgrade groups could consist of the following machines:

#. Machines 0 and 1
#. Machines 2 and 3
#. Machines 4 and 5

In this way, a less time-consuming series upgrade can be performed while still
ensuring the availability of services.
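
As an illustration only, the first group could be series upgraded with a
sequence like the following (the workload maintenance and 'distro'
configuration steps shown earlier still apply):

.. code-block:: none

   # Group 1: machines 0 and 1 (one nova-compute unit, one keystone container)
   juju upgrade-series --yes 0 prepare bionic
   juju upgrade-series --yes 1 prepare bionic
   juju run --machine=0,1 --timeout=20m -- sudo apt-get update
   juju run --machine=0,1 --timeout=120m \
       -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
       -o "Dpkg::Options::=--force-confdef" \
       -o "Dpkg::Options::=--force-confold" dist-upgrade
   juju run --machine=0,1 --timeout=200m -- sudo DEBIAN_FRONTEND=noninteractive \
       do-release-upgrade -f DistUpgradeViewNonInteractive
   juju run --machine=0,1 -- sudo reboot
   juju upgrade-series 0 complete
   juju upgrade-series 1 complete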

.. caution::

   For the ceph-osd application, ensure that rack-aware replication rules
   exist in the CRUSH map if machines are being rebooted together. This
   prevents significant interruption to running workloads from occurring if
   the same placement group is hosted on those machines. For example, if
   ceph-mon is deployed with ``customize-failure-domain`` set to 'true' and
   the ceph-osd units are hosted on machines in three or more separate Juju
   AZs, you can safely reboot ceph-osd machines concurrently in the same zone.
   See :ref:`Ceph AZ <ceph_az>` in :doc:`OpenStack high availability <app-ha>`
   for details.
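
   Before relying on this, it can be worth confirming how the option is
   actually set in the running model:

   .. code-block:: none

      juju config ceph-mon customize-failure-domain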

Automation
----------

Series upgrades across an OpenStack cloud can be time consuming, even when
using concurrent methods wherever possible. They can also be tedious and thus
susceptible to human error.

The following code examples encapsulate the processes described in this
document. They are provided solely to illustrate the methods used to develop
and test the series upgrade primitives:

* `Parallel tests`_: An example that is used as a functional verification of
  a series upgrade in the OpenStack Charms project.
* `Upgrade helpers`_: A set of helpers used in the above upgrade example.

.. caution::

   The example code should only be used for its intended use case of
   development and testing. Do not attempt to automate a series upgrade on a
   production cloud.

.. LINKS
.. _Charm upgrades: app-upgrade-openstack#charm-upgrades
.. _Series upgrade: app-series-upgrade
.. _Parallel tests: https://github.com/openstack-charmers/zaza-openstack-tests/blob/c492ecdcac3b2724833c347e978de97ea2e626d7/zaza/openstack/charm_tests/series_upgrade/parallel_tests.py#L64
.. _Upgrade helpers: https://github.com/openstack-charmers/zaza-openstack-tests/blob/9cec2efabe30fb0709bc098c48ec10bcb85cc9d4/zaza/openstack/utilities/parallel_series_upgrade.py
:orphan:

.. _series_upgrade_specific_procedures:

==================================
Specific series upgrade procedures
==================================

Overview
--------

This page describes procedures that may be required when performing a series
upgrade across a Charmed OpenStack cloud. They relate to specific cloud
workloads. Please read the more general :doc:`Series upgrade
<app-series-upgrade>` appendix before attempting any of the instructions given
here.

.. _percona_series_upgrade_to_focal:

percona-cluster charm: series upgrade to Focal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Ubuntu 20.04 LTS (Focal) the percona-xtradb-cluster-server package is no
longer available. It has been replaced in Ubuntu main by mysql-server-8.0 and
mysql-router. There is therefore no way to series upgrade percona-cluster to
Focal. Instead, the databases hosted by percona-cluster will need to be
migrated to mysql-innodb-cluster, and mysql-router will need to be deployed as
a subordinate on each application that uses MySQL as a data store.

.. warning::

   Since the DB affects most OpenStack services it is important to have a
   sufficient downtime window. The following procedure is written so as to
   migrate one service at a time (i.e. keystone, glance, cinder, etc.).
   However, it may be more practical to migrate all databases at the same time
   during an extended downtime window, as there may be unexpected
   interdependencies between services.

.. note::

   It is possible for percona-cluster to remain on Ubuntu 18.04 LTS while the
   rest of the cloud migrates to Ubuntu 20.04 LTS. In fact, this state will be
   one step of the migration process.

Procedure
^^^^^^^^^

* Leave all the percona-cluster machines on Bionic and upgrade the series of
  the remaining machines in the cloud per this document.

* Deploy a mysql-innodb-cluster on Focal.

  .. code-block:: none

     juju deploy -n 3 mysql-innodb-cluster --series focal

* Deploy (but do not yet relate) an instance of mysql-router for every
  application that requires a data store (i.e. every application that was
  related to percona-cluster).

  .. code-block:: none

     juju deploy mysql-router cinder-mysql-router
     juju deploy mysql-router glance-mysql-router
     juju deploy mysql-router keystone-mysql-router
     ...

* Add relations between the mysql-router instances and the
  mysql-innodb-cluster.

  .. code-block:: none

     juju add-relation cinder-mysql-router:db-router mysql-innodb-cluster:db-router
     juju add-relation glance-mysql-router:db-router mysql-innodb-cluster:db-router
     juju add-relation keystone-mysql-router:db-router mysql-innodb-cluster:db-router
     ...

On a per-application basis:

* Remove the relation between the application charm and the percona-cluster
  charm. You can view existing relations with the :command:`juju status
  percona-cluster --relations` command.

  .. code-block:: none

     juju remove-relation keystone:shared-db percona-cluster:shared-db

* Dump the existing database(s) from percona-cluster.

  .. note::

     In the following, the percona-cluster/0 and mysql-innodb-cluster/0 units
     are used as examples. For percona-cluster, any unit of the application
     may be used, though all the steps should use the same unit. For
     mysql-innodb-cluster, the RW unit should be used; it can be determined
     from the :command:`juju status mysql-innodb-cluster` command.

* Allow Percona to dump databases. See `Percona strict mode`_ to understand
  the implications of this setting.

  .. code-block:: none

     juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=MASTER

* Dump the specific application's database(s).

  .. note::

     Depending on downtime restrictions it is possible to dump all databases
     at one time: run the ``mysqldump`` action without setting the
     ``databases`` parameter (see the third example below). Similarly, it is
     possible to import all the databases into mysql-innodb-cluster from that
     single dump file.

  .. note::

     The database name may or may not match the application name. For
     example, while keystone has a DB named keystone, openstack-dashboard has
     a database named horizon. Some applications have multiple databases.
     Notably, nova-cloud-controller has at least nova, nova_api, nova_cell0,
     and a nova_cellN for each additional cell. See the upstream documentation
     for the respective application to determine the database name.

  .. code-block:: none

     # Single DB
     juju run-action --wait percona-cluster/0 mysqldump databases=keystone

     # Multiple DBs
     juju run-action --wait percona-cluster/0 mysqldump databases=nova,nova_api,nova_cell0
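
     # All DBs (the 'databases' parameter is simply omitted)
     juju run-action --wait percona-cluster/0 mysqldump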

* Return Percona to enforcing strict mode. See `Percona strict mode`_ to
  understand the implications of this setting.

  .. code-block:: none

     juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=ENFORCING

* Transfer the mysqldump file from the percona-cluster unit to the
  mysql-innodb-cluster RW unit. Below we use mysql-innodb-cluster/0 as an
  example.

  .. code-block:: none

     juju scp percona-cluster/0:/var/backups/mysql/mysqldump-keystone-<DATE>.gz .
     juju scp mysqldump-keystone-<DATE>.gz mysql-innodb-cluster/0:/home/ubuntu

* Import the database(s) into mysql-innodb-cluster.

  .. code-block:: none

     juju run-action --wait mysql-innodb-cluster/0 restore-mysqldump dump-file=/home/ubuntu/mysqldump-keystone-<DATE>.gz

* Relate an instance of mysql-router for every application that requires a
  data store (i.e. every application that needed percona-cluster):

  .. code-block:: none

     juju add-relation keystone:shared-db keystone-mysql-router:shared-db

* Repeat for the remaining applications.

An overview of this process can be seen in the OpenStack Charmers team's CI
`Zaza migration code`_.

Post-migration
^^^^^^^^^^^^^^

As noted above, it is possible to run the cloud with percona-cluster remaining
on Bionic indefinitely. Once all databases have been migrated to
mysql-innodb-cluster, all the databases have been backed up, and the cloud has
been verified to be in good working order, the percona-cluster application
(and its probable hacluster subordinate) may be removed.

.. code-block:: none

   juju remove-application percona-cluster-hacluster
   juju remove-application percona-cluster

.. LINKS
.. _Zaza migration code: https://github.com/openstack-charmers/zaza-openstack-tests/blob/master/zaza/openstack/charm_tests/mysql/tests.py#L556
.. _Percona strict mode: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html
===========================
Appendix F1: Series upgrade
===========================

Overview
--------

The purpose of this document is to provide foundational knowledge for
preparing an administrator to perform a series upgrade across a Charmed
OpenStack cloud. This translates to upgrading the operating system of every
cloud node to an entirely new version.

A series upgrade, a charm upgrade, and an OpenStack upgrade are all
conceptually different and involve separate operations.

Once this document has been studied the administrator will be ready to
graduate to the :doc:`Series upgrade OpenStack <app-series-upgrade-openstack>`
guide that describes the process in more detail.

Concerning the cloud being operated upon, the following is assumed:

* It is being upgraded from one LTS series to another (e.g. xenial to bionic,
  bionic to focal, etc.)
* Its nodes are backed by MAAS.
* Its services are highly available.
* It is being upgraded with minimal downtime.

.. warning::

   Upgrading a single production machine from one LTS to another is a serious
   task. Doing so for every cloud node can be that much harder. Attempting to
   do this with minimal cloud downtime is an order of magnitude more complex.

   Such an undertaking should be executed by persons who are intimately
   familiar with Juju and the currently deployed charms (and their related
   applications). It should first be tested on a non-production cloud that
   closely resembles the production environment.

The Juju :command:`upgrade-series` command
------------------------------------------

The Juju :command:`upgrade-series` command is the cornerstone of the entire
procedure. This command manages an operating system upgrade of a targeted
machine and operates on every application unit hosted on that machine. The
command works in conjunction with either the :command:`prepare` or the
:command:`complete` sub-command.

The basic process is to inform the units on a machine that a series upgrade is
about to commence, to perform the upgrade, and then inform the units that the
upgrade has finished. In most cases with the OpenStack charms, units will
first be paused and be left with a workload status of "blocked" and a message
of "Ready for do-release-upgrade and reboot."

For example, to inform units on machine '0' that an upgrade (to series
'bionic') is about to occur:

.. code-block:: none

   juju upgrade-series 0 prepare bionic

The :command:`prepare` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``pre-series-upgrade`` hook.

The administrator must then perform the traditional steps involved in
upgrading the OS on the targeted machine (in this example, machine '0'). For
example, update/upgrade packages with :command:`apt update && apt
full-upgrade`; invoke the :command:`do-release-upgrade` command; and reboot
the machine once complete.
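
A minimal sketch of those manual steps, run through Juju from the client (the
machine number is an example only), might look like this:

.. code-block:: none

   juju run --machine=0 -- sudo apt update
   juju ssh 0 sudo apt full-upgrade
   juju ssh 0 sudo do-release-upgrade   # prompts for a reboot at the end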

The :command:`complete` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``post-series-upgrade`` hook. In
most cases with the OpenStack charms, configuration files will be re-written,
units will be resumed automatically (if paused), and units will be left with a
workload status of "active" and a message of "Unit is ready":

.. code-block:: none

   juju upgrade-series 0 complete

At this point the series upgrade of the machine and its charms is done. In the
:command:`juju status` output the machine's entry under the Series column will
have changed from 'xenial' to 'bionic'.

Charms are not obliged to support the two series upgrade hooks, but they do
make for a more intelligent and less error-prone series upgrade.

Containers (and their charms) hosted on the target machine remain unaffected
by this command. However, during the required post-upgrade reboot of the host
all containerised services will naturally be unavailable.

See the Juju documentation to learn more about the `series upgrade`_ feature.

.. _pre-upgrade_requirements:

Pre-upgrade requirements
------------------------

This is a list of requirements that apply to any cloud. They must be met
before making any changes.

* All the cloud nodes should be using the same series, be in good working
  order, and be updated with the latest stable software packages (APT
  upgrades).

* The cloud should be running the latest OpenStack release supported by the
  current series (e.g. Mitaka for trusty, Queens for xenial, etc.). See
  `Ubuntu OpenStack release cycle`_ and `OpenStack upgrades`_.

* The cloud should be fully operational and error-free.

* All currently deployed charms should be upgraded to the latest stable charm
  revision. See `Charm upgrades`_.

* The Juju model comprising the cloud should be error-free (e.g. there should
  be no charm hook errors).

* `Automatic package updates`_ should be disabled on the nodes to avoid
  potential conflicts with the manual (or scripted) APT steps (see the sketch
  below).
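
One hedged way to disable unattended upgrades across all machines, assuming
the stock ``/etc/apt/apt.conf.d/20auto-upgrades`` file shipped by Ubuntu
Server, is:

.. code-block:: none

   juju run --all -- sudo sed -i 's/"1"/"0"/' /etc/apt/apt.conf.d/20auto-upgrades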
|
||||
|
||||
.. _series_specific_procedures:
|
||||
|
||||
Specific series upgrade procedures
|
||||
----------------------------------
|
||||
|
||||
Charms belonging to the OpenStack Charms project are designed to accommodate
|
||||
the next LTS target series wherever possible. However, a new series may
|
||||
occasionally introduce unavoidable challenges for a deployed charm. For
|
||||
instance, it could be that a charm is replaced by an entirely new charm on the
|
||||
new series. This can happen due to development policy concerning the charms
|
||||
themselves (e.g. the ceph charm is replaced by the ceph-mon and ceph-osd
|
||||
charms) or due to reasons independent of the charms (e.g. the workload software
|
||||
is no longer supported on the new operating system). Any core OpenStack charms
|
||||
affected in this way will be documented below.
|
||||
|
||||
* :ref:`percona-cluster charm: series upgrade to Focal <percona_series_upgrade_to_focal>`
|
||||
|
||||
.. _workload_specific_preparations:
|
||||
|
||||
Workload specific preparations
|
||||
------------------------------
|
||||
|
||||
These are preparations that are specific to the current cloud deployment.
|
||||
Completing them in advance is an integral part of the upgrade.
|
||||
|
||||
Charm upgradability
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Procedure for the physical host nodes which may include nova-compute,
|
||||
neutron-openvswitch and ceph-osd as well as neutron-gateway. Though
|
||||
ceph-mon is most often deployed in LXD containers it follows this
|
||||
procedure.
|
||||
Verify the documented series upgrade processes for all currently deployed
|
||||
charms. Some charms, especially third-party charms, may either not have
|
||||
implemented series upgrade yet or simply may not work with the target series.
|
||||
Pay particular attention to SDN (software defined networking) and storage
|
||||
charms as these play a crucial role in cloud operations.
|
||||
|
||||
.. note::
|
||||
Nova-compute and ceph-osd are used in the commands below for
|
||||
example purposes. In this example, physical host where
|
||||
nova-compute/0 and ceph-osd/0 are deployed is machine 0.
|
||||
Evacuate or otherwise prepare the machine
   For compute nodes, move all running VMs off the physical host. For network
   nodes, force HA routers off of the current node. Perform any storage
   related tasks that may be required, as well as any site specific tasks.
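As an illustration of the evacuation step (not part of the charm procedure
itself), a compute node might be drained with the OpenStack CLI along these
lines; the host name ``host0`` is hypothetical and live migration support is
assumed:

.. code:: bash

   # Stop scheduling new instances to the host being upgraded
   openstack compute service set --disable --disable-reason "series upgrade" \
       host0 nova-compute

   # Move each instance off the host (one command per instance)
   openstack server migrate --live-migration <server-id>

   # Verify that the host is empty before proceeding
   openstack server list --all-projects --host host0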
Juju upgrade-series prepare the machine
   .. code:: bash

      juju upgrade-series 0 prepare xenial

   .. note::

      The upgrade-series prepare command causes all the charms on the given
      machine to run their pre-series-upgrade hook. In most cases with the
      OpenStack charms this pauses the unit. At the completion of the
      pre-series-upgrade hook the workload status should be "blocked" with
      the message "Ready for do-release-upgrade and reboot."

Execute do-release-upgrade and any post-upgrade operating system tasks
   The do-release-upgrade process is performed by the administrator. Any
   post do-release-upgrade tasks are also the responsibility of the
   administrator.

Reboot
   The post do-release-upgrade reboot is executed by the administrator.

Set openstack-origin or source for the new operating system ("distro")
   This step is required and should occur before the first node is
   completed.

   .. code:: bash

      juju config nova-compute openstack-origin=distro
      juju config ceph-osd source=distro
Juju upgrade-series complete the machine
   .. code:: bash

      juju upgrade-series 0 complete

   .. note::

      The upgrade-series complete command causes all the charms on the given
      machine to run their post-series-upgrade hook. In most cases with the
      OpenStack charms this re-writes configuration files and resumes the
      unit. At the completion of the post-series-upgrade hook the workload
      status should be "active" with the message "Unit is ready."

Juju set-series to the new series for all future units of the application
   To guarantee that any future add-unit commands create new instantiations
   of the application on the correct series, it is necessary to set the
   series on the application.

   .. code:: bash

      juju set-series nova-compute xenial
      juju set-series neutron-openvswitch xenial
      juju set-series ceph-osd xenial

Repeat the procedure for all remaining physical host nodes
   It is not necessary to repeat the set openstack-origin step.
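Before moving on to the next host it can be useful to confirm that the model
reflects the new series; a quick check (illustrative) is:

.. code:: bash

   # The machine's Series column should now show the target series
   juju machines

   juju status nova-compute ceph-osd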

Stateful Services
~~~~~~~~~~~~~~~~~

This is the procedure for the stateful services deployed in LXD containers.
These include percona-cluster and rabbitmq-server.

.. warning::

   For Bionic to Focal series upgrades, see the percona-cluster migration to
   mysql-innodb-cluster and mysql-router under Series Specific Procedures
   below.
.. note::

   While percona-cluster is often deployed with hacluster for HA,
   rabbitmq-server is not. Ignore the hacluster steps for rabbitmq-server;
   likewise, no backup is required for rabbitmq-server. Percona-cluster is
   used below for example purposes. In this example, the LXD container
   hosting the leader unit percona-cluster/0 is machine 0.

Prepare the machine
   Perform backups of percona-cluster and transfer the backup to a secure
   location:

   .. code:: bash

      juju run-action percona-cluster/0 backup
      juju scp -- -r percona-cluster/0:/opt/backups/mysql /path/to/local/backup/dir
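The ``backup`` action runs asynchronously. As is done later in this document,
:command:`juju run-action` can be given the ``--wait`` flag to block until
the action completes before the backup is copied off the unit:

.. code:: bash

   juju run-action --wait percona-cluster/0 backup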
Pause hacluster for non-leader units not undergoing upgrade
   .. code:: bash

      juju run-action percona-cluster-hacluster/1 pause
      juju run-action percona-cluster-hacluster/2 pause

Pause non-leader peer units not undergoing upgrade
   .. code:: bash

      juju run-action percona-cluster/1 pause
      juju run-action percona-cluster/2 pause
Juju upgrade-series prepare the leader's machine
   .. code:: bash

      juju upgrade-series 0 prepare xenial

   .. note::

      The upgrade-series prepare command causes all the charms on the given
      machine to run their pre-series-upgrade hook. In most cases with the
      OpenStack charms this pauses the unit. At the completion of the
      pre-series-upgrade hook the workload status should be "blocked" with
      the message "Ready for do-release-upgrade and reboot."
Execute do-release-upgrade and any post-upgrade operating system tasks
   The do-release-upgrade process is performed by the administrator. Any
   post do-release-upgrade tasks are also the responsibility of the
   administrator.

Reboot
   The post do-release-upgrade reboot is executed by the administrator.

Set openstack-origin or source for the new operating system ("distro")
   This step is required and should occur before the first node is
   completed, but after the other units are paused.

   .. code:: bash

      juju config percona-cluster source=distro
Juju upgrade-series complete the machine
   .. code:: bash

      juju upgrade-series 0 complete

   .. note::

      The upgrade-series complete command causes all the charms on the given
      machine to run their post-series-upgrade hook. In most cases with the
      OpenStack charms this re-writes configuration files and resumes the
      unit. At the completion of the post-series-upgrade hook the workload
      status should be "active" with the message "Unit is ready."
Repeat the procedure for non-leader nodes
   It is not necessary to repeat the set openstack-origin step.

Perform any cluster completion tasks after all units of the application have been upgraded
   Run the complete-cluster-series-upgrade action on the leader unit. This
   action informs each node of the cluster that the upgrade process is
   complete cluster-wide. It also updates the mysql configuration with all
   peers in the cluster.

   .. code:: bash

      juju run-action percona-cluster/0 complete-cluster-series-upgrade

Juju set-series to the new series for all future units of the application
   To guarantee that any future add-unit commands create new instantiations
   of the application on the correct series, it is necessary to set the
   series on the application.

   .. code:: bash

      juju set-series percona-cluster xenial

API Services
~~~~~~~~~~~~

This is the procedure for the API services deployed in LXD containers. These
include, but are not limited to, keystone, glance, cinder, neutron-api, and
nova-cloud-controller. Any subordinates deployed with these applications will
be upgraded at the same time.

.. note::

   Keystone is used in the commands below for example purposes. In this
   example, the LXD container hosting the leader unit keystone/0 is
   machine 0.

Pause hacluster for non-leader units not undergoing upgrade
   .. code:: bash

      juju run-action keystone-hacluster/1 pause
      juju run-action keystone-hacluster/2 pause
Pause non-leader peer units not undergoing upgrade
   .. code:: bash

      juju run-action keystone/1 pause
      juju run-action keystone/2 pause

Juju upgrade-series prepare the leader's machine
   .. code:: bash

      juju upgrade-series 0 prepare xenial

   .. note::

      The upgrade-series prepare command causes all the charms on the given
      machine to run their pre-series-upgrade hook. In most cases with the
      OpenStack charms this pauses the unit. At the completion of the
      pre-series-upgrade hook the workload status should be "blocked" with
      the message "Ready for do-release-upgrade and reboot."
Execute do-release-upgrade and any post-upgrade operating system tasks
   The do-release-upgrade process is performed by the administrator. Any
   post do-release-upgrade tasks are also the responsibility of the
   administrator.

Reboot
   The post do-release-upgrade reboot is executed by the administrator.

Set openstack-origin or source for the new operating system ("distro")
   This step is required and should occur before the first node is
   completed, but after the other units are paused.

   .. code:: bash

      juju config keystone openstack-origin=distro

Juju upgrade-series complete the machine
   .. code:: bash

      juju upgrade-series 0 complete

   .. note::

      The upgrade-series complete command causes all the charms on the given
      machine to run their post-series-upgrade hook. In most cases with the
      OpenStack charms this re-writes configuration files and resumes the
      unit. At the completion of the post-series-upgrade hook the workload
      status should be "active" with the message "Unit is ready."

Repeat the procedure for non-leader nodes
   It is not necessary to repeat the set openstack-origin step.

Juju set-series to the new series for all future units of the application
   To guarantee that any future add-unit commands create new instantiations
   of the application on the correct series, it is necessary to set the
   series on the application.

   .. code:: bash

      juju set-series keystone xenial
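A simple smoke test after each API application finishes (assuming admin
credentials are sourced) is to exercise the service; for keystone, for
example:

.. code:: bash

   # Should return a fresh token if keystone is healthy again
   openstack token issue

   juju status keystone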

Series Specific Procedures
++++++++++++++++++++++++++

Bionic to Focal
~~~~~~~~~~~~~~~

.. _percona_series_upgrade_to_focal:

percona-cluster migration to mysql-innodb-cluster and mysql-router
__________________________________________________________________

In Ubuntu 20.04 LTS (Focal) the percona-xtradb-cluster-server package will no
longer be available. It has been replaced by mysql-server-8.0 and mysql-router
in Ubuntu main. Therefore, there is no way to series upgrade percona-cluster
to Focal. Instead, the databases hosted by percona-cluster will need to be
migrated to mysql-innodb-cluster, and mysql-router will need to be deployed as
a subordinate on the applications that use MySQL as a data store.

.. warning::

   Since the DB affects most OpenStack services it is important to have a
   sufficient downtime window. The following procedure is written in an
   attempt to migrate one service at a time (i.e. keystone, glance, cinder,
   etc.). However, it may be more practical to migrate all databases at the
   same time during an extended downtime window, as there may be unexpected
   interdependencies between services.

.. note::

   It is possible for percona-cluster to remain on Ubuntu 18.04 LTS while
   the rest of the cloud migrates to Ubuntu 20.04 LTS. In fact, this state
   will be one step of the migration process.

**Procedure**

* Leave all the percona-cluster machines on Bionic and upgrade the series of
  the remaining machines in the cloud per this document.

* Deploy a mysql-innodb-cluster on Focal.

  .. code-block:: none

     juju deploy -n 3 mysql-innodb-cluster --series focal

* Deploy (but do not yet relate) an instance of mysql-router for every
  application that requires a data store (i.e. every application that was
  related to percona-cluster).

  .. code-block:: none

     juju deploy mysql-router cinder-mysql-router
     juju deploy mysql-router glance-mysql-router
     juju deploy mysql-router keystone-mysql-router
     ...

* Add relations between the mysql-router instances and the
  mysql-innodb-cluster.

  .. code-block:: none

     juju add-relation cinder-mysql-router:db-router mysql-innodb-cluster:db-router
     juju add-relation glance-mysql-router:db-router mysql-innodb-cluster:db-router
     juju add-relation keystone-mysql-router:db-router mysql-innodb-cluster:db-router
     ...

On a per-application basis:

* Remove the relation between the application charm and the percona-cluster
  charm. You can view existing relations with the :command:`juju status
  percona-cluster --relations` command.

  .. code-block:: none

     juju remove-relation keystone:shared-db percona-cluster:shared-db

* Dump the existing database(s) from percona-cluster.

  .. note::

     In the following, the percona-cluster/0 and mysql-innodb-cluster/0 units
     are used as examples. For percona-cluster, any unit of the application
     may be used, though all the steps should use the same unit. For
     mysql-innodb-cluster, the RW unit should be used; it can be determined
     from the :command:`juju status mysql-innodb-cluster` command.

  * Allow Percona to dump databases. See `Percona strict mode`_ to understand
    the implications of this setting.

    .. code-block:: none

       juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=MASTER

  * Dump the specific application's database(s).

    .. note::

       Depending on downtime restrictions it is possible to dump all
       databases at one time: run the ``mysqldump`` action without setting
       the ``databases`` parameter. Similarly, it is possible to import all
       the databases into mysql-innodb-cluster from that single dump file.

    .. note::

       The database name may or may not match the application name. For
       example, while keystone has a DB named keystone, openstack-dashboard
       has a database named horizon. Some applications have multiple
       databases. Notably, nova-cloud-controller has at least nova, nova_api,
       nova_cell0, and a nova_cellN for each additional cell. See the
       upstream documentation for the respective application to determine the
       database name.

    .. code-block:: none

       # Single DB
       juju run-action --wait percona-cluster/0 mysqldump databases=keystone

       # Multiple DBs
       juju run-action --wait percona-cluster/0 mysqldump databases=nova,nova_api,nova_cell0

  * Return Percona to enforcing strict mode. See `Percona strict mode`_ to
    understand the implications of this setting.

    .. code-block:: none

       juju run-action --wait percona-cluster/0 set-pxc-strict-mode mode=ENFORCING

  * Transfer the mysqldump file from the percona-cluster unit to the
    mysql-innodb-cluster RW unit. Below, mysql-innodb-cluster/0 is used as an
    example.

    .. code-block:: none

       juju scp percona-cluster/0:/var/backups/mysql/mysqldump-keystone-<DATE>.gz .
       juju scp mysqldump-keystone-<DATE>.gz mysql-innodb-cluster/0:/home/ubuntu

  * Import the database(s) into mysql-innodb-cluster.

    .. code-block:: none

       juju run-action --wait mysql-innodb-cluster/0 restore-mysqldump dump-file=/home/ubuntu/mysqldump-keystone-<DATE>.gz

  * Relate an instance of mysql-router for every application that requires a
    data store (i.e. every application that needed percona-cluster):

    .. code-block:: none

       juju add-relation keystone:shared-db keystone-mysql-router:shared-db

* Repeat for the remaining applications.

An overview of this process can be seen in the OpenStack Charmers team CI
`Zaza migration code`_.

**Post-migration**

As noted above, it is possible to run the cloud with percona-cluster
remaining on Bionic indefinitely. Once all databases have been migrated to
mysql-innodb-cluster, all the databases have been backed up, and the cloud
has been verified to be in good working order, the percona-cluster
application (and its probable hacluster subordinates) may be removed.

.. code-block:: none

   juju remove-application percona-cluster-hacluster
   juju remove-application percona-cluster

.. LINKS
.. _Zaza migration code: https://github.com/openstack-charmers/zaza-openstack-tests/blob/master/zaza/openstack/charm_tests/mysql/tests.py#L556
.. _Percona strict mode: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html

========================
Appendix: Series upgrade
========================

The :command:`complete` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``post-series-upgrade`` hook. In
most cases with the OpenStack charms, configuration files will be re-written,
units will be resumed automatically (if paused), and be left with a workload
status of "active" and a message of "Unit is ready":

.. code-block:: none

   juju upgrade-series 0 complete

At this point the series upgrade on the machine and its charms is done. In
the :command:`juju status` output the machine's entry under the Series column
will have changed from 'xenial' to 'bionic'.

Charms are not obliged to support the two series upgrade hooks, but they do
make for a more intelligent and less error-prone series upgrade.

Containers (and their charms) hosted on the target machine remain unaffected
by this command. However, during the required post-upgrade reboot of the host
all containerised services will naturally be unavailable.

See the Juju documentation to learn more about the `series upgrade`_ feature.

.. _pre-upgrade_requirements:

Pre-upgrade requirements
------------------------

This is a list of requirements that apply to any cloud. They must be met
before making any changes.

* All the cloud nodes should be using the same series, be in good working
  order, and be updated with the latest stable software packages (APT
  upgrades).

* The cloud should be running the latest OpenStack release supported by the
  current series (e.g. Mitaka for trusty, Queens for xenial, etc.). See
  `Ubuntu OpenStack release cycle`_ and `OpenStack upgrades`_.

* The cloud should be fully operational and error-free.

* All currently deployed charms should be upgraded to the latest stable charm
  revision. See `Charm upgrades`_.

* The Juju model comprising the cloud should be error-free (e.g. there should
  be no charm hook errors).

* `Automatic package updates`_ should be disabled on the nodes to avoid
  potential conflicts with the manual (or scripted) APT steps (a sketch for
  checking this is given after this list).
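The following is a minimal sketch for auditing and disabling the periodic APT
jobs on a node, assuming Ubuntu's stock ``unattended-upgrades`` package and
its default configuration file (the file path and machine number are
illustrative; adapt them to your site's configuration management):

.. code-block:: none

   # Inspect the current periodic APT settings on machine 0
   juju ssh 0 cat /etc/apt/apt.conf.d/20auto-upgrades

   # Disable periodic update/upgrade runs for the maintenance window
   juju ssh 0 -- sudo tee /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
   APT::Periodic::Update-Package-Lists "0";
   APT::Periodic::Unattended-Upgrade "0";
   EOF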

.. _series_specific_procedures:

Specific series upgrade procedures
----------------------------------

Charms belonging to the OpenStack Charms project are designed to accommodate
the next LTS target series wherever possible. However, a new series may
occasionally introduce unavoidable challenges for a deployed charm. For
instance, it could be that a charm is replaced by an entirely new charm on
the new series. This can happen due to development policy concerning the
charms themselves (e.g. the ceph charm is replaced by the ceph-mon and
ceph-osd charms) or due to reasons independent of the charms (e.g. the
workload software is no longer supported on the new operating system). Any
core OpenStack charms affected in this way will be documented below.

* :ref:`percona-cluster charm: series upgrade to Focal <percona_series_upgrade_to_focal>`

.. _workload_specific_preparations:

Workload specific preparations
------------------------------

These are preparations that are specific to the current cloud deployment.
Completing them in advance is an integral part of the upgrade.

Charm upgradability
~~~~~~~~~~~~~~~~~~~

Verify the documented series upgrade processes for all currently deployed
charms. Some charms, especially third-party charms, may either not have
implemented series upgrade yet or simply may not work with the target series.
Pay particular attention to SDN (software defined networking) and storage
charms, as these play a crucial role in cloud operations.

Workload maintenance
~~~~~~~~~~~~~~~~~~~~

Any workload-specific pre and post series upgrade maintenance tasks should be
readied in advance. For example, if a node's workload requires a database
then a pre-upgrade backup plan should be drawn up. Similarly, if a workload
requires settings to be adjusted post-upgrade then those changes should be
prepared ahead of time. Pay particular attention to stateful services due to
their importance in cloud operations. Examples include evacuating a compute
node, switching an HA router to another node, and storage rebalancing.

Pre-upgrade tasks are performed before issuing the :command:`prepare`
sub-command, and post-upgrade tasks are done immediately prior to issuing the
:command:`complete` sub-command.

Workflow: sequential vs. concurrent
-----------------------------------

In terms of the workflow there are two approaches:

* Sequential - upgrading one machine at a time

* Concurrent - upgrading a group of machines simultaneously

Normally, it is best to upgrade sequentially as this ensures data reliability
and availability (we've assumed an HA cloud). This approach also minimises
adverse effects to the deployment if something goes wrong.

However, for even moderately sized clouds, an intervention based purely on a
sequential approach can take a very long time to complete. This is where the
concurrent method becomes attractive.

In general, a concurrent approach is a viable option for API applications but
is not an option for stateful applications. During the course of the
cloud-wide series upgrade a hybrid strategy is a reasonable choice.

To be clear, the above pertains to upgrading the series on machines
associated with a single application. It is also possible, however, to employ
similar thinking to multiple applications (a concurrent sketch is given
below).
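As an illustration only, a concurrent pass over a group of machines hosting
API units might drive the :command:`prepare` step in a loop. The machine
numbers and target series are hypothetical, and the ``--yes`` flag (which
suppresses the interactive confirmation, where supported by the Juju version
in use) should be verified against :command:`juju help upgrade-series`:

.. code-block:: none

   # Prepare a batch of machines hosting API units (illustrative numbers)
   for machine in 10 11 12; do
       juju upgrade-series $machine prepare bionic --yes
   done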

Application leadership
----------------------

`Application leadership`_ plays an important role in determining the order in
which machines (and their applications) will have their series upgraded. The
guiding principle is that an application's unit leader is acted upon by a
series upgrade before its non-leaders are (the leader is typically used to
coordinate aspects with other services over relations). A way to identify
the leader is sketched below.

Juju will not transfer the leadership of an application (and any subordinate)
to another unit while the application is undergoing a series upgrade. This
allows a charm to make assumptions that will lead to a more reliable outcome.

Assuming that a cloud is intended to eventually undergo a series upgrade,
this guideline will generally influence the cloud's topology.
Containerisation is an effective response to this.

.. important::

   Applications should be co-located on the same machine only if leadership
   plays a negligible role. Applications deployed with the compute and
   storage charms fall into this category.
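The unit leader is marked with an asterisk in the Unit column of
:command:`juju status` output. It can also be queried directly with the
``is-leader`` hook tool (the application name below is a placeholder):

.. code-block:: none

   # The leader unit carries an asterisk in the Unit column
   juju status <application>

   # Ask a specific unit whether it is the leader; prints True on the leader
   juju run --unit <application>/0 is-leader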

.. _generic_series_upgrade:

Generic series upgrade
----------------------

This section contains a generic overview of a series upgrade for three
machines, each hosting a unit of the `ubuntu`_ application. The initial and
target series are xenial and bionic, respectively.

This scenario is represented by the following :command:`juju status` command
output:

.. code-block:: console

   Model    Controller       Cloud/Region    Version  SLA          Timestamp
   upgrade  maas-controller  mymaas/default  2.7.6    unsupported  18:33:49Z

   App      Version  Status  Scale  Charm   Store       Rev  OS      Notes
   ubuntu1  16.04    active      3  ubuntu  jujucharms   15  ubuntu

   Unit        Workload  Agent  Machine  Public address  Ports  Message
   ubuntu1/0*  active    idle   0        10.0.0.241             ready
   ubuntu1/1   active    idle   1        10.0.0.242             ready
   ubuntu1/2   active    idle   2        10.0.0.243             ready

   Machine  State    DNS         Inst id  Series  AZ     Message
   0        started  10.0.0.241  node2    xenial  zone3  Deployed
   1        started  10.0.0.242  node3    xenial  zone4  Deployed
   2        started  10.0.0.243  node1    xenial  zone5  Deployed

First ensure that any new applications will (by default) use the new series,
in this case bionic. This is done by configuring at the model level:

.. code-block:: none

   juju model-config default-series=bionic

Now do the same at the application level. This will affect any new units of
the existing application, in this case 'ubuntu1':

.. code-block:: none

   juju set-series ubuntu1 bionic

Perform the actual series upgrade. We begin with the machine that houses the
application unit leader, machine 0 (see the asterisk in the Unit column).
Note that :command:`juju run` is preferred over :command:`juju ssh`, but the
latter should be used for sessions requiring user interaction:

.. code-block:: none
   :linenos:

   # Perform any workload maintenance pre-upgrade steps here
   juju upgrade-series 0 prepare bionic
   juju run --machine=0 -- sudo apt update
   juju ssh 0 sudo apt full-upgrade
   juju ssh 0 sudo do-release-upgrade
   # Perform any workload maintenance post-upgrade steps here
   # Reboot the machine (if not already done)
   juju upgrade-series 0 complete

.. note::

   In this generic example there are no `workload maintenance`_ steps to
   perform. If there were post-upgrade steps then the prompt to reboot the
   machine at the end of :command:`do-release-upgrade` should be answered in
   the negative, and the reboot initiated manually on line 7 (i.e.
   :command:`sudo reboot`).

It is possible to invoke the :command:`complete` sub-command before the
upgraded machine is ready to process it. Juju will block until the unit is
ready after being restarted.

In lines 4 and 5 the upgrade proceeds in the usual interactive fashion. If a
non-interactive mode is preferred, those two lines can be replaced with:

.. code-block:: none

   juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
      -o "Dpkg::Options::=--force-confdef" \
      -o "Dpkg::Options::=--force-confold" dist-upgrade
   juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive \
      do-release-upgrade -f DistUpgradeViewNonInteractive

The :command:`apt-get` command is preferred while in non-interactive mode (or
with scripting).

.. caution::

   Performing a series upgrade non-interactively can be risky, so the
   decision to do so should be made only after careful deliberation.

Machines 1 and 2 should now be upgraded in the same way (in no particular
order).

It has been reported that a trusty:xenial series upgrade may require an
additional step to ensure a purely non-interactive mode. A file under
``/etc/apt/apt.conf.d`` with a single line as its contents needs to be added
to the target machine pre-upgrade and be removed post-upgrade. It can be
created (here on machine 0) in this way:

.. code-block:: none

   juju run --machine=0 -- "echo 'DPkg::options { \"--force-confdef\"; \"--force-confnew\"; }' | sudo tee /etc/apt/apt.conf.d/local"
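Once all three machines are done, the model should reflect the new series
throughout; a quick check (illustrative) is:

.. code-block:: none

   # The Series column should now read 'bionic' for every machine
   juju machines

   juju status ubuntu1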

Next steps
----------

When you are ready to perform a series upgrade across your cloud, proceed to
appendix :doc:`Series upgrade OpenStack <app-series-upgrade-openstack>`.

.. LINKS
.. _Charm upgrades: app-upgrade-openstack#charm-upgrades
.. _OpenStack upgrades: app-upgrade-openstack
.. _series upgrade: https://juju.is/docs/upgrading-series
.. _Automatic package updates: https://help.ubuntu.com/lts/serverguide/automatic-updates.html.en
.. _Ubuntu OpenStack release cycle: https://ubuntu.com/about/release-cycle#ubuntu-openstack-release-cycle
.. _Application leadership: https://juju.is/docs/implementing-leadership
.. _ubuntu: https://jaas.ai/ubuntu

@@ -11,6 +11,7 @@ Appendices

   app-encryption-at-rest.rst
   app-certificate-management.rst
   app-series-upgrade.rst
   app-series-upgrade-openstack.rst
   app-nova-cells.rst
   app-octavia.rst
   app-pci-passthrough-gpu.rst