docs: Rework and move upgrades guide

This shouldn't be in the root of the docs: it's not easily discoverable
and it doesn't belong there. We also modify the header style to better
match the OpenStack style guide [1] and set things up for a future guide
on writing database migrations.

[1] https://docs.openstack.org/doc-contrib-guide/rst-conv/titles.html#cg-titles

Change-Id: I277b9ad8d275aa21d2c3ce478d2a30e4f6e5f0ba
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This commit is contained in:
Stephen Finucane 2021-09-24 17:25:46 +01:00
parent 31303ed65e
commit 0d7f3ba862
5 changed files with 60 additions and 41 deletions

View File

@ -1 +1,2 @@
redirectmatch 301 ^/cinder/([^/]+)/man/cinder-manage.html$ /cinder/$1/cli/cinder-manage.html
redirectmatch 301 ^/cinder/([^/]+)/upgrade.html$ /cinder/$1/admin/upgrades.html

View File

@ -54,6 +54,7 @@ Amazon EC2 Elastic Block Storage (EBS) offering.
blockstorage-volume-number-weigher.rst
blockstorage-default-volume-types.rst
api-configuration
upgrades
.. _`Storage Decisions`: https://docs.openstack.org/arch-design/design-storage/design-storage-arch.html
.. _`OpenStack Operations Guide`: https://wiki.openstack.org/wiki/OpsGuide

View File

@ -1,3 +1,4 @@
========
Upgrades
========
@ -11,6 +12,7 @@ Keeping the control plane running during an upgrade is more difficult. This
document's goal is to provide preliminaries and a detailed procedure of such
upgrade.
Concepts
--------
@ -18,7 +20,7 @@ Here are the key concepts you need to know before reading the section on the
upgrade process:
RPC version pinning
'''''''''''''''''''
~~~~~~~~~~~~~~~~~~~
Through careful RPC versioning, newer services are able to talk to older
services (and vice-versa). The versions are autodetected using information
@ -27,7 +29,7 @@ or ``ServiceTooOld`` exceptions on service start, you're probably having some
old orphaned records in that table.
Graceful service shutdown
'''''''''''''''''''''''''
~~~~~~~~~~~~~~~~~~~~~~~~~
Many cinder services are python processes listening for messages on an AMQP
queue. When the operator sends SIGTERM signal to the process, it stops getting
@ -37,35 +39,51 @@ process starts back up. This gives us a way to shutdown a service using older
code, and start up a service using newer code with minimal impact.
.. note::
Waiting for completion of long-running operations (e.g. slow volume copy
operation) may take a while.
.. note::
This was tested with RabbitMQ messaging backend and may vary with other
backends.
Online Data Migrations
''''''''''''''''''''''
Database upgrades
~~~~~~~~~~~~~~~~~
To make DB schema migrations less painful to execute, since Liberty, all data
migrations are banned from schema migration scripts. Instead, the migrations
should be done by background process in a manner that doesn't interrupt running
services (you can also execute online data migrations with services turned off
if you're doing a cold upgrade). In Ocata a new ``cinder-manage db
online_data_migrations`` utility was added for that purpose. Before upgrading
Ocata to Pike, you need to run this tool in the background, until it tells you
no more migrations are needed. Note that you won't be able to apply Pike's
schema migrations before completing Ocata's online data migrations.
Cinder has two types of database upgrades in use:
- Schema migrations
- Data migrations
Schema migrations are defined in ``cinder/db/migrations/versions``. They are
the routines that transform our database structure, which should be additive
and able to be applied to a running system before service code has been
upgraded.
Data migrations are banned from schema migration scripts and are instead
defined in ``cinder/db/api.py``. They are kept separate to make DB schema
migrations less painful to execute. Instead, the migrations are executed by a
background process in a manner that doesn't interrupt running services (you can
also execute online data migrations with services turned off if you're doing a
cold upgrade). The ``cinder-manage db online_data_migrations`` utility can be
used for this purpose. Before upgrading N to N+1, you need to run this tool in
the background until it tells you no more migrations are needed. Note that you
won't be able to apply N+1's schema migrations before completing N's online
data migrations.
For information on developing your own schema migrations as part of a feature
or bugfix, refer to **TODO**.
API load balancer draining
''''''''''''''''''''''''''
~~~~~~~~~~~~~~~~~~~~~~~~~~
When upgrading API nodes, you can make your load balancer only send new
connections to the newer API nodes, allowing for a seamless update of your API
nodes.
DB prune deleted rows
'''''''''''''''''''''
~~~~~~~~~~~~~~~~~~~~~
Currently resources are soft deleted in the database, so users are able to
track instances in the DB that are created and destroyed in production.
@ -77,18 +95,19 @@ longer as there is more data to migrate. To make pruning easier there's a
records older than specified age.
Versioned object backports
''''''''''''''''''''''''''
~~~~~~~~~~~~~~~~~~~~~~~~~~
RPC pinning ensures new services can talk to the older service's method
signatures. But many of the parameters are objects that may well be too new for
the old service to understand. Cinder makes sure to backport an object to a
version that it is pinned to before sending.
Minimal Downtime Upgrade Procedure
----------------------------------
Plan your upgrade
'''''''''''''''''
~~~~~~~~~~~~~~~~~
* Read and ensure you understand the release notes for the next release.
@ -96,32 +115,33 @@ Plan your upgrade
database. Hence, in case of upgrade failure, restoring database from backup
is the only choice.
* Note that there's an assumption that live upgrade can be performed only
between subsequent releases. This means that you cannot upgrade Liberty
directly into Newton, you need to upgrade to Mitaka first.
* To avoid dependency hell it is advised to have your Cinder services deployed
separately in containers or Python venvs.
* Note that Cinder is basing version detection on what is reported in the
``services`` table in the DB. Before upgrade make sure you don't have any
orphaned old records there, because these can block starting newer services.
You can clean them up using ``cinder-manage service remove <binary> <host>``
.. note::
Cinder is basing version detection on what is reported in the ``services``
table in the DB. Before upgrade make sure you don't have any orphaned old
records there, because these can block starting newer services. You can
clean them up using ``cinder-manage service remove <binary> <host>``
command.
* Assumed service upgrade order is cinder-scheduler, cinder-volume,
cinder-backup and finally cinder-api.
Note that there's an assumption that live upgrade can be performed only between
subsequent releases. This means that you cannot upgrade N directly to N+2, you
need to upgrade to N+1 first.
The assumed service upgrade order is ``cinder-scheduler``, ``cinder-volume``,
``cinder-backup`` and finally ``cinder-api``.
Rolling upgrade process
'''''''''''''''''''''''
~~~~~~~~~~~~~~~~~~~~~~~
To reduce downtime, the services can be upgraded in a rolling fashion. It means
upgrading a few services at a time. To minimise downtime you need to have HA
Cinder deployment, so at the moment a service is upgraded, you'll keep other
service instances running.
Before maintenance window
"""""""""""""""""""""""""
.. rubric:: Before maintenance window
* First you should execute required DB schema migrations. To achieve that
without interrupting your existing installation, install new Cinder code in
@ -133,8 +153,7 @@ Before maintenance window
DB schema changes are done in a way that both the N and N+1 release can
perform operations against the same schema.
During maintenance window
"""""""""""""""""""""""""
.. rubric:: During maintenance window
1. The first service is cinder-scheduler. It is load-balanced by the message
queue, so the only thing you need to worry about is to shut it down
@ -204,9 +223,7 @@ During maintenance window
7. Then you should repeat step 6 for all of the cinder-api services.
After maintenance window
""""""""""""""""""""""""
.. rubric:: After maintenance window
* Once all services are running the new code, double check in the DB that
there are no old orphaned records in ``services`` table (Cinder doesn't

View File

@ -88,7 +88,6 @@ service and other
:maxdepth: 1
Installation Guide <install/index>
Upgrade Process <upgrade>
Administrating Cinder
~~~~~~~~~~~~~~~~~~~~~

View File

@ -1 +1,2 @@
/cinder/latest/man/cinder-manage.html 301 /cinder/latest/cli/cinder-manage.html
/cinder/latest/upgrade.html 301 /cinder/latest/admin/upgrades.html