Follow-up to rolling upgrade docs

This commit makes changes based upon review feedback
in If7f41ccbfd8d3128a142cd6e4f090c5c9825cf60 and discussion
in IRC around the time when that change landed, regarding
nodes being in stable states.

Change-Id: I5cb8a0419f63cb0ea0e19a4f1a1870fa70aae55c
Julia Kreger 2017-08-14 16:19:57 +00:00
parent 62fd4d1a6e
commit 73ed23ccd0

@@ -40,6 +40,14 @@ Plan your upgrade
database. Hence, in case of upgrade failure, restoring the database from
a backup is the only choice.
+* Before starting your upgrade, it is best to ensure that all nodes have
+reached, or are in, a stable ``provision_state``. Nodes in states with
+long running processes such as deploying or cleaning, may fail, and may
+require manual intervention to return them to the available hardware pool.
+This is most likely in cases where a timeout has occurred or a service was
+terminated abruptly. For a visual diagram detailing states and possible
+state transitions, please see the
+`state machine diagram <https://docs.openstack.org/ironic/latest/contributor/states.html>`_.
Offline upgrades
================
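The first hunk asks operators to make sure every node has reached a stable ``provision_state`` before upgrading. A pre-flight check could look like the following minimal Python sketch. It is not part of ironic, and several of its details are assumptions:

* the function name is hypothetical;
* it expects ``(uuid, provision_state)`` pairs you have already collected, for example from the Bare Metal API;
* the ``STABLE_STATES`` set is drawn from the state machine diagram, so confirm it for the release you run.

```python
from typing import Iterable, List, Tuple

# Assumed stable provision states (check the state machine diagram for the
# release you are running; this set is not read from ironic itself).
STABLE_STATES = frozenset({"enroll", "manageable", "available", "active", "error"})


def unstable_nodes(nodes: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (uuid, provision_state) pairs that are not in a stable state.

    Nodes in transitional states such as "deploying" or "cleaning" are in the
    middle of long running processes and may fail during an upgrade.
    """
    return [(uuid, state) for uuid, state in nodes if state not in STABLE_STATES]


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from the API.
    inventory = [
        ("node-1", "active"),
        ("node-2", "cleaning"),
        ("node-3", "available"),
        ("node-4", "deploying"),
    ]
    for uuid, state in unstable_nodes(inventory):
        print(f"{uuid} is {state!r}; wait for a stable state before upgrading")
```

If the returned list is not empty, the idea is to let those nodes settle, or deal with them manually, before stopping any conductor.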
@@ -75,21 +83,34 @@ Once the above is done, do the following:
limits the number of migrations executed in one run. You should complete
all of the migrations as soon as possible after the upgrade.
-.. warning:: You will not be able to start an upgrade to the next release
-after this one, until this has been completed for the current
-release. For example, as part of upgrading from Ocata to Pike,
-you need to complete Pike's data migrations. If this not done,
-you will not be able to upgrade to Queens -- it will not be
-possible to execute Queens' database schema updates.
+.. warning::
+You will not be able to start an upgrade to the release
+after this one, until this has been completed for the current
+release. For example, as part of upgrading from Ocata to Pike,
+you need to complete Pike's data migrations. If this not done,
+you will not be able to upgrade to Queens -- it will not be
+possible to execute Queens' database schema updates.
Rolling upgrades
================
-Rolling upgrades are available starting with the Pike release; that is, when
-upgrading from Ocata. This means that it is possible to do an upgrade with
+To Reduce downtime, the services can be upgraded in a rolling fashion, meaning
+to upgrade one or a few services at a time to minimize impact.
+Rolling upgrades are available starting with the Pike release. This feature
+makes it possible to upgrade between releases, such as Ocata to Pike, with
minimal to no downtime of the Bare Metal API.
+Requirements
+------------
+To facilitate an upgrade in a rolling fashion, you need to have a
+highly-available deployment consisting of at least two ironic-api
+and two ironic-conductor services.
+Use of a load balancer to balance requests across the ironic-api
+services is recommended, as it allows for a minimal impact to end users.
Concepts
--------
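The new ``Requirements`` section asks for at least two ironic-api and two ironic-conductor services. As a hypothetical illustration, and not an ironic tool, that rule can be expressed as a check over a count of running instances per service. How you obtain those counts is left open. The recommended load balancer is not checked here.

```python
from typing import Dict, List

# Minimum instance counts stated in the Requirements section.
MIN_INSTANCES = {"ironic-api": 2, "ironic-conductor": 2}


def rolling_upgrade_blockers(running: Dict[str, int]) -> List[str]:
    """Return reasons why a rolling upgrade is not possible (empty if OK).

    `running` maps a service name to the number of instances currently up;
    this input format is an assumption of the sketch.
    """
    problems = []
    for service, minimum in MIN_INSTANCES.items():
        count = running.get(service, 0)
        if count < minimum:
            problems.append(f"{service}: {count} running, need at least {minimum}")
    return problems


if __name__ == "__main__":
    print(rolling_upgrade_blockers({"ironic-api": 2, "ironic-conductor": 1}))
    # prints ['ironic-conductor: 1 running, need at least 2']
```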
@@ -110,19 +131,27 @@ that the older services are using. The newer services will backport RPC calls
and objects to their appropriate versions from the pinned release. If the
``IncompatibleObjectVersion`` exception occurs, it is most likely due to an
incorrect or unspecified ``[DEFAULT]/pin_release_version`` configuration value.
-For example, when it is not set to the older release version, no conversion
-will happen during the upgrade.
+For example, when ``[DEFAULT]/pin_release_version`` is not set to the older
+release version, no conversion will happen during the upgrade.
Online data migrations
~~~~~~~~~~~~~~~~~~~~~~
-To make database schema migrations less painful to execute, all data migrations
-are banned from schema migration scripts. The schema migration scripts only
-update the database schema. Data migrations must be done at the end of the
-rolling upgrade process, after the schema migration and after the services
-have been upgraded to the latest release. The data migration is performed
-using the ``ironic-dbsync online_data_migrations`` command. It can be run in
-a background process so that it does not interrupt running services.
+To make database schema migrations less painful to execute, we have
+implemented process changes to facilitate upgrades.
+* All data migrations are banned from schema migration scripts.
+* Schema migration scripts only update the database schema.
+* Data migrations must be done at the end of the rolling upgrade process,
+after the schema migration and after the services have been upgraded to
+the latest release.
+All data migrations are performed using the
+``ironic-dbsync online_data_migrations`` command. It can be run as
+a background process so that it does not interrupt running services;
+however it must be run to completion for a cold upgrade if the intent
+is to make use of new features immediately.
(You would also execute the same command with services turned off if
you are doing a cold upgrade).
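The ``Concepts`` hunk ties ``IncompatibleObjectVersion`` errors to a wrong or missing ``[DEFAULT]/pin_release_version``. The sketch below models only the idea:

* an upgraded service that is pinned sends the older release's RPC and object versions;
* an unpinned service sends its own versions.

The release table, the version strings in it, and the function name are placeholders invented for this example. They are not ironic's real release mapping.

```python
from typing import Dict, Optional, Tuple

# Oldest to newest. Version strings are placeholders, NOT ironic's real values.
RELEASE_ORDER = ["ocata", "pike"]
RELEASE_VERSIONS: Dict[str, Tuple[str, str]] = {
    "ocata": ("rpc 1.0", "objects 1.0"),
    "pike": ("rpc 1.1", "objects 1.1"),
}


def effective_versions(running_release: str,
                       pin_release_version: Optional[str] = None) -> Tuple[str, str]:
    """Return the (RPC, object) versions a service should send (hypothetical)."""
    if not pin_release_version:
        # Unpinned: the service uses its own, newest versions.
        return RELEASE_VERSIONS[running_release]
    if pin_release_version not in RELEASE_VERSIONS:
        raise ValueError(f"unknown release {pin_release_version!r}")
    if RELEASE_ORDER.index(pin_release_version) > RELEASE_ORDER.index(running_release):
        raise ValueError("cannot pin to a release newer than the running code")
    # Pinned: downgrade RPC calls and objects to what the old release understands.
    return RELEASE_VERSIONS[pin_release_version]


if __name__ == "__main__":
    # An upgraded (Pike) service during the upgrade, pinned to Ocata:
    print(effective_versions("pike", "ocata"))  # ('rpc 1.0', 'objects 1.0')
    # Without the pin it would send Pike versions, which the remaining
    # Ocata services may not be able to read.
    print(effective_versions("pike"))  # ('rpc 1.1', 'objects 1.1')
```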
@@ -132,8 +161,8 @@ Pike but did not do the data migrations, you will not be able to upgrade from
Pike to Queens. (More precisely, you will not be able to apply Queens' schema
migrations.)
-Graceful service shutdown
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Graceful conductor service shutdown
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ironic-conductor service is a Python process listening for messages on a
message queue. When the operator sends the SIGTERM signal to the process, the
@@ -147,6 +176,12 @@ older code, and start up a service using newer code with minimal impact.
This was tested with RabbitMQ messaging backend and may vary with other
backends.
+Nodes that are being acted upon by an ironic-conductor process, which are
+not in a stable state, may encounter failures. Node failures that occur
+during an upgrade are likely due to timeouts, resulting from delays
+involving messages being processed and acted upon by a conductor
+during long running, multi-step processes such as deployment or cleaning.
API load balancer draining
~~~~~~~~~~~~~~~~~~~~~~~~~~
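The graceful conductor shutdown described in these hunks works roughly as follows: on ``SIGTERM`` the service stops taking new messages but finishes the ones it has already accepted. This can be illustrated with a toy model. It is a sketch under stated assumptions and does not reflect how ironic-conductor is actually implemented:

* the class and its methods are hypothetical;
* the "message queue" is an in-memory ``deque``.

Messages still queued are left for another conductor. Accepted work has to finish before the process exits, which is why long running, multi-step work is where failures tend to surface.

```python
import signal
from collections import deque
from typing import Deque, List


class ToyConductor:
    """Hypothetical stand-in for a service consuming a message queue."""

    def __init__(self) -> None:
        self.inbox: Deque[str] = deque()        # messages not yet picked up
        self.in_progress: Deque[str] = deque()  # messages already accepted
        self.completed: List[str] = []
        self.stopping = False

    def handle_sigterm(self, signum, frame) -> None:
        # Do not exit right away: only stop picking up new messages.
        self.stopping = True

    def accept(self, limit: int) -> None:
        """Pick up to `limit` messages from the inbox unless shutting down."""
        while limit > 0 and self.inbox and not self.stopping:
            self.in_progress.append(self.inbox.popleft())
            limit -= 1

    def drain(self) -> None:
        """Finish every message that was already accepted."""
        while self.in_progress:
            self.completed.append(self.in_progress.popleft())


if __name__ == "__main__":
    conductor = ToyConductor()
    signal.signal(signal.SIGTERM, conductor.handle_sigterm)
    conductor.inbox.extend(["deploy node-1", "clean node-2", "deploy node-3"])
    conductor.accept(limit=2)                       # two requests in flight
    conductor.handle_sigterm(signal.SIGTERM, None)  # what a real SIGTERM would trigger
    conductor.accept(limit=2)                       # nothing new is accepted
    conductor.drain()                               # in-flight work completes
    print("completed:", conductor.completed)
    print("still queued:", list(conductor.inbox))
```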
@@ -157,13 +192,9 @@ services that have not yet been upgraded.
Rolling upgrade process
-----------------------
-To reduce downtime, the services can be upgraded in a rolling fashion. It means
-upgrading one or a few services at a time. To minimise downtime you need to
-have HA ironic deployment (at least two ironic-api and two ironic-conductor
-services) so that when a service instance is being upgraded, the other
-instances are still running.
-**New features should not be used until after the upgrade has been completed.**
+.. warning::
+New features and/or new API versions should not be used until after the upgrade
+has been completed.
Before maintenance window
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -183,23 +214,25 @@ Before maintenance window
database schema changes are done in a way that both the old and new (N and
N+1) releases can perform operations against the same schema.
-.. note:: Ironic bases its RPC and object storage format versions on the
-``[DEFAULT]/pin_release_version`` configuration option. It is
-advisable to automate the deployment of changes in configuration
-files to make the process less error prone and repeatable.
+.. note::
+Ironic bases its RPC and object storage format versions on the
+``[DEFAULT]/pin_release_version`` configuration option. It is
+advisable to automate the deployment of changes in configuration
+files to make the process less error prone and repeatable.
During maintenance window
~~~~~~~~~~~~~~~~~~~~~~~~~
-#. ironic-conductor services should be upgraded first. Ensure that at least
-one ironic-conductor service is running at all times. For every
+#. All ironic-conductor services should be upgraded first. Ensure that at
+least one ironic-conductor service is running at all times. For every
ironic-conductor, either one by one or a few at a time:
-* shut down the service. Conductors are load-balanced by the message queue,
+* shut down the service. Messages from the ironic-api services to the
+conductors are load-balanced by the message queue and a hash-ring,
so the only thing you need to worry about is to shut the service down
gracefully (using ``SIGTERM`` signal) to make sure it will finish all the
-requests being processed before shutting down
+requests being processed before shutting down.
-* upgrade the code and dependencies
+* upgrade the installed version of ironic and dependencies
* set the ``[DEFAULT]/pin_release_version`` configuration option value to
the version you are upgrading from (that is, the old version). Based on
this setting, the new ironic-conductor services will downgrade any
@@ -210,15 +243,15 @@ During maintenance window
#. The next service to upgrade is ironic-api. Ensure that at least one
ironic-api service is running at all times. You may want to start another
-instance of the older ironic-api to handle the load while you are upgrading
-the original ironic-api services. For every ironic-api service, either one
-by one or a few at a time:
+temporary instance of the older ironic-api to handle the load while you are
+upgrading the original ironic-api services. For every ironic-api service,
+either one by one or a few at a time:
* in HA deployment you are typically running them behind a load balancer
(for example HAProxy), so you need to take the service instance out of the
balancer
* shut it down
-* upgrade the code and dependencies
+* upgrade the installed version of ironic and dependencies
* set the ``[DEFAULT]/pin_release_version`` configuration option value to
the version you are upgrading from (that is, the old version). Based on
this setting, the new ironic-api services will downgrade any RPC
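The maintenance-window hunks define an ordering for the upgrade:

* all conductors go first, then the API services;
* every instance of a service is never stopped at once;
* each API instance is taken out of the load balancer before it is stopped.

That ordering can be written down as a small planning helper. This is a hypothetical sketch with illustrative host names, function names and step wording. Returning an API instance to the load balancer after restart is an assumption of the sketch.

```python
from typing import List


def batches(instances: List[str], batch_size: int) -> List[List[str]]:
    """Split instances into groups that are upgraded one group at a time.

    Refuses a batch size that would stop every instance at once, so at
    least one instance of the service keeps running throughout.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size >= len(instances):
        raise ValueError("batch too large: at least one instance must keep running")
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]


def rolling_plan(conductors: List[str], apis: List[str], batch_size: int = 1) -> List[str]:
    """Return ordered, human-readable steps: all conductors first, then APIs."""
    steps = []
    for group in batches(conductors, batch_size):
        steps.append(f"conductor(s) {', '.join(group)}: stop with SIGTERM, "
                     "upgrade ironic, set pin_release_version, start")
    for group in batches(apis, batch_size):
        steps.append(f"api(s) {', '.join(group)}: remove from load balancer, stop, "
                     "upgrade ironic, set pin_release_version, start, re-add")
    return steps


if __name__ == "__main__":
    for step in rolling_plan(["cond-1", "cond-2"], ["api-1", "api-2"]):
        print(step)
```

Rejecting oversized batches up front mirrors the rule "Ensure that at least one ... service is running at all times". The check fails early instead of discovering the outage mid-upgrade.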
@@ -258,7 +291,7 @@ release.
performed.
* Upgrade ``python-ironicclient`` along with other services connecting
-to the Bare Metal service as a client, such as nova-compute.
+to the Bare Metal service as a client, such as ``nova-compute``.
* Run the ``ironic-dbsync online_data_migrations`` command to make sure
that data migrations are applied. The command lets you limit
@@ -266,11 +299,12 @@ release.
limits the number of migrations executed in one run. You should complete
all of the migrations as soon as possible after the upgrade.
-Note that you will not be able to start an upgrade to the next release after
-this one, until this has been completed for the current release. For example,
-as part of upgrading from Ocata to Pike, you need to complete Pike's data
-migrations. If this not done, you will not be able to upgrade to Queens --
-it will not be possible to execute Queens' database schema updates.
+.. warning::
+Note that you will not be able to start an upgrade to the next release after
+this one, until this has been completed for the current release. For example,
+as part of upgrading from Ocata to Pike, you need to complete Pike's data
+migrations. If this not done, you will not be able to upgrade to Queens --
+it will not be possible to execute Queens' database schema updates.
Upgrading from Ocata to Pike
============================