From 519e379c7809e4e7903c2648e13da3587ee0f1ce Mon Sep 17 00:00:00 2001 From: Grzegorz Grasza Date: Wed, 4 Jan 2017 15:08:14 +0100 Subject: [PATCH] Rolling upgrade procedure documentation This extends the upgrade guide with information on how to perform a rolling upgrade with minimal downtime. It also adds a new section for offline (cold) upgrades. Partial-bug: #1526283 Co-Authored-By: Ruby Loo Change-Id: If7f41ccbfd8d3128a142cd6e4f090c5c9825cf60 --- doc/source/admin/upgrade-guide.rst | 265 ++++++++++++++++-- .../rolling-upgrades-ccad5159ca3cedbe.yaml | 4 + 2 files changed, 253 insertions(+), 16 deletions(-) create mode 100644 releasenotes/notes/rolling-upgrades-ccad5159ca3cedbe.yaml diff --git a/doc/source/admin/upgrade-guide.rst b/doc/source/admin/upgrade-guide.rst index d5f4edc45c..27d8f5b2d4 100644 --- a/doc/source/admin/upgrade-guide.rst +++ b/doc/source/admin/upgrade-guide.rst @@ -7,37 +7,270 @@ Bare Metal Service Upgrade Guide This document outlines various steps and notes for operators to consider when upgrading their ironic-driven clouds from previous versions of OpenStack. -The ironic service is tightly coupled with the ironic driver that is shipped -with nova. Some special considerations must be taken into account -when upgrading your cloud. +The Bare Metal (ironic) service is tightly coupled with the ironic driver that +is shipped with the Compute (nova) service. Some special considerations must be +taken into account when upgrading your cloud. -Plan your Upgrade +Both offline and rolling upgrades are supported. + +Plan your upgrade ================= +* Rolling upgrades are available starting with the Pike release; that is, when + upgrading from Ocata. This means that it is possible to do an upgrade with + minimal to no downtime of the Bare Metal API. + +* Upgrades are only supported between two consecutive named releases. + This means that you cannot upgrade from Ocata directly to Queens; you need to + upgrade to Pike first. 
+ * The `release notes `_ - should always be read carefully when upgrading the ironic service. Starting - with the Mitaka release, specific upgrade steps and considerations are - well-documented in the release notes. + should always be read carefully when upgrading the Bare Metal service. + Specific upgrade steps and considerations are documented there. -* Upgrades are only supported one series at a time, or within a series. - -* Starting with the Liberty release, the ironic service should always be - upgraded before the nova service. +* The Bare Metal service should always be upgraded before the Compute service. .. note:: The ironic virt driver in nova always uses a specific version of the ironic REST API. This API version may be one that was introduced in the same development cycle, so upgrading nova first may result in nova being - unable to use ironic's API. + unable to use the Bare Metal API. -* When upgrading ironic, the following steps should always be taken: +* Make a backup of your database. Ironic does not support downgrading of the + database. Hence, in case of upgrade failure, restoring the database from + a backup is the only choice. - #. Update ironic code, without restarting services. - #. Run database migrations. +Offline upgrades +================ - #. Restart ironic-conductor and ironic-api services. +In an offline (or cold) upgrade, the Bare Metal service is not available +during the upgrade, because all the services have to be taken down. +When upgrading the Bare Metal service, the following steps should always be +taken in this order: + +#. upgrade the ironic-python-agent image + +#. update ironic code, without restarting services + +#. run database schema migrations via ``ironic-dbsync upgrade`` + +#. 
restart ironic-conductor and ironic-api services + +Once the above is done, do the following: + +* update any applicable configuration options to stop using any deprecated + features or options, and perform any required work to transition to + alternatives. All the deprecated features and options will be supported for + one release cycle, so should be removed before your next upgrade is + performed. + +* upgrade python-ironicclient along with any other services connecting + to the Bare Metal service as a client, such as nova-compute + +* run the ``ironic-dbsync online_data_migrations`` command to make sure + that data migrations are applied. The command lets you limit + the impact of the data migrations with the ``--max-count`` option, which + limits the number of migrations executed in one run. You should complete + all of the migrations as soon as possible after the upgrade. + +.. warning:: You will not be able to start an upgrade to the next release + after this one, until this has been completed for the current + release. For example, as part of upgrading from Ocata to Pike, + you need to complete Pike's data migrations. If this is not done, + you will not be able to upgrade to Queens -- it will not be + possible to execute Queens' database schema updates. + + +Rolling upgrades +================ + +Rolling upgrades are available starting with the Pike release; that is, when +upgrading from Ocata. This means that it is possible to do an upgrade with +minimal to no downtime of the Bare Metal API. + +Concepts +-------- + +There are four aspects of the rolling upgrade process to keep in mind: + +* RPC version pinning and versioned object backports +* online data migrations +* graceful service shutdown +* API load balancer draining + +RPC version pinning and versioned object backports +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Through careful RPC versioning, newer services are able to talk to older +services (and vice versa). 
The ``[DEFAULT]/pin_release_version`` configuration +option is used for this. It should be set (pinned) to the release version +that the older services are using. The newer services will backport RPC calls +and objects to their appropriate versions from the pinned release. If the +``IncompatibleObjectVersion`` exception occurs, it is most likely due to an +incorrect or unspecified ``[DEFAULT]/pin_release_version`` configuration value. +For example, when it is not set to the older release version, no conversion +will happen during the upgrade. + +Online data migrations +~~~~~~~~~~~~~~~~~~~~~~ + +To make database schema migrations less painful to execute, all data migrations +are banned from schema migration scripts. The schema migration scripts only +update the database schema. Data migrations must be done at the end of the +rolling upgrade process, after the schema migration and after the services +have been upgraded to the latest release. The data migration is performed +using the ``ironic-dbsync online_data_migrations`` command. It can be run in +a background process so that it does not interrupt running services. +(You would also execute the same command with services turned off if +you are doing a cold upgrade). + +This data migration must be completed. If not, you will not be able to +upgrade to future releases. For example, if you had upgraded from Ocata to +Pike but did not do the data migrations, you will not be able to upgrade from +Pike to Queens. (More precisely, you will not be able to apply Queens' schema +migrations.) + +Graceful service shutdown +~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ironic-conductor service is a Python process listening for messages on a +message queue. When the operator sends the SIGTERM signal to the process, the +service stops consuming messages from the queue, so that no additional work is +picked up. It completes any outstanding work and then terminates. 
During this +process, messages can be left on the queue and will be processed after the +Python process starts back up. This gives us a way to shut down a service using +older code, and start up a service using newer code with minimal impact. + +.. note:: + This was tested with the RabbitMQ messaging backend and may vary with other + backends. + +API load balancer draining +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you are using a load balancer for the ironic-api services, we recommend that +you redirect requests to the new API services and drain off of the ironic-api +services that have not yet been upgraded. + +Rolling upgrade process +----------------------- + +To reduce downtime, the services can be upgraded in a rolling fashion. This means +upgrading one or a few services at a time. To minimise downtime you need to +have an HA ironic deployment (at least two ironic-api and two ironic-conductor +services) so that when a service instance is being upgraded, the other +instances are still running. + +**New features should not be used until after the upgrade has been completed.** + +Before maintenance window +~~~~~~~~~~~~~~~~~~~~~~~~~ + +* Upgrade the ironic-python-agent image + +* Using the new release (ironic code), execute the required database schema + updates by running the database upgrade command: ``ironic-dbsync upgrade``. + These schema change operations should have minimal or no effect on + performance, and should not cause any operations to fail (but please check + the release notes). You can: + + * install the new release on an existing system + * install the new release in a new virtualenv or a container + + At this point, new columns and tables may exist in the database. These + database schema changes are done in a way that both the old and new (N and + N+1) releases can perform operations against the same schema. + +.. note:: Ironic bases its RPC and object storage format versions on the + ``[DEFAULT]/pin_release_version`` configuration option. 
It is + advisable to automate the deployment of changes in configuration + files to make the process less error-prone and repeatable. + +During maintenance window +~~~~~~~~~~~~~~~~~~~~~~~~~ + +#. ironic-conductor services should be upgraded first. Ensure that at least + one ironic-conductor service is running at all times. For every + ironic-conductor, either one by one or a few at a time: + + * shut down the service. Conductors are load-balanced by the message queue, + so the only thing you need to worry about is to shut the service down + gracefully (using the ``SIGTERM`` signal) to make sure it will finish all the + requests being processed before shutting down + * upgrade the code and dependencies + * set the ``[DEFAULT]/pin_release_version`` configuration option value to + the version you are upgrading from (that is, the old version). Based on + this setting, the new ironic-conductor services will downgrade any + RPC communication and data objects to conform to the old service. + For example, if you are upgrading from Ocata to Pike, set this value to + ``ocata``. + * start the service + +#. The next service to upgrade is ironic-api. Ensure that at least one + ironic-api service is running at all times. You may want to start another + instance of the older ironic-api to handle the load while you are upgrading + the original ironic-api services. For every ironic-api service, either one + by one or a few at a time: + + * in an HA deployment you are typically running them behind a load balancer + (for example HAProxy), so you need to take the service instance out of the + balancer + * shut it down + * upgrade the code and dependencies + * set the ``[DEFAULT]/pin_release_version`` configuration option value to + the version you are upgrading from (that is, the old version). Based on + this setting, the new ironic-api services will downgrade any RPC + communication and data objects to conform to the old service. 
+ For example, if you are upgrading from Ocata to Pike, set this value to + ``ocata``. + * restart the service + * add it back into the load balancer + + After upgrading all the ironic-api services, the Bare Metal service is + running in the new version but with downgraded RPC communication and + database object storage formats. New features can fail when objects are in + the downgraded object formats and some internal RPC API functions may still + not be available. + +#. For all the ironic-conductor services, one at a time: + + * remove the ``[DEFAULT]/pin_release_version`` configuration option setting + * restart the ironic-conductor service + +#. For all the ironic-api services, one at a time: + + * remove the ``[DEFAULT]/pin_release_version`` configuration option setting + * restart the ironic-api service + +After maintenance window +~~~~~~~~~~~~~~~~~~~~~~~~ + +Now that all the services are upgraded, the system is able to use the latest +version of the RPC protocol and able to access all the features of the new +release. + +* Update any applicable configuration options to stop using any deprecated + features or options, and perform any required work to transition to + alternatives. All the deprecated features and options will be supported for + one release cycle, so should be removed before your next upgrade is + performed. + +* Upgrade ``python-ironicclient`` along with other services connecting + to the Bare Metal service as a client, such as nova-compute. + +* Run the ``ironic-dbsync online_data_migrations`` command to make sure + that data migrations are applied. The command lets you limit + the impact of the data migrations with the ``--max-count`` option, which + limits the number of migrations executed in one run. You should complete + all of the migrations as soon as possible after the upgrade. + +Note that you will not be able to start an upgrade to the next release after +this one, until this has been completed for the current release. 
For example, +as part of upgrading from Ocata to Pike, you need to complete Pike's data +migrations. If this is not done, you will not be able to upgrade to Queens -- +it will not be possible to execute Queens' database schema updates. Upgrading from Ocata to Pike ============================ diff --git a/releasenotes/notes/rolling-upgrades-ccad5159ca3cedbe.yaml b/releasenotes/notes/rolling-upgrades-ccad5159ca3cedbe.yaml new file mode 100644 index 0000000000..b4b2cbecfa --- /dev/null +++ b/releasenotes/notes/rolling-upgrades-ccad5159ca3cedbe.yaml @@ -0,0 +1,4 @@ +--- +features: + - Adds support for rolling upgrades, starting from upgrading Ocata to Pike. + For details, see http://docs.openstack.org/ironic/admin/upgrade-guide.html.