Update rolling upgrade steps from upgrades documentation
Modify rolling upgrade steps from exisitng nova upgrades documentation. Also, add pre-requisites required for the zero downtime upgrade under newly created "Plan your upgrade" section. Change-Id: I4e884ac051614d25258734c35208b3abfe5fd3a4 Co-Authored-By: Sivasathurappan Radhakrishnan <siva.radhakrishnan@intel.com>
This commit is contained in:

committed by
John Garbutt

parent
3330d490dd
commit
c0e8aa85e7
@@ -30,6 +30,110 @@ Once we have introduced the key concepts relating to upgrade, we will
|
|||||||
introduce the process needed for a no downtime upgrade of nova.
|
introduce the process needed for a no downtime upgrade of nova.
|
||||||
|
|
||||||
|
|
||||||
|
Minimal Downtime Upgrade Process
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
Plan your upgrade
|
||||||
|
'''''''''''''''''
|
||||||
|
|
||||||
|
* Read and ensure you understand the release notes for the next release.
|
||||||
|
|
||||||
|
* You should ensure all required steps from the previous upgrade have been
|
||||||
|
completed, such as data migrations.
|
||||||
|
|
||||||
|
* Make a backup of your database. Nova does not support downgrading of the
|
||||||
|
database. Hence, in case of upgrade failure, restoring database from backup
|
||||||
|
is the only choice.
|
||||||
|
|
||||||
|
* During upgrade be aware that there will be additional load on nova-conductor.
|
||||||
|
You may find you need to add extra nova-conductor workers to deal with the
|
||||||
|
additional upgrade related load.
|
||||||
|
|
||||||
|
|
||||||
|
Rolling upgrade process
|
||||||
|
'''''''''''''''''''''''
|
||||||
|
|
||||||
|
To reduce downtime, the services can be upgraded in a rolling fashion. It
|
||||||
|
means upgrading a few services at a time. This results in a condition where
|
||||||
|
both old (N) and new (N+1) nova-compute services co-exist for a certain time
|
||||||
|
period. Note that, there is no upgrade of the hypervisor here, this is just
|
||||||
|
upgrading the nova services.
|
||||||
|
|
||||||
|
#. Before maintenance window:
|
||||||
|
|
||||||
|
* Start the process with the controller node. Install the code for the next
|
||||||
|
version of Nova, either in a venv or a separate control plane node,
|
||||||
|
including all the python dependencies.
|
||||||
|
|
||||||
|
* Using the newly installed nova code, run the DB sync.
|
||||||
|
(``nova-manage db sync``; ``nova-manage api_db sync``). These schema
|
||||||
|
change operations should have minimal or no effect on performance, and
|
||||||
|
should not cause any operations to fail.
|
||||||
|
|
||||||
|
* At this point, new columns and tables may exist in the database. These
|
||||||
|
DB schema changes are done in a way that both the N and N+1 release can
|
||||||
|
perform operations against the same schema.
|
||||||
|
|
||||||
|
#. During maintenance window:
|
||||||
|
|
||||||
|
* For maximum safety (no failed API operations), gracefully shutdown all
|
||||||
|
the services (i.e. SIG_TERM) except nova-compute.
|
||||||
|
|
||||||
|
* Start all services, with ``[upgrade_levels]compute=auto`` in nova.conf.
|
||||||
|
It is safest to start nova-conductor first and nova-api last. Note that
|
||||||
|
for older releases, before Liberty, you need to use a static alias name
|
||||||
|
instead of ``auto``, such as ``[upgrade_levels]compute=mitaka``
|
||||||
|
|
||||||
|
* In small batches gracefully shutdown nova-compute (i.e. SIG_TERM), then
|
||||||
|
start the new version of the code with: ``[upgrade_levels]compute=auto``
|
||||||
|
Note this is done in batches so only a few compute nodes will have any
|
||||||
|
delayed API actions, and to ensure there is enough capacity online to
|
||||||
|
service any boot requests that happen during this time.
|
||||||
|
|
||||||
|
#. After maintenance window:
|
||||||
|
|
||||||
|
* Once all services are running the new code, double check in the DB that
|
||||||
|
there are no old orphaned service records.
|
||||||
|
|
||||||
|
* Now all services are upgraded, we need to send the SIG_HUP signal, so all
|
||||||
|
the services clear any cached service version data. When a new service
|
||||||
|
starts, it automatically detects which version of the compute RPC protocol
|
||||||
|
to use, and it can decide if it is safe to do any online data migrations.
|
||||||
|
Note, if you used a static value for the upgrade_level, such as
|
||||||
|
``[upgrade_levels]compute=mitaka``, you must update nova.conf to remove
|
||||||
|
that configuration value before you send the SIG_HUP signal.
|
||||||
|
|
||||||
|
* Now all the services are upgraded, the system is able to use the latest
|
||||||
|
version of the RPC protocol and so get access to all the new features in
|
||||||
|
the new release.
|
||||||
|
|
||||||
|
* Now all the services are running the latest version of the code, and all
|
||||||
|
the services are aware they all have been upgraded, it is safe to
|
||||||
|
transform the data in the database into its new format. While this
|
||||||
|
happens on demand when the system reads a database row that needs
|
||||||
|
updating, we must get all the data transformed into the current version
|
||||||
|
before we next upgrade.
|
||||||
|
|
||||||
|
* This process can also put significant extra write load on the database.
|
||||||
|
Complete all online data migrations using:
|
||||||
|
``nova-manage db online_data_migrations --limit <number>``. Note that you
|
||||||
|
can use the limit argument to reduce the load this operation will place
|
||||||
|
on the database.
|
||||||
|
|
||||||
|
* The limit argument in online data migrations allows you to run a small
|
||||||
|
chunk of upgrades until all of the work is done. Each time it is run, it
|
||||||
|
will show summary of completed and remaining records. You run this command
|
||||||
|
until you see completed and remaining records as zeros. The size of
|
||||||
|
chunks you should use depend on your infrastructure.
|
||||||
|
|
||||||
|
* At this point, you must also ensure you update the configuration, to stop
|
||||||
|
using any deprecated features or options, and perform any required work
|
||||||
|
to transition to alternative features. All the deprecated options should
|
||||||
|
be supported for one cycle, but should be removed before your next
|
||||||
|
upgrade is performed.
|
||||||
|
|
||||||
|
|
||||||
Current Database Upgrade Types
|
Current Database Upgrade Types
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
||||||
@@ -199,41 +303,6 @@ nova-conductor object backports
|
|||||||
can understand.
|
can understand.
|
||||||
|
|
||||||
|
|
||||||
Process
|
|
||||||
-------
|
|
||||||
|
|
||||||
NOTE:
|
|
||||||
This still requires much work before it can become reality.
|
|
||||||
This is more an aspirational plan that helps describe how all the
|
|
||||||
pieces of the jigsaw fit together.
|
|
||||||
|
|
||||||
This is the planned process for a zero downtime upgrade:
|
|
||||||
|
|
||||||
#. Prune deleted DB rows, check previous migrations are complete
|
|
||||||
|
|
||||||
#. Expand DB schema (e.g. add new column)
|
|
||||||
|
|
||||||
#. Pin RPC versions for all services that are upgraded from this point,
|
|
||||||
using the current version
|
|
||||||
|
|
||||||
#. Upgrade all nova-conductor nodes (to do object backports)
|
|
||||||
|
|
||||||
#. Upgrade all other services, except nova-compute and nova-api,
|
|
||||||
using graceful shutdown
|
|
||||||
|
|
||||||
#. Upgrade nova-compute nodes (this is the bulk of the work).
|
|
||||||
|
|
||||||
#. Unpin RPC versions
|
|
||||||
|
|
||||||
#. Add new API nodes, and enable new features, while using a load balancer
|
|
||||||
to "drain" the traffic from old API nodes
|
|
||||||
|
|
||||||
#. Run the new nova-manage command that ensures all DB records are "upgraded"
|
|
||||||
to new data version
|
|
||||||
|
|
||||||
#. "Contract" DB schema (e.g. drop unused columns)
|
|
||||||
|
|
||||||
|
|
||||||
Testing
|
Testing
|
||||||
-------
|
-------
|
||||||
|
|
||||||
|
Reference in New Issue
Block a user