Update rolling upgrade steps from upgrades documentation

Modify rolling upgrade steps from exisitng nova upgrades documentation. Also, add pre-requisites required for the zero downtime upgrade under newly created "Plan your upgrade" section. Change-Id: I4e884ac051614d25258734c35208b3abfe5fd3a4 Co-Authored-By: Sivasathurappan Radhakrishnan <siva.radhakrishnan@intel.com>
2016-09-20 22:25:40 +00:00
parent 3330d490dd
commit c0e8aa85e7
1 changed files with 104 additions and 35 deletions
--- a/doc/source/upgrade.rst
+++ b/doc/source/upgrade.rst
@@ -30,6 +30,110 @@ Once we have introduced the key concepts relating to upgrade, we will
 introduce the process needed for a no downtime upgrade of nova.
 Minimal Downtime Upgrade Process
 --------------------------------
 Plan your upgrade
 '''''''''''''''''
 * Read and ensure you understand the release notes for the next release.
 * You should ensure all required steps from the previous upgrade have been
  completed, such as data migrations.
 * Make a backup of your database. Nova does not support downgrading of the
  database. Hence, in case of upgrade failure, restoring database from backup
  is the only choice.
 * During upgrade be aware that there will be additional load on nova-conductor.
  You may find you need to add extra nova-conductor workers to deal with the
  additional upgrade related load.
 Rolling upgrade process
 '''''''''''''''''''''''
 To reduce downtime, the services can be upgraded in a rolling fashion. It
 means upgrading a few services at a time. This results in a condition where
 both old (N) and new (N+1) nova-compute services co-exist for a certain time
 period. Note that, there is no upgrade of the hypervisor here, this is just
 upgrading the nova services.
 #. Before maintenance window:
   * Start the process with the controller node. Install the code for the next
     version of Nova, either in a venv or a separate control plane node,
     including all the python dependencies.
   * Using the newly installed nova code, run the DB sync.
     (``nova-manage db sync``; ``nova-manage api_db sync``). These schema
     change operations should have minimal or no effect on performance, and
     should not cause any operations to fail.
   * At this point, new columns and tables may exist in the database. These
     DB schema changes are done in a way that both the N and N+1 release can
     perform operations against the same schema.
 #. During maintenance window:
   * For maximum safety (no failed API operations), gracefully shutdown all
     the services (i.e. SIG_TERM) except nova-compute.
   * Start all services, with ``[upgrade_levels]compute=auto`` in nova.conf.
     It is safest to start nova-conductor first and nova-api last. Note that
     for older releases, before Liberty, you need to use a static alias name
     instead of ``auto``, such as ``[upgrade_levels]compute=mitaka``
   * In small batches gracefully shutdown nova-compute (i.e. SIG_TERM), then
     start the new version of the code with: ``[upgrade_levels]compute=auto``
     Note this is done in batches so only a few compute nodes will have any
     delayed API actions, and to ensure there is enough capacity online to
     service any boot requests that happen during this time.
 #. After maintenance window:
   * Once all services are running the new code, double check in the DB that
     there are no old orphaned service records.
   * Now all services are upgraded, we need to send the SIG_HUP signal, so all
     the services clear any cached service version data. When a new service
     starts, it automatically detects which version of the compute RPC protocol
     to use, and it can decide if it is safe to do any online data migrations.
     Note, if you used a static value for the upgrade_level, such as
     ``[upgrade_levels]compute=mitaka``, you must update nova.conf to remove
     that configuration value before you send the SIG_HUP signal.
   * Now all the services are upgraded, the system is able to use the latest
     version of the RPC protocol and so get access to all the new features in
     the new release.
   * Now all the services are running the latest version of the code, and all
     the services are aware they all have been upgraded, it is safe to
     transform the data in the database into its new format. While this
     happens on demand when the system reads a database row that needs
     updating, we must get all the data transformed into the current version
     before we next upgrade.
   * This process can also put significant extra write load on the database.
     Complete all online data migrations using:
     ``nova-manage db online_data_migrations --limit <number>``. Note that you
     can use the limit argument to reduce the load this operation will place
     on the database.
   * The limit argument in online data migrations allows you to run a small
     chunk of upgrades until all of the work is done. Each time it is run, it
     will show summary of completed and remaining records. You run this command
     until you see completed and remaining records as zeros. The size of
     chunks you should use depend on your infrastructure.
   * At this point, you must also ensure you update the configuration, to stop
     using any deprecated features or options, and perform any required work
     to transition to alternative features. All the deprecated options should
     be supported for one cycle, but should be removed before your next
     upgrade is performed.
 Current Database Upgrade Types
 ------------------------------
@@ -199,41 +303,6 @@ nova-conductor object backports
    can understand.
 Process
 -------
 NOTE:
    This still requires much work before it can become reality.
    This is more an aspirational plan that helps describe how all the
    pieces of the jigsaw fit together.
 This is the planned process for a zero downtime upgrade:
 #. Prune deleted DB rows, check previous migrations are complete
 #. Expand DB schema (e.g. add new column)
 #. Pin RPC versions for all services that are upgraded from this point,
   using the current version
 #. Upgrade all nova-conductor nodes (to do object backports)
 #. Upgrade all other services, except nova-compute and nova-api,
   using graceful shutdown
 #. Upgrade nova-compute nodes (this is the bulk of the work).
 #. Unpin RPC versions
 #. Add new API nodes, and enable new features, while using a load balancer
   to "drain" the traffic from old API nodes
 #. Run the new nova-manage command that ensures all DB records are "upgraded"
   to new data version
 #. "Contract" DB schema (e.g. drop unused columns)
 Testing
 -------