Merge "Update the upgrades part of devref" into stable/ocata

Jenkins 2017-02-08 18:10:34 +00:00 committed by Gerrit Code Review
commit a921005e6f
1 changed file with 68 additions and 66 deletions


@@ -54,11 +54,13 @@ Plan your upgrade
Rolling upgrade process
'''''''''''''''''''''''
To reduce downtime, the compute services can be upgraded in a rolling fashion. It
means upgrading a few services at a time. This results in a condition where
both old (N) and new (N+1) nova-compute services co-exist for a certain time
period. Note that there is no upgrade of the hypervisor here; this is just
upgrading the nova services. If reduced downtime is not a concern (or lower
complexity is desired), all services may be taken down and restarted at the
same time.
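The rolling approach can be illustrated with a small simulation. This is a sketch under stated assumptions, not Nova code. ``ComputeHost``, ``batches``, ``auto_pin`` and ``rolling_upgrade`` are hypothetical names. Service versions are plain integers (1 for N, 2 for N+1). ``[upgrade_levels]compute=auto`` is modeled simply as "use the oldest version reported by any nova-compute service". Under that model, the pin stays at N until the last batch has been upgraded.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class ComputeHost:
    """Hypothetical stand-in for one nova-compute service."""
    name: str
    version: int  # 1 = release N, 2 = release N+1 (illustrative values)


def batches(hosts: List[ComputeHost], size: int) -> Iterator[List[ComputeHost]]:
    """Yield consecutive groups of at most ``size`` hosts."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def auto_pin(hosts: List[ComputeHost]) -> int:
    """Model of compute=auto: use the oldest version any compute reports."""
    return min(host.version for host in hosts)


def rolling_upgrade(hosts: List[ComputeHost], new_version: int, size: int) -> List[int]:
    """Upgrade hosts one batch at a time; return the pin seen after each batch."""
    pins = []
    for batch in batches(hosts, size):
        for host in batch:
            # Real deployment: SIG_TERM the old nova-compute, then start the
            # new code. The simulation only records the new version.
            host.version = new_version
        pins.append(auto_pin(hosts))
    return pins


if __name__ == "__main__":
    fleet = [ComputeHost(f"compute{i}", 1) for i in range(5)]
    print(rolling_upgrade(fleet, new_version=2, size=2))  # [1, 1, 2]
```

In this model, upgrading in batches keeps most compute nodes serving requests at any moment. The minimum-version rule means the control services never use a protocol version that a not-yet-upgraded compute cannot understand.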
#. Before maintenance window:
@@ -80,52 +82,60 @@ upgrading the nova services.
* For maximum safety (no failed API operations), gracefully shut down all
the services (i.e. SIG_TERM) except nova-compute.
* Start all services on the new code, with
``[upgrade_levels]compute=auto`` in nova.conf. It is safest to
start nova-conductor first and nova-api last. Note that you may
use a static alias name instead of ``auto``, such as
``[upgrade_levels]compute=newton``. Also note that this step is
only required if compute services are not upgraded in lock-step
with the control services.
* If desired, gracefully shut down nova-compute (i.e. SIG_TERM)
services in small batches, then start the new version of the code
with: ``[upgrade_levels]compute=auto``. If this batch-based approach
is used, only a few compute nodes will have delayed API actions at
any one time, and enough capacity remains online to service any
boot requests that happen during this time.
#. After maintenance window:
* Once all services are running the new code, double check in the DB that
there are no old orphaned service records using ``nova service-list``.
* Now that all services are upgraded, we need to send the SIG_HUP signal so that all
the services clear any cached service version data. When a new service
starts, it automatically detects which version of the compute RPC protocol
to use, and it can decide if it is safe to do any online data migrations.
Note that if you used a static value for the upgrade_level, such as
``[upgrade_levels]compute=newton``, you must update nova.conf to remove
that configuration value and do a full service restart.
* Now that all the services are upgraded and signaled, the system is able to use
the latest version of the RPC protocol and can access all of the
features in the new release.
* Once all the services are running the latest version of the code, and all
the services are aware they all have been upgraded, it is safe to
transform the data in the database into its new format. While some of this
work happens on demand when the system reads a database row that needs
updating, we must get all the data transformed into the current version
before the next upgrade. Additionally, some data may not be transformed
automatically, so performing the data migration is necessary to avoid
performance degradation due to compatibility routines.
* This process can put significant extra write load on the
database. Complete all online data migrations using:
``nova-manage db online_data_migrations --limit <number>``. Note
that you can use the limit argument to reduce the load this
operation will place on the database. It lets you run the
migrations in small chunks, repeating until all of the work is done.
Each run shows a summary of completed and remaining records. Keep
running the command until both counts are zero. The chunk size you
should use depends on your infrastructure and how much additional
load you can impose on the database. To reduce load, run smaller
batches with delays between them. To reduce time to completion, run
larger batches.
* At this point, you must also ensure you update the configuration to stop
using any deprecated features or options, and perform any required work
@@ -139,27 +149,32 @@ Current Database Upgrade Types
Currently Nova has two types of database upgrades in use.
#. Schema Migrations
#. Data Migrations
Schema Migrations
''''''''''''''''''
Schema migrations are defined in
``nova/db/sqlalchemy/migrate_repo/versions`` and in
``nova/db/sqlalchemy/api_migrations/migrate_repo/versions``. They are
the routines that transform our database structure, and they should be
additive and able to be applied to a running system before the service
code has been upgraded.
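Keeping schema migrations additive can be checked mechanically. The sketch below is a deliberately naive, hypothetical allow-list check over raw SQL strings. ``is_additive`` and ``contracting_statements`` are illustrative names, and the real migrations in the directories above are Python modules rather than SQL text. Creating tables or indexes and adding columns pass. Anything else, such as drops, renames or column changes, is flagged as contracting.

```python
import re
from typing import Iterable, List

# Statements that only add structure, so old service code keeps working
# while they are applied (illustrative, not exhaustive).
ADDITIVE_PATTERNS = (
    re.compile(r"^\s*CREATE\s+(UNIQUE\s+)?(TABLE|INDEX)\b", re.IGNORECASE),
    re.compile(r"^\s*ALTER\s+TABLE\s+\w+\s+ADD\s+COLUMN\b", re.IGNORECASE),
)


def is_additive(statement: str) -> bool:
    """Return True only if the statement matches an allowed additive pattern."""
    return any(pattern.search(statement) for pattern in ADDITIVE_PATTERNS)


def contracting_statements(statements: Iterable[str]) -> List[str]:
    """Return the statements that could break code still using the old schema."""
    return [s for s in statements if not is_additive(s)]


if __name__ == "__main__":
    print(contracting_statements([
        "CREATE INDEX ix_name ON instances (name)",
        "ALTER TABLE instances DROP COLUMN legacy_flavor",
    ]))
```

An allow-list is used so that anything unrecognised is treated as contracting. A real check would inspect the migration's operations rather than text.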
Data Migrations
'''''''''''''''''
Online data migrations occur in two places:
#. Inline migrations that occur as part of normal runtime
activity as data is read in the old format and written in the
new format.
#. Background online migrations that are performed using
``nova-manage`` to complete transformations that will not occur
incidentally due to normal runtime activity.
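The background migrations are run in chunks, which is what the limit argument to ``nova-manage db online_data_migrations`` controls. The loop below sketches that chunking under an assumed interface, and it is not the ``nova-manage`` implementation. ``run_batch`` is a hypothetical callable that migrates at most ``limit`` records and returns a ``(found, done)`` pair. ``pause`` is an optional delay between chunks.

```python
import time
from typing import Callable, Tuple

# Hypothetical interface: run_batch(limit) -> (records found, records migrated)
BatchFn = Callable[[int], Tuple[int, int]]


def migrate_in_chunks(run_batch: BatchFn, limit: int, pause: float = 0.0) -> int:
    """Call run_batch until a chunk finds nothing left; return total migrated.

    A smaller ``limit`` and a longer ``pause`` spread the extra write load
    over more time; larger chunks finish sooner.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    total = 0
    while True:
        found, done = run_batch(limit)
        total += done
        if found == 0:
            return total
        if done == 0:
            # Records remain but none could be migrated; stop rather than
            # loop forever.
            raise RuntimeError(f"{found} records found but none migrated")
        if pause:
            time.sleep(pause)


if __name__ == "__main__":
    pending = list(range(25))  # pretend 25 records still use the old format

    def fake_batch(limit):
        chunk = pending[:limit]
        del pending[:limit]
        return len(chunk), len(chunk)

    print(migrate_in_chunks(fake_batch, limit=10))  # 25
```

Stopping when a chunk migrates nothing, even though records are still found, avoids an endless loop on records that cannot be converted automatically.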
Examples of online data migrations are the flavor migrations done as part
of Nova object version 1.18. This included a transient migration of flavor
@@ -273,9 +288,10 @@ Expand/Contract DB Migrations
writing to the database. Taking this a step further, we can make all DB
changes by first adding the new structures, i.e. expand. Then you can slowly
move all the data into a new location and format. Once that is complete,
you can drop bits of the schema that are no longer needed,
i.e. contract. This happens multiple cycles after we have stopped
using a particular piece of schema, and can happen in a schema
migration without affecting runtime code.
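The expand, migrate and contract phases can be sketched end to end with SQLite. Everything here is hypothetical: the ``records`` table, the ``legacy_value`` and ``value_json`` columns, and the comma-to-JSON transformation. The point is the ordering. The new column is added while old code still runs, rows are moved across, and the old column is dropped only once no row depends on it. The contract step rebuilds the table rather than using ``ALTER TABLE ... DROP COLUMN``, which older SQLite versions lack.

```python
import json
import sqlite3


def expand(conn):
    # Additive change: old code simply ignores the new, nullable column.
    conn.execute("ALTER TABLE records ADD COLUMN value_json TEXT")


def migrate(conn):
    # Move data into the new format; returns the number of rows converted.
    rows = conn.execute(
        "SELECT id, legacy_value FROM records WHERE value_json IS NULL").fetchall()
    for row_id, legacy in rows:
        conn.execute("UPDATE records SET value_json = ? WHERE id = ?",
                     (json.dumps(legacy.split(",")), row_id))
    conn.commit()
    return len(rows)


def contract(conn):
    # Dropping the old column is only safe once every row has been migrated.
    (remaining,) = conn.execute(
        "SELECT COUNT(*) FROM records WHERE value_json IS NULL").fetchone()
    if remaining:
        raise RuntimeError(f"{remaining} rows not migrated; refusing to contract")
    conn.executescript("""
        CREATE TABLE records_new (id INTEGER PRIMARY KEY, value_json TEXT NOT NULL);
        INSERT INTO records_new (id, value_json) SELECT id, value_json FROM records;
        DROP TABLE records;
        ALTER TABLE records_new RENAME TO records;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, legacy_value TEXT)")
    conn.executemany("INSERT INTO records (legacy_value) VALUES (?)",
                     [("a,b",), ("c",)])
    expand(conn)
    try:
        contract(conn)  # refused: nothing has been migrated yet
    except RuntimeError as exc:
        print(exc)
    print(migrate(conn))
    contract(conn)
    print(conn.execute("SELECT id, value_json FROM records ORDER BY id").fetchall())
```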
Online Data Migrations using objects
In Kilo we are moving all data migration into the DB objects code.
@@ -284,11 +300,9 @@ Online Data Migrations using objects
that are in the old format. For records that are not updated, you need to
run a background process to convert those records into the newer format.
This process must be completed before you contract the database schema.
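The on-demand half of this pattern can be sketched as an object loader that upgrades old-format records as they are read and saves them back. This is not Nova's objects code. ``Record``, ``load_record``, ``unmigrated``, the dict-based ``db`` and the ``legacy_tags``/``tags`` fields are all hypothetical. Records that are never read stay in the old format, which is why the background process described above is still needed before contracting the schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical object that only exposes the new-format field."""
    id: int
    tags: List[str]


def load_record(db: Dict[int, dict], record_id: int) -> Record:
    """Load a row, migrating it to the new format on read if necessary."""
    row = db[record_id]
    if "legacy_tags" in row:
        # Old format: convert the comma-separated string to a list...
        row = {"tags": row["legacy_tags"].split(",")}
        # ...and save it, so later reads find the row already migrated.
        db[record_id] = row
    return Record(id=record_id, tags=list(row["tags"]))


def unmigrated(db: Dict[int, dict]) -> List[int]:
    """IDs never touched by a read; a background pass must convert these."""
    return sorted(k for k, row in db.items() if "legacy_tags" in row)


if __name__ == "__main__":
    db = {1: {"legacy_tags": "a,b"}, 2: {"legacy_tags": "c"}, 3: {"tags": ["d"]}}
    print(load_record(db, 1))
    print(unmigrated(db))
```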
DB prune deleted rows
Currently resources are soft deleted in the main database, so users are able
to track instances in the DB that are created and destroyed in production.
However, most people have a data retention policy of, say, 30 or 90
days, after which they will want to delete those entries. Not deleting
@@ -310,18 +324,6 @@ Once we have all the pieces in place, we hope to move the Grenade testing
to follow this new pattern.
The current tests only cover the existing upgrade process where:
* old computes can run with new control plane
* but control plane is turned off for DB migrations