Online schema migration

Specification blueprint online-schema-migration

Change-Id: I657caeaebfc63f59bd7f380bf571859d4d336d46
This commit is contained in:
Grzegorz Grasza
2015-11-13 14:38:17 +01:00
parent bea334c2dc
commit 7fcdc8625f

View File

@@ -0,0 +1,245 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=======================
Online schema migration
=======================
`bp online-schema-migration <https://blueprints.launchpad.net/keystone/+spec/online-schema-migration>`_
Future incompatible changes in sqlalchemy migrations, like removing,
renaming columns and tables can break rolling upgrades (upgrades when
multiple keystone instances are run simultaneously at different versions).
This spec solves one zero downtime related issue inside keystone. We want it
to be a step forward in providing near zero downtime API availability during
a rolling upgrade. Please be aware, that the used persistence layer may not
support zero downtime schema altering operations, which will cause a table to
be locked and inaccessible during database migration.
Support for rolling upgrades is a key theme for the Mitaka cycle [1]_. There
are also new "assert" tags, that could be added to keystone in the future [2]_.
This spec is a statement of commitment by the keystone reviewers to support
rolling upgrades with near zero downtime. The upgrade can be performed by
rolling the new release across a cluster of nodes, upgrading each node
one-by-one.
Problem Description
===================
Currently, schema is migrated before running the new release, to modify the
schema so that it is compatible with the new code version. We don't limit
developer's choice on what schema updates are permissible. However, certain
operations, like table/column drops and name changes can make the database
not compatible with the older release. The period of data migration may be
very long in cases when a table contains thousands or millions of entries,
making the upgrade time consuming, which can become not acceptable if it
requires that the old release must be stopped. Also, going back to the previous
release always means restoring from backup.
Keystone was mentioned at the summit in the ops live upgrades session as one
of the projects which are the closest to being zero-downtime [3]_. Even though
one of the operators stated that he upgraded keystone with near zero downtime,
this is impossible across major releases due to schema changes which cause
incompatibilities between versions.
Proposed Change
===============
Queries generated by SQLAlchemy expect fields defined there to be present
inside the DB. To keep the database in sync, schema migration is executed
before new version of code is run. The key concept of a rolling upgrade is that
two versions may run at the same time, so we want to keep structures which are
used by the previous version in place. New structures are ignored by this old
version, because it still contains old SQLAlchemy models, while the new code
can support new additive structures, like adding a new column with a default
value. Old columns and tables can only be removed only when no longer used by
both current and the previous release (which may still be running). We will
only support keeping compatibility with one previous release to reduce the
migrations implementation complexity. Upgrades have to be applied
incrementally, one release at a time, with only two adjacent versions running
simultaneously.
To address the problem of schema incompatibilities between versions, we can ban
schema changes which cause those incompatibilities, specifically drops and
alters. There is already a solution which is present in nova [4]_.
The test is surrounded by comments with direct wording that signals code
reviewers to take into consideration the impact on upgradeability.
The unit test blocks all alters and drops, but also contains a list of
migrations where we allow altering and dropping things.
The rules for adding exceptions are very specific:
1) Migrations which don't cause incompatibilities are allowed, for example
dropping an index or constraint.
2) Migrations removing structures not used in the previous version are allowed
(we keep compatibility between releases), ex.:
a) feature is deprecated according to the deprecation policies (release 1),
b) code supporting the feature is removed the following release (release 2),
c) table can be dropped a release after the code has been removed (i.e. in
release 3).
3) Any other changes which don't pass this test are disallowed.
We start requiring things to be additive in Mitaka, so we ignore all migrations
before that point.
It is important to note, that this unit test will not catch all issues which
may still be introduced, like adding a column without a default, or adding a
foreign key which cannot be handled by the older version. Data interpretation
incompatibilities may be introduced without altering the schema. This is why
we are also introducing a grenade test, which will attempt to catch other
errors at the CI gate.
Some types of changes, like changing the format in which data is stored, may
require live migration of data. Since there are no ready-made solutions, it is
up to the developer to decide how this is achieved. The data migration may be
started automatically, when the new version is introduced, while an old version
is still running, and finished up with a migration script in the next (third)
release. It could also be scripted, with information on how to do it in the
release notes. A suggested migration would happen in three phases:
a) A new column, with a new format is created in a schema upgrade script.
Because the old version doesn't know about it, the old column is retained
and data is written to both columns. The data should be read only from the
old column at this point, especially if the row can be updated by the old
version, which may still be running.
b) The next version contains an upgrade script which migrates the rest of data
from the old and into the new column. It migrates any data which wasn't
migrated in the process of normally running the service. If the data was
read only from the old column, this release should read data from the new
column, and, as before, write to both (to be backwards compatible).
c) In the following release (in case when the previous release reads data from
the new column), the old column can be removed from SQLAlchemy models and
is no longer used.
d) After it is no longer used, a migrate script can remove the old column.
The first two steps can be squashed into one, if logic is written to
distinguish situations where the columns are updated by the old or the new
version in step one, or if the row is not updated after inserting.
Step b) can also be implemented in the same release, by providing a
configuration, which switches the place from which the data is read. In cases
when a table is no longer used it can be dropped in the next release.
Before removing columns and tables, a sanity check should be performed,
ensuring all data was migrated.
Alternatives
------------
Currently, keystone operators run migrations before running the new version.
We can introduce two-phase migrations like in Neutron [5]_. The approach is to
organize schema migrations scripts into "expand" and "contract" phases that are
also linked to major release versions in a backwards compatible way. The
"expand" phase is run before the upgrade, and the "contract" phase can be
executed after all the services are running with the new version.
The last two steps (above) could be squashed into one, if two-phase migrations
are implemented. In this case, the last step can be run in the contract phase
of the same release.
A similar approach for two-phase migrations, taken by nova [6]_ was determined
experimental and will be removed in Mitaka [7]_.
Security Impact
---------------
The code which does migrations may become more complicated, since they are done
online. We have to ensure that updates are correctly made to two places and
read in a correct way when we want to maintain compatibility.
Notifications Impact
--------------------
None
Other End User Impact
---------------------
None
Performance Impact
------------------
The new way of doing schema changes entails that some data migrations would
have to be done online (for example, we would have to maintain data in two
places, before migration is finished), which could impact performance. On the
other hand the performance is impacted infinitely when the service is down for
performing the upgrade.
Other Deployer Impact
---------------------
Ability to perform online schema migration will have a large and positive
impact on deployment.
Developer Impact
----------------
Currently, we don't limit developer's choice on what schema updates are
permissible. This change is proposing a unit test and a grenade test, that will
limit the changes that can be done in one release - changes will have to be
split between releases. Still, patches could be added to the exceptions list
and a proper release node could be added, notifying the operator about the
need and scope of downtime.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
xek
Work Items
----------
* Prepare a unit test blocking alters and drops in SQL migrations.
* Prepare developer documentation with examples.
* Help review patches which fall over on this test.
* Add grenade CI test, with altering requests sent to two keystone instances
at different versions
Dependencies
============
None
Documentation Impact
====================
Developer documentation with examples will be added.
References
==========
.. [1] https://etherpad.openstack.org/p/mitaka-crossproject-themes
.. [2] http://permalink.gmane.org/gmane.comp.cloud.openstack.devel/69083
.. [3] https://etherpad.openstack.org/p/TYO-ops-upgrades
.. [4] https://github.com/openstack/nova/blob/stable/liberty/nova/tests/unit/db/test_migrations.py#L224-L225
.. [5] https://blueprints.launchpad.net/neutron/+spec/online-schema-migrations
.. [6] https://blueprints.launchpad.net/nova/+spec/online-schema-changes
.. [7] https://etherpad.openstack.org/p/mitaka-nova-upgrade