Online schema migration
Specification blueprint online-schema-migration Change-Id: I657caeaebfc63f59bd7f380bf571859d4d336d46
specs/mitaka/online-schema-migration.rst
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================
Online schema migration
=======================

`bp online-schema-migration <https://blueprints.launchpad.net/keystone/+spec/online-schema-migration>`_

Future incompatible changes in SQLAlchemy migrations, such as removing or
renaming columns and tables, can break rolling upgrades (upgrades during which
multiple keystone instances run simultaneously at different versions).

This spec solves one zero-downtime-related issue inside keystone. We want it
to be a step forward in providing near zero downtime API availability during
a rolling upgrade. Please be aware that the underlying persistence layer may
not support zero-downtime schema altering operations, which will cause a table
to be locked and inaccessible during database migration.

Support for rolling upgrades is a key theme for the Mitaka cycle [1]_. There
are also new "assert" tags that could be added to keystone in the future [2]_.

This spec is a statement of commitment by the keystone reviewers to support
rolling upgrades with near zero downtime. The upgrade can be performed by
rolling the new release across a cluster of nodes, upgrading each node
one-by-one.


Problem Description
===================

Currently, the schema is migrated before running the new release, to make it
compatible with the new code version. We don't limit developers' choice of
what schema updates are permissible. However, certain operations, like
table/column drops and name changes, can make the database incompatible with
the older release. Data migration may take very long when a table contains
thousands or millions of entries, making the upgrade time consuming, which can
become unacceptable if it requires that the old release be stopped. Also,
going back to the previous release always means restoring from backup.

Keystone was mentioned at the summit in the ops live upgrades session as one
of the projects closest to being zero-downtime [3]_. Even though one of the
operators stated that he upgraded keystone with near zero downtime, this is
impossible across major releases due to schema changes which cause
incompatibilities between versions.


Proposed Change
===============

Queries generated by SQLAlchemy expect the fields defined in its models to be
present in the database. To keep the database in sync, schema migration is
executed before the new version of the code is run. The key concept of a
rolling upgrade is that two versions may run at the same time, so we want to
keep structures which are used by the previous version in place. New
structures are ignored by the old version, because it still contains the old
SQLAlchemy models, while the new code can support new additive structures,
like adding a new column with a default value. Old columns and tables can be
removed only when they are no longer used by either the current or the
previous release (which may still be running). We will only support keeping
compatibility with one previous release, to reduce the complexity of
implementing migrations. Upgrades have to be applied incrementally, one
release at a time, with only two adjacent versions running simultaneously.
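To illustrate why additive changes are safe for the old release, here is a
minimal sketch using the stdlib ``sqlite3`` module (the table and column names
are hypothetical, not from keystone's schema): after a column with a default
is added, queries and inserts issued by code that knows nothing about the new
column keep working unchanged.

```python
import sqlite3

# Hypothetical additive migration: the old release's queries must keep
# working after the new column is added.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user (name) VALUES ('alice')")

# New release's schema migration: purely additive, with a default so that
# writers from the old release produce valid rows.
conn.execute("ALTER TABLE user ADD COLUMN enabled INTEGER NOT NULL DEFAULT 1")

# Old release's query (it knows nothing about 'enabled') still succeeds.
old_row = conn.execute("SELECT id, name FROM user").fetchone()

# Old release's insert (no 'enabled' value) also succeeds via the default.
conn.execute("INSERT INTO user (name) VALUES ('bob')")
new_row = conn.execute(
    "SELECT name, enabled FROM user WHERE name = 'bob'").fetchone()
print(old_row, new_row)  # (1, 'alice') ('bob', 1)
```

A drop or rename of ``name`` would, by contrast, break the old release's
``SELECT`` immediately, which is exactly the class of change this spec bans.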

To address the problem of schema incompatibilities between versions, we can
ban schema changes which cause those incompatibilities, specifically drops and
alters. There is already a solution present in nova [4]_.

The test is surrounded by comments with direct wording that signals code
reviewers to take into consideration the impact on upgradeability.

The unit test blocks all alters and drops, but also contains a list of
migrations where we allow altering and dropping things.

The rules for adding exceptions are very specific:

1) Migrations which don't cause incompatibilities are allowed, for example
   dropping an index or constraint.
2) Migrations removing structures not used in the previous version are allowed
   (we keep compatibility between releases), e.g.:

   a) the feature is deprecated according to the deprecation policies
      (release 1),
   b) the code supporting the feature is removed in the following release
      (release 2),
   c) the table can be dropped a release after the code has been removed
      (i.e. in release 3).

3) Any other changes which don't pass this test are disallowed.

We start requiring things to be additive in Mitaka, so we ignore all
migrations before that point.
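The shape of such a check, modeled loosely on the nova test [4]_, could look
like the following sketch (the function name, exception numbers, and base
revision are illustrative, not the actual keystone test):

```python
import re

# Operations that break rolling upgrades; anything matching is banned
# unless the migration is explicitly whitelisted.
BANNED_OPS = re.compile(r"\b(drop_column|drop_table|alter_column|rename)\b")

# Hypothetical exceptions list: migrations reviewed and found safe under
# the rules above (e.g. dropping a table whose code was already removed).
ALLOWED_EXCEPTIONS = {77}

# Migrations at or before this revision predate the additive-only policy
# (Mitaka) and are not inspected.
MITAKA_BASE = 76


def check_migration(number: int, source: str) -> bool:
    """Return True if the migration passes the compatibility check."""
    if number <= MITAKA_BASE:
        return True  # pre-Mitaka migrations are ignored
    if number in ALLOWED_EXCEPTIONS:
        return True  # explicitly reviewed and whitelisted
    return BANNED_OPS.search(source) is None


print(check_migration(80, "op.add_column('user', sa.Column('enabled'))"))  # True
print(check_migration(81, "op.drop_column('user', 'extra')"))              # False
print(check_migration(77, "op.drop_table('legacy')"))                      # True
```

In the real test the source to inspect would come from the migration
repository itself; the point here is only the whitelist-plus-ban structure.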

It is important to note that this unit test will not catch all issues which
may still be introduced, like adding a column without a default, or adding a
foreign key which cannot be handled by the older version. Data interpretation
incompatibilities may be introduced without altering the schema. This is why
we are also introducing a grenade test, which will attempt to catch other
errors at the CI gate.

Some types of changes, like changing the format in which data is stored, may
require live migration of data. Since there are no ready-made solutions, it is
up to the developer to decide how this is achieved. The data migration may be
started automatically when the new version is introduced, while an old version
is still running, and finished with a migration script in the next (third)
release. It could also be scripted, with information on how to do it in the
release notes. A suggested migration would happen in these phases:

a) A new column with a new format is created in a schema upgrade script.
   Because the old version doesn't know about it, the old column is retained
   and data is written to both columns. The data should be read only from the
   old column at this point, especially if the row can be updated by the old
   version, which may still be running.
b) The next version contains an upgrade script which migrates the rest of the
   data from the old column into the new one. It migrates any data which
   wasn't migrated in the course of normally running the service. If the data
   was previously read only from the old column, this release should read data
   from the new column and, as before, write to both (to be backwards
   compatible).
c) In the following release (given that the previous release reads data from
   the new column), the old column can be removed from the SQLAlchemy models
   and is no longer used.
d) After it is no longer used, a migration script can remove the old column.

The first two steps can be squashed into one, if logic is written to
distinguish situations where the columns are updated by the old or the new
version in step one, or if the row is not updated after inserting.

Step b) can also be implemented in the same release, by providing a
configuration option which switches the place from which the data is read. In
cases when a table is no longer used, it can be dropped in the next release.

Before removing columns and tables, a sanity check should be performed,
ensuring all data was migrated.
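Phases a) and b) above, together with the sanity check, can be sketched with
stdlib ``sqlite3`` as follows (the ``token`` table, the ``extra``/``extra_v2``
columns, and the format strings are all hypothetical, chosen only to make the
dual-write pattern concrete):

```python
import sqlite3

# Row written while only the old release was running (old format only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id INTEGER PRIMARY KEY, extra TEXT)")
conn.execute("INSERT INTO token (extra) VALUES ('legacy-format:1')")

# Phase a): the schema upgrade adds the new column; the old release,
# still running, simply ignores it.
conn.execute("ALTER TABLE token ADD COLUMN extra_v2 TEXT")


def write_token(value):
    # The new release writes both formats while the old release may still
    # be updating rows; reads still come from the old column.
    conn.execute(
        "INSERT INTO token (extra, extra_v2) VALUES (?, ?)",
        (value, value.replace("legacy-format", "new-format")))


write_token("legacy-format:2")

# Phase b): a migration script backfills rows written only by old code.
rows = conn.execute(
    "SELECT id, extra FROM token WHERE extra_v2 IS NULL").fetchall()
for row_id, extra in rows:
    conn.execute(
        "UPDATE token SET extra_v2 = ? WHERE id = ?",
        (extra.replace("legacy-format", "new-format"), row_id))

# Sanity check before the old column may eventually be dropped (phase d):
# every row must carry the new format.
unmigrated = conn.execute(
    "SELECT COUNT(*) FROM token WHERE extra_v2 IS NULL").fetchone()[0]
print(unmigrated)  # 0
```

Only when this count is zero, and no release that reads ``extra`` is still
running, is it safe for a later migration to remove the old column.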


Alternatives
------------

Currently, keystone operators run migrations before running the new version.
We can introduce two-phase migrations like in Neutron [5]_. The approach is to
organize schema migration scripts into "expand" and "contract" phases that are
also linked to major release versions in a backwards compatible way. The
"expand" phase is run before the upgrade, and the "contract" phase can be
executed after all the services are running with the new version.
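A minimal sketch of how such a two-phase layout could be organized (the
function names, DDL statements, and registry structure are illustrative,
not Neutron's actual implementation):

```python
# Additive migrations, safe to run before the upgrade while the old
# release is still serving traffic.
def expand_add_enabled_column(ddl):
    ddl.append("ALTER TABLE user ADD COLUMN enabled INTEGER DEFAULT 1")


# Destructive migrations, run only once every node is on the new release.
def contract_drop_extra_column(ddl):
    ddl.append("ALTER TABLE user DROP COLUMN extra")


EXPAND = [expand_add_enabled_column]
CONTRACT = [contract_drop_extra_column]


def run_phase(migrations):
    """Collect the DDL statements emitted by one phase, in order."""
    ddl = []
    for migration in migrations:
        migration(ddl)
    return ddl


# Operator workflow: run expand, roll the new release across all nodes,
# then run contract.
print(run_phase(EXPAND))
print(run_phase(CONTRACT))
```

The separation itself is what matters: every statement in the expand list
must be additive, so the old release can run against the expanded schema.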

The last two steps (above) could be squashed into one, if two-phase migrations
are implemented. In this case, the last step can be run in the contract phase
of the same release.

A similar approach to two-phase migrations, taken by nova [6]_, was determined
to be experimental and will be removed in Mitaka [7]_.


Security Impact
---------------

The code which performs migrations may become more complicated, since they are
done online. We have to ensure that updates are correctly made to both places,
and read in a correct way, when we want to maintain compatibility.


Notifications Impact
--------------------

None


Other End User Impact
---------------------

None


Performance Impact
------------------

The new way of doing schema changes entails that some data migrations would
have to be done online (for example, we would have to maintain data in two
places before migration is finished), which could impact performance. On the
other hand, performance is impacted far more severely when the service is
down while performing the upgrade.


Other Deployer Impact
---------------------

The ability to perform online schema migration will have a large and positive
impact on deployment.

Developer Impact
----------------

Currently, we don't limit developers' choice of what schema updates are
permissible. This change proposes a unit test and a grenade test that will
limit the changes that can be done in one release - changes will have to be
split between releases. Still, patches could be added to the exceptions list
and a proper release note could be added, notifying the operator about the
need for and scope of downtime.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  xek


Work Items
----------

* Prepare a unit test blocking alters and drops in SQL migrations.
* Prepare developer documentation with examples.
* Help review patches which fall over on this test.
* Add a grenade CI test, with altering requests sent to two keystone instances
  at different versions.


Dependencies
============

None


Documentation Impact
====================

Developer documentation with examples will be added.


References
==========

.. [1] https://etherpad.openstack.org/p/mitaka-crossproject-themes
.. [2] http://permalink.gmane.org/gmane.comp.cloud.openstack.devel/69083
.. [3] https://etherpad.openstack.org/p/TYO-ops-upgrades
.. [4] https://github.com/openstack/nova/blob/stable/liberty/nova/tests/unit/db/test_migrations.py#L224-L225
.. [5] https://blueprints.launchpad.net/neutron/+spec/online-schema-migrations
.. [6] https://blueprints.launchpad.net/nova/+spec/online-schema-changes
.. [7] https://etherpad.openstack.org/p/mitaka-nova-upgrade