Propose scripted online schema migrations

This specification discusses the issue of database schema
migrations which may proceed while allowing both the previous
and the updated version of the Neutron database API to run
against that schema at the same time.  This is part of a
larger approach which is to allow a Neutron application
to be upgraded to a new version without incurring downtime
while the database schema is migrated.

Change-Id: Ib27c6eb8c901536f1f30d0911dedd4db13d76e9f
Mike Bayer
2015-06-17 19:45:13 -04:00
committed by Ihar Hrachyshka
parent 1a5530ddd5
commit d82a92f3c4

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

========================
Online Schema Migrations
========================

https://blueprints.launchpad.net/neutron/+spec/online-schema-migrations

:Author: Mike Bayer <mike.bayer@redhat.com>

This specification discusses the issue of database schema migrations which
may proceed while allowing both the previous and the updated version of the
Neutron database API to run against that schema at the same time. This is
part of a larger approach which is to allow a Neutron application to
be upgraded to a new version without incurring downtime while the database
schema is migrated.

To achieve this goal fully, several areas must be addressed:

* The first is that it must be possible to apply database schema migrations
  that don't impact the old version of the software as it runs. These
  migrations are referred to as **expansive** migrations, which only add new
  elements, never removing any. Once the old software has been entirely
  replaced with the new version, a second series of migrations, known as the
  **contract** migrations, is run separately; these migrations remove the old
  elements that are no longer used.

* The newer version of the software must also accommodate the fact that the
  old version of the software may still be running, meaning it may need to
  read and persist data from both the old and new schema structures
  simultaneously.

* Within the scope of the new software referring to both versions of the
  schema, a strategy for data migrations must be devised. These migrations
  can run over time as a function of the data access code itself slowly
  moving data to the new format, or can run as separate scripts or processes.

* The sequence of movements between expand/contract, old and new software
  versions, and migration of data must also be orchestrated fully.
  Front-end RPC clients and database access services are typically upgraded
  independently of each other, and the point at which "contract" is safe to
  run must also be established.

This blueprint is primarily concerned with only the **first** bullet point,
that of organizing schema migrations such that those which are strictly
"expansive" may be run separately from those which are "contractual".
The other bullets above will need to be considered separately.

Note that an approach to the problem addressed here has already been
accepted for Nova, also called "online schema migrations". The
specification here builds upon Nova's work, proposing essentially the
same concept but implementing it slightly differently: there is no
sharp break from Neutron's existing system of Alembic migration
scripts, and version identifiers, which identify an explicit, known
state of the schema, are not abandoned. There is also a proposal for
upstream changes in Alembic so that both Nova's "live" approach and the
"scripted" approach here can share the same codebase, built on a revised
Alembic autogeneration API that allows much greater extensibility.

Problem Description
===================

Database migrations of Neutron and other OpenStack applications traditionally
involve the replacement of one version of the schema with another:
tables and columns are dropped and new ones are added. This change
necessarily requires that the software which communicates with the schema
change at the same time: the old version is shut down completely before
the migration begins, the migration then proceeds fully offline, and then
the new version is started. In a multiple-node OpenStack deployment, this
means that the entire application on all nodes must be shut down and
upgraded globally, all at once. The offline migration may also be time
consuming, depending on the kinds of operations present and the target
database in use.

The business requirements of many key OpenStack consumers are such that
the downtime involved in fully upgrading all nodes simultaneously, as
well as in running full schema migrations during that downtime, is no
longer acceptable. A new approach that allows the application to keep
running while the migration proceeds must be developed, in particular
for key OpenStack components such as Nova, Neutron, and Cinder.

Proposed Change
===============

Within this document we will address the goal of organizing schema
migrations into "expand" and "contract" phases that are also linked to
major release versions. The phases are as follows:

* Migrations that run under "expand" are "additive" only (i.e. tables,
  columns, indexes and constraints are only created, not dropped), and
  are safe to run while the old version of the application continues to
  run.

* Migrations that run under "contract" are "subtractive" (i.e. tables,
  columns, indexes and constraints are only dropped, not created) and
  are only run once the software on all nodes communicates exclusively
  with the new version of the schema, and all data has been migrated.

The steps involved in each of the two phases will be rendered as
explicit migration directives within Alembic migration scripts, as is
already the case for Neutron. The only difference will be that a
given migration will be broken out into individual scripts for each
phase of operation that the migration includes. These scripts will
be assembled into semi-independent lineages that can be run
separately. These lineages will also be classified by release
version, so that migration lineages will be targetable at the level
of both release and phase, e.g. "expand liberty", "contract M", etc.

The new scheme relies on Alembic's recently added support for
long-lived branches, roots, branch names, and individual file
directories. The workflow can be implemented at a proof-of-concept
level without writing any new code, by creating the new directory
structure and manually assembling new migration files into the
appropriate branches using Alembic's updated command line tools.

However, in order to facilitate the use of Alembic autogenerate, new
features will be added upstream to Alembic's autogenerate API which
will allow for the creation of custom autogenerate behaviors and
filesystem flows. We will build a new tool that adapts Nova's current
online schema migrations logic to this new API, such that the logic
used to group migrations into "expand" and "contract" steps can
stream those instructions into individual migration files, targeted
into the file structure referred to above. It is hoped that this
same tool will also be able to continue sending migration directives
directly to a database, thus allowing Nova's current "live"
approach to be rolled into the same codebase. Improvements and
behavioral contracts for the "expand" / "contract" workflow system
will apply to both the "live" and "scripted" approaches, making
the two approaches that much more interchangeable.

Alembic Migrations
------------------

Right now, Neutron makes use of Alembic migrations, which involve a
series of migration scripts organized into the
``neutron/db/migration/alembic_migrations/versions`` directory. These
scripts are organized into a kind of backwards linked list structure,
where each script is identified by a six-byte hash (twelve hexadecimal
characters, e.g. ``2d2a8a565438``), and contains a variable that links it
to the *previous* hash in the series. The rationale for this linked
structure is that new versions can be inserted into the middle of the
chain without impacting more than one existing migration file; by using a
"backwards" linking model, new versions can be added to the end of the
list without impacting any existing versions.

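
The backwards-linked structure can be illustrated with a small, stdlib-only
Python sketch. It assumes a simplified model in which each script is reduced
to a ``revision`` id and its ``down_revision``. The ids and the helper names
``find_head()`` and ``upgrade_order()`` are hypothetical. Alembic performs
the equivalent walk internally, not through these functions.

.. code-block:: python

    from typing import Dict, List, Optional

    # Hypothetical revision map: revision id -> down_revision.
    # None marks the root, i.e. the first script in the chain.
    REVISIONS: Dict[str, Optional[str]] = {
        "aaaa00000001": None,
        "aaaa00000002": "aaaa00000001",
        "aaaa00000003": "aaaa00000002",
    }


    def find_head(revisions: Dict[str, Optional[str]]) -> str:
        """Return the one revision that no other revision points back to."""
        referenced = {down for down in revisions.values() if down is not None}
        heads = [rev for rev in revisions if rev not in referenced]
        if len(heads) != 1:
            raise ValueError(f"expected exactly one head, found {heads}")
        return heads[0]


    def upgrade_order(revisions: Dict[str, Optional[str]]) -> List[str]:
        """Walk from the head back to the root, then reverse for upgrade order."""
        order: List[str] = []
        current: Optional[str] = find_head(revisions)
        while current is not None:
            order.append(current)
            current = revisions[current]
        return order[::-1]


    print(upgrade_order(REVISIONS))
    # ['aaaa00000001', 'aaaa00000002', 'aaaa00000003']

Adding a new script only requires a new entry whose ``down_revision`` is the
current head. No existing entry changes, which is the property described above.
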
Recent versions of Alembic have been enhanced to treat this
"backwards linked list" structure as just a specialization of a more flexible
structure, the directed acyclic graph, or DAG. In this approach, we
remove the requirement that each migration script can only refer to a single
ancestor (i.e. dependency), as well as the requirement that only one
migration script can refer to a particular ancestor. The structure
becomes open to the concepts of branching and merging which are very
familiar in version control systems. Alembic now has the ability to
run upgrades or downgrades along individual branches which are tracked
individually within a database schema, meaning a schema's "head" version
may in fact be a set of hashes, each representing the "head" of an
individual revision stream. The branches can optionally originate from entirely
independent root revisions with no dependencies on each other, and can
also be organized into individual subdirectories. Revisions within
branches can also refer to specific revisions within other branches as
dependencies, and branches may be merged back together into a single
revision stream.

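
A minimal sketch of the DAG view uses the same simplified model. Here
``down_revision`` may be ``None`` (a root), a single id, or a tuple of ids
(a merge point). The short ids ``k1``, ``e1``, ``c1``, etc. are hypothetical
placeholders for hashes. The helpers ``heads()`` and ``roots()`` are
illustrative only and are not Alembic API.

.. code-block:: python

    from typing import Dict, Set, Tuple, Union

    # down_revision may be None (a root), one id, or a tuple of ids (a merge).
    DownRevision = Union[None, str, Tuple[str, ...]]

    # Hypothetical placeholder ids: an existing linear chain (k1 -> k2) plus
    # independent "expand" (e1 -> e2) and "contract" (c1) roots.
    REVISIONS: Dict[str, DownRevision] = {
        "k1": None,
        "k2": "k1",
        "e1": None,
        "e2": "e1",
        "c1": None,
    }


    def parents(down: DownRevision) -> Tuple[str, ...]:
        """Normalize a down_revision value to a tuple of parent ids."""
        if down is None:
            return ()
        if isinstance(down, str):
            return (down,)
        return tuple(down)


    def roots(revisions: Dict[str, DownRevision]) -> Set[str]:
        """Revisions with no parent at all."""
        return {rev for rev, down in revisions.items() if not parents(down)}


    def heads(revisions: Dict[str, DownRevision]) -> Set[str]:
        """Revisions that no other revision names as a parent."""
        referenced = {p for down in revisions.values() for p in parents(down)}
        return {rev for rev in revisions if rev not in referenced}


    print(sorted(heads(REVISIONS)))  # ['c1', 'e2', 'k2']
    # A merge revision with two parents collapses two heads into one.
    merged = dict(REVISIONS, m1=("e2", "c1"))
    print(sorted(heads(merged)))  # ['k2', 'm1']

With independent "expand" and "contract" roots alongside the existing chain,
the schema has several heads at once. A merge revision that names two parents
collapses them back into one.
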
Expand and Contract Scripts
---------------------------

In the current design, a migration script indicates a specific "version"
of the schema and includes directives that apply all necessary changes to
the database at once. If we look, for example, at the script
``2d2a8a565438_hierarchical_binding.py``, we will see::

    # .../alembic_migrations/versions/2d2a8a565438_hierarchical_binding.py
    from alembic import op
    import sqlalchemy as sa


    def upgrade():
        # .. inspection code (determines port_binding_tables, fk_name_dvr) ...

        op.create_table(
            'ml2_port_binding_levels',
            sa.Column('port_id', sa.String(length=36), nullable=False),
            sa.Column('host', sa.String(length=255), nullable=False),
            # ... more columns ...
        )

        for table in port_binding_tables:
            op.execute((
                "INSERT INTO ml2_port_binding_levels "
                "SELECT port_id, host, 0 AS level, driver, segment AS segment_id "
                "FROM %s "
                "WHERE host <> '' "
                "AND driver <> '';"
            ) % table)

        op.drop_constraint(fk_name_dvr[0], 'ml2_dvr_port_bindings', 'foreignkey')
        op.drop_column('ml2_dvr_port_bindings', 'cap_port_filter')
        op.drop_column('ml2_dvr_port_bindings', 'segment')
        op.drop_column('ml2_dvr_port_bindings', 'driver')

        # ... more DROP instructions ...

The above script contains directives in both the "expand" and "contract"
categories, as well as some data migrations. The ``op.create_table``
directive is an "expand"; it may be run safely while the old version of the
application still runs, as the old code simply doesn't look for this table.
The ``op.drop_constraint`` and ``op.drop_column`` directives are
"contract" directives (the drop column more so than the drop constraint);
running at least the ``op.drop_column`` directives means that the old version
of the application will fail, as it will attempt to access columns which no
longer exist.

The data migrations in this script add new rows to the newly added
``ml2_port_binding_levels`` table. Data migrations may or may not be "safe"
to run within the "expand" or "contract" phase, depending on the nature of
the data. It is expected that most data migrations will run outside of
migration scripts going forward, and instead be implemented as part of the
model/API layer as the application runs.

Note that this spec suggests, but does not require, that Neutron move to
live data migrations implemented in the application instead of migration
scripts. That change will require separate consideration and is out of
scope for this spec.

Under the proposed plan, the above script, assuming it were added as part
of the new architecture, would be stated as two scripts, an "expand" and a
"contract" script::

    # expansion operations
    # .../alembic_migrations/versions/liberty/expand/2bde560fc638_hierarchical_binding.py
    from alembic import op
    import sqlalchemy as sa


    def upgrade():
        op.create_table(
            'ml2_port_binding_levels',
            sa.Column('port_id', sa.String(length=36), nullable=False),
            sa.Column('host', sa.String(length=255), nullable=False),
            # ... more columns ...
        )


    # contraction operations
    # .../alembic_migrations/versions/liberty/contract/4405aedc050e_hierarchical_binding.py
    from alembic import op


    def upgrade():
        # .. inspection code (determines fk_name_dvr) ...

        op.drop_constraint(fk_name_dvr[0], 'ml2_dvr_port_bindings', 'foreignkey')
        op.drop_column('ml2_dvr_port_bindings', 'cap_port_filter')
        op.drop_column('ml2_dvr_port_bindings', 'segment')
        op.drop_column('ml2_dvr_port_bindings', 'driver')
        # ... more DROP instructions ...

The two scripts would be present in different subdirectories and would also
be part of entirely separate versioning streams, as discussed in the
"New Migration Layout" section below. The "expand" operations are in the
"expand" script, and the "contract" operations are in the "contract"
script.

The data migrations are removed, as these are expected to
generally no longer occur within schema migrations. However,
the approach remains compatible with allowing "safe" data migrations to be
manually placed within the expand or contract scripts if deemed appropriate
in some cases.

For the time being, until live data migration is accepted in Neutron, data
migrations belong in one of the script subtrees.

New Migration Layout
--------------------

With Alembic's new capabilities, we can propose a new structure for
Neutron's migration files that is compatible with "expand" / "contract"
while remaining compatible with the existing stream
of Alembic migration files in Neutron. A new directory/branch
structure will be laid out which makes all versions/streams apparent::

    neutron/db/migration/alembic_migrations/...
        ...versions/
            <existing version>.py
            <existing version>.py
            <existing version>.py
            ...

        versions/liberty/
        versions/liberty/expand/
            <expansion script>.py
            <expansion script>.py
            ...
        versions/liberty/contract/
            <contract script>.py
            <contract script>.py
            ...

        versions/M_release/
        versions/M_release/expand/
        versions/M_release/contract/

        ... etc

Above, the existing ``versions/`` directory with all of its current migration
scripts remains intact; these versions are still the scripts that take a
Neutron database up through at least Kilo. Following those, a new
series of subdirectories is added, organized by major OpenStack release,
and within each subdirectory, the "expand" and "contract" series of scripts
are themselves separate.

The series of scripts within ``expand/`` and ``contract/`` each originate
from an independent "root"; that is, the "down" revision of the
bottommost script in each directory is ``None``.
The production of these scripts is supported by the ``alembic revision``
command, which now includes options to place files in specific directories
and to specify what the "down revision" of a given revision is to be,
including that it may be a "root", thus allowing the creation of new
branches and roots.

Cross-Branch Dependencies
--------------------------

To accommodate the fact that the scripts in ``liberty/expand/``
can't be run until all the old scripts in ``versions/`` have run, and
that individual scripts in ``liberty/contract/`` can't run until
their corresponding "expand" scripts have run, Alembic's cross-branch
dependency feature will be used. From a DAG point of view, this is the same
as a script declaring another one as a dependency, but from Alembic's
perspective the dependency is not considered to be any kind of "down
revision"; it is only a script whose version must already be present in the
target schema before the current one can be run. Dependencies are indicated
in Alembic scripts as a separate directive::

    # revision identifiers, used by Alembic.
    revision = '2a95102259be'
    down_revision = '29f859a13ea'
    branch_labels = None
    depends_on = ('55af2cb1c267', '4fcb78af5a01')

By establishing ``depends_on``, a particular script indicates which migration
scripts in other branches need to be run before it can be.
When we instruct Alembic to invoke this migration, it will ensure that
all dependency scripts are run first. It is expected that
the automated script creation tool will be able to build out these
directives automatically.

Alembic branch dependencies are discussed in the Alembic documentation
referred to in the References section.

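
The effect of ``depends_on`` on ordering can be sketched as a depth-first
traversal that treats both ``down_revision`` and ``depends_on`` entries as
prerequisites. This is a hypothetical model, not Alembic's implementation.
The revision names are placeholders and ``upgrade_plan()`` is an illustrative
helper. The sketch also ignores Alembic's separate tracking of per-branch
heads.

.. code-block:: python

    from typing import Dict, List, Set, Tuple

    # Hypothetical scripts: revision -> (down_revisions, depends_on).
    SCRIPTS: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
        "kilo_1": ((), ()),
        "kilo_2": (("kilo_1",), ()),
        # The expand root has no down_revision but depends on the old chain.
        "expand_1": ((), ("kilo_2",)),
        # The contract script may only run after its expand counterpart.
        "contract_1": ((), ("expand_1",)),
    }


    def upgrade_plan(target: str, applied: Set[str]) -> List[str]:
        """Return the revisions to run, in order, so that `target` is applied.

        Both down_revisions and depends_on entries count as prerequisites;
        revisions already in `applied` are skipped.
        """
        plan: List[str] = []
        visiting: Set[str] = set()

        def visit(rev: str) -> None:
            if rev in applied or rev in plan:
                return
            if rev in visiting:
                raise ValueError(f"dependency cycle at {rev}")
            visiting.add(rev)
            downs, deps = SCRIPTS[rev]
            for prerequisite in downs + deps:
                visit(prerequisite)
            visiting.discard(rev)
            plan.append(rev)

        visit(target)
        return plan


    print(upgrade_plan("contract_1", applied={"kilo_1"}))
    # ['kilo_2', 'expand_1', 'contract_1']

Requesting only the contract script still pulls in the remaining old-chain
script and the matching expand script first, which is the guarantee described
above.
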
Branch Labels
-------------

Alembic also now provides for "branch labels", meaning that in addition
to having our migration files in different directories, versioned across
independent branches with independent roots, we can also apply one or
more "labels" to a branch as a whole, which is then addressable using Alembic's
command line tools.

Whereas in Neutron today we can see some migration scripts that are intentionally
named ``juno_release.py`` and ``kilo_release.py``, we can apply these names
to the branches as a whole. Like branch dependencies, these are also
indicated as directives within migration scripts; however, the branch
label need only be present in *any* single revision script within the
branch. Typically, the first script within the branch is a good
choice for placing a label::

    # revision identifiers, used by Alembic.
    revision = '2a95102259be'
    down_revision = None  # because we are a "root"
    branch_labels = ('liberty_expand', 'release_expand')
    depends_on = '55af2cb1c267'

So above, we would apply names such as ``"liberty_expand"`` and
``"liberty_contract"`` to the ``liberty/expand`` and ``liberty/contract``
branches, respectively. This allows Alembic commands to be run which refer
to the branch as a whole, such as::

    alembic upgrade liberty_expand@head

This runs all migrations up to the head of the ``liberty_expand`` branch
(including dependent versions from the old series of migration files first,
if not already run). This will allow Neutron's command suite
to accommodate specific target points within the new versioning scheme
without the need to become aware of specific revisions.

If labels that are agnostic of "release" are desired, such as a label
that indicates "run all the expand steps up to the current release", we
can add additional "latest release" labels that move to new branches
as new releases are established.

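
A ``label@head`` target might be resolved as sketched below, assuming purely
linear branches. A label found on any script in a branch applies to the whole
branch, and the target resolves to that branch's head. The revision ids and
the ``resolve()`` helper are hypothetical. Alembic's real resolution also
handles branch points and merges.

.. code-block:: python

    from typing import Dict, Optional, Set, Tuple

    # Hypothetical scripts: revision -> (down_revision, branch_labels).
    # Labels sit on each branch's root script, as suggested above.
    SCRIPTS: Dict[str, Tuple[Optional[str], Tuple[str, ...]]] = {
        "e1": (None, ("liberty_expand",)),
        "e2": ("e1", ()),
        "e3": ("e2", ()),
        "c1": (None, ("liberty_contract",)),
        "c2": ("c1", ()),
    }


    def branch_labels(head: str) -> Set[str]:
        """Collect every label found on the chain from `head` down to its root."""
        labels: Set[str] = set()
        rev: Optional[str] = head
        while rev is not None:
            down, rev_labels = SCRIPTS[rev]
            labels.update(rev_labels)
            rev = down
        return labels


    def resolve(target: str) -> str:
        """Resolve a '<label>@head' target to a concrete revision id."""
        label, _, symbol = target.partition("@")
        if symbol != "head":
            raise ValueError("this sketch only understands '<label>@head'")
        referenced = {down for down, _ in SCRIPTS.values() if down is not None}
        heads = [rev for rev in SCRIPTS if rev not in referenced]
        matches = [head for head in heads if label in branch_labels(head)]
        if len(matches) != 1:
            raise ValueError(f"label {label!r} matched heads {matches}")
        return matches[0]


    print(resolve("liberty_expand@head"))    # e3
    print(resolve("liberty_contract@head"))  # c2

A release-agnostic label such as ``release_expand`` would resolve the same
way. Moving it to a newer release's branch only changes which root script
carries it.
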
New Neutron DB Commands
-----------------------

Right now, Neutron allows database upgrades by running the
``neutron-db-manage`` script, which links into Alembic's own ``upgrade``
command. This script will be enhanced to allow for running individual
migration streams by taking advantage of new argument forms that are part
of Alembic. The ``alembic upgrade`` command will still be used underneath,
but new commands such as ``neutron-db-manage expand`` or
``neutron-db-manage contract`` will pass it the branch labels appropriate
to the target operation.

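
One possible shape for the new commands is sketched below with ``argparse``.
Each subcommand is translated into a ``<release>_<phase>@head`` target, which
would then be handed to ``alembic.command.upgrade()``. The ``--release``
option, the default release, and the label naming scheme are assumptions
drawn from the examples in this document. They are not the final
``neutron-db-manage`` interface.

.. code-block:: python

    import argparse
    from typing import List

    # Assumed default; a real tool would derive this from the code base.
    CURRENT_RELEASE = "liberty"


    def build_parser() -> argparse.ArgumentParser:
        """Hypothetical 'expand' / 'contract' subcommands for neutron-db-manage."""
        parser = argparse.ArgumentParser(prog="neutron-db-manage")
        subcommands = parser.add_subparsers(dest="command")
        subcommands.required = True
        for phase in ("expand", "contract"):
            sub = subcommands.add_parser(phase, help=f"apply all {phase} migrations")
            # Hypothetical option for targeting an earlier release's branch.
            sub.add_argument("--release", default=CURRENT_RELEASE)
        return parser


    def alembic_target(argv: List[str]) -> str:
        """Translate a command line into a '<release>_<phase>@head' target."""
        args = build_parser().parse_args(argv)
        return f"{args.release}_{args.command}@head"


    print(alembic_target(["expand"]))                      # liberty_expand@head
    print(alembic_target(["contract", "--release", "M"]))  # M_contract@head

Keeping the translation to a single string target means the command never
needs to know individual revision ids, in line with the goal stated under
Branch Labels.
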
Automation of Scripting
-----------------------

The previous sections make the entire "expand" / "contract"
workflow possible, in such a way that the workflow of Neutron's existing
Alembic versioning scripts is maintained without any backwards
incompatibility. However, new migration scripts could at first only be
added by manually targeting each portion of the workflow individually.
Instead, we can enhance Neutron's use of Alembic "autogenerate" such
that a single autogenerate revision step can produce multiple files as
needed; a migration that includes both "expand" and "contract"
directives would generate two separate scripts.

Right now, Nova's online schema migrations (OSM) make use of Alembic
autogenerate in order to derive information about how a target database
differs from the model established in code. OSM uses the public function
``compare_metadata()`` to achieve this; ``compare_metadata()`` returns a
simple list of "diffs" which refer to changes in schema objects like
tables, columns, and constraints. Nova OSM then maps these diffs to
"operational" objects such as ``AddTable``, ``DropColumn``,
``AddConstraint``, etc. These "operational" objects then link back into
Alembic's API, associated with corresponding "operation" constructs in
Alembic such as ``op.create_table()``, ``op.drop_column()``,
``op.add_constraint()``.

Nova's OSM is basically consuming an Alembic "autogenerate diff" stream
and streaming it into an Alembic "run operations" stream. It follows that
Alembic can provide infrastructure such that an autogenerate
diff stream can be supplied directly as a migration operation stream.

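
The partitioning step can be sketched over a diff stream shaped loosely like
the output of ``compare_metadata()``. In this sketch each entry is a tuple
whose first element names the kind of change, e.g. ``"add_table"`` or
``"remove_column"``. Plain strings stand in for SQLAlchemy objects. The kind
names in the two sets are an assumption to be checked against the Alembic
version in use. Anything not recognized lands in a third bucket for manual
review.

.. code-block:: python

    from typing import Any, Iterable, List, Tuple

    Diff = Any  # one entry of the diff stream

    # Change kinds treated as additive or subtractive; the names follow the
    # style of Alembic's autogenerate diffs but are an assumption of this sketch.
    EXPAND_KINDS = {"add_table", "add_column", "add_index", "add_constraint", "add_fk"}
    CONTRACT_KINDS = {"remove_table", "remove_column", "remove_index",
                      "remove_constraint", "remove_fk"}


    def partition(diffs: Iterable[Diff]) -> Tuple[List[Diff], List[Diff], List[Diff]]:
        """Split a diff stream into expand, contract and needs-review streams,
        preserving the original order within each stream."""
        expand: List[Diff] = []
        contract: List[Diff] = []
        review: List[Diff] = []
        for diff in diffs:
            # Entries whose first element is not a string (e.g. grouped
            # modifications) are not classified and go to review.
            kind = diff[0] if isinstance(diff[0], str) else None
            if kind in EXPAND_KINDS:
                expand.append(diff)
            elif kind in CONTRACT_KINDS:
                contract.append(diff)
            else:
                review.append(diff)
        return expand, contract, review


    # Placeholder stream based on the hierarchical binding example above.
    stream = [
        ("add_table", "ml2_port_binding_levels"),
        ("remove_fk", "ml2_dvr_port_bindings", "<fk name>"),
        ("remove_column", "ml2_dvr_port_bindings", "segment"),
        ("modify_type", "ml2_dvr_port_bindings", "host"),
    ]
    expand, contract, review = partition(stream)
    print([d[0] for d in expand])    # ['add_table']
    print([d[0] for d in contract])  # ['remove_fk', 'remove_column']
    print([d[0] for d in review])    # ['modify_type']

An explicit review bucket avoids forcing every change into one phase. Type and
nullability changes, which are neither purely additive nor purely subtractive,
are left to the developer.
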
Both the "live" OSM approach of Nova and the "scripted" approach
proposed here can consume this same operation stream, partition it
based on operation type into "expand" and "contract" streams, and then
direct those streams either to a live database context for "live" migrations
*or* to a series of migration scripts for scripted migrations.

The ``alembic revision`` command will also be opened up such that a plugin
may establish an open-ended series of revision scripts generated from
portions of these operational streams. The end result will be that
a single call to ``alembic revision --autogenerate``, as performed by
Neutron developers today, will generate separate "expand" and "contract"
scripts directly.

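
Rendering the two partitioned streams into separate files could look like the
following sketch. The template is a hypothetical simplification of an Alembic
revision file, the revision ids are placeholders, and ``render_script()`` is
illustrative only. In the sketch, the "contract" script declares a
``depends_on`` on its "expand" counterpart, as described under Cross-Branch
Dependencies.

.. code-block:: python

    from typing import Iterable, Optional, Sequence

    # Hypothetical, simplified revision file template (not Alembic's own).
    TEMPLATE = '''"""{message}"""

    # revision identifiers, used by Alembic.
    revision = {revision!r}
    down_revision = {down_revision!r}
    branch_labels = None
    depends_on = {depends_on!r}

    from alembic import op
    import sqlalchemy as sa


    def upgrade():
    {body}
    '''


    def render_script(message: str, revision: str, down_revision: Optional[str],
                      depends_on: Optional[Sequence[str]], ops: Iterable[str]) -> str:
        """Render one migration file from a list of op.* directive strings."""
        body = ["    " + directive for directive in ops] or ["    pass"]
        return TEMPLATE.format(
            message=message,
            revision=revision,
            down_revision=down_revision,
            depends_on=tuple(depends_on) if depends_on else None,
            body="\n".join(body),
        )


    # Placeholder revision ids; the expand script is an independent root and
    # the contract script declares a cross-branch dependency on it.
    expand_src = render_script(
        "hierarchical binding (expand)", "e0000000000a", None, None,
        ["op.create_table('ml2_port_binding_levels')  # columns elided"],
    )
    contract_src = render_script(
        "hierarchical binding (contract)", "c0000000000b", None, ["e0000000000a"],
        ["op.drop_column('ml2_dvr_port_bindings', 'segment')"],
    )
    print(contract_src)

Because the output is plain Python source, a generator like this can be
checked cheaply before any file is written. For example, the built-in
``compile()`` can confirm it is syntactically valid without importing Alembic.
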
These new APIs are already underway in upstream development branches and are
tracked by separate Alembic issues (see References). By closely linking the
implementations for "live" migrations and "scripted"
migrations, it is hoped that the majority of ongoing effort within OSM
can contribute to both approaches simultaneously, thus reducing the risk
that work is wasted if one or the other approach is abandoned,
or that improvements and workflows begin to diverge if both approaches
remain in active use.

Data Model Impact
-----------------

The expand/contract workflow itself has no direct impact on the data model.
The other aspects of online schema migrations, namely support for multiple
versions of a schema simultaneously as well as moving data between those
structures at runtime, have an enormous impact; however, those are outside
the scope of this document.

REST API Impact
---------------

None.

Security Impact
---------------

None.

Other End User Impact
---------------------

End users will be using a modified workflow when schema upgrades
are performed, running "expand" and "contract" steps separately and
at the appropriate time.

In terms of backup procedures, there is no difference. A database always
represents some specific set of head revisions (the only difference
between the proposed feature and the current state is that the set
may have more than one element).

If and when we adopt live data migration in Neutron, it won't change much
in terms of database backups either. The only significant difference is
that the same logical object could then be represented by multiple
versions of database rows, depending on whether migration for the
object is complete. Either way, backups would still work as usual.

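
The backup point can be made concrete. Alembic records applied revisions in a
version table, by default ``alembic_version`` with a ``version_num`` column,
and with independent branches it holds one row per head. The sketch below uses
an in-memory SQLite database as a stand-in for the real backend. It takes
revision ids from this document's examples as placeholders. Copying the
database copies the version rows along with everything else.

.. code-block:: python

    import sqlite3


    def current_heads(db: sqlite3.Connection) -> list:
        """Return the recorded head revisions, sorted for stable output."""
        rows = db.execute("SELECT version_num FROM alembic_version")
        return sorted(row[0] for row in rows)


    # In-memory stand-in for a Neutron database.
    conn = sqlite3.connect(":memory:")
    # Alembic's default version table: one row per current head.
    conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
    # Placeholder heads of an "expand" and a "contract" branch.
    conn.executemany(
        "INSERT INTO alembic_version (version_num) VALUES (?)",
        [("2a95102259be",), ("4405aedc050e",)],
    )
    conn.commit()

    # A full copy of the database carries every head row with it.
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)

    print(current_heads(conn))    # ['2a95102259be', '4405aedc050e']
    print(current_heads(backup))  # ['2a95102259be', '4405aedc050e']
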
Performance Impact
------------------

None.

Developer Impact
----------------

Developers should continue to use ``alembic revision --autogenerate``
in order to create new migration scripts. This operation will create
multiple scripts, so to the degree that developers need to manually
tune these scripts, they'll be dealing with more than one script. Since
data migrations will generally no longer be within these scripts, and
since we now have the ability to render custom directives via
autogenerate, it is hoped that nearly anything Nova's "live" OSM
approach can automate can also be fully automated within the
"scripted" approach.

Alternatives
------------

Live migrations were originally proposed as a replacement for
SQLAlchemy-Migrate, which has the additional issues of a very rigid
and unworkable numbering scheme as well as verbose migration scripts
that rely heavily on full table declarations and reflection. These
are not issues for projects that already use Alembic, as Alembic was
designed to solve these problems among others.

Live migration also offers the advantages that no script
at all needs to be generated or committed into the source repository,
and that, because there is literally no way to alter how
migrations will proceed on a per-change basis, developers are not
given any opportunity to inadvertently produce a migration that is
non-expansive or to inappropriately write statements in a migration
that perform a data migration. This is described
as allowing a "purely declarative" approach to migrations, where the
model code is all that's needed to indicate how to get to the new
schema.

However, this advantage only holds under limited circumstances. While
simple cases are easily automated by both approaches, the issue of
accommodating special cases, unsupported features, and variability
in support and/or reliability on various backends is not addressed
by the "live" approach. In such cases, live migrations offer only two
options: the upstream migration system must be modified to accommodate the
target case, or the application must be modified to no longer require
such a schema migration. Special cases include changes to complex
types like ENUMs and precision numerics, operations involving CHECK
constraints and some kinds of server defaults, special constructs such
as indexes that use vendor-specific extensions, and even simple
things like changes of table or column name that can't be
distinguished from an add/drop of two separate objects.

The availability and reliability of the reflection and autogenerate
features on backends is not necessarily consistent, nor can the
Alembic / SQLAlchemy projects make any such guarantee. In particular,
less common backends such as IBM DB2, whose dialect is published
independently of SQLAlchemy and Alembic, may not support some
operations correctly at all, and it is not currently known to what
extent autogenerate and reflection produce accurate results,
especially in thorny cases such as indexes, unique constraints, and
column types. Alembic's autogenerate feature was not intended to be
used in the way that live migrations use it, and it's a risky assumption
that it will produce correct results in all cases on
thousands of production systems.

While the "scripted" OSM approach maintains reliance upon explicit migration
scripts that must be checked in and occasionally edited, the advantage of this
approach is that the sequence of migration steps to be run is produced
just once, up front, in a controlled environment. Unusual migrations against
special types or other constructs are again a non-issue, as they can be
scripted explicitly as needed. These steps can then be
carefully reviewed and tested by developers and then shipped, with no
risk of them doing something entirely different when run against
a backend of a lesser-used vendor or with unusual configurations. Scripted
migrations also maintain the advantage that operator-maintained schema
structures are unaffected; the "live" approach documents that operators
would need to re-create their own schema structures after a "contract" is run.

To ensure that scripted migrations stay appropriately
expansive / contractual and free of inappropriate data migrations in
the face of developer intervention, we should require that developers
use the migration scripts autogenerated for them as-is, and that they
don't modify these scripts except to support operations that aren't
supported as automatable migrations. Schema migrations should be tested
as part of the CI process, including testing the "online upgrades"
against previous API versions, and migration scripts of course go through
the usual Gerrit code review process; identifying migrations that are
non-expansive or are data migrations is not difficult.

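
The following hypothetical check illustrates how such identification could be
automated in CI. It parses a migration script with the standard ``ast`` module,
without importing Alembic. It flags ``op.drop_*`` calls as non-expansive and
``op.execute`` / ``op.bulk_insert`` calls as possible data migrations. It is a
sketch only: raw SQL passed to ``op.execute`` is not inspected, and the rules
are assumptions rather than an agreed policy.

.. code-block:: python

    import ast
    from typing import List, Tuple

    # Hypothetical review rules for an "expand" script.
    DATA_MIGRATION_CALLS = {"execute", "bulk_insert"}


    def review_findings(source: str) -> List[Tuple[int, str]]:
        """Return (line, message) pairs for op.* calls that need a closer look."""
        findings = []
        for node in ast.walk(ast.parse(source)):
            if not (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "op"
            ):
                continue
            name = node.func.attr
            if name.startswith("drop_"):
                findings.append((node.lineno, f"op.{name} is not expansive"))
            elif name in DATA_MIGRATION_CALLS:
                findings.append((node.lineno, f"op.{name} may be a data migration"))
        return sorted(findings)


    SCRIPT = '''
    def upgrade():
        op.create_table('ml2_port_binding_levels')
        op.execute("INSERT INTO ml2_port_binding_levels SELECT 1")
        op.drop_column('ml2_dvr_port_bindings', 'segment')
    '''

    for lineno, message in review_findings(SCRIPT):
        print(lineno, message)
    # 4 op.execute may be a data migration
    # 5 op.drop_column is not expansive
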
Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Mike Bayer

Other contributors:
  Ann Kamyshnikova
  Henry Gessau
  Ihar Hrachyshka

Work Items
----------

* Update ``neutron-db-manage`` to support upgrades for multiple heads.
* Update ``neutron-db-manage revision`` to generate multiple scripts.
* Get Alembic updated to include generation of an update stream.
* Use the Alembic autogenerate feature to filter operations into subtrees.
* Implement new expand/contract ``neutron-db-manage`` commands.
* Document the change in the upgrade flow in user and developer docs.
* Introduce testing for expand-only schema upgrades.

Dependencies
============

Upstream changes to Alembic for the autogenerate integration aspect.

Testing
=======

Functional tests should verify that, after the "expand" migrations are run,
the previous version of the API still works fully against the expanded
schema.

Documentation Impact
====================

User Documentation
------------------

The expand/contract workflow will need to be documented.

Developer Documentation
-----------------------

Usage of autogenerate along with the expand/contract workflow can be
documented.

References
==========

.. [#] Alembic's Branching Model
   http://alembic.readthedocs.org/en/latest/branches.html

.. [#] Online Schema Migrations in Nova
   http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/online-schema-changes.html

.. [#] Nova's Overall Online Upgrade approach
   http://docs.openstack.org/developer/nova/devref/upgrade.html

.. [#] Operations as Objects
   https://bitbucket.org/zzzeek/alembic/issue/302/operations-as-objects

.. [#] Extensible Revision / Autogenerate strategies
   https://bitbucket.org/zzzeek/alembic/issue/301/extensible-revision-autogenerate

.. [#] Neutron patch to rearrange migration directory into subtrees
   https://review.openstack.org/194198