This review syncs the state of launchpad with the spec repo for things implemented up to liberty-3. There are no changes to the specs, just things being moved around. Change-Id: I930d33532b268b6e933c8be06a0569c20fd09586
217 lines
7.2 KiB
ReStructuredText
217 lines
7.2 KiB
ReStructuredText
..
|
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
|
License.
|
|
|
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
|
|
|
============================================
|
|
Add a service version number to the database
|
|
============================================
|
|
|
|
https://blueprints.launchpad.net/nova/+spec/service-version-number
|
|
|
|
We have previously identified online data migrations as critical for
|
|
supporting live upgrades. We do well with that today, as long as
|
|
control services are upgraded together (conductor, api, etc). In order
|
|
to extend this further, we need a way to determine that some of the
|
|
control services have been upgraded, but not all. This information
|
|
will allow us to avoid upgrading data online until all of the services
|
|
are upgraded to the point at which they can handle the new schema.
|
|
|
|
Problem description
|
|
===================
|
|
|
|
Partially upgrading control services will result in newer conductors
|
|
beginning to convert data to expand into new schema before older
|
|
conductors and other control services are ready. If you upgrade all
|
|
your control services together, this works well, but if you don't (as
|
|
would be more realistic) you have the potential to break some of the
|
|
older control services.
|
|
|
|
For example, in Kilo we can apply the database schema before starting
|
|
to upgrade any of the code. However, once we upgrade any one component
|
|
that talks to the database directly, online migrations will begin to
|
|
happen, and any other nodes that read from the database directly will
|
|
be confused as things are starting to move. Since services like
|
|
nova-api, nova-scheduler, nova-conductor, etc all talk directly to the
|
|
database, we're currently unable to upgrade these services
|
|
independently. If we had this service version number available, we
|
|
could avoid doing any online migrations until all the affected
|
|
services are upgraded to a new-enough point.
|
|
|
|
Use Cases
|
|
----------
|
|
|
|
As a deployer, I want to be able to stage my upgrades of control
|
|
services, without having to take down *all* of my conductor, api,
|
|
scheduler, etc nodes and restart them together spanning a live data
|
|
migration.
|
|
|
|
Project Priority
|
|
-----------------
|
|
|
|
This is an upgrades enhancement.
|
|
|
|
Proposed change
|
|
===============
|
|
|
|
The proposed change is adding a version number column to the services
|
|
table. Each service will report its version number when it updates its
|
|
service record. We will be able to determine if all services are on
|
|
the same level of code by checking to see if there is more than one
|
|
version in the table (optionally grouped by service). We can use this
|
|
information to conditionally enable online data migrations. For
|
|
example, we can check to see if all the conductors are upgraded to the
|
|
same level by doing something like this:
|
|
|
|
SELECT DISTINCT version FROM services WHERE binary='conductor';
|
|
|
|
If we do that at conductor startup, we can set a flag to not enable
|
|
migrations that require conductor services to be newer than a specific
|
|
version. We could also refresh this on SIGHUP like we currently do for
|
|
config reload.
|
|
|
|
Initially, the column will be created with a default of NULL, and the
|
|
Service object will treat NULL as "version 1". Subsequent changes will
|
|
set versions to 2. Any time we do an online data migration, we will
|
|
need to bump the service version number.
|
|
|
|
Longer-term we could try to tie RPC versions to this information to
|
|
make pinning easier. Because that has potentially hard-to-quantify
|
|
implications on backports (which occasionally do need to modify RPC
|
|
interfaces), I propose we leave that out of the scope of this work for
|
|
now. Even still, tying RPC versions to this service version would be
|
|
work for the next cycle once this is in place and we can depend on
|
|
it.
|
|
|
|
The other thing that this enables is the ability to have a service
|
|
start up and know that it is massively out of date. Presumably we
|
|
should be able to have a conductor start up and say "wow, I'm much
|
|
older than the other conductors, I should log an error and exit."
|
|
Determining what the minimum version is and should be is something we
|
|
would do in M when we have this for the N and N-1 releases such that
|
|
we can depend on it.
|
|
|
|
We need to do this in L so that we can leverage it in M. If we delay
|
|
this until M, we won't be able to rely on it until N.
|
|
|
|
Why not use semver?
|
|
-------------------
|
|
|
|
In things like our RPC API versions, we use semver-like version
|
|
numbers. This allows us to make decisions about incoming calls and
|
|
whether they're compatible with a newer node, and generally define
|
|
rules for what is a breaking (i.e. major bump) change.
|
|
|
|
This service version number doesn't imply any semantics itself, but
|
|
rather just provides a vector with which we can orient ourselves in
|
|
time to make other decisions. As described elsewhere in this spec,
|
|
that may mean that we can decide what RPC version to use, or whether
|
|
it is safe to start doing online data migrations. *Those* decisions
|
|
extract semantic meaning out of the service version vector, and they
|
|
may have significantly different rules (as would certainly be the case
|
|
with the aforementioned RPC and DB decisions).
|
|
|
|
Alternatives
|
|
------------
|
|
|
|
We can continue to do what we do today, which is start converting data
|
|
from one schema to the other as soon as we roll new code. We keep the
|
|
restriction of all control services being required to update together.
|
|
|
|
We could also codify this in config somehow, but that will require
|
|
much more operator intervention, and increases the likelihood of error.
|
|
|
|
Data model impact
|
|
-----------------
|
|
|
|
A schema migration will be written to add a version column to the
|
|
table. The version will be an integer, nullable, default to NULL.
|
|
|
|
The Service object will be extended to support this version, and will
|
|
treat a NULL version as version "1", which will avoid us having to do
|
|
any data migrations on existing service records. New saves will
|
|
initially write version "2" for all services.
|
|
|
|
REST API impact
|
|
---------------
|
|
|
|
None.
|
|
|
|
Security impact
|
|
---------------
|
|
|
|
None.
|
|
|
|
Notifications impact
|
|
--------------------
|
|
|
|
None.
|
|
|
|
Other end user impact
|
|
---------------------
|
|
|
|
None.
|
|
|
|
Performance Impact
|
|
------------------
|
|
|
|
None.
|
|
|
|
Other deployer impact
|
|
---------------------
|
|
|
|
This will make upgrading nova services easier and more
|
|
flexible/forgiving for deployers.
|
|
|
|
Developer impact
|
|
----------------
|
|
|
|
Developers (and reviewers) will need to ensure that the service
|
|
version number is bumped across any online data migration that we do
|
|
(like the recent flavor restructuring).
|
|
|
|
Implementation
|
|
==============
|
|
|
|
Assignee(s)
|
|
-----------
|
|
|
|
Primary assignee:
|
|
danms
|
|
|
|
Other contributors:
|
|
alaski
|
|
|
|
Work Items
|
|
----------
|
|
|
|
* Write the schema migration
|
|
* Update the sqlalchemy models and service object
|
|
* Write some object methods to help with querying service version
|
|
numbers in ways that will be friendly for determining upgrade
|
|
feasibility.
|
|
* Extend the service startup code to check version spread and persist
|
|
so that we can use that as a static flag for enabling migrations.
|
|
|
|
Dependencies
|
|
============
|
|
|
|
None. This is mostly early setup for being able to do more interesting
|
|
upgrade things in M.
|
|
|
|
Testing
|
|
=======
|
|
|
|
This should be fully testable with unit tests.
|
|
|
|
Documentation Impact
|
|
====================
|
|
|
|
Ideally, this should make upgrades require less documentation.
|
|
|
|
References
|
|
==========
|
|
|
|
* http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/flavor-from-sysmeta-to-blob.html
|