Merge "Add service-version-number spec"

2015-07-10 14:56:27 +00:00 · 2015-07-10 14:56:27 +00:00 · 3dd6b62ff4
parent 9ddf630100 4a50f6a47c
commit 3dd6b62ff4
1 changed files with 216 additions and 0 deletions
--- a/specs/liberty/approved/service-version-number.rst
+++ b/specs/liberty/approved/service-version-number.rst
@ -0,0 +1,216 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+============================================
+Add a service version number to the database
+============================================
+
+https://blueprints.launchpad.net/nova/+spec/service-version-number
+
+We have previously identified online data migrations as critical for
+supporting live upgrades. We do well with that today, as long as
+control services are upgraded together (conductor, api, etc). In order
+to extend this further, we need a way to determine that some of the
+control services have been upgraded, but not all. This information
+will allow us to avoid upgrading data online until all of the services
+are upgraded to the point at which they can handle the new schema.
+
+Problem description
+===================
+
+Partially upgrading control services will result in newer conductors
+beginning to convert data to expand into new schema before older
+conductors and other control services are ready. If you upgrade all
+your control services together, this works well, but if you don't (as
+would be more realistic) you have the potential to break some of the
+older control services.
+
+For example, in Kilo we can apply the database schema before starting
+to upgrade any of the code. However, once we upgrade any one component
+that talks to the database directly, online migrations will begin to
+happen, and any other nodes that read from the database directly will
+be confused as things are starting to move.  Since services like
+nova-api, nova-scheduler, nova-conductor, etc all talk directly to the
+database, we're currently unable to upgrade these services
+independently. If we had this service version number available, we
+could avoid doing any online migrations until all the affected
+services are upgraded to a new-enough point.
+
+Use Cases
+----------
+
+As a deployer, I want to be able to stage my upgrades of control
+services, without having to take down *all* of my conductor, api,
+scheduler, etc nodes and restart them together spanning a live data
+migration.
+
+Project Priority
+-----------------
+
+This is an upgrades enhancement.
+
+Proposed change
+===============
+
+The proposed change is adding a version number column to the services
+table. Each service will report its version number when it updates its
+service record. We will be able to determine if all services are on
+the same level of code by checking to see if there is more than one
+version in the table (optionally grouped by service). We can use this
+information to conditionally enable online data migrations. For
+example, we can check to see if all the conductors are upgraded to the
+same level by doing something like this:
+
+  SELECT DISTINCT version FROM services WHERE binary='conductor';
+
+If we do that at conductor startup, we can set a flag to not enable
+migrations that require conductor services to be newer than a specific
+version. We could also refresh this on SIGHUP like we currently do for
+config reload.
+
+Initially, the column will be created with a default of NULL, and the
+Service object will treat NULL as "version 1". Subsequent changes will
+set versions to 2. Any time we do an online data migration, we will
+need to bump the service version number.
+
+Longer-term we could try to tie RPC versions to this information to
+make pinning easier. Because that has potentially hard-to-quantify
+implications on backports (which occasionally do need to modify RPC
+interfaces), I propose we leave that out of the scope of this work for
+now. Even still, tying RPC versions to this service version would be
+work for the next cycle once this is in place and we can depend on
+it.
+
+The other thing that this enables is the ability to have a service
+start up and know that it is massively out of date. Presumably we
+should be able to have a conductor start up and say "wow, I'm much
+older than the other conductors, I should log an error and exit."
+Determining what the minimum version is and should be is something we
+would do in M when we have this for the N and N-1 releases such that
+we can depend on it.
+
+We need to do this in L so that we can leverage it in M. If we delay
+this until M, we won't be able to rely on it until N.
+
+Why not use semver?
+-------------------
+
+In things like our RPC API versions, we use semver-like version
+numbers. This allows us to make decisions about incoming calls and
+whether they're compatible with a newer node, and generally define
+rules for what is a breaking (i.e. major bump) change.
+
+This service version number doesn't imply any semantics itself, but
+rather just provides a vector with which we can orient ourselves in
+time to make other decisions. As described elsewhere in this spec,
+that may mean that we can decide what RPC version to use, or whether
+it is safe to start doing online data migrations. *Those* decisions
+extract semantic meaning out of the service version vector, and they
+may have significantly different rules (as would certainly be the case
+with the aforementioned RPC and DB decisions).
+
+Alternatives
+------------
+
+We can continue to do what we do today, which is start converting data
+from one schema to the other as soon as we roll new code. We keep the
+restriction of all control services being required to update together.
+
+We could also codify this in config somehow, but that will require
+much more operator intervention, and increases the likelihood of error.
+
+Data model impact
+-----------------
+
+A schema migration will be written to add a version column to the
+table. The version will be an integer, nullable, default to NULL.
+
+The Service object will be extended to support this version, and will
+treat a NULL version as version "1", which will avoid us having to do
+any data migrations on existing service records. New saves will
+initially write version "2" for all services.
+
+REST API impact
+---------------
+
+None.
+
+Security impact
+---------------
+
+None.
+
+Notifications impact
+--------------------
+
+None.
+
+Other end user impact
+---------------------
+
+None.
+
+Performance Impact
+------------------
+
+None.
+
+Other deployer impact
+---------------------
+
+This will make upgrading nova services easier and more
+flexible/forgiving for deployers.
+
+Developer impact
+----------------
+
+Developers (and reviewers) will need to ensure that the service
+version number is bumped across any online data migration that we do
+(like the recent flavor restructuring).
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  danms
+
+Other contributors:
+  alaski
+
+Work Items
+----------
+
+* Write the schema migration
+* Update the sqlalchemy models and service object
+* Write some object methods to help with querying service version
+  numbers in ways that will be friendly for determining upgrade
+  feasibility.
+* Extend the service startup code to check version spread and persist
+  so that we can use that as a static flag for enabling migrations.
+
+Dependencies
+============
+
+None. This is mostly early setup for being able to do more interesting
+upgrade things in M.
+
+Testing
+=======
+
+This should be fully testable with unit tests.
+
+Documentation Impact
+====================
+
+Ideally, this should make upgrades require less documentation.
+
+References
+==========
+
+* http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/flavor-from-sysmeta-to-blob.html