Merge "devref: add upgrade strategy page"
This commit is contained in:
commit
c5cf4f54be
@ -20,6 +20,7 @@
|
|||||||
''''''' Heading 4
|
''''''' Heading 4
|
||||||
(Avoid deeper levels because they do not render well.)
|
(Avoid deeper levels because they do not render well.)
|
||||||
|
|
||||||
|
.. _alembic_migrations:
|
||||||
|
|
||||||
Alembic Migrations
|
Alembic Migrations
|
||||||
==================
|
==================
|
||||||
|
@ -182,10 +182,7 @@ Backward compatibility
|
|||||||
|
|
||||||
Document common pitfalls as well as good practices done when extending the RPC Interfaces.
|
Document common pitfalls as well as good practices done when extending the RPC Interfaces.
|
||||||
|
|
||||||
* The Neutron upgrade path requires the server to support the previous version of
|
* Make yourself familiar with :ref:`Upgrade review guidelines <upgrade_review_guidelines>`.
|
||||||
the agent. Any changes to the existing RPC methods must be compatible with the
|
|
||||||
previous version of the agent. Otherwise a version bump is required and the old
|
|
||||||
method must be kept under the previous version RPC endpoint.
|
|
||||||
|
|
||||||
|
|
||||||
Scalability issues
|
Scalability issues
|
||||||
|
@ -69,6 +69,7 @@ Neutron Internals
|
|||||||
oslo-incubator
|
oslo-incubator
|
||||||
callbacks
|
callbacks
|
||||||
dns_order
|
dns_order
|
||||||
|
upgrade
|
||||||
|
|
||||||
Testing
|
Testing
|
||||||
-------
|
-------
|
||||||
|
@ -95,6 +95,8 @@ This class implements the server side of the interface. The
|
|||||||
oslo_messaging.Target() defined says that this class currently implements
|
oslo_messaging.Target() defined says that this class currently implements
|
||||||
version 1.1 of the interface.
|
version 1.1 of the interface.
|
||||||
|
|
||||||
|
.. _rpc_versioning:
|
||||||
|
|
||||||
Versioning
|
Versioning
|
||||||
----------
|
----------
|
||||||
|
|
||||||
|
@ -21,6 +21,8 @@
|
|||||||
(Avoid deeper levels because they do not render well.)
|
(Avoid deeper levels because they do not render well.)
|
||||||
|
|
||||||
|
|
||||||
|
.. _rpc_callbacks:
|
||||||
|
|
||||||
Neutron Messaging Callback System
|
Neutron Messaging Callback System
|
||||||
=================================
|
=================================
|
||||||
|
|
||||||
|
250
doc/source/devref/upgrade.rst
Normal file
250
doc/source/devref/upgrade.rst
Normal file
@ -0,0 +1,250 @@
|
|||||||
|
..
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||||
|
not use this file except in compliance with the License. You may obtain
|
||||||
|
a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software
|
||||||
|
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||||
|
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||||
|
License for the specific language governing permissions and limitations
|
||||||
|
under the License.
|
||||||
|
|
||||||
|
|
||||||
|
Convention for heading levels in Neutron devref:
|
||||||
|
======= Heading 0 (reserved for the title in a document)
|
||||||
|
------- Heading 1
|
||||||
|
~~~~~~~ Heading 2
|
||||||
|
+++++++ Heading 3
|
||||||
|
''''''' Heading 4
|
||||||
|
(Avoid deeper levels because they do not render well.)
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
Much of this document discusses upgrade considerations for the Neutron
|
||||||
|
reference implementation using Neutron's agents. It's expected that each
|
||||||
|
Neutron plugin provides its own documentation that discusses upgrade
|
||||||
|
considerations specific to that choice of backend. For example, OVN does
|
||||||
|
not use Neutron agents, but does have a local controller that runs on each
|
||||||
|
compute node. OVN supports rolling upgrades, but information about how that
|
||||||
|
works should be covered in the documentation for networking-ovn, the OVN
|
||||||
|
Neutron plugin.
|
||||||
|
|
||||||
|
Upgrade strategy
|
||||||
|
================
|
||||||
|
|
||||||
|
There are two general upgrade scenarios supported by Neutron:
|
||||||
|
|
||||||
|
#. All services are shut down, code upgraded, then all services are started again.
|
||||||
|
#. Services are upgraded gradually, based on operator service windows.
|
||||||
|
|
||||||
|
The latter is the preferred way to upgrade an OpenStack cloud, since it allows
|
||||||
|
for more granularity and less service downtime. This scenario is usually called
|
||||||
|
'rolling upgrade'.
|
||||||
|
|
||||||
|
Rolling upgrade
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Rolling upgrades imply that during some interval of time there will be services
|
||||||
|
of different code versions running and interacting in the same cloud. It puts
|
||||||
|
multiple constraints onto the software.
|
||||||
|
|
||||||
|
#. older services should be able to talk with newer services.
|
||||||
|
#. older services should not require the database to have older schema
|
||||||
|
(otherwise newer services that require the newer schema would not work).
|
||||||
|
|
||||||
|
`More info on rolling upgrades in OpenStack
|
||||||
|
<http://governance.openstack.org/reference/tags/assert_supports-rolling-upgrade.html>`_.
|
||||||
|
|
||||||
|
Those requirements are achieved in Neutron by:
|
||||||
|
|
||||||
|
#. If the Neutron backend makes use of Neutron agents, the Neutron server have
|
||||||
|
backwards compatibility code to deal with older messaging payloads.
|
||||||
|
#. isolating a single service that accesses database (neutron-server).
|
||||||
|
|
||||||
|
To simplify the matter, it's always assumed that the order of service upgrades
|
||||||
|
is as following:
|
||||||
|
|
||||||
|
#. first, all neutron-servers are upgraded.
|
||||||
|
#. then, if applicable, neutron agents are upgraded.
|
||||||
|
|
||||||
|
This approach allows us to avoid backwards compatibility code on agent side and
|
||||||
|
is in line with other OpenStack projects that support rolling upgrades
|
||||||
|
(specifically, nova).
|
||||||
|
|
||||||
|
Server upgrade
|
||||||
|
~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Neutron-server is the very first component that should be upgraded to the new
|
||||||
|
code. It's also the only component that relies on new database schema to be
|
||||||
|
present, other components communicate with the cloud through AMQP and hence do
|
||||||
|
not depend on particular database state.
|
||||||
|
|
||||||
|
Database upgrades are implemented with alembic migration chains.
|
||||||
|
|
||||||
|
Database upgrade is split into two parts:
|
||||||
|
|
||||||
|
#. neutron-db-manage upgrade --expand
|
||||||
|
#. neutron-db-manage upgrade --contract
|
||||||
|
|
||||||
|
Each part represents a separate alembic branch.
|
||||||
|
|
||||||
|
:ref:`More info on alembic scripts <alembic_migrations>`.
|
||||||
|
|
||||||
|
The former step can be executed while old neutron-server code is running. The
|
||||||
|
latter step requires *all* neutron-server instances to be shut down. Once it's
|
||||||
|
complete, neutron-servers can be started again.
|
||||||
|
|
||||||
|
Agents upgrade
|
||||||
|
~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
This section does not apply when the cloud does not use AMQP agents to
|
||||||
|
provide networking services to instances. In that case, other backend
|
||||||
|
specific upgrade instructions may also apply.
|
||||||
|
|
||||||
|
Once neutron-server services are restarted with the new database schema and the
|
||||||
|
new code, it's time to upgrade Neutron agents.
|
||||||
|
|
||||||
|
Note that in the meantime, neutron-server should be able to serve AMQP messages
|
||||||
|
sent by older versions of agents which are part of the cloud.
|
||||||
|
|
||||||
|
The recommended order of agent upgrade (per node) is:
|
||||||
|
|
||||||
|
#. first, L2 agents (openvswitch, linuxbridge, sr-iov).
|
||||||
|
#. then, all other agents (L3, DHCP, Metadata, ...).
|
||||||
|
|
||||||
|
The rationale of the agent upgrade order is that L2 agent is usually
|
||||||
|
responsible for wiring ports for other agents to use, so it's better to allow
|
||||||
|
it to do its job first and then proceed with other agents that will use the
|
||||||
|
already configured ports for their needs.
|
||||||
|
|
||||||
|
Each network/compute node can have its own upgrade schedule that is independent
|
||||||
|
of other nodes.
|
||||||
|
|
||||||
|
AMQP considerations
|
||||||
|
+++++++++++++++++++
|
||||||
|
|
||||||
|
Since it's always assumed that neutron-server component is upgraded before
|
||||||
|
agents, only the former should handle both old and new RPC versions.
|
||||||
|
|
||||||
|
The implication of that is that no code that handles UnsupportedVersion
|
||||||
|
oslo.messaging exceptions belongs to agent code.
|
||||||
|
|
||||||
|
:ref:`More information about RPC versioning <rpc_versioning>`.
|
||||||
|
|
||||||
|
Interface signature
|
||||||
|
'''''''''''''''''''
|
||||||
|
|
||||||
|
An RPC interface is defined by its name, version, and (named) arguments that
|
||||||
|
it accepts. There are no strict guarantees that arguments will have expected
|
||||||
|
types or meaning, as long as they are serializable.
|
||||||
|
|
||||||
|
Message content versioning
|
||||||
|
''''''''''''''''''''''''''
|
||||||
|
|
||||||
|
To provide better compatibility guarantees for rolling upgrades, RPC interfaces
|
||||||
|
could also define specific format for arguments they accept. In OpenStack
|
||||||
|
world, it's usually implemented using oslo.versionedobjects library, and
|
||||||
|
relying on the library to define serialized form for arguments that are passed
|
||||||
|
thru AMQP wire.
|
||||||
|
|
||||||
|
Note that Neutron has *not* adopted oslo.versionedobjects library for its RPC
|
||||||
|
interfaces yet (except for QoS feature).
|
||||||
|
|
||||||
|
:ref:`More information about RPC callbacks used for QoS <rpc_callbacks>`.
|
||||||
|
|
||||||
|
Networking backends
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Backend software upgrade should not result in any data plane disruptions.
|
||||||
|
Meaning, e.g. Open vSwitch L2 agent should not reset flows or rewire ports;
|
||||||
|
Neutron L3 agent should not delete namespaces left by older version of the
|
||||||
|
agent; Neutron DHCP agent should not require immediate DHCP lease renewal; etc.
|
||||||
|
|
||||||
|
The same considerations apply to setups that do not rely on agents. Meaning,
|
||||||
|
f.e. OpenDaylight or OVN controller should not break data plane connectivity
|
||||||
|
during its upgrade process.
|
||||||
|
|
||||||
|
Upgrade testing
|
||||||
|
---------------
|
||||||
|
|
||||||
|
`Grenade <https://github.com/openstack-dev/grenade>`_ is the OpenStack project
|
||||||
|
that is designed to validate upgrade scenarios.
|
||||||
|
|
||||||
|
Currently, only offline (non-rolling) upgrade scenario is validated in Neutron
|
||||||
|
gate. The upgrade scenario follows the following steps:
|
||||||
|
|
||||||
|
#. the 'old' cloud is set up using latest stable release code
|
||||||
|
#. all services are stopped
|
||||||
|
#. code is updated to the patch under review
|
||||||
|
#. new database migration scripts are applied, if needed
|
||||||
|
#. all services are started
|
||||||
|
#. the 'new' cloud is validated with a subset of tempest tests
|
||||||
|
|
||||||
|
The scenario validates that no configuration option names are changed in one
|
||||||
|
cycle. More generally, it validates that the 'new' cloud is capable of running
|
||||||
|
using the 'old' configuration files. It also validates that database migration
|
||||||
|
scripts can be executed.
|
||||||
|
|
||||||
|
The scenario does *not* validate AMQP versioning compatibility.
|
||||||
|
|
||||||
|
Other projects (for example Nova) have so called 'partial' grenade jobs where
|
||||||
|
some services are left running using the old version of code. Such a job would
|
||||||
|
be needed in Neutron gate to validate rolling upgrades for the project. Till
|
||||||
|
that time, it's all up to reviewers to catch compatibility issues in patches on
|
||||||
|
review.
|
||||||
|
|
||||||
|
Another hole in testing belongs to split migration script branches. It's
|
||||||
|
assumed that an 'old' cloud can successfully run after 'expand' migration
|
||||||
|
scripts from the 'new' cloud are applied to its database; but it's not
|
||||||
|
validated in gate.
|
||||||
|
|
||||||
|
.. _upgrade_review_guidelines:
|
||||||
|
|
||||||
|
Review guidelines
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
There are several upgrade related gotchas that should be tracked by reviewers.
|
||||||
|
|
||||||
|
First things first, a general advice to reviewers: make sure new code does not
|
||||||
|
violate requirements set by `global OpenStack deprecation policy
|
||||||
|
<http://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html>`_.
|
||||||
|
|
||||||
|
Now to specifics:
|
||||||
|
|
||||||
|
#. Configuration options:
|
||||||
|
|
||||||
|
* options should not be dropped from the tree without waiting for
|
||||||
|
deprecation period (currently it's one development cycle long) and a
|
||||||
|
deprecation message issued if the deprecated option is used.
|
||||||
|
* option values should not change their meaning between releases.
|
||||||
|
|
||||||
|
#. Data plane:
|
||||||
|
|
||||||
|
* agent restart should not result in data plane disruption (no Open vSwitch
|
||||||
|
ports reset; no network namespaces deleted; no device names changed).
|
||||||
|
|
||||||
|
#. RPC versioning:
|
||||||
|
|
||||||
|
* no RPC version major number should be bumped before all agents had a
|
||||||
|
chance to upgrade (meaning, at least one release cycle is needed before
|
||||||
|
compatibility code to handle old clients is stripped from the tree).
|
||||||
|
* no compatibility code should be added to agent side of AMQP interfaces.
|
||||||
|
* server code should be able to handle all previous versions of agents,
|
||||||
|
unless the major version of an interface is bumped.
|
||||||
|
* no RPC interface arguments should change their meaning, or names.
|
||||||
|
* new arguments added to RPC interfaces should not be mandatory. It means
|
||||||
|
that server should be able to handle old requests, without the new
|
||||||
|
argument specified. Also, if the argument is not passed, the old behaviour
|
||||||
|
before the addition of the argument should be retained.
|
||||||
|
|
||||||
|
#. Database migrations:
|
||||||
|
|
||||||
|
* migration code should be split into two branches (contract, expand) as
|
||||||
|
needed. No code that is unsafe to execute while neutron-server is running
|
||||||
|
should be added to expand branch.
|
||||||
|
* if possible, contract migrations should be minimized or avoided to reduce
|
||||||
|
the time when API endpoints must be down during database upgrade.
|
Loading…
Reference in New Issue
Block a user