devref: Rolling upgrade mechanism for rpc-callbacks

Versioned Object push notifications require the server to be aware
of supported versions in the agents, since they are subscribed
to neutron-vo-<resource-type>-<version-number>.

During upgrade time, the server would need to downgrade and serialize
the objects across version subset, and send it to the fanout
queues for agent consumption.

One manual solution could be manual admin pinning, but we can do
better than that, making administrator lives easier if we provide
a reliable mechanism for remote version auto discovery.

Change-Id: I02b694137eb2d58e5f2f3e7631f0e4b90f7c17ad
This commit is contained in:
Miguel Angel Ajo
2015-11-03 10:51:56 +01:00
parent 5b804251ac
commit 0ddb66f0ed

View File

@@ -65,28 +65,6 @@ versioned_objects serialization/deserialization with the
obj_to_primitive(target_version=..) and primitive_to_obj() [#ov_serdes]_
methods is used internally to convert/retrieve objects before/after messaging.
Considering rolling upgrades, there are several scenarios to look at:
* publisher (generally neutron-server or a service) and subscriber (agent)
know the same version of the objects, so they serialize, and deserialize
without issues.
* publisher knows (and sends) an older version of the object, subscriber
will get the object updated to latest version on arrival before any
callback is called.
* publisher sends a newer version of the object, subscriber won't be able
to deserialize the object, in this case (PLEASE DISCUSS), we can think of two
strategies:
The strategy for upgrades will be:
During upgrades, we pin neutron-server to a compatible version for resource
fanout updates, and the server sends both the old, and the newer version.
The new agents process updates, taking the newer version of the resource
fanout updates. When the whole system upgraded, we un-pin the compatible
version fanout.
Serialized versioned objects look like::
{'versioned_object.version': '1.0',
@@ -101,6 +79,102 @@ Serialized versioned objects look like::
'name': u'aaa'},
'versioned_object.namespace': 'versionedobjects'}
Rolling upgrades strategy
-------------------------
In this section we assume the standard Neutron upgrade process, which means
upgrade the server first and then upgrade the agents:
:doc:`More information about the upgrade strategy <upgrade>`.
The plan is to provide a semi-automatic method which avoids manual pinning and
unpinning of versions by the administrator which could be prone to error.
Resource pull requests
~~~~~~~~~~~~~~~~~~~~~~
Resource pull requests will always be ok because the underlying resource RPC
does provide the version of the requested resource id / ids. The server will
be upgraded first, so it will always be able to satisfy any version the agents
request.
Resource push notifications
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Agents will subscribe to the neutron-vo-<resource_type>-<version> fanout queue
which carries updated objects for the version they know about. The versions
they know about depend on the runtime Neutron versioned objects they started with.
When the server upgrades, it should be able to instantly calculate a census of
agent versions per object (we will define a mechanism for this in a later
section). It will use the census to send fanout messages on all the version
span a resource type has.
For example, if neutron-server knew it has rpc-callback aware agents with
versions 1.0, and versions 1.2 of resource type "A", any update would be sent
to neutron-vo-A_1.0 and neutron-vo-A_1.2.
TODO(mangelajo): Verify that after upgrade is finished any unused messaging
resources (queues, exchanges, and so on) are released as older agents go away
and neutron-server stops producing new message casts. Otherwise document the
need for a neutron-server restart after rolling upgrade has finished if we
want the queues cleaned up.
Leveraging agent state reports for object version discovery
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
We would add a row to the agent db for tracking agent known objects and version
numbers. This would resemble the implementation of the configuration column.
Agents would report at start not only their configuration now, but also
their subscribed object type / version pairs, that would be stored in the
database and would be available to any neutron-server requesting it::
'subscribed_versions': {'QoSPolicy': '1.1',
'SecurityGroup': '1.0',
'Port': '1.0'}
There's a subset of Liberty agents depending on QoSPolicy that will
require 'QoSPolicy': '1.0' if the qos plugin is installed. We will be able
to identify those by the binary name (included in the report):
* 'neutron-openvswitch-agent'
* 'neutron-sriov-nic-agent'
Version discovery
+++++++++++++++++
With the above mechanism in place and considering the exception of
neutron-openvswitch-agent and neutron-sriov-agent requiring QoSpolicy 1.0,
we could discover the subset of versions to be sent on every push
notification.
Agents that are in down state would be excluded from this calculation.
We would use an extended timeout for agents in this calculation to make sure
we're on the safe side, specially if deployer marked agents with low
timeouts.
Starting at Mitaka, any agent interested in versioned objects via this API
should report their resource/version tuples of interest (the resource type/
version pairs they're subscribed to).
Caching mechanism
'''''''''''''''''
The version subset per object will be cached to avoid DB requests on every push
given that we assume that all old agents are already registered at the time of
upgrade.
Cached subset will be re-evaluated (to cut down the version sets as agents
upgrade) after configured TTL.
As a fast path to update this cache on all neutron-servers when upgraded agents
come up (or old agents revive after a long timeout or even a downgrade), we could
introduce a fanout queue consumed by servers, to additionally notify from one
agent to all neutron-servers about the "versions of interest" in the agent just
comming up.
All notifications for all calculated version sets must be sent, as non-upgraded
agents would otherwise not receive them.
It is safe to send notifications to any fanout queue as they will be discarded
if no agent is listening.
Topic names for every resource type RPC endpoint
------------------------------------------------
@@ -205,8 +279,7 @@ The server/publisher side may look like::
References
----------
.. [#ov_serdes] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/tests/test_objects.py#L621
.. [#vo_mkcompat] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/base.py#L460
.. [#vo_mkcptests] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/tests/test_objects.py#L111
.. [#vo_versioning] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/base.py#L236
.. [#ov_serdes] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/tests/test_objects.py#L410
.. [#vo_mkcompat] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/base.py#L474
.. [#vo_mkcptests] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/tests/test_objects.py#L114
.. [#vo_versioning] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/base.py#L248