This work is licensed under a Creative Commons Attribution 3.0 Unported
License.

http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Push Notifications for Agents
=============================

RFE:
https://bugs.launchpad.net/neutron/+bug/1516195

Launchpad blueprint:
https://blueprints.launchpad.net/neutron/+spec/push-notifications

The current method we use to get information from the server to the agents
is driven by notifications and error-triggered calls from the agent to the
server. During normal operation, the server sends out a notification that a
specific object has changed (e.g. a port) and the agent responds by querying
the server for information about that port. If the agent encounters a
failure while processing changes, it starts over and re-queries the server
in the process.

The load on the server from this agent-driven approach can be very
unpredictable depending on the changes to object states on the neutron
server. For example, a single network update will result in a query from
every L2 agent with a port on that network.

This blueprint aims to change the pattern we use to get information to the
agents so that it is primarily based on pushing the object state out in the
change notifications. For anything that cannot leverage this method of
retrieval (e.g. initial agent startup still needs to poll), the AMQP timeout
handling will be fixed to use an exponential back-off that prevents the
agents from stampeding the server.


Problem Description
===================

An outage of a few agents and their recovery can lead to all of the agents
drowning the neutron servers with requests. This can cause the neutron
servers to fail to respond in time, which results in more retry requests
building up, leaving the entire system useless until operator intervention.

This is caused by three problems:

* We don't make optimal use of server notifications. There are times when
  the server sends a notification to an agent to inform it that something
  has changed, and the agent then has to call back to the server to get the
  relevant details. This means a single L3 rescheduling event for a set of
  routers due to a failed L3 agent can result in N more calls to the server,
  where N is the number of routers. Compounding this issue, a single agent
  may make multiple calls to the server for a single operation (e.g. the L2
  agent will make one call for port info and then another for security group
  info).

* The agents give up on a request after a short period of time and retry
  the request or issue an even more expensive request (e.g. if synchronizing
  info for one item fails, a major issue is assumed, so a request to sync
  all items is issued). By the time the server finishes fulfilling a
  request, the client is no longer waiting for the response, so the response
  goes in the trash. As this compounds, it leaves the server processing a
  massive queue of requests that won't even have listeners for the
  responses.

* Related to the second item, the agents are aggressive in their retry
  mechanisms. If a request times out, it is immediately retried with the
  same timeout value; that is, they have no back-off mechanism. (This has
  now been addressed by https://review.openstack.org/#/c/280595/ which adds
  back-off, sleep, and jitter.)
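
The back-off behavior described in the last bullet can be sketched roughly
as follows. This is an illustrative "full jitter" strategy; the function
name, parameters, and default values here are invented for the sketch, not
taken from the review linked above.

```python
import random

def backoff_intervals(base=1.0, cap=60.0, attempts=5):
    """Yield retry sleep intervals that grow exponentially, with jitter.

    base, cap, and the jitter strategy are illustrative choices only.
    """
    delay = base
    for _ in range(attempts):
        # Full jitter: sleep a random amount up to the current ceiling so
        # recovering agents don't all retry the server in lock-step.
        yield random.uniform(0, delay)
        # Double the ceiling each attempt, up to the cap.
        delay = min(cap, delay * 2)
```

An agent would sleep for each yielded interval between retries instead of
re-issuing the request immediately with a fixed timeout.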


Proposed Change
===============

Eliminate the expensive cases where calls are made to the neutron server in
response to a notification generated by the server. In most of these cases,
where the agent is just asking for regular neutron objects
(e.g. ports, networks), we can leverage the RPC callbacks mechanism
introduced in Liberty [1] to have the server send the entire changed object
as part of the notification so the agent has the information it needs.

The main targets for this will be the security group info call,
the get_device_details call, and the sync_routers call. Others will be
included if the change is trivial once these three are done.
The DHCP agent already relies on push notifications, so it will just
be updated to use the revision number to detect the out-of-order events
it is currently susceptible to.

For the remaining calls that cannot easily be converted to the callbacks
mechanism (e.g. the security groups call, which blends several objects;
the initial synchronization mechanism; and agent-generated calls), a nicer
timeout mechanism will be implemented with an exponential back-off and
timeout increase so a heavily loaded server is not continuously hammered
to death.


Changes to RPC callback mechanism
---------------------------------

The current issue with the RPC callback mechanism and sending objects as
notifications is a lack of server operation ordering guarantees and
AMQP message ordering guarantees.

To illustrate the first issue, examine the following order of events that
can happen when two servers update the same port:

* Server 1 commits update to DB
* Server 2 commits update to DB
* Server 2 sends notification
* Server 1 sends notification

If the agent receives the notifications in the order in which they are
delivered to AMQP, it will think the state delivered by Server 1 is the
current state when it is actually the state committed by Server 2.

We have the same issue when oslo messaging doesn't guarantee message order
(e.g. ZeroMQ). Even if Server 1 sends immediately after its commit and
before Server 2 commits and sends, one or more of the agents could end up
seeing Server 2's message before Server 1's.

To handle this, we will add a revision number, implemented as a monotonic
counter, to each object. This counter will be incremented on any update
so any agent can immediately identify stale messages.

To address deletes arriving before updates, agents will be expected to keep
a set of the UUIDs that have been deleted. Upon receiving an update, the
agent will check this set for the object's UUID and ignore the update,
since deletes are permanent and UUIDs cannot be re-used. If we do make IDs
recyclable in the future, this can be replaced with a strategy to confirm
ID existence with the server, or we can add another internal UUID that
cannot be specified.

Note that this doesn't guarantee message ordering for the agent, because
that is a property of the message backend, but it does give the agent the
necessary info to re-order messages when it receives them so it can
determine which one reflects the more recent state of the DB.
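
The revision-number and deleted-UUID bookkeeping described above can be
sketched as follows. The class and method names are hypothetical, not
actual neutron agent code.

```python
class ResourceCache:
    """Tracks the highest revision seen per object plus a tombstone set of
    deleted UUIDs, so stale or post-delete notifications can be discarded."""

    def __init__(self):
        self._revisions = {}   # uuid -> highest revision_number seen
        self._deleted = set()  # uuids that have been deleted (permanent)

    def record_delete(self, uuid):
        self._revisions.pop(uuid, None)
        self._deleted.add(uuid)

    def should_apply(self, uuid, revision_number):
        # Deletes are permanent and UUIDs are never re-used, so any update
        # for a deleted UUID must be a stale, out-of-order message.
        if uuid in self._deleted:
            return False
        # Discard anything at or below the revision we already have.
        if revision_number <= self._revisions.get(uuid, -1):
            return False
        self._revisions[uuid] = revision_number
        return True
```

On each incoming notification the agent would call ``should_apply`` and
skip processing when it returns False.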


Data Model Impact
-----------------

A 'revision_number' column will be added to the standard attr table. This
column will just be a simple big integer used as a monotonic counter that
is updated whenever the object is updated on the neutron server. This
revision number can then be used by the agents to automatically discard
any object states that are older than the state they already have.

This revision_number will use the version counter feature built into
SQLAlchemy: http://docs.sqlalchemy.org/en/latest/orm/versioning.html
Each time an object is updated, the server will perform a compare-and-swap
operation based on the revision number. This ensures that each update must
start with the current revision number or it will fail with a
StaleDataError. The API layer can catch this error with the current DB
retry mechanism and start over with the latest revision number.
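
A minimal sketch of that SQLAlchemy versioning feature on a stand-in model
(the table and column layout here is illustrative, not neutron's actual
standard attribute schema; assumes SQLAlchemy 1.4+):

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import StaleDataError

Base = declarative_base()

class StandardAttr(Base):
    # Stand-in for neutron's standard attribute table; schema is illustrative.
    __tablename__ = 'standardattributes'
    id = Column(Integer, primary_key=True)
    description = Column(String(255))
    revision_number = Column(Integer, nullable=False)
    # SQLAlchemy now emits "UPDATE ... WHERE revision_number = <value read>"
    # and raises StaleDataError when no row matches (compare-and-swap).
    __mapper_args__ = {'version_id_col': revision_number}

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(StandardAttr(id=1, description='a port'))
    session.commit()

    obj = session.get(StandardAttr, 1)  # reads revision_number == 1
    # Simulate a concurrent writer bumping the revision behind our back.
    session.execute(text(
        "UPDATE standardattributes SET revision_number = 5 WHERE id = 1"))
    obj.description = 'changed'
    try:
        session.flush()  # WHERE revision_number = 1 matches 0 rows
        stale = False
    except StaleDataError:
        stale = True     # the API layer would retry the update here
```

The except branch is where the DB retry mechanism mentioned above would
re-read the object and start the update over.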

While SQLAlchemy will automatically bump the revision for us when the
record for an object is updated (e.g. a standard attr description field),
it will not update it when a related object changes (e.g. adding an IP
address to a port or changing its status). So we will have to manually
trigger the revision bump (either via a PRECOMMIT callback or inline code)
for any operations that we want to bump the revision number.

What this guarantees:

- An object in a notification is newer (from a DB state perspective) than
  an object with a lower revision number. So any objects with lower
  revision numbers can safely be ignored since they represent stale DB
  state.

What this doesn't guarantee:

- Message ordering 'on the wire'. An AMQP listener may end up receiving an
  older state than a message it has already received. It's up to the
  listener to look at the revision number to determine if the message is
  stale.
- That each intermediate state is transmitted. If a notification mechanism
  reads the DB to get the full object to send, the DB state may have
  progressed, so it will notify with the latest state rather than the state
  that triggered the original notification. This is acceptable for all of
  our use cases since we only care about the current state of the object to
  wire up the dataplane. It is also effectively what we have now, since the
  DB state could change between when the agent gets a notification and when
  it actually asks the server for details.
- Reliability of the notifications themselves. This doesn't address the
  issue we currently have where a dropped notification is not detected.


Notifications Impact
--------------------

Existing notifications will become significantly more data-rich. The hope
is to eliminate many of the expensive RPC calls that each agent makes and
have each agent derive all state from notifications, with one sync method
for recovery/initialization that we can focus on optimizing.

This will result in more data being sent up front by the server to the
messaging layer, but it will eliminate the data that would be sent in
response to a call request from the agent in the current pattern. For a
single agent, the only gain is the elimination of the notification and
call messages; but for multiple agents interested in the same resource,
it eliminates extra DB calls and extra messages from the server to fulfill
those calls.

This pattern will result in fewer messages sent to oslo messaging because
it eliminates the agent calls that would all return the same payload: we
preemptively broadcast once instead of casting multiple times to each
requesting agent.


Performance Impact
------------------

A higher ratio of neutron agents per server, afforded by a large reduction
in sporadic queries by the agents.

This comes at the cost of effectively serializing operations on an
individual object due to the compare-and-swap operation on the server. For
example, if two server threads try to update a single object concurrently
and both read the current state of the object at the same time, one will
fail on commit with a StaleDataError, which will be retried by the API
layer. Previously both of these would succeed because the UPDATE statement
had no compare-and-swap WHERE criteria. However, this is a very reasonable
performance cost to pay considering that concurrent updates to the same
API object are not common.


Other Deployer Impact
---------------------

N/A - the upgrade path will maintain normal N-1 backward compatibility on
the server, so all of the current RPC endpoints will be left untouched for
one cycle.


Developer Impact
----------------

Development guidelines need to change to discourage the implementation of
new direct server calls.

The notifications will have to send out oslo versioned objects, since
notifications don't have RPC versions. So at a minimum we need to switch
to oslo versioned objects in the notification code even if we can't get
them fully implemented everywhere else. To do this we can leverage the RPC
callbacks mechanism.
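
The point of versioned objects is that the payload carries its own schema
version independent of any RPC endpoint version, so it can be backlevelled
for older agents. A stdlib-only sketch of that idea follows; this is *not*
the oslo.versionedobjects API, and every name and field here is invented
for illustration.

```python
import json

def _ver(v):
    # Parse '1.1' -> (1, 1) so versions compare numerically.
    return tuple(int(p) for p in v.split('.'))

class VersionedPayload:
    """Toy stand-in for a versioned notification object."""
    VERSION = '1.1'

    def __init__(self, uuid, revision_number, status=None):
        self.uuid = uuid
        self.revision_number = revision_number
        self.status = status  # hypothetical field added in version 1.1

    def to_primitive(self, target_version=None):
        # Serialize at the requested version, dropping newer fields so an
        # N-1 agent only sees the schema it understands.
        version = target_version or self.VERSION
        data = {'uuid': self.uuid, 'revision_number': self.revision_number}
        if _ver(version) >= (1, 1):
            data['status'] = self.status
        return json.dumps({'version': version, 'data': data})
```

A server would pin ``target_version`` to the oldest version still deployed
during a rolling upgrade.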


Alternatives
------------

Maintain the current information retrieval pattern and just adjust the
timeout mechanism for everything to include back-offs, or use cast/cast
instead of calls. This would allow a system to automatically recover from
self-induced death by stampede, but it would not make the performance any
more predictable.


Implementation
==============

Assignee(s)
-----------

Primary assignees:
  kevinbenton
  Ihar Hrachyshka


Work Items
----------

* Exponential back-off for timeouts on agents.
* Implement a 'revision' extension to add the revision_number column to the
  data model and expose it as a standard attribute.
* Write tests to ensure revisions are incremented as expected.
* Write (at least one) test that verifies a StaleDataError is triggered
  in the event of concurrent updates.
* Update the DHCP agent to make use of the new 'revision' field to discard
  stale updates. This will be used as the proof of concept for this
  approach since the DHCP agent is currently exposed to operating on stale
  data with out-of-order messages.
* Replace the use of sync_routers calls on the L3 agents for the most
  frequent operations (e.g. floating IP associations) with RPC callbacks
  once the OVO work allows it.
* Stand up a grenade partial job to make sure agents using different OVO
  versions maintain N-1 compatibility.
* Update the devref for callbacks.


Possible Future Work
--------------------

* Switch to a cast/cast pattern so the agent isn't blocked waiting on the
  server.
* Set up a periodic system based on these revision numbers to have the
  agents figure out if they have lost updates from the server (e.g.
  periodic broadcasts of revision numbers and UUIDs, sums of collections
  of revisions, etc.).
* Add an 'RPC pain multiplier' option that causes all calls to the neutron
  server to be duplicated X number of times. That way we can set it to
  something like 200 for the gate, which will force us to make every call
  reasonably performant.
* Allow the HTTP API to perform compare-and-swap updates by placing an
  If-Match header with the revision number, which would cause the update
  to fail if the version changed.
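
Server-side, the If-Match idea in the last bullet could look roughly like
this. The handler and its in-memory ``store`` are hypothetical stand-ins
for the real API and DB layers, and note that If-Match normally carries an
ETag; using the raw revision number is this spec's proposed variant.

```python
def update_with_if_match(store, uuid, new_fields, if_match=None):
    """Apply an update only if the caller's expected revision still holds.

    store maps uuid -> dict with a 'revision_number' key (a toy stand-in
    for the DB layer). Returns (http_status, object).
    """
    obj = store[uuid]
    if if_match is not None and int(if_match) != obj['revision_number']:
        return 412, obj          # Precondition Failed: revision moved on
    obj.update(new_fields)
    obj['revision_number'] += 1  # compare-and-swap succeeded; bump revision
    return 200, obj
```

A client that gets 412 would re-GET the object, re-apply its change to the
fresh state, and retry with the new revision number.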


Testing
=======

* The grenade partial job will be important to ensure we maintain N-1
  backward compatibility with agents from the previous release.
* API tests will be added to ensure the basic operation of the revision
  numbers.
* Functional and unit tests to test the agent reactions to payloads.


Documentation Impact
====================


User Documentation
------------------

N/A


Developer Documentation
-----------------------

Devref guidelines on the pattern for getting information to agents and
what the acceptability criteria are for calls to the server.

The RPC callbacks devref will need to be updated with the notification
strategy.


References
==========

1. http://git.openstack.org/cgit/openstack/neutron/tree/doc/source/devref/rpc_callbacks.rst
2. https://www.rabbitmq.com/semantics.html#ordering