..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

============================================
Highly Distributed Coordinated Notifications
============================================

https://blueprints.launchpad.net/ceilometer/+spec/disributed-coordinated-notifications

In Kilo, support for coordinated notification agents was added. This enabled
users to deploy multiple notification agents and ensured related messages
within a pipeline were funnelled into the same agent to allow for proper
aggregation calculations.

Problem description
===================

In the initial implementation, all the data relating to a pipeline was
funneled into a single queue per pipeline. While this ensured all related
data was sent to the same place, it removed the ability to scale horizontally,
as each pipeline queue can only have a single consumer listening to it.
This means that while multiple agents/handlers could be used to pull data off
the main OpenStack queue, once the data reached pipeline processing, it was
relegated to a single worker.

Data can be handled in parallel even at the pipeline level. For example, when
there are no transformers, datapoints do not need to be handled sequentially.
Additionally, when transformers are present, datapoints of different resources
have no relevance to each other and can be handled in parallel.

Proposed change
===============

To parallelise and scale out processing, we will create multiple copies of
each pipeline. When a datapoint arrives, we will bucketise it by a hashed
grouping key. Each transformer will have a grouping key assigned to it
to note any dependency requirements (i.e. transformers that work on resource
ids). When setting up a pipeline, all the keys of the transformers in the
pipeline will be combined to ensure that related datapoints are consistently
sent to the same pipeline copy for processing.
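
As a rough, non-authoritative sketch of the bucketing step (all names below
are illustrative and not part of the existing code), the idea is roughly::

  import hashlib


  def bucket_for(sample, grouping_keys, num_agents):
      """Map a sample to one of ``num_agents`` copies of a pipeline.

      ``grouping_keys`` is the combined set of keys required by the
      pipeline's transformers (e.g. ['resource_id']).
      """
      key = '|'.join(str(sample.get(k)) for k in sorted(grouping_keys))
      digest = hashlib.md5(key.encode('utf-8')).hexdigest()
      return int(digest, 16) % num_agents


  # samples sharing a resource_id always land on the same pipeline copy
  sample = {'name': 'cpu', 'resource_id': 'instance-0001', 'volume': 42}
  copy_index = bucket_for(sample, ['resource_id'], num_agents=4)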

The basic workflow is as follows::

  * on notification agent startup, create a listener for the main queue
  * for each pipeline definition, we create x queues and x listeners, where x
    corresponds to the number of notification agents registered to the group
  * when a datapoint is received, the agent builds a sample
  * after the sample is built, we hash the fields defined by the transformer
    requirements and mod by the number of agents
  * using the mod value, we push the datapoint to the corresponding pipeline
    queue (see the sketch below)
  * same processing steps from here on out (listener grabs data -> pipeline ->
    publish)
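
To make the queue fan-out concrete, here is a toy, in-process illustration of
"x queues per pipeline" and the hash/mod dispatch. The class, queue naming and
use of Python's queue module are placeholders; the real agent would use
oslo.messaging queues and listeners::

  import hashlib
  import queue


  class PipelineFanout(object):
      """Toy stand-in for the per-pipeline queue copies."""

      def __init__(self, pipeline_name, grouping_keys, num_agents):
          self.pipeline_name = pipeline_name
          self.grouping_keys = grouping_keys
          # one queue (and, in the real agent, one listener) per agent
          self.queues = [queue.Queue() for _ in range(num_agents)]

      def dispatch(self, sample):
          key = '|'.join(str(sample.get(k))
                         for k in sorted(self.grouping_keys))
          idx = int(hashlib.md5(key.encode('utf-8')).hexdigest(),
                    16) % len(self.queues)
          self.queues[idx].put(sample)
          return '%s-%d' % (self.pipeline_name, idx)


  fanout = PipelineFanout('cpu_pipeline', ['resource_id'], num_agents=4)
  # both samples share a resource_id, so they hit the same queue copy
  q1 = fanout.dispatch({'resource_id': 'instance-0001', 'volume': 40})
  q2 = fanout.dispatch({'resource_id': 'instance-0001', 'volume': 42})
  assert q1 == q2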

This solution CANNOT handle multiple grouping_keys in a pipeline. To properly
handle multiple grouping_keys in a pipeline we would need to requeue after
each transform. The logic would become: main queue -> build sample ->
pipe1.transform1 queue -> pipe1.transform2 queue -> etc -> publish.

Studying the existing transformers, we have::

  * Accumulator - this does not really have any grouping requirements, it
    just batches samples
  * Arithmetic - this is grouped by resource_id
  * RateOfChange - this is grouped by name+resource_id, but it can be more
    generally grouped by just resource_id
  * Aggregator - this is grouped by name+resource_id+<custom> but can also be
    more generally grouped by just resource_id

Based on the above, it seems like resource_id is always a valid general
grouping key and the transformers may do more granular groupings themselves.
Because of this, it seems safe to assume we don't need to support multiple
grouping keys (for now).
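
For illustration, transformers could declare their grouping requirement and a
pipeline could combine them when it is set up; the attribute and class names
below are placeholders rather than the existing transformer API::

  class AccumulatorTransformer(object):
      grouping_keys = []            # no grouping requirement, just batching


  class RateOfChangeTransformer(object):
      grouping_keys = ['resource_id']


  class ArithmeticTransformer(object):
      grouping_keys = ['resource_id']


  def pipeline_grouping_keys(transformers):
      """Union of all keys required by a pipeline's transformers."""
      keys = set()
      for transformer in transformers:
          keys.update(getattr(transformer, 'grouping_keys', []))
      # with the transformers above this is always [] or ['resource_id']
      return sorted(keys)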


Alternatives
------------

1. A dumb, easy fix is to detect whether a pipeline has transformers. If not,
   it can be consumed by any number of consumers and thus we can assign
   multiple listeners to those queues. This doesn't help distribute the load
   of pipelines with transformers, but allows for transformers spanning
   resources.

2. We implement a shared memory/storage mechanism so any worker can discover
   the historical context. This is hard. I feel like I would end up
   recreating Storm/Spark.

3. Magic

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Pipeline impact
---------------

Nothing from the user's point of view. Internally, we will have more pipeline
queues.

Other end user impact
---------------------

None, unless we decide to add an option to define the number of copies of
pipeline queues.

Performance/Scalability Impacts
-------------------------------

Positive. It will distribute pipeline processing when running in coordinated
mode. Message queue services routinely handle thousands of queues, so
creating copies of the pipeline queues (a relatively small set) should not be
an issue.

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  chungg

Ongoing maintainer:
  chungg

Work Items
----------

* add a grouping key to each transformer to define the grouping of datapoints

  * this will be just resource_id

* add support to build pipeline hashing from the above grouping keys
* add functionality to create pipeline queues per agent and the distribution
  logic (see the sketch below)
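
As a rough sketch of the distribution work item, the number of queue copies
could be derived from group membership; the snippet below assumes a tooz
coordination backend is available, and the backend URL, member id and group
name are placeholders::

  from tooz import coordination

  coordinator = coordination.get_coordinator(
      'memcached://127.0.0.1:11211', b'notification-agent-1')
  coordinator.start()
  try:
      coordinator.create_group(b'notification-agents').get()
  except coordination.GroupAlreadyExist:
      pass
  coordinator.join_group(b'notification-agents').get()

  # x queues / x listeners per pipeline, where x is the group size
  members = coordinator.get_members(b'notification-agents').get()
  num_agents = len(members)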

Future lifecycle
================

Support different grouping keys in a pipeline.

Dependencies
============

None

Testing
=======

Tests already exist. We just need to validate that we create the appropriate
number of queue copies.

Documentation Impact
====================

None. Maybe dev docs.

References
==========

https://docs.google.com/presentation/d/1QgjDOLRnKDboqP8P1LvV0kR5aQEv_VJsDtMlh6u7tIY/edit?usp=sharing