Merge "highly distributed coordinated notifications"

Jenkins 2015-08-11 09:07:14 +00:00 committed by Gerrit Code Review
commit 3a91dd6c91

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

============================================
Highly Distributed Coordinated Notifications
============================================
https://blueprints.launchpad.net/ceilometer/+spec/disributed-coordinated-notifications

In Kilo, support for coordinated notification agents was added. This enabled
users to deploy multiple notification agents and ensured that related messages
within a pipeline were funnelled into the same agent to allow for proper
aggregation calculations.

Problem description
===================
In the initial implementation, all the data relating to a pipeline was
funnelled into a single queue per pipeline. While this ensured all related
data was sent to the same place, it removed the ability to scale horizontally,
as each pipeline queue can only have a single consumer listening to it.
This means that while multiple agents/handlers could be used to pull data off
the main OpenStack queue, once the data reached pipeline processing, it was
relegated to a single worker.

Data can be handled in parallel even at the pipeline level. For example, when
there are no transformers, datapoints do not need to be handled sequentially.
Additionally, when transformers are present, datapoints of different resources
have no relevance to each other and can be handled in parallel.

Proposed change
===============
To parallelise and scale out processing, we will create multiple copies of
each pipeline. When a datapoint arrives, we will bucketise it by a hashed
grouping key. Each transformer will have a grouping key assigned to it to
note any dependency requirements (i.e. transformers that work on resource
IDs). When setting up a pipeline, the keys of all transformers in the pipeline
will be combined to ensure that related datapoints are consistently sent to
the same pipeline copy for processing.
The basic workflow is as follows (a sketch of the bucketing step follows
this list)::

  * on notification agent startup, create a listener for the main queue
  * for each pipeline definition, we create x queues and x listeners, where x
    corresponds to the number of notification agents registered to the group
  * when a datapoint is received, the agent builds a sample
  * after the sample is built, we hash the fields defined by the transformer
    requirements and mod by the number of agents
  * using the mod value, we push the datapoint to the corresponding pipeline
    queue
  * the same processing steps apply from here on out
    (listener grabs data -> pipeline -> publish)
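
As an illustration only, here is a minimal sketch (not the actual
implementation) of how a sample could be bucketised into one of the pipeline
queue copies; the helper name, queue naming scheme, and sample dict are
hypothetical. A stable hash such as md5 is assumed because Python's built-in
hash() is randomised across processes::

  # Hypothetical sketch: bucketise a sample by hashing its grouping-key
  # fields and taking the result modulo the number of agents.
  import hashlib

  def pick_pipeline_queue(pipeline_name, sample, grouping_keys, n_agents):
      # Concatenate the grouping-key field values so samples sharing those
      # values always hash to the same bucket.
      key = '-'.join(str(sample.get(field)) for field in grouping_keys)
      digest = hashlib.md5(key.encode('utf-8')).hexdigest()
      bucket = int(digest, 16) % n_agents
      return '%s-%d' % (pipeline_name, bucket)

  # e.g. samples for resource "abc" always land on the same queue copy
  sample = {'name': 'cpu', 'resource_id': 'abc', 'volume': 42}
  print(pick_pipeline_queue('cpu_pipeline', sample, ['resource_id'], 3))
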
This solution CANNOT handle multiple grouping_keys in a pipeline. To properly
handle multiple grouping_keys in a pipeline, we would need to requeue after
each transform. The logic would become: main queue -> build sample ->
pipe1.transform1 queue -> pipe1.transform2 queue -> etc -> publish.

Studying the existing transformers, we have::

  * Accumulator - this does not really have any grouping requirements, it just
    batches samples.
  * Arithmetic - this is grouped by resource_id.
  * RateOfChange - this is grouped by name+resource_id, but it can be more
    generally grouped by just resource_id.
  * Aggregator - this is grouped by name+resource_id+<custom>, but can also be
    more generally grouped by just resource_id.

Based on the above, it seems like resource_id is always a valid general
grouping key and the transformers may do more granular groupings themselves.
Because of this, it seems safe to assume we don't need to support multiple
grouping keys (for now).
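
To make that assumption concrete, here is a minimal sketch of how transformers
could declare their grouping requirement and how a pipeline could combine
them. The grouping_keys attribute, class names, and helper below are
illustrative only, not existing Ceilometer code::

  # Hypothetical: each transformer declares the fields it needs grouped on.
  class TransformerBase(object):
      grouping_keys = []          # e.g. Accumulator: no grouping requirement

  class RateOfChangeTransformer(TransformerBase):
      grouping_keys = ['resource_id']

  class ArithmeticTransformer(TransformerBase):
      grouping_keys = ['resource_id']

  def pipeline_grouping_keys(transformers):
      # Combine the grouping keys of every transformer in the pipeline.
      keys = set()
      for transformer in transformers:
          keys.update(transformer.grouping_keys)
      return sorted(keys)         # expected to be ['resource_id'] for now
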
Alternatives
------------
1. A simple fix is to detect whether a pipeline has transformers. If it has
   none, its queue can be consumed by any number of consumers, and thus we can
   assign multiple listeners to those queues. This doesn't help distribute the
   load of pipelines with transformers, but it does allow transformers to span
   resources.
2. We implement a shared memory/storage mechanism so any worker can discover
   the historical context. This is hard. I feel like I would end up recreating
   Storm/Spark.
3. Magic.

Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Pipeline impact
---------------
Nothing from the user's point of view. Internally, we will have more pipeline
queues.
Other end user impact
---------------------
None, unless we decide to add an option to define the number of copies of the
pipeline queues.
Performance/Scalability Impacts
-------------------------------
Positive. It will distribute pipeline processing when running in coordinated
mode. Message brokers routinely handle thousands of queues, so creating copies
of the pipeline queues (a relatively small set) should not be an issue.

Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
chungg
Ongoing maintainer:
chungg
Work Items
----------
* add a grouping key to each transformer to define the grouping of datapoints

  * will be just resource_id for now

* add support to build the pipeline hashing from the above grouping keys
* add functionality to create pipeline queues per agent and the distribution
  logic (see the sketch below)
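
A rough sketch of the last item, assuming one queue copy per agent in the
coordination group; setup_pipeline_queues and create_listener are hypothetical
stand-ins for the real oslo.messaging wiring::

  # Hypothetical fan-out: for each pipeline, create one queue (and listener)
  # per notification agent registered to the group.
  def setup_pipeline_queues(pipelines, n_agents, create_listener):
      listeners = []
      for pipeline_name in pipelines:
          for bucket in range(n_agents):
              queue = '%s-%d' % (pipeline_name, bucket)
              listeners.append(create_listener(queue))
      return listeners
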
Future lifecycle
================
Support different grouping keys in a pipeline.
Dependencies
============
None
Testing
=======
Tests already exist. We just need to validate that the appropriate number of
queue copies is created.
Documentation Impact
====================
None. Maybe dev docs.
References
==========
https://docs.google.com/presentation/d/1QgjDOLRnKDboqP8P1LvV0kR5aQEv_VJsDtMlh6u7tIY/edit?usp=sharing