..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==================================================================
Horizontal scaling and work-load partitioning of the Central Agent
==================================================================

https://blueprints.launchpad.net/ceilometer/+spec/central-agent-partitioning

The central agent's job is polling resources for information, transforming
that information into samples and passing the samples on to the Collector
Agent. This specification proposes an implementation of coordination between
multiple Central Agents, which could then dynamically distribute the workload
between them, providing scalability and high availability.

Problem description
===================

Currently, each Central Agent retrieves a set of resources and polls all of
them. If we have multiple Central Agents running, they all poll the same set
of resources, which prevents us from scaling out horizontally.

At the start of each polling interval, each of the pollsters retrieves a list
of resources to poll from its Discovery plugin (configured in the pipeline or
a default one). This makes the discovery process a great place to implement
the coordination and partitioning logic, while the pollsters themselves can
remain in blissful ignorance of anything going on.

Proposed change
===============

The basic idea is to use the *tooz* [1]_ library for group membership and
hashing to assign resources to active Central Agents.

* Discovery plugins on different Central Agents that discover the same set of
  resources would join the same group in *tooz*.

* For each polling interval, after independently discovering the global set
  of resources that need to be polled by one or other of the central agents,
  they would get a list of all group members (Discovery plugins in other
  Central Agents that have the same set of resources).

* They would use hashing to determine the set of resources assigned to the
  Discovery's Central Agent.

**Determining the resources we're responsible for**

We have a list of resources and get a list of active agents from *tooz*. We
then get our assigned resources as follows::

    our_key = sorted(agents).index(our_agent_uuid)

    our_resources = []
    for resource in resources:
        key = hash(resource) % len(agents)
        if key == our_key:
            our_resources.append(resource)

    # or, more pythonically
    our_resources = [r for r in resources
                     if hash(r) % len(agents) == our_key]

In essence, we hash the resources into buckets and only poll the resources
that fall into our bucket. A good hash function [3]_ ensures that the
resources are evenly distributed among the active Central Agents.

**Grouping**

The pollster's Discovery plugin (be it a Compute Discovery, Hardware
Discovery, etc.) provides the scope its resources are a part of. For example,
if a Discovery plugin isn't constrained to a subset of resources, as is the
case for most Discovery plugins, then it should simply join the global group
of unconstrained Discovery plugins. If, on the other hand, the resources that
the Discovery plugin can discover are constrained, as in the case of Compute
Discovery, then the group name should reflect their scope. An example of this
would be 'compute-<hostname>-discovery'. This way only the pollsters that are
polling the same host will share their workload between them.
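The following is a minimal, self-contained sketch (not the proposed
implementation itself) of how the bucketing above could be driven by *tooz*
group membership, assuming the ``get_coordinator``/``join_group``/
``get_members`` calls of the *tooz* API. The backend URL, group name, member
id and helper names are purely illustrative, and a stable, well-distributing
hash (here MD5) is used so that every agent maps a given resource to the same
bucket::

    import hashlib
    import uuid

    from tooz import coordination

    AGENT_ID = str(uuid.uuid4()).encode('ascii')  # this agent's member id
    GROUP_ID = b'central-global-discovery'        # illustrative group name


    def stable_hash(value):
        # Process-independent, evenly distributing hash of a resource id.
        return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)


    def get_my_resources(coordinator, resources):
        # All live members of the group, as seen by the backend.
        members = sorted(coordinator.get_members(GROUP_ID).get())
        our_key = members.index(AGENT_ID)
        # Hash every resource into len(members) buckets; keep only ours.
        return [r for r in resources
                if stable_hash(r) % len(members) == our_key]


    coordinator = coordination.get_coordinator(
        'memcached://127.0.0.1:11211', AGENT_ID)
    coordinator.start()
    try:
        coordinator.create_group(GROUP_ID).get()
    except coordination.GroupAlreadyExist:
        pass
    coordinator.join_group(GROUP_ID).get()

    # In the real agent the resources come from the Discovery plugin, and
    # coordinator.heartbeat() must be called periodically so that dead
    # members are dropped from the group.
    resources = ['instance-1', 'instance-2', 'instance-3']
    print(get_my_resources(coordinator, resources))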
**What happens when we start another agent (or stop an existing one)?**

*tooz* allows us to register a callback that is called when a member joins or
leaves the group. It keeps track of member liveness using a heartbeat
mechanism. When a member joins or leaves the group, this is what happens to:

* **The agents already running**

  If they are currently in the middle of polling, they complete their full
  polling cycle and only then re-balance their hash buckets.

* **The agent we just started**

  It joins the group first, but then waits for one polling interval to ensure
  all the other agents have updated their hash buckets as well, and only then
  starts polling.

* **The agent that is stopped/crashes mid-cycle**

  The resources that the stopped agent hasn't polled yet will not be polled
  in this cycle, but they will be polled from the next one on.

**Generalizing the implementation for re-use**

The need for coordinated assignment of "things" (resources, alarms, ...) to
agents is not unique to the Central Agent. Currently, the Alarm Evaluator
could make use of it as well, to have multiple Alarm Evaluators running, each
evaluating its share of alarms.

This functionality could be captured in a *PartitionCoordinator* class, which
agents could use like::

    partition_coordinator = PartitionCoordinator(group='alarm')
    partition_coordinator.start()

    every evaluation_interval:
        all_alarms = get_all_alarms()
        my_alarms = partition_coordinator.get_my_subset(all_alarms)
        for a in my_alarms:
            evaluate(a)

.. note:: The actual change-over of the alarm partitioning coordination to
   the proposed approach will be tracked in a separate blueprint.

or, in the case of the central agent::

    partition_coordinator = PartitionCoordinator(group='central_agent')
    partition_coordinator.start()

    every polling_interval:
        all_resources = discover_resources()
        my_resources = partition_coordinator.get_my_subset(all_resources)
        for r in my_resources:
            poll(r)

Alternatives
------------

* **Fabio Gianetti's approach** [2]_

  Fabio's approach stores source<->agent assignments in the database to
  figure out what to poll, and uses a heartbeat, combined with additional
  agents listening for that heartbeat, for failure detection.

  In contrast, this proposal uses *tooz* for failure detection (via
  heartbeats as well). Additionally, the resource allocation is more dynamic,
  since the resources are assigned evenly to the agents at any point in time.
  It is also more lightweight, since we don't need to keep an explicit
  resource<->agent mapping in the database, but use hashing instead.

* **Locking**

  Another approach would be to use the distributed locking provided by
  *tooz*. Before polling a resource, a pollster would need to acquire that
  resource's lock. Pollsters contend for the locks, and whoever gets a lock
  polls the corresponding resource (see the sketch after this list).

  The downside of this approach is the overhead of distributed locking.
  Acquiring a distributed lock incurs a cost (time, network traffic). When
  using distributed locks for resource contention, this cost is incurred
  per-resource. With the group-membership approach, on the other hand, the
  coordination cost is incurred only when a member joins or leaves the group,
  which happens negligibly often compared to the number of resources.
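For comparison, a rough sketch of the rejected locking alternative is given
below, assuming *tooz* exposes a per-name distributed lock through a
``get_lock`` call (the exact API may vary by version and backend). The lock
naming scheme and the ``poll`` callable are illustrative placeholders; the
point is that one lock round-trip to the coordination backend is needed for
every resource in every polling interval, whereas the group-membership
approach only contacts the backend when membership changes::

    def poll_with_locks(coordinator, resources, poll):
        for resource in resources:
            # One lock round-trip to the coordination backend per resource,
            # per polling interval -- the overhead discussed above.
            lock = coordinator.get_lock(
                ('poll-%s' % resource).encode('ascii'))
            if lock.acquire(blocking=False):
                try:
                    poll(resource)  # whichever agent wins the lock polls it
                finally:
                    # NOTE: a real implementation would have to keep the lock
                    # (or otherwise record the poll) until the end of the
                    # interval, so a slower agent doesn't poll the same
                    # resource again.
                    lock.release()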
Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Pipeline impact
---------------

None

Other end user impact
---------------------

None

Performance/Scalability Impacts
-------------------------------

Positive

Other deployer impact
---------------------

If deployers want to use multiple central agents, they will need to deploy
one of the tooz backends (ZooKeeper, memcached, possibly just an AMQP broker
soon).

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  nejc-saje

Other contributors:
  chdent

Ongoing maintainer:
  ceilometer team

Work Items
----------

Future lifecycle
================

Dependencies
============

* tooz
* one of the backends for tooz (ZooKeeper, memcached, possibly just
  oslo.messaging)

Testing
=======

The implementation should be tested with unit tests.

Documentation Impact
====================

The operator's manual should explain the process and properties of running
multiple Central Agents.

References
==========

.. [1] https://github.com/stackforge/tooz
.. [2] https://review.openstack.org/#/c/101282/5
.. [3] http://en.wikipedia.org/wiki/Hash_function#Properties