Add central agent HA and workload partitioning

https://blueprints.launchpad.net/fuel/+spec/ceilometer-central-agent-ha

Implement Redis installation and use Redis as a coordination backend for Ceilometer central agents.

Problem description

A detailed description of the problem:

Currently, several Ceilometer services in MOS do not support workload partitioning: the central agent, the alarm evaluator and the notification agent. Workload partitioning for the central agent was implemented during the Juno release cycle. In Kilo, partition coordination was introduced for the alarm evaluator and the notification agent, and in Liberty coordination for the notification agent was further improved. MOS should therefore support these capabilities. This requires support for the tooz library, which is responsible for coordination between services and supports several backends, including ZooKeeper, Redis and memcached. Redis was chosen as the tooz backend in MOS.
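The core idea of workload partitioning can be illustrated with a consistent hash ring: every agent in a coordination group builds the same ring from the current member list and keeps only the items (for example, polling resources) that hash to itself, so the subsets are disjoint without agents talking to each other directly. The Python below is a minimal sketch of that idea, loosely following the approach of Ceilometer's partition coordinator; `HashRing`, `extract_my_subset`, the replica count and all member and resource names are hypothetical and are not the tooz or Ceilometer API.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring mapping string keys to member ids (illustrative)."""

    def __init__(self, members, replicas=64):
        self._ring = {}
        for member in members:
            # Several virtual points per member smooth out the key distribution.
            for i in range(replicas):
                self._ring[self._hash(f"{member}#{i}")] = member
        self._points = sorted(self._ring)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def get_node(self, key):
        # Owner is the first ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[self._points[idx]]


def extract_my_subset(my_id, members, items):
    """Return the items that `my_id` is responsible for (hypothetical helper)."""
    if not members:
        return []
    ring = HashRing(members)
    return [item for item in items if ring.get_node(str(item)) == my_id]


if __name__ == "__main__":
    agents = ["agent-1", "agent-2", "agent-3"]
    resources = [f"resource-{n}" for n in range(10)]
    for agent in agents:
        print(agent, extract_my_subset(agent, agents, resources))
```

Because the ring depends only on the member list, agents that see the same membership agree on ownership, and when a member disappears only the keys that hashed to it move to the survivors.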

Proposed change

Support for Ceilometer service coordination is an experimental feature, so it was decided to implement it as a Fuel plugin.

Its implementation requires the following things to be done:

* Implement Redis installation on controller nodes in HA mode
* Prepare Redis packages and their dependencies
* Enable partitioning in config for Ceilometer central agents
* Enable partitioning in config for Ceilometer alarm evaluator
* Enable partitioning in config for Ceilometer notification agents
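For the configuration items above, the relevant Ceilometer options are assumed (based on the Kilo/Liberty option names) to be `backend_url` in the `[coordination]` section, which makes the partitioned agents join tooz groups, and `workload_partitioning` in the `[notification]` section. In the plugin these would be set by Puppet; the Python sketch below only renders them with `configparser` to show their shape. The Redis Sentinel query parameters in the URL (`sentinel`, `sentinel_fallback`), the master name `mymaster` and all addresses are assumptions to be checked against the tooz version shipped in MOS.

```python
import configparser
import io


def render_partitioning_config(backend_url):
    """Render the ceilometer.conf options assumed to enable workload partitioning."""
    # interpolation=None keeps any '%' in the URL (e.g. a password) from being parsed.
    config = configparser.ConfigParser(interpolation=None)
    # Agents join tooz coordination groups through this backend (assumed option name).
    config["coordination"] = {"backend_url": backend_url}
    # The notification agent additionally needs an explicit switch (assumed option name).
    config["notification"] = {"workload_partitioning": "True"}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical controller addresses and Sentinel master name.
    url = ("redis://192.168.0.2:26379?sentinel=mymaster"
           "&sentinel_fallback=192.168.0.3:26379"
           "&sentinel_fallback=192.168.0.4:26379")
    print(render_partitioning_config(url))
```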

The installation diagram for the central agent is shown below. The diagrams for the alarm evaluator and the notification agent are similar.

+---------------------+
|                     |
|  +---------------+  |
|  |  ceilometer   +-------------------------+
|  | central agent |  |                      |
|  +---------------+  |                      |
|                     |                      |
|  Primary controller |                      |
|                     |                      |
|  +---------------+  |                      |
|  |     redis     <------------------------------+
|  |     master    |  |                      |    |
|  +---------------+  |                      |    |
|                     |                      |    |
+---------------------+                      |    |
                                             |    |
+---------------------+                      |    |
|                     |                      |    |
|  +---------------+  |                      |    |
|  |  ceilometer   +-------------------------+    |
|  | central agent |  |                      |    |
|  +---------------+  |                      |    |
|                     |               +------v----+--+
|     controller 1    |               |              |
|                     |               | Coordination |
|  +---------------+  |               |              |
|  |     redis     |  |               +------^----+--+
|  |     slave1    |  |                      |    |
|  |               <------------------------------+
|  +---------------+  |                      |    |
|                     |                      |    |
+---------------------+                      |    |
                                             |    |
+---------------------+                      |    |
|                     |                      |    |
|  +---------------+  |                      |    |
|  |  ceilometer   +-------------------------+    |
|  | central agent |  |                           |
|  +---------------+  |                           |
|                     |                           |
|     controller 2    |                           |
|                     |                           |
|  +---------------+  |                           |
|  |     redis     |  |                           |
|  |     slave2    <------------------------------+
|  |               |  |
|  +---------------+  |
|                     |
+---------------------+
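In the diagram, the Coordination box is where agent failures become visible: each agent keeps its group membership alive by heartbeating, and a member whose last heartbeat is older than a timeout is no longer counted, so the next partitioning round runs over the survivors only. The sketch below models this with a hypothetical in-memory `InMemoryGroup` class and explicit timestamps; in a real deployment this bookkeeping is done by tooz in Redis, and the 30-second timeout is an arbitrary example value.

```python
class InMemoryGroup:
    """Hypothetical in-memory stand-in for a coordination group stored in Redis."""

    def __init__(self, member_timeout):
        # Seconds without a heartbeat before a member is considered dead.
        self.member_timeout = member_timeout
        self._last_seen = {}

    def heartbeat(self, member_id, now):
        # Joining the group and heartbeating both just refresh the timestamp.
        self._last_seen[member_id] = now

    def get_members(self, now):
        # Live members in a stable order, so every agent builds the same partitioning.
        return sorted(m for m, seen in self._last_seen.items()
                      if now - seen <= self.member_timeout)


if __name__ == "__main__":
    group = InMemoryGroup(member_timeout=30)
    for agent in ("agent-1", "agent-2", "agent-3"):
        group.heartbeat(agent, now=0)
    # agent-3 (say, on a failed controller) stops heartbeating; the others keep going.
    group.heartbeat("agent-1", now=25)
    group.heartbeat("agent-2", now=25)
    print(group.get_members(now=40))  # ['agent-1', 'agent-2']
```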

Alternatives

We could use MQ queues to distribute tasks between the services. However, MQ is currently one of the weakest points in OpenStack, so it may not be safe to make it responsible for HA and coordination.

Data model impact

None

REST API impact

None

Upgrade impact

The following changes will be needed in the Puppet manifests:

Add a Redis module
Configure Ceilometer agents to be partitioned

The following change will be needed in packages:

Use upstream Redis packages and their dependencies

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

Performance is expected to improve because the same amount of work will be distributed across several workers.

Other deployer impact

The plugin can be installed only in HA mode with Ceilometer enabled.

Developer impact

None

Implementation

Assignee(s)

Primary assignee: Ivan Berezovskiy
Other contributors: Nadya Shakhat, Ilya Tyaptin, Igor Degtiarov
Reviewers: Vladimir Kuklin, Sergii Golovatiuk
QA: Vitaly Gusev

Work Items

Implement Redis installation via Puppet (iberezovskiy)
Configure the Ceilometer central agent (iberezovskiy)
Configure the alarm evaluator (Nadya Shakhat)
Configure the notification agents (Nadya Shakhat)
Write documentation (Nadya Shakhat)

Dependencies

None

Testing

General testing approach:

An environment with Ceilometer in HA mode should be deployed successfully
The Redis cluster should consist of one master and two slaves
Ensure that after the node hosting the Redis master fails, Ceilometer services continue to work with the new Redis master

Testing approach for central agent:

Ceilometer should collect all enabled polling meters for the deployed environment
Ensure that the sets of meters polled by each central agent are disjoint
Ensure that after one central agent fails, all meters are redistributed between the two remaining agents during the next polling cycle and all meters are still collected
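The disjointness and failover checks above can be expressed along these lines. The sketch assigns meters with rendezvous (highest random weight) hashing, used here as a compact stand-in for a hash ring rather than as Ceilometer's actual algorithm, and verifies that the per-agent sets are disjoint and together cover every meter, both with three agents and after one agent is removed. Meter and agent names are hypothetical; in a real test the per-agent sets would be gathered from the running agents instead of being recomputed.

```python
import hashlib
from itertools import combinations


def owner(item, members):
    """Rendezvous hashing: the member with the highest score for this item owns it."""
    return max(members, key=lambda m: hashlib.sha256(f"{m}:{item}".encode()).hexdigest())


def partition(items, members):
    """Map every member to the set of items it would poll."""
    subsets = {m: set() for m in members}
    for item in items:
        subsets[owner(item, members)].add(item)
    return subsets


def assert_valid_partition(items, subsets):
    # Disjoint: no meter is polled by two agents.
    for a, b in combinations(subsets.values(), 2):
        assert not a & b, "overlapping subsets"
    # Complete: every meter is polled by some agent.
    assert set().union(*subsets.values()) == set(items), "meters left unpolled"


if __name__ == "__main__":
    meters = [f"meter-{n}" for n in range(20)]
    agents = ["agent-1", "agent-2", "agent-3"]
    assert_valid_partition(meters, partition(meters, agents))
    # Simulate agent-3 failing: the next cycle partitions over the two survivors.
    survivors = partition(meters, agents[:2])
    assert_valid_partition(meters, survivors)
    print({agent: len(s) for agent, s in survivors.items()})
```

A useful side property of rendezvous hashing is that when an agent leaves, meters owned by the surviving agents stay where they were; only the failed agent's meters move.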

Testing approach for alarm evaluator:

Ensure that alarms can be created successfully
Ensure that after one alarm evaluator fails, all alarms are redistributed between the two remaining evaluators during the next evaluation cycle and all alarms are still evaluated successfully
Ensure that the sets of alarms assigned to each alarm evaluator are disjoint

Testing approach for notification agent:

Ensure that messages do not get stuck in the notifications.info queue
Ensure that IPC queues are created in MQ, that the list of IPC queues corresponds to pipeline.yaml, and that each queue has exactly one consumer
Ensure that after one notification agent fails, its IPC queues are reassigned to the two remaining agents and all notifications are still processed
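The IPC queue check can be scripted against the output of `rabbitmqctl list_queues name consumers`, which prints one tab-separated line per queue with its consumer count. The sketch below parses that output and reports missing queues, unexpected ones, and queues that do not have exactly one consumer. The queue naming scheme (`ceilometer-pipe-<pipeline>-<index>`), the pipeline names and the number of queues per pipeline are hypothetical placeholders; the real names depend on the Ceilometer release and on pipeline.yaml.

```python
IPC_PREFIX = "ceilometer-pipe-"  # hypothetical prefix for pipeline IPC queues


def parse_consumer_counts(listing):
    """Parse `rabbitmqctl list_queues name consumers` output into {queue: consumers}."""
    counts = {}
    for line in listing.splitlines():
        fields = line.split("\t")
        # Data lines are "<name>\t<consumers>"; banner and header lines are skipped.
        if len(fields) == 2 and fields[1].strip().isdigit():
            counts[fields[0].strip()] = int(fields[1])
    return counts


def expected_ipc_queues(pipeline_names, queues_per_pipeline):
    # Hypothetical naming: one queue per (pipeline, partition index).
    return {f"{IPC_PREFIX}{name}-{idx}"
            for name in pipeline_names for idx in range(queues_per_pipeline)}


def check_ipc_queues(listing, pipeline_names, queues_per_pipeline):
    """Return a list of problems; an empty list means the check passed."""
    counts = parse_consumer_counts(listing)
    expected = expected_ipc_queues(pipeline_names, queues_per_pipeline)
    problems = []
    for queue in sorted(expected):
        if queue not in counts:
            problems.append(f"missing IPC queue {queue}")
        elif counts[queue] != 1:
            problems.append(f"{queue} has {counts[queue]} consumers, expected 1")
    # IPC queues that do not correspond to any pipeline in pipeline.yaml.
    for queue in sorted(set(counts) - expected):
        if queue.startswith(IPC_PREFIX):
            problems.append(f"unexpected IPC queue {queue}")
    return problems


if __name__ == "__main__":
    sample = ("Listing queues ...\n"
              "notifications.info\t0\n"
              "ceilometer-pipe-meter_pipeline-0\t1\n"
              "ceilometer-pipe-meter_pipeline-1\t2\n")
    print(check_ipc_queues(sample, ["meter_pipeline"], 2))
    # ['ceilometer-pipe-meter_pipeline-1 has 2 consumers, expected 1']
```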

Documentation Impact

A Plugin Guide describing the Redis plugin installation should be created. A document describing Ceilometer HA and workload partitioning should also be written.

For validation and testing purposes, a test plan and a test report should be provided.

References

Central agent: https://github.com/openstack/ceilometer-specs/blob/master/specs/juno/central-agent-partitioning.rst
Notification agent: https://github.com/openstack/ceilometer-specs/blob/master/specs/kilo/notification-coordiation.rst
Notification agent cont.: https://github.com/openstack/ceilometer-specs/blob/master/specs/liberty/distributed-coordinated-notifications.rst