Simon Pasquier 51f593692f s/ElasticSearch/Elasticsearch/ where appropriate

Unlike OpenStack, there is no capital 'S' in Elasticsearch.

Change-Id: I6bd00983d2677a57c0ea080b2fd8226cef56f88f

2015-04-23 15:50:06 +02:00


Fuel plugin for the Logging, Monitoring and Alerting collector

https://blueprints.launchpad.net/fuel/+spec/lma-collector-plugin

The LMA (Logging, Monitoring & Alerting) collector is a service running on each OpenStack node that collects metrics, logs and notifications. This data can be sent to Elasticsearch¹ and/or InfluxDB² backends for diagnostic, troubleshooting and alerting purposes.

Problem description

There is currently no comprehensive set of tools integrated with Fuel for monitoring, diagnosing and troubleshooting the deployed OpenStack environments.

The LMA collector aims to address the following use cases:

Send logs and notifications to Elasticsearch so operators can more easily troubleshoot issues.
Send metrics to InfluxDB so operators can monitor and diagnose the usage of resources. This will cover:
- Operating system metrics (CPU, RAM, ...).
- Service metrics (MySQL, RabbitMQ, ...).
- OpenStack metrics (for instance, the number of free/used vCPUs).
- Metrics extracted from logs and notifications (for instance, the HTTP response times).
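
To illustrate the last item, here is a minimal Python sketch of deriving a metric from a log line. The log format, the regular expression and the names `ACCESS_LOG_RE` and `extract_response_time` are assumptions made for this example only. They do not describe the collector's actual implementation, which relies on Heka.

```python
import re

# Hypothetical access-log format, assumed for illustration only:
#   "<METHOD> <path>" status: <code> time: <seconds>
ACCESS_LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)" status: (?P<status>\d{3}) '
    r'time: (?P<seconds>\d+\.\d+)'
)


def extract_response_time(line):
    """Return a metric dict built from one log line, or None if it does not match."""
    match = ACCESS_LOG_RE.search(line)
    if match is None:
        return None
    return {
        "name": "http_response_time",  # hypothetical metric name
        "value": float(match.group("seconds")),
        "tags": {
            "method": match.group("method"),
            "status": match.group("status"),
        },
    }


sample = '"GET /v2/servers" status: 200 time: 0.125'
metric = extract_response_time(sample)
```

Lines that do not match the assumed format yield `None`, so they can still be forwarded as plain logs without producing a metric.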

Proposed change

Implement a Fuel plugin that will install and configure the LMA collector service on all the OpenStack nodes.

The LMA collector service is based on two open source tools:

collectd³ for collecting the system and service metrics.
Heka⁴ for collecting the logs and notifications and for sending the data to the storage backends.
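
The split between the two backends can be sketched in a few lines of Python. This is only an illustration of the routing idea described in this spec (logs and notifications go to Elasticsearch, metrics go to InfluxDB). The message layout, the `ROUTES` table and the `route` function are hypothetical and are not part of Heka or collectd.

```python
# Hypothetical message types and backend names, assumed for illustration.
ROUTES = {
    "log": "elasticsearch",
    "notification": "elasticsearch",
    "metric": "influxdb",
}


def route(message):
    """Return the name of the backend a message would be sent to."""
    try:
        return ROUTES[message["type"]]
    except KeyError:
        # Raised for a missing "type" key as well as for an unknown type.
        raise ValueError("unsupported message: %r" % (message,))


backends = [route({"type": t}) for t in ("log", "notification", "metric")]
```

Keeping the mapping in a single table makes it easy to see that either backend can be left out, which matches the "and/or" wording used for the storage backends.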

Alternatives

The collector could have been implemented as part of Fuel core, but we decided to deliver it as a plugin for several reasons:

Not all operators will want to deploy it.
Any new additional functionality makes the project's testing more difficult, which is an additional risk for the Fuel release.
Ideally, this effort may also be of interest for deployments that are not based on Fuel.

We could also have leveraged the Zabbix implementation available since Fuel 5.1, but Zabbix doesn't cover the same use cases:

It isn't a log management solution.
It isn't particularly suited for storing time series.

Data model impact

None

REST API impact

None

Upgrade impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

Since the collector service runs as a daemon on all the nodes, it will consume resources on those nodes. However, the components it is built upon have a small footprint in terms of both CPU usage and memory (collectd is written in C while Heka is written in Go).

Other deployer impact

The deployer will have to run an Elasticsearch cluster and/or an InfluxDB cluster to store the collected data. Eventually, these requirements will be addressed by additional Fuel plugins once the custom role feature⁵ becomes available.

Developer impact

None

Implementation

Assignee(s)

Primary assignee: Simon Pasquier <spasquier@mirantis.com> (feature lead, developer)
Other contributors:
- Guillaume Thouvenin <gthouvenin@mirantis.com> (developer)
- Swann Croiset <scroiset@mirantis.com> (developer)
- Irina Povolotskaya <ipovolotskaya@mirantis.com> (tech writer)

Work Items

Implement the Fuel plugin.
Implement the Puppet manifests.
Testing.
Write the documentation.

Dependencies

Fuel 6.0 and higher.

Testing

Prepare a test plan.
Test the plugin by deploying environments with all Fuel deployment modes.
Create integration tests with Elasticsearch and InfluxDB backends.

Documentation Impact

Deployment Guide (how to install the storage backends, how to prepare an environment for installation, how to install the plugin, how to deploy an OpenStack environment with the plugin).
User Guide (which features the plugin provides, how to use them in the deployed OpenStack environment).
Test Plan.
Test Report.
