Fuel plugin for the Logging, Monitoring and Alerting collector
https://blueprints.launchpad.net/fuel/+spec/lma-collector-plugin
The LMA (Logging, Monitoring & Alerting) collector is a service running on each OpenStack node that collects metrics, logs and notifications. This data can be sent to Elasticsearch [1] and/or InfluxDB [2] backends for diagnostic, troubleshooting and alerting purposes.
Problem description
There is currently no comprehensive set of tools integrated with Fuel for monitoring, diagnosing and troubleshooting the deployed OpenStack environments.
The LMA collector aims at addressing the following use cases:
- Send logs and notifications to Elasticsearch so operators can more easily troubleshoot issues.
- Send metrics to InfluxDB so operators can monitor and diagnose the usage of resources. This will cover:
  - Operating system metrics (CPU, RAM, ...).
  - Service metrics (MySQL, RabbitMQ, ...).
  - OpenStack metrics (for instance, the number of free/used vCPUs).
  - Metrics extracted from logs and notifications (for instance, the HTTP response times).
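To illustrate the last item, the following is a minimal Python sketch of how an HTTP response time could be turned into a metric from a single log line. It is not the collector's implementation: the log format, the metric name `http_response_time`, the tag names and the `extract_http_metric` helper are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical log format, loosely modelled on an HTTP access log line.
# The real OpenStack log formats differ per service and are not given here.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'status: (?P<status>\d{3}) len: \d+ time: (?P<seconds>\d+\.\d+)'
)


def extract_http_metric(line: str, hostname: str) -> Optional[dict]:
    """Turn one log line into a metric record, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "name": "http_response_time",  # hypothetical metric name
        "value": float(match.group("seconds")),
        "tags": {
            "hostname": hostname,
            "http_method": match.group("method"),
            "http_status": match.group("status"),
        },
    }


# Made-up sample line that follows the assumed format above.
sample = '10.0.0.3 "GET /v2/servers HTTP/1.1" status: 200 len: 1583 time: 0.1275'
metric = extract_http_metric(sample, "node-1")
```

Lines that do not match the pattern yield `None`, so they can still be forwarded as plain logs without producing a metric.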
Proposed change
Implement a Fuel plugin that will install and configure the LMA collector service on all the OpenStack nodes.
The LMA collector service is based on two open source tools:
- collectd [3] for collecting the system and service metrics.
- Heka [4] for collecting the logs and notifications and for sending the data to the storage backends.
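The resulting data flow can be sketched as a simple routing step: logs and notifications go to Elasticsearch, metrics go to InfluxDB, and only the backends that the deployer has enabled receive data. The Python below is an illustration under those assumptions only. The `Message` type, the `ROUTES` table and the `route` function are hypothetical names and do not reflect Heka's actual configuration or API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Assumed mapping from message kind to storage backend, taken from the
# use cases listed in this spec.
ROUTES = {
    "log": "elasticsearch",
    "notification": "elasticsearch",
    "metric": "influxdb",
}


@dataclass
class Message:
    kind: str      # "log", "notification" or "metric"
    payload: dict


def route(messages: Iterable[Message],
          enabled_backends: Set[str]) -> Dict[str, List[Message]]:
    """Group messages per backend, dropping those whose backend is disabled."""
    batches: Dict[str, List[Message]] = {name: [] for name in enabled_backends}
    for message in messages:
        backend = ROUTES.get(message.kind)
        if backend in batches:
            batches[backend].append(message)
    return batches


messages = [
    Message("log", {"severity": "ERROR", "text": "connection refused"}),
    Message("notification", {"event_type": "compute.instance.create.end"}),
    Message("metric", {"name": "cpu_idle", "value": 93.5}),
]

# Only Elasticsearch is enabled here, so the metric is not forwarded.
batches = route(messages, {"elasticsearch"})
```

Keeping the routing table separate from the collection logic mirrors the "and/or" wording above: enabling or disabling a backend does not change how data is collected.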
Alternatives
The collector could have been implemented as part of Fuel core, but we decided to implement it as a plugin for several reasons:
- Not all operators will want to deploy it.
- Any additional functionality makes the project's testing more difficult, which adds risk to the Fuel release.
- Ideally, this effort may also be of interest for deployments that are not based on Fuel.
We could also have leveraged the Zabbix implementation that has been available since Fuel 5.1, but Zabbix doesn't cover the same use cases:
- It isn't a log management solution.
- It isn't particularly suited for storing time series.
Data model impact
None
REST API impact
None
Upgrade impact
None
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
Since the collector service runs as a daemon on every node, it will consume resources on each of them. However, the components it is built upon have a small footprint in terms of both CPU usage and memory (collectd is written in C while Heka is written in Go).
Other deployer impact
The deployer will have to run an Elasticsearch cluster and/or an InfluxDB cluster to store the collected data. Eventually, these requirements will be addressed by additional Fuel plugins once the custom role feature [5] becomes available.
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
  - Simon Pasquier <spasquier@mirantis.com> (feature lead, developer)
- Other contributors:
  - Guillaume Thouvenin <gthouvenin@mirantis.com> (developer)
  - Swann Croiset <scroiset@mirantis.com> (developer)
  - Irina Povolotskaya <ipovolotskaya@mirantis.com> (tech writer)
Work Items
- Implement the Fuel plugin.
- Implement the Puppet manifests.
- Testing.
- Write the documentation.
Dependencies
- Fuel 6.0 or higher.
Testing
- Prepare a test plan.
- Test the plugin by deploying environments with all Fuel deployment modes.
- Create integration tests with Elasticsearch and InfluxDB backends.
Documentation Impact
- Deployment Guide (how to install the storage backends, how to prepare an environment for installation, how to install the plugin, how to deploy an OpenStack environment with the plugin).
- User Guide (which features the plugin provides, how to use them in the deployed OpenStack environment).
- Test Plan.
- Test Report.