Fuel plugin for the Logging, Monitoring and Alerting collector

https://blueprints.launchpad.net/fuel/+spec/lma-collector-plugin

The LMA (Logging, Monitoring and Alerting) collector is a service running on each OpenStack node that collects metrics, logs and notifications. This data can be sent to Elasticsearch [1] and/or InfluxDB [2] backends for diagnostic, troubleshooting and alerting purposes.

Problem description

There is currently no comprehensive set of tools integrated with Fuel for monitoring, diagnosing and troubleshooting the deployed OpenStack environments.

The LMA collector aims to address the following use cases:

  • Send logs and notifications to Elasticsearch so operators can more easily troubleshoot issues.
  • Send metrics to InfluxDB so operators can monitor and diagnose the usage of resources. This will cover:
    • Operating system metrics (CPU, RAM, ...).
    • Service metrics (MySQL, RabbitMQ, ...).
    • OpenStack metrics (for instance, the number of free/used vCPUs).
    • Metrics extracted from logs and notifications (for instance, the HTTP response times).
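
As an illustration of the last use case, the sketch below derives an HTTP response time metric from a single log line. It is not taken from the collector itself: the log layout, the function name `extract_response_time_metric` and the shape of the returned dictionary are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical access-log layout, assumed for illustration only:
#   <timestamp> <service> "<method> <path>" <status> <response time in seconds>
LOG_PATTERN = re.compile(
    r'^(?P<timestamp>\S+) (?P<service>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) (?P<response_time>\d+(?:\.\d+)?)$'
)


def extract_response_time_metric(line: str) -> Optional[dict]:
    """Turn one log line into a metric data point, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        # Metric name and tag keys are illustrative, not the plugin's schema.
        "name": "http_response_time",
        "timestamp": match.group("timestamp"),
        "value": float(match.group("response_time")),
        "tags": {
            "service": match.group("service"),
            "method": match.group("method"),
            "status": int(match.group("status")),
        },
    }


if __name__ == "__main__":
    sample = '2015-01-20T10:15:30Z nova-api "GET /v2/servers" 200 0.153'
    print(extract_response_time_metric(sample))
```

A data point of this kind could then be forwarded to a time series backend, while the original log line is indexed separately for troubleshooting.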

Proposed change

Implement a Fuel plugin that will install and configure the LMA collector service on all the OpenStack nodes.

The LMA collector service is based on two open source tools:

  • collectd [3] for collecting the system and service metrics.
  • Heka [4] for collecting the logs and notifications and for sending the data to the storage backends.

Alternatives

The collector could have been implemented as part of Fuel core, but we decided to implement it as a plugin for several reasons:

  • Not all operators may want to deploy it.
  • Any additional functionality in Fuel core makes the project harder to test, which adds risk to the Fuel release.
  • Ideally, this effort may also be of interest for deployments that are not based on Fuel.

We could also have leveraged the Zabbix implementation available since Fuel 5.1, but Zabbix doesn't cover the same use cases:

  • It isn't a log management solution.
  • It isn't particularly suited for storing time series.

Data model impact

None

REST API impact

None

Upgrade impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

Since the collector service runs as a daemon on all the nodes, it will consume resources on each of them. However, the components it is built upon have a small footprint in terms of both CPU usage and memory (collectd is written in C and Heka is written in Go).

Other deployer impact

The deployer will have to run an Elasticsearch cluster and/or an InfluxDB cluster to store the collected data. Eventually, these requirements will be addressed by additional Fuel plugins once the custom role feature [5] becomes available.

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

Simon Pasquier <spasquier@mirantis.com> (feature lead, developer)

Other contributors:

  • Guillaume Thouvenin <gthouvenin@mirantis.com> (developer)
  • Swann Croiset <scroiset@mirantis.com> (developer)
  • Irina Povolotskaya <ipovolotskaya@mirantis.com> (tech writer)

Work Items

  • Implement the Fuel plugin.
  • Implement the Puppet manifests.
  • Test the plugin.
  • Write the documentation.

Dependencies

  • Fuel 6.0 and higher.

Testing

  • Prepare a test plan.
  • Test the plugin by deploying environments with all Fuel deployment modes.
  • Create integration tests with Elasticsearch and InfluxDB backends.

Documentation Impact

  • Deployment Guide (how to install the storage backends, how to prepare an environment for installation, how to install the plugin, how to deploy an OpenStack environment with the plugin).
  • User Guide (which features the plugin provides, how to use them in the deployed OpenStack environment).
  • Test Plan.
  • Test Report.

References


  1. http://www.elasticsearch.org/

  2. http://www.influxdb.com/

  3. https://www.collectd.org/

  4. http://hekad.readthedocs.org/

  5. https://blueprints.launchpad.net/fuel/+spec/role-as-a-plugin