fuel-plugin-lma-collector/specs/lma-collector-plugin-spec.rst

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==============================================================
Fuel plugin for the Logging, Monitoring and Alerting collector
==============================================================

https://blueprints.launchpad.net/fuel/+spec/lma-collector-plugin

The LMA (Logging, Monitoring & Alerting) collector is a service running on each
OpenStack node that collects metrics, logs and notifications. This data can be
sent to Elasticsearch [#]_ and/or InfluxDB [#]_ backends for diagnostic,
troubleshooting and alerting purposes.

Problem description
===================

There is currently no comprehensive set of tools integrated with Fuel for
monitoring, diagnosing and troubleshooting the deployed OpenStack environments.

The LMA collector aims at addressing the following use cases:

* Send logs and notifications to Elasticsearch so operators can more easily
  troubleshoot issues.

* Send metrics to InfluxDB so operators can monitor and diagnose the usage
  of resources. This will cover:

  + Operating system metrics (CPU, RAM, ...).

  + Service metrics (MySQL, RabbitMQ, ...).

  + OpenStack metrics (for instance, the number of free/used vCPUs).

  + Metrics extracted from logs and notifications (for instance, the HTTP
    response times).

Proposed change
===============

Implement a Fuel plugin that will install and configure the LMA collector
service on all the OpenStack nodes.

The LMA collector service is based on 2 open source tools:

* collectd [#]_ for collecting the system and service metrics.

* Heka [#]_ for collecting the logs and notifications and for sending the data
  to the storage backends.

Alternatives
------------

It might have been implemented as part of Fuel core but we decided to make it
as a plugin for several reasons:

* This isn't something that all operators may want to deploy.

* Any new additional functionality makes the project's testing more difficult,
  which is an additional risk for the Fuel release.

* Ideally, this effort may be of interest for non-Fuel based deployments, too.

We could also have leveraged the Zabbix implementation already available since
Fuel 5.1 but Zabbix doesn't cover the same use cases:

* It isn't a log management solution.

* It isn't particularly suited for storing timeseries.


Data model impact
-----------------

None

REST API impact
---------------

None

Upgrade impact
--------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Since the collector service runs as a daemon on all the nodes, it will consume
resources from the nodes. However the components it is built upon have a small
footprint both in terms of CPU usage and memory (collectd is written in C while
Heka is written in Go).

Other deployer impact
---------------------

The deployer will have to run an Elasticsearch cluster and/or an InfluxDB
cluster to store the collected data. Eventually, these requirements will be
addressed by additional Fuel plugins once the custom role feature [#]_ gets
available.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Simon Pasquier <spasquier@mirantis.com> (feature lead, developer)

Other contributors:
  Guillaume Thouvenin <gthouvenin@mirantis.com> (developer)
  Swann Croiset <scroiset@mirantis.com> (developer)
  Irina Povolotskaya <ipovolotskaya@mirantis.com> (tech writer)


Work Items
----------

* Implement the Fuel plugin.

* Implement the Puppet manifests.

* Testing.

* Write the documentation.

Dependencies
============

* Fuel 6.0 and higher.

Testing
=======

* Prepare a test plan.

* Test the plugin by deploying environments with all Fuel deployment modes.

* Create integration tests with Elasticsearch and InfluxDB backends.

Documentation Impact
====================

* Deployment Guide (how to install the storage backends, how to prepare an
  environment for installation, how to install the plugin, how to deploy an
  OpenStack environment with the plugin).

* User Guide (which features the plugin provides, how to use them in the
  deployed OpenStack environment).

* Test Plan.

* Test Report.

References
==========

.. [#] http://www.elasticsearch.org/

.. [#] http://www.influxdb.com/

.. [#] https://www.collectd.org/

.. [#] http://hekad.readthedocs.org/

.. [#] https://blueprints.launchpad.net/fuel/+spec/role-as-a-plugin