Added overview content in Readme.md

Change-Id: I07f36114cab65ec99c213362b557427e3f017b1f
Patrick Petit 2015-10-07 14:11:09 +02:00
parent 1c43773cd6
commit 1b0990bf32
1 changed files with 56 additions and 13 deletions


@@ -4,18 +4,58 @@ Logging, Monitoring and Alerting (LMA) Collector Plugin for Fuel
Overview
--------
The Logging, Monitoring & Alerting (LMA) *Collector* is an advanced
monitoring agent that should be installed on each of the OpenStack nodes
you want to monitor.
The Collector is a key component of the LMA Toolchain since it is
individually responsible for supporting all the sensing, measurement,
collection, analysis and computation functions for the node it is running on.
The Logging, Monitoring & Alerting (LMA) collector is a service running on each
OpenStack node that collects logs, OpenStack notifications and metrics. It is
also able to detect anomalous events and generate alerts to external monitoring
systems.
A wealth of operational data is collected from a variety of sources, including
log files, collectd, and RabbitMQ for the OpenStack notifications.
The Collector that runs on the active controller of the control plane cluster is
called the *Aggregator* because it performs additional aggregation and
multivariate correlation functions to compute service health metrics at
the cluster level.
An important function of the Collector is to sanitize and transform the ingested
raw operational data into internal messages that use the Heka
message structure. This structure is used to match, filter and route certain
types of messages to plugins written in Lua, which perform the analysis and
computation functions of the toolchain.
* Logs and notifications are sent to an Elasticsearch server for diagnostic,
troubleshooting and alerting purposes.
* Metrics are sent to an InfluxDB server for usage and performance analysis as
well as alerting purposes.
* Alerts are sent to a Nagios server or directly to a SMTP server.
Its main building blocks are:
* collectd which is bundled with a collection of standard and purpose-built
plugins for OpenStack.
* Heka which is the Swiss Army knife we use for data processing.
* A collection of Heka plugins written in Lua.
There are three types of Lua plugins running in the LMA Collector / Aggregator:
* The input plugins to collect, decode, and sanitize the operational data that
are transformed into internal messages which in turn are injected into the
Heka pipeline.
* The filter plugins to execute the alarms, the anomaly detection logic
and the correlation functions.
* The output plugins to encode and transmit the messages to external systems like
Elasticsearch, InfluxDB or Nagios where the information is persisted or further processed.
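
The three plugin stages above can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions: the real plugins are written in Lua and run inside Heka, and the `Message` class, the `name=value` sample format, the `cpu_usage` metric name and the 90% threshold are all hypothetical names chosen for the example, not part of the plugin.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Message:
    """Simplified stand-in for an internal message (hypothetical field subset)."""
    type: str
    hostname: str
    fields: Dict[str, float] = field(default_factory=dict)


def decode_metric(raw: str, hostname: str) -> Optional[Message]:
    """Input stage: decode a 'name=value' sample and drop malformed ones."""
    name, sep, value = raw.partition("=")
    if not sep or not name.strip():
        return None
    try:
        return Message("metric", hostname, {name.strip(): float(value)})
    except ValueError:
        return None


def cpu_alarm(msg: Message, threshold: float = 90.0) -> Optional[Message]:
    """Filter stage: emit an alert message when cpu_usage exceeds the threshold."""
    usage = msg.fields.get("cpu_usage")
    if msg.type == "metric" and usage is not None and usage > threshold:
        return Message("alert", msg.hostname, {"cpu_usage": usage})
    return None


def run_pipeline(raw_samples: Iterable[str], hostname: str) -> List[Message]:
    """Run input -> filter -> output; the output stage here only collects messages."""
    emitted: List[Message] = []
    for raw in raw_samples:
        msg = decode_metric(raw, hostname)
        if msg is None:
            continue  # sanitized away by the input stage
        emitted.append(msg)
        alert = cpu_alarm(msg)
        if alert is not None:
            emitted.append(alert)
    return emitted
```

For example, `run_pipeline(["cpu_usage=95.5", "garbage", "cpu_usage=10"], "node-1")` would discard the malformed sample in the input stage and let the filter stage add one alert message after the first metric.
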
The output of the Collector / Aggregator is of four kinds:
* The logs and notifications that are sent to Elasticsearch for indexing.
Elasticsearch combined with Kibana provides insightful log analytics dashboards.
* The metrics which are sent to InfluxDB.
InfluxDB combined with Grafana provides insightful time-series analytics dashboards.
* The health status checks that are sent to Nagios (or through SMTP) for all the OpenStack
services and clusters of nodes.
* The annotation messages that are sent to InfluxDB. The annotation messages contain
information about what caused a cluster of services or a cluster of nodes to change state.
The annotation messages provide root cause analysis hints whenever possible.
The annotation messages are also used to construct the alert notifications sent via SMTP.
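
The four kinds of output listed above amount to a routing decision per message type. The following Python sketch shows one way to express that mapping; it is an assumption-laden illustration, not the plugin's configuration. The type names (`log`, `notification`, `metric`, `status`, `annotation`), the backend labels and the `alerting_mode` parameter are hypothetical.

```python
from typing import List

# Hypothetical labels for the two alerting modes (Nagios integration or email).
ALERTING_MODES = ("nagios", "smtp")


def destinations(message_type: str, alerting_mode: str = "nagios") -> List[str]:
    """Return the external systems a message of the given type would be sent to."""
    if alerting_mode not in ALERTING_MODES:
        raise ValueError(f"unknown alerting mode: {alerting_mode!r}")
    routes = {
        "log": ["elasticsearch"],           # indexed for log analytics
        "notification": ["elasticsearch"],  # OpenStack notifications
        "metric": ["influxdb"],             # time-series analytics
        "status": [alerting_mode],          # health status checks
        "annotation": ["influxdb"],         # state-change annotations
    }
    # Unknown message types are not routed anywhere in this sketch.
    return routes.get(message_type, [])
```

With these assumptions, `destinations("status", alerting_mode="smtp")` selects email delivery for health status checks, while logs and notifications always go to Elasticsearch.
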
Please check the [LMA Collector Plugin for Fuel](http://fuel-plugin-lma-collector.readthedocs.org/en/latest/index.html)
documentation for additional details.
Requirements
------------
@@ -25,7 +65,7 @@ Requirements
| -------------------------------------------------------- | --------------------------------------------------------------- |
| Mirantis OpenStack compatibility | 6.1 or higher |
| A running Elasticsearch server<br>(for log analytics) | 1.4 or higher, the RESTful API must be enabled over port 9200 |
| A running InfluxDB server<br>(for metric analytics) | 0.9.2 or higher, the RESTful API must be enabled over port 8086 |
| A running InfluxDB server<br>(for metric analytics) | 0.9.4 or higher, the RESTful API must be enabled over port 8086 |
| A running Nagios server<br>(for infrastructure alerting) | 3.5 or higher, the command CGI must be enabled |
@@ -197,13 +237,15 @@ Release Notes
**0.8.0**
* Support for alerting with 2 modes:
* Support for alerting in two different modes:
* Email notifications.
* Integration with Nagios.
* Support of InfluxDB 0.9.2 and higher.
* Upgrade to InfluxDB 0.9.4.
* Upgrade to Grafana 2.1.
* Management of the LMA collector service by Pacemaker on the controller nodes
for improved reliability.
* Monitoring of the LMA toolchain components.
* Monitoring of the LMA toolchain components (self-monitoring).
* Support for configurable alarm rules in the Collector.
**0.7.0**
@@ -250,5 +292,6 @@ Contributors
------------
* Guillaume Thouvenin <gthouvenin@mirantis.com>
* Patrick Petit <ppetit@mirantis.com>
* Simon Pasquier <spasquier@mirantis.com>
* Swann Croiset <scroiset@mirantis.com>