Changed the title of the document to reflect that it covers the LMA Toolchain as a whole, not only the Collector.
Significantly improved the wording of the introduction.

Change-Id: I4f9c3dc344920ba59b9a765dd913af0183d5aab8
parent 9e6fd8a570
commit e7545b7ad3
|
@@ -1,56 +1,75 @@
|
||||||
===========================================
|
===============================================================
|
||||||
Welcome to the LMA Collector Documentation!
|
Welcome to the Mirantis OpenStack LMA Toolchain Documentation !
|
||||||
===========================================
|
===============================================================
|
||||||
|
|
||||||
The Logging, Monitoring and Alerting (LMA) Collector, that we will refer hereafter as the LMA Collector or just the Collector,
|
Introduction
|
||||||
is a **Fuel plugin** which gathers raw operational data from a variety of sources including log messages,
|
============
|
||||||
`collectd <https://collectd.org/>`_, and the `OpenStack notifications <https://wiki.openstack.org/wiki/SystemUsageData>`_
|
|
||||||
to be sent to external systems that will take action on them.
|
|
||||||
|
|
||||||
Overview
|
The Mirantis OpenStack LMA (Logging, Monitoring and Alerting) Toolchain is comprised
|
||||||
=========
|
of a collection of open-source tools to help you monitor and diagnose problems in your
|
||||||
|
OpenStack environment. These tools are packaged and delivered as `Fuel plugins
|
||||||
|
<https://wiki.openstack.org/wiki/Fuel/Plugins>`_ you can install from within the
|
||||||
|
graphic user interface of Fuel starting with Mirantis OpenStack version 6.1.
|
||||||
|
|
||||||
The goal of the LMA Collector is to capture all **raw operational data** that we think are relevant to **increase the operational visibility**
|
From a high level view, the LMA Toolchain includes:
|
||||||
of your OpenStack cloud.
|
|
||||||
|
|
||||||
To achieve that goal, the raw operational data are parsed and sanitised to be turned into an internal
|
* The LMA Collector (or just the Collector) to gather all operational data that we
|
||||||
`Heka <https://github.com/mozilla-services/heka>`_ message representation that can
|
think are relevant to increase the **operational visibility** over your OpenStack
|
||||||
be further processed and routed to external systems that will take action on them.
|
environment. Those data are collected from a variety of sources including the log messages,
|
||||||
Examples of external systems handled by the LMA Collector out-of-the-box include:
|
`collectd <https://collectd.org/>`_, and the `OpenStack notifications bus <https://wiki.openstack.org/wiki/SystemUsageData>`_
|
||||||
|
* Pluggable external systems we call **satellite clusters** which can take action on the
|
||||||
|
data received from the Collectors running on the OpenStack nodes.
|
||||||
|
|
||||||
* `ElasticSearch <http://www.elasticsearch.org/>`_, a powerful open source search server based on Lucene and analytics
|
The Collector is best described as a **pluggable message processing and routing pipeline**.
|
||||||
engine that makes data like log messages and notifications easy to explore and correlate.
|
Its core components are :
|
||||||
* `InfluxDB <http://influxdb.com/>`_, an open-source and distributed time-series database to store system metrics.
|
|
||||||
|
|
||||||
By combining the Collector with ElasticSearch and `Kibana <http://www.elasticsearch.org/overview/kibana/>`_,
|
* Collectd that is bundled with a collection of monitoring plugins. Many of them are purpose-built
|
||||||
the LMA Toolchain provides an end-to-end solution that delivers real-time insights about all events in your OpenStack cloud.
|
for OpenStack.
|
||||||
This can very useful to detect errors and search for their root cause.
|
* `Heka <https://github.com/mozilla-services/heka>`_ which is the cornerstone component
|
||||||
|
of the Collector.
|
||||||
|
* A collection of Heka plugins written in Lua to decode, process and encode the data to be sent
|
||||||
|
to external systems.
|
||||||
|
|
||||||
Likewise, combining the Collector with InfluxDB and its `Grafana’s <http://grafana.org/>`_ metrics analytics front-end,
|
The primary function of the Collector is to transform the acquired raw
|
||||||
allows you to identify service failures, troubleshoot performance bottlenecks and plan the capacity needed to meet changing demands
|
operational data into an internal message representation that is based on the
|
||||||
for your OpenStack cloud.
|
`Heka message structure <http://hekad.readthedocs.org/en/latest/message/index.html>`_.
|
||||||
|
that can be further exploited to, for example, detect anomalies or create
|
||||||
|
new metric messages.
|
||||||
|
|
||||||
The LMA Collector can be viewed as a **pluggable processing and routing pipeline** for operational data.
|
The satellite clusters delivered as part of the LMA Toolchain starting with Mirantis OpenStack 6.1 include:
|
||||||
Its core constituants are :
|
|
||||||
|
|
||||||
* Collectd that is provided with a large collection of service checks and system stats plugins
|
* `ElasticSearch <http://www.elasticsearch.org/>`_, a powerful open source search server based
|
||||||
* Heka is an open-source stream processing software written in Go developed by Mozilla.
|
on Lucene and analytics engine that makes data like log messages and notifications easy to explore and analyse.
|
||||||
Heka is the cornerstone component of the LMA Collector.
|
* `InfluxDB <http://influxdb.com/>`_, an open-source and distributed time-series database to store and search metrics.
|
||||||
* A collection of Heka plugins written in Lua to turn the raw operational data into structured
|
|
||||||
messages that can be further analyzed and routed by other Heka plugins.
|
|
||||||
|
|
||||||
Lastly, the LMA Collector is designed to be both insightful and adaptable to your own specific environment.
|
By combining ElasticSearch with `Kibana <http://www.elasticsearch.org/overview/kibana/>`_,
|
||||||
|
the LMA Toolchain provides an effective way to search and correlate all service-affecting events
|
||||||
|
that occurred in the system for root cause analysis.
|
||||||
|
|
||||||
For example, thanks to Heka's extensibility, it is quite easy to plug an external monitoring system like Nagios into the LMA Collector.
|
Likewise, by combining InfluxDB with `Grafana <http://grafana.org/>`_, the LMA Toolchain
|
||||||
This is simply done through enabling the Nagios output plugin and define the appropriate
|
brings you insightful metrics analytics to visualise how OpenStack behaves over time.
|
||||||
`message matcher <https://hekad.readthedocs.org/en/v0.9.0/message_matcher.html#message-matcher>`_ criteria
|
This includes metrics for the OpenStack services status and a variety of resource usage
|
||||||
for the category of messages you want to send out to Nagios. You should obviously not do that through hacking the
|
and performance indicators. The ability to visualise time-series over a period of time that
|
||||||
configuration of the nodes running production but through modifying and reapplying the Puppet manifests that shipped with the Fuel plugin.
|
can vary from 5 minutes to the last 30 days helps anticipating failure conditions and plan
|
||||||
We also encourage you to read the Heka `documentation <https://hekad.readthedocs.org/en/v0.9.0/index.html>`_ to get familiar with the technology.
|
capacity ahead of time to cope with a changing demand.
|
||||||
|
|
||||||
The rest of this documents is organised in several chapters that will take you through a description of the internal message
|
Furthermore, the LMA Toolchain has been designed with the dual objective to be both insightful and adaptive.
|
||||||
format used for each category of operational data that are handled by the Collector.
|
|
||||||
|
|
||||||
|
It is, for example, quite possible (without any code change) to integrate the Collector
|
||||||
|
with an external monitoring application like Nagios. This could simply be done through enabling
|
||||||
|
the Nagios output plugin of Heka for a subset of messages matching the
|
||||||
|
`message matcher <https://hekad.readthedocs.org/en/latest/message_matcher.html#message-matcher>`_
|
||||||
|
syntax of the output plugin. You should probably not modify the configuration of the LMA
|
||||||
|
Collector manually but apply any configuration change to the Puppet manifests that are shipped
|
||||||
|
with the LMA Collector plugin for Fuel. Many other integration combinations are possible thanks
|
||||||
|
to the extreme flexibility of Heka.
|
||||||
|
|
||||||
|
We recommend you to read the Heka `documentation <https://hekad.readthedocs.org/en/latest/index.html>`_
|
||||||
|
to become more familiar with that technology.
|
||||||
|
|
||||||
|
The rest of this document is organised in several chapters that will take you through a
|
||||||
|
description of the internal message structure for the categories of operational data
|
||||||
|
that are handled by the LMA Toolchain.
|
||||||
|
|
||||||
Table of Contents
|
Table of Contents
|
||||||
=================
|
=================
|
||||||
|
|
|
@@ -24,7 +24,7 @@ attributes:
|
||||||
|
|
||||||
elasticsearch_node_name:
|
elasticsearch_node_name:
|
||||||
value: 'elasticsearch'
|
value: 'elasticsearch'
|
||||||
label: "ElasticSearch node's name"
|
label: "ElasticSearch node name"
|
||||||
description: 'Label of the node running the ElasticSearch/Kibana plugin that is deployed in the environment.'
|
description: 'Label of the node running the ElasticSearch/Kibana plugin that is deployed in the environment.'
|
||||||
weight: 30
|
weight: 30
|
||||||
type: "text"
|
type: "text"
|
||||||
|
@@ -71,7 +71,7 @@ attributes:
|
||||||
|
|
||||||
influxdb_node_name:
|
influxdb_node_name:
|
||||||
value: 'influxdb'
|
value: 'influxdb'
|
||||||
label: "InfluxDB node's name"
|
label: "InfluxDB node name"
|
||||||
description: 'Label of the node running the InfluxDB/Grafana plugin that is deployed in the environment.'
|
description: 'Label of the node running the InfluxDB/Grafana plugin that is deployed in the environment.'
|
||||||
weight: 65
|
weight: 65
|
||||||
type: "text"
|
type: "text"
|
||||||
|
|
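
The introduction rewritten in this commit describes the Collector as a pluggable message processing and routing pipeline, in which an output such as Nagios receives only the subset of messages selected by its message matcher. A minimal Python sketch of that routing idea follows. The `Message`, `Output` and `route` names, the message types and the example matchers are hypothetical illustrations chosen for this sketch; they are not Heka's API, its message schema or its matcher syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """Simplified stand-in for a structured message (hypothetical schema)."""
    type: str
    payload: str = ""
    fields: Dict[str, object] = field(default_factory=dict)


@dataclass
class Output:
    """An output whose predicate plays the role of a message matcher."""
    name: str
    matcher: Callable[[Message], bool]
    received: List[Message] = field(default_factory=list)


def route(message: Message, outputs: List[Output]) -> List[str]:
    """Deliver the message to every output whose matcher accepts it."""
    delivered = []
    for output in outputs:
        if output.matcher(message):
            output.received.append(message)
            delivered.append(output.name)
    return delivered


# Assumed routing rules: logs and notifications go to the search backend,
# metrics go to the time-series backend, and only service status metrics
# are additionally forwarded to the external monitoring system.
outputs = [
    Output("elasticsearch", lambda m: m.type in ("log", "notification")),
    Output("influxdb", lambda m: m.type == "metric"),
    Output(
        "nagios",
        lambda m: m.type == "metric" and m.fields.get("name") == "service_status",
    ),
]

log_routes = route(Message(type="log", payload="nova-api started"), outputs)
status_routes = route(
    Message(type="metric", fields={"name": "service_status", "value": 0}),
    outputs,
)
```

In this sketch, adding a new destination only means appending another `Output` with its own predicate, which mirrors the "without any code change" integration the introduction describes for the Nagios output plugin.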