[docs] Edits the StackLight InfluxDB-Grafana plugin guide structure

Edits the table of contents and overall structure of the
StackLight InfluxDB-Grafana plugin for Fuel documentation.

Change-Id: Icc9d73fca27513e8336cca1486bfc83634b5d116
Maria Zlatkova
2016-07-20 17:31:01 +03:00
parent dc782a9106
commit ce43f9b5f6
17 changed files with 668 additions and 639 deletions

View File

@@ -6,7 +6,7 @@ source_suffix = '.rst'
master_doc = 'index'
project = u'The StackLight InfluxDB-Grafana plugin for Fuel'
copyright = u'2015, Mirantis Inc.'
copyright = u'2016, Mirantis Inc.'
version = '0.10'
release = '0.10.0'

View File

@@ -0,0 +1,135 @@
.. _plugin_configuration:
Plugin configuration
--------------------
To configure the **StackLight InfluxDB-Grafana Plugin**, you need to follow these steps:
1. `Create a new environment
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/create-environment/start-create-env.html>`_.
2. Click on the *Settings* tab of the Fuel web UI and select the *Other* category.
3. Scroll down through the settings until you find the **InfluxDB-Grafana Server
Plugin** section. You should see a page like this:
.. image:: ../images/influx_grafana_settings.png
:width: 800
4. Tick the **InfluxDB-Grafana Plugin** box and fill in the required fields as indicated below.
a. Specify the number of days of retention for your data.
b. Specify the InfluxDB admin password (called root password in the InfluxDB documentation).
c. Specify the database name (default is lma).
d. Specify the InfluxDB username and password.
e. Specify the Grafana username and password.
5. Since Grafana 2.6.0, the plugin uses a MySQL database
to store its configuration data, such as the dashboard templates.
a. Select **Local MySQL** if you want to create the Grafana database using the MySQL server
of the OpenStack control plane. Otherwise, select **Remote server** and specify
the fully qualified name or IP address of the MySQL server you want to use.
b. Then, specify the MySQL database name, username, and password that will be used
to access that database.
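If you point the plugin at a remote MySQL server, the Grafana database and user may need to exist beforehand (check your deployment workflow). The conventional MySQL statements to create them are sketched below; the database name, user, and password are placeholders, not values mandated by the plugin:

```sql
-- Placeholders only: substitute your own database name, user, and password.
CREATE DATABASE grafana;
CREATE USER 'grafana'@'%' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON grafana.* TO 'grafana'@'%';
FLUSH PRIVILEGES;
```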
6. Tick the *Enable TLS for Grafana* box if you want to encrypt your
Grafana credentials (username, password). Then, fill in the required
fields as indicated below.
.. image:: ../images/tls_settings.png
:width: 800
a. Specify the DNS name of the Grafana server. This parameter is used
to create a link in the Fuel dashboard to the Grafana server.
#. Specify the location of a PEM file that contains the certificate
and the private key of the Grafana server that will be used in TLS handshakes
with the client.
7. Tick the *Use LDAP for Grafana authentication* box if you want to authenticate
to Grafana via LDAP. Then, fill in the required fields as indicated below.
.. image:: ../images/ldap_auth.png
:width: 800
a. Select the *LDAPS* button if you want to enable LDAP authentication
over SSL.
#. Specify one or several LDAP server addresses separated by a space. Those
addresses must be accessible from the node where Grafana is installed.
Note that addresses external to the *management network* are not routable
by default (see the note below).
#. Specify the LDAP server port number or leave it empty to use the defaults.
#. Specify the *Bind DN* of a user who has search privileges on the LDAP server.
#. Specify the password of the user identified by the *Bind DN* above.
#. Specify the *Base DN* in the Directory Information Tree (DIT) from where
to search for users.
#. Specify a valid user search filter, for example ``(uid=%s)``.
The result of the search should return a unique user entry.
You can further restrict access to Grafana to those users who
are members of a specific LDAP group.
a. Tick the *Enable group-based authorization* box.
#. Specify the LDAP group *Base DN* in the DIT from where to search
for groups.
#. Specify the LDAP group search filter.
Example ``(&(objectClass=posixGroup)(memberUid=%s))``
#. Specify the CN of the LDAP group that will be mapped to the *admin role*.
#. Specify the CN of the LDAP group that will be mapped to the *viewer role*.
Users who have the *admin role* can modify the Grafana dashboards
or create new ones. Users who have the *viewer role* can only
view the Grafana dashboards.
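The LDAP settings above map onto Grafana's LDAP configuration file. As an illustration only — the host names, DNs, and group names below are hypothetical, and the exact file layout may differ between Grafana versions — a corresponding ``ldap.toml`` could look like:

```toml
# Hypothetical example only: replace hosts, DNs, and group names with your own.
[[servers]]
host = "ldap.example.com"                        # LDAP server address (assumption)
port = 636                                       # LDAPS port
use_ssl = true                                   # corresponds to the *LDAPS* button
bind_dn = "cn=admin,dc=example,dc=com"           # *Bind DN* with search privileges
bind_password = "secret"                         # password of the Bind DN user
search_filter = "(uid=%s)"                       # user search filter
search_base_dns = ["ou=users,dc=example,dc=com"] # *Base DN* for user searches

# Group-based authorization (optional)
group_search_filter = "(&(objectClass=posixGroup)(memberUid=%s))"
group_search_base_dns = ["ou=groups,dc=example,dc=com"]

[[servers.group_mappings]]
group_dn = "cn=grafana-admins,ou=groups,dc=example,dc=com"   # mapped to the admin role
org_role = "Admin"

[[servers.group_mappings]]
group_dn = "cn=grafana-viewers,ou=groups,dc=example,dc=com"  # mapped to the viewer role
org_role = "Viewer"
```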
8. `Configure your environment
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/configure-environment.html>`_.
.. note:: By default, StackLight is configured to use the *management network*
of the so-called `Default Node Network Group
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/configure-environment/network-settings.html>`_.
While this default setup may be appropriate for small deployments or
evaluation purposes, it is recommended not to use this network
for StackLight in production. It is instead recommended to create a network
dedicated to StackLight using the `networking templates
<https://docs.mirantis.com/openstack/fuel/fuel-8.0/operations.html#using-networking-templates>`_
capability of Fuel. Using a dedicated network for StackLight will
improve performance and reduce the monitoring footprint on the
control plane. It will also facilitate access to the Grafana UI
after deployment, as the *management network* is not routable.
9. Click the *Nodes* tab and assign the *InfluxDB_Grafana* role
to the node(s) where you want to install the plugin.
In the example below, the *InfluxDB_Grafana*
role is assigned to three nodes alongside the
*Alerting_Infrastructure* and the *Elasticsearch_Kibana* roles.
Here, the three plugins of the LMA toolchain backend servers are
installed on the same nodes. You can assign the *InfluxDB_Grafana*
role to either one node (standalone installation) or three nodes for HA.
.. image:: ../images/influx_grafana_role.png
:width: 800
.. note:: Installing the InfluxDB server on more than three nodes
is currently not possible using the Fuel plugin.
Similarly, installing the InfluxDB server on two nodes
is not recommended, to avoid split-brain situations in the Raft
consensus of the InfluxDB cluster as well as in the *Pacemaker* cluster,
which is responsible for the VIP address failover.
Note also that it is possible to add or remove nodes
with the *InfluxDB_Grafana* role in the cluster after deployment.
10. `Adjust the disk partitioning if necessary
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/configure-environment/customize-partitions.html>`_.
By default, the InfluxDB-Grafana Plugin allocates:
* 20% of the first available disk for the operating system, within
a range of 15GB minimum to 50GB maximum.
* 10GB for */var/log*.
* At least 30 GB for the InfluxDB database in */var/lib/influxdb*.
11. `Deploy your environment
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/deploy-environment.html>`_.
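The default disk allocation rule from the partitioning step above can be sketched as follows. This is an illustration of the documented defaults, not the plugin's actual allocation code; in particular, giving all remaining space to InfluxDB is an assumption:

```python
def default_allocation(first_disk_gb):
    """Illustrate the documented defaults: 20% of the first disk for the
    operating system (clamped to a 15-50 GB range), 10 GB for /var/log,
    and at least 30 GB for the InfluxDB database."""
    os_gb = min(max(0.2 * first_disk_gb, 15), 50)
    log_gb = 10
    # Assumption: the remainder goes to /var/lib/influxdb, never below 30 GB.
    influxdb_gb = max(first_disk_gb - os_gb - log_gb, 30)
    return {"os": os_gb, "log": log_gb, "influxdb": influxdb_gb}

print(default_allocation(100))  # 20% of 100 GB -> 20 GB for the OS
print(default_allocation(500))  # 20% would be 100 GB, clamped to 50 GB
```

The 15 + 10 + 30 GB minimums also explain why the installation fails with less than 55 GB of available disk space.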

View File

@@ -0,0 +1,33 @@
.. _definitions:
Key terms
---------
The table below lists the key terms and acronyms that are used
in this document.
+---------------------+-------------------------------------------------------+
| **Terms & acronyms**| **Definition**                                        |
+=====================+=======================================================+
| The Collector       | The StackLight Collector is a smart monitoring agent  |
|                     | running on every node. It collects and processes      |
|                     | the metrics of your OpenStack environment.            |
+---------------------+-------------------------------------------------------+
| InfluxDB            | InfluxDB is a time-series, metrics, and analytics     |
|                     | open-source database (MIT license). It is written in  |
|                     | Go and has no external dependencies.                  |
|                     | InfluxDB is targeted at use cases for DevOps, metrics,|
|                     | sensor data, and real-time analytics.                 |
+---------------------+-------------------------------------------------------+
| Grafana             | Grafana is an Apache 2.0 licensed general purpose     |
|                     | dashboard and graph composer. It is focused on        |
|                     | providing rich ways to visualize metrics time-series, |
|                     | mainly through graphs but supports other ways to      |
|                     | visualize data through a pluggable panel architecture.|
|                     |                                                       |
|                     | It has rich support for Graphite, InfluxDB, and       |
|                     | OpenTSDB and also supports other data sources through |
|                     | plugins. Grafana is most commonly used for            |
|                     | infrastructure monitoring, application monitoring, and|
|                     | metric analytics.                                     |
+---------------------+-------------------------------------------------------+

View File

@@ -1,21 +1,37 @@
================================================================
Welcome to the StackLight InfluxDB-Grafana Plugin Documentation!
================================================================
=========================================================================
Welcome to the StackLight InfluxDB-Grafana plugin for Fuel documentation!
=========================================================================
User documentation
==================
Overview
~~~~~~~~
.. toctree::
:maxdepth: 2
:maxdepth: 1
overview
releases
installation
user
intro
definitions
requirements
limitations
release_notes
licenses
appendix
references
Indices and Tables
==================
Installing and configuring StackLight InfluxDB-Grafana plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* :ref:`search`
.. toctree::
:maxdepth: 1
install_intro
install
configure_plugin
verification
Using StackLight InfluxDB-Grafana plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. toctree::
:maxdepth: 1
usage
troubleshooting

View File

@@ -1,10 +1,7 @@
.. _user_installation:
Installation Guide
==================
InfluxDB-Grafana Fuel Plugin installation using the RPM file of the Fuel Plugins Catalog
----------------------------------------------------------------------------------------
Install using the RPM file of the Fuel plugins catalog
------------------------------------------------------
To install the StackLight InfluxDB-Grafana Fuel Plugin using the RPM file of the Fuel Plugins
Catalog, you need to follow these steps:
@@ -30,8 +27,8 @@ Catalog, you need to follow these steps:
---|----------------------|----------|----------------
1 | influxdb_grafana | 0.10.0 | 4.0.0
StackLight InfluxDB-Grafana Fuel Plugin installtion from source
---------------------------------------------------------------
Install from source
-------------------
Alternatively, you may want to build the RPM file of the plugin from source if,
for example, you want to test the latest features of the master branch or customize the plugin.
@@ -78,15 +75,4 @@ node so that you won't have to copy that file later on.
7. Now that you have created the RPM file, you can install the plugin using the ``fuel plugins --install`` command::
[root@fuel ~] fuel plugins --install ./fuel-plugin-influxdb-grafana/*.noarch.rpm
StackLight InfluxDB-Grafana Fuel plugin software components
-----------------------------------------------------------
+----------------+-------------------------------------+
| Components | Version |
+================+=====================================+
| InfluxDB | v0.11.1 for Ubuntu (64-bit) |
+----------------+-------------------------------------+
| Grafana | v3.0.4 for Ubuntu (64-bit) |
+----------------+-------------------------------------+
[root@fuel ~] fuel plugins --install ./fuel-plugin-influxdb-grafana/*.noarch.rpm

View File

@@ -0,0 +1,19 @@
Introduction
------------
You can install the StackLight InfluxDB-Grafana plugin using one of the
following options:
* Install using the RPM file
* Install from source
The following is a list of software components installed by the StackLight
InfluxDB-Grafana plugin:
+----------------+-------------------------------------+
| Components     | Version                             |
+================+=====================================+
| InfluxDB       | v0.11.1 for Ubuntu (64-bit)         |
+----------------+-------------------------------------+
| Grafana        | v3.0.4 for Ubuntu (64-bit)          |
+----------------+-------------------------------------+

28
doc/source/intro.rst Normal file
View File

@@ -0,0 +1,28 @@
.. _intro:
Introduction
------------
The **StackLight InfluxDB-Grafana Fuel Plugin** is used to install and
configure InfluxDB and Grafana, which collectively provide access to the
metrics analytics of Mirantis OpenStack.
InfluxDB is a powerful distributed time-series database
to store and search metrics time-series. The metrics analytics are used to
visualize the time-series and the annotations produced by the StackLight Collector.
The annotations contain insightful information about the faults and anomalies
that resulted in a change of state for the clusters of nodes and services
of the OpenStack environment.
The InfluxDB-Grafana Plugin is an indispensable tool for answering
the question: "What has changed in my OpenStack environment, when, and why?"
Grafana is installed with a collection of predefined dashboards for each
of the OpenStack services that are monitored.
Among those dashboards, the *Main Dashboard* provides a single pane of glass
overview of your OpenStack environment status.
InfluxDB and Grafana are key components
of the `LMA Toolchain project <https://launchpad.net/lma-toolchain>`_
as shown in the figure below.
.. image:: ../images/toolchain_map.png
:width: 430pt

View File

@@ -1,10 +1,10 @@
.. _licenses:
Licenses
========
--------
Third Party Components
----------------------
++++++++++++++++++++++
+----------+-----------------------+-----------+
| Name | Project Web Site | License |
@@ -15,7 +15,7 @@ Third Party Components
+----------+-----------------------+-----------+
Puppet modules
--------------
++++++++++++++
+---------+--------------------------------------------------+-----------+
| Name | Project Web Site | License |

View File

@@ -0,0 +1,9 @@
.. _plugin_limitations:
Limitations
-----------
Currently, the size of an InfluxDB cluster that the Fuel plugin can deploy is limited to three nodes. In addition,
each node of the InfluxDB cluster is configured to run both the *meta* node role and the *data* node role. Therefore,
it is not possible to separate the nodes participating in the Raft consensus cluster from
the nodes accessing the data replicas.

View File

@@ -1,85 +0,0 @@
.. _user_overview:
Overview
========
The **StackLight InfluxDB-Grafana Fuel Plugin** is used to install and configure
InfluxDB and Grafana which collectively provide access to the
metrics analytics of Mirantis OpenStack.
InfluxDB is a powerful distributed time-series database
to store and search metrics time-series. The metrics analytics are used to
visualize the time-series and the annotations produced by the StackLight Collector.
The annotations contain insightful information about the faults and anomalies
that resulted in a change of state for the clusters of nodes and services
of the OpenStack environment.
The InfluxDB-Grafana Plugin is an indispensable tool to answering
the questions "what has changed in my OpenStack environment, when and why?".
Grafana is installed with a collection of predefined dashboards for each
of the OpenStack services that are monitored.
Among those dashboards, the *Main Dashboard* provides a single pane of glass
overview of your OpenStack environment status.
InfluxDB and Grafana are key components
of the `LMA Toolchain project <https://launchpad.net/lma-toolchain>`_
as shown in the figure below.
.. image:: ../images/toolchain_map.png
:align: center
.. _plugin_requirements:
Requirements
------------
+------------------------+--------------------------------------------------------------------------------------------+
| **Requirement** | **Version/Comment** |
+========================+============================================================================================+
| Disk space | The plugins specification requires to provision at least 15GB of disk space for the |
| | system, 10GB for the logs and 30GB for the database. The installation of the |
| | plugin will fail if there is less than 55GB of disk space available on the node. |
+------------------------+--------------------------------------------------------------------------------------------+
| Mirantis OpenStack | 8.0, 9.0 |
+------------------------+--------------------------------------------------------------------------------------------+
| Hardware configuration | The hardware configuration (RAM, CPU, disk(s)) required by this plugin depends on the size |
| | of your cloud environment and other factors like the retention policy. An average |
| | setup would require a quad-core server with 8 GB of RAM and access to a 500-1000 IOPS disk.|
| | Please check the `InfluxDB Hardware Sizing Guide |
| | <https://docs.influxdata.com/influxdb/v0.10/guides/hardware_sizing/>`_ for additional |
| | sizing information. |
| | |
| | It is also highly recommended to use a dedicated disk for your data storage. Otherwise, |
| | The InfluxDB-Grafana Plugin will use the root filesystem by default. |
+------------------------+--------------------------------------------------------------------------------------------+
Limitations
-----------
Currently, the size of an InfluxDB cluster the Fuel plugin can deploy is limited to three nodes. In addition to this,
each node of the InfluxDB cluster is configured to run under the *meta* node role and the *data* node role. Therefore,
it is not possible to separate the nodes participating in the Raft consensus cluster from
the nodes accessing the data replicas.
Key terms, acronyms and abbreviations
-------------------------------------
+----------------------+--------------------------------------------------------------------------------------------+
| **Terms & acronyms** | **Definition** |
+======================+============================================================================================+
| The Collector | The StackLight Collector is a smart monitoring agent running on every node which collects |
| | and process the metrics of your OpenStack environment. |
+----------------------+--------------------------------------------------------------------------------------------+
| InfluxDB | InfluxDB is a time-series, metrics, and analytics open-source database (MIT license). |
| | Its written in Go and has no external dependencies. |
| | |
| | InfluxDB is targeted at use cases for DevOps, metrics, sensor data, and real-time |
| | analytics. |
+----------------------+--------------------------------------------------------------------------------------------+
| Grafana | Grafana is an (Apache 2.0 Licensed) general purpose dashboard and graph composer. |
| | It's focused on providing rich ways to visualize metrics time-series, mainly though graphs |
| | but supports other ways to visualize data through a pluggable panel architecture. |
| | |
| | It currently has rich support for Graphite, InfluxDB and OpenTSDB and also supports other |
| | data sources via plugins. Grafana is most commonly used for infrastructure monitoring, |
| | application monitoring and metric analytics. |
+----------------------+--------------------------------------------------------------------------------------------+

View File

@@ -1,8 +1,8 @@
.. _user_appendix:
.. _references:
Appendix
========
References
----------
* The `InfluxDB-Grafana plugin <https://github.com/openstack/fuel-plugin-influxdb-grafana>`_ project at GitHub.
* The official `InfluxDB documentation <https://influxdb.com/docs/v0.9/>`_.
* The official `Grafana documentation <http://docs.grafana.org/v3.0>`_.
* The `InfluxDB-Grafana plugin <https://github.com/openstack/fuel-plugin-influxdb-grafana>`_ project at GitHub
* The official `InfluxDB documentation <https://influxdb.com/docs/v0.9/>`_
* The official `Grafana documentation <http://docs.grafana.org/v3.0>`_

View File

@@ -1,10 +1,10 @@
.. _releases:
.. _release_notes:
Release Notes
=============
Release notes
-------------
Version 0.10.0
--------------
0.10.0
++++++
* Changes
@@ -16,8 +16,8 @@ Version 0.10.0
* Upgrade to InfluxDB v0.11.1.
* Upgrade to Grafana v3.0.4.
Version 0.9.0
-------------
0.9.0
+++++
- A new dashboard for hypervisor metrics.
- A new dashboard for InfluxDB cluster.
@@ -27,8 +27,8 @@ Version 0.9.0
- Add support for InfluxDB clustering (beta state).
- Use MySQL as Grafana backend to support HA.
Version 0.8.0
-------------
0.8.0
+++++
- Add support for the "influxdb_grafana" Fuel Plugin role instead of
the "base-os" role which had several limitations.
@@ -38,7 +38,7 @@ Version 0.8.0
- Several dashboard visualisation improvements.
- A new self-monitoring dashboard.
Version 0.7.0
-------------
0.7.0
+++++
- Initial release of the plugin. This is a beta version.
- Initial release of the plugin. This is a beta version.

View File

@@ -0,0 +1,27 @@
.. _plugin_requirements:
Requirements
------------
+-----------------------+-----------------------------------------------------------------------+
| **Requirement**       | **Version/Comment**                                                   |
+=======================+=======================================================================+
| Disk space            | The plugin's specification requires to provision at least 15GB of disk|
|                       | space for the system, 10GB for the logs and 30GB for the database. The|
|                       | installation of the plugin will fail if there is less than 55GB of    |
|                       | disk space available on the node.                                     |
+-----------------------+-----------------------------------------------------------------------+
| Mirantis OpenStack    | 8.0, 9.0                                                              |
+-----------------------+-----------------------------------------------------------------------+
| Hardware configuration| The hardware configuration (RAM, CPU, disk(s)) required by this plugin|
|                       | depends on the size of your cloud environment and other factors like  |
|                       | the retention policy. An average setup would require a quad-core      |
|                       | server with 8 GB of RAM and access to a 500-1000 IOPS disk.           |
|                       | See the `InfluxDB Hardware Sizing Guide                               |
|                       | <https://docs.influxdata.com/influxdb/v0.10/guides/hardware_sizing/>`_|
|                       | for additional sizing information.                                    |
|                       |                                                                       |
|                       | It is also highly recommended to use a dedicated disk for your data   |
|                       | storage. Otherwise, the InfluxDB-Grafana Plugin will use the root     |
|                       | filesystem by default.                                                |
+-----------------------+-----------------------------------------------------------------------+

View File

@@ -0,0 +1,51 @@
.. _troubleshooting:
Troubleshooting
---------------
If you get no data in Grafana, follow these troubleshooting tips.
#. First, check that the LMA Collector is running properly by following the
LMA Collector troubleshooting instructions in the
`LMA Collector Fuel Plugin User Guide <http://fuel-plugin-lma-collector.readthedocs.org/>`_.
#. Check that the nodes are able to connect to the InfluxDB cluster via the VIP address
(see above for how to retrieve the InfluxDB cluster VIP address) on port *8086*::
root@node-2:~# curl -I http://<VIP>:8086/ping
The server should return a 204 HTTP status::
HTTP/1.1 204 No Content
Request-Id: cdc3c545-d19d-11e5-b457-000000000000
X-Influxdb-Version: 0.10.0
Date: Fri, 12 Feb 2016 15:32:19 GMT
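The expected reply can also be checked programmatically. The sketch below validates a ping response from its status code and headers; it is an illustration for scripting your own checks, not part of the plugin:

```python
def check_influxdb_ping(status_code, headers):
    """Return the InfluxDB version if the /ping endpoint answered
    correctly (HTTP 204 No Content), otherwise None."""
    if status_code != 204:
        return None
    return headers.get("X-Influxdb-Version")

# A healthy node answers 204 with its version in the headers:
print(check_influxdb_ping(204, {"X-Influxdb-Version": "0.10.0"}))  # 0.10.0
print(check_influxdb_ping(500, {}))  # None
```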
#. Check that the InfluxDB cluster VIP address is up and running::
root@node-1:~# crm resource status vip__influxdb
resource vip__influxdb is running on: node-1.test.domain.local
#. Check that the InfluxDB service is started on all nodes of the cluster::
root@node-1:~# service influxdb status
influxdb Process is running [ OK ]
#. If not, (re)start it::
root@node-1:~# service influxdb start
Starting the process influxdb [ OK ]
influxdb process was started [ OK ]
#. Check that Grafana server is running::
root@node-1:~# service grafana-server status
* grafana is running
#. If not, (re)start it::
root@node-1:~# service grafana-server start
* Starting Grafana Server
#. If none of the above solves the problem, check the logs in ``/var/log/influxdb/influxdb.log``
and ``/var/log/grafana/grafana.log`` to find out what might have gone wrong.

217
doc/source/usage.rst Normal file
View File

@@ -0,0 +1,217 @@
.. _usage:
Exploring your time-series with Grafana
---------------------------------------
The InfluxDB-Grafana Plugin comes with a collection of predefined
dashboards you can use to visualize the time-series stored in InfluxDB.
Please check the LMA Collector documentation for a complete list of all the
`metrics time-series <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/appendix_b.html>`_
that are collected and stored in InfluxDB.
The Main Dashboard
++++++++++++++++++
We suggest you start with the **Main Dashboard**, as shown
below, as an entry point to the other dashboards.
The **Main Dashboard** provides a single pane of glass from where you can visualize the
overall health status of your OpenStack services, such as Nova and Cinder,
as well as HAProxy, MySQL, and RabbitMQ, to name a few.
.. image:: ../images/grafana_main.png
:width: 800
As you can see, the **Main Dashboard** (like most dashboards) provides
a drop-down menu in the upper left corner of the window
from where you can pick a particular metric dimension, such as
the *controller name* or the *device name* you want to select.
In the example above, the system metrics of *node-48* are
being displayed in the dashboard.
Within the **OpenStack Services** row, each of the services
represented can be assigned five different statuses.
.. note:: The precise determination of a service health status depends
on the correlation policies implemented for that service by a `Global Status Evaluation (GSE)
plugin <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/alarms.html#cluster-policies>`_.
The meaning associated with a service health status is the following:
- **Down**: One or several primary functions of a service
cluster have failed. For example,
all API endpoints of a service cluster like Nova
or Cinder have failed.
- **Critical**: One or several primary functions of a
service cluster are severely degraded. The quality
of service delivered to the end-user should be severely
impacted.
- **Warning**: One or several primary functions of a
service cluster are slightly degraded. The quality
of service delivered to the end-user should be slightly
impacted.
- **Unknown**: There is not enough data to infer the actual
health status of a service cluster.
- **Okay**: None of the above was found to be true.
The **Virtual Compute Resources** row provides an overview of
the amount of virtual resources being used by the compute nodes
including the number of virtual CPUs, the amount of memory
and disk space being used as well as the amount of virtual
resources remaining available to create new instances.
The **System** row provides an overview of the amount of physical
resources being used on the control plane (the controller cluster).
You can select a specific controller using the
controller's drop-down list in the left corner of the toolbar.
The **Ceph** row provides an overview of the resources usage
and current health status of the Ceph cluster when it is deployed
in the OpenStack environment.
The **Main Dashboard** is also an entry point to access more detailed
dashboards for each of the OpenStack services that are monitored.
For example, if you click on the *Nova box*, the **Nova
Dashboard** is displayed.
.. image:: ../images/grafana_nova.png
:width: 800
The Nova dashboard
++++++++++++++++++
The **Nova Dashboard** provides a detailed view of the
Nova service's related metrics.
The **Service Status** row provides information about the Nova service
cluster health status as a whole, including the status of the API frontend
(the HAProxy public VIP), a counter of HTTP 5xx errors, and
the HTTP request response times and status codes.
The **Nova API** row provides information about the current health status of
the API backends (nova-api, ec2-api, ...).
The **Nova Services** row provides information about the current and
historical status of the Nova *workers*.
The **Instances** row provides information about the number of active
instances, the instances in error, and instance creation time statistics.
The **Resources** row provides various virtual resources usage indicators.
Self-monitoring dashboards
++++++++++++++++++++++++++
The first **Self-Monitoring Dashboard** was introduced in LMA 0.8.
The intent of the self-monitoring dashboards is to bring operational
insights about how the monitoring system itself (the toolchain) performs overall.
The **Self-Monitoring Dashboard** provides information about the *hekad*
and *collectd* processes.
In particular, it gives information about the amount of system resources
consumed by these processes, the time allocated to the Lua plugins
running within *hekad*, the amount of messages being processed and
the time it takes to process those messages.
Again, it is possible to select a particular node view using the
drop-down menu.
With LMA 0.9, we have introduced two new dashboards.
#. The **Elasticsearch Cluster Dashboard** provides information about
the overall health status of the Elasticsearch cluster including
the state of the shards, the number of pending tasks and various resources
usage metrics.
#. The **InfluxDB Cluster Dashboard** provides statistics about the InfluxDB
processes running in the InfluxDB cluster including various resources usage metrics.
The hypervisor dashboard
++++++++++++++++++++++++
LMA 0.9 introduces a new **Hypervisor Dashboard** which brings operational
insights about the virtual instances managed through *libvirt*.
As shown in the figure below, the **Hypervisor Dashboard** assembles a
view of various *libvirt* metrics. A drop-down menu allows you to pick
a particular instance UUID running on a particular node. In the
example below, the metrics for the instance ID *ba844a75-b9db-4c2f-9cb9-0b083fe03fb7*
running on *node-4* are displayed.
.. image:: ../images/grafana_hypervisor.png
:width: 800
Check the LMA Collector documentation for additional information about the
`libvirt metrics <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/appendix_b.html#libvirt>`_
that are displayed in the **Hypervisor Dashboard**.
Other dashboards
++++++++++++++++
In total there are 19 different dashboards you can use to
explore different time-series facets of your OpenStack environment.
Viewing faults and anomalies
++++++++++++++++++++++++++++
The LMA Toolchain is capable of detecting a number of service-affecting
conditions, such as the faults and anomalies that occurred in your OpenStack
environment.
Those conditions are reported in annotations that are displayed in
Grafana. The Grafana annotations contain a textual
representation of the alarm (or set of alarms) that were triggered
by the Collectors for a service.
In other words, the annotations contain valuable insights
that you could use to diagnose and
troubleshoot problems. Furthermore, with the Grafana annotations,
the system makes a distinction between what is estimated as a
direct root cause versus what is estimated as an indirect
root cause. This is internally represented in a dependency graph.
There are first degree dependencies used to describe situations
whereby the health status of an entity
strictly depends on the health status of another entity. For
example, Nova as a service has first degree dependencies
on the nova-api endpoints and the nova-scheduler workers. But
there are also second degree dependencies whereby the health
status of an entity does not strictly depend on the health status
of another entity, although it might, depending on other operations
being performed. For example, by default we declared that Nova
has a second degree dependency on Neutron. As a result, the
health status of Nova will not be directly impacted by the health
status of Neutron, but the annotation will provide
a root cause analysis hint. Let's assume a situation
where Nova has changed from *okay* to *critical* status (because of
5xx HTTP errors) and Neutron has been in *down* status for a while.
In this case, the Nova dashboard will display an annotation showing that
Nova has changed to a *warning* status because the system has detected
5xx errors and that this may be due to the fact that Neutron is *down*.
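The distinction between first and second degree dependencies can be modeled roughly as follows. This is a simplified illustration of the idea, not the actual GSE correlation code; the service names and statuses are taken from the examples above:

```python
# Simplified model: a first degree dependency degrades the dependent
# service's own status; a second degree dependency only yields a hint.
FIRST, SECOND = 1, 2

deps = {
    "nova": {"nova-api": FIRST, "nova-scheduler": FIRST, "neutron": SECOND},
}

def evaluate(service, statuses):
    """Return (status, hints) for a service given its dependencies' statuses."""
    status, hints = "okay", []
    for dep, degree in deps[service].items():
        dep_status = statuses.get(dep, "unknown")
        if dep_status == "okay":
            continue
        if degree == FIRST:
            status = dep_status  # direct impact on the service status
        else:
            hints.append(f"{dep} is {dep_status}")  # root cause hint only
    return status, hints

# Neutron being down does not change Nova's status, but produces a hint:
print(evaluate("nova", {"nova-api": "okay", "nova-scheduler": "okay",
                        "neutron": "down"}))
```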
An example of what an annotation looks like is shown below.
.. image:: ../images/grafana_nova_annot.png
:width: 800
This annotation shows that the health status of Nova is *down*
because there is no *nova-api* service backend (viewed from HAProxy)
that is *up*.
Hiding nodes from dashboards
++++++++++++++++++++++++++++
When you remove a node from the environment, it is still displayed in
the 'server' and 'controller' drop-down lists. To hide it from the list
you need to edit the associated InfluxDB query in the *templating* section.
For example, if you want to remove *node-1*, you need to add the following
condition to the *where* clause::
and hostname != 'node-1'
.. image:: ../images/remove_controllers_from_templating.png
If you want to hide more than one node you can add more conditions like this::
and hostname != 'node-1' and hostname != 'node-2'
This should be done for all dashboards that display the deleted node and you
need to save them afterwards.

View File

@@ -1,499 +0,0 @@
.. _user_guide:
User Guide
==========
.. _plugin_configuration:
Plugin configuration
--------------------
To configure the **StackLight InfluxDB-Grafana Plugin**, you need to follow these steps:
1. `Create a new environment
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/create-environment/start-create-env.html>`_.
2. Click on the *Settings* tab of the Fuel web UI and select the *Other* category.
3. Scroll down through the settings until you find the **InfluxDB-Grafana Server
Plugin** section. You should see a page like this:
.. image:: ../images/influx_grafana_settings.png
:width: 800
4. Tick the **InfluxDB-Grafana Plugin** box and fill in the required fields as indicated below.
a. Specify the number of days of retention for your data.
b. Specify the InfluxDB admin password (called root password in the InfluxDB documentation).
c. Specify the database name (default is lma).
d. Specify the InfluxDB username and password.
e. Specify the Grafana username and password.
5. Since Grafana 2.6.0, the plugin uses a MySQL database
to store its configuration data, such as the dashboard templates.
a. Select **Local MySQL** if you want to create the Grafana database using the MySQL server
of the OpenStack control-plane. Otherwise, select **Remote server** and specify
the fully qualified name or IP address of the MySQL server you want to use.
b. Then, specify the MySQL database name, username and password that will be used
to access that database.
6. Tick the *Enable TLS for Grafana* box if you want to encrypt your
Grafana credentials (username, password). Then, fill in the required
fields as indicated below.
.. image:: ../images/tls_settings.png
:width: 800
a. Specify the DNS name of the Grafana server. This parameter is used
to create a link in the Fuel dashboard to the Grafana server.
#. Specify the location of a PEM file that contains the certificate
and the private key of the Grafana server that will be used in TLS handshakes
with the client.
7. Tick the *Use LDAP for Grafana authentication* box if you want to
authenticate to Grafana via LDAP. Then, fill in the required fields as indicated below.
.. image:: ../images/ldap_auth.png
:width: 800
a. Select the *LDAPS* button if you want to enable LDAP authentication
over SSL.
#. Specify one or several LDAP server addresses separated by a space. Those
addresses must be accessible from the node where Grafana is installed.
Note that addresses external to the *management network* are not routable
by default (see the note below).
#. Specify the LDAP server port number or leave it empty to use the defaults.
#. Specify the *Bind DN* of a user who has search privileges on the LDAP server.
#. Specify the password of the user identified by the *Bind DN* above.
#. Specify the *Base DN* in the Directory Information Tree (DIT) from where
to search for users.
#. Specify a valid user search filter, for example ``(uid=%s)``.
The result of the search should return a unique user entry.
You can further restrict access to Grafana to those users who
are member of a specific LDAP group.
a. Tick the *Enable group-based authorization* box.
#. Specify the LDAP group *Base DN* in the DIT from where to search
for groups.
#. Specify the LDAP group search filter.
Example ``(&(objectClass=posixGroup)(memberUid=%s))``
#. Specify the CN of the LDAP group that will be mapped to the *admin role*.
#. Specify the CN of the LDAP group that will be mapped to the *viewer role*.
Users who have the *admin role* can modify the Grafana dashboards
or create new ones. Users who have the *viewer role* can only
visualize the Grafana dashboards.
8. `Configure your environment
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/configure-environment.html>`_.
.. note:: By default, StackLight is configured to use the *management network*
of the so-called `Default Node Network Group
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/configure-environment/network-settings.html>`_.
While this default setup may be appropriate for small deployments or
evaluation purposes, it is recommended not to use this network
for StackLight in production. It is instead recommended to create a network
dedicated to StackLight using the `networking templates
<https://docs.mirantis.com/openstack/fuel/fuel-8.0/operations.html#using-networking-templates>`_
capability of Fuel. Using a dedicated network for StackLight will
improve performance and reduce the monitoring footprint on the
control plane. It will also facilitate access to the Grafana UI
after deployment, as the *management network* is not routable.
9. Click the *Nodes* tab and assign the *InfluxDB_Grafana* role
to the node(s) where you want to install the plugin.
You can see in the example below that the *InfluxDB_Grafana*
role is assigned to three nodes alongside the
*Alerting_Infrastructure* and the *Elasticsearch_Kibana* roles.
Here, the three backend plugins of the LMA toolchain are
installed on the same nodes. You can assign the *InfluxDB_Grafana*
role to either one node (standalone installation) or three nodes for HA.
.. image:: ../images/influx_grafana_role.png
:width: 800
.. note:: Installing the InfluxDB server on more than three nodes
is currently not possible using the Fuel plugin.
Similarly, installing the InfluxDB server on two nodes
is not recommended, to avoid split-brain situations in the Raft
consensus of the InfluxDB cluster as well as in the *Pacemaker* cluster,
which is responsible for the VIP address failover.
Note also that it is possible to add or remove nodes
with the *InfluxDB_Grafana* role in the cluster after deployment.
10. `Adjust the disk partitioning if necessary
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/configure-environment/customize-partitions.html>`_.
By default, the InfluxDB-Grafana Plugin allocates:
* 20% of the first available disk for the operating system, honoring
a range of 15 GB minimum to 50 GB maximum.
* 10 GB for */var/log*.
* At least 30 GB for the InfluxDB database in */var/lib/influxdb*.
11. `Deploy your environment
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/deploy-environment.html>`_.
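As a side note on the LDAP settings in step 7: Grafana reads its LDAP
configuration from an ``ldap.toml`` file, and the settings above typically map
onto that file as sketched below. All host names, DNs, filters, and group names
here are illustrative examples, not plugin defaults.

```toml
# Illustrative sketch only -- replace every value with your own LDAP details.
[[servers]]
host = "ldap.example.com"
port = 389
bind_dn = "cn=admin,dc=example,dc=com"
bind_password = "secret"
search_filter = "(uid=%s)"
search_base_dns = ["ou=users,dc=example,dc=com"]

# Group-based authorization (optional).
group_search_filter = "(&(objectClass=posixGroup)(memberUid=%s))"
group_search_base_dns = ["ou=groups,dc=example,dc=com"]

# Map LDAP groups to the Grafana admin and viewer roles.
[[servers.group_mappings]]
group_dn = "cn=grafana-admins,ou=groups,dc=example,dc=com"
org_role = "Admin"

[[servers.group_mappings]]
group_dn = "cn=grafana-viewers,ou=groups,dc=example,dc=com"
org_role = "Viewer"
```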
.. _plugin_install_verification:
Plugin verification
-------------------
Be aware that, depending on the number of nodes and deployment setup,
deploying a Mirantis OpenStack environment can take anywhere
from 30 minutes to several hours. Once your deployment is complete,
you should see a notification message indicating that your deployment
completed successfully, as in the figure below.
.. image:: ../images/deployment_notification.png
:width: 800
Verifying InfluxDB
~~~~~~~~~~~~~~~~~~
You should verify that the InfluxDB cluster is running properly.
First, you need to retrieve the InfluxDB cluster VIP address.
Here is how to proceed.
#. On the Fuel Master node, find the IP address of a node where the InfluxDB
server is installed using the following command::
[root@fuel ~]# fuel nodes
id | status | name | cluster | ip | mac | roles |
---|----------|------------------|---------|------------|-----|------------------|
1 | ready | Untitled (fa:87) | 1 | 10.109.0.8 | ... | influxdb_grafana |
2 | ready | Untitled (12:aa) | 1 | 10.109.0.3 | ... | influxdb_grafana |
3 | ready | Untitled (4e:6e) | 1 | 10.109.0.7 | ... | influxdb_grafana |
#. Then ``ssh`` to any one of these nodes (for example, *node-1*) and type the following command::
root@node-1:~# hiera lma::influxdb::vip
10.109.1.4
This tells you that the VIP address of your InfluxDB cluster is *10.109.1.4*.
#. With that VIP address, type the following command::
root@node-1:~# /usr/bin/influx -database lma -password lmapass \
--username root -host 10.109.1.4 -port 8086
Visit https://enterprise.influxdata.com to register for updates,
InfluxDB server management, and monitoring.
Connected to http://10.109.1.4:8086 version 0.10.0
InfluxDB shell 0.10.0
>
As you can see, executing */usr/bin/influx* will start an interactive CLI and automatically connect to
the InfluxDB server. Then if you type::
> show series
You should see a dump of all the time-series collected so far.
Then, if you type::
> show servers
name: data_nodes
----------------
id http_addr tcp_addr
1 node-1:8086 node-1:8088
3 node-2:8086 node-2:8088
5 node-3:8086 node-3:8088
name: meta_nodes
----------------
id http_addr tcp_addr
1 node-1:8091 node-1:8088
2 node-2:8091 node-2:8088
4 node-3:8091 node-3:8088
You should see a list of the nodes participating in the `InfluxDB cluster
<https://docs.influxdata.com/influxdb/v0.10/guides/clustering/>`_ with their roles (data or meta).
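For scripted checks, the ``influx`` CLI also accepts an ``-execute`` flag to
run a single query and exit. A sketch of the same verification, assuming the
VIP address and credentials from the interactive example above:

```shell
# Run the verification queries non-interactively (same VIP address and
# credentials as in the interactive example above).
VIP=10.109.1.4
/usr/bin/influx -host "$VIP" -port 8086 -username root -password lmapass \
    -database lma -execute 'show series'
/usr/bin/influx -host "$VIP" -port 8086 -username root -password lmapass \
    -database lma -execute 'show servers'
```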
Verifying Grafana
~~~~~~~~~~~~~~~~~
From the Fuel Dashboard, click the **Grafana** link (or enter the IP address
and port number if your DNS is not set up).
The first time you access Grafana, you are requested to
authenticate using your credentials.
.. image:: ../images/grafana_login.png
:width: 800
Then you should be redirected to the *Grafana Home Page*
from where you can select a dashboard as shown below.
.. image:: ../images/grafana_home.png
:width: 800
Exploring your time-series with Grafana
---------------------------------------
The InfluxDB-Grafana Plugin comes with a collection of predefined
dashboards you can use to visualize the time-series stored in InfluxDB.
Please check the LMA Collector documentation for a complete list of all the
`metrics time-series <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/appendix_b.html>`_
that are collected and stored in InfluxDB.
The Main Dashboard
~~~~~~~~~~~~~~~~~~
We suggest you start with the **Main Dashboard**, shown
below, as an entry point to the other dashboards.
The **Main Dashboard** provides a single pane of glass where you can visualize the
overall health status of your OpenStack services, such as Nova and Cinder,
but also HAProxy, MySQL, and RabbitMQ, to name a few.
.. image:: ../images/grafana_main.png
:width: 800
As you can see, the **Main Dashboard** (like most dashboards) provides
a drop-down menu in the upper left corner of the window
from which you can pick a particular metric dimension, such as
the *controller name* or the *device name*.
In the example above, the system metrics of *node-48* are
being displayed in the dashboard.
Within the **OpenStack Services** row, each of the services
represented can be assigned one of five different statuses.
.. note:: The precise determination of a service health status depends
on the correlation policies implemented for that service by a `Global Status Evaluation (GSE)
plugin <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/alarms.html#cluster-policies>`_.
The meaning associated with a service health status is the following:
- **Down**: One or several primary functions of a service
cluster have failed. For example,
all API endpoints of a service cluster like Nova
or Cinder have failed.
- **Critical**: One or several primary functions of a
service cluster are severely degraded. The quality
of service delivered to the end-user should be severely
impacted.
- **Warning**: One or several primary functions of a
service cluster are slightly degraded. The quality
of service delivered to the end-user should be slightly
impacted.
- **Unknown**: There is not enough data to infer the actual
health status of a service cluster.
- **Okay**: None of the above was found to be true.
The **Virtual Compute Resources** row provides an overview of
the virtual resources used by the compute nodes,
including the number of virtual CPUs, the amount of memory
and disk space in use, as well as the amount of virtual
resources still available to create new instances.
The **System** row provides an overview of the amount of physical
resources being used on the control plane (the controller cluster).
You can select a specific controller using the
controller's drop-down list in the left corner of the toolbar.
The **Ceph** row provides an overview of the resource usage
and current health status of the Ceph cluster when it is deployed
in the OpenStack environment.
The **Main Dashboard** is also an entry point to access more detailed
dashboards for each of the OpenStack services that are monitored.
For example, if you click the *Nova* box, the **Nova
Dashboard** is displayed.
.. image:: ../images/grafana_nova.png
:width: 800
The Nova Dashboard
~~~~~~~~~~~~~~~~~~
The **Nova Dashboard** provides a detailed view of the
metrics related to the Nova service.
The **Service Status** row provides information about the health status of the
Nova service cluster as a whole, including the status of the API frontend
(the HAProxy public VIP), a counter of HTTP 5xx errors, and
the HTTP request response times and status codes.
The **Nova API** row provides information about the current health status of
the API backends (nova-api, ec2-api, ...).
The **Nova Services** row provides information about the current and
historical status of the Nova *workers*.
The **Instances** row provides information about the number of active
instances, the instances in error, and instance creation time statistics.
The **Resources** row provides various virtual resource usage indicators.
Self-Monitoring Dashboards
~~~~~~~~~~~~~~~~~~~~~~~~~~
The first **Self-Monitoring Dashboard** was introduced in LMA 0.8.
The intent of the self-monitoring dashboards is to bring operational
insights about how the monitoring system itself (the toolchain) performs overall.
The **Self-Monitoring Dashboard** provides information about the *hekad*
and *collectd* processes.
In particular, it gives information about the amount of system resources
consumed by these processes, the time allocated to the Lua plugins
running within *hekad*, the number of messages being processed, and
the time it takes to process those messages.
Again, it is possible to select a particular node view using the drop-down
menu.
With LMA 0.9, we have introduced two new dashboards.
#. The **Elasticsearch Cluster Dashboard** provides information about
the overall health status of the Elasticsearch cluster, including
the state of the shards, the number of pending tasks, and various
resource usage metrics.
#. The **InfluxDB Cluster Dashboard** provides statistics about the InfluxDB
processes running in the InfluxDB cluster, including various resource usage metrics.
The Hypervisor Dashboard
~~~~~~~~~~~~~~~~~~~~~~~~
LMA 0.9 introduces a new **Hypervisor Dashboard** which brings operational
insights about the virtual instances managed through *libvirt*.
As shown in the figure below, the **Hypervisor Dashboard** assembles a
view of various *libvirt* metrics. A drop-down menu allows you to pick
the UUID of a particular instance running on a particular node. In the
example below, the metrics for the instance ID *ba844a75-b9db-4c2f-9cb9-0b083fe03fb7*
running on *node-4* are displayed.
.. image:: ../images/grafana_hypervisor.png
:width: 800
Check the LMA Collector documentation for additional information about the
`libvirt metrics <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/appendix_b.html#libvirt>`_
that are displayed in the **Hypervisor Dashboard**.
Other Dashboards
~~~~~~~~~~~~~~~~
In total there are 19 different dashboards you can use to
explore different time-series facets of your OpenStack environment.
Viewing Faults and Anomalies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LMA Toolchain is capable of detecting a number of service-affecting
conditions, such as the faults and anomalies that occurred in your OpenStack
environment.
Those conditions are reported in annotations that are displayed in
Grafana. The Grafana annotations contain a textual
representation of the alarm (or set of alarms) that were triggered
by the Collectors for a service.
In other words, the annotations contain valuable insights
that you could use to diagnose and
troubleshoot problems. Furthermore, with the Grafana annotations,
the system makes a distinction between what is estimated as a
direct root cause versus what is estimated as an indirect
root cause. This is internally represented in a dependency graph.
First-degree dependencies describe situations
whereby the health status of an entity
strictly depends on the health status of another entity. For
example, Nova as a service has first-degree dependencies
on the nova-api endpoints and the nova-scheduler workers. There
are also second-degree dependencies, whereby the health
status of an entity does not strictly depend on the health status
of another entity, although it might, depending on other operations
being performed. For example, by default, Nova is declared to
have a second-degree dependency on Neutron. As a result, the
health status of Nova will not be directly impacted by the health
status of Neutron, but the annotation will provide
a root cause analysis hint. Let's assume a situation
where Nova has changed from an *okay* to a *critical* status (because of
5xx HTTP errors) and Neutron has been in a *down* status for a while.
In this case, the Nova dashboard will display an annotation showing that
Nova has changed to a *warning* status because the system has detected
5xx errors and that it may be due to the fact that Neutron is *down*.
An example of what an annotation looks like is shown below.
.. image:: ../images/grafana_nova_annot.png
:width: 800
This annotation shows that the health status of Nova is *down*
because there is no *nova-api* service backend (viewed from HAProxy)
that is *up*.
Hiding nodes from dashboards
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When you remove a node from the environment, it is still displayed in
the *server* and *controller* drop-down lists. To hide it from these lists,
you need to edit the associated InfluxDB query in the *templating* section.
For example, if you want to remove *node-1*, you need to add the following
condition to the *where* clause::
and hostname != 'node-1'
.. image:: ../images/remove_controllers_from_templating.png
If you want to hide more than one node, you can add more conditions, like this::
and hostname != 'node-1' and hostname != 'node-2'
This should be done for all dashboards that display the deleted node, and you
need to save them afterwards.
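Putting it together, a full templating query with the extra condition might
look like the sketch below. The ``cpu_idle`` measurement name is only an
illustration; the actual measurement depends on the dashboard you are editing.

```
show tag values from cpu_idle with key = hostname where hostname != 'node-1'
```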
Troubleshooting
---------------
If you get no data in Grafana, follow these troubleshooting tips.
#. First, check that the LMA Collector is running properly by following the
LMA Collector troubleshooting instructions in the
`LMA Collector Fuel Plugin User Guide <http://fuel-plugin-lma-collector.readthedocs.org/>`_.
#. Check that the nodes are able to connect to the InfluxDB cluster via the VIP address
(see above how to get the InfluxDB cluster VIP address) on port *8086*::
root@node-2:~# curl -I http://<VIP>:8086/ping
The server should return a 204 HTTP status::
HTTP/1.1 204 No Content
Request-Id: cdc3c545-d19d-11e5-b457-000000000000
X-Influxdb-Version: 0.10.0
Date: Fri, 12 Feb 2016 15:32:19 GMT
#. Check that the InfluxDB cluster VIP address is up and running::
root@node-1:~# crm resource status vip__influxdb
resource vip__influxdb is running on: node-1.test.domain.local
#. Check that the InfluxDB service is started on all nodes of the cluster::
root@node-1:~# service influxdb status
influxdb Process is running [ OK ]
#. If not, (re)start it::
root@node-1:~# service influxdb start
Starting the process influxdb [ OK ]
influxdb process was started [ OK ]
#. Check that Grafana server is running::
root@node-1:~# service grafana-server status
* grafana is running
#. If not, (re)start it::
root@node-1:~# service grafana-server start
* Starting Grafana Server
#. If none of the above solves the problem, check the logs in ``/var/log/influxdb/influxdb.log``
and ``/var/log/grafana/grafana.log`` to find out what might have gone wrong.