documentation fixes; first steps guide

Change-Id: I8b6bcd32f9c8eac5f74a1c35fb28ee1f94e7e9d4
Elisha Rosensweig 2016-04-18 18:47:05 +03:00
parent 05aa2e8617
commit e7c97e0075
6 changed files with 153 additions and 21 deletions

@@ -53,6 +53,8 @@ alarm is normalized. There are several guidelines for creating a config file:
- Defining a config file for each datasource is recommended, but not mandatory.
Datasources with no such configuration will use the values as-is.
Once the file is modified, you must restart the **vitrage-graph** service to
load the changes.
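The following is a minimal sketch of the fallback rule above: it applies a per-datasource severity mapping and uses the raw value as-is when no mapping exists. ``normalize_severity`` and the mapping values are hypothetical and are not Vitrage's API or shipped configuration. Passing values that the mapping does not list through unchanged is also an assumption.

```python
from typing import Mapping, Optional


def normalize_severity(raw: str, mapping: Optional[Mapping[str, str]]) -> str:
    """Hypothetical helper: map a datasource severity to a normalized one.

    With no configuration for the datasource (mapping is None), the raw
    value is used as-is. Unlisted values also pass through unchanged here,
    which is an assumption of this sketch.
    """
    if mapping is None:
        return raw
    return mapping.get(raw, raw)


# Hypothetical mapping for one datasource; the values are illustrative only.
nagios_mapping = {"WARNING": "warning", "CRITICAL": "critical"}

print(normalize_severity("WARNING", nagios_mapping))  # warning
print(normalize_severity("UNKNOWN", nagios_mapping))  # UNKNOWN
print(normalize_severity("WARNING", None))            # WARNING
```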
Default Configuration
+++++++++++++++++++++

@@ -21,7 +21,7 @@ Installation
wget -q "https://labs.consol.de/repo/stable/RPM-GPG-KEY" -O - | sudo apt-key add -
-2. Update your repo with the OMD site. For example, for ubuntu wheezy release:
+2. Update your repo with the OMD site. For example, for the Ubuntu trusty release:
::
sudo bash -c "echo 'deb http://labs.consol.de/repo/stable/ubuntu trusty main' >> /etc/apt/sources.list"
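The ``echo ... >>`` command above appends the line every time it runs, so repeating the step leaves duplicate entries. If that matters, the sketch below appends the repository line only when it is missing. ``ensure_line`` is a hypothetical helper. The demo writes to a temporary file rather than */etc/apt/sources.list*.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the line was appended, False if it was already there.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            content = f.read()
    if line in content.splitlines():
        return False
    with open(path, "a", encoding="utf-8") as f:
        # Start on a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True


repo_line = "deb http://labs.consol.de/repo/stable/ubuntu trusty main"

# Demo against a temporary file instead of /etc/apt/sources.list.
with tempfile.TemporaryDirectory() as tmp:
    sources = os.path.join(tmp, "sources.list")
    first = ensure_line(sources, repo_line)
    second = ensure_line(sources, repo_line)
    with open(sources, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(first, second, lines.count(repo_line))  # True False 1
```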

@@ -53,6 +53,8 @@ resource is normalized. Some guidelines for creating a config file:
- Defining a config file for each datasource is recommended, but not mandatory.
Datasources with no such configuration will use the values as-is.
Once the file is modified, you must restart the **vitrage-graph** service to
load the changes.
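A restart is needed presumably because the mapping is read once, when the service starts. The following sketch illustrates that load-once behavior with a hypothetical ``StateMapper`` class and illustrative state names. It uses JSON only to stay within the Python standard library, and it does not reflect Vitrage's actual loader or file format.

```python
import json
import os
import tempfile


class StateMapper:
    """Hypothetical load-once mapper: reads the mapping file at construction."""

    def __init__(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            self._mapping = json.load(f)

    def normalize(self, raw_state: str) -> str:
        # Unmapped states are returned unchanged in this sketch.
        return self._mapping.get(raw_state, raw_state)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "states.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"ACTIVE": "RUNNING"}, f)  # illustrative values only

    mapper = StateMapper(path)  # "service start": the file is read here

    with open(path, "w", encoding="utf-8") as f:
        json.dump({"ACTIVE": "AVAILABLE"}, f)  # edit made after start

    before_restart = mapper.normalize("ACTIVE")  # still the old mapping
    after_restart = StateMapper(path).normalize("ACTIVE")  # "restart"

print(before_restart, after_restart)  # RUNNING AVAILABLE
```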
Default Configuration
+++++++++++++++++++++

@@ -39,7 +39,7 @@ Some physical entities, such as switches, can not be retrieved from OpenStack,
and so are defined here.
There may be more than one configuration file. All files will be read from
-*/etc/vitrage/static_plugins/*. See previous section on how to configure this
+*/etc/vitrage/static_datasources/*. See previous section on how to configure this
location.
Format
@@ -79,15 +79,15 @@ of switch-2
entities:
- type: switch
name: switch-1
-id: 11111
+id: switch-1 # should be same as name
state: available
relationships:
- type: nova.host
name: host-1
-id: 22222
+id: host-1 # should be same as name
relation_type: attached
- type: switch
name: switch-2
-id: 33333
+id: switch-2 # should be same as name
relation_type: backup
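The updated example requires each ``id`` to match its ``name``. The sketch below checks this over the parsed structure. The YAML above is written as an equivalent Python dict so the sketch needs only the standard library, and it assumes the relationships are nested under ``switch-1`` as shown. ``find_id_mismatches`` is a hypothetical helper, not part of Vitrage.

```python
from typing import Any, Dict, List

# The example above, expressed as the dict a YAML parser would return
# (assuming the relationships are nested under switch-1).
config: Dict[str, Any] = {
    "entities": [
        {
            "type": "switch",
            "name": "switch-1",
            "id": "switch-1",
            "state": "available",
            "relationships": [
                {"type": "nova.host", "name": "host-1", "id": "host-1",
                 "relation_type": "attached"},
                {"type": "switch", "name": "switch-2", "id": "switch-2",
                 "relation_type": "backup"},
            ],
        }
    ]
}


def find_id_mismatches(cfg: Dict[str, Any]) -> List[str]:
    """Return names of entities or related entities whose id differs from name."""
    bad = []
    for entity in cfg.get("entities", []):
        for item in [entity] + entity.get("relationships", []):
            if item.get("id") != item.get("name"):
                bad.append(str(item.get("name")))
    return bad


print(find_id_mismatches(config))  # []
```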

@@ -0,0 +1,113 @@
===============================
Vitrage - Getting Started Guide
===============================
This document explains how to get started using Vitrage. Here you will find
easy-to-follow instructions on how to install and configure Vitrage to suit
your needs, try out its different functions, and extend its capabilities.
Before you start
================
Installation
============
- `Enable Vitrage in devstack <https://github.com/openstack/vitrage/blob/master/devstack/README.rst/>`_
- `Enable Vitrage in horizon <https://github.com/openstack/vitrage-dashboard/blob/master/README.rst/>`_
- Run ``./stack.sh``
Nagios Installation & Configuration
===================================
Nagios_ is a widely used tool for monitoring hardware and software systems.
It periodically runs tests on the entities it monitors, and sets the state
of each test to OK (pass) or to one of several severity levels.
Vitrage includes a Nagios datasource. The examples below use Nagios as the
trigger for the deduced alarms, deduced states and RCA defined in Vitrage
templates.
.. _Nagios: https://www.nagios.org/
- `Install Nagios on your devstack <https://github.com/openstack/vitrage/blob/master/doc/source/nagios-devstack-installation.rst/>`_
- `Configure Nagios datasource <https://github.com/openstack/vitrage/blob/master/doc/source/nagios-config.rst>`_
Vitrage in action
=================
To let you see Vitrage in action, it comes prepackaged with a sample template
that demonstrates its functionality. With the default configuration, the
template can be found in */etc/vitrage/templates*.
In the example shown here, we will cause Nagios to report high memory usage on
the devstack host. As a result and as defined in our sample template, Vitrage
will change the state of the hosted instances to "suboptimal", raise an alarm
on each and indicate that the host-level alarm is the cause for the instance
alarms.
Setting up
----------
- Deploy several (3-5) instances on your devstack. Make sure that they are
in state "Running" before continuing.
- In your browser, go to the Nagios site you defined. If you used the
  steps defined above:

  - URL: *http://<IP>:54321/my_site/omd/*
  - Select "Classic Nagios GUI" (other views are OK as well, but the
    instructions below on raising alarms are for this view)
  - User/Password: omdadmin/omd

- Set the "Memory Used" test to "Warning":

  - Click on *Services --> Memory Used*
  - On the right pane, select "Submit passive check result for this service"
  - For "Check result" enter "Warning", and for "Check Output" enter
    "High memory usage". Click *commit*, then *Done*.
  - On the right pane, select "Stop accepting passive checks for this service"
    and then *Done*.
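The two GUI actions above correspond to the Nagios external commands ``PROCESS_SERVICE_CHECK_RESULT`` and ``DISABLE_PASSIVE_SVC_CHECKS``. The sketch below only builds the command lines. The host name ``devstack-host`` is hypothetical, so use the name Nagios shows for your host. How the lines are delivered to Nagios (for example, through your OMD site's external command file) depends on your installation and is not shown.

```python
import time

# Standard Nagios plugin return codes.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_check_result(host: str, service: str, state: str, output: str,
                         now: int) -> str:
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{STATE_CODES[state]};{output}")


def stop_passive_checks(host: str, service: str, now: int) -> str:
    """Build a DISABLE_PASSIVE_SVC_CHECKS external command line."""
    return f"[{now}] DISABLE_PASSIVE_SVC_CHECKS;{host};{service}"


now = int(time.time())
host = "devstack-host"  # hypothetical host name
print(passive_check_result(host, "Memory Used", "WARNING",
                           "High memory usage", now))
print(stop_passive_checks(host, "Memory Used", now))
```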
With the alarm on the host now activated, let's see how this is expressed in
Vitrage.
Deduced State
-------------
- In the Horizon UI, select *Vitrage --> Topology*
- The UI will now show the Sunburst view of the compute hierarchy. The color
of each resource reflects its state: green (ok), yellow (warning), red
(critical).
The instances hosted on the devstack host should now show the "suboptimal"
state that Vitrage deduced from the host alarm.
Deduced Alarm
-------------
- In the Horizon UI, select *Vitrage --> Alarms*
- A list of alarms will appear in the UI, showing an alarm on the host, as well
as one alarm per instance.
Root Cause Analysis
-------------------
- In the Horizon UI, select *Vitrage --> Alarms*
- Select a host alarm, and click on the RCA icon on the far right-hand side of
  the screen. This will show how the host alarm caused the instance alarms.
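Conceptually, the RCA view follows "causes" relations from an alarm back to the alarm that triggered it. The following is a minimal sketch over a plain dict of edges. The alarm names and the data structure are hypothetical and do not reflect Vitrage's graph API.

```python
from typing import Dict, List, Set

# Hypothetical "causes" edges: cause alarm -> alarms it causes.
CAUSES: Dict[str, List[str]] = {
    "host-1: high memory": ["vm-1: suboptimal", "vm-2: suboptimal"],
}


def root_causes(alarm: str, causes: Dict[str, List[str]]) -> Set[str]:
    """Follow 'causes' edges backwards until reaching alarms with no cause."""
    # Invert the edges so we can ask "what caused this alarm?".
    caused_by: Dict[str, List[str]] = {}
    for cause, effects in causes.items():
        for effect in effects:
            caused_by.setdefault(effect, []).append(cause)

    roots: Set[str] = set()
    stack, seen = [alarm], {alarm}
    while stack:
        current = stack.pop()
        parents = caused_by.get(current, [])
        if not parents:
            roots.add(current)
        for parent in parents:
            if parent not in seen:  # guard against cycles
                seen.add(parent)
                stack.append(parent)
    return roots


print(root_causes("vm-2: suboptimal", CAUSES))  # {'host-1: high memory'}
```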
Advanced Usage
==============
Modify states & severities
--------------------------
Since each datasource might represent resource states or alarm severities
differently, you can define, for each datasource, its own mapping to the
*normalized* states and severities supported in Vitrage. This affects UI and
template behavior that depends on these fields.
- `Resource state configuration <https://github.com/openstack/vitrage/blob/master/doc/source/resource-state-config.rst/>`_
- `Alarm severity configuration <https://github.com/openstack/vitrage/blob/master/doc/source/alarm-state-config.rst/>`_
Writing your own templates
--------------------------
For more information regarding Vitrage templates, their format and how to add
them, see here_.
.. _here: https://github.com/openstack/vitrage/blob/master/doc/source/vitrage-template-format.rst

@@ -9,10 +9,14 @@ Add Nova Instance
:align: center
-#. Nova Synchronizer plugin queries all Nova instances, or gets a message bus notification about a new Nova instance
-#. Nova Synchronizer plugin sends corresponding events to the Entity Queue
-#. The Entity Processor polls the Entity Queue and gets the new Nova Instance event
-#. The Entity Processor passes the event to the Nova Instance Transformer plugin, which returns a Vertex with the instance data, and an edge to the host Vertex in the graph
+#. The Nova datasource driver queries all Nova instances, or gets a message
+   bus notification about a new Nova instance
+#. The Nova datasource driver sends corresponding events to the Entity Queue
+#. The Entity Processor polls the Entity Queue and gets the new Nova Instance
+   event
+#. The Entity Processor passes the event to the Nova Instance Transformer,
+   which returns a Vertex with the instance data and an edge to the host
+   Vertex in the graph
#. The Entity Processor adds the new vertex and edge to the Graph
.. image:: ./images/add_nova_instance_graph.png
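The flow above can be sketched in a single process with Python's ``queue.Queue``. ``Event``, ``Graph``, ``nova_driver``, ``transform_instance``, ``entity_processor`` and the ``contains`` relation label are hypothetical simplifications for illustration, not Vitrage's classes or API.

```python
import queue
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source vertex id, target vertex id, relation)


@dataclass
class Event:
    kind: str
    data: Dict[str, str]


@dataclass
class Graph:
    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)


def nova_driver(entity_queue: queue.Queue, instances: List[Dict[str, str]]) -> None:
    """Driver steps: push one event per Nova instance onto the Entity Queue."""
    for inst in instances:
        entity_queue.put(Event("nova.instance", inst))


def transform_instance(event: Event) -> Tuple[str, Dict[str, str], Edge]:
    """Transformer step: build the instance vertex and its edge to the host."""
    vertex_id = event.data["id"]
    vertex = {"type": event.kind, "name": event.data["name"]}
    edge = (event.data["host"], vertex_id, "contains")  # hypothetical label
    return vertex_id, vertex, edge


def entity_processor(entity_queue: queue.Queue, graph: Graph) -> None:
    """Processor steps: drain the queue, transform events, update the graph."""
    while not entity_queue.empty():
        event = entity_queue.get()
        vertex_id, vertex, edge = transform_instance(event)
        graph.vertices[vertex_id] = vertex
        graph.edges.append(edge)


entity_queue: queue.Queue = queue.Queue()
graph = Graph()
nova_driver(entity_queue, [{"id": "vm-1", "name": "vm-1", "host": "host-1"}])
entity_processor(entity_queue, graph)
print(graph.edges)  # [('host-1', 'vm-1', 'contains')]
```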
@@ -27,10 +31,12 @@ Add Aodh Alarm
:align: center
-#. Aodh Synchronizer plugin queries all Aodh alarms, or gets a notification (TBD) about an Aodh alarm state change
-#. Aodh Synchronizer plugin sends corresponding events to the Entity Queue
-#. The Entity Processor polls the Entity Queue and gets the Aodh Alarm event, for example threshold alarm on Instance1 CPU
-#. The Entity Processor passes the event to the Aodh Alarm Transformer plugin, which returns a Vertex with the alarm data, and an edge to the instance Vertex
+#. The Aodh driver queries all Aodh alarms
+#. The Aodh driver sends corresponding events to the Entity Queue
+#. The Entity Processor polls the Entity Queue and gets the Aodh Alarm event,
+   for example, a threshold alarm on the CPU of Instance-1
+#. The Entity Processor passes the event to the Aodh Alarm Transformer, which
+   returns a Vertex with the alarm data and an edge to the instance Vertex
#. The Entity Processor adds the new vertex and edge to the Graph
.. image:: ./images/add_aodh_alarm_graph.png
@@ -45,12 +51,18 @@ Nagios Alarm Causes Deduced Alarm
:align: center
-5. (steps 1-5) Nagios Synchronizer plugin pushes a nagios alarm on a switch to the Entity Queue, which is converted by Nagios Transformer to a vertex and inserted to the Graph
-6. The Evaluator is notified about a new Vertex (Nagios switch alarm) that was added to the graph
-7. The Evaluator performs its calculations (TBD) and deduces that alarms should be triggered on every instance on every host attached to this switch
+5. (steps 1-4) The Nagios datasource driver pushes a Nagios alarm on a switch
+   to the Entity Queue; the Nagios Transformer converts it to a vertex, which
+   is inserted into the Graph
+6. The Evaluator is notified about a new Vertex (Nagios switch alarm) that was
+   added to the graph
+7. The Evaluator performs its calculations and deduces that alarms should be
+   triggered on every instance on every host attached to this switch
8. The Evaluator pushes alarms to the Entity Queue
-9. The Evaluator asks the notifier to notify on these new alarms
-10. Aodh Notifier creates new alarm definitions in Aodh, and sets their states to "alarm"
+9. The graph is updated with these new alarms
+10. The graph writes to the message bus that new alarms were created
+11. The Aodh Notifier creates new alarm definitions in Aodh and sets their
+    states to "alarm"
.. image:: ./images/nagios_causes_deduced_graph.png
:width: 100%
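Step 7 amounts to a two-hop traversal: first from the switch to each attached host, then from each host to its instances. The following is a minimal sketch over hypothetical adjacency dicts. The names and structure are illustrative and are not Vitrage's graph API or Evaluator logic.

```python
from typing import Dict, List

# Hypothetical topology: switch -> attached hosts, host -> hosted instances.
ATTACHED_HOSTS: Dict[str, List[str]] = {"switch-1": ["host-1", "host-2"]}
HOSTED_INSTANCES: Dict[str, List[str]] = {
    "host-1": ["vm-1", "vm-2"],
    "host-2": ["vm-3"],
}


def deduce_instance_alarms(switch: str) -> List[Dict[str, str]]:
    """Return one deduced alarm per instance on every host attached to `switch`."""
    alarms = []
    for host in ATTACHED_HOSTS.get(switch, []):
        for instance in HOSTED_INSTANCES.get(host, []):
            alarms.append({"resource": instance, "source_alarm_on": switch})
    return alarms


print([a["resource"] for a in deduce_instance_alarms("switch-1")])
# ['vm-1', 'vm-2', 'vm-3']
```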
@@ -64,15 +76,18 @@ Create RCA Insights
:align: center
-#. The Evaluator is notified of a new alarm.
-#. The Evaluator evaluates the templates and the Graph (TBD), and decides that there is a root cause relation between two alarms. It adds a "causes" edge to the Graph
+#. The Evaluator is notified of a new alarm *Alarm-X*.
+#. The Evaluator evaluates the templates and the Graph, and decides that there
+   is a root cause relation between *Alarm-X* and *Alarm-Y*. It adds a "causes"
+   edge to the Graph
.. image:: ./images/rca_graph.png
:width: 100%
:align: center
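The evaluation step above can be pictured as a rule that records a "causes" edge whenever a template condition matches two alarms. This sketch hard-codes one hypothetical condition: an alarm on a host, together with an alarm on an instance hosted on that host. Vitrage templates express conditions declaratively, so this is neither their syntax nor the Evaluator's implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical graph state.
HOST_OF: Dict[str, str] = {"vm-1": "host-1", "vm-2": "host-1"}
ALARMS: Dict[str, str] = {  # alarm name -> resource it is raised on
    "Alarm-X": "host-1",
    "Alarm-Y": "vm-1",
}


def add_causes_edges(alarms: Dict[str, str],
                     host_of: Dict[str, str]) -> Set[Tuple[str, str, str]]:
    """Add (cause, 'causes', effect) for each host alarm paired with an alarm
    on an instance hosted on that host."""
    edges = set()
    for cause, cause_resource in alarms.items():
        for effect, effect_resource in alarms.items():
            if host_of.get(effect_resource) == cause_resource:
                edges.add((cause, "causes", effect))
    return edges


print(add_causes_edges(ALARMS, HOST_OF))  # {('Alarm-X', 'causes', 'Alarm-Y')}
```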
-Note that in future versions the graph with RCA information may become more complex, for example:
+Note that in future versions the graph with RCA information may become more
+complex, for example:
.. image:: ./images/complex_rca_graph.png
:width: 100%