diff --git a/doc/source/alarm-state-config.rst b/doc/source/alarm-state-config.rst
index 570b516a5..94766f3ef 100644
--- a/doc/source/alarm-state-config.rst
+++ b/doc/source/alarm-state-config.rst
@@ -53,6 +53,8 @@ alarm is normalized. There are several guidelines for creating a config file:
 - Defining a config file for each datasource is recommended, but not mandatory.
   Datasources with no such configuration will use the values as-is.

+Once the file is modified, you must restart the **vitrage-graph** service to
+load the changes.

 Default Configuration
 +++++++++++++++++++++
diff --git a/doc/source/nagios-devstack-installation.rst b/doc/source/nagios-devstack-installation.rst
index fa9a75d77..0f24c0c56 100644
--- a/doc/source/nagios-devstack-installation.rst
+++ b/doc/source/nagios-devstack-installation.rst
@@ -21,7 +21,7 @@ Installation

     wget -q "https://labs.consol.de/repo/stable/RPM-GPG-KEY" -O - | sudo apt-key add -

-2. Update your repo with the OMD site. For example, for ubuntu wheezy release:
+2. Update your repo with the OMD site. For example, for the Ubuntu Trusty release:
    ::

     sudo bash -c "echo 'deb http://labs.consol.de/repo/stable/ubuntu trusty main' >> /etc/apt/sources.list"
diff --git a/doc/source/resource-state-config.rst b/doc/source/resource-state-config.rst
index 677742430..9a9675fe3 100644
--- a/doc/source/resource-state-config.rst
+++ b/doc/source/resource-state-config.rst
@@ -53,6 +53,8 @@ resource is normalized. Some guidelines for creating a config file:
 - Defining a config file for each datasource is recommended, but not mandatory.
   Datasources with no such configuration will use the values as-is.

+Once the file is modified, you must restart the **vitrage-graph** service to
+load the changes.

 Default Configuration
 +++++++++++++++++++++
diff --git a/doc/source/static-physical-config.rst b/doc/source/static-physical-config.rst
index 455492333..b8230e9a8 100644
--- a/doc/source/static-physical-config.rst
+++ b/doc/source/static-physical-config.rst
@@ -39,7 +39,7 @@ Some physical entities, such as switches, can not be retrieved from OpenStack,
 and so are defined here.

 There may be more than one configuration file. All files will be read from
-*/etc/vitrage/static_plugins/*. See previous section on how to configure this
+*/etc/vitrage/static_datasources/*. See the previous section on how to configure this
 location.

 Format
@@ -79,15 +79,15 @@ of switch-2 entities:

     - type: switch
       name: switch-1
-      id: 11111
+      id: switch-1 # should be the same as name
       state: available
       relationships:
         - type: nova.host
           name: host-1
-          id: 22222
+          id: host-1 # should be the same as name
           relation_type: attached
         - type: switch
           name: switch-2
-          id: 33333
+          id: switch-2 # should be the same as name
           relation_type: backup

diff --git a/doc/source/vitrage-first_steps.rst b/doc/source/vitrage-first_steps.rst
new file mode 100644
index 000000000..ff71278c0
--- /dev/null
+++ b/doc/source/vitrage-first_steps.rst
@@ -0,0 +1,113 @@
+===============================
+Vitrage - Getting Started Guide
+===============================
+
+This document explains how to get started using Vitrage. Here you will find
+easy-to-follow instructions on how to install & configure Vitrage to suit
+your needs, try out its different functions, and expand its capabilities.
+
+Before you start
+================
+
+Installation
+============
+- `Enable Vitrage in devstack `_
+- `Enable Vitrage in horizon `_
+- Run ``./stack.sh``
+
+
+Nagios Installation & Configuration
+===================================
+Nagios_ is a widely used tool for monitoring hardware and software systems.
+It periodically runs tests on the entities it monitors, and sets the state
+of each test to OK (pass) or to one of several levels of severity.
+
+Vitrage comes with Nagios as a datasource. The examples given below use Nagios
+as the trigger for deduced alarms, states and RCA templates in Vitrage.
+
+.. _Nagios: https://www.nagios.org/
+
+- `Install Nagios on your devstack `_
+- `Configure Nagios datasource `_
+
+
+Vitrage in action
+=================
+
+Vitrage comes prepackaged with a sample template that demonstrates it in
+action. With the default configuration, the template can be found in
+*/etc/vitrage/templates*.
+
+In the example shown here, we will cause Nagios to report high memory usage on
+the devstack host. As a result, and as defined in our sample template, Vitrage
+will change the state of the hosted instances to "suboptimal", raise an alarm
+on each, and indicate that the host-level alarm is the cause of the instance
+alarms.
+
+Setting up
+----------
+- Deploy several (3-5) instances on your devstack. Make sure that they are
+  in state "Running" before continuing.
+- In your browser, go to the Nagios site you defined. If you used the
+  steps defined above,
+  - URL: *http://:54321/my_site/omd/*.
+  - Select "Classic Nagios GUI" (other views are OK as well, but the instructions
+    below on raising alarms are for this view)
+  - User/Password: omdadmin/omd
+- Set the "Memory Used" test to "Warning":
+  - Click on *Services --> Memory Used*
+  - On the right pane, select "Submit passive check result for this service"
+  - For the "Check result" enter "Warning", and for "Check Output" enter
+    "High memory usage". Click *commit*, then *Done*.
+  - On the right pane, select "Stop accepting passive checks for this service"
+    and then *Done*.
+
+With the alarm on the host now activated, let's see how this is expressed in
+Vitrage.
+
+
+Deduced State
+-------------
+
+- In the Horizon UI, select *Vitrage --> Topology*
+- The UI will now show the Sunburst view of the compute hierarchy. The color
+  of each resource reflects its state: green (ok), yellow (warning), red
+  (critical).
+
+  A list of alarms will appear in the UI, showing an alarm on the host, as well
+  as one alarm per instance.
+
+
+Deduced Alarm
+-------------
+
+- In the Horizon UI, select *Vitrage --> Alarms*
+- A list of alarms will appear in the UI, showing an alarm on the host, as well
+  as one alarm per instance.
+
+
+Root Cause Analysis
+-------------------
+- In the Horizon UI, select *Vitrage --> Alarms*
+- Select a host alarm, and click the RCA icon on the far right-hand side of
+  the screen. This will show how the host alarm caused the instance alarms.
+
+Advanced Usage
+==============
+
+Modify states & severities
+--------------------------
+Since each datasource might represent a resource state or alarm severity
+differently, you can define, for each datasource, its own mapping to the
+*normalized* states/severities supported in Vitrage. This affects the UI and
+any template behavior that depends on these fields.
+
+- `Resource state configuration `_
+- `Alarm severity configuration `_
+
+Writing your own templates
+--------------------------
+For more information regarding Vitrage templates, their format and how to add
+them, see here_.
+
+.. _here: https://github.com/openstack/vitrage/blob/master/doc/source/vitrage-template-format.rst
diff --git a/doc/source/vitrage-use-cases.rst b/doc/source/vitrage-use-cases.rst
index c6686f351..23e1fcfc0 100644
--- a/doc/source/vitrage-use-cases.rst
+++ b/doc/source/vitrage-use-cases.rst
@@ -9,10 +9,14 @@ Add Nova Instance
    :align: center


-#. Nova Synchronizer plugin queries all Nova instances, or gets a message bus notification about a new Nova instance
-#. Nova Synchronizer plugin sends corresponding events to the Entity Queue
-#. The Entity Processor polls the Entity Queue and gets the new Nova Instance event
-#. The Entity Processor passes the event to the Nova Instance Transformer plugin, which returns a Vertex with the instance data, and an edge to the host Vertex in the graph
+#. The Nova datasource driver queries all Nova instances, or gets a message
+   bus notification about a new Nova instance
+#. The Nova datasource driver sends corresponding events to the Entity Queue
+#. The Entity Processor polls the Entity Queue and gets the new Nova Instance
+   event
+#. The Entity Processor passes the event to the Nova Instance Transformer,
+   which returns a Vertex with the instance data and an edge to the host
+   Vertex in the graph
 #. The Entity Processor adds the new vertex and edge to the Graph

 .. image:: ./images/add_nova_instance_graph.png
@@ -27,10 +31,12 @@ Add Aodh Alarm
    :align: center


-#. Aodh Synchronizer plugin queries all Aodh alarms, or gets a notification (TBD) about an Aodh alarm state change
-#. Aodh Synchronizer plugin sends corresponding events to the Entity Queue
-#. The Entity Processor polls the Entity Queue and gets the Aodh Alarm event, for example threshold alarm on Instance1 CPU
-#. The Entity Processor passes the event to the Aodh Alarm Transformer plugin, which returns a Vertex with the alarm data, and an edge to the instance Vertex
+#. The Aodh driver queries all Aodh alarms
+#. The Aodh driver sends corresponding events to the Entity Queue
+#. The Entity Processor polls the Entity Queue and gets the Aodh Alarm event,
+   for example, a threshold alarm on the CPU of Instance-1
+#. The Entity Processor passes the event to the Aodh Alarm Transformer, which
+   returns a Vertex with the alarm data and an edge to the instance Vertex
 #. The Entity Processor adds the new vertex and edge to the Graph

 .. image:: ./images/add_aodh_alarm_graph.png
@@ -45,12 +51,18 @@ Nagios Alarm Causes Deduced Alarm
    :align: center


-5. (steps 1-5) Nagios Synchronizer plugin pushes a nagios alarm on a switch to the Entity Queue, which is converted by Nagios Transformer to a vertex and inserted to the Graph
-6. The Evaluator is notified about a new Vertex (Nagios switch alarm) that was added to the graph
-7. The Evaluator performs its calculations (TBD) and deduces that alarms should be triggered on every instance on every host attached to this switch
+5. (steps 1-4) The Nagios datasource driver pushes a Nagios alarm on a switch
+   to the Entity Queue; the Nagios Transformer converts it to a vertex, which
+   is inserted into the Graph
+6. The Evaluator is notified about a new Vertex (Nagios switch alarm) that was
+   added to the graph
+7. The Evaluator performs its calculations and deduces that alarms should be
+   triggered on every instance on every host attached to this switch
 8. The Evaluator pushes alarms to the Entity Queue
-9. The Evaluator asks the notifier to notify on these new alarms
-10. Aodh Notifier creates new alarm definitions in Aodh, and sets their states to "alarm"
+9. The graph is updated with these new alarms
+10. The graph publishes a message on the message bus announcing the new alarms
+11. The Aodh Notifier creates new alarm definitions in Aodh, and sets their
+    states to "alarm"

 .. image:: ./images/nagios_causes_deduced_graph.png
    :width: 100%
@@ -64,15 +76,18 @@ Create RCA Insights
    :align: center


-#. The Evaluator is notified of a new alarm.
-#. The Evaluator evaluates the templates and the Graph (TBD), and decides that there is a root cause relation between two alarms. It adds a "causes" edge to the Graph
+#. The Evaluator is notified of a new alarm, *Alarm-X*.
+#. The Evaluator evaluates the templates and the Graph, and decides that there
+   is a root cause relation between *Alarm-X* and *Alarm-Y*. It adds a "causes"
+   edge to the Graph

 .. image:: ./images/rca_graph.png
    :width: 100%
    :align: center


-Note that in future versions the graph with RCA information may become more complex, for example:
+Note that in future versions, the graph with RCA information may become more
+complex, for example:

 .. image:: ./images/complex_rca_graph.png
    :width: 100%
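The use cases in this change describe a single pipeline. A datasource driver feeds the Entity Queue, the Entity Processor passes each event to a Transformer and updates the Graph, and the Evaluator reacts to new vertices by pushing deduced alarms back onto the queue and linking them with "causes" edges. The Python sketch below illustrates that flow under simplifying assumptions. Every name in it (`Graph`, `nova_driver`, `transform`, `Evaluator`, `process`), the event dictionaries, and the edge labels are hypothetical. They are not Vitrage's actual classes or API. `queue.Queue` stands in for the Entity Queue, and the message bus and Aodh Notifier steps (10 and 11) are omitted.

```python
"""Hypothetical sketch of the entity pipeline described in the Vitrage use cases.

Not Vitrage's real API: datasource driver -> Entity Queue -> Entity Processor
-> Transformer -> Graph, plus an Evaluator that deduces alarms and adds
"causes" edges.
"""
import queue
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy entity graph: vertex id -> properties, plus labeled directed edges."""
    vertices: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)  # (source, target, label) tuples

    def neighbors(self, vertex_id, label):
        """Targets of outgoing edges from vertex_id that carry the given label."""
        return [t for (s, t, lbl) in self.edges if s == vertex_id and lbl == label]


def nova_driver(instances, entity_queue):
    """Hypothetical Nova datasource driver: emit one event per instance."""
    for inst in instances:
        entity_queue.put({"category": "resource", "type": "nova.instance", **inst})


def transform(event):
    """Hypothetical transformer: turn one event into a vertex and its edges."""
    vid = event["id"]
    if event["category"] == "resource":
        props = {"type": event["type"]}
        edges = [(event["host"], vid, "contains")]  # the host contains the instance
    else:  # alarm event
        props = {"type": "alarm", "deduced": event.get("deduced", False)}
        edges = [(vid, event["resource"], "on")]  # the alarm is raised on a resource
        if "caused_by" in event:
            edges.append((event["caused_by"], vid, "causes"))  # RCA edge
    return vid, props, edges


class Evaluator:
    """Hypothetical evaluator: an alarm on a host yields one deduced alarm per instance."""

    def __init__(self, graph, entity_queue):
        self.graph = graph
        self.entity_queue = entity_queue

    def on_vertex_added(self, vid):
        props = self.graph.vertices[vid]
        # Ignore resources and alarms we deduced ourselves (prevents endless recursion).
        if props["type"] != "alarm" or props["deduced"]:
            return
        (resource,) = self.graph.neighbors(vid, "on")
        for instance in self.graph.neighbors(resource, "contains"):
            # Deduced alarms go back through the Entity Queue like any other event.
            self.entity_queue.put({
                "category": "alarm",
                "id": f"deduced-{instance}",
                "resource": instance,
                "deduced": True,
                "caused_by": vid,
            })


def process(entity_queue, graph, evaluator):
    """Entity Processor: drain the queue, transform events, update the graph."""
    while not entity_queue.empty():
        vid, props, edges = transform(entity_queue.get())
        graph.vertices[vid] = props
        graph.edges.update(edges)
        evaluator.on_vertex_added(vid)  # notify the Evaluator of the new vertex


if __name__ == "__main__":
    entity_queue, graph = queue.Queue(), Graph()
    evaluator = Evaluator(graph, entity_queue)
    nova_driver([{"id": "vm-1", "host": "host-1"},
                 {"id": "vm-2", "host": "host-1"}], entity_queue)
    process(entity_queue, graph, evaluator)
    # A monitoring alarm (e.g. high memory usage) on the host; the event shape is hypothetical.
    entity_queue.put({"category": "alarm", "id": "mem-alarm", "resource": "host-1"})
    process(entity_queue, graph, evaluator)
    # prints [('mem-alarm', 'deduced-vm-1', 'causes'), ('mem-alarm', 'deduced-vm-2', 'causes')]
    print(sorted(e for e in graph.edges if e[2] == "causes"))
```

Because deduced alarms travel back through the same queue, they are processed like any other event. In this sketch, the loop terminates because the evaluator ignores alarms it deduced itself. The real Evaluator's template-driven logic is not modeled here.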