Merge "Update documentation for 1.0"

Authored by Jenkins on 2016-10-14 08:34:15 +00:00, committed by Gerrit Code Review
commit 1c12b277bf
3 changed files with 2290 additions and 16 deletions

File diff suppressed because it is too large.


@@ -368,12 +368,21 @@ file. This file has the following sections:
 to that category of nodes. For example::

     node_cluster_alarms:
-      controller:
-        cpu: ['cpu-critical-controller', 'cpu-warning-controller']
-        root-fs: ['root-fs-critical', 'root-fs-warning']
-        log-fs: ['log-fs-critical', 'log-fs-warning']
+      controller-nodes:
+        apply_to_node: controller
+        alerting: enabled
+        members:
+          cpu:
+            alarms: ['cpu-critical-controller', 'cpu-warning-controller']
+          root-fs:
+            alarms: ['root-fs-critical', 'root-fs-warning']
+          log-fs:
+            alarms: ['log-fs-critical', 'log-fs-warning']
+          hdd-errors:
+            alerting: enabled_with_notification
+            alarms: ['hdd-errors-critical']

-Creates three alarm groups for the cluster of nodes called 'controller':
+Creates four alarm groups for the cluster of controller nodes:

 * The *cpu* alarm group is mapped to two alarms defined in the ``alarms``
   section known as the 'cpu-critical-controller' and
@@ -388,6 +397,13 @@
   section known as the 'log-fs-critical' and 'log-fs-warning' alarms. These
   alarms monitor the file system where the logs are created on the
   controller nodes.
+* The *hdd-errors* alarm group is mapped to the 'hdd-errors-critical' alarm
+  defined in the ``alarms`` section. This alarm monitors the ``kern.log``
+  log entries containing critical IO errors detected by the kernel.
+  The *hdd-errors* alarm group has the *enabled_with_notification* alerting
+  attribute, meaning that the operator is notified when any of the
+  controller nodes encounters a disk failure. The other alarm groups do not
+  trigger notifications per node but at the aggregated cluster level.

 .. note:: An *alarm group* is a mere implementation artifact (although it
    has functional value) that is primarily used to distribute the alarms
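
Read together, the two hunks above describe a two-level ``alerting`` scheme:
the value set at the cluster level appears to act as the default for every
alarm group, and a group-level value such as the one on *hdd-errors*
overrides it. A condensed sketch of the new layout, assembled from this diff
only (the *root-fs* and *log-fs* groups are omitted for brevity)::

    node_cluster_alarms:
      controller-nodes:
        apply_to_node: controller   # the category of nodes the groups apply to
        alerting: enabled           # cluster-level default: alert, no notification
        members:
          cpu:
            alarms: ['cpu-critical-controller', 'cpu-warning-controller']
          hdd-errors:
            alerting: enabled_with_notification   # group-level override: notify per node
            alarms: ['hdd-errors-critical']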
@@ -425,7 +441,7 @@ structure of that file.
    important to keep exactly the same copy of
    ``/etc/hiera/override/gse_filters.yaml`` across all the nodes of the
    OpenStack environment including the node(s) where Nagios is installed.

 The aggregation rules and correlation policies are defined in the
 ``/etc/hiera/override/gse_filters.yaml`` configuration file.
 This file has the following sections:
@@ -590,6 +606,7 @@ the service cluster aggregation rules::
       output_metric_name: cluster_service_status
       interval: 10
       warm_up_period: 20
+      alerting: enabled_with_notification
       clusters:
         nova-api:
           policy: highest_severity
@@ -638,6 +655,10 @@ Where
 | The number of seconds after a (re)start that the GSE plugin will wait
   before emitting its metric messages.

+| alerting
+| Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
+| The alerting configuration of the service clusters.
+
 | clusters
 | Type: list
 | The list of service clusters that the plugin handles. See
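
Assembled from the hunk and the field descriptions above, a minimal sketch of
a service-cluster aggregation rule carrying the new attribute; only the keys
visible in this diff are shown, and a real rule in ``gse_filters.yaml``
contains additional keys that this change does not touch::

    output_metric_name: cluster_service_status
    interval: 10                          # seconds between metric emissions
    warm_up_period: 20                    # seconds to wait after a (re)start
    alerting: enabled_with_notification   # or 'disabled' / 'enabled'
    clusters:
      nova-api:
        policy: highest_severity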
@@ -720,6 +741,7 @@ cluster aggregation rules::
       output_metric_name: cluster_node_status
       interval: 10
       warm_up_period: 80
+      alerting: enabled_with_notification
       clusters:
         controller:
           policy: majority_of_members
@@ -768,6 +790,10 @@ Where
 | The number of seconds after a (re)start that the GSE plugin will wait
   before emitting its metric messages.

+| alerting
+| Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
+| The alerting configuration of the node clusters.
+
 | clusters
 | Type: list
 | The list of node clusters that the plugin handles. See
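
The node-cluster rule mirrors the service-cluster one; within this diff, only
the metric name, the warm-up period, and the cluster entries differ. The same
minimal sketch, under the same assumption that untouched keys are omitted::

    output_metric_name: cluster_node_status
    interval: 10
    warm_up_period: 80                    # nodes use a longer warm-up than services
    alerting: enabled_with_notification
    clusters:
      controller:
        policy: majority_of_members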


@@ -10,6 +10,34 @@ Release notes
 Version 1.0.0
 +++++++++++++

+The StackLight Collector plugin 1.0.0 for Fuel contains the following updates:
+
+New alarms:
+
+* Monitor RabbitMQ from the Pacemaker point of view
+* Monitor all partitions and OSD disk(s)
+* Horizon HTTP 5xx errors
+* Keystone slow response times
+* HDD errors
+* SWAP percent usage
+* Network packet drops
+* Local OpenStack API checks
+* Local checks for services: Apache, Memcached, MySQL, RabbitMQ, Pacemaker
+
+Alarm enhancements:
+
+* Added support for the ``group by`` attribute in alarm rules
+* Added support for ``pattern matching`` to filter metric dimensions
+
+Bug fixes:
+
+* Fixed the concurrent execution of logrotate.
+  See `#1455104 <https://bugs.launchpad.net/lma-toolchain/+bug/1455104>`_.
+* Implemented the capability for the Elasticsearch bulk size to increase when
+  required. See `#1617211 <https://bugs.launchpad.net/lma-toolchain/+bug/1617211>`_.
+* Implemented the capability to use the RabbitMQ management API in place of
+  the :command:`rabbitmqctl` command.
+
 Version 0.10.0
 ++++++++++++++