Merge "Update documentation for 1.0"

Jenkins 2016-10-14 08:34:15 +00:00 committed by Gerrit Code Review
commit 1c12b277bf
3 changed files with 2290 additions and 16 deletions

File diff suppressed because it is too large


@@ -368,12 +368,21 @@ file. This file has the following sections:
to that category of nodes. For example::
node_cluster_alarms:
controller:
cpu: ['cpu-critical-controller', 'cpu-warning-controller']
root-fs: ['root-fs-critical', 'root-fs-warning']
log-fs: ['log-fs-critical', 'log-fs-warning']
controller-nodes:
apply_to_node: controller
alerting: enabled
members:
cpu:
alarms: ['cpu-critical-controller', 'cpu-warning-controller']
root-fs:
alarms: ['root-fs-critical', 'root-fs-warning']
log-fs:
alarms: ['log-fs-critical', 'log-fs-warning']
hdd-errors:
alerting: enabled_with_notification
alarms: ['hdd-errors-critical']
Creates three alarm groups for the cluster of nodes called 'controller':
Creates four alarm groups for the cluster of controller nodes:
* The *cpu* alarm group is mapped to two alarms defined in the ``alarms``
section known as the 'cpu-critical-controller' and
@@ -388,6 +397,13 @@ file. This file has the following sections:
section known as the 'log-fs-critical' and 'log-fs-warning' alarms. These
alarms monitor the file system where the logs are created on the
controller nodes.
* The *hdd-errors* alarm group is mapped to the 'hdd-errors-critical' alarm
defined in the ``alarms`` section. This alarm monitors the ``kern.log``
log entries containing critical IO errors detected by the kernel.
The *hdd-errors* alarm group has the *enabled_with_notification* alerting
attribute, meaning that the operator is notified if any of the
controller nodes encounters a disk failure. The other alarm groups do not
trigger notifications per node, only at the aggregated cluster level.
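The relationship between the cluster-level and member-level ``alerting`` attributes described above can be sketched in Python. This is only an illustration, not the plugin's implementation: the function name ``effective_alerting`` is hypothetical, and the sketch assumes that a member without its own ``alerting`` key inherits the value set on its cluster, and that a cluster without one falls back to 'enabled'.

```python
# The three alerting values documented for the GSE filters.
VALID_ALERTING = {"disabled", "enabled", "enabled_with_notification"}


def effective_alerting(cluster):
    """Return a {member_name: alerting_mode} mapping for one node cluster.

    Assumption (not confirmed by the plugin source): a member-level
    'alerting' value overrides the cluster-level one, and a missing
    cluster-level value defaults to 'enabled'.
    """
    default = cluster.get("alerting", "enabled")
    modes = {}
    for name, member in cluster.get("members", {}).items():
        mode = member.get("alerting", default)
        if mode not in VALID_ALERTING:
            raise ValueError("invalid alerting value for %s: %r" % (name, mode))
        modes[name] = mode
    return modes


# The same structure as the YAML example, expressed as a Python dict.
node_cluster_alarms = {
    "controller-nodes": {
        "apply_to_node": "controller",
        "alerting": "enabled",
        "members": {
            "cpu": {"alarms": ["cpu-critical-controller", "cpu-warning-controller"]},
            "root-fs": {"alarms": ["root-fs-critical", "root-fs-warning"]},
            "log-fs": {"alarms": ["log-fs-critical", "log-fs-warning"]},
            "hdd-errors": {
                "alerting": "enabled_with_notification",
                "alarms": ["hdd-errors-critical"],
            },
        },
    }
}

modes = effective_alerting(node_cluster_alarms["controller-nodes"])
```

Under these assumptions, only the *hdd-errors* member resolves to 'enabled_with_notification'; the three other members keep the cluster-level 'enabled' value.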
.. note:: An *alarm group* is a mere implementation artifact (although it
has functional value) that is primarily used to distribute the alarms
@@ -425,7 +441,7 @@ structure of that file.
important to keep exactly the same copy of
``/etc/hiera/override/gse_filters.yaml`` across all the nodes of the
OpenStack environment including the node(s) where Nagios is installed.
The aggregation rules and correlation policies are defined in the ``/etc/hiera/override/gse_filters.yaml`` configuration file.
This file has the following sections:
@@ -590,6 +606,7 @@ the service cluster aggregation rules::
output_metric_name: cluster_service_status
interval: 10
warm_up_period: 20
alerting: enabled_with_notification
clusters:
nova-api:
policy: highest_severity
@@ -638,6 +655,10 @@ Where
| The number of seconds after a (re)start that the GSE plugin will wait
before emitting its metric messages.
| alerting
| Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
| The alerting configuration of the service clusters.
| clusters
| Type: list
| The list of service clusters that the plugin handles. See
@@ -720,6 +741,7 @@ cluster aggregation rules::
output_metric_name: cluster_node_status
interval: 10
warm_up_period: 80
alerting: enabled_with_notification
clusters:
controller:
policy: majority_of_members
@@ -768,6 +790,10 @@ Where
| The number of seconds after a (re)start that the GSE plugin will wait
before emitting its metric messages.
| alerting
| Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
| The alerting configuration of the node clusters.
| clusters
| Type: list
| The list of node clusters that the plugin handles. See


@@ -10,6 +10,34 @@ Release notes
Version 1.0.0
+++++++++++++
The StackLight Collector plugin 1.0.0 for Fuel contains the following updates:
New alarms:
* Monitor RabbitMQ from the Pacemaker point of view
* Monitor all partitions and OSD disk(s)
* Horizon HTTP 5xx errors
* Keystone slow response times
* HDD errors
* Swap usage percentage
* Network packet drops
* Local OpenStack API checks
* Local checks for services: Apache, Memcached, MySQL, RabbitMQ, Pacemaker
Alarm enhancements:
* Added the ``group by`` attribute support for alarm rules
* Added support for ``pattern matching`` to filter metric dimensions
Bug fixes:
* Fixed the concurrent execution of logrotate.
See `#1455104 <https://bugs.launchpad.net/lma-toolchain/+bug/1455104>`_.
* Implemented the capability for the Elasticsearch bulk size to increase when
required. See `#1617211 <https://bugs.launchpad.net/lma-toolchain/+bug/1617211>`_.
* Implemented the capability to use the RabbitMQ management API in place of the
:command:`rabbitmqctl` command.
Version 0.10.0
++++++++++++++