Update documentation for 1.0
Change-Id: I055bf6d534ed712b6f0b194454ef62b1902d1c01
parent 8db734a584
commit d248ed36d7
@@ -368,12 +368,21 @@ file. This file has the following sections:
 to that category of nodes. For example::
 
     node_cluster_alarms:
-      controller:
-        cpu: ['cpu-critical-controller', 'cpu-warning-controller']
-        root-fs: ['root-fs-critical', 'root-fs-warning']
-        log-fs: ['log-fs-critical', 'log-fs-warning']
+      controller-nodes:
+        apply_to_node: controller
+        alerting: enabled
+        members:
+          cpu:
+            alarms: ['cpu-critical-controller', 'cpu-warning-controller']
+          root-fs:
+            alarms: ['root-fs-critical', 'root-fs-warning']
+          log-fs:
+            alarms: ['log-fs-critical', 'log-fs-warning']
+          hdd-errors:
+            alerting: enabled_with_notification
+            alarms: ['hdd-errors-critical']
 
-Creates three alarm groups for the cluster of nodes called 'controller':
+Creates four alarm groups for the cluster of controller nodes:
 
 * The *cpu* alarm group is mapped to two alarms defined in the ``alarms``
   section known as the 'cpu-critical-controller' and
@@ -388,6 +397,13 @@ file. This file has the following sections:
   section known as the 'log-fs-critical' and 'log-fs-warning' alarms. These
   alarms monitor the file system where the logs are created on the
   controller nodes.
+* The *hdd-errors* alarm group is mapped to the 'hdd-errors-critical' alarm
+  defined in the ``alarms`` section. This alarm monitors the ``kern.log``
+  log entries containing critical IO errors detected by the kernel.
+  The *hdd-error* alarm obtains the *enabled_with_notification* alerting
+  attribute, meaning that the operator will be notified if any of the
+  controller nodes encounters a disk failure. Other alarms do not trigger
+  notification per node but at an aggregated cluster level.
 
 .. note:: An *alarm group* is a mere implementation artifact (although it
    has functional value) that is primarily used to distribute the alarms
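Review note: the new ``alerting`` attribute appears both on the cluster (``controller-nodes``) and on a single member (``hdd-errors``). The sketch below shows one way to read that layout. It assumes that a member without its own ``alerting`` key falls back to the cluster-level value, which matches the prose in the hunk but is not confirmed by it. The ``effective_alerting`` helper is hypothetical and is not part of the plugin.

```python
# Hypothetical illustration only; not code from the StackLight Collector plugin.
# The dictionary mirrors the node_cluster_alarms example added by this commit.
node_cluster_alarms = {
    "controller-nodes": {
        "apply_to_node": "controller",
        "alerting": "enabled",
        "members": {
            "cpu": {"alarms": ["cpu-critical-controller", "cpu-warning-controller"]},
            "root-fs": {"alarms": ["root-fs-critical", "root-fs-warning"]},
            "log-fs": {"alarms": ["log-fs-critical", "log-fs-warning"]},
            "hdd-errors": {
                "alerting": "enabled_with_notification",
                "alarms": ["hdd-errors-critical"],
            },
        },
    },
}


def effective_alerting(cluster):
    """Map each member name to its alerting mode.

    Assumption: a member-level 'alerting' key overrides the cluster-level one.
    """
    cluster_mode = cluster["alerting"]
    return {
        name: member.get("alerting", cluster_mode)
        for name, member in cluster["members"].items()
    }


modes = effective_alerting(node_cluster_alarms["controller-nodes"])
# Members whose alarms would notify the operator per node under this reading.
notified = sorted(name for name, mode in modes.items()
                  if mode == "enabled_with_notification")
print(notified)
```

Under this reading only ``hdd-errors`` notifies per node, while the other three groups keep the cluster-level ``enabled`` mode.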
@@ -590,6 +606,7 @@ the service cluster aggregation rules::
     output_metric_name: cluster_service_status
     interval: 10
     warm_up_period: 20
+    alerting: enabled_with_notification
     clusters:
       nova-api:
         policy: highest_severity
@@ -638,6 +655,10 @@ Where
 | The number of seconds after a (re)start that the GSE plugin will wait
   before emitting its metric messages.
 
+| alerting
+| Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
+| The alerting configuration of the service clusters.
+
 | clusters
 | Type: list
 | The list of service clusters that the plugin handles. See
@@ -720,6 +741,7 @@ cluster aggregation rules::
     output_metric_name: cluster_node_status
     interval: 10
     warm_up_period: 80
+    alerting: enabled_with_notification
     clusters:
       controller:
         policy: majority_of_members
@@ -768,6 +790,10 @@ Where
 | The number of seconds after a (re)start that the GSE plugin will wait
   before emitting its metric messages.
 
+| alerting
+| Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
+| The alerting configuration of the node clusters.
+
 | clusters
 | Type: list
 | The list of node clusters that the plugin handles. See
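Review note: the two ``alerting`` parameter descriptions above restrict the value to three strings. A minimal sketch of that constraint follows. The ``check_alerting`` function and the ``node_rules`` variable name are hypothetical, and only the keys and values shown in the hunks are taken from the commit.

```python
# Hypothetical illustration only; not code from the StackLight Collector plugin.
ALLOWED_ALERTING = {"disabled", "enabled", "enabled_with_notification"}


def check_alerting(rules):
    """Return the 'alerting' value of an aggregation-rules mapping.

    Raises ValueError when the value is missing, is not a string, or is not
    one of the three documented strings.
    """
    value = rules.get("alerting")
    if not isinstance(value, str) or value not in ALLOWED_ALERTING:
        raise ValueError(
            "alerting must be one of %s, got %r" % (sorted(ALLOWED_ALERTING), value)
        )
    return value


# Values copied from the node cluster aggregation rules hunk.
node_rules = {
    "output_metric_name": "cluster_node_status",
    "interval": 10,
    "warm_up_period": 80,
    "alerting": "enabled_with_notification",
    "clusters": {"controller": {"policy": "majority_of_members"}},
}

checked = check_alerting(node_rules)

# A misspelled value is rejected rather than silently accepted.
try:
    check_alerting({"alerting": "enabled_with_notifications"})
    rejected = False
except ValueError:
    rejected = True
```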
@@ -10,6 +10,34 @@ Release notes
 Version 1.0.0
 +++++++++++++
 
+The StackLight Collector plugin 1.0.0 for Fuel contains the following updates:
+
+New alarms:
+
+* Monitor RabbitMQ based on Pacemaker point-of-view
+* Monitor all partitions and OSD disk(s)
+* Horizon HTTP 5xx errors
+* Keystone slow response times
+* HDD errors
+* SWAP percent usage
+* Network packet drops
+* Local OpenStack API checks
+* Local checks for services: Apache, Memcached, MySQL, RabbitMQ, Pacemaker
+
+Alarm enhancements:
+
+* Added the ``group by`` attribute support for alarm rules
+* Added support for ``pattern matching`` to filter metric dimensions
+
+Bug fixes:
+
+* Fixed the concurrent execution of logrotate.
+  See `#1455104 <https://bugs.launchpad.net/lma-toolchain/+bug/1455104>`_.
+* Implemented the capability for the Elasticsearch bulk size to increase when
+  required. See `#1617211 <https://bugs.launchpad.net/lma-toolchain/+bug/1617211>`_.
+* Implemented the capability to use RabbitMQ management API in place of the
+  :command:`rabbitmqctl` command.
+
 Version 0.10.0
 ++++++++++++++
 