Update documentation for 1.0

Change-Id: I055bf6d534ed712b6f0b194454ef62b1902d1c01
2016-10-12 17:02:48 +02:00 · 2016-10-12 17:02:48 +02:00 · d248ed36d7
commit d248ed36d7
parent 8db734a584
3 changed files with 2290 additions and 16 deletions
--- a/doc/user/source/appendix_alarms.rst
+++ b/doc/user/source/appendix_alarms.rst
--- a/doc/user/source/configure_alarms.rst
+++ b/doc/user/source/configure_alarms.rst
@@ -368,12 +368,21 @@ file. This file has the following sections:
   to that category of nodes. For example::

     node_cluster_alarms:
-        controller:
-         cpu: ['cpu-critical-controller', 'cpu-warning-controller']
-         root-fs: ['root-fs-critical', 'root-fs-warning']
-         log-fs: ['log-fs-critical', 'log-fs-warning']
+        controller-nodes:
+            apply_to_node: controller
+            alerting: enabled
+            members:
+                cpu:
+                    alarms: ['cpu-critical-controller', 'cpu-warning-controller']
+                root-fs:
+                    alarms: ['root-fs-critical', 'root-fs-warning']
+                log-fs:
+                    alarms: ['log-fs-critical', 'log-fs-warning']
+                hdd-errors:
+                    alerting: enabled_with_notification
+                    alarms: ['hdd-errors-critical']

-   Creates three alarm groups for the cluster of nodes called 'controller':
+   Creates four alarm groups for the cluster of controller nodes:

   * The *cpu* alarm group is mapped to two alarms defined in the ``alarms``
     section known as the 'cpu-critical-controller' and
@@ -388,6 +397,13 @@ file. This file has the following sections:
     section known as the 'log-fs-critical' and 'log-fs-warning' alarms. These
     alarms monitor the file system where the logs are created on the
     controller nodes.
+   * The *hdd-errors* alarm group is mapped to the 'hdd-errors-critical' alarm
+     defined in the ``alarms`` section. This alarm monitors the ``kern.log``
+     entries that report critical I/O errors detected by the kernel.
+     The *hdd-errors* alarm group has the *enabled_with_notification* alerting
+     attribute, which means that the operator is notified if any of the
+     controller nodes encounters a disk failure. The other alarm groups do not
+     trigger per-node notifications, only aggregated cluster-level ones.

   .. note:: An *alarm group* is a mere implementation artifact (although it
      has functional value) that is primarily used to distribute the alarms
@@ -590,6 +606,7 @@ the service cluster aggregation rules::
    output_metric_name: cluster_service_status
    interval: 10
    warm_up_period: 20
+    alerting: enabled_with_notification
    clusters:
      nova-api:
        policy: highest_severity
@@ -638,6 +655,10 @@ Where
 |   The number of seconds after a (re)start that the GSE plugin will wait
    before emitting its metric messages.

+| alerting
+|   Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
+|   The alerting configuration of the service clusters.
+
 | clusters
 |   Type: list
 |   The list of service clusters that the plugin handles. See
@@ -720,6 +741,7 @@ cluster aggregation rules::
    output_metric_name: cluster_node_status
    interval: 10
    warm_up_period: 80
+    alerting: enabled_with_notification
    clusters:
      controller:
        policy: majority_of_members
@@ -768,6 +790,10 @@ Where
 |   The number of seconds after a (re)start that the GSE plugin will wait
    before emitting its metric messages.

+| alerting
+|   Type: string (one of 'disabled', 'enabled' or 'enabled_with_notification').
+|   The alerting configuration of the node clusters.
+
 | clusters
 |   Type: list
 |   The list of node clusters that the plugin handles. See
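The ``controller-nodes`` example above sets ``alerting: enabled`` at the cluster level and overrides it with ``enabled_with_notification`` for the *hdd-errors* group only. The following is a minimal sketch of how the effective alerting mode of each alarm group could be resolved from that structure. It assumes that a member without its own ``alerting`` key inherits the cluster-level value; the function name ``effective_alerting`` is hypothetical and is not part of the plugin.

```python
# Hypothetical sketch, not the plugin's implementation.
# Assumption: a member without its own "alerting" key inherits the
# cluster-level "alerting" value.

ALERTING_MODES = ("disabled", "enabled", "enabled_with_notification")


def effective_alerting(cluster):
    """Map each alarm group of one node cluster to its alerting mode."""
    cluster_mode = cluster["alerting"]
    if cluster_mode not in ALERTING_MODES:
        raise ValueError(f"invalid cluster alerting: {cluster_mode!r}")
    result = {}
    for group, member in cluster["members"].items():
        # Member-level value wins; otherwise fall back to the cluster value.
        mode = member.get("alerting", cluster_mode)
        if mode not in ALERTING_MODES:
            raise ValueError(f"invalid alerting for {group!r}: {mode!r}")
        result[group] = mode
    return result


# The controller-nodes example, as the dict a YAML parser would return.
controller_nodes = {
    "apply_to_node": "controller",
    "alerting": "enabled",
    "members": {
        "cpu": {"alarms": ["cpu-critical-controller", "cpu-warning-controller"]},
        "root-fs": {"alarms": ["root-fs-critical", "root-fs-warning"]},
        "log-fs": {"alarms": ["log-fs-critical", "log-fs-warning"]},
        "hdd-errors": {
            "alerting": "enabled_with_notification",
            "alarms": ["hdd-errors-critical"],
        },
    },
}

print(effective_alerting(controller_nodes))
```

With the example configuration, only *hdd-errors* resolves to ``enabled_with_notification``, while *cpu*, *root-fs* and *log-fs* resolve to ``enabled``, matching the behavior described for the controller nodes.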
--- a/doc/user/source/release_notes.rst
+++ b/doc/user/source/release_notes.rst
@@ -10,6 +10,34 @@ Release notes
 Version 1.0.0
 +++++++++++++

+The StackLight Collector plugin 1.0.0 for Fuel contains the following updates:
+
+New alarms:
+
+  * Monitor RabbitMQ from the Pacemaker point of view
+  * Monitor all partitions and OSD disk(s)
+  * Horizon HTTP 5xx errors
+  * Keystone slow response times
+  * HDD errors
+  * Swap usage percentage
+  * Network packet drops
+  * Local OpenStack API checks
+  * Local checks for services: Apache, Memcached, MySQL, RabbitMQ, Pacemaker
+
+Alarm enhancements:
+
+  * Added support for the ``group by`` attribute in alarm rules
+  * Added support for pattern matching to filter metric dimensions
+
+Bug fixes:
+
+ * Fixed the concurrent execution of logrotate.
+   See `#1455104 <https://bugs.launchpad.net/lma-toolchain/+bug/1455104>`_.
+ * The Elasticsearch bulk size can now increase when required.
+   See `#1617211 <https://bugs.launchpad.net/lma-toolchain/+bug/1617211>`_.
+ * Added the capability to use the RabbitMQ management API instead of the
+   :command:`rabbitmqctl` command.
+
 Version 0.10.0
 ++++++++++++++