monitoring/collectd-extensions
Eric MacDonald d37490b814 Add alarm audit to starlingx collectd fm notifier plugin
This update adds common plugin support for alarm state auditing.
The audit is able to detect and correct the following alarm
state errors:

   Error Case                Correction Action
   -----------------------   -----------------
 - stale alarm             ; delete alarm
 - missing alarm           ; assert alarm
 - alarm severity mismatch ; refresh alarm
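
The three corrections above amount to reconciling the alarms the plugin
expects against the alarms actually raised. The sketch below illustrates
that comparison only; it is not the plugin's code. The function name, the
action strings, and the dictionary shapes are assumptions made for
illustration, and a real plugin would issue FM API calls instead of
returning a list.

```python
from typing import Dict, List, Tuple

# Hypothetical action names (assumption); the plugin's real FM calls are not shown.
DELETE, ASSERT, REFRESH = "delete", "assert", "refresh"


def audit_alarms(expected: Dict[str, str],
                 actual: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the corrective actions needed to make 'actual' match 'expected'.

    Both dicts map an alarm key (for example an entity id) to a severity.
    """
    actions = []
    # Stale alarm: raised in FM but no longer expected by the plugin.
    for key in sorted(actual.keys() - expected.keys()):
        actions.append((DELETE, key))
    # Missing alarm: expected by the plugin but not raised in FM.
    for key in sorted(expected.keys() - actual.keys()):
        actions.append((ASSERT, key))
    # Severity mismatch: raised on both sides with different severities.
    for key in sorted(expected.keys() & actual.keys()):
        if expected[key] != actual[key]:
            actions.append((REFRESH, key))
    return actions


if __name__ == "__main__":
    expected = {"cpu": "major", "memory": "critical"}
    actual = {"memory": "major", "filesystem=/var": "minor"}
    for action, key in audit_alarms(expected, actual):
        print(action, key)
```

Returning a list of actions rather than acting directly keeps the
comparison easy to test without a running FM service.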

The common audit is enabled for the fm_notifier plugin, which supports
alarm management for the following resources:

 - CPU with alarm id 100.101
 - Memory with alarm id 100.103
 - Filesystem with alarm id 100.104

Other plugins may use this common audit in the future, but this update
enables it only for the resources listed above.
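
Because the audit covers only the three alarm ids listed above, and each
host audits only its own alarms, the audit first has to select which
alarms are in scope. The sketch below shows one way to do that filtering.
The `Alarm` tuple, the function name, and the `host=<hostname>` entity id
format are assumptions made for illustration, not taken from the plugin
source.

```python
from typing import Iterable, List, NamedTuple

# Alarm ids covered by the audit, taken from the list above.
AUDITED_ALARM_IDS = {
    "100.101": "CPU",
    "100.103": "Memory",
    "100.104": "Filesystem",
}


class Alarm(NamedTuple):
    alarm_id: str
    # Assumed format: "host=<hostname>" optionally followed by ".<sub-entity>".
    entity_instance_id: str
    severity: str


def alarms_in_scope(alarms: Iterable[Alarm], hostname: str) -> List[Alarm]:
    """Keep only audited alarm ids that belong to the given host."""
    prefix = "host=" + hostname
    return [
        a for a in alarms
        if a.alarm_id in AUDITED_ALARM_IDS
        and (a.entity_instance_id == prefix
             or a.entity_instance_id.startswith(prefix + "."))
    ]


if __name__ == "__main__":
    sample = [
        Alarm("100.101", "host=controller-0", "major"),
        Alarm("100.104", "host=controller-0.filesystem=/var", "critical"),
        Alarm("100.103", "host=controller-1", "major"),   # other host
        Alarm("200.001", "host=controller-0", "minor"),   # not an audited id
    ]
    for alarm in alarms_in_scope(sample, "controller-0"):
        print(alarm.alarm_id, alarm.entity_instance_id)
```

Matching on the exact prefix or the prefix followed by a dot avoids
treating a host such as controller-01 as controller-0.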

Test Plan:

PASS: Verify stale alarm detection/correction handling
PASS: Verify missing alarm detection/correction handling
PASS: Verify alarm severity mismatch detection/correction handling
PASS: Verify each host audits only its own specified alarms
PASS: Verify success path of monitoring a single alarm and a mix
      of base and instance alarms of varying severity while
      such alarm conditions come and go
PASS: Verify alarm audit of mix of base and instance alarms
      over a collectd process restart
PASS: Verify audit handling of alarm that migrates from
      major to critical to major to clear
PASS: Verify audit handling of transitions between alarm and
      no-alarm conditions
PASS: Verify soak of random cpu, memory and filesystem
      overage alarm assertions and clears that also involve
      manual alarm deletions, assertions and severity changes
      that exercise new audit features

Regression:

PASS: Verify alarm and audit handling over Swact with mounted
      filesystem that has active alarm
PASS: Verify collectd logs following a system install and
      while alarms are managed during above soak
PASS: Verify behavior while FM is killed or stopped/started
PASS: Verify Standard system install with Sanity and Regression
PASS: Verify AIO DX/DC systems install with Sanity and Regression

Closes-Bug: 1925210
Change-Id: I1cafd17ad07ec769240de92ae4e67cb1357f0992
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2021-04-20 11:48:51 -04:00
centos Avoid loading collectd's default plugins 2021-02-03 12:03:06 -05:00
src Add alarm audit to starlingx collectd fm notifier plugin 2021-04-20 11:48:51 -04:00
PKG-INFO Align PKG-INFO for Collectd & Influxdb Extensions 2019-07-15 16:53:36 -04:00