metal/mtce/src
Eric MacDonald a42301c19b Make successful pmon-restart clear failed restarts count
The pmon-restart service, through a call to respawn_process,
increments that process's restarts counter but does not clear
that counter after a successful restart.

So, each pmon-restart mistakenly contributes to that process's
failure count. This has the effect of pre-loading that process's
restart counter by one for every pmon-restart of that process.

The effect is best described by example.
Say a process is pmon-restart'ed 4 times during one day, which
increments that process's restart counter to 4. Assuming its
conf file specifies a threshold of 3, the process has already
exceeded its threshold. Then, if that process experiences a real
failure even days later, pmon will immediately take the severity
action because the failure threshold had already been exceeded.
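
The scenario above can be sketched in a few lines of C++. This is a
simplified model, not the pmon source: the struct, its field names,
and the strict "greater than" comparison are assumptions made for
illustration. Only the name respawn_process comes from this message.

```cpp
#include <string>

// Hypothetical model of a monitored process. Field names are
// illustrative and are not taken from the pmon source.
struct MonitoredProcess {
    std::string name;
    int restarts_threshold; // threshold from the process's conf file
    int restarts_count;     // restarts counted toward that threshold
};

// Models the pre-fix behavior described above: every respawn,
// including one requested through pmon-restart, bumps the counter
// and nothing clears it afterwards.
void respawn_process(MonitoredProcess& p) {
    ++p.restarts_count;
}

// Assumed comparison: the threshold counts as exceeded once the
// counter is strictly greater than the configured value.
bool threshold_exceeded(const MonitoredProcess& p) {
    return p.restarts_count > p.restarts_threshold;
}
```

With a threshold of 3, four calls to respawn_process leave the counter
at 4, so threshold_exceeded already returns true before any real
failure has occurred.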

This update ensures a process's restart counter is cleared
after a successful pmon-restart operation, in the process pid
registration phase of recovery.
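
A minimal sketch of the intended behavior follows. It assumes a
per-process flag that marks a restart as requested; ProcessState,
pmon_restart_active, respawn and register_pid are hypothetical names
for illustration and do not claim to match the actual change.

```cpp
// Hypothetical per-process state. Names are illustrative only.
struct ProcessState {
    int restarts_count;       // restarts counted toward the threshold
    bool pmon_restart_active; // true while a requested restart is in progress
};

// The respawn path still increments the counter. The 'requested'
// argument records whether this respawn came from pmon-restart.
void respawn(ProcessState& p, bool requested) {
    ++p.restarts_count;
    p.pmon_restart_active = requested;
}

// Models the pid registration phase of recovery: reaching it means
// the restart succeeded, so a requested restart clears the counter.
// A respawn caused by a real failure keeps its count.
void register_pid(ProcessState& p) {
    if (p.pmon_restart_active) {
        p.restarts_count = 0;
        p.pmon_restart_active = false;
    }
}
```

In this model any number of successful pmon-restart operations leaves
the counter at 0, while a respawn after a real failure still counts
as 1 toward the threshold.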

Test Plan:

PASS: Verify pmon-restart continues to work.
PASS: Verify proper thresholding of failed process following
      many pmon-restart operations.
PEND: Verify pmon-restart and process failure automated test script
      against this update. 5 loops, all processes.

Change-Id: Ib01446f2e053846cd30cb0ca0e06d7c987cdf581
Closes-Bug: 1853330
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-11-21 14:58:28 +00:00
alarm        Add alarm retry support to maintenance alarm handling daemon  2019-10-07 09:07:49 -04:00
common       Merge "Maintenance Redfish support useability enhancements."  2019-10-10 18:38:21 +00:00
fsmon        Add LSB headers to mtce service scripts  2019-08-29 11:20:14 -05:00
fsync        Decouple Guest-server/agent from stx-metal  2018-09-18 17:15:08 -04:00
heartbeat    Merge "Fix format-truncation warnings"  2019-10-16 20:32:51 +00:00
hostw        Update host watchdog CONFIG_MASK  2019-10-30 16:40:56 -04:00
hwmon        Separate hardware monitor power and thermal senser data  2019-10-17 20:53:14 -04:00
lmon         Merge "Modify the wrong size count paramter of memset in lmonHdlr.cpp"  2019-10-28 15:18:26 +00:00
maintenance  Merge "Fix Mtce's VIM systems query handling"  2019-10-16 20:27:38 +00:00
mtclog       Set restricted permissions for mtce logfiles  2019-07-17 18:19:52 -04:00
pmon         Make successful pmon-restart clear failed restarts count  2019-11-21 14:58:28 +00:00
public       Set SHELL in Makefiles that use bash constructs  2018-12-07 14:09:48 -06:00
scripts      Add redfish power/reset/reinstall bmc support to maintenance  2019-09-26 15:59:35 -04:00
LICENSE      Decouple Guest-server/agent from stx-metal  2018-09-18 17:15:08 -04:00
Makefile     Remove Resource Monitor ; aka rmon, from the load  2019-03-19 16:12:38 -04:00