From a42301c19be61bd133bd6d509aec89b340a6e51b Mon Sep 17 00:00:00 2001
From: Eric MacDonald
Date: Wed, 20 Nov 2019 14:37:16 -0500
Subject: [PATCH] Make successful pmon-restart clear failed restarts count

The pmon-restart service, through a call to respawn_process,
increments that process's restarts counter but does not clear
that counter after a successful restart.

So, each pmon-restart mistakenly contributes to that process's
failure count. This has the effect of pre-loading that process's
restart counter by one for every pmon-restart of that process.

The effect is best described by example.

Say a process is pmon-restart'ed 4 times during one day, which
increments that process's restart counter to 4. Assuming its
conf file specifies a threshold of 3, it has already exceeded
its threshold. Then, if that process experiences a real failure
even days later, pmon will immediately take the severity action
because the failure threshold had already been exceeded.

This update ensures a process's restart counter is cleared after
a successful pmon-restart operation, in the process pid
registration phase of recovery.

Test Plan:

PASS: Verify pmon-restart continues to work.
PASS: Verify proper thresholding of failed process following
      many pmon-restart operations.
PEND: Verify pmon-restart and process failure automated test
      script against this update. 5 loops, all processes.

Change-Id: Ib01446f2e053846cd30cb0ca0e06d7c987cdf581
Closes-Bug: 1853330
Signed-off-by: Eric MacDonald
---
 mtce/centos/build_srpm.data | 2 +-
 mtce/src/pmon/pmonHdlr.cpp  | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/mtce/centos/build_srpm.data b/mtce/centos/build_srpm.data
index dc0d3529..7bbb93a8 100644
--- a/mtce/centos/build_srpm.data
+++ b/mtce/centos/build_srpm.data
@@ -1,3 +1,3 @@
 SRC_DIR="src"
-TIS_PATCH_VER=155
+TIS_PATCH_VER=156
 BUILD_IS_SLOW=5
diff --git a/mtce/src/pmon/pmonHdlr.cpp b/mtce/src/pmon/pmonHdlr.cpp
index b97bdd39..7b5977d1 100644
--- a/mtce/src/pmon/pmonHdlr.cpp
+++ b/mtce/src/pmon/pmonHdlr.cpp
@@ -1142,6 +1142,7 @@ int register_process ( process_config_type * ptr )
             ilog ("%s Registered (%d)\n", ptr->process , pid );
             ptr->failed = false ;
             ptr->registered = true ;
+            ptr->restarts_cnt = 0 ;
             passiveStageChange ( ptr, PMON_STAGE__MANAGE ) ;
             if ( ptr->active_monitoring == false )
             {
@@ -1166,6 +1167,7 @@ int register_process ( process_config_type * ptr )
         else
         {
             ptr->failed = false ;
+            ptr->restarts_cnt = 0 ;
             manage_alarm ( ptr, PMON_CLEAR );
             passiveStageChange ( ptr, PMON_STAGE__MANAGE ) ;
         }