Make successful pmon-restart clear failed restarts count

The pmon-restart service, through a call to respawn_process,
increments that process's restarts counter but does not clear
that counter after a successful restart.

So, each pmon-restart mistakenly contributes to that process's
failure count. This has the effect of pre-loading that process's
restart counter by one for every pmon-restart of that process.

The effect is best described by example.
Say a process is pmon-restart'ed 4 times during one day, which
increments that process's restart counter to 4. Assuming its
conf file specifies a threshold of 3, it has already exceeded
its threshold. Then, if that process experiences a real failure,
even days later, pmon will immediately take the severity action
because the failure threshold had already been exceeded.

This update ensures a process's restart counter is cleared
after a successful pmon-restart operation, in the process pid
registration phase of recovery.

Test Plan:

PASS: Verify pmon-restart continues to work.
PASS: Verify proper thresholding of failed process following
      many pmon-restart operations.
PEND: Verify pmon-restart and process failure automated test script
      against this update. 5 loops, all processes.

Change-Id: Ib01446f2e053846cd30cb0ca0e06d7c987cdf581
Closes-Bug: 1853330
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Eric MacDonald 2019-11-20 14:37:16 -05:00
parent 66e8fbd747
commit a42301c19b
2 changed files with 3 additions and 1 deletions


@@ -1,3 +1,3 @@
 SRC_DIR="src"
-TIS_PATCH_VER=155
+TIS_PATCH_VER=156
 BUILD_IS_SLOW=5


@@ -1142,6 +1142,7 @@ int register_process ( process_config_type * ptr )
 ilog ("%s Registered (%d)\n", ptr->process , pid );
 ptr->failed = false ;
 ptr->registered = true ;
+ptr->restarts_cnt = 0 ;
 passiveStageChange ( ptr, PMON_STAGE__MANAGE ) ;
 if ( ptr->active_monitoring == false )
 {
@@ -1166,6 +1167,7 @@ int register_process ( process_config_type * ptr )
 else
 {
 ptr->failed = false ;
+ptr->restarts_cnt = 0 ;
 manage_alarm ( ptr, PMON_CLEAR );
 passiveStageChange ( ptr, PMON_STAGE__MANAGE ) ;
 }