From ad6c1383c693abb9ed27ff11303d86d2f2c1d2ea Mon Sep 17 00:00:00 2001
From: Eric MacDonald
Date: Sat, 30 Jul 2022 20:37:50 +0000
Subject: [PATCH] Extend startuptime in collectd's pmon config file

The maintenance Process Monitor (pmond) experiences a spawn timeout
while restarting or recovering the collectd process over a failure
in Debian.

Collectd's current process monitor config file startuptime is 3 secs.

Collectd version 5.8.1 spawn time in CentOS is very quick, ~1 sec
  log example: collectd spawned in 1.000 secs
  centos rpm : collectd-5.8.1-4.el7.x86_64

Collectd version 5.12.0 spawn time in Debian is 3-6 seconds
  log example: collectd spawned in 6.000 secs (also 3,4,5 secs)
  debian pkg : base-bullseye.lst:collectd 5.12.0-7

It seems the new version of collectd in the Debian environment takes
longer to spawn, exceeding the current 3 second timeout.

This update extends the collectd pmon config file startuptime to
10 seconds to account for this version and environment change.

Test Plan:

PASS: Verify collectd process manual restart with new startuptime in
      - both CentOS and Debian and on both vbox and real hw
PASS: Verify collectd process kill recovery with new startuptime in
      - both CentOS and Debian and on both vbox and real hw

Regression:

PASS: Verify soak of collectd process restart 50+ loops
PASS: Verify soak of collectd process recovery 50+ loops
PASS: Verify process manual restart of all pmon monitored processes
PASS: Verify process kill recovery of all pmon monitored processes

Story: 2009968
Task: 45917

Signed-off-by: Eric MacDonald
Change-Id: I9849e5cca5d5a3c216ffb26a788f310c2820a984
---
 collectd-extensions/src/collectd.conf.pmon | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/collectd-extensions/src/collectd.conf.pmon b/collectd-extensions/src/collectd.conf.pmon
index 8d905d4..25a52b3 100644
--- a/collectd-extensions/src/collectd.conf.pmon
+++ b/collectd-extensions/src/collectd.conf.pmon
@@ -9,7 +9,7 @@ interval = 5 ; number of seconds to wait between restarts
 debounce = 10 ; number of seconds that a process needs to remain
               ; running before degrade is removed and retry count
               ; is cleared.
-startuptime = 3 ; Seconds to wait after process start before starting the debounce monitor
+startuptime = 10 ; Seconds to wait after process start before starting the debounce monitor
 mode = passive ; Monitoring mode: passive (default) or active
                ; passive: process death monitoring (default: always)
                ; active : heartbeat monitoring, i.e. request / response messaging
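
Reviewer note: the reasoning in the commit message (observed spawn times of 3-6 seconds against a 3 second startuptime) can be checked with a small script. The sketch below is illustrative only and is not part of this change. It assumes log lines contain the phrase quoted in the commit message ("collectd spawned in 6.000 secs") and that a spawn time at or beyond startuptime counts as a timeout. The exact comparison pmond performs is not stated here, and the helper names `spawn_times` and `exceeds_startuptime` are hypothetical.

```python
import re

# Matches the log example quoted in the commit message, e.g.
# "collectd spawned in 6.000 secs". Any surrounding log prefix is assumed.
SPAWN_RE = re.compile(r"(\S+) spawned in (\d+(?:\.\d+)?) secs")


def spawn_times(log_lines):
    """Return the spawn durations, in seconds, found in the given log lines."""
    times = []
    for line in log_lines:
        match = SPAWN_RE.search(line)
        if match:
            times.append(float(match.group(2)))
    return times


def exceeds_startuptime(times, startuptime):
    """Return the spawn times at or beyond startuptime.

    Assumption: reaching startuptime is treated as a timeout; pmond's
    actual rule may differ.
    """
    return [t for t in times if t >= startuptime]


# Sample lines built from the values reported in the commit message.
sample = [
    "collectd spawned in 1.000 secs",
    "collectd spawned in 3.000 secs",
    "collectd spawned in 6.000 secs",
]
times = spawn_times(sample)
print(exceeds_startuptime(times, 3))   # old startuptime: [3.0, 6.0]
print(exceeds_startuptime(times, 10))  # new startuptime: []
```

With the old value of 3, the Debian samples are flagged; with the new value of 10, none of the reported spawn times are.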