Tweak liveness probes to avoid CPU overload

This is one of a proposed set of solutions to the high platform CPU
usage issues seen by multiple users. The OIDC pods had the liveness
probe initialDelaySeconds, periodSeconds and timeoutSeconds set to
1, 10 and 1 respectively. The timeoutSeconds value was very aggressive
and may cause issues on low-spec CPUs.

+---------------------+-------------+------------+
|      Liveness       | Old Values  | New Values |
+---------------------+-------------+------------+
| initialDelaySeconds |      1      |    13      |
|    periodSeconds    |      10     |    13      |
|    timeoutSeconds   |      1      |    8       |
+---------------------+-------------+------------+

It was reported that many liveness probes in the system are configured
with a 5-second period, so we chose values that are not multiples of 5
to keep these probes from firing at the same time as the others.
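
The effect of picking a period that is not a multiple of 5 can be
illustrated with a small, idealized sketch. It assumes that two probes
start in phase and fire exactly on schedule, which a real kubelet does
not guarantee; the function names are hypothetical and only serve the
illustration.

```python
from math import gcd


def coincidence_interval(period_a: int, period_b: int) -> int:
    """Seconds between simultaneous firings of two in-phase periodic probes.

    This is the least common multiple of the two periods.
    """
    return period_a * period_b // gcd(period_a, period_b)


def coincidences_per_hour(period_a: int, period_b: int) -> int:
    """How many times per hour the two probes fire in the same second."""
    return 3600 // coincidence_interval(period_a, period_b)


# Compare the old (10 s) and new (13 s) periods against a 5-second probe.
for period in (10, 13):
    interval = coincidence_interval(period, 5)
    per_hour = coincidences_per_hour(period, 5)
    print(f"period {period}s vs 5s: coincide every {interval}s, "
          f"{per_hour} times per hour")
```

Under these assumptions a 10-second probe always lands on a 5-second
tick, while a 13-second probe does so only once every 65 seconds.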

Test Plan:

PASS: Deploy a SX using a stx.9.0 master ISO.
PASS: Configure the kubelet log verbosity to 4 and restart kubelet
      service in order to show the Liveness probes logs on the
      /var/log/daemon.log file.
PASS: Apply the oidc-auth-apps using the configuration guide.
PASS: Check if the Liveness probes parameters are configured like:
      - periodSeconds: 10
      - initialDelaySeconds: 1
      - timeoutSeconds: 1
      for the "stx-oidc-client" pod and for "oidc-dex" pod by using
      the 'kubectl get pod <mypod> -o yaml' command.
PASS: Check if the Liveness probe is logging every 10 seconds by
      watching the /var/log/daemon.log log file.
PASS: Build the new oidc-auth-app tarball with the changes.
PASS: Update test. Do the app update with the new built tarball by
      using the 'system application-update <tarball>' command.
PASS: Check if the Liveness probes parameters are configured like:
      - periodSeconds: 13
      - initialDelaySeconds: 13
      - timeoutSeconds: 8
      for the "stx-oidc-client" pod and for "oidc-dex" pod by using
      the 'kubectl get pod <mypod> -o yaml' command.
PASS: Check if the Liveness probe is logging every 13 seconds instead
      of every 10 seconds.
PASS: Restore to a snapshot taken before applying oidc-auth-apps from
      the master ISO; the oidc-auth-apps status should be uploaded.
PASS: Delete the current version of oidc-auth-apps using the command
      'system application-delete oidc-auth-apps'
PASS: Upload the new oidc-auth-apps tarball just built using the
      command 'system application-upload <tarball>'.
PASS: Apply the new oidc-auth-apps using the configuration guide.
PASS: Recheck if the Liveness probes parameters are configured like:
      - periodSeconds: 13
      - initialDelaySeconds: 13
      - timeoutSeconds: 8
      for the "stx-oidc-client" pod and for "oidc-dex" pod by using
      the 'kubectl get pod <mypod> -o yaml' command.
PASS: Recheck if the Liveness probe is logging every 13 seconds
      instead of every 10 seconds.
PASS: Perform the oidc-auth-apps test: create a user, apply a
      rolebinding, authenticate the user with the oidc-auth command,
      and check that the new user can run k8s commands according to
      its roles.
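
The manual parameter checks above could also be scripted. The sketch
below is not part of this change; it assumes the pod manifest has been
obtained as JSON (for example with 'kubectl get pod <mypod> -o json')
and compares each container's livenessProbe against the new values.
The helper name probe_mismatches and the sample manifest are
hypothetical.

```python
import json

# New liveness values introduced by this change.
EXPECTED = {"initialDelaySeconds": 13, "periodSeconds": 13, "timeoutSeconds": 8}


def probe_mismatches(pod: dict, expected: dict = EXPECTED) -> list:
    """Return (container, key, actual, wanted) tuples for every mismatch."""
    mismatches = []
    for container in pod.get("spec", {}).get("containers", []):
        probe = container.get("livenessProbe") or {}
        for key, want in expected.items():
            got = probe.get(key)
            if got != want:
                mismatches.append((container.get("name"), key, got, want))
    return mismatches


# Hypothetical, trimmed-down manifest standing in for real kubectl output.
sample_pod = json.loads("""
{
  "spec": {
    "containers": [
      {
        "name": "stx-oidc-client",
        "livenessProbe": {
          "initialDelaySeconds": 13,
          "periodSeconds": 13,
          "timeoutSeconds": 8,
          "failureThreshold": 3
        }
      }
    ]
  }
}
""")

print(probe_mismatches(sample_pod))
```

An empty list means every container carries the expected values. To
check a live pod, the JSON could be read from standard input with
json.load(sys.stdin) instead of using the embedded sample.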

Closes-Bug: 2077365

Change-Id: I7c547f3fef43c1d8d703a99746271c2333b2e1a6
Signed-off-by: Joaci Morais <Joaci.deMorais@windriver.com>
Joaci Morais 2024-08-15 10:31:07 -03:00
parent 8a18f4c3aa
commit 17bda2e4d9
5 changed files with 88 additions and 0 deletions

@@ -34,6 +34,7 @@ config:
 listen: https://0.0.0.0:5555
 redirect_uri: https://10.10.10.3:30555/callback
+# Default probe configs
 livenessProbe:
 initialDelaySeconds: 1
 failureThreshold: 1

@@ -0,0 +1,62 @@
From 114850f8fb58d006292b0e2e871a235b1cf5e9c4 Mon Sep 17 00:00:00 2001
From: Joaci Morais <Joaci.deMorais@windriver.com>
Date: Fri, 16 Aug 2024 11:03:55 -0300
Subject: [PATCH] Added support to tweak liveness Probe
We need to adjust periodSeconds and timeoutSeconds in the
livenessProbe and readinessProbe in order to avoid heavy load
on weak cpu's
Signed-off-by: Joaci Morais <Joaci.deMorais@windriver.com>
---
templates/deployment.yaml | 8 ++++++++
values.yaml | 13 +++++++++++++
2 files changed, 21 insertions(+)
diff --git a/templates/deployment.yaml b/templates/deployment.yaml
index 247dd39f..d976df08 100644
--- a/templates/deployment.yaml
+++ b/templates/deployment.yaml
@@ -107,10 +107,18 @@ spec:
httpGet:
path: /healthz/live
port: telemetry
+ initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
+ periodSeconds: {{ .Values.livenessProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }}
+ failureThreshold: {{ .Values.livenessProbe.failureThreshold }}
readinessProbe:
httpGet:
path: /healthz/ready
port: telemetry
+ initialDelaySeconds: {{ .Values.readinessProbe.initialDelaySeconds }}
+ periodSeconds: {{ .Values.readinessProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.readinessProbe.timeoutSeconds }}
+ failureThreshold: {{ .Values.readinessProbe.failureThreshold }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumeMounts:
diff --git a/values.yaml b/values.yaml
index 7452791e..a3088aa3 100644
--- a/values.yaml
+++ b/values.yaml
@@ -334,3 +334,16 @@ networkPolicy:
# ports:
# - port: 636
# protocol: TCP
+
+# Default probe configs
+livenessProbe:
+ initialDelaySeconds: 1
+ failureThreshold: 3
+ periodSeconds: 10
+ timeoutSeconds: 1
+
+readinessProbe:
+ initialDelaySeconds: 1
+ failureThreshold: 3
+ periodSeconds: 10
+ timeoutSeconds: 1
\ No newline at end of file
--
2.25.1

@@ -1 +1,2 @@
 0001-Create-new-config-value-extraStaticClients.patch
+0001-Added-support-to-tweak-liveness-Probe.patch

@@ -70,3 +70,15 @@ strategy:
 maxUnavailable: 1
 maxSurge: 1
 type: RollingUpdate
+livenessProbe:
+  failureThreshold: 3
+  initialDelaySeconds: 13
+  periodSeconds: 13
+  timeoutSeconds: 8
+readinessProbe:
+  failureThreshold: 3
+  initialDelaySeconds: 13
+  periodSeconds: 13
+  timeoutSeconds: 8

@@ -32,3 +32,15 @@ affinity:
 - stx-oidc-client
 topologyKey: kubernetes.io/hostname
 helmv3Compatible: true
+livenessProbe:
+  failureThreshold: 3
+  initialDelaySeconds: 13
+  periodSeconds: 13
+  timeoutSeconds: 8
+readinessProbe:
+  failureThreshold: 3
+  initialDelaySeconds: 13
+  periodSeconds: 13
+  timeoutSeconds: 8
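
How the new values reach the rendered probes can be pictured with a
simplified model of Helm's value layering. It assumes that the last two
hunks are override values applied on top of the chart's values.yaml
defaults added by the patch; the names merge_values, chart_defaults and
app_overrides are hypothetical, and the merge below is only an
approximation of what Helm does.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively layer overrides on top of defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Defaults as added to the chart's values.yaml by the patch.
chart_defaults = {
    "livenessProbe": {"initialDelaySeconds": 1, "failureThreshold": 3,
                      "periodSeconds": 10, "timeoutSeconds": 1},
    "readinessProbe": {"initialDelaySeconds": 1, "failureThreshold": 3,
                       "periodSeconds": 10, "timeoutSeconds": 1},
}

# Values supplied by the application (assumed to act as overrides).
app_overrides = {
    "livenessProbe": {"failureThreshold": 3, "initialDelaySeconds": 13,
                      "periodSeconds": 13, "timeoutSeconds": 8},
    "readinessProbe": {"failureThreshold": 3, "initialDelaySeconds": 13,
                       "periodSeconds": 13, "timeoutSeconds": 8},
}

effective = merge_values(chart_defaults, app_overrides)
print(effective["livenessProbe"])
```

The merged dictionary corresponds to what the
'{{ .Values.livenessProbe.* }}' references in the patched
deployment.yaml would see under this model.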