Update RabbitMQ probes

The health check used for the liveness and readiness probes is
considered intrusive and can produce false positives[0]. It reports the
pod as healthy only once the node has fully booted, so while the pod is
waiting for a peer to appear to sync tables the check cannot succeed and
the pod eventually enters a restart deadlock.
The command is also deprecated and will be removed in a future version.
Update the probes based on the current recommendation from the
community[1]: the liveness probe now checks RabbitMQ port connectivity
to confirm that the node is connected and running, and the readiness
probe now pings the node after boot to see whether it is ready.
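
For reference, the probe logic after this change can be sketched as
below. This is a minimal sketch, not the chart templates themselves:
the commands and sentinel file paths are taken from the patch, while
DIAG, LIVENESS_SENTINEL and READINESS_SENTINEL are hypothetical
overrides added here so the functions can be exercised without a
running RabbitMQ node.

```shell
#!/bin/bash
# Sketch of the updated probe logic. DIAG and the *_SENTINEL variables
# are assumptions for illustration; the real templates call
# rabbitmq-diagnostics directly and use fixed paths under /tmp.
DIAG="${DIAG:-rabbitmq-diagnostics}"

liveness_probe() {
  # A sentinel file disables the liveness probe by reporting success.
  if [ -f "${LIVENESS_SENTINEL:-/tmp/rabbit-disable-liveness-probe}" ]; then
    return 0
  fi
  # Succeeds only if the node's listener ports accept connections.
  "$DIAG" -q check_port_connectivity
}

readiness_probe() {
  # A sentinel file forces the pod to be reported as not ready.
  if [ -f "${READINESS_SENTINEL:-/tmp/rabbit-disable-readiness}" ]; then
    return 1
  fi
  # Lightweight check that the node's runtime is up and reachable.
  "$DIAG" ping
}
```

Each function returns the exit status of the underlying command, which
is what Kubernetes interprets as probe success (0) or failure
(non-zero).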

Ref:
[0] https://www.rabbitmq.com/monitoring.html#deprecations
[1] https://www.rabbitmq.com/monitoring.html#health-checks

Test plan:
PASS: Close the RabbitMQ connection inside the pod and verify that the
liveness probe fails and the pod is restarted.
PASS: Check the pod state and verify that the readiness probe can ping
the pod to confirm it is healthy after it comes up.
PASS: Build on Debian

Story: 2009892
Task: 44674

Signed-off-by: Arthur Luz de Avila <arthur.luzdeavila@windriver.com>
Change-Id: Ib98df7e1135b7b5cee50d63a9ab06eb4a961cd4b
Arthur Luz de Avila 2022-03-02 15:05:47 -03:00 committed by Arthur Luz de Ávila
parent e1735fe05a
commit abdf99ad03
5 changed files with 144 additions and 1 deletions

@@ -31,6 +31,7 @@ Patch19: 0019-Add-force_boot-command-to-rabbit-start-template.patch
Patch20: 0020-Fix-tls-in-openstack-helm-infra.patch
Patch21: 0021-Remove-mariadb-tls.patch
Patch22: 0022-Remove-rabbitmq-tls.patch
+Patch23: 0023-Update-RabbitMQ-probes.patch
BuildRequires: helm
BuildRequires: chartmuseum
@@ -56,6 +57,7 @@ Openstack Helm Infra charts
%patch20 -p1
%patch21 -p1
%patch22 -p1
+%patch23 -p1
%build
# Host a server for the charts

@@ -0,0 +1,70 @@
From 3a76480c003dc6c1a522fba1c70278bad04930c2 Mon Sep 17 00:00:00 2001
From: Roy Tang <rt7380@att.com>
Date: Fri, 13 Aug 2021 19:08:21 -0400
Subject: [PATCH] Update RabbitMQ probes
The current health check that is used for readiness and liveness
probes is considered intrusive and is prompt to produce false
positives[0]. The command is also deprecated and will be removed
in future version. Updating the probes based on current
recommenation from community[1].
Ref:
[0] https://www.rabbitmq.com/monitoring.html#deprecations
[1] https://www.rabbitmq.com/monitoring.html#health-checks
Change-Id: I83750731150ff9a276f59e3c1288129581fceba5
---
rabbitmq/Chart.yaml | 2 +-
rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl | 3 +--
rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl | 2 +-
releasenotes/notes/rabbitmq.yaml | 1 +
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/rabbitmq/Chart.yaml b/rabbitmq/Chart.yaml
index 79b0daff..061ead2d 100644
--- a/rabbitmq/Chart.yaml
+++ b/rabbitmq/Chart.yaml
@@ -15,6 +15,6 @@ apiVersion: v1
appVersion: v3.7.26
description: OpenStack-Helm RabbitMQ
name: rabbitmq
-version: 0.1.13
+version: 0.1.14
home: https://github.com/rabbitmq/rabbitmq-server
...
diff --git a/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl b/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl
index 943209aa..d07626b2 100644
--- a/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl
+++ b/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl
@@ -19,6 +19,5 @@ set -e
if [ -f /tmp/rabbit-disable-liveness-probe ]; then
exit 0
else
- timeout 5 bash -c "true &>/dev/null </dev/tcp/${MY_POD_IP}/${PORT_AMPQ}"
- exec rabbitmqctl node_health_check
+ exec rabbitmq-diagnostics -q check_port_connectivity
fi
diff --git a/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl b/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl
index 6184b35c..14ef11cd 100644
--- a/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl
+++ b/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl
@@ -19,5 +19,5 @@ set -e
if [ -f /tmp/rabbit-disable-readiness ]; then
exit 1
else
- exec rabbitmqctl node_health_check
+ exec rabbitmq-diagnostics ping
fi
diff --git a/releasenotes/notes/rabbitmq.yaml b/releasenotes/notes/rabbitmq.yaml
index 95bf38e5..cdc2841d 100644
--- a/releasenotes/notes/rabbitmq.yaml
+++ b/releasenotes/notes/rabbitmq.yaml
@@ -13,4 +13,5 @@ rabbitmq:
- 0.1.11 Add TLS support for helm test
- 0.1.12 Added helm hook post-install and post-upgrade for rabbitmq wait cluster job
- 0.1.13 Add prestop action and version 3.8.x upgrade prep
+ - 0.1.14 Update readiness and liveness probes
...
--
2.17.1

@@ -14,3 +14,4 @@
0020-Fix-tls-in-openstack-helm-infra.patch
0021-Remove-mariadb-tls.patch
0022-Remove-rabbitmq-tls.patch
+0023-Update-RabbitMQ-probes.patch

@@ -0,0 +1,70 @@
From 3a76480c003dc6c1a522fba1c70278bad04930c2 Mon Sep 17 00:00:00 2001
From: Roy Tang <rt7380@att.com>
Date: Fri, 13 Aug 2021 19:08:21 -0400
Subject: [PATCH] Update RabbitMQ probes
The current health check that is used for readiness and liveness
probes is considered intrusive and is prompt to produce false
positives[0]. The command is also deprecated and will be removed
in future version. Updating the probes based on current
recommenation from community[1].
Ref:
[0] https://www.rabbitmq.com/monitoring.html#deprecations
[1] https://www.rabbitmq.com/monitoring.html#health-checks
Change-Id: I83750731150ff9a276f59e3c1288129581fceba5
---
rabbitmq/Chart.yaml | 2 +-
rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl | 3 +--
rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl | 2 +-
releasenotes/notes/rabbitmq.yaml | 1 +
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/rabbitmq/Chart.yaml b/rabbitmq/Chart.yaml
index 79b0daff..061ead2d 100644
--- a/rabbitmq/Chart.yaml
+++ b/rabbitmq/Chart.yaml
@@ -15,6 +15,6 @@ apiVersion: v1
appVersion: v3.7.26
description: OpenStack-Helm RabbitMQ
name: rabbitmq
-version: 0.1.13
+version: 0.1.14
home: https://github.com/rabbitmq/rabbitmq-server
...
diff --git a/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl b/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl
index 943209aa..d07626b2 100644
--- a/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl
+++ b/rabbitmq/templates/bin/_rabbitmq-liveness.sh.tpl
@@ -19,6 +19,5 @@ set -e
if [ -f /tmp/rabbit-disable-liveness-probe ]; then
exit 0
else
- timeout 5 bash -c "true &>/dev/null </dev/tcp/${MY_POD_IP}/${PORT_AMPQ}"
- exec rabbitmqctl node_health_check
+ exec rabbitmq-diagnostics -q check_port_connectivity
fi
diff --git a/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl b/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl
index 6184b35c..14ef11cd 100644
--- a/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl
+++ b/rabbitmq/templates/bin/_rabbitmq-readiness.sh.tpl
@@ -19,5 +19,5 @@ set -e
if [ -f /tmp/rabbit-disable-readiness ]; then
exit 1
else
- exec rabbitmqctl node_health_check
+ exec rabbitmq-diagnostics ping
fi
diff --git a/releasenotes/notes/rabbitmq.yaml b/releasenotes/notes/rabbitmq.yaml
index 95bf38e5..cdc2841d 100644
--- a/releasenotes/notes/rabbitmq.yaml
+++ b/releasenotes/notes/rabbitmq.yaml
@@ -13,4 +13,5 @@ rabbitmq:
- 0.1.11 Add TLS support for helm test
- 0.1.12 Added helm hook post-install and post-upgrade for rabbitmq wait cluster job
- 0.1.13 Add prestop action and version 3.8.x upgrade prep
+ - 0.1.14 Update readiness and liveness probes
...
--
2.17.1

@@ -401,7 +401,7 @@ data:
size: 1Gi
source:
type: tar
-location: http://172.17.0.1/helm_charts/starlingx/rabbitmq-0.1.13.tgz
+location: http://172.17.0.1/helm_charts/starlingx/rabbitmq-0.1.14.tgz
subpath: rabbitmq
reference: master
dependencies: