From 5909bcbdef49843295a1b8717a564bc4cf6e6491 Mon Sep 17 00:00:00 2001
From: Frank Ritchie
Date: Fri, 31 Jul 2020 13:23:03 -0400
Subject: [PATCH] Use hostPID for ceph-mgr deployment

This change is to address a memory leak in the ceph-mgr deployment.
The leak has also been noted in:

https://review.opendev.org/#/c/711085

Without this change memory usage for the active ceph-mgr pod will
steadily increase by roughly 100MiB per hour until all available
memory has been exhausted. Reset messages will also be seen in the
active and standby ceph-mgr pod logs.

Sample messages:

---
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
---

The root cause of the resets and associated memory leak appears to
be due to multiple ceph pods sharing the same IP address (due to
hostNetwork being true) and PID (due to hostPID being false).

In the messages above the "1" at the end of the line is the PID.
Ceph appears to use the Version:IP:Port/PID (v2:10.0.0.226:6808/1)
tuple as a unique identifier. When hostPID is false conflicts arise.

Setting hostPID to true stops the reset messages and memory leak.

Change-Id: I9821637e75e8f89b59cf39842a6eb7e66518fa2c
---
 ceph-client/templates/deployment-mgr.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ceph-client/templates/deployment-mgr.yaml b/ceph-client/templates/deployment-mgr.yaml
index 13fbfe0c5..d7adccf1b 100644
--- a/ceph-client/templates/deployment-mgr.yaml
+++ b/ceph-client/templates/deployment-mgr.yaml
@@ -51,6 +51,7 @@ spec:
       nodeSelector:
         {{ .Values.labels.mgr.node_selector_key }}: {{ .Values.labels.mgr.node_selector_value }}
       hostNetwork: true
+      hostPID: true
       dnsPolicy: {{ .Values.pod.dns_policy }}
       initContainers:
 {{ tuple $envAll "mgr" list | include "helm-toolkit.snippets.kubernetes_entrypoint_init_container" | indent 8 }}
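
Reviewer note: the commit message reads the trailing "1" in v2:10.0.0.226:6808/1 as the PID and treats Version:IP:Port/PID as the identifier that collides. To check whether a given pod log shows the same pattern, a minimal Python sketch such as the following could tally reset messages per identifier. The regular expression is inferred only from the three sample lines in the commit message, and the helper names parse_addr and count_resets are hypothetical; neither comes from Ceph itself.

```python
import re
from collections import Counter

# Address pattern inferred from the sample log lines only (assumption):
# <version>:<IPv4>:<port>/<pid>, e.g. v2:10.0.0.226:6808/1
ADDR_RE = re.compile(
    r"(?P<version>v\d+):(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)/(?P<pid>\d+)"
)


def parse_addr(line):
    """Return (version, ip, port, pid) from a log line, or None if absent."""
    m = ADDR_RE.search(line)
    if m is None:
        return None
    return (m["version"], m["ip"], int(m["port"]), int(m["pid"]))


def count_resets(lines):
    """Count ms_handle_reset messages per (version, ip, port, pid) tuple."""
    counts = Counter()
    for line in lines:
        if "ms_handle_reset" not in line:
            continue
        addr = parse_addr(line)
        if addr is not None:
            counts[addr] += 1
    return counts


if __name__ == "__main__":
    # The three sample lines quoted in the commit message.
    sample = ["0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1"] * 3
    for addr, n in count_resets(sample).items():
        print(addr, n)
```

Many resets against an identifier whose last field is 1 would be consistent with the diagnosis in the commit message, since a containerized daemon typically runs as PID 1 in its own PID namespace when hostPID is false. This is a log-reading aid only and does not confirm the root cause.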