From 77cbb985f294fbcb48c1a2946f2d63174588710e Mon Sep 17 00:00:00 2001
From: Robert Church
Date: Fri, 8 Mar 2019 15:54:19 -0500
Subject: [PATCH] For stability bump size of rabbitmq PV to 1Gi

The rabbitmq chart requests a 256Mi PV for operational storage. With
CentOS 7.5 and 7.6 kernels, a jbd2 kernel thread hang is observed after
a long soak period. Once this occurs, a host reboot is required to
recover access to the PV.

We have been able to reliably recreate this with fsstress, using the
stock upstream CentOS 7.6 kernel and the latest Ceph Jewel LTS (10.2.11)
version. This currently points to a race condition in the filesystem
code.

With a reliable test available, other scenarios have been run to
characterize the hang, including different volume sizes and different
ext4 filesystem formatting options.

We've been unable to cause the hang using a 1Gi PV over an extended soak
period, so we'll update the stx-openstack manifest to request a 1Gi PV
until the root cause has been identified and fixed in the kernel.

Change-Id: Ia0e5b7ffb049c6e3cedfb4a6d3afda597eedb18a
Related-Bug: #1814595
Signed-off-by: Robert Church
---
 .../stx-openstack/stx-openstack-helm/centos/build_srpm.data | 2 +-
 .../stx-openstack-helm/manifests/manifest.yaml              | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/kubernetes/applications/stx-openstack/stx-openstack-helm/centos/build_srpm.data b/kubernetes/applications/stx-openstack/stx-openstack-helm/centos/build_srpm.data
index 386586efcb..6cbd5c2cd4 100644
--- a/kubernetes/applications/stx-openstack/stx-openstack-helm/centos/build_srpm.data
+++ b/kubernetes/applications/stx-openstack/stx-openstack-helm/centos/build_srpm.data
@@ -1,3 +1,3 @@
 SRC_DIR="stx-openstack-helm"
 COPY_LIST_TO_TAR="$PKG_BASE/../../../helm-charts/rbd-provisioner $PKG_BASE/../../../helm-charts/garbd $PKG_BASE/../../../helm-charts/ceph-pools-audit"
-TIS_PATCH_VER=7
+TIS_PATCH_VER=8
diff --git a/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml b/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml
index b4089f4f0d..5b022b891a 100644
--- a/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml
+++ b/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml
@@ -329,6 +329,12 @@ data:
         anti:
           type:
             default: requiredDuringSchedulingIgnoredDuringExecution
+    # TODO: Revert to upstream defaults once the following LP is resolved:
+    # https://bugs.launchpad.net/starlingx/+bug/1814595. By changing this PV
+    # size to 1Gi from the default 256Mi, this avoids the kernel hang from the
+    # filesystem race as seen in the LP.
+    volume:
+      size: 1Gi
   source:
     type: tar
     location: http://172.17.0.1/helm_charts/rabbitmq-0.1.0.tgz