Fix nginx getting OOM killed

* Set requests.memory=256MiB for the nginx-ingress-controller pod.
  We decided to leave limits unset, as this supports most of the
  generic use cases.
* Change QoS from Guaranteed to Burstable.
  The application and the ingress can now starve each other, since
  both compete for node resources, in exchange for better use of
  node CPU.
* Set a priority class so that the pods can take priority on a node
  that might have a No CPU taint.

task: 37625
story: 2006945

Change-Id: Ibc77ea64ad47f1a400d040f419f38bf52b9f0141
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
changes/53/696053/4
Diogo Guerra, 3 years ago (committed by Diogo Guerra)
commit e75e28dcbf

2 files changed:
  14  magnum/drivers/common/templates/kubernetes/helm/ingress-nginx.sh
  11  releasenotes/notes/fix-nginx-getting-oom-killed-76139fd8b57e6c15.yaml

14
magnum/drivers/common/templates/kubernetes/helm/ingress-nginx.sh

@@ -104,12 +104,9 @@ data:
 replicaCount: 1
 minAvailable: 1
 resources:
-limits:
-cpu: 100m
-memory: 64Mi
 requests:
-cpu: 100m
-memory: 64Mi
+cpu: 200m
+memory: 256Mi
 autoscaling:
 enabled: false
 customTemplate:
@@ -163,7 +160,7 @@ data:
 release: prometheus-operator
 namespace: kube-system
 lifecycle: {}
-priorityClassName: ""
+priorityClassName: "system-node-critical"
 revisionHistoryLimit: 10
 defaultBackend:
 enabled: true
@@ -182,9 +179,6 @@ data:
 replicaCount: 1
 minAvailable: 1
 resources:
-limits:
-cpu: 10m
-memory: 20Mi
 requests:
 cpu: 10m
 memory: 20Mi
@@ -196,7 +190,7 @@ data:
 loadBalancerSourceRanges: []
 servicePort: 80
 type: ClusterIP
-priorityClassName: ""
+priorityClassName: "system-cluster-critical"
 rbac:
 create: true
 podSecurityPolicy:

11
releasenotes/notes/fix-nginx-getting-oom-killed-76139fd8b57e6c15.yaml

@@ -0,0 +1,11 @@
+---
+upgrade:
+- |
+nginx-ingress-controller QoS changed from Guaranteed to Burstable.
+Priority class 'system-cluster-critical' or higher for
+nginx-ingress-controller.
+fixes:
+- |
+nginx-ingress-controller requests.memory increased to 256MiB. This is a
+result of tests that showed the pod getting oom killed by the node on
+a relatively generic use case.
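
The move from Guaranteed to Burstable follows from how Kubernetes derives a pod's QoS class from its resource stanza. The sketch below is illustrative only and is not part of this change. It assumes a single-container pod and compares quantities as plain strings (so `100m` and `0.1` would not match); `qos_class`, `before` and `after` are hypothetical names introduced here.

```python
from typing import Dict

# A container "resources" stanza, e.g. {"requests": {"cpu": "200m"}}.
Resources = Dict[str, Dict[str, str]]


def qos_class(resources: Resources) -> str:
    """Approximate the Kubernetes QoS class for a single-container pod.

    Simplification (assumption): quantities are compared as strings and
    only one container is considered.
    """
    requests = dict(resources.get("requests") or {})
    limits = resources.get("limits") or {}

    # No requests and no limits at all: BestEffort.
    if not requests and not limits:
        return "BestEffort"

    # Kubernetes defaults a missing request to the matching limit.
    for name, value in limits.items():
        requests.setdefault(name, value)

    # Guaranteed needs cpu and memory limits, each equal to its request.
    guaranteed = all(
        name in limits and requests.get(name) == limits[name]
        for name in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


# Controller resources before and after this change (values from the diff).
before = {
    "limits": {"cpu": "100m", "memory": "64Mi"},
    "requests": {"cpu": "100m", "memory": "64Mi"},
}
after = {"requests": {"cpu": "200m", "memory": "256Mi"}}

print(qos_class(before))  # Guaranteed
print(qos_class(after))   # Burstable
```

Under the same rules, the default backend also becomes Burstable once its limits are removed and only the `10m` / `20Mi` requests remain.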