Make DNS pod autoscale
DNS is a critical service in the Kubernetes world, though it is not part
of Kubernetes itself. It should therefore run with more than one replica,
spread across different nodes, for high availability. Otherwise, services
running on the Kubernetes cluster will break if the node hosting the DNS
pod goes down. Likewise, during a cluster upgrade, services will break
while the node hosting the DNS pod is being replaced. There is plenty of
discussion about this; please refer to [1], [2] and [3].
[1] https://github.com/kubernetes/kubeadm/issues/128
[2] https://github.com/kubernetes/kubernetes/issues/40063
[3] https://github.com/kubernetes/kops/issues/2693
Closes-Bug: #1757554
Change-Id: Ic64569d4bdcf367955398d5badef70e7afe33bbb
(cherry picked from commit 54a4ac9f84)
@@ -1242,6 +1242,27 @@ _`ingress_controller_role`

    kubectl label node <node-name> role=ingress

DNS
---

CoreDNS is a critical service in a Kubernetes cluster for service discovery.
To provide high availability for the CoreDNS pod, Magnum now supports
autoscaling CoreDNS using `cluster-proportional-autoscaler
<https://github.com/kubernetes-incubator/cluster-proportional-autoscaler>`_.
With cluster-proportional-autoscaler, the replicas of the CoreDNS pod are
autoscaled based on the number of nodes and cores in the cluster, to prevent
a single point of failure.

The scaling parameters and data points are provided to the autoscaler via a
ConfigMap, and the autoscaler refreshes its parameters table every poll
interval to stay up to date with the latest desired scaling parameters. Using
a ConfigMap means users can make on-the-fly changes (including changing the
control mode) without rebuilding or restarting the scaler containers/pods.
Please refer to `Autoscale the DNS Service in a Cluster
<https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#tuning-autoscaling-parameters>`_
for more info.
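As a rough illustration of the default ``linear`` control mode: the replica
count is the larger of ``ceil(cores / coresPerReplica)`` and
``ceil(nodes / nodesPerReplica)``, with a floor of 2 on multi-node clusters
when ``preventSinglePointFailure`` is set. The helper function below is a
hypothetical sketch of that arithmetic, not the autoscaler's actual code:

```shell
# Sketch of the "linear" control mode arithmetic (illustrative only).
linear_replicas() {
    local cores=$1 nodes=$2
    local cores_per_replica=256 nodes_per_replica=16
    # ceil division: (a + b - 1) / b
    local by_cores=$(( (cores + cores_per_replica - 1) / cores_per_replica ))
    local by_nodes=$(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))
    local replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))
    # preventSinglePointFailure: at least 2 replicas on multi-node clusters
    if [ "$nodes" -gt 1 ] && [ "$replicas" -lt 2 ]; then replicas=2; fi
    echo "$replicas"
}

linear_replicas 12 3     # 3 nodes x 4 cores  -> 2 (single-point-failure floor)
linear_replicas 320 40   # 40 nodes x 8 cores -> 3 (nodesPerReplica dominates)
```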

Swarm
=====

@@ -2,7 +2,9 @@
. /etc/sysconfig/heat-params

-_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/coredns/}
+_dns_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/coredns/}
+_autoscaler_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/googlecontainer/}

CORE_DNS=/etc/kubernetes/manifests/kube-coredns.yaml
[ -f ${CORE_DNS} ] || {
    echo "Writing File: $CORE_DNS"
@@ -93,7 +95,7 @@ spec:
        operator: "Exists"
      containers:
      - name: coredns
-        image: ${_prefix}coredns:1.0.1
+        image: ${_dns_prefix}coredns:1.0.1
        imagePullPolicy: Always
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
@@ -150,6 +152,96 @@ spec:
  - name: metrics
    port: 9153
    protocol: TCP
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-dns-autoscaler
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list"]
  - apiGroups: [""]
    resources: ["replicationcontrollers/scale"]
    verbs: ["get", "update"]
  - apiGroups: ["extensions"]
    resources: ["deployments/scale", "replicasets/scale"]
    verbs: ["get", "update"]
  # Remove the configmaps rule once below issue is fixed:
  # kubernetes-incubator/cluster-proportional-autoscaler#16
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-dns-autoscaler
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
  - kind: ServiceAccount
    name: kube-dns-autoscaler
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-dns-autoscaler
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
  labels:
    k8s-app: kube-dns-autoscaler
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: kube-dns-autoscaler
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: autoscaler
        image: ${_autoscaler_prefix}cluster-proportional-autoscaler-amd64:1.1.2
        resources:
          requests:
            cpu: "20m"
            memory: "10Mi"
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=kube-dns-autoscaler
        # Should keep target in sync with the coredns deployment name above
        - --target=Deployment/coredns
        # When the cluster uses large nodes (with more cores), "coresPerReplica" should dominate.
        # If using small nodes, "nodesPerReplica" should dominate.
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}}
        - --logtostderr=true
        - --v=2
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      serviceAccountName: kube-dns-autoscaler
EOF
}

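The ``--default-params`` flag is only a fallback: the autoscaler creates the
``kube-dns-autoscaler`` ConfigMap on first run if it does not exist, then
watches it, so scaling behaviour can be tuned live without restarting the pod.
The following fragment is an illustration of what that ConfigMap looks like
(the values mirror the defaults above; it is not part of this patch):

```yaml
# Illustrative only; edit live with:
#   kubectl edit configmap kube-dns-autoscaler -n kube-system
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  # Replacing the "linear" key with "ladder" switches the control mode on the fly.
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "preventSinglePointFailure": true
    }
```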
@@ -0,0 +1,8 @@
---
issues:
  - |
    Currently, the number of replicas of the CoreDNS pod is hardcoded to 1,
    which is not a reasonable number for such a critical service. Without
    DNS, most workloads running on the Kubernetes cluster would break.
    Magnum now autoscales the CoreDNS pod based on the number of nodes and
    cores in the cluster.