Make DNS pod autoscale

The DNS service is critical in the Kubernetes world, even though it is
not part of Kubernetes itself. It should therefore run with more than
one replica, spread across different nodes, for high availability.
Otherwise, services running on the Kubernetes cluster break whenever
the node hosting the DNS pod goes down. The same happens during a
cluster upgrade, when the node hosting the DNS pod is replaced. There
has been plenty of discussion about this; please refer to [1], [2]
and [3].

[1] https://github.com/kubernetes/kubeadm/issues/128
[2] https://github.com/kubernetes/kubernetes/issues/40063
[3] https://github.com/kubernetes/kops/issues/2693

Closes-Bug: #1757554

Change-Id: Ic64569d4bdcf367955398d5badef70e7afe33bbb
Author: Feilong Wang, 2018-03-22 22:38:56 +13:00
Parent: 79f4cc0c9d
Commit: 54a4ac9f84
3 changed files with 123 additions and 2 deletions


@@ -1242,6 +1242,27 @@ _`ingress_controller_role`
kubectl label node <node-name> role=ingress
DNS
---
CoreDNS is a critical service for service discovery in a Kubernetes
cluster. To make CoreDNS highly available, Magnum now supports autoscaling
it with the `cluster-proportional-autoscaler
<https://github.com/kubernetes-incubator/cluster-proportional-autoscaler>`_.
With cluster-proportional-autoscaler, the number of CoreDNS replicas is
scaled automatically with the number of nodes and cores in the cluster,
preventing a single point of failure.

The scaling parameters and data points are provided to the autoscaler via a
ConfigMap, and the autoscaler refreshes its parameters table every poll
interval so that it stays up to date with the latest desired scaling
parameters. Because a ConfigMap is used, changes (including the control
mode) can be made on the fly, without rebuilding or restarting the scaler
containers/pods. Please refer to `Autoscale the DNS Service in a Cluster
<https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#tuning-autoscaling-parameters>`_
for more information.
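
For example, the parameters can be changed on the fly by editing the
autoscaler's ConfigMap and waiting for the next poll interval (a minimal
sketch, using the ``kube-dns-autoscaler`` ConfigMap and the default
``linear`` parameters from the deployment shipped with Magnum)::

    kubectl edit configmap kube-dns-autoscaler --namespace=kube-system

    # then adjust the data section, for example:
    # data:
    #   linear: '{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}'
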
Swarm
=====


@@ -2,7 +2,9 @@
. /etc/sysconfig/heat-params
_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/coredns/}
_dns_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/coredns/}
_autoscaler_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/googlecontainer/}
CORE_DNS=/etc/kubernetes/manifests/kube-coredns.yaml
[ -f ${CORE_DNS} ] || {
    echo "Writing File: $CORE_DNS"
@@ -93,7 +95,7 @@ spec:
        operator: "Exists"
      containers:
      - name: coredns
        image: ${_prefix}coredns:1.0.1
        image: ${_dns_prefix}coredns:1.0.1
        imagePullPolicy: Always
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
@@ -150,6 +152,96 @@ spec:
  - name: metrics
    port: 9153
    protocol: TCP
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-dns-autoscaler
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
- apiGroups: [""]
  resources: ["replicationcontrollers/scale"]
  verbs: ["get", "update"]
- apiGroups: ["extensions"]
  resources: ["deployments/scale", "replicasets/scale"]
  verbs: ["get", "update"]
# Remove the configmaps rule once the issue below is fixed:
# kubernetes-incubator/cluster-proportional-autoscaler#16
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-dns-autoscaler
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: kube-dns-autoscaler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-dns-autoscaler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
  labels:
    k8s-app: kube-dns-autoscaler
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: kube-dns-autoscaler
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: autoscaler
        image: ${_autoscaler_prefix}cluster-proportional-autoscaler-amd64:1.1.2
        resources:
          requests:
            cpu: "20m"
            memory: "10Mi"
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=kube-dns-autoscaler
        # Keep the target in sync with the coredns Deployment name above.
        - --target=Deployment/coredns
        # On clusters with large nodes (more cores), "coresPerReplica" should
        # dominate; on clusters with small nodes, "nodesPerReplica" should
        # dominate.
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}}
        - --logtostderr=true
        - --v=2
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      serviceAccountName: kube-dns-autoscaler
EOF
}
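
The autoscaler's linear mode computes the replica count as
max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), and
preventSinglePointFailure raises the result to at least 2 on clusters with
more than one node. A small shell sketch of that arithmetic with the default
parameters above (the node and core counts are made-up examples):

    # Example cluster: 10 nodes with 4 cores each.
    nodes=10; cores=40
    coresPerReplica=256; nodesPerReplica=16

    # Integer ceiling division: ceil(a/b) == (a + b - 1) / b
    by_cores=$(( (cores + coresPerReplica - 1) / coresPerReplica ))  # 1
    by_nodes=$(( (nodes + nodesPerReplica - 1) / nodesPerReplica ))  # 1
    replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))

    # preventSinglePointFailure: keep at least 2 replicas when nodes > 1.
    [ "$nodes" -gt 1 ] && [ "$replicas" -lt 2 ] && replicas=2
    echo "$replicas"  # -> 2
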


@@ -0,0 +1,8 @@
---
issues:
  - |
    Previously, the replica count of the CoreDNS pod was hardcoded to 1,
    which is not a reasonable number for such a critical service: without
    DNS, most workloads running on the Kubernetes cluster break. Magnum now
    autoscales CoreDNS based on the number of nodes and cores in the cluster.