Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.
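A minimal sketch of the fix, assuming the agent exports the admin
kubeconfig before calling kubectl (the path /etc/kubernetes/admin.conf
is illustrative, not necessarily the exact one the agent uses):
  # make kubectl inside the heat-container-agent aware of the cluster
  export KUBECONFIG=/etc/kubernetes/admin.conf
  kubectl version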
Task: 39938
Story: 2007591
Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
The apiserver, controller-manager and scheduler are not used on the minions.
story: 2007568
task: 39837
Change-Id: I93b380c484b7e3881b2aa0620fe41ab9d61c1eec
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
In the heat-agent we use kubectl to install
several deployments; it is better if we use
matching versions of kubectl and the apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.
story: 2007591
task: 39536
Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
To mount nfs volumes with the embedded volume
pkg [0], rpc-statd is required and should be
started by mount.nfs. When running kubelet
in a chroot this fails. With atomic containers
it used to work.
[0] https://github.com/kubernetes/kubernetes/tree/master/pkg/volume/nfs
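A hedged sketch of the workaround, assuming the fix is to have
rpc-statd running on the host before kubelet mounts anything
(rpc-statd.service is the standard nfs-utils unit name):
  # start the NFS lock/status daemon on the host, outside the chroot
  systemctl enable --now rpc-statd.service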
story: 2005201
task: 39403
Change-Id: Ib64efe7ecbe9a24e86fa9d9a35a4d90c0e8bbf2e
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Kubelet fails to handle SELinux labelling of a Cinder PV unless the
rootfs is presented to it; as a result, an unprivileged
container lacks the ability to access the path.
With this patch, Kubelet handles the correct labelling automatically
when a Cinder PV is attached to a pod.
The default behaviour using system containers in Fedora Atomic is to
mount the rootfs [1], but we did not implement the same behaviour in
Fedora CoreOS, which was a mistake; this was a missing piece of code.
[1] https://github.com/openstack/magnum/blob/master/dockerfiles/kubernetes-kubelet/config.json.template#L335
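A hedged sketch of the missing bind mount, mirroring the Fedora Atomic
config [1]; the image tag and the other flags are illustrative:
  # present the host rootfs to kubelet so SELinux relabelling of
  # Cinder PV mount paths can succeed
  podman run --name kubelet --privileged \
    -v /:/rootfs:ro,rslave \
    k8s.gcr.io/hyperkube:v1.17.0 kubelet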
Story: 2007413
Task: 39129
Change-Id: Id59c604928244bf49773b7519fa756d5b2814b69
Set the max-size for container/pod logs to 10m
and a max of 5 rotated files. The values mirror
the defaults of kubernetes when it is using
a remote container runtime [0] (container-log-max-files
and container-log-max-size). These defaults cover the
case of containerd.
[0] https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
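The equivalent kubelet flags (both flag names are from the kubelet
reference [0]):
  kubelet \
    --container-log-max-size=10Mi \
    --container-log-max-files=5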
story: 2007402
task: 39031
Change-Id: Ie3106b40b4d1c6866761c507122047e88e513651
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Adding the volume mount for /etc/machine-id so that the kubelet
bootstrapped by podman can access the correct instance ID. Without
this, the autoscaler will fail to delete empty nodes. This issue is
reported on the autoscaler repo [1].
[1] https://github.com/kubernetes/autoscaler/issues/2819
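A hedged sketch of the added mount (image tag and remaining flags are
illustrative):
  # give the containerised kubelet the host's machine-id so the
  # autoscaler can map nodes back to Nova instances
  podman run --name kubelet \
    -v /etc/machine-id:/etc/machine-id:ro \
    k8s.gcr.io/hyperkube:v1.17.0 kubelet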
Task: 38743
Story: 2007286
Change-Id: I2852f4b255e782bb65b13571502194ee9f455ae3
To display the node OS-IMAGE in k8s properly
we need to mount /usr/lib/os-release;
/etc/os-release is just a symlink.
story: 2006459
task: 38505
Change-Id: I0c850126c7299cb7a4fe201efee311d76bc14ce6
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Upstream k8s images changed the entrypoint to
/hyperkube instead of a shell.
Set the entrypoint to /hyperkube which works
for v1.17.x and v1.16.x.
  podman inspect k8s.gcr.io/hyperkube:v1.16.0 | grep Entrypoint -A 2
  podman inspect k8s.gcr.io/hyperkube:v1.17.0 | grep Entrypoint -A 2
      "Entrypoint": [
          "/hyperkube"
      ]
story: 2007031
task: 37834
Change-Id: I021aeeef9f39dd426c1f335161a3d4b3f51670e8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Now Magnum is using podman and systemd to manage the k8s components.
In cases where the nodes pull images from docker.io or another
mirror registry with high latency, some of the components may take a
long time to start, which causes a timeout when bootstrapping the k8s
cluster for the fedora atomic/coreos drivers. This patch fixes it by
adding TimeoutStartSec to the systemd services.
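A hedged sketch of the change as a systemd drop-in (the unit name and
timeout value are illustrative; the patch sets TimeoutStartSec directly
in the units it ships):
  # give slow registries time to serve the image before systemd
  # declares the start failed
  mkdir -p /etc/systemd/system/kube-apiserver.service.d
  printf '[Service]\nTimeoutStartSec=600\n' \
    > /etc/systemd/system/kube-apiserver.service.d/timeout.conf
  systemctl daemon-reload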
Task: 37251
Story: 2006459
Change-Id: I709bac620e4ceec1858672076eb0aef997704b62
Choose whether system containers etcd, kubernetes and the heat-agent will be
installed with podman or atomic. This label is relevant for k8s_fedora drivers.
k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used,
pulling containers from docker.io/openstackmagnum. use_podman=true is accepted
as well, which will pull containers from k8s.gcr.io.
k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.
Fix upgrade for k8s_fedora_coreos_v1 and magnum-cordon systemd unit.
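A hedged usage example (template name, image and the other arguments
are illustrative):
  openstack coe cluster template create k8s-coreos \
    --image fedora-coreos-31 \
    --external-network public \
    --coe kubernetes \
    --labels use_podman=true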
Task: 37242
Story: 2005201
Change-Id: I0d5e4e059cd4f0458746df7c09d2fd47c389c6a0
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
With this change each node will be labeled with the following:
* --node-labels=magnum.openstack.org/role=${NODEGROUP_ROLE}
* --node-labels=magnum.openstack.org/nodegroup=${NODEGROUP_NAME}
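For example, the new labels can be listed with:
  kubectl get nodes \
    -L magnum.openstack.org/role \
    -L magnum.openstack.org/nodegroup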
Change-Id: Ic410a059b19a1252cdf6eed786964c5c7b03d01c
Add fedora coreos driver. To deploy clusters with fedora coreos, operators
or users need to add os_distro=fedora-coreos to the image. The scripts
to deploy kubernetes on top are the same as for fedora atomic. Note that
this driver has selinux enabled.
The startup of the heat-container-agent uses a workaround to copy the
SoftwareDeployment credentials to /var/lib/cloud/data/cfn-init-data.
The fedora coreos driver requires heat train to support ignition.
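A hedged example of preparing the image (image name illustrative):
  openstack image set fedora-coreos-31 \
    --property os_distro=fedora-coreos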
Task: 29968
Story: 2005201
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Change-Id: Iffcaa68d385b1b829b577ebce2df465073dfb5a1
Using the atomic cli to install kubelet breaks mount
propagation of secrets, configmaps and so on. Using podman
in a systemd unit works.
Additionally, with this change all atomic commands are dropped and
containers are pulled from gcr.io (official kubernetes containers).
Finally, after this patch, only by starting the heat-agent with
ignition can we use fedora coreos as a drop-in replacement.
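A hedged sketch of the pattern, not the exact unit shipped by the
driver (image tag and mounts are illustrative):
  # write a systemd unit that runs kubelet under podman
  printf '%s\n' \
    '[Unit]' \
    'Description=kubelet via podman' \
    'After=network-online.target' \
    '[Service]' \
    'ExecStart=/usr/bin/podman run --name kubelet --rm --privileged --net host -v /etc/kubernetes:/etc/kubernetes:ro k8s.gcr.io/hyperkube:v1.15.7 kubelet' \
    '[Install]' \
    'WantedBy=multi-user.target' \
    > /etc/systemd/system/kubelet.service
  systemctl daemon-reload && systemctl enable --now kubelet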
* Drop del of docker0
This command to remove docker0 is carried over from
earlier versions of docker; it is not an issue
anymore.
story: 2006459
task: 36871
Change-Id: I2ed8e02f5295e48d371ac9e1aff2ad5d30d0c2bd
Signed-off-by: Spyros Trigazis <spyridon.trigazi@cern.ch>
Pass the node name to kube-proxy and do not rely
on the cloud provider to set it. Kube-proxy needs
to start before the cloud-provider.
Without it, kube-proxy fails to find the node
in the kubernetes API.
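A hedged sketch (--hostname-override is the real kube-proxy flag;
INSTANCE_NAME stands for the heat-provided parameter and is an
assumption here):
  # tell kube-proxy its node name instead of waiting for the
  # cloud-provider to set it
  kube-proxy --hostname-override="${INSTANCE_NAME}"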
story: 2006459
task: 36873
Change-Id: Ie04d8d99e68ee43c9d407dbd6f746f6249337ba2
This is the fix for the "line 528: KUBE_PROXY_ARGS: unbound variable"
error in master.
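A minimal sketch of the usual fix for `set -u` scripts (the exact
patch may differ):
  # expand to an empty string instead of aborting when the label
  # was never set
  KUBE_PROXY_ARGS=${KUBE_PROXY_ARGS:-}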
Change-Id: Iaf5bbc8e4946c6625e82b6f68e754328f08b6ce7
Story: 2006492
Task: 36448
The label kubeproxy_options was being ignored when setting up both
master and minions. Add it to the kube proxy args.
Change-Id: Ic830f19e1af062e90d066e6df4df2e4376e4f379
Story: 2006465
Task: 36394
We kept introspecting the name of the instance with the assumption
that the network always existed under .novalocal.
This is not always the case; with certain variables changed inside
Neutron it is possible to control this, leading to failing
deploys.
With this change, we pass the instance name directly to the cluster
and therefore we always have the accurate name.
Task: 36160
Story: 2006371
Change-Id: I2ba32844b822ffc14da043e6ef7d071bb62a22ee
In fedora atomic 29, podman is present and configures
its own cni. We need to clear the cni configuration,
otherwise we will get that cni0 is already in use.
story: 2006171
task: 35682
Change-Id: Ic70938184bdb98eaaf4f384ce553818cf2624a2a
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
When using docker_storage_driver=overlay2 plus docker_volume_size > 0,
users will run into the problem that some pods can't be created. The root
cause is that kubelet needs read/write permission on /var/lib/docker.
This patch fixes it by adding /var/lib/docker to the kubelet container's mounts.
Task: 30221
Story: 2005314
Change-Id: Ie19c95e6280e16644c686550950359cc9934c719
Rolling upgrade is an important feature for a managed k8s service;
at this stage, two use cases will be covered:
1. Upgrade the base operating system
2. Upgrade the k8s version
Known limitation: When doing an operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.
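A hedged usage example (cluster and template names are illustrative):
  openstack coe cluster upgrade mycluster new-cluster-template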
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for the K8s cluster; users can use a new label
"auto_healing_enabled" to turn it on/off.
Meanwhile, a new label "auto_scaling_enabled" is also introduced
to enable the capability to let the k8s cluster auto scale based
on its workload.
Task: 28923
Story: 2004782
Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
The scripts run by cloud-init for the master and minion nodes currently
write proxy environment variables into /etc/bashrc when they are defined.
These variables are only introduced into the running environment
when a new bash shell is started. The /bin/sh used by the fragment
scripts ignores /etc/bashrc, so the new shells invoked per fragment
will not have the http proxy variables present. This means that the
master/minion node deployment fails when behind an http proxy.
This patch adds explicit exports for HTTP_PROXY and HTTPS_PROXY when
those variables are defined and not empty.
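A minimal sketch of the added exports (the exact fragment wording may
differ):
  if [ -n "${HTTP_PROXY:-}" ]; then
      export HTTP_PROXY
  fi
  if [ -n "${HTTPS_PROXY:-}" ]; then
      export HTTPS_PROXY
  fi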
Task: 29863
Change-Id: Id05c90d5bf99d720ae6002b38d3291e364e1e0c4
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.
This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.
Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
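The resulting controller manager invocation, sketched (the cluster
CIDR value is illustrative):
  kube-controller-manager \
    --allocate-node-cidrs=true \
    --cluster-cidr=10.100.0.0/16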
Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
* Use the external cloud-provider [0]
* Label master nodes
* Make the script that deploys the cloud-provider and clusterroles
for the apiserver a SoftwareDeployment
* Rename kube_openstack_config to cloud-config;
for cinder to work, the kubelet expects the cloud config name only
like this. Keep a copy of kube_openstack_config for backwards
compatibility.
Change-Id: Ife5558f1db4e581b64cc4a8ffead151f7b405702
Task: 22361
Story: 2002652
Co-Authored-By: Spyros Trigazis <spyridon.trigazis@cern.ch>
1. pods with host network cannot reach coredns or any svc or resolve
their own hostname
2. If webhooks are deployed in the cluster, the apiserver needs to
contact them, which means kube-proxy is required in the master node with
the cluster-cidr set.
Change-Id: Icb8e7c3b8c75a3ab087c818c8580c0c8a9111d30
story: 2003460
task: 24719
The statement in configure-kubernetes-master and minion
that checks whether to enable the cloud provider needs
to be split in two and use a single '='.
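A hedged sketch of the shape of the fix (variable and function names
are illustrative):
  # before: one compound test using '=='
  # if [ "$CLOUD_PROVIDER" == "true" -a "$TRUST_ID" != "" ]; then ...
  # after: two statements, POSIX single '='
  if [ "$CLOUD_PROVIDER" = "true" ]; then
      if [ -n "$TRUST_ID" ]; then
          CLOUD_PROVIDER_ARGS="--cloud-provider=openstack"
      fi
  fi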
Change-Id: I64b2d5be10058b2d03c406519b3d80e212844d15
story: 1775358
In these environments, the Kubelet needs to be told to use
a different flexvolume plugin directory that is accessible
and writeable (rw). By default it is
/usr/libexec/kubernetes/kubelet-plugins/volume/exec/, which raised
a read-only directory error when creating volumes.
The patch simply changes the flexvolume dir to an accessible and
writeable one.
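A hedged sketch (--volume-plugin-dir is the real kubelet flag; the
replacement path is illustrative):
  kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins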
Change-Id: Iaa470890547a2ccf734e37498e0c5286e815ff97
Task: 22565
Story: 2002723
Add 'cloud_provider_enabled' label for the k8s_fedora_atomic
driver. Defaults to true. For specific kubernetes versions if
'cinder' is selected as a 'volume_driver', it is implied that
the cloud provider will be enabled since they are combined.
The motivation for this change is that in environments with
high load to the OpenStack APIs, users might want to disable
the cloud provider.
story: 1775358
task: 1775358
Change-Id: I2920f699654af1f4ba45644ab60a04a3f70918fe
This patch allows specification of the Cgroup driver for the Kubelet
service. The necessity of this patch was realised after upgrading Docker
to the new community edition (17.03+), which defaults to the `cgroupfs`
Cgroup driver, while Fedora Atomic (version 27) comes with Docker 1.13.
The Cgroup drivers need to be identical for the two services, Docker
and Kubelet, to be able to work together.
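A hedged sketch of keeping the two drivers in sync (querying docker
this way is one known option; the patch may instead use a label with a
default):
  # match kubelet's cgroup driver to whatever docker is using
  CGROUP_DRIVER=$(docker info -f '{{ .CgroupDriver }}')
  kubelet --cgroup-driver="${CGROUP_DRIVER}"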
Story: 2002533
Task: 22079
Change-Id: Ia4b38a63ede59e18c8edb01e93acbb66f1e0b0e4
By current design, pods under kube-system will run on minion nodes. And
given that we're now not running kubelet on the master node, calico-node is
not running on the k8s master node. As a result, kubectl proxy is not
working to access the dashboard. And it's confirmed with the calico team that
the calico-node container must be running on the master node if users want
to use kubectl proxy, see [1]. So, the solution is enabling kubelet
on the master but disallowing the other pods from being scheduled on the
master with taints/tolerations.
Besides, this patch includes another fix about running calico on
Fedora Atomic. Because Fedora Atomic is using NetworkManager, it
manipulates the routing table for interfaces in the default network
namespace where Calico veth pairs are anchored for connections to
containers. This can interfere with the Calico agent’s ability to
route correctly. Please see more information about this at [2].
[1] https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/integration#about-the-calico-components
[2] https://docs.projectcalico.org/master/usage/troubleshooting/#configure-networkmanager
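A hedged sketch of the scheduling part (taint key/value and node name
are illustrative; calico-node's DaemonSet carries a matching
toleration):
  # keep ordinary pods off the master while still running kubelet there
  kubectl taint nodes k8s-master-0 dedicated=master:NoSchedule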
Closes-Bug: #1751978
Change-Id: Iacd964806a28b3ca6ba3e037c60060f0957d44aa
Define a set of new labels to pass additional options to the kubernetes
daemons - kubelet_options, kubeapi_options, kubescheduler_options,
kubecontroller_options, kubeproxy_options.
In all cases the default value is "", meaning no extra options are
passed to the daemons.
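A hedged usage example (the option value is illustrative):
  openstack coe cluster create mycluster \
    --cluster-template k8s \
    --labels kubelet_options='--v=4'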
Change-Id: Idabe33b1365c7530edc53d1a81dee3c857a4ea47
Closes-Bug: #1701223
In Fedora Atomic 27 etcd and flanneld are removed from the base image.
Install them as system containers.
* update docker-storage configuration
* add etcd and flannel tags as labels
Change-Id: I2103c7c3d50f4b68ddc11abff72bc9e3f22839f3
Closes-Bug: #1735381
Due to several small connected patches for the
fedora atomic driver, this patch includes 4 smaller patches.
Patch 1:
k8s: Do not start kubelet and kube-proxy on master
Patch [1] misses the removal of kubelet and kube-proxy from
enable-services-master.sh, and therefore they are started if they
exist in the image, or the script will fail.
[1] https://review.openstack.org/#/c/533593/
Closes-Bug: #1726482
Patch 2:
k8s: Set require-kubeconfig when needed
From kubernetes 1.8 [1] --require-kubeconfig is deprecated and
in kubernetes 1.9 it is removed.
Add --require-kubeconfig only for k8s <= 1.8.
[1] https://github.com/kubernetes/kubernetes/issues/36745
Closes-Bug: #1718926
https://review.openstack.org/#/c/534309/
Patch 3:
k8s_fedora: Add RBAC configuration
* Make certificates and kubeconfigs compatible
with NodeAuthorizer [1].
* Add CoreDNS roles and rolebindings.
* Create the system:kube-apiserver-to-kubelet ClusterRole.
* Bind the system:kube-apiserver-to-kubelet ClusterRole to
the kubernetes user.
* remove creation of the kube-system namespace, it is created
by default
* update client cert generation in the conductor with
kubernetes' requirements
* Add --insecure-bind-address=127.0.0.1 to work on
multi-master too. The controller manager on each
node needs to contact the apiserver (on the same node)
on 127.0.0.1:8080
[1] https://kubernetes.io/docs/admin/authorization/node/
Closes-Bug: #1742420
Depends-On: If43c3d0a0d83c42ff1fceffe4bcc333b31dbdaab
https://review.openstack.org/#/c/527103/
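A hedged kubectl sketch of the binding described above (the driver
creates it via a manifest rather than kubectl):
  kubectl create clusterrolebinding kube-apiserver-to-kubelet \
    --clusterrole=system:kube-apiserver-to-kubelet \
    --user=kubernetes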
Patch 4:
k8s_fedora: Update coredns config to pass e2e
To pass the e2e conformance tests, coredns needs to
be configured with the pods verified mode. Otherwise, pods
won't be resolvable [1].
[1] https://github.com/coredns/coredns/tree/master/plugin/kubernetes
https://review.openstack.org/#/c/528566/
Closes-Bug: #1738633
Change-Id: Ibd5245ca0f5a11e1d67a2514cebb2ffe8aa5e7de
Since 1.6 --apiservers is deprecated and it is removed in
1.8. Add the server parameter in kubeconfig and remove
--apiservers.
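A hedged sketch of moving the server into kubeconfig (cluster name,
server address and file path are illustrative):
  kubectl config set-cluster local \
    --server=https://127.0.0.1:6443 \
    --kubeconfig=/etc/kubernetes/kubelet-kubeconfig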
Change-Id: Ie766ec0797fdc86a93e7f70a321d39332a73b552
Closes-Bug: #1718926
Add a label to prefix all container images used by magnum:
* kubernetes components
* coredns
* node-exporter
* kubernetes-dashboard
Using this label all containers will be pulled from the specified
registry and group in the registry.
TODO:
* grafana
* prometheus
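A hedged usage example, assuming the label is magnum's documented
container_infra_prefix (registry path and other arguments are
illustrative):
  openstack coe cluster template create k8s-mirrored \
    --image fedora-atomic-latest \
    --external-network public \
    --coe kubernetes \
    --labels container_infra_prefix=myregistry.example.com/magnum/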
Closes-Bug: #1712810
Change-Id: Iefe02f5ebc97787ee80431e0f16f73ae8444bdc0
Separate the tag from which to pull from the kubernetes version.
With the current state, the tag and the version happen to be
the same. But it is not decided yet in the fedoraproject how the
images are going to be tagged. Finally, operators might want to try
their own container images with custom tags.
Depends-On: Icddb8ed1598f2ba1f782622f86fb6083953c3b3f
Implements: blueprint run-kube-as-container
Change-Id: I4c4bc055d7df5e65aede93464bff51e6d5971504
Following up of https://review.openstack.org/#/c/487943
Depends-On: I9a7d00cddb456b885b6de28cfb3d33d2e16cc348
Implements: blueprint run-kube-as-container
Change-Id: Icddb8ed1598f2ba1f782622f86fb6083953c3b3f
Use system containers based on fedora rawhide from
projectatomic [1]. Until the fedoraproject updates
the tags properly, we mirror our containers in [2].
System containers are meant to be drop-in replacements
of the fedora kubernetes binaries.
Update k8s to 1.7.4 to match the version in the containers.
[1] https://github.com/projectatomic/atomic-system-containers
[2] https://hub.docker.com/r/openstackmagnum/
Implements: blueprint run-kube-as-container
Change-Id: I22918c0b06ca34d96ee68ac43fabcd5c0b281950
Kubernetes uses certificates, kubeconfig and the kubernetes openstack
cloud provider configuration from /srv/kubernetes and /etc/sysconfig.
The upstream kubernetes system containers used with atomic hosts
mount /etc/kubernetes; we can unify the location of all kubernetes
configuration and also be able to use the upstream containers
unmodified.
Implements: blueprint run-kube-as-container
Change-Id: I9b2da390745836d9a66b7c8fc995a35cb74993e9
Enable internal cluster DNS by deploying CoreDNS in the kube-system
namespace. It covers dns queries for both cluster-internal and
external names, acting as a proxy with a cache layer in front.
Version of CoreDNS hard-coded to 007, image taken from dockerhub.
Related-Bug: #1692449
Change-Id: I0a9703b531fe872416dcd79fa7d4d27c1ea61586
[Issue]
The container log files located in /var/log/containers cannot be
found when a k8s cluster is created based on Fedora or CentOS,
because docker sets "journald" as its default log-driver.
[Solution]
Added a command to both configure-kubernetes-master and minion
that searches for the string "--log-driver=journald" in
/etc/sysconfig/docker and removes it.
After that, docker will write logs using its default driver.
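A hedged sketch of such a command (the actual sed expression in the
patch may differ):
  # strip the journald log-driver so docker falls back to its default
  sed -i 's/--log-driver=journald//g' /etc/sysconfig/docker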
Closes-Bug: #1690717
Change-Id: Ie8449c04c792e17e084187e5e1853c0f957717ce