Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.
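For illustration, a minimal sketch of exporting KUBECONFIG before
invoking kubectl from Python (the path is an assumption, not
necessarily the one the agent uses):

    import os
    import subprocess

    env = dict(os.environ)
    # kubectl 1.18.x no longer falls back to 127.0.0.1:8080, so tell
    # it explicitly where the admin kubeconfig lives.
    env["KUBECONFIG"] = "/etc/kubernetes/admin.conf"  # hypothetical
    subprocess.run(["kubectl", "get", "nodes"], env=env, check=True)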
Task: 39938
Story: 2007591
Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
Currently the label `fixed_network_cidr` is not handled correctly:
whether or not the label is set, the default value '10.0.0.0/24' is
used for the fixed network anyway. This patch fixes that and renames
the label to `fixed_subnet_cidr` to reduce confusion. The new
behaviour, sketched below, will be:
1. If the label `fixed_subnet_cidr` is set but no fixed subnet passed
in, then a new subnet will be created with the given CIDR.
2. If a fixed subnet is passed in by the user, then the label
`fixed_subnet_cidr` will be overridden with the CIDR of the given
subnet.
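A minimal sketch of this resolution logic (function and attribute
names are hypothetical, not Magnum's actual code):

    def resolve_fixed_subnet_cidr(labels, fixed_subnet=None):
        """Return the CIDR to use for the cluster's fixed subnet."""
        if fixed_subnet is not None:
            # Rule 2: a user-supplied subnet wins; its CIDR
            # overrides the label.
            return fixed_subnet.cidr
        # Rule 1: no subnet given, so a new one is created with the
        # labelled CIDR, falling back to the historical default.
        return labels.get("fixed_subnet_cidr", "10.0.0.0/24")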
Task: 39847
Story: 2007712
Change-Id: Id05e36696bf85297a556fcd959ed897fe47b7354
When resizing an NG we should send strictly the
desired node_count and the nodes_to_remove.
Otherwise the stack update operation may replace/rebuild
nodes or other resources.
This was the functionality with:
Id84e5d878b21c908021e631514c2c58b3fe8b8b0
But it was reverted with:
I725413e77f5a7bdb48131e8a10e5dc884b5e066a
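For illustration, a hedged sketch of a strict resize payload (key
names from the message; the values and the surrounding Heat call are
hypothetical):

    # Send only what must change, so the stack update cannot
    # replace or rebuild unrelated nodes or resources.
    stack_params = {
        "node_count": 3,
        "nodes_to_remove": ["node-2"],
    }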
Story: 2005266
task: 39860
Change-Id: Ib31b6801e0e2d954c31ac91e77ae9d3ef1afebd2
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Eventlet, used by many OpenStack packages, depends on greenlet, which
does not have a pip release that supports Python 3.9 (the default
Python version on Fedora 33). Therefore, pin Fedora to version 32
until a new greenlet release is cut which includes the required fix
[0].
Also update the default heat_container_agent_tag to victoria-dev.
[0] https://github.com/python-greenlet/greenlet/pull/161
Change-Id: Ice75ae880925cd15c096eb6d1cdabf7f802bccde
Story: 2007264
Task: 39941
Export proxy settings for the helm install to make sure
helm can reach the charts site.
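A minimal sketch, assuming helm is invoked via a subprocess, of
passing the proxy settings through:

    import os
    import subprocess

    env = dict(os.environ)
    # Propagate host proxy settings so helm can reach the charts site.
    for var in ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"):
        value = os.environ.get(var) or os.environ.get(var.lower())
        if value:
            env[var] = value
    subprocess.run(["helm", "repo", "update"], env=env, check=True)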
Task: 39877
Story: 2007725
Change-Id: I4de26d40b7c5ba2759b4892349c59cf3cc870241
- Refactor the helm installer to use a single meta chart install job
and config which use the Helm v3 client.
- Use the upstream helm client binary instead of a helm-client
container maintained by us. To verify its checksum, a
helm_client_sha256 label is introduced for helm_client_tag (or,
alternatively, for a URL specified using the new helm_client_url
label); see the sketch after this list.
- Default helm_client_tag=v3.2.1.
- Default tiller_tag=v2.16.7, tiller_enabled=false.
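A hedged sketch of the checksum verification described above (the
download URL is illustrative; the real job derives it from
helm_client_tag or helm_client_url):

    import hashlib
    import urllib.request

    url = "https://get.helm.sh/helm-v3.2.1-linux-amd64.tar.gz"
    expected_sha256 = "<value of the helm_client_sha256 label>"

    data = urllib.request.urlopen(url).read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise RuntimeError("helm client checksum mismatch")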
Story: 2007514
Task: 39295
Change-Id: I9b9633c81afb08b91576a9a4d3c5a0c445e0cee4
apiserver, controller-manager and scheduler are not used on the minions.
story: 2007568
task: 39837
Change-Id: I93b380c484b7e3881b2aa0620fe41ab9d61c1eec
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
- Deprecate in-tree Cinder volume driver for removal in X cycle in
favour of out-of-tree Cinder CSI plugin for Kubernetes.
- Set cinder_csi_enabled to True by default from V cycle.
- Add unit test for in-tree Cinder deprecation.
- Add missing unit tests for the recent docker_storage_driver
deprecation.
Change-Id: I6f033049b5ff18c19866637efc8cf964272097f5
Story: 2007048
Task: 37873
There are several issues in the current upgrade script:
1. The kubectl command location has changed.
2. Before checking the digest of the hyperkube image, it is better to
wait until the image is fully downloaded.
3. Use the full image name when inspecting the image.
4. Get the correct ostree commit id.
Task: 39785
Story: 2007676
Change-Id: I5c16b123683ef1173c22d4e4628c36234871cb93
A new label named `master_lb_allowed_cidrs` is added to control
the IP ranges which can access the k8s API and etcd load balancers.
It is a good security enhancement.
Task: 39188
Story: 2007414
Change-Id: I157a3b01d169e550e79b94316803fde8ddf77b03
* remove the user since it is controlled in the chart
and changed from 33 to 101
* use the latest chart, v1.36.3, from stable
* use the latest 0.32.0 controller image
story: 2006945
task: 39747
Change-Id: I6df49929cb8890f534afde185d56b7b6d70c691e
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Most of the time, issues with cluster update/upgrade/resize can be
identified just by looking at the parameters sent to Heat. This patch
changes the existing log messages for cluster update and resize from
debug to info, and adds a log message for cluster upgrade.
story: #2007636
task: #39689
Change-Id: Ibac5e105885b6e7042e88dea31cfeafe42a401ab
Commit I1a75f1bf12747508a3497293650d3cc668202de6 missed adding docker
storage support for the worker nodes, and the current systemd unit is
not really working. This patch fixes it by removing the hardcoded
/dev/vdb and using xfs instead of ext4 (the same as for Fedora
Atomic) to make it simpler and more solid.
Task: 39331
Story: 2005201
Change-Id: I4c465664eb19f1992df95750dd7b2d99688c6cae
In the heat-agent we use kubectl to install
several deployments; it is better if we use
matching versions of kubectl and the apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.
story: 2007591
task: 39536
Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Heapster has been deprecated for a while and the new k8s dashboard
2.0.0 version supports metrics-server now. So it's time to upgrade
the default k8s dashboard to v2.0.0.
Task: 39101
Story: 2007256
Change-Id: I02f8cb77b472142f42ecc59a339555e60f5f38d0
Use kubectl from the heat agent to apply the
traefik deployment. The current behaviour was to
create a systemd unit to send the manifests
to the API.
This way we will have only one way of applying
manifests to the API.
This change is triggered to address the kubectl
change [0] of no longer using 127.0.0.1:8080 as
the default kubernetes API.
[0] https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#kubectl
story: 2005286
task: 39522
Change-Id: I8982bd4ec2ab69f35938970d604c16ac5e62e1fa
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Expose the autoscaler prometheus metrics on pod
port 8085 (portName: metrics).
task: 37574
story: 2006765
Change-Id: Ieedd0f60625eb5a5ce50a3b4e7344cae37c377bf
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
This is a corner case: floating_ip_enabled=False,
master_lb_enabled=True and master_lb_floating_ip_enabled=False are
set in the cluster template, but floating_ip_enabled=True is set when
creating the cluster. The current logic is not correct, which results
in a missing IP address in the api_address of the cluster.
Task: 39519
Story: 2007586
Change-Id: I5e2ca270c4f4e2c48d067cd5b8f6609c037cb6e5
According to the upstream kube-flannel.yml PR [1], the nodeSelector
was introduced because the flannel image doesn't support multi-arch
manifests, which means the same daemonset can't specify
flannel:version-arch images for every arch platform. To let every
arch platform deploy flannel, upstream adds one daemonset with a
nodeSelector per arch. But in magnum the flannel image tag is
configurable via a label, so every arch platform can deploy with a
single daemonset by specifying the corresponding flannel image tag.
The nodeSelector is therefore unnecessary here.
[1]: https://github.com/coreos/flannel/pull/989
Change-Id: I97e78e8d77973e03eeff598b212287945ca00190
Task: 39453
Story: 2007026
The following changes were introduced in the Train release:
- Allow setting network, subnet and FIP when creating cluster
(I11579ff6b83d133c71c2cbf49ee4b20996dfb918)
- ng-7: Adapt parameter and output mappings
(I45cf765977c7f5a92f28ae12c469b98435763163)
The first change allowed setting cluster.floating_ip_enabled, but the
second change made ServerAddressOutputMapping conditional on
cluster_template.floating_ip_enabled. This leads to an edge case: if
floating_ip_enabled is overridden to False when a cluster is created
while it is True in the cluster_template scope, we see this error in
the conductor logs: ValueError: Field `node_addresses[0]' cannot be
None, and the cluster remains forever stuck in CREATE_IN_PROGRESS
status despite Heat reaching CREATE_COMPLETE. This commit addresses
the issue by correctly referring to cluster.floating_ip_enabled.
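In essence (a sketch, not the literal diff; attribute names follow
the message):

    # before (buggy): the template value decides the output mapping,
    # so a per-cluster override of floating_ip_enabled is ignored
    use_fip_mapping = cluster.cluster_template.floating_ip_enabled

    # after: the cluster's own (possibly overridden) value decides
    use_fip_mapping = cluster.floating_ip_enabled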
Change-Id: Ic83f625178786d4750a66dd6dd6db35c05bc0272
Story: 2007550
Task: 39401
To mount nfs volumes with the embedded volume
pkg [0], rpc-statd is required and should be
started by mount.nfs. When running kubelet
in a chroot, this fails; with atomic containers
it used to work.
[0] https://github.com/kubernetes/kubernetes/tree/master/pkg/volume/nfs
story: 2005201
task: 39403
Change-Id: Ib64efe7ecbe9a24e86fa9d9a35a4d90c0e8bbf2e
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
In Icc4aa1f61f3b3937e5d9cc35dbe01c63c18ba3cd we only opened tcp port
53, but services running on workers are unable to talk to the CoreDNS
service running on the master nodes when using Calico v3.13.1 without
also opening udp port 53. This patch addresses this issue.
Task: 39347
Story: 2007256
Change-Id: Ied4196e6f1ddcb131492b48fb57ff0ba9063bbf4
The original design of the k8s cluster health status allowed
the health status to be updated by the Magnum control plane. However,
that doesn't work when the cluster is private. This patch supports
updating the k8s cluster health status via the Magnum cluster
update API from a 3rd-party service, so that a controller (e.g.
magnum-auto-healer) running inside the k8s cluster can call
the Magnum update API to update the cluster health status.
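For example, an external controller could patch the health status via
the cluster update API; a hedged sketch using python requests (the
endpoint, token handling and field values are assumptions):

    import requests

    patch = [
        {"op": "replace", "path": "/health_status",
         "value": "HEALTHY"},
        {"op": "replace", "path": "/health_status_reason",
         "value": {"api": "ok"}},  # illustrative reason payload
    ]
    requests.patch(
        "https://magnum.example.com/v1/clusters/<cluster-uuid>",
        json=patch,
        headers={"X-Auth-Token": "<token>"},
    )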
Task: 38583
Story: 2007242
Change-Id: Ie7189d328c4038403576b0324e7b0e8a9b305a5e
For backwards compatibility, support calico
v3.3.6 as well. The control flow is managed
in the heat templates.
Story: 2007256
task: 39280
Change-Id: Id61dbdaf09cde35fdd532e3fff216934c1ef4dff
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
The tags on quay.io/coreos/etcd follow the same format as the
upstream releases (https://github.com/etcd-io/etcd/releases), unlike
k8s.gcr.io, which modifies the canonical version tag by dropping the
"v" prefix (e.g. v3.4.6 becomes 3.4.6).
Story: 2007475
Task: 39184
Change-Id: If44eb55a68c13f8e1706242c099578ed1f264d62
Improve the taint of the master node kubelet to get the conformance
test to pass, and update the OCCM and Helm/Tiller tolerations
accordingly.
Task: 39223
Story: 2007256
Change-Id: Ief452e05ddf13a1d1ee77641311c3ae7abbe90f2
The default version of CoreDNS is now upgraded to 1.6.6, and
the CoreDNS pod can be scheduled on master nodes.
Task: 39209
Story: 2007256
Change-Id: Icc4aa1f61f3b3937e5d9cc35dbe01c63c18ba3cd
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.
Fix problems found.
Update local hacking checks for new flake8.
Remove hacking and friends from lower-constraints, those are not needed
for co-installing.
Change-Id: I926efaef501f190e78da9cab40c1e94203277258
Kubelet fails to handle the SELinux labelling of a Cinder PV unless
the rootfs is presented to it; as a result, an unprivileged container
lacks the ability to access the path.
With this patch, Kubelet handles the correct labelling automatically
when a Cinder PV is attached to a pod.
The default behaviour using system containers on Fedora Atomic is to
mount the rootfs [1], but we did not implement the same behaviour on
Fedora CoreOS, which was a mistake as this was a missing piece of
code.
[1] https://github.com/openstack/magnum/blob/master/dockerfiles/kubernetes-kubelet/config.json.template#L335
Story: 2007413
Task: 39129
Change-Id: Id59c604928244bf49773b7519fa756d5b2814b69
With I13aa0c58bf168bc069edf1d5c0187f89011fffdb, we missed updating
the default value of pods_network_cidr. As a result, there is a
mismatch between the calico_ipv4pool and the cidr configured in
kubernetes (kube-proxy and kube-controller-manager). The mismatch
will cause some connection issues between pods/nodes. This patch
fixes it.
Task: 39153
Story: 2007426
Change-Id: Ic560322f5009f28e7e72704508705c1572a9262d
At present, if floating_ip_enabled is true and master_lb_enabled is
also true but master_lb_fip_enabled is false, the cluster is still
resolved as accessible and therefore ends up with an UNHEALTHY status
when it should be UNKNOWN. This patch fixes this edge case and also
adds a unit test to capture the issue.
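The reachability rule, as a minimal sketch (names follow the message,
not the actual code):

    def api_accessible(floating_ip_enabled, master_lb_enabled,
                       master_lb_fip_enabled):
        # Behind a master LB, reachability hinges on the LB's
        # floating IP; without an LB, on the nodes' floating IPs.
        # The case above (True, True, False) now correctly returns
        # False, so the health status resolves to UNKNOWN.
        if master_lb_enabled:
            return master_lb_fip_enabled
        return floating_ip_enabled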
Story: 2007242
Task: 39140
Change-Id: I74f7455e3caa920032080747a315470878ba5500
At present, when a fixed_network is not specified, it is given the
name "private" by default. When multiple clusters are created, we end
up with multiple networks all sharing the same name. This PS intends
to make it easier to see which cluster the resources belong to by
using the cluster name.
Story: 2007460
Task: 39139
Change-Id: I7f8028b716f9a9eced17d85ca2e46e2b1e34875f
At present, the status reason resolves to:
default-master <reason> ,default-worker <reason>
It should be:
default-master <reason>, default-worker <reason>
This minor patch fixes this.
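The fix amounts to joining with ", " instead of " ," (an illustrative
snippet, not the actual code):

    reasons = ["default-master <reason>", "default-worker <reason>"]
    # buggy:  " ,".join(reasons)
    #   -> "default-master <reason> ,default-worker <reason>"
    # fixed:
    status_reason = ", ".join(reasons)
    #   -> "default-master <reason>, default-worker <reason>"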
Task: 39092
Story: 2007438
Change-Id: I3382da8d950279713861e14d97997d5a5205b1e7
The current default Calico IPv4 CIDR, 192.168.0.0/16, is too common
and has brought us some IP conflict troubles in production. This
patch proposes replacing it with a rarer CIDR range.
Task: 39052
Story: 2007426
Change-Id: I13aa0c58bf168bc069edf1d5c0187f89011fffdb
Set the max size for container/pod logs to 10m
and a maximum of 5 rotated files. The values mirror
the kubernetes defaults when it is using
a remote container runtime [0] (container-log-max-files
and container-log-max-size). These defaults cover the
case of containerd.
[0] https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
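For reference, a sketch of the resulting kubelet arguments (flag
names are from the kubelet docs [0]; the exact value syntax, e.g.
"10Mi", is an assumption):

    kubelet_log_args = [
        "--container-log-max-size=10Mi",  # ~10m cap per log file
        "--container-log-max-files=5",    # keep at most 5 rotations
    ]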
story: 2007402
task: 39031
Change-Id: Ie3106b40b4d1c6866761c507122047e88e513651
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
The upstream docs [0] were missing a rule
for the calico-node ClusterRole.
Without it we get:
2020-02-21 11:41:35.762 [ERROR][8]
...
User "system:serviceaccount:kube-system:calico-node"
cannot patch resource "nodes/status" in API group ""
at the cluster scope
[0] https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
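The missing permission, sketched as the equivalent RBAC rule written
as a Python dict (verbs and resources are inferred from the error
above; the upstream rule may carry more verbs):

    missing_rule = {
        "apiGroups": [""],              # core group, per the error
        "resources": ["nodes/status"],
        "verbs": ["patch"],
    }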
Needs to be backported to train.
story: 2005318
task: 39041
Change-Id: Ib7d3068ee53c08fea32a69c997b6de6477a17f0a
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>