Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.
Task: 39938
Story: 2007591
Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
- Refactor helm installer to use a single meta chart install job
install job and config which use Helm v3 client.
- Use upstream helm client binary instead of using helm-client container
maintained by us. To verify checksum, helm_client_sha256 label is
introduced for helm_client_tag (or alternatively for URL specified
using new helm_client_url label).
- Default helm_client_tag=v3.2.1.
- Default tiller_tag=v2.16.7, tiller_enabled=false.
Story: 2007514
Task: 39295
Change-Id: I9b9633c81afb08b91576a9a4d3c5a0c445e0cee4
In the heat-agent we use kubectl to install
several deployments, it is better if we use
matching versions of kubectl and apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.
story: 2007591
task: 39536
Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Heapster has been deprecated for a while and the new k8s dashboard
2.0.0 version supports metrics-server now. So it's time to upgrade
the default k8s dashboard to v2.0.0.
Task: 39101
Story: 2007256
Change-Id: I02f8cb77b472142f42ecc59a339555e60f5f38d0
Until now we had only the output of the Fedora
CoreOS Configuration Transpiler. Add a yaml
that can transpile it to an ignition file.
The current ignition file was generate with
version v0.4.0:
podman run --rm -v ./fcct-config.yaml:/config.fcc:z \
quay.io/coreos/fcct:v0.4.0 \
--pretty --strict --input /config.fcc > ./user_data.json
story: 2005201
task: 39027
Change-Id: I5cb78aa625c926e101424c04573002d05ac82a59
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
A new config option `post_install_manifest_url` is added to support
installing cloud provider/vendor specific manifest after booted
the k8s cluster. It's an URL pointing to the manifest file. For
example, cloud admin can set their specific storageclass into
this file, then it will be automatically setup after created
the cluster.
Task: 35798
Story: 2006209
Change-Id: Ib5a2c5cd7970085db941f189613e175f622aea3f
Add support for out of tree Cinder CSI. This is installed when the
cinder_csi_enabled=true label is added. This will allow us to eventually
deprecate in-tree Cinder.
story: 2007048
task: 37868
Change-Id: I8305b9f8c9c37518ec39198693adb6f18542bf2e
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
For a multi AZ env, if Nova doesn't support cross AZ volume mount,
then the cluster creation may fail because of block device mapping
error. The patch fixes this issue by passing in the AZ information
when creating volumes for etcd, docker and the node root disk.
Task: 38131
Story: 2007097
Change-Id: I39c99259abc84cbbee50ac1a827e9349ede6593c
IPIP Mode to use for the IPv4 POOL created at start up
allowed_values: ["Always", "CrossSubnet", "Never", "Off"]
default: "Off"
Change-Id: Ib834a1f86a6db408047cc8f86fc7744d16d83904
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
Given we're using public container registry as the default registry,
so it would be nice to have a verification for the image's digest.
Kubernetes already supports that so user can just use format like
@sha256:xxx for those addons' tags. This patch introduces the support
for hyperkube based on podman and fedora coreos driver.
Task: 37776
Story: 2007001
Change-Id: I970c1b91254d2a375192420a9169f3a629c56ce7
Due to the big changes recently to support k8s rolling upgrade, a
regression issue was introduced which is broken the proxy function
for image downloading. This patch fixes it for both fedor atomic
driver and fedora coreos driver.
Task: 37784
Story: 2007005
Change-Id: I11113d69629e1a97a58e5270f67c7404292b45c3
Magnum allows to use CONTAINER_INFRA_PREFIX to specify a local
repository from which we can pull container images. This repository
defaults to the upstream one that is specified in the metrics helm
chart.
* This patch allows for the usage of CONTAINER_INFRA_PREFIX to
correctly configure the pull of the metric-server container image
from the specified repo.
* Add label metrics_server_chart_tag to allow user to specify
stable/metrics-server chart tag to use
* Add label metrics_server_enabled to allow enable/disable of
component (defaults: true)
Story: 2004816
Task: 37390
Change-Id: Idc315937a82317b76349bbe8466d900d00194953
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
This will install the prometheus-adapter stable
helm chart. Requires monitoring_enabled=true.
The chart version can be configured using
prometheus_adapter_chart_tag and an option is
available to overwrite the default configuration
rules for a user defined ConfigMap referenced
by using prometheus_adapter_configmap label.
story: 2006765
task: 37278
Change-Id: I5b86f4455f88c8dbeac6e56942e1ca55f1d1726c
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
Additioanlly, bumping up the Chart version to 1.24.7 without which the
ingress controller fails to deploy on 1.16.x.
Additionally, bump up nginx_ingress_controller_tag version to 0.26.1.
This is to ensure that we are running an up to date nginx ingress
controller with fixes for known CVEs.
Story: 2006853
Task: 37444
Change-Id: Ibf045a06d19b02095e19d9a21d14a91a39a3751c
In the fedora coreos driver we should take the heat-agent
image from the parameters provided in the templates.
Change-Id: I48081b57192738b00fe14317d2488e658020a0ea
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Choose whether system containers etcd, kubernetes and the heat-agent will be
installed with podman or atomic. This label is relevant for k8s_fedora drivers.
k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used
pulling containers from docker.io/openstackmagnum. use_podman=true is accepted
as well, which will pull containers by k8s.gcr.io.
k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.
Fix upgrade for k8s_fedora_coreos_v1 and magnum-cordon systemd unit.
Task: 37242
Story: 2005201
Change-Id: I0d5e4e059cd4f0458746df7c09d2fd47c389c6a0
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
With this change each node will be labeled with the following:
* --node-labels=magnum.openstack.org/role=${NODEGROUP_ROLE}
* --node-labels=magnum.openstack.org/nodegroup=${NODEGROUP_NAME}
Change-Id: Ic410a059b19a1252cdf6eed786964c5c7b03d01c
Add fedora coreos driver. To deploy clusters with fedora coreos operators
or users need to add os_distro=fedora-coreos to the image. The scripts
to deploy kubernetes on top are the same with fedora atomic. Note that
this driver has selinux enabled.
The startup of the heat-container-agent uses a workaround to copy the
SoftwareDeployment credentials to /var/lib/cloud/data/cfn-init-data.
The fedora coreos driver requires heat train to support ignition.
Task: 29968
Story: 2005201
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Change-Id: Iffcaa68d385b1b829b577ebce2df465073dfb5a1
Resource creation order in kubernetes templates for Fedora Atomic
was changed to avoid neutron bug https://bugs.launchpad.net/neutron/+bug/1845360
Floating IP should be assigned to network port after instance creation
Change-Id: Ib7e0503d475d7cd3164a116c3a0325c4ae417a0a
Story: 2006631
Task: 36844
Support boot from volume for Kubernetes all nodes (master and worker)
so that user can create a big size root volume, which could be more
flexible than using docker_volume_size. And user can specify the
volume type so that user can leverage high performance storage, e.g.
NVMe etc.
And a new label etcd_volme_type is added as well so that user can
set volume type for etcd volume.
If the boot_volume_type or etcd_volume_type are not passed by labels,
Magnum will try to read them from config option
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.
Task: 30374
Story: 2005386
Co-Authorized-By: Feilong Wang<flwang@catalyst.net.nz>
Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
Sometimes, the fixed_network value gets rendered as UUID. However OCCM's
internal-network-name requires the network name, it does not support
UUID. This patch introduces a new parameter called fixed_network_name
which converts fixed_network UUID to name if it is UUID-like.
Story: 2005333
Task: 36313
Change-Id: I3453bc0dbea285687d39c9782685cb1f2a3ecd39
We kept introspecting the name of the instance with the assumption
that the network always existed under .novalocal
This is not always the case, with certain variables changed inside
Neutron it is possible to control this, therefore, leading in failing
deploys.
With this change, we pass the instance name directly to the cluster
and therefore we always have the accurate name.
Task: 36160
Story: 2006371
Change-Id: I2ba32844b822ffc14da043e6ef7d071bb62a22ee
When there is more than one NIC attached to an instance, openstack cloud
provider returns a random InternalIP back to the host resulting in instability
with API server which only talks to a default interface.
This patch incorporates the changes made in
https://github.com/kubernetes/cloud-provider-openstack/pull/444 which enables
OpenStack Cloud Controller Manager (OCCM) to respect the
`internal-network-name` in cloud-config file which ensures that InternalIP
remains stable.
Uses a separate cloud-config file for OCCM to ensure in-tree Cinder volumes
remain compatible.
Change-Id: Idfa52ed2d512e7dc383a556371e896205dd542f9
Story: 2005333
Task: 30271
* prometheus-operator chart version upgraded from 0.1.31. to 5.12.3
* Fix an issue where when using Feature Gate Priority the scheduler
would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics fail intermitently or completly fail
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (Rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Graphana dashboard not showing data.
Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Rolling ugprade is an important feature for a managed k8s service,
at this stage, two user cases will be covered:
1. Upgrade base operating system
2. Upgrade k8s version
Known limitation: When doing operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
The current magnum traefik deployment will always pull latest traefik
container image. With the new launch of traefik v2
(https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305) this will
have impact on how the ingress is described in k8s.
This patch:
* Sets the traefik version to default tag v1.7.9, stable release
prior to v2.
* Adds a new label <traefik_ingress_controller_tag> to enable user
to specify other than default traefik release.
Task: 30143
Task: 30146
Story: 2005286
Change-Id: I031a594f7b6014d88df055664afcf51b1cd2cd94
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for K8s cluster, user can use a new label
"auto_healing_enabled' to turn on/off it.
Meanwhile, a new label "auto_scaling_enabled" is also introduced
to enable the capability to let the k8s cluster auto scale based
its workload.
Task: 28923
Story: 2004782
Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
Add an nginx based Ingress controller for Kubernetes.
The use case is to provide better support use cases which require either
L4 access or SSL passthrough, which lack proper support in Traefik.
Selection is done via the same label 'ingress_controller' with value
'nginx'. Deployment relies on the upstream nginx-ingress helm chart.
Change-Id: I1db2074fce9d43c03f479a6aaeb4f238d7101555
Story: 2005327
Task: 30255
When there is more than one NIC attached to an instance, openstack cloud
provider returns a random InternalIP back to the host resulting in instability
with API server which only talks to a default interface.
This patch incorporates the changes made in
https://github.com/kubernetes/cloud-provider-openstack/pull/444 which enables
OpenStack Cloud Controller Manager to respect the `internal-network-name` in
cloud-config file which ensures that InternalIP remains stable.
Story: 2005333
Task: 30271
Change-Id: I9e3ad459dd05753b53cb4ce75ee3aed649fef196
The Kubernetes Helm repository includes in its stable distribution
a prometheus-operator Chart.
This stable/prometheus-operator chart can be used to install all the
dependencies and some default configurations to use prometheus.
The installed extra charts are:
* stable/prometheus-node-exporter (data scraping)
* stable/prometheus (prometheus and alertmanager server)
* stable/grafana (visualization dashboard)
* stable/prometheus-operator (supervision and simple configuration)
The prometheus-operator is installed by using the label
monitoring_enabled=True. Also, the label grafana_admin_passwd can be
used to set the admin password for access to the grafana dashboard
This patch allows for transferral of prometheus monitoring maintenance
work to be done by the kubernetes/helm team.
Task: 28544
Story: 2004623
depends_on: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I80d590785bf30f9d634debeaf51c0d4cce0aeb93
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Deploying Node Problem Detector to all nodes to detect problems which
can be leverage by auto healing. This is the first step of enabling
the auto healing feature.
Task: 29886
Story: 2004782
Change-Id: I1b6075025c5f369821b4136783e68b16535dc6ef
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.
This patch drops to relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.
Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
Add enable_tiller label to install tiller in k8s_fedora_atomic
clusters. Defaults to false.
Add tiller_tag label to select the version of tiller. If the
tag is not set the tag that matches the helm client version in
the heat-agent will be picked. The tiller image can be stored
in a private registry and the cluster can pull it using the
container_infra_prefix label.
Install tiller securely using helper container.
TODO:
*add instructions on how RBAC is designed
https://docs.helm.sh/using_helm/#example-deploy-tiller-in-a-namespace-restricted-to-deploying-resources-in-another-namespace
* add docs on how to install addon in the cluster using this tiller
* how users can get the creds to talk to tiller
NOTE:
The main goal of this tiller is internal usage!
Users can still deploy other tillers in other namespaces.
story: 2003902
task: 26780
Change-Id: I99d3a78085ba10030200f12bbfe58a72964e2326
Signed-off-by: dioguerra <dy090.guerra@gmail.com>
- Add "octavia" as one of the "ingress_controller" options.
- Add label "octavia_ingress_controller_tag".
- Use external network ID in the heat templates.
Story: 2004838
Change-Id: I7d889a054cd5feb2eeef523b20607a6c7630d777
Now cloud-provider-openstack of Kubernetes has a webhook to support
Keystone authorization and authentication. With this feature, user
can use a new label 'keystone-auth-enabled' to enable the keystone
authN and authZ.
DocImpact
Task: 21637
Story: 1755770
Change-Id: I3d21ad8f55c0d7308a302f62db9e9af147a604f8
* Use the external cloud-provider [0]
* Label master nodes
* Make the script the deploys the cloud-provider and clusterroles
for the apiserver a SoftwareDeployment
* Rename kube_openstack_config to cloud-config,
for cinder to workm the kubelet expects the cloud config name only
like this. Keep a copy of kube_openstack_config for backwards
compatibility.
Change-Id: Ife5558f1db4e581b64cc4a8ffead151f7b405702
Task: 22361
Story: 2002652
Co-Authored-By: Spyros Trigazis <spyridon.trigazis@cern.ch>
- Start workers as soon as the master VM is created, rather than
waiting all the services ready.
- Move all the SoftwareDeployment outside of kubemaster stack.
- Tweak the scripts in SoftwareDeployment so that they can be combined
into a single script.
Story: 2004573
Task: 28347
Change-Id: Ie48861253615c8f60b34a2c1e9ad6b91d3ae685e
Co-Authored-By: Lingxian Kong <anlin.kong@gmail.com>