The derived cloud_provider_enabled is placed inside extra_params so that
openstack-cloud-controller-manager gets applied correctly. This required
change was unfortunately missed in https://review.opendev.org/681922.
Additionally improve the docs related to cloud_provider_enabled label.
Story: 2006531
Task: 36740
Change-Id: I4a89d25b467edd2c4be608c37055706e4e62d78b
Support boot from volume for all Kubernetes nodes (master and worker),
so that users can create a large root volume, which can be more
flexible than using docker_volume_size. Users can also specify the
volume type so that they can leverage high-performance storage, e.g.
NVMe.
A new label etcd_volume_type is added as well so that users can
set the volume type for the etcd volume.
If boot_volume_type or etcd_volume_type is not passed as a label,
Magnum will try to read it from the config options
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.
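For example (cluster and template names, and the "nvme" volume type,
are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels boot_volume_type=nvme,etcd_volume_type=nvme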
Task: 30374
Story: 2005386
Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>
Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
To move to 1.15.x and beyond we need PodSecurityPolicies (PSPs) for
privileged pods; flannel, calico and node-problem-detector need them.
PSP
story: 2006515
task: 36513
Allow-priv
story: 2006252
task: 35867
Change-Id: I306a249afb275fdbd71354ed75043ffc4d466304
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Flannel recommends using vxlan[1]; udp is only for debugging or for
kernels that don't support vxlan or host-gw. So this patch proposes
'vxlan' as the default value of the label 'flannel_backend', and the
change has been verified with sonobuoy.
[1] https://github.com/coreos/flannel/blob/master/Documentation/backends.md
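For example (cluster and template names are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels flannel_backend=vxlan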
Task: 36425
Story: 2006482
Change-Id: Ibe7f3446be894c593c6147186cc159bd01834d29
Sometimes, the fixed_network value gets rendered as a UUID. However,
OCCM's internal-network-name requires the network name; it does not
support a UUID. This patch introduces a new parameter called
fixed_network_name, which converts the fixed_network UUID to a name
if the value is UUID-like.
Story: 2005333
Task: 36313
Change-Id: I3453bc0dbea285687d39c9782685cb1f2a3ecd39
Following the tagging policy for heat-container-agent, the tag is now
updated to train-dev; as soon as we release Train, it will be updated
to train-stable.
Change-Id: Iec43df292dbd6a7e7ee33a0d4b8670b653a7ebbd
A default value for keystone_auth_default_policy is needed when the
label keystone_auth_enabled=false is used at k8s cluster creation
time; otherwise creation fails because the default value is missing.
This patch fixes that.
Story: 2005915
Task: 34175
Change-Id: I465725ecd55bf7c4dccaa75a8cd23c59a5be8db0
* prometheus-operator chart version upgraded from 0.1.31 to 5.12.3
* Fix an issue where, when using the Priority feature gate, the
scheduler would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics fail intermittently or completely
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (Rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart version to use (see the
example after this list)
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Grafana dashboard not showing data.
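For example, pinning the chart version mentioned above (cluster and
template names are illustrative):
  $ openstack coe cluster create mon-cluster \
      --cluster-template k8s-template \
      --labels monitoring_enabled=true,prometheus_operator_chart_tag=5.12.3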
Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
The previous patch I7ac0cffcdc8712503e2ea584b12d28ed3a7748b7
seems to have missed a few other variables, so we're still
deploying 1.11 clusters by default.
Change-Id: Ia443e3a129418048270487bf46af2cff488731f3
story: 2005380
task: 30362
Minion is not a good name for a k8s worker node anymore; it has now
been replaced with 'node' to align with k8s terminology. So the server
name of a worker will be something like `k8s-1-lnveovyzpreg-node-0`
instead of `k8s-1-lnveovyzpreg-minion-0`.
Task: 31008
Story: 2005689
Change-Id: Ie9a68b18658e94b6ebe76ebeae8becc23714380d
With the new config option `keystone_auth_default_policy`, the cloud
admin can set a default keystone auth policy for k8s clusters when
keystone auth is enabled. As a result, users can use their current
keystone user to access a k8s cluster as long as they're assigned the
correct roles, and they will get the pre-defined permissions set by
the cloud provider.
The default policy is now based on the v2 format recently introduced
in k8s-keystone-auth, which is more capable. For example, v1 doesn't
support a policy that lets a user access resources in all namespaces
except kube-system, but v2 does.
NOTE: For now we're using the openstackmagnum dockerhub repo, until
the CPO team fixes their image release issue.
Task: 30069
Story: 1755770
Change-Id: I2425e957bd99edc92482b6f11ca0b1f91fe59ff6
Rolling upgrade is an important feature for a managed k8s service;
at this stage, two use cases will be covered:
1. Upgrade base operating system
2. Upgrade k8s version
Known limitation: when doing an operating system upgrade, there is no
chance to call kubectl drain to evict the pods on that node.
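A sketch of the flow, assuming the `openstack coe cluster upgrade`
verb that accompanies this feature (cluster and template names are
illustrative):
  $ openstack coe cluster upgrade my-cluster new-k8s-template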
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
The current magnum traefik deployment always pulls the latest traefik
container image. With the new launch of traefik v2
(https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305) this will
have impact on how the ingress is described in k8s.
This patch:
* Sets the traefik version to the default tag v1.7.9, the stable
release prior to v2.
* Adds a new label <traefik_ingress_controller_tag> to enable users
to specify a traefik release other than the default (see the example
below).
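For example (cluster and template names are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels ingress_controller=traefik,traefik_ingress_controller_tag=v1.7.9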
Task: 30143
Task: 30146
Story: 2005286
Change-Id: I031a594f7b6014d88df055664afcf51b1cd2cd94
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Using Node Problem Detector, Draino and AutoScaler to support auto
healing for k8s clusters; users can use a new label
'auto_healing_enabled' to turn it on/off.
Meanwhile, a new label 'auto_scaling_enabled' is also introduced to
enable the capability to let the k8s cluster auto scale based on its
workload.
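For example (cluster and template names are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels auto_healing_enabled=true,auto_scaling_enabled=true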
Task: 28923
Story: 2004782
Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
When using calico as the network driver, the traffic between k8s
worker nodes needs to be allowed; otherwise services may sometimes be
inaccessible because connections can't be established.
This issue only impacts calico.
Task: 30525
Story: 2005294
Change-Id: Ia71283a1abc75a7fb806f2601ac09a685dc5a4bc
Add an nginx-based Ingress controller for Kubernetes.
The use case is to better support applications which require either
L4 access or SSL passthrough, which lack proper support in Traefik.
Selection is done via the same label 'ingress_controller' with value
'nginx'. Deployment relies on the upstream nginx-ingress helm chart.
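For example (cluster and template names are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels ingress_controller=nginx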
Change-Id: I1db2074fce9d43c03f479a6aaeb4f238d7101555
Story: 2005327
Task: 30255
Use a comma-delimited IPv4 address list to specify multiple DNS
servers, e.g. "8.8.8.8,114.114.114.114".
Task: 29465
Story: 2004994
Change-Id: I031247b0cc2ae417f18b2a5b9b3832e78ed9dafd
In the Rocky release, the k8s workers security group was wide open,
but in the Stein release it is more restrictive, which prevents
access to the Kubernetes dashboard (and other services) via the
command:
$ kubectl proxy
This patch fixes it by allowing traffic from the master security
group to the workers security group.
Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>
Task: 30171
Story: 2005294
Change-Id: I546cd7324b87b267e945477c78539ea80534538f
The Kubernetes Helm repository includes in its stable distribution
a prometheus-operator Chart.
This stable/prometheus-operator chart can be used to install all the
dependencies and some default configurations to use prometheus.
The installed extra charts are:
* stable/prometheus-node-exporter (data scraping)
* stable/prometheus (prometheus and alertmanager server)
* stable/grafana (visualization dashboard)
* stable/prometheus-operator (supervision and simple configuration)
The prometheus-operator is installed by using the label
monitoring_enabled=True. Also, the label grafana_admin_passwd can be
used to set the admin password for access to the grafana dashboard
(see the example below).
This patch allows the prometheus monitoring maintenance work to be
transferred to the kubernetes/helm team.
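For example (cluster and template names, and the password, are
illustrative):
  $ openstack coe cluster create mon-cluster \
      --cluster-template k8s-template \
      --labels monitoring_enabled=true,grafana_admin_passwd=s3cret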
Task: 28544
Story: 2004623
Depends-On: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I80d590785bf30f9d634debeaf51c0d4cce0aeb93
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
- Never allocate floating IP for etcd service.
- Introduce a new label `master_lb_floating_ip_enabled` which controls
whether Magnum allocates a floating IP for the master load balancer.
This label only takes effect when `master_lb_enabled` is set. The
default value is the same as `floating_ip_enabled` (see the example
below).
- The `floating_ip_enabled` property now only controls whether Magnum
should allocate floating IPs for the master and worker nodes.
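For example, a cluster behind a master LB with no floating IP on the
LB (template name, image and external network are illustrative):
  $ openstack coe cluster template create lb-template \
      --coe kubernetes \
      --image fedora-atomic-latest \
      --external-network public \
      --master-lb-enabled \
      --labels master_lb_floating_ip_enabled=false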
Change-Id: I0a232406deaf112b0cb9e445735d7b49206c676d
Story: #2005153
Task: #29868
Deploying Node Problem Detector to all nodes to detect problems which
can be leveraged by auto healing. This is the first step of enabling
the auto healing feature.
Task: 29886
Story: 2004782
Change-Id: I1b6075025c5f369821b4136783e68b16535dc6ef
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.
This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.
Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
Defines more strict security group rules for kubernetes worker nodes.
The ports that are open by default: the default port range
(30000-32767) for external service ports; the kubelet healthcheck
port; Calico BGP network ports; flannel overlay network ports. The
cluster admin should manually configure the security group on the
nodes where Traefik is allowed.
Story: #2005082
Task: #29661
Change-Id: Idbc67cb95133d3a4029105e6d4dc92519c816288
Now Magnum only has one server group for all master and worker nodes
per cluster, which is not very flexible for small cloud deployments.
For clusters with 3+ masters, the capacity is easily reached when
using a hard anti-affinity policy. This patch proposes one server
group for each master and worker node group to give better
flexibility.
story: 2004195
Change-Id: If11ba863a2aa538efe1e3e850084bdd33afd27d2
* Add a folder specific to helm-managed resources
* Add the first use case of the helm install script
* Install metrics-server with helm (in parallel to heapster to allow
backwards compatibility)
* Add extra ARGS to kube-apiserver to enable communication with
metrics-server
Known Issues:
* The Tiller pod is sometimes shown as not active, possibly due to
Heartbeat/Healthz
story: 2004816
task: 28980
Depends-On: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I1b2432bc09ccde02e43124ed010120b99d853d65
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Add enable_tiller label to install tiller in k8s_fedora_atomic
clusters. Defaults to false.
Add a tiller_tag label to select the version of tiller. If the
tag is not set, the tag that matches the helm client version in
the heat-agent will be picked. The tiller image can be stored
in a private registry and the cluster can pull it using the
container_infra_prefix label (see the example below).
Install tiller securely using a helper container.
TODO:
* add instructions on how RBAC is designed
https://docs.helm.sh/using_helm/#example-deploy-tiller-in-a-namespace-restricted-to-deploying-resources-in-another-namespace
* add docs on how to install addon in the cluster using this tiller
* how users can get the creds to talk to tiller
NOTE:
The main goal of this tiller is internal usage!
Users can still deploy other tillers in other namespaces.
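For example (cluster and template names are illustrative, and the
tiller_tag value is hypothetical):
  $ openstack coe cluster create helm-cluster \
      --cluster-template k8s-template \
      --labels enable_tiller=true,tiller_tag=v2.12.3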
story: 2003902
task: 26780
Change-Id: I99d3a78085ba10030200f12bbfe58a72964e2326
Signed-off-by: dioguerra <dy090.guerra@gmail.com>
- Add "octavia" as one of the "ingress_controller" options.
- Add label "octavia_ingress_controller_tag".
- Use external network ID in the heat templates.
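For example (cluster and template names are illustrative, and the
octavia_ingress_controller_tag value is hypothetical):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels ingress_controller=octavia,octavia_ingress_controller_tag=v1.14.0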
Story: 2004838
Change-Id: I7d889a054cd5feb2eeef523b20607a6c7630d777
Also update the default value of keystone_auth_enabled from False
to True, to promote the integration of OpenStack and K8s.
Change-Id: I0fcea762d467e1afeecb175f65b9b13ad9ee1f71
There are 2 changes included in this patch:
1. Use a cluster IP instead of a fixed IP for the grafana service to
make sure the address is reachable.
2. Move the node exporter to the prometheus-monitoring namespace and
make it a DaemonSet to collect metrics from the master node.
Task: 28468
Story: 2004590
Change-Id: I9090c6dc4b38e1a1466c4c3a6a827d95c089fb41
Now cloud-provider-openstack of Kubernetes has a webhook to support
Keystone authorization and authentication. With this feature, users
can use a new label 'keystone_auth_enabled' to enable keystone
authN and authZ.
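For example (cluster and template names are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels keystone_auth_enabled=true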
DocImpact
Task: 21637
Story: 1755770
Change-Id: I3d21ad8f55c0d7308a302f62db9e9af147a604f8
* Use the external cloud-provider [0]
* Label master nodes
* Make the script that deploys the cloud-provider and clusterroles
for the apiserver a SoftwareDeployment
* Rename kube_openstack_config to cloud-config;
for cinder to work, the kubelet expects the cloud config name to be
exactly this. Keep a copy of kube_openstack_config for backwards
compatibility.
Change-Id: Ife5558f1db4e581b64cc4a8ffead151f7b405702
Task: 22361
Story: 2002652
Co-Authored-By: Spyros Trigazis <spyridon.trigazis@cern.ch>
- Start workers as soon as the master VM is created, rather than
waiting for all the services to be ready.
- Move all the SoftwareDeployment outside of kubemaster stack.
- Tweak the scripts in SoftwareDeployment so that they can be combined
into a single script.
Story: 2004573
Task: 28347
Change-Id: Ie48861253615c8f60b34a2c1e9ad6b91d3ae685e
Co-Authored-By: Lingxian Kong <anlin.kong@gmail.com>
A user may not rely on nova keypairs to access their cluster, e.g.
when using a preconfigured SSSD.
story: 2004402
task: 28035
Change-Id: I77fbdc174d3dddfd312fb8dac20516314d4c182e
To upgrade a cluster we need to be able to set image tags, so this
change adds two labels for the corresponding containers.
Task: 23314
Story: 2003171
Change-Id: I4cd0270a69fb889c59bdb28966821adb11fd0292
This is part of the fixes for k8s v1.11.1 that we've been doing
recently. When testing k8s v1.11.1, we found some small but annoying
issues:
1. The systemd cgroup-driver does not work well with Fedora Atomic,
so we're going to use cgroupfs as the default cgroup-driver.
2. The $ character needs to be escaped in wc-notify-master.sh.
Task: 23223
Story: 2003103
Change-Id: I995f5b82abadfdb7f78f7c098ac7a7f1e5c34fd3
Add 'cloud_provider_enabled' label for the k8s_fedora_atomic
driver. Defaults to true. For specific kubernetes versions, if
'cinder' is selected as the 'volume_driver', it is implied that
the cloud provider will be enabled since they are combined.
The motivation for this change is that in environments with
high load to the OpenStack APIs, users might want to disable
the cloud provider.
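For example, disabling the cloud provider on a heavily loaded cloud
(cluster and template names are illustrative):
  $ openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels cloud_provider_enabled=false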
story: 1775358
task: 1775358
Change-Id: I2920f699654af1f4ba45644ab60a04a3f70918fe
Kubernetes should initialize its Global configuration for the OpenStack
provider with the region specified in the Heat stack.
This will allow users to create Magnum Kubernetes clusters in a
multi-regional OpenStack installation with different public endpoints
for services.
Task: 22576
Story: 2002728
Change-Id: I66820369b889e16445cad7a48cd0f458aae1c41f
The service account private key needs to be hidden for safety reasons.
This patch fixes it.
Task: 22886
Story: 1766546
Change-Id: I16929c6de2c442865b2978ae6c18c22b8146f645
Multi master deployments for k8s driver use different service account
keys for each api/controller manager server which leads to 401 errors
for service accounts. This patch creates a signed cert and private
key for the k8s service account keys explicitly, dedicated to the
k8s cluster, to avoid the inconsistent keys issue.
Task: 21653
Story: 1766546
Change-Id: I61547405f866d3c5a84da63de66724b55af1066a