The derived cloud_provider_enabled is placed inside extra_params so that
openstack-cloud-controller-manager gets applied correctly. This required
change was unfortunately missed in https://review.opendev.org/681922.
Additionally, improve the docs related to the cloud_provider_enabled label.
Story: 2006531
Task: 36740
Change-Id: I4a89d25b467edd2c4be608c37055706e4e62d78b
Support boot from volume for all Kubernetes nodes (master and worker)
so that users can create a large root volume, which can be more
flexible than using docker_volume_size. Users can also specify the
volume type so that they can leverage high performance storage, e.g.
NVMe.
A new label etcd_volume_type is added as well so that users can
set the volume type for the etcd volume.
If boot_volume_type or etcd_volume_type is not passed via labels,
Magnum will try to read it from the config options
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.
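For example (the size and type values here are illustrative, not
defaults):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels boot_volume_size=100,boot_volume_type=nvme,etcd_volume_type=nvme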
Task: 30374
Story: 2005386
Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>
Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
At the moment, cluster deployment fails when cluster_user_trust=False,
because the entire SoftwareDeployment exits rather than just the single
script fragment that needs the trust. This patch fixes this by making
the remainder of the script conditional on whether TRUST_ID is defined.
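A minimal sketch of the guard, with configure_trust_dependent_services
as a hypothetical placeholder for the remainder of the fragment:

    if [ -n "$TRUST_ID" ]; then
        # only the trust-dependent steps run; the SoftwareDeployment
        # as a whole still completes successfully
        configure_trust_dependent_services
    fi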
Finally, default `cloud_provider_enabled` to false when
`cluster_user_trust` is false, and raise an error if
`cloud_provider_enabled` is overridden to true while
`cluster_user_trust` is false. This ensures that the minion kubelet is
correctly configured.
Change-Id: Ibd9270c87bfa5d2f490e2e226e33ca56696d9e81
Story: 2006531
Task: 36587
The size of the etcd volume should be taken from the cluster and not
the cluster template.
story: 2005143
Change-Id: I4cdbb436558fba90adec717e228e2970be509b87
When using a public cluster template, users still need the capability
to reuse their existing network/subnet, and they also need to be
able to turn on/off the floating IP to override the setting in the
public template. This patch supports that by adding those three
items as parameters when creating a cluster.
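For example (network and subnet names are illustrative; the flags
assume a recent python-magnumclient):

    openstack coe cluster create my-cluster \
      --cluster-template public-template \
      --fixed-network my-net --fixed-subnet my-subnet \
      --floating-ip-disabled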
Story: 2006208
Task: 35797
Change-Id: I11579ff6b83d133c71c2cbf49ee4b20996dfb918
* prometheus-operator chart version upgraded from 0.1.31 to 5.12.3
* Fix an issue where, when using the Priority feature gate, the
  scheduler would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
  metrics fail intermittently or completely fail
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (Rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Grafana dashboard not showing data.
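For illustration, the chart version can be pinned via labels (the tag
shown is the one referenced above):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels monitoring_enabled=true,prometheus_operator_chart_tag=5.12.3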
Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
With the new config option `keystone_auth_default_policy`, cloud admins
can set a default keystone auth policy for k8s clusters when the
keystone auth is enabled. As a result, users can use their current
keystone credentials to access the k8s cluster as long as they're
assigned the correct roles, and they will get the pre-defined
permissions set by the cloud provider.
The default policy is now based on the v2 format recently introduced
in k8s-keystone-auth, which is more expressive. For example, v1 cannot
express a policy that lets a user access resources in all namespaces
except kube-system, while v2 can.
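A minimal sketch of a v2 policy file (the path, project and role names
are illustrative; see the k8s-keystone-auth documentation for the
authoritative format):

    cat > /etc/magnum/keystone_auth_default_policy.json <<'EOF'
    [
      {
        "users": {"projects": ["demo"], "roles": ["member"]},
        "resource_permissions": {"*/pods": ["get", "list", "watch"]}
      }
    ]
    EOF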
NOTE: For now we're using the openstackmagnum dockerhub repo until the
CPO team fixes their image release issue.
Task: 30069
Story: 1755770
Change-Id: I2425e957bd99edc92482b6f11ca0b1f91fe59ff6
Rolling upgrade is an important feature for a managed k8s service;
at this stage, two use cases will be covered:
1. Upgrade base operating system
2. Upgrade k8s version
Known limitation: When doing an operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.
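For example, a cluster can then be upgraded to a new template that
carries the target versions (command shape assumes a recent
python-magnumclient):

    openstack coe cluster upgrade my-cluster new-cluster-template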
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
The current magnum traefik deployment will always pull latest traefik
container image. With the new launch of traefik v2
(https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305) this will
have impact on how the ingress is described in k8s.
This patch:
* Sets the traefik version to default tag v1.7.9, the stable release
  prior to v2.
* Adds a new label <traefik_ingress_controller_tag> to enable users
  to specify a traefik release other than the default.
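For example:

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels ingress_controller=traefik,traefik_ingress_controller_tag=v1.7.9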
Task: 30143
Task: 30146
Story: 2005286
Change-Id: I031a594f7b6014d88df055664afcf51b1cd2cd94
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for k8s clusters, users can use a new label
"auto_healing_enabled" to turn it on/off.
Meanwhile, a new label "auto_scaling_enabled" is also introduced
to enable the capability to let the k8s cluster auto scale based
on its workload.
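For example:

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels auto_healing_enabled=true,auto_scaling_enabled=true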
Task: 28923
Story: 2004782
Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
Add an nginx based Ingress controller for Kubernetes.
The use case is to better support use cases which require either
L4 access or SSL passthrough, which lack proper support in Traefik.
Selection is done via the same label 'ingress_controller' with value
'nginx'. Deployment relies on the upstream nginx-ingress helm chart.
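For example:

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels ingress_controller=nginx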
Change-Id: I1db2074fce9d43c03f479a6aaeb4f238d7101555
Story: 2005327
Task: 30255
The existing drivers are adapted to get node_count and master_count
information from the cluster's nodegroups. At the same time the
output mappings were updated to reflect the changes in the stack to
the nodegroups.
story: 2005266
Change-Id: I725413e77f5a7bdb48131e8a10e5dc884b5e066a
The Kubernetes Helm repository includes in its stable distribution
a prometheus-operator Chart.
This stable/prometheus-operator chart can be used to install all the
dependencies and some default configurations to use prometheus.
The installed extra charts are:
* stable/prometheus-node-exporter (data scraping)
* stable/prometheus (prometheus and alertmanager server)
* stable/grafana (visualization dashboard)
* stable/prometheus-operator (supervision and simple configuration)
The prometheus-operator is installed by using the label
monitoring_enabled=True. Also, the label grafana_admin_passwd can be
used to set the admin password for access to the grafana dashboard.
This patch allows the prometheus monitoring maintenance work to be
handed over to the kubernetes/helm team.
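For example (the password value is illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels monitoring_enabled=true,grafana_admin_passwd=s3cret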
Task: 28544
Story: 2004623
Depends-On: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I80d590785bf30f9d634debeaf51c0d4cce0aeb93
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
- Never allocate floating IP for etcd service.
- Introduce a new label `master_lb_floating_ip_enabled` which controls
  whether Magnum allocates a floating IP for the master load balancer.
  This label only takes effect when `master_lb_enabled` is set. The
  default value is the same as `floating_ip_enabled`.
- The `floating_ip_enabled` property now only controls whether Magnum
  should allocate the floating IPs for the master and worker nodes.
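For example, a load-balanced control plane reachable via floating IP
while the nodes stay on the private network (image and network names
are illustrative):

    openstack coe cluster template create lb-template \
      --coe kubernetes --image fedora-atomic-27 \
      --external-network public \
      --master-lb-enabled --floating-ip-disabled \
      --labels master_lb_floating_ip_enabled=true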
Change-Id: I0a232406deaf112b0cb9e445735d7b49206c676d
Story: #2005153
Task: #29868
Deploying Node Problem Detector to all nodes to detect problems which
can be leveraged by auto healing. This is the first step of enabling
the auto healing feature.
Task: 29886
Story: 2004782
Change-Id: I1b6075025c5f369821b4136783e68b16535dc6ef
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.
This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.
Additional options are passed to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
Add enable_tiller label to install tiller in k8s_fedora_atomic
clusters. Defaults to false.
Add tiller_tag label to select the version of tiller. If the
tag is not set, the tag that matches the helm client version in
the heat-agent will be picked. The tiller image can be stored
in a private registry and the cluster can pull it using the
container_infra_prefix label.
Install tiller securely using helper container.
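For example (the tag value is illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels enable_tiller=true,tiller_tag=v2.12.3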
TODO:
*add instructions on how RBAC is designed
https://docs.helm.sh/using_helm/#example-deploy-tiller-in-a-namespace-restricted-to-deploying-resources-in-another-namespace
* add docs on how to install addon in the cluster using this tiller
* how users can get the creds to talk to tiller
NOTE:
The main goal of this tiller is internal usage!
Users can still deploy other tillers in other namespaces.
story: 2003902
task: 26780
Change-Id: I99d3a78085ba10030200f12bbfe58a72964e2326
Signed-off-by: dioguerra <dy090.guerra@gmail.com>
Now cloud-provider-openstack of Kubernetes has a webhook to support
Keystone authorization and authentication. With this feature, users
can use the new label 'keystone-auth-enabled' to enable keystone
authN and authZ.
DocImpact
Task: 21637
Story: 1755770
Change-Id: I3d21ad8f55c0d7308a302f62db9e9af147a604f8
* Use the external cloud-provider [0]
* Label master nodes
* Make the script that deploys the cloud-provider and clusterroles
  for the apiserver a SoftwareDeployment
* Rename kube_openstack_config to cloud-config;
  for cinder to work, the kubelet expects the cloud config name only
  like this. Keep a copy of kube_openstack_config for backwards
  compatibility.
Change-Id: Ife5558f1db4e581b64cc4a8ffead151f7b405702
Task: 22361
Story: 2002652
Co-Authored-By: Spyros Trigazis <spyridon.trigazis@cern.ch>
To upgrade a cluster we need to be able to set image tags,
so this change adds two labels for the corresponding containers.
Task: 23314
Story: 2003171
Change-Id: I4cd0270a69fb889c59bdb28966821adb11fd0292
Due to a change in Go 1.10.3[1], which k8s v1.11.1 is based on,
magnum now fails to create a working k8s cluster with version 1.11.1.
This patch removes the use of the server auth extension for the CA
cert and uses simple public/private keys for the k8s service account
keys.
[1] https://go.googlesource.com/go/+/09fa131c99da0ef9f78c9f4f6cd955237ccc01cd
Task: 23210
Story: 2003103
Change-Id: Ieba8f55d55db2afda6888d4bc6c2caa87370d13d
Add 'cloud_provider_enabled' label for the k8s_fedora_atomic
driver. Defaults to true. For specific kubernetes versions, if
'cinder' is selected as the 'volume_driver', it is implied that
the cloud provider will be enabled, since they are combined.
The motivation for this change is that in environments with
high load to the OpenStack APIs, users might want to disable
the cloud provider.
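For example, in such environments:

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels cloud_provider_enabled=false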
story: 1775358
task: 1775358
Change-Id: I2920f699654af1f4ba45644ab60a04a3f70918fe
Multi-master deployments for the k8s driver use different service
account keys for each api/controller manager server, which leads to
401 errors
for service accounts. This patch will create a signed cert and private
key for k8s service account keys explicitly, dedicatedly for the k8s
cluster to avoid the inconsistent keys issue.
Task: 21653
Story: 1766546
Change-Id: I61547405f866d3c5a84da63de66724b55af1066a
This patch allows specification of the Cgroup driver for the Kubelet
service. The necessity of this patch was realised after upgrading
Docker to the new community edition (17.3+), which defaults to the
`cgroupfs` Cgroup driver, while Fedora Atomic (version 27) comes with
Docker 1.13. The Cgroup drivers need to be identical for the two
services, Docker and Kubelet, to be able to work together.
Story: 2002533
Task: 22079
Change-Id: Ia4b38a63ede59e18c8edb01e93acbb66f1e0b0e4
In Fedora Atomic 27 etcd and flanneld are removed from the base image.
Install them as system containers.
* update docker-storage configuration
* add etcd and flannel tags as labels
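For example (the tag values are illustrative, not defaults):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels etcd_tag=v3.2.7,flannel_tag=v0.9.1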
Change-Id: I2103c7c3d50f4b68ddc11abff72bc9e3f22839f3
Closes-Bug: #1735381
Add a new label 'cert_manager_api' to kubernetes clusters to enable or
disable the kubernetes certificate manager api.
The same cluster cert/key pair is used by this api. The heat agent is used
to install the key in the master node(s), as this is required for kubernetes
to later sign new certificate requests.
The master template init order is changed so the heat agent is
launched prior to enabling the services - the controller manager
requires the CA key to be locally available before being launched.
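For example:

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels cert_manager_api=true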
Change-Id: Ibf85147316e3a194d8a3f92cbb4ae9ce8e16c98f
Partial-Bug: #1734318
Add a new label 'availability_zone' allowing users to specify the AZ
the nodes should be deployed in. Only one AZ can be passed for this
first implementation.
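For example (the AZ name is illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels availability_zone=nova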
Change-Id: I9e55d7631191fffa6cc6b9bebbeb4faf2497815b
Partially-Implements: blueprint magnum-availability-zones
Currently, there is no guarantee that all nodes of one cluster are
created on different compute hosts. It would be nice if we could
create a server group and set it with an anti-affinity policy to get
better HA for the cluster. This patch creates a server group for
master and minion nodes with the soft-anti-affinity policy by default.
Closes-Bug: #1737802
Change-Id: Icc7a73ef55296a58bf00719ca4d1cdcc304fab86
Fix the setting of container_infra_prefix on the template definition;
we were taking the cluster template and ignoring the cluster even when
an explicit value was given.
Change-Id: I658178c31080e4ea74d4e78fc1c76536d5aea73e
Closes-Bug: #1739424
Labels can be overridden on cluster creation, and should take
precedence over the cluster template when passed. When not passed,
they are filled with the ones from the template.
Fix the setting of kube_tag on the template definition; we were taking
the cluster template and ignoring the cluster.
Change-Id: I8fff6f15a780c74b1f09e7238472620e8f93b532
Closes-Bug: #1739422
Add a label to prefix all container images used by magnum:
* kubernetes components
* coredns
* node-exporter
* kubernetes-dashboard
Using this label, all containers will be pulled from the specified
registry and group in the registry.
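For example (the registry host and path are illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels container_infra_prefix=myregistry.example.com/magnum/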
TODO:
* grafana
* prometheus
Closes-Bug: #1712810
Change-Id: Iefe02f5ebc97787ee80431e0f16f73ae8444bdc0
1. It will fail to create a cluster if there are Chinese characters in
   the tenant name.
2. TENANT_NAME is unnecessary after changing to trustee.
This patch is for k8s_fedora_atomic and k8s_fedora_ironic.
Change-Id: Ie072f183110ae95861fb3694a913a3a4526549fb
Closes-Bug: #1711308
Separate the tag from which to pull from the kubernetes version.
In the current state the tag and the version happen to be the
same. But it is not decided yet in the fedoraproject how the
images are going to be tagged. Finally, operators might want to try
their own container images with custom tags.
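For example (the tag value is illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels kube_tag=v1.9.3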
Depends-On: Icddb8ed1598f2ba1f782622f86fb6083953c3b3f
Implements: blueprint run-kube-as-container
Change-Id: I4c4bc055d7df5e65aede93464bff51e6d5971504
Add labels as an option during cluster create. If not given,
the default is taken from the cluster template.
Add labels in the Cluster object and use that instead
of the one from ClusterTemplate.
Update both magnum and magnum cli documentation to reflect the above changes.
Partial-Bug: #1697651
Implements: blueprint flatten-attributes
Change-Id: I8990c78433dcbbca5bc4aa121678b02636346802
Allow setting the size of a volume for etcd storage.
Default is 0, which matches the current behavior - no persistence.
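For example (the size, in GB, is illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --labels etcd_volume_size=5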
Related-Bug: #1697655
Change-Id: I8a30df63684133a902ae209ba6c124da2a567d3f
Add docker_volume_size as an option during cluster create. If not given,
the default is taken from the cluster template.
Add docker_volume_size in the Cluster object and use that instead
of the one from ClusterTemplate.
Update both magnum and magnum cli documentation to reflect the above changes.
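For example (the size, in GB, is illustrative):

    openstack coe cluster create my-cluster \
      --cluster-template k8s-template \
      --docker-volume-size 50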
Partial-Bug: #1697648
Implements: blueprint flatten-attributes
Change-Id: Ic6d77e6fdf5b068fa5319b238f4fd98b4d499be4
* add docker_volume_type for the cinder volumes which are
used for docker storage.
Related-Bug: #1678153
Change-Id: I55418a667cc8af043c61130aa61138d700fdc4ca
If a fixed_network and fixed_subnet are specified, no private network
is created by the templates and the specified network is
used instead for VM provisioning, like in the Ironic driver.
Currently missing is the code to handle the use case where you
specify a fixed_network but not a fixed_subnet; this will come
in a following patch.
Partially Implements: blueprint decouple-private-network
Change-Id: I2003eb709b22b905063d846eb71570fc5e033618
Refactor driver interface to encapsulate the orchestration
strategy. This first patch only refactors the main driver
operations. A follow-on will handle the state synchronization
and removing the poller from the conductor.
1. Make driver interface abstract
2. Move external cluster operations into driver interface
3. Make Heat-based driver abstract and update based on
driver interface changes
4. Move Heat driver code into its own module
5. Update existing Heat drivers based on interface changes
Change-Id: Icfa72e27dc496862d950ac608885567c911f47f2
Partial-Blueprint: bp-driver-consolodation
In the swarm_atomic and k8s_atomic drivers, container images are
stored in a dedicated cinder volume per cluster node. It has been
proven that this architecture can be a scalability bottleneck.
Make the use of cinder volumes for container images an opt-in
option. If docker-volume-size is not specified, no cinder
volumes will be created. Before, if docker-volume-size wasn't
specified the default value was 25.
To use cinder volumes for container storage the user will
interact with magnum as before, (meaning the valid values are
integers starting from 1).
Closes-Bug: #1638006
Change-Id: I3394c62a43bbf950b7cf0b86a71b1d9b0481d68f
The 2 k8s atomic drivers we currently support are added to the
same driver. This breaks ironic support with the stevedore
work I'm currently doing.
With stevedore, we can choose only one driver based on the
server_type, os and coe. We won't be able to pick a driver and
then choose an implementation based on server_type.
Partially-Implements: blueprint magnum-baremetal-full-support
Co-Authored-By: Spyros Trigazis <strigazi@gmail.com>
Change-Id: Ic1b8103551f48f85baa2ed9ff32d5b70b1fab84e
This is patch 3 of 3 to change the internal usage of the terms
Bay and BayModel. This patch updates Bay to Cluster in DB and
Object as well as all the usages. No functionality should be
changed by this patch, just naming and db updates.
Change-Id: Ife04b0f944ded03ca932d70e09e6766d09cf5d9f
Implements: blueprint rename-bay-to-cluster
Factor out common kubernetes template definitions and address mapping
from Fedora Atomic, CoreOS and Suse drivers.
Partially-Implements: bp bay-drivers
Change-Id: Ib172c19acc1303041f7d8d9249df2d9ca1e4ff6f