Since each nodegroup will be one independent stack, we have to add
more fields to the table and object in order to track each stack
contained in the cluster. This adds the stack_id, version, status and
status_reason fields to the nodegroup object.
Change-Id: I6d36b2d3bc6476efbef6a9f702ffc73cfa0fab8c
The derived cloud_provider_enabled is placed inside extra_params so that
openstack-cloud-controller-manager gets applied correctly. This required
change was unfortunately missed in https://review.opendev.org/681922.
Additionally, improve the docs related to the cloud_provider_enabled label.
Story: 2006531
Task: 36740
Change-Id: I4a89d25b467edd2c4be608c37055706e4e62d78b
Support boot from volume for all Kubernetes nodes (master and worker)
so that users can create a large root volume, which can be more
flexible than using docker_volume_size. Users can also specify the
volume type, so that they can leverage high-performance storage, e.g.
NVMe.
A new label etcd_volume_type is added as well so that users can
set the volume type for the etcd volume.
If the boot_volume_type or etcd_volume_type labels are not passed,
Magnum will try to read them from the config options
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set either.
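A minimal sketch of how these labels might be used at cluster creation
(the volume type "nvme" and the template name are placeholders):

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels boot_volume_type=nvme,etcd_volume_type=nvme

The operator-side fallbacks would live in magnum.conf; assuming they sit
in the [cinder] section alongside the existing volume type options:

    [cinder]
    default_boot_volume_type = nvme
    default_etcd_volume_type = nvme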
Task: 30374
Story: 2005386
Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>
Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
There shouldn't be a default value for floating_ip_enabled when creating
a cluster. By default, when it's not set, the cluster's floating_ip_enabled
attribute should take the value of the cluster template. This is fixed
by removing the default value from the Magnum API.
Task: 36500
Story: 2006208
Change-Id: I4077783c6a19a413d534f77f287da587353df0af
This is a case that was missed by the earlier fix [1]. When a user
passes in an existing network when creating a cluster, the network name
is not handled in the code. This patch fixes it.
[1] https://review.opendev.org/678067
Task: 36430
Story: 2005333
Change-Id: I3a005089c4a755812c40589d8fa1e3ab7bbf062d
Sometimes, the fixed_network value gets rendered as a UUID. However,
OCCM's internal-network-name requires the network name; it does not
support a UUID. This patch introduces a new parameter called
fixed_network_name, which holds the fixed_network value converted from
UUID to name when it is UUID-like.
Story: 2005333
Task: 36313
Change-Id: I3453bc0dbea285687d39c9782685cb1f2a3ecd39
Fedora Atomic 27 has been end of life for a while; it's time to replace
it with Fedora Atomic 29.
Task: 36356
Story: 2006441
Change-Id: Iab131745854b0b908be17bd17c7510cd54dde1f5
When using a public cluster template, users still need the capability
to reuse their existing network/subnet, and they also need to be able
to turn the floating IP on/off to override the setting in the public
template. This patch supports that by adding those three items as
parameters when creating a cluster.
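A sketch of such a create call, overriding the public template's
settings (the names are placeholders, and the matching client options
are assumed to be available):

    openstack coe cluster create my-cluster \
        --cluster-template public-template \
        --fixed-network my-net \
        --fixed-subnet my-subnet \
        --floating-ip-disabled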
Story: 2006208
Task: 35797
Change-Id: I11579ff6b83d133c71c2cbf49ee4b20996dfb918
Magnum sends notifications for events like cluster create, but they
carry no details about the cluster, such as the cluster UUID.
Notifications from other OpenStack projects contain fully detailed
information (e.g. the instance UUID in Nova's instance create notification).
Detailed notifications are important for other OpenStack
projects like Searchlight or third party projects that cache
information regarding OpenStack objects or have custom actions
running on notifications. Caching systems can efficiently update
one single object (e.g. a cluster), while without detailed notifications
they need to periodically retrieve the full object list, which is
inefficient.
Change-Id: I820fbe0659222ba31baf43ca09d2bbb0030ed61f
Story: #2006297
Task: 36009
This is a regression introduced by the rolling upgrade feature: without
setting master_kube_tag and minion_kube_tag, they will be set to the
default value. This patch fixes it by keeping them consistent with the
kube_tag label.
Change-Id: I8b0ca3f87c9a52d48ecb75e4dd8de18a61a10d6f
* prometheus-operator chart version upgraded from 0.1.31 to 5.12.3
* Fix an issue where, when using the Priority feature gate, the scheduler
would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics intermittently or completely fail
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Grafana dashboard not showing data.
Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
With the new config option `keystone_auth_default_policy`, cloud admins
can set a default keystone auth policy for k8s clusters when the
keystone auth is enabled. As a result, users can use their current
keystone user to access the k8s cluster as long as they're assigned the
correct roles, and they will get the pre-defined permissions
set by the cloud provider.
The default policy is based on the v2 format recently introduced
in k8s-keystone-auth, which is more expressive. For example,
v1 doesn't support a policy that lets a user access resources from
all namespaces except kube-system, but v2 does.
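A minimal sketch of a v2 policy of that shape (the project ID, role and
permission values here are placeholders):

    [
      {
        "users": {
          "projects": ["<project-id>"],
          "roles": ["member"]
        },
        "resource_permissions": {
          "!kube-system/*": ["*"]
        }
      }
    ]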
NOTE: For now we're using the openstackmagnum dockerhub repo until the
CPO team fixes their image release issue.
Task: 30069
Story: 1755770
Change-Id: I2425e957bd99edc92482b6f11ca0b1f91fe59ff6
Now the coe_version is out of sync with the k8s version deployed
for the cluster. This patch makes sure the kube_version stays
consistent with the kube_tag when creating the cluster and upgrading
the cluster.
Task: 33608
Story: 2002210
Change-Id: I5812dac340099ecd8923c1e4a60ce0e6611f7ca4
Rolling upgrade is an important feature for a managed k8s service;
at this stage, two use cases will be covered:
1. Upgrade base operating system
2. Upgrade k8s version
Known limitation: When doing operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
To enable the rolling upgrade ability of Kubernetes clusters, this
patch proposes a new API /upgrade to support upgrading the
base operating system of nodes and the version of Kubernetes, and even
add-ons running on the k8s cluster:
POST <ClusterID>/actions/upgrade
And the post body will be:
{
"cluster_template": 'dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2',
"max_batch_size": 1,
"nodegroup": "production_group"
}
Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>
Task: 30168
Story: 2002210
Change-Id: Ia168877778aa0d473383eb06b1c8a16dc06b0576
This reverts commit e8d0ee1b14.
This commit is reverted for two reasons:
* It is undesirable that the end user can inject proxy config into
the magnum-conductor service via the cluster template.
* The proxy settings for the magnum-conductor service may not be
the same as those which are required in the cluster template for
the end user VM.
Systemd, docker and podman all include native mechanisms for setting
environment variables for processes, and these should be used by the
cloud operator / deployment tooling to configure the required proxy
settings for the magnum-conductor service.
In particular, the reverted patch made it impossible for the cloud
operator to specify their own http_proxy via the environment; the
user-supplied cluster template setting would always be used.
Change-Id: I33da19ad6764bedcf15f2a08381063e2471f8991
The current magnum traefik deployment will always pull the latest
traefik container image. With the launch of traefik v2
(https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305) this will
have an impact on how ingress is described in k8s.
This patch:
* Sets the traefik version to the default tag v1.7.9, the stable
release prior to v2.
* Adds a new label <traefik_ingress_controller_tag> to enable users
to specify a traefik release other than the default.
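A sketch of pinning a specific traefik release via the new label (the
tag value is a placeholder):

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels traefik_ingress_controller_tag=v1.7.9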
Task: 30143
Task: 30146
Story: 2005286
Change-Id: I031a594f7b6014d88df055664afcf51b1cd2cd94
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for K8s clusters, users can use a new label
'auto_healing_enabled' to turn it on/off.
Meanwhile, a new label 'auto_scaling_enabled' is also introduced
to enable the capability to let the k8s cluster auto scale based
on its workload.
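A sketch of turning both capabilities on at cluster creation:

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels auto_healing_enabled=true,auto_scaling_enabled=true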
Task: 28923
Story: 2004782
Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
Add an nginx based Ingress controller for Kubernetes.
The use case is to better support workloads which require either
L4 access or SSL passthrough, which lack proper support in Traefik.
Selection is done via the same label 'ingress_controller' with value
'nginx'. Deployment relies on the upstream nginx-ingress helm chart.
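Selecting it at cluster creation would look like:

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels ingress_controller=nginx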
Change-Id: I1db2074fce9d43c03f479a6aaeb4f238d7101555
Story: 2005327
Task: 30255
Use a comma-delimited IPv4 address list to specify multiple DNS
servers, e.g. "8.8.8.8,114.114.114.114".
Task: 29465
Story: 2004994
Change-Id: I031247b0cc2ae417f18b2a5b9b3832e78ed9dafd
This commit removes the fields node_addresses, master_addresses,
node_count and master_count from the cluster object since this info
will be stored in the nodegroups. At the same time, it provides the
means to adapt existing clusters to the new schema.
story: 2005266
Change-Id: Iaf2cef3cc50b956c9b6d7bae13dbb716ae54eaf7
Magnum now supports listing, getting and deleting a user's
clusters/templates as admin, but updating them is not allowed. This
could be a very useful feature, since operators sometimes need to
update customers' templates or clusters on their behalf.
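A sketch of the policy rules this would hinge on (the rule names are
assumed by analogy with the existing *_all_projects rules):

    "cluster:update_all_projects": "rule:admin_api"
    "clustertemplate:update_all_projects": "rule:admin_api"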
Task: 30251
Story: 2005323
Change-Id: I3ab1d4583b5eb3d1c377e46fd73347c2477c3e08
The existing drivers are adapted to get node_count and master_count
information from the cluster's nodegroups. At the same time the
output mappings were updated to reflect the changes in the stack to
the nodegroups.
story: 2005266
Change-Id: I725413e77f5a7bdb48131e8a10e5dc884b5e066a
This changes the existing cluster APIs and the cluster conductor to
take into consideration nodegroups:
* create: now creates the default nodegroups for the cluster
* update: updates the default nodegroups of the cluster
* delete: deletes also the nodegroups that belong to the cluster
* cluster_resize: takes into account the nodegroup provided by the API
story: 2005266
Change-Id: I5478c83ca316f8f09625607d5ae9d9f3c02eb65a
This is a mechanically generated change to replace openstack.org
git:// URLs with https:// equivalents.
This is in aid of a planned future move of the git hosting
infrastructure to a self-hosted instance of gitea (https://gitea.io),
which does not support the git wire protocol at this stage.
This update should result in no functional change.
For more information see the thread at
http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003825.html
Change-Id: Ie288c147a3cbdd19abd257bf14972c316db6d67c
The Kubernetes Helm repository includes a prometheus-operator Chart in
its stable distribution.
This stable/prometheus-operator chart can be used to install all the
dependencies and some default configurations to use prometheus.
The installed extra charts are:
* stable/prometheus-node-exporter (data scraping)
* stable/prometheus (prometheus and alertmanager server)
* stable/grafana (visualization dashboard)
* stable/prometheus-operator (supervision and simple configuration)
The prometheus-operator is installed by using the label
monitoring_enabled=True. Also, the label grafana_admin_passwd can be
used to set the admin password for access to the grafana dashboard.
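A sketch of enabling the stack at cluster creation (the password is a
placeholder):

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels monitoring_enabled=True,grafana_admin_passwd=s3cret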
This patch allows the prometheus monitoring maintenance
work to be transferred to the kubernetes/helm team.
Task: 28544
Story: 2004623
Depends-On: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I80d590785bf30f9d634debeaf51c0d4cce0aeb93
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
- Never allocate floating IP for etcd service.
- Introduce a new label `master_lb_floating_ip_enabled` which controls
if Magnum allocates floating IP for the master load balancer. This
label only takes effect when `master_lb_enabled` is set. The
default value is the same as `floating_ip_enabled`.
- The `floating_ip_enabled` property now only controls if Magnum
should allocate the floating IPs for the master and worker nodes.
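A sketch of keeping the master load balancer private while the nodes
still get floating IPs:

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels master_lb_floating_ip_enabled=false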
Change-Id: I0a232406deaf112b0cb9e445735d7b49206c676d
Story: #2005153
Task: #29868
Now an OpenStack driver for the Kubernetes Cluster Autoscaler is being
proposed to support autoscaling when running k8s clusters on top of
OpenStack. However, currently there is no way in Magnum to let
an external consumer control which node will be removed. The
alternative is calling the Heat API directly, but obviously that
is not the best solution and it confuses the k8s community. So with
this patch, we're going to add a new API:
POST <ClusterID>/actions/resize
And the post body will be:
{
"node_count": 3,
"nodes_to_remove": ["dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2"],
"nodegroup": "production_group"
}
The API works in a declarative way. For example, if there
are 3 nodes in the cluster now, the user can send an API request
like the one above. Magnum will first call Heat to remove the node
dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2, then bring the node
count back to 3 again.
Task: 29563
Story: 2005052
Change-Id: I7e36ce82c3f442976cc498153950b19c56a1759f
Deploying Node Problem Detector to all nodes to detect problems which
can be leveraged by auto healing. This is the first step towards
enabling the auto healing feature.
Task: 29886
Story: 2004782
Change-Id: I1b6075025c5f369821b4136783e68b16535dc6ef
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.
This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.
Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
Fixes the problem with Mesos cluster creation where the
nodes_affinity_policy was not properly conveyed; it is required
in order to create the corresponding server group in Nova.
Change-Id: Ie8d73247ba95f20e24d6cae27963d18b35f8715a
story: 2005116
All unit tests using FakeLoopingCall raise an IOError if an initial
delay is not specified, because the default initial_delay is -1.
Change the default initial delay to 0.
story: 2005112
task: 29748
Change-Id: I6cbae0996c2347e25d8be617e4b3fd93f4d9cc95
We are writing to files opened in text mode ('w+'), so binary data
has to be decoded before writing.
Task: 29577
Story: 2005057
Change-Id: I034d0230c3022e701111bdc71f0af43da1852c3c
Call the Kubernetes native API to update the cluster health status
so that it can be used for cluster auto healing.
Task: 24593
Story: 2002742
Change-Id: Ia76eeeb2f1734dff38d9660c804d7d2d0f65b9fb
Add a new hidden flag to cluster templates. This allows an operator to
keep a cluster template public (accessible to all users) while not
showing it in the cluster template listing.
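A sketch of hiding an existing public template (assuming the generic
attribute-replace form of the client):

    openstack coe cluster template update my-template replace hidden=true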
Story: 2004941
Task: 29342
Change-Id: Ia2717ca960041753f6e772bf2d41c7f5a196dae6
Add enable_tiller label to install tiller in k8s_fedora_atomic
clusters. Defaults to false.
Add tiller_tag label to select the version of tiller. If the
tag is not set, the tag that matches the helm client version in
the heat-agent will be picked. The tiller image can be stored
in a private registry and the cluster can pull it using the
container_infra_prefix label.
Install tiller securely using helper container.
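A sketch of enabling it (the tag value is a placeholder):

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels enable_tiller=true,tiller_tag=v2.12.3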
TODO:
* add instructions on how RBAC is designed
https://docs.helm.sh/using_helm/#example-deploy-tiller-in-a-namespace-restricted-to-deploying-resources-in-another-namespace
* add docs on how to install addon in the cluster using this tiller
* how users can get the creds to talk to tiller
NOTE:
The main goal of this tiller is internal usage!
Users can still deploy other tillers in other namespaces.
story: 2003902
task: 26780
Change-Id: I99d3a78085ba10030200f12bbfe58a72964e2326
Signed-off-by: dioguerra <dy090.guerra@gmail.com>
Allow passing label values on cluster creation for swarm mode. This is
available in all kubernetes drivers as well as swarm, but was somehow
missed in the swarm-mode driver.
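A sketch of what this enables (the label shown is just an illustrative
key/value):

    openstack coe cluster create swarm-cluster \
        --cluster-template swarm-mode-template \
        --labels availability_zone=nova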
Story: 2004942
Task: 29343
Change-Id: Ie3ac66f45e27cc92993116c3df0b33873dc67e24
- Add "octavia" as one of the "ingress_controller" options.
- Add label "octavia_ingress_controller_tag".
- Use external network ID in the heat templates.
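A sketch of selecting it via labels (the tag value is a placeholder):

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels ingress_controller=octavia,octavia_ingress_controller_tag=v1.14.0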
Story: 2004838
Change-Id: I7d889a054cd5feb2eeef523b20607a6c7630d777
To get better cluster template versioning and relieve the pain
of maintaining public cluster templates, this patch proposes
allowing the name of a cluster template to be changed.
A following patch/spec will be proposed to add a new field
'deprecated' to allow ops to hide old/deprecated templates.
Task: 26889
Story: 2003960
Change-Id: Id1db81d35bc3dccff0fac481be7801de200d52de
This commit adds the magnum-status CLI for performing
upgrade checks as part of the Stein cycle upgrade-checkers goal.
It only includes a sample check, which must be replaced by real checks
in the future.
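Operators would run it as (following the common *-status pattern from
the goal):

    magnum-status upgrade check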
Change-Id: Ia8a74fd8bd5a804e71bb04eb0615fa114a517bc4
Story: 2003657
Task: 26138
When a user creates a LoadBalancer-type service in a k8s cluster, a
floating IP may be created and associated with the load balancer VIP.
Magnum can now delete the load balancers automatically in the cluster
pre-delete method; it should also remove the floating IP as needed.
This patch depends on the github PR for cloud-provider-openstack:
https://github.com/kubernetes/cloud-provider-openstack/pull/433
Story: 2004836
Change-Id: Ia553aff4e66033346c6bfe120a72992bec79e136
Now cloud-provider-openstack for Kubernetes has a webhook to support
Keystone authorization and authentication. With this feature, users
can use a new label 'keystone-auth-enabled' to enable the keystone
authN and authZ.
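A sketch of turning it on (label name as introduced by this change):

    openstack coe cluster create my-cluster \
        --cluster-template k8s-template \
        --labels keystone-auth-enabled=true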
DocImpact
Task: 21637
Story: 1755770
Change-Id: I3d21ad8f55c0d7308a302f62db9e9af147a604f8
HTTP(S) proxy can be specified when creating the template.
https://docs.openstack.org/magnum/latest/admin/magnum-proxy.html
However, it is not being utilized when talking to a public etcd discovery
service, which results in failed cluster creation. We need to be able to
use the HTTP(S) proxy when services are running behind a firewall.
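For reference, a sketch of setting the proxies on a template (the proxy
URL and other values are placeholders):

    openstack coe cluster template create my-template \
        --coe kubernetes \
        --image fedora-atomic-29 \
        --external-network public \
        --http-proxy http://proxy.example.com:3128 \
        --https-proxy http://proxy.example.com:3128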
Change-Id: I13d86b0dc7c232a51149107f0412219388d8c2cd
story: 2004664