Commit Graph

181 Commits

Author SHA1 Message Date
Zuul
f1cf3d0b38 Merge "Support auto_healing_controller" 2019-08-06 08:40:25 +00:00
Zuul
871c0bccdb Merge "Set train-dev as the default tag for heat-container-agent" 2019-08-01 22:22:53 +00:00
Zuul
60e62940c3 Merge "Bump the openstackdocstheme extension to 1.20" 2019-08-01 19:46:05 +00:00
Zuul
fdb971459e Merge "Add information about the cluster in magnum event notifications" 2019-08-01 10:26:30 +00:00
Zuul
fe339554ae Merge "Allow setting network, subnet and FIP when creating cluster" 2019-08-01 10:25:37 +00:00
pengyuesheng
749a792eb4 Bump the openstackdocstheme extension to 1.20
Some options are now automatically configured by version 1.20:
- project
- html_last_updated_fmt
- latex_engine
- latex_elements
- version
- release.

Change-Id: I1e6e570d4db575d611212198d11ee4b84884ab23
2019-08-01 09:41:35 +08:00
Feilong Wang
32989b4f7b Allow setting network, subnet and FIP when creating cluster
When using a public cluster template, users still need the capability
to reuse their existing network/subnet, and they also need to be able
to turn the floating IP on/off to override the setting in the public
template. This patch supports that by adding those three items as
parameters when creating a cluster.

Story: 2006208
Task: 35797

Change-Id: I11579ff6b83d133c71c2cbf49ee4b20996dfb918
2019-07-31 20:41:20 +12:00
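An illustrative CLI call for the change above — a minimal sketch assuming the
matching python-magnumclient options (--fixed-network, --fixed-subnet,
--floating-ip-disabled); the cluster, template and network names are placeholders:

# Reuse an existing private network/subnet and turn floating IPs off,
# overriding whatever the public cluster template specifies.
openstack coe cluster create my-cluster \
    --cluster-template public-k8s-template \
    --fixed-network my-private-net \
    --fixed-subnet my-private-subnet \
    --floating-ip-disabled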
Emanuel Andrecut
e5eade03dc Add information about the cluster in magnum event notifications
Magnum sends notifications such as cluster create, but they carry no
details about the cluster, such as the cluster UUID. Notifications
from other OpenStack projects contain full detailed information
(e.g. the instance UUID in Nova's instance create notification).
Detailed notifications are important for other OpenStack
projects like Searchlight, or third-party projects that cache
information about OpenStack objects or run custom actions
on notifications. Caching systems can efficiently update a
single object (e.g. a cluster), whereas without detailed
notifications they need to periodically retrieve the object list,
which is inefficient.

Change-Id: I820fbe0659222ba31baf43ca09d2bbb0030ed61f
Story: #2006297
Task: 36009
2019-07-29 11:23:42 +03:00
Feilong Wang
c6bf1da085 Set train-dev as the default tag for heat-container-agent
Based on the tagging policy for heat-container-agent, the default tag
is now train-dev; once Train is released, it will be updated to
train-stable.

Change-Id: Iec43df292dbd6a7e7ee33a0d4b8670b653a7ebbd
2019-07-26 10:01:27 +12:00
Feilong Wang
92d516903a Return ClusterID for resize and upgrade
Magnum needs to return the ClusterID for resize and upgrade to be
consistent with other cluster actions.

Task: 35988
Story: 2002210

Change-Id: Ib15e0cbecd1cbfa57a3008a3f3917d37be7f8f0c
2019-07-26 09:49:30 +12:00
Lingxian Kong
52155f0e76 Support auto_healing_controller
This patch allows the user to choose the auto-healing service by
introducing a new label 'auto_healing_controller'. Currently, 'draino'
and 'magnum-auto-healer'[1] are supported; 'draino' is the default value
for backward compatibility.

[1]: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-magnum-auto-healer.md

Change-Id: I7ff14837a8d7d360b72c8f40733e84c88c4269d4
2019-07-24 17:52:33 +12:00
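A hedged example of selecting the new controller via labels — the label names
come from the commit above, while the image, network and flavor values are
placeholders:

# Use magnum-auto-healer instead of the default draino.
openstack coe cluster template create k8s-autoheal-template \
    --coe kubernetes --image fedora-atomic-29 --external-network public \
    --flavor m1.small --master-flavor m1.small \
    --labels auto_healing_enabled=true,auto_healing_controller=magnum-auto-healer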
Zuul
2ed6fa35d0 Merge "[k8s] Update prometheus monitoring helm based configuration" 2019-06-30 21:58:40 +00:00
Diogo Guerra
41b83cef43 [k8s] Update prometheus monitoring helm based configuration
* prometheus-operator chart version upgraded from 0.1.31 to 5.12.3
* Fix an issue where, when using the Priority feature gate, the scheduler
would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics fail intermittently or fail completely
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add the label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix the Grafana dashboard not showing data.

Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-06-25 10:07:55 +00:00
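If a different chart release is needed, the new prometheus_operator_chart_tag
label mentioned above can be set at cluster creation — a sketch with the chart
version taken from the commit message and placeholder cluster/template names:

# Pin the stable/prometheus-operator chart version used for monitoring.
openstack coe cluster create monitored-cluster \
    --cluster-template k8s-template \
    --labels monitoring_enabled=true,prometheus_operator_chart_tag=5.12.3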
Zuul
b92a81ddeb Merge "Fix coe_version for k8s driver" 2019-06-14 12:30:15 +00:00
Zuul
51512c3a70 Merge "[k8s][fedora atomic] Using node instead of minion" 2019-06-11 20:23:48 +00:00
Feilong Wang
8f6612b2e9 [k8s][fedora atomic] Using node instead of minion
'Minion' is no longer an appropriate name for a k8s worker node; it has
been replaced with 'node' to align with k8s terminology. So the server
name of a worker will be something like `k8s-1-lnveovyzpreg-node-0`
instead of `k8s-1-lnveovyzpreg-minion-0`.

Task: 31008
Story: 2005689

Change-Id: Ie9a68b18658e94b6ebe76ebeae8becc23714380d
2019-06-11 18:20:14 +00:00
Feilong Wang
d8df9d0c36 [fedora-atomic][k8s] Support default Keystone auth policy file
With the new config option `keystone_auth_default_policy`, the cloud
admin can set a default keystone auth policy for k8s clusters when
keystone auth is enabled. As a result, users can access the k8s cluster
with their current keystone user as long as they're assigned the
correct roles, and they will get the pre-defined permissions
set by the cloud provider.

The default policy is now based on the v2 format recently introduced
in k8s-keystone-auth, which is more useful. For example,
v1 does not support a policy that lets a user access resources in
all namespaces except kube-system, but v2 does.

NOTE: For now we're using the openstackmagnum Docker Hub repo until
the CPO team fixes their image release issue.

Task: 30069
Story: 1755770

Change-Id: I2425e957bd99edc92482b6f11ca0b1f91fe59ff6
2019-06-11 11:57:15 +12:00
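A hedged sketch of enabling keystone auth on a cluster template so the default
policy described above applies; `keystone_auth_enabled` is an existing Magnum
label, and the image, network and template names are placeholders:

# The operator-side default policy is configured in magnum.conf via the
# keystone_auth_default_policy option; here we only enable the webhook.
openstack coe cluster template create k8s-keystone-template \
    --coe kubernetes --image fedora-atomic-29 --external-network public \
    --labels keystone_auth_enabled=true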
Feilong Wang
dc100551e4 Fix coe_version for k8s driver
Currently the coe_version is out of sync with the k8s version deployed
for the cluster. This patch makes sure the kube_version stays
consistent with the kube_tag when creating and upgrading the
cluster.

Task: 33608
Story: 2002210

Change-Id: I5812dac340099ecd8923c1e4a60ce0e6611f7ca4
2019-06-10 14:01:04 +12:00
Feilong Wang
05c27f2d73 [k8s][fedora atomic] Rolling upgrade support
Rolling upgrade is an important feature for a managed k8s service;
at this stage, two use cases will be covered:

1. Upgrade the base operating system
2. Upgrade the k8s version

Known limitation: when doing an operating system upgrade, there is no
chance to call kubectl drain to evict the pods on that node.

Task: 30185
Story: 2002210

Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
2019-06-07 14:48:08 +12:00
Spyros Trigazis (strigazi)
9b1bd5da54 Add cluster upgrade to the API
To enable the rolling upgrade ability of a Kubernetes cluster, this
patch proposes a new API, /upgrade, to support upgrading the
base operating system of the nodes and the version of Kubernetes, and
even add-ons running on the k8s cluster:

POST <ClusterID>/actions/upgrade

And the post body will be:

{
    "cluster_template": 'dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2',
    "max_batch_size": 1,
    "nodegroup": "production_group"
}

Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>

Task: 30168
Story: 2002210

Change-Id: Ia168877778aa0d473383eb06b1c8a16dc06b0576
2019-06-07 12:01:10 +12:00
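A hedged sketch of calling the new endpoint directly with curl — the path and
body follow the commit message above; the endpoint URL and token are
placeholders, and the microversion header is assumed to be needed for the new
action:

# Trigger a rolling upgrade of cluster <ClusterID> to a new template.
curl -X POST "https://magnum.example.com:9511/v1/clusters/<ClusterID>/actions/upgrade" \
    -H "X-Auth-Token: $OS_TOKEN" \
    -H "Content-Type: application/json" \
    -H "OpenStack-API-Version: container-infra latest" \
    -d '{"cluster_template": "dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2",
         "max_batch_size": 1,
         "nodegroup": "production_group"}'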
Lingxian Kong
49e5f17cb5 [k8s_fedora_atomic] Make calico devices unmanaged in NetworkManager config for master node
In https://review.opendev.org/#/c/548139/ we made the same change for
the worker nodes. Because kubelet is also installed on master nodes, we
need the same configuration there; otherwise, the pods on master nodes
won't work properly (they lose connections or time out frequently).

Story: #2005805
Task: #33544

Change-Id: I14c4dcdd1d73e2d94325974b4e55c1e37a20d9ea
2019-05-31 14:56:02 +12:00
Spyros Trigazis
8fb27da2fc Update coredns from upstream manifest and to 1.3.1
5fe683c057/kubernetes/coredns.yaml.sed

story: 2003993
task: 30493

Change-Id: I0b0b4f98c20748c37c2d2f498ced222a53b52214
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-04-18 12:38:58 +02:00
Diogo Guerra
b3ceb252ef [k8s] Set traefik to stable version v1.7.10
The current magnum traefik deployment will always pull the latest
traefik container image. With the new launch of traefik v2
(https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305) this will
have an impact on how the ingress is described in k8s.

This patch:
* Sets the traefik version to the default tag v1.7.9, the stable
release prior to v2.
* Adds a new label <traefik_ingress_controller_tag> to enable the user
to specify a traefik release other than the default.

Task: 30143
Task: 30146
Story: 2005286

Change-Id: I031a594f7b6014d88df055664afcf51b1cd2cd94
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-04-17 14:16:14 +02:00
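To pin a specific traefik release, the new label can be set at cluster
creation — a short sketch using the tag from the commit title; the cluster and
template names are placeholders:

openstack coe cluster create ingress-cluster \
    --cluster-template k8s-template \
    --labels ingress_controller=traefik,traefik_ingress_controller_tag=v1.7.10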
Zuul
29f6eab346 Merge "[fedora_atomic] Support auto healing for k8s" 2019-04-17 08:36:41 +00:00
Feilong Wang
75fab6ff37 [fedora_atomic] Support auto healing for k8s
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for a k8s cluster; the user can use a new label
'auto_healing_enabled' to turn it on/off.

Meanwhile, a new label 'auto_scaling_enabled' is also introduced
to enable the capability to let the k8s cluster auto-scale based
on its workload.

Task: 28923
Story: 2004782

Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
2019-04-17 14:47:39 +12:00
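A brief sketch of turning both new labels on at cluster creation (cluster and
template names are placeholders):

openstack coe cluster create self-healing-cluster \
    --cluster-template k8s-template \
    --labels auto_healing_enabled=true,auto_scaling_enabled=true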
Zuul
9e498c4769 Merge "Support multi DNS server" 2019-04-15 18:13:59 +00:00
Ricardo Rocha
375fbccf58 [k8s] Add nginx based ingress controller
Add an nginx based Ingress controller for Kubernetes.

The use case is to better support workloads which require either
L4 access or SSL passthrough, both of which lack proper support in Traefik.

Selection is done via the same label 'ingress_controller' with value
'nginx'. Deployment relies on the upstream nginx-ingress helm chart.

Change-Id: I1db2074fce9d43c03f479a6aaeb4f238d7101555
Story: 2005327
Task: 30255
2019-04-10 09:16:59 +02:00
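Selecting the nginx controller uses the same label as before — a minimal
sketch with placeholder cluster and template names:

openstack coe cluster create l4-cluster \
    --cluster-template k8s-template \
    --labels ingress_controller=nginx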
huang.xiangdong
3cb6226ff0 Support multi DNS server
Use a comma-delimited IPv4 address list to specify multiple DNS
servers, e.g. "8.8.8.8,114.114.114.114".

Task: 29465
Story: 2004994

Change-Id: I031247b0cc2ae417f18b2a5b9b3832e78ed9dafd
2019-04-08 23:08:45 +00:00
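A hedged example of the comma-delimited form on a cluster template, with the
addresses taken from the commit message and the remaining values as
placeholders:

openstack coe cluster template create multi-dns-template \
    --coe kubernetes --image fedora-atomic-29 --external-network public \
    --dns-nameserver 8.8.8.8,114.114.114.114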
5e0672a477 Update master for stable/stein
Add file to the reno documentation build to show release notes for
stable/stein.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/stein.

Change-Id: Ib327c9320ec306098769040df8188e8968913ef4
Sem-Ver: feature
2019-03-21 21:38:41 +00:00
Diogo Guerra
a46d2ffc91 [k8s] Install prometheus monitoring with helm
The Kubernetes Helm repository includes in its stable distribution
a prometheus-operator Chart.
This stable/prometheus-operator chart can be used to install all the
dependencies and some default configurations to use prometheus.
The installed extra charts are:
  * stable/prometheus-node-exporter (data scraping)
  * stable/prometheus (prometheus and alertmanager server)
  * stable/grafana (visualization dashboard)
  * stable/prometheus-operator (supervision and simple configuration)

The prometheus-operator is installed by using the label
monitoring_enabled=True. Also, the label grafana_admin_passwd can be
used to set the admin password for access to the grafana dashboard.

This patch allows the prometheus monitoring maintenance
work to be handed over to the kubernetes/helm team.

Task: 28544
Story: 2004623
depends_on: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I80d590785bf30f9d634debeaf51c0d4cce0aeb93
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-03-21 13:25:04 +01:00
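A short sketch of the two labels described above (the password and the
cluster/template names are placeholders):

openstack coe cluster create monitored-cluster \
    --cluster-template k8s-template \
    --labels monitoring_enabled=true,grafana_admin_passwd=s3cret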
Zuul
d1957c71dc Merge "Improve floating IP allocation" 2019-03-20 18:12:43 +00:00
Lingxian Kong
c47fde0cbe Improve floating IP allocation
- Never allocate floating IP for etcd service.
- Introduce a new label `master_lb_floating_ip_enabled` which controls
  if Magnum allocates floating IP for the master load balancer. This
  label only takes effect when `master_lb_enabled` is set. The
  default value is the same as `floating_ip_enabled`.
- The `floating_ip_enabled` property now only controls if Magnum
  should allocate the floating IPs for the master and worker nodes.

Change-Id: I0a232406deaf112b0cb9e445735d7b49206c676d
Story: #2005153
Task: #29868
2019-03-20 18:44:45 +13:00
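A hedged sketch of the new behaviour: keep a load-balanced control plane but
avoid allocating a floating IP for the load balancer (the template name and
other values are placeholders):

openstack coe cluster template create lb-k8s-template \
    --coe kubernetes --image fedora-atomic-29 --external-network public \
    --master-lb-enabled \
    --labels master_lb_floating_ip_enabled=false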
Zuul
0cd35dbcca Merge "Support <ClusterID>/actions/resize API" 2019-03-19 22:16:15 +00:00
Feilong Wang
15ecdb8033 Support <ClusterID>/actions/resize API
Now an OpenStack driver for the Kubernetes Cluster Autoscaler is being
proposed to support autoscaling when running a k8s cluster on top of
OpenStack. However, currently there is no way in Magnum to let
an external consumer control which node will be removed. The
alternative option is calling the Heat API directly, but obviously that
is not the best solution and it's confusing for the k8s community. So
with this patch, we're going to add a new API:

POST <ClusterID>/actions/resize

And the post body will be:

{
    "node_count": 3,
    "nodes_to_remove": ["dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2"],
    "nodegroup": "production_group"
}

The API works in a declarative way. For example, if there
are 3 nodes in the cluster now, the user can send an API request
like the one above. Magnum will call Heat to remove the node
dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2 first, then bring the node
count back to 3.

Task: 29563
Story: 2005052

Change-Id: I7e36ce82c3f442976cc498153950b19c56a1759f
2019-03-19 20:13:17 +00:00
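For clients, a hedged equivalent of the API call above is the resize
subcommand, assuming a python-magnumclient new enough to ship it; the cluster
name is a placeholder and the node UUID is the one from the example body:

# Remove a specific node, then settle the cluster back at 3 nodes.
openstack coe cluster resize my-cluster 3 \
    --nodes-to-remove dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2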
Zuul
f0175f6aac Merge "[k8s] Make flannel self-hosted" 2019-03-07 21:37:40 +00:00
Spyros Trigazis
2ab874a5be [k8s] Make flannel self-hosted
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.

This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.

Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.

Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
2019-03-05 18:33:45 +01:00
Guang Yee
a47f5a3994 make sure to set node_affinity_policy for Mesos template definition
Fixes the problem with Mesos cluster creation where the
nodes_affinity_policy was not properly conveyed, as it is required
in order to create the corresponding server group in Nova.

Change-Id: Ie8d73247ba95f20e24d6cae27963d18b35f8715a
story: 2005116
2019-03-01 15:49:06 -08:00
Zuul
e256f87d1a Merge "[k8s-fedora-atomic] Use ClusterIP for prometheus service" 2019-03-01 02:36:49 +00:00
Zuul
d76ab4da80 Merge "[k8s-fedora-atomic] Security group definition for worker nodes" 2019-02-27 23:59:12 +00:00
Lingxian Kong
31c82625d6 [k8s-fedora-atomic] Security group definition for worker nodes
Defines stricter security group rules for kubernetes worker nodes. The
ports that are open by default: the default port range (30000-32767) for
external service ports; the kubelet healthcheck port; Calico BGP network ports;
flannel overlay network ports. The cluster admin should manually configure the
security group on the nodes where Traefik is allowed.

Story: #2005082
Task: #29661
Change-Id: Idbc67cb95133d3a4029105e6d4dc92519c816288
2019-02-27 22:15:46 +00:00
Zuul
07e48a1ed5 Merge "Add server group for cluster worker nodes" 2019-02-27 12:32:47 +00:00
Zuul
731499c460 Merge "Return instance ID of worker node" 2019-02-27 11:57:34 +00:00
Lingxian Kong
2bbfd52abc [k8s-fedora-atomic] Use ClusterIP for prometheus service
The NodePort type service, by design, bypasses almost all network
security in Kubernetes, so it is not recommended for use in a cloud
environment.

This patch changes the prometheus service type from NodePort to ClusterIP.

Story: #2005098
Task: #29712

Change-Id: Ic47a334bcf81afb87a78a5e66db1a988b473a47e
2019-02-28 00:13:28 +13:00
Feilong Wang
20d03919fb Return instance ID of worker node
Return the nova instance UUID of worker nodes in kubeminion
templates. We will be able to remove resources from the
ResourceGroups based on nova instance uuid.

Backstory:
In heat a ResourceGroup creates a stack of depth 2. ResourceGroups
support removal policies to declare which resources must be removed.
This can be done by passing the index of the resource or the stack_id
of the nested stack. If a stack update call receives a list of
indices (e.g. [0, 5, 3]) or nested stack uuids (e.g. [uuidA, uuidB]), it
will remove the corresponding nested stacks.

In magnum's heat templates, a nested stack logically represents a
nova compute instance, which is a cluster node. Using composition in
heat, we can change the way a resource group references the nested
stacks. This patch proposes to use the nova instance uuid as
'OS::stack_id'.

With this change, an external consumer of the stack (the cluster
autoscaler or an actual user) can remove resources from the
ResourceGroup using the nova instance uuid or resource index. Without
this change, a user or system (which typically knows the name,
server uuid or ip) would have to find which nested stack a
kubernetes node belongs to, resulting in multiple calls to heat.

The end result of this patch can be verified like this:
nested_stack_id=$(openstack stack resource show <STACK_ID_OR_NAME> kube_minions -c physical_resource_id -f value)
openstack stack show "${nested_stack_id}"

Task: 29664
Story: 2005054

Change-Id: I6d776f62d640c72b3228460392b92df94fe56fe6
2019-02-27 10:46:41 +01:00
Feilong Wang
4f84c849f6 Add server group for cluster worker nodes
Currently Magnum only has one server group for all master and worker
nodes per cluster, which is not very flexible at small cloud scale. For
clusters with 3+ masters, capacity is easily exhausted when using a hard
anti-affinity policy. This patch proposes one server group for each
master and worker node group to provide better flexibility.

story: 2004195

Change-Id: If11ba863a2aa538efe1e3e850084bdd33afd27d2
2019-02-27 09:09:20 +00:00
Spyros Trigazis
e6b3325120 Add reno for flannel reboot fix
Change [0] fixed the issue of iptables being reset on node reboot
when flannel was configured, which made pods lose connectivity.

[0] I7f6200a4966fda1cc701749bf1f37ddc492390c5

Change-Id: I07771f2c4711b0b86a53610517abdc3dad270574
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-02-22 11:07:59 +01:00
Diogo Guerra
230ad3f2db [k8s] helm install metrics service
* Add a folder specific to helm-managed resources
* Add the first use case of a helm install script
* Install metrics-server with helm (in parallel with heapster to allow backward compatibility)
* Add extra ARGS to kube-apiserver to enable communication with metrics-server

Known Issues:
  * The Tiller pod is sometimes reported as not active, possibly due to Heartbeat/Healthz

story: 2004816
task: 28980
depends_on: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I1b2432bc09ccde02e43124ed010120b99d853d65
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-02-13 17:34:29 +01:00
Zuul
61173ec6fb Merge "[k8s_fedora] Add heat-agent to worker nodes" 2019-02-13 11:48:03 +00:00
Spyros Trigazis
b2a6a7715a [k8s_fedora] Add heat-agent to worker nodes
Start/Install heat agent in worker nodes.

task: 29140
story: 2002210
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>

Change-Id: If39d0dff3432ba132b8b56eb21b5aae80ba52450
2019-02-13 09:36:33 +00:00
Spyros Trigazis
0b5f4260d9 k8s_fedora: Deploy tiller
Add an enable_tiller label to install tiller in k8s_fedora_atomic
clusters. Defaults to false.

Add a tiller_tag label to select the version of tiller. If the
tag is not set, the tag that matches the helm client version in
the heat-agent will be picked. The tiller image can be stored
in a private registry and the cluster can pull it using the
container_infra_prefix label.

Install tiller securely using a helper container.

TODO:

* add instructions on how RBAC is designed
https://docs.helm.sh/using_helm/#example-deploy-tiller-in-a-namespace-restricted-to-deploying-resources-in-another-namespace
* add docs on how to install add-ons in the cluster using this tiller
* document how users can get the creds to talk to tiller

NOTE:
The main goal of this tiller is internal usage!
Users can still deploy other tillers in other namespaces.

story: 2003902
task: 26780

Change-Id: I99d3a78085ba10030200f12bbfe58a72964e2326
Signed-off-by: dioguerra <dy090.guerra@gmail.com>
2019-02-11 11:18:08 +01:00
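A hedged sketch of enabling the internal tiller described above — the label
names come from the commit, while the tiller_tag value and the cluster and
template names are hypothetical:

openstack coe cluster create helm-cluster \
    --cluster-template k8s-template \
    --labels enable_tiller=true,tiller_tag=v2.12.3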