4628 Commits

Author SHA1 Message Date
Zuul
e63611ad1a Merge "ng-1: Add nodegroup representation" 2019-03-24 20:48:47 +00:00
Zuul
f1f96e5835 Merge "add python 3.7 unit test job" 2019-03-22 17:24:41 +00:00
Zuul
5c586aed3c Merge "Fix openstack-cloud-controller-manager restarts" 2019-03-21 21:18:34 +00:00
Theodoros Tsioutsias
0607c7a9d6 ng-1: Add nodegroup representation
This adds the object and db schema changes needed for supporting
nodegroups.

story: 2005266

Change-Id: Ibf10277a52aa94c4b217cf3b364844b04baab1e0
2019-03-21 16:19:56 +00:00
Diogo Guerra
a46d2ffc91 [k8s] Install prometheus monitoring with helm
The Helm stable repository includes a prometheus-operator chart.
This stable/prometheus-operator chart can be used to install
prometheus together with all of its dependencies and some sensible
default configurations.
The installed extra charts are:
  * stable/prometheus-node-exporter (data scraping)
  * stable/prometheus (prometheus and alertmanager server)
  * stable/grafana (visualization dashboard)
  * stable/prometheus-operator (supervision and simple configuration)

The prometheus-operator is installed by setting the label
monitoring_enabled=True. The label grafana_admin_passwd can be used
to set the admin password for access to the grafana dashboard.

This patch allows the maintenance of the prometheus monitoring stack
to be handed over to the kubernetes/helm community.
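
For illustration, a cluster enabling the stack might be created like
this (the label names come from this change; the CLI invocation,
template name and password value are illustrative only):

openstack coe cluster create my-cluster \
    --cluster-template k8s-fedora-atomic \
    --labels monitoring_enabled=True,grafana_admin_passwd=s3cr3t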

Task: 28544
Story: 2004623
depends_on: I99d3a78085ba10030200f12bbfe58a72964e2326
Change-Id: I80d590785bf30f9d634debeaf51c0d4cce0aeb93
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
8.0.0.0rc1
2019-03-21 13:25:04 +01:00
Zuul
d1957c71dc Merge "Improve floating IP allocation" 2019-03-20 18:12:43 +00:00
Diogo Guerra
21acb8dc9a Fix openstack-cloud-controller-manager restarts
Openstack-cloud-controller-manager restarts several times during
cluster creation.

This happens because cloud-controller-manager starts running before
the secrets it needs exist in kubernetes. Cloud-controller-manager
lists secrets; if a secret exists it uses it and moves on, but if the
secret doesn't exist yet it starts a watch until it does. As the
watch is not allowed, the pod fails.

This is triggered by the upstream issue
https://github.com/kubernetes/cloud-provider-openstack/issues/545

Story: 2005270

Change-Id: If8f34dc45b3b8a76e3d561ed41b4d0a783ceecb5
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-03-20 14:55:23 +01:00
Zuul
342023e870 Merge "Migrate legacy jobs to Ubuntu Bionic" 2019-03-20 08:15:57 +00:00
Lingxian Kong
c47fde0cbe Improve floating IP allocation
- Never allocate a floating IP for the etcd service.
- Introduce a new label `master_lb_floating_ip_enabled` which controls
  whether Magnum allocates a floating IP for the master load balancer.
  This label only takes effect when `master_lb_enabled` is set. The
  default value is the same as `floating_ip_enabled` (see the example
  below).
- The `floating_ip_enabled` property now only controls whether Magnum
  should allocate floating IPs for the master and worker nodes.
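
A usage sketch (the label comes from this change; the cluster and
template names are illustrative):

openstack coe cluster create my-cluster \
    --cluster-template k8s-template \
    --master-count 3 \
    --labels master_lb_floating_ip_enabled=False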

Change-Id: I0a232406deaf112b0cb9e445735d7b49206c676d
Story: #2005153
Task: #29868
2019-03-20 18:44:45 +13:00
Zuul
0cd35dbcca Merge "Support <ClusterID>/actions/resize API" 2019-03-19 22:16:15 +00:00
Feilong Wang
15ecdb8033 Support <ClusterID>/actions/resize API
An OpenStack driver for the Kubernetes Cluster Autoscaler is being
proposed to support autoscaling when running a k8s cluster on top of
OpenStack. However, there is currently no way in Magnum for an
external consumer to control which node will be removed. The
alternative is calling the Heat API directly, but that is clearly
not the best solution and it confuses the k8s community. So with
this patch, we're going to add a new API:

POST <ClusterID>/actions/resize

And the post body will be:

{
    "node_count": 3,
    "nodes_to_remove": ["dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2"],
    "nodegroup": "production_group"
}

The API works in a declarative way. For example, if there are
3 nodes in the cluster, a user can send a request like the one
above. Magnum will first call Heat to remove the node
dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2, then bring the node count
back to 3 again.
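
A sketch of calling the new endpoint directly (the /v1/clusters path
and the token/endpoint variables are assumptions; the request body is
the example above):

curl -X POST \
    -H "X-Auth-Token: ${OS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"node_count": 3, "nodes_to_remove": ["dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2"]}' \
    "${MAGNUM_ENDPOINT}/v1/clusters/<ClusterID>/actions/resize"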

Task: 29563
Story: 2005052

Change-Id: I7e36ce82c3f442976cc498153950b19c56a1759f
2019-03-19 20:13:17 +00:00
Spyros Trigazis
13e8c11f78 k8s_fedora: Add ca_key before all deployments
The script [1] that writes the ca.key depends on the apiserver being
up, while the script that starts the apiserver [0] needs the ca.key
to exist.

Write the ca_key before all other scripts that depend on the apiserver.

story: 2005254
task: 30051

[0]
https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-services-master.sh
[1]
https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml#L843

Change-Id: If532ccc4673225eb1b7e7cab77a30950ee5ee695
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-03-18 10:48:06 +01:00
Zuul
0da8288ada Merge "ci: Disable functional tests" 2019-03-13 11:26:21 +00:00
ghanshyam
b5a6ee1dc1 Migrate legacy jobs to Ubuntu Bionic
We migrated the zuulv3 jobs to Bionic during December/January.
 - http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000837.html
 - https://etherpad.openstack.org/p/devstack-bionic
But that effort did not move all gate jobs to Bionic, as a large
number of jobs are still legacy jobs. All the legacy jobs still
use Xenial as their nodeset.

Per the decided runtime for Stein, we need to test everything on OpenStack
CI/CD on Bionic - https://governance.openstack.org/tc/reference/runtimes/stein.html

The patch below moves the legacy base jobs to Bionic, which automatically
moves the derived jobs to Bionic as well. These jobs are modified with a
branch variant so that they use Bionic nodes from Stein onwards and Xenial
for all stable branches up to stable/rocky.
- https://review.openstack.org/#/c/639096

This commit removes the overridden nodeset from the magnum legacy jobs
so that they start using the nodeset defined in the parent job.

More details:
- https://etherpad.openstack.org/p/legacy-job-bionic
- http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003614.html

Depends-On: https://review.openstack.org/#/c/641886/
Change-Id: Ia5f037432f4c5925f916e19cbe8a3253869674d9
2019-03-13 01:24:50 +00:00
Zuul
e6f4969539 Merge "[fedora-atomic-k8s] Adding Node Problem Detector" 2019-03-12 22:05:22 +00:00
Feilong Wang
c39f1150e5 [fedora-atomic-k8s] Adding Node Problem Detector
Deploy Node Problem Detector to all nodes to detect problems which
can be leveraged by auto healing. This is the first step toward
enabling the auto healing feature.
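
A quick check that the detector landed on every node (the namespace
and DaemonSet name are assumptions, not confirmed by this change):

kubectl -n kube-system get daemonset node-problem-detector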

Task: 29886
Story: 2004782

Change-Id: I1b6075025c5f369821b4136783e68b16535dc6ef
2019-03-11 22:39:50 +00:00
Zuul
988cbb8b49 Merge "Add missing ws separator between words" 2019-03-11 22:17:41 +00:00
Spyros Trigazis
16c2a4cfe3 ci: Disable functional tests
We currently run only vexxhost with nested
virtualization. Due to a kernel change all
functional jobs are failing.

Change-Id: I9ab45da36dbc5618587b4795658b4f4bb264f2c8
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-03-11 20:20:22 +01:00
Jonathan Rosser
2595fda3e3 Ensure http proxy environment is available during 'atomic install' for k8s
The scripts run by cloud-init for the master and minion nodes currently
write proxy environment variables into /etc/bashrc when they are defined.

These variables are only introduced into the running environment when a
new bash shell is started. The /bin/sh used by the fragment scripts
ignores /etc/bashrc, so the new shells invoked per fragment do not have
the http proxy variables present. This means that master/minion node
deployment fails when behind an http proxy.

This patch adds explicit exports for HTTP_PROXY and HTTPS_PROXY when those
variables are defined and not empty.
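
A sketch of the shape of the fix, assuming the fragments keep the
variable names above; the actual script may differ:

if [ -n "${HTTP_PROXY}" ]; then
    export HTTP_PROXY
fi
if [ -n "${HTTPS_PROXY}" ]; then
    export HTTPS_PROXY
fi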

Task: 29863
Change-Id: Id05c90d5bf99d720ae6002b38d3291e364e1e0c4
2019-03-07 22:16:38 +00:00
Zuul
90dfeaa491 Merge "Fix swarm functional job" 2019-03-07 21:37:46 +00:00
Zuul
24775e0eb3 Merge "Update min tox version to 2.0" 2019-03-07 21:37:45 +00:00
Zuul
f0175f6aac Merge "[k8s] Make flannel self-hosted" 2019-03-07 21:37:40 +00:00
Zuul
722fc56eb3 Merge "Return health_status for cluster listing" 2019-03-07 11:05:58 +00:00
Zuul
373286368d Merge "make sure to set node_affinity_policy for Mesos template definition" 2019-03-06 21:10:57 +00:00
Zuul
c11c40a04d Merge "Fix prometheus installation script" 2019-03-06 15:44:39 +00:00
Zuul
6505aa360d Merge "Do not exit in the enable-helm-tiller script" 2019-03-06 09:46:49 +00:00
Spyros Trigazis
2ab874a5be [k8s] Make flannel self-hosted
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.

This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.

Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
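
For illustration, the controller manager invocation then gains roughly
the following flags (the CIDR value is a placeholder, not the template
default):

kube-controller-manager \
    --allocate-node-cidrs=true \
    --cluster-cidr=10.100.0.0/16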

Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
2019-03-05 18:33:45 +01:00
Nguyen Hai Truong
18fc68dd26 Update min tox version to 2.0
The commands used by constraints need at least tox 2.0.
Update to reflect reality, which should help with local running of
constraints targets.

Change-Id: Iece749b90ec90bec1f5324bc351878e6252720ed
2019-03-05 11:56:54 +11:00
Feilong Wang
83c8b13bf0 Release k8s v1.11.8, v1.12.6 and v1.13.4
Release new k8s versions because of CVE-2019-1002100 [1]

[1] https://discuss.kubernetes.io/t/kubernetes-security-announcement-v1-11-8-1-12-6-1-13-4-released-to-address-medium-severity-cve-2019-1002100/5147

Task: 29789
Story: 2005124

Change-Id: I6435a10b05932ea71e825e944d53859eba374e91
2019-03-03 20:55:47 +00:00
Guang Yee
a47f5a3994 make sure to set node_affinity_policy for Mesos template definition
Fixes the problem with Mesos cluster creation where the
nodes_affinity_policy was not properly conveyed; it is required
in order to create the corresponding server group in Nova.

Change-Id: Ie8d73247ba95f20e24d6cae27963d18b35f8715a
story: 2005116
2019-03-01 15:49:06 -08:00
Zuul
e256f87d1a Merge "[k8s-fedora-atomic] Use ClusterIP for prometheus service" 2019-03-01 02:36:49 +00:00
Feilong Wang
e4b05bbd1a Fix swarm functional job
The swarm functional job currently fails due to a regression caused by
If11ba863a2aa538efe1e3e850084bdd33afd27d2. This patch fixes it.

Task: 29766
Story: 2004195

Change-Id: I830ab66775e0dd57766cdab25d06500d85651dc1
2019-03-01 14:36:33 +13:00
Lingxian Kong
2cf4df0850 Fix prometheus installation script
- Fix the indentation in the file.
- Use 'kubectl apply' instead of 'kubectl create' for a more robust
  service restart (see the example below).
- Do not retry infinitely when the Prometheus datasource has already
  been injected into Grafana.
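
For example (the file name is illustrative), 'kubectl apply' is
idempotent where 'kubectl create' fails on a second run:

kubectl apply -f prometheus-service.yaml   # creates, or updates in place
kubectl create -f prometheus-service.yaml  # errors with AlreadyExists on re-run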

Story: #2005117
Task: #29765

Change-Id: I5857fe62f922d27860946fd318296950834a8797
2019-03-01 14:16:36 +13:00
Feilong Wang
8c8cd7d199 Return health_status for cluster listing
Task: 29761
Story: 2002742

Change-Id: If702584fabe1402257b45db281561a5f5b83b972
2019-03-01 12:08:01 +13:00
Lingxian Kong
3695536085 Do not exit in the enable-helm-tiller script
The scripts included in the Heat kube_cluster_config resource should not exit
if the particular step is skipped.
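
Since the fragments run as one concatenated script, an exit in a
skipped step would abort every step after it. A sketch of the pattern
(the variable name is illustrative):

if [ "${TILLER_ENABLED}" = "true" ]; then
    echo "installing helm tiller"
    # ... install tiller here ...
else
    echo "helm tiller disabled, skipping"
fi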

Change-Id: I2d4cf54631c8ed3a9eb30b3e6c8e1af0007e23d5
Story: #2005109
Task: #29743
2019-03-01 12:03:52 +13:00
Zuul
57a3b73fa0 Merge "Fix async reserved word in python3.7" 2019-02-28 17:03:18 +00:00
Zuul
c181fce90d Merge "FakeLoopingCall raises IOError" 2019-02-28 17:03:13 +00:00
Zuul
6d85d7be56 Merge "python3 fix: decode binary cert data if encountered" 2019-02-28 11:28:40 +00:00
Theodoros Tsioutsias
14b46ea22b FakeLoopingCall raises IOError
All unit tests using FakeLoopingCall raise an IOError if an initial
delay is not specified, because the default initial_delay is -1.
Change the default initial delay to 0.

story: 2005112
task: 29748
Change-Id: I6cbae0996c2347e25d8be617e4b3fd93f4d9cc95
2019-02-28 10:01:17 +00:00
Zuul
d76ab4da80 Merge "[k8s-fedora-atomic] Security group definition for worker nodes" 2019-02-27 23:59:12 +00:00
Lingxian Kong
31c82625d6 [k8s-fedora-atomic] Security group definition for worker nodes
Define stricter security group rules for kubernetes worker nodes. The
ports that are open by default: the default NodePort range (30000-32767)
for external service ports; the kubelet healthcheck port; Calico BGP
network ports; flannel overlay network ports. The cluster admin should
manually configure the security group on the nodes where Traefik is
allowed.
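
A sketch of such a manual step, opening the HTTP port for Traefik (the
security group name and port are illustrative):

openstack security group rule create \
    --protocol tcp --dst-port 80 \
    <worker-security-group>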

Story: #2005082
Task: #29661
Change-Id: Idbc67cb95133d3a4029105e6d4dc92519c816288
2019-02-27 22:15:46 +00:00
Zuul
07e48a1ed5 Merge "Add server group for cluster worker nodes" 2019-02-27 12:32:47 +00:00
Zuul
731499c460 Merge "Return instance ID of worker node" 2019-02-27 11:57:34 +00:00
Lingxian Kong
2bbfd52abc [k8s-fedora-atomic] Use ClusterIP for prometheus service
A NodePort type service, by design, bypasses almost all network
security in Kubernetes, so it is not recommended in cloud
environments.

This patch changes the prometheus service type from NodePort to ClusterIP.
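
With the NodePort gone, one way to reach prometheus from outside the
cluster (the namespace and service name are assumptions):

kubectl -n prometheus-monitoring port-forward svc/prometheus 9090:9090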

Story: #2005098
Task: #29712

Change-Id: Ic47a334bcf81afb87a78a5e66db1a988b473a47e
2019-02-28 00:13:28 +13:00
Zuul
138472dcf1 Merge "Add reno for flannel reboot fix" 2019-02-27 10:00:52 +00:00
Feilong Wang
20d03919fb Return instance ID of worker node
Return the nova instance UUID of worker nodes in kubeminion
templates. We will be able to remove resources from the
ResourceGroups based on nova instance uuid.

Backstory:
In heat a ResourceGroup creates a stack of depth 2. ResourceGroups
support removal policies to declare which resources must be removed.
This can be done by passing the index of the resource or the stack_id
of the nested stack. If a stack update call receives a list of
indices (e.g. [0, 5, 3]) or nested stack uuids (e.g. [uuidA, uuidB]),
it will remove the corresponding nested stacks.

In magnum's heat templates, a nested stack logically represents a
nova compute instance which is a cluster node. Using composition in
heat, we can change the way a resource group references the nested
stacks. This patch proposes to use the nova instance uuid as
'OS::stack_id'.

With this change, an external consumer of the stack (the cluster
autoscaler or an actual user) can remove resources from the
ResourceGroup using the nova instance uuid or resource index. Without
this change, a user or system (which typically knows the name,
server uuid or ip) would have to find out which nested stack a
kubernetes node belongs to, resulting in multiple calls to heat.

The end result of this patch can be verified like this:
nested_stack_id=$(openstack stack resource show <STACK_ID_OR_NAME> kube_minions -c physical_resource_id -f value)
openstack stack show "${nested_stack_id}"
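
As a follow-up check (standard openstack CLI column flags), each nested
stack's physical_resource_id should now be the nova instance uuid:

openstack stack resource list "${nested_stack_id}" -c resource_name -c physical_resource_id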

Task: 29664
Story: 2005054

Change-Id: I6d776f62d640c72b3228460392b92df94fe56fe6
2019-02-27 10:46:41 +01:00
Feilong Wang
4f84c849f6 Add server group for cluster worker nodes
Magnum currently has only one server group for all master and worker
nodes per cluster, which is not very flexible for small clouds. A
cluster with 3+ masters can easily hit the capacity limit when a hard
anti-affinity policy is used. This patch proposes one server group for
each of the master and worker node groups for better flexibility.

story: 2004195

Change-Id: If11ba863a2aa538efe1e3e850084bdd33afd27d2
2019-02-27 09:09:20 +00:00
Jake Yip
ea362b1391 python3 fix: decode binary cert data if encountered
We are writing to files opened in text mode ('w+'), so binary cert
data has to be decoded before writing.

Task: 29577
Story: 2005057
Change-Id: I034d0230c3022e701111bdc71f0af43da1852c3c
2019-02-27 19:47:38 +11:00
Nguyen Hai Truong
055384343f Add python 3.6 unit test job
This is a mechanically generated patch to add a unit test job running
under Python 3.6 as part of the python3-first goal.

See the python3-first goal document for details:
https://governance.openstack.org/tc/goals/stein/python3-first.html

Change-Id: I5a92105f7cfbcabf521150d65f89b14cea62db0f
2019-02-23 18:01:18 +11:00
Spyros Trigazis
e6b3325120 Add reno for flannel reboot fix
Change [0] fixed the issue of resetting iptables on node reboot
when flannel was configured, which made pods lose connectivity.

[0] I7f6200a4966fda1cc701749bf1f37ddc492390c5

Change-Id: I07771f2c4711b0b86a53610517abdc3dad270574
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-02-22 11:07:59 +01:00