shipyard

Author	SHA1	Message	Date
Anthony Lin	47cd7a25f4	Align Operators with UcpBaseOperator All UCP Operators will inherit from the UcpBaseOperator [0] This patch set will align the rest of the Operators, i.e. Armada, Deckhand and Promenade Operators with the UcpBaseOperator It also updates the name of the shipyard container to be 'shipyard-api' instead of 'shipyard' [0] https://review.gerrithub.io/#/c/407736/ Change-Id: I516590c492e9bb5554161119dade278d74197374	2018-04-19 16:32:51 +00:00
Anthony Lin	91b60ac595	Add get_k8s_logs Operator Add a method to retrieve logs from Kubernetes Pod Change-Id: I02e59c164881566d4c2b0d5decbe9eb0f3f30d34	2018-04-18 21:48:31 -04:00
Anthony Lin	b9b0e27de0	Add UCP Base Operator 1) Refactor Drydock Base Operator to make use of the UCP Base Operator instead 2) Dump logs from Drydock Pods when there are Exceptions Change-Id: I3fbe03d13b5fc89a503cfb2c3c25751076718554	2018-04-18 14:19:16 +00:00
Anthony Lin	773fcd71cc	[Fix] Update Shipyard Chart - Shipyard FQDN The 'proxy_read_timeout' needs to be a string instead of integer Change-Id: Iaddbb617bb50ddc0aa70649662816e6dfab3d713	2018-04-12 22:58:24 -04:00
Anthony Lin	9269caa227	Shipyard API for Airflow Logs Retrieval Introduce a new endpoint to retrieve Airflow logs - API path: GET /actions/{action_id}/steps/{step_id}/logs?try=2 Change-Id: I6a16cdab148a8a7a9f1bc5fb98a18bce1406cf9f	2018-04-12 09:25:42 -04:00
Bryan Strassner	2e780aef5a	[fix] add labels to shipyard jobs Adds the appropriate labels to the ks-user and ks-service jobs to ensure they can be referenced for deletion. Change-Id: I56d6f67d37e7293f596193a8bf7311e82cac3e7f	2018-04-11 17:23:28 -05:00
Scott Hussey	130eb26ab4	[400207] Fix shipyard FQDN - Update the shipyard chart to leverage the HTK routine for producing the Ingress manifests to be compatible with Ingress public endpoints. Change-Id: I864d0e787cd4cd1c3099894b27d22835b2177b7a	2018-04-09 13:52:43 -05:00
Anthony Lin	e178005143	Update kubernetes-entrypoint This patch set updates the kubernetes-entrypoint image in line with the chart used in OpenStack-Helm in [0]. This allows the chart to use pod dependencies. [0] https://review.openstack.org/#/c/554268/ Change-Id: I5a8bd741a2c7c58b5f110d827872a630953c9ae7	2018-04-02 17:53:54 +00:00
Anthony Lin	d40e9776d3	[398226] Add Resource limits for ks_service job Checks on Shipyard/Airflow chart show that we are missing the resource limits for ks_service job. This patch set will add the resource limits and will also update indentation for 'test-airflow-api' and 'test-shipyard-api'. Change-Id: I0a3f11bb9cbb45a9c8994dbc226c080914a86a1c	2018-03-28 13:23:11 -04:00
Anthony Lin	7219519135	Add Airflow Worker Upgrade Workflow This patch set is meant to create a workflow that will allow us to upgrade the airflow worker without causing disruption to the current running workflow. Note that we will set the update strategy for airflow worker to 'OnDelete'. The 'OnDelete' update strategy implements the legacy (1.6 and prior) behavior. When we select this update strategy, the StatefulSet controller will not automatically update Pods when a modification is made to the StatefulSet’s '.spec.template' field. This strategy can be selected by setting '.spec.updateStrategy.type' to 'OnDelete'. Refer to [0] for more information. [0] https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset Change-Id: I1f6c3564b7fba6abe422b86e36818eb2cd3454ea	2018-03-16 10:18:43 -04:00
Bryan Strassner	fa105e6da8	Change banners to restore attribution Restores the historical attribution in the top-of-file banners. Change-Id: I0bd673e18f0b6c6831c648d00474b1192d03b935	2018-03-15 16:57:20 -05:00
Anthony Lin	ba1e1439e4	Shipyard_API - Liveness and Readiness Probes This patch set does the following to enhance health/status checks on the shipyard-api pod: 1) Add Liveness Probe 2) Update Readiness Probe Change-Id: Ifab63a8724f29fb38124f43d475bb022807a4cce	2018-03-12 04:54:46 +00:00
Pete Birley	74a3743fae	Images: deprecate kolla heat-engine image for LOCI This PS deprecates the kolla heat-engine image for its LOCI replacement. Change-Id: Ie6a445e48b87c30e334690d6e9b7298bbd360430	2018-03-08 22:05:48 -05:00
Bryan Strassner	9edcc7bc20	[383710] Add helm test to Shipyard Also covers [383892] Add helm test to Airflow Provides basic tests to run as helm test during deployment of Shipyard/Airflow. Change-Id: Icc4012f38b6162adf175702dd7f50de46dbfbe47	2018-03-07 22:08:51 -05:00
Anthony Lin	20bdce7137	Remove logging_config_class from values.yaml We are seeing the following error [0] in the Airflow Web GUI which prevents users from reading the workflow logs from the GUI. This is happening as the Airflow Web Pod is not able to directly access the volume of the Airflow Worker Pod. This patch set will remove the parameters that are causing this behavior and revert back to the default system configuration which was shown to be working properly in our local test environment. [0] Error Message Task log handler task does not support read logs. Change-Id: I71cc9ebd5f6571b486af4d77dbd89f234e8dd3b3	2018-02-28 15:29:26 +00:00
Anthony Lin	6c6acbfc80	Add Log Rotate Side Car Container We need a side car container to perform log rotation on the log files. Logs shall be retained for 30 days. This is the default setting and can be changed by updating values.yaml Also cleaned up README.md Change-Id: I39a7797e96abd349160d753f8917f7f78f7d8797	2018-02-27 16:19:19 +00:00
Anthony Lin	80210df387	Remove airflow config template This patch set removes the (pre)generated config ini file from airflow. The configuration will now be pulled directly from values.yaml which will be in line with OpenStack-Helm's approach. This will do away with the need to maintain the verbose .conf.tpl in the repository as mentioned by Tin in his comments for [0]. [0] https://review.gerrithub.io/#/c/400925/ Change-Id: I5a9766e52536ac9b143b397faa3563e69dfb6bf3	2018-02-27 10:18:25 -05:00
Anthony Lin	656d277975	Update Airflow Celery 'result_backend' The current setting in Airflow is different from the recommended one in [0] This patch set is meant to align with the recommended configurations Note also that due to the issue reported in [1], we are keeping the variable 'celery_result_backend' for now and will remove it when we upgrade airflow to Airflow v1.9.1 [0] http://docs.celeryproject.org/en/latest/userguide/configuration.html [1] https://github.com/puckel/docker-airflow/issues/156 Change-Id: Ibead7c2ca76a984c09327579aedade036b959ab2	2018-02-25 22:15:09 -05:00
Anthony Lin	b162715f82	Update Airflow values.yaml The dag will be turned off if 'dags_are_paused_at_creation' is set to "True". This variable should be set to "False" so that we can execute the workflow. Change-Id: Ib9f7d20d2181861d31ad8a22c83ba3481de35eef	2018-02-24 02:54:26 +00:00
Anthony Lin	7ffc8637fc	Update Airflow Config Template There have been significant changes in the Airflow code base with recent software updates. This has resulted in huge changes in airflow.cfg This patch set is meant to align the config file with that of Airflow 1.9.0 [0] [0] https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg Change-Id: I796fb1803c0f80a7486155864fe0a2a87e7a5737	2018-02-22 09:17:15 +00:00
Anthony Lin	d3419123c3	Make Airflow Worker Stateful Set There is a need to make the airflow worker a stateful set so that the name of the pod will be consistent. This will allow us to properly extract and correlate logs in the database. We are also adding pvc for the airflow worker pods so that the logs persist. Change-Id: I79917aa02b38672cac13d6148c4ed44007a78d32	2018-02-21 14:59:59 +00:00
Anthony Lin	258449d688	Remove RabbitMQ Admin User The admin user is not used. We will remove it. Change-Id: I2e62ee55599a0fb4f21e619a292de32e08af1550	2018-02-13 16:26:35 -05:00
Bryan Strassner	1c893ab3ef	Shipyard DB init grant use admin user Updates the db init job for Shipyard to use the DB admin user, connect to the airflow db, and grant the privileges. This changes from trying to connect as the 'airflow' user and the admin user password Change-Id: Ib3dbac2b81129b0a849781175fcce4593df639df	2018-02-07 18:11:12 -06:00
Anthony Lin	cf1e822599	Make Ingress proxy-read-timeout Configurable There is a need to make the proxy-read-timeout configurable so that we can alter the value to handle requests that take more than a minute (default timeout) to process Also increase http-timeout for uwsgi to 600 seconds Change-Id: I25dabc648822252a7918d6272c78fb8ebc236b6c	2018-02-07 18:37:12 +00:00
Anthony Lin	c9d6660d91	Bug Fix - Update Shipyard/Airflow Ingress Port The port should be 80 instead as that is the port that is opened on the Ingress Controller. Change-Id: Ic63ff3601522f47cae15150c07e1a7e8beb7a84a	2018-02-07 10:05:35 -05:00
Anthony Lin	25236ac89b	Make Request Timeout Configurable As the size of the YAMLs increases, the amount of time needed to process the request increased as well. Hence there is a need to make 'timeout' configurable for the deckhand client. Change-Id: Iab91091cd8b9a900ad0daeac22e435d4e5c9c97d	2018-02-07 01:32:10 +00:00
Anthony Lin	eb23a5a0d2	Update Shipyard/Airflow Chart - Database Configurability - Support configured Postgres admin password - Use secrets for database job environment setup This patch set also updates a bunch of banners Change-Id: I238cfd123b5aad31c9cb93864cff7641f719f3df	2018-01-30 10:26:50 -05:00
Krysta	5cc0b5b986	Enable Multi-Workers/Threads for Shipyard Updates to entry.sh to allow for multi-workers/threads Updates Shipyard chart to allow parameters to be configurable Change-Id: I6ad9d198ac4df4c7c85dfcf5c04afd3c7966f0f0	2018-01-26 13:00:15 -06:00
Anthony Lin	14cdfca6d5	Bug Fix - Shipyard DB Sync We are getting the following errors [0] after merging [1] as we will need to use the shipyard image to execute db-sync. This p.s. updates the default value for shipyard-db-sync [0] shipyard-db-sync pod went into CrashLoopBackOff root@labinstance:~# kubectl logs -f shipyard-db-sync-g7mdn -n ucp + upgrade_db /tmp/shipyard-db-sync.sh: line 7: upgrade_db: command not found root@labinstance:~# [1] https://review.gerrithub.io/#/c/395502/ Change-Id: I4a8445ae9431121754b84f42e98192af36335487	2018-01-25 17:26:14 +00:00
Krysta	7fbc3dad25	Add database upgrade entrypoint Removes the database upgrade from start shipyard and instead adds it as an entrypoint, so the database upgrade is only done once. Change-Id: I8c087af58aa46051d0d1c47ba5f35e5e86c1acdc	2018-01-25 09:37:00 -05:00
Anthony Lin	08f228ed91	Merge "Redeploy Server - Dags & Operators"	2018-01-24 22:16:10 -05:00
Anthony Lin	3d88cf9e33	Redeploy Server - Dags & Operators This patch set updates the required dags and operators for the redeploy server workflow. It also introduces the Promenade Operator. Note that many of the required functionalities in DryDock and Promenade are being worked on and are not ready at the moment. As such, this patch set is mainly providing the skeleton framework for the redeploy server workflow. The dags and relevant Operators will be updated at a later date when the features and functionalities are ready for usage. Change-Id: I4baae76ea9d8cde9c2b0bab3feac896d01400868	2018-01-24 17:34:51 +00:00
Anthony Lin	4991d8f6ff	Update RBAC rules for Airflow Workers We are getting the following errors [0] while getting Airflow worker to execute a health check on the underlying K8s cluster. This patch set is meant to grant watch/get/list pods rights to the airflow worker so that it can perform health checks on the K8s cluster. [0] Error messages: [2018-01-23 02:51:32,003] {base_task_runner.py:98} INFO - Subtask: HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure", "message":"pods is forbidden: User \"system:serviceaccount:ucp:airflow-worker\" cannot list pods at the cluster scope","reason":"Forbidden","details":{"kind":"pods"},"code":403} Change-Id: Iede29f605b5d508d0e58c0c2ae74d7d040d5b8ea	2018-01-24 03:13:49 +00:00
Anthony Lin	b379477236	RBAC: Update serviceaccount and k8s rbac for Airflow This patch set brings the airflow/shipyard chart in line with the OSH* RBAC approach used in [0] and [1] [0] https://review.openstack.org/#/c/526464/52 [1] https://review.openstack.org/#/c/529378/ Change-Id: Id2ff9f59028474601933196e1722b46c95f3a8ac	2018-01-22 16:47:47 +00:00
Anthony Lin	5190189a60	Update DryDock Operator The following errors [0] were encountered during our end-to-end testing. This is a result of extended execution of the workflow that led to expiration of the keystone token. It is also possible for the 'prepare_site' task to take more than 120 seconds to complete. Hence we are increasing the time out for the 'prepare_site_task_timeout' variable to 300 seconds. This P.S. addresses the above 2 observations [0] Logs from DryDock Authorization failed for token Identity response: {"error": {"message": "Failed to validate token", "code": 404, "title": "Not Found"}} Authorization failed for token Change-Id: I4760e390822e6e8c9540216035e263d054fde400	2018-01-06 05:49:12 +00:00
Anthony Lin	5db6d42050	RBAC: Update serviceaccount and k8s rbac for shipyard This patch set brings the shipyard chart in line with the OSH* RBAC approach used in [0] and [1]. [0] https://review.openstack.org/#/c/526464/52 [1] https://review.openstack.org/#/c/529378/ Change-Id: I608d00a69729e347b4121745e80f1e9760e5f6d4	2017-12-28 17:56:02 +00:00
Anthony Lin	768981df44	Refactor UCP Health Check Operator There have been significant changes to the Shipyard code base since the last major update to the UCP Health Check Operator. This patch set is meant to align its implementation with the rest of the Operators. It removes the usage of 'urlopen', which can be a security risk, and makes use of the python 'requests' module instead. We are also adding 'timeout' parameters to the other Operators that are using 'requests.get' as failure to do so can cause the Operator(s) to hang indefinitely. The default time out has been set to 30 seconds. It is noted that nearly all production code should use this parameter in nearly all requests. Change-Id: I1205aab38ff120cd239c236dc9bdffd1660c9afb	2017-12-18 17:24:28 +00:00
Anthony Lin	ed8107baad	Add Backoff time before checking cluster join The current logic checks for nodes that started the join process (based on the snapshot of the environment that was taken by the operator at that point in time). It will not check the state of nodes that it is not aware of, i.e. those that it did not capture initially will not be checked. Hence there is a need to introduce backoff time as it takes a while before all the nodes start to join the Cluster. This is a short term stop gap approach until the Promenade API is ready for consumption Change-Id: I2bdf9c970ecb509fe833fd353e6648a97118d79b	2017-12-08 08:38:53 +00:00
Anthony Lin	55ae811742	Set default number of replicas to 2 There is a need to set the number of replicas to 2 for redundancies/resiliency Change-Id: I876c74d0a71d5d03c9158228eff9f819e227b837	2017-12-06 20:13:06 +00:00
portdirect	b28e08f0f1	Images: Remove Kolla-Toolbox image as not required This ps removes the last references to Kolla-Toolbox which is not required for keystone management jobs. Change-Id: I7ca1b93a2485b8eafdd6a48fc4c26c049f20d9cd	2017-11-16 12:12:05 -05:00
Scott Hussey	c5d55677c5	Update to use latest entrypoint container image Update the dep_check image to use the latest Stackanetes entrypoint image. Change-Id: I9f0720be3390109d3972a778816332e85323ab56	2017-11-15 10:00:56 -06:00
Anthony Lin	28c24eb221	Fix typo in Shipyard Chart There was a need to make changes to the variable names when we merged the Airflow and Shipyard charts. This resulted in a typo in a variable name in service-shipyard-ingress.yaml which stops the service from being deployed. This P.S. is meant to correct that behavior We also need the public endpoint for shipyard to be on port 80 so that the CLI will work properly with the Ingress Controller Change-Id: I0483e8ab9e3eb7839149413311abb8c1475f59fa	2017-11-04 15:32:31 +00:00
Anthony Lin	251bfff83e	Update Shipyard Helm Chart This patch set removes the shipyard config, policy and paste.ini template from the existing Shipyard Helm Chart. This is done to align with the current approach in OpenStack Helm. 1) Remove shipyard config template 2) Remove shipyard policy.yaml template 3) Remove shipyard api-paste.ini template 4) Update related template files There has also been a recent change to the Helm Toolkit which will break the current implementation of the Shipyard Chart The changes in Helm Toolkit were made to the 'images' definition in values.yaml to facilitate adding the option to prefix image name etc This P.S. will also update the Shipyard Chart to align with the recent changes in Helm Toolkit Change-Id: Ie79fd9da2c9a577027dd0dddbcca6b7f7b3b4f6f	2017-10-24 15:23:15 +00:00
Anthony Lin	dfa7cedb19	Update DryDock Operator & Shipyard Chart This Patch Set is meant to expose the 'query_interval' and 'task_timeout' parameters for Drydock tasks in Shipyard. This will allow us to specify the values for a particular site. The corresponding changes for the Helm Chart are included in this Patch Set as well. It is also noted that the task has been updated to 'prepare_nodes' and 'deploy_nodes' instead. Task State can either be 'completed' or 'terminated'. These new changes have been captured in this Patch Set as well. Change-Id: I1b446f7bcf493bc8e5bbfdba842158797f0e3594	2017-10-24 01:53:58 +00:00
Anthony Lin	b002bd58fd	Move Shipyard Chart This PS migrates the Shipyard Chart into this repo Change-Id: I2cf037ab662886a94c8439f43d248da9295a83b3	2017-10-20 02:34:03 +00:00

45 Commits