45 Commits

Author SHA1 Message Date
Anthony Lin
47cd7a25f4 Align Operators with UcpBaseOperator
All UCP Operators will inherit from the UcpBaseOperator [0]

This patch set will align the rest of the Operators, i.e. Armada,
Deckhand and Promenade Operators with the UcpBaseOperator

It also updates the name of the shipyard container to be
'shipyard-api' instead of 'shipyard'

[0] https://review.gerrithub.io/#/c/407736/

Change-Id: I516590c492e9bb5554161119dade278d74197374
2018-04-19 16:32:51 +00:00
Anthony Lin
91b60ac595 Add get_k8s_logs Operator
Add a method to retrieve logs from Kubernetes Pod

Change-Id: I02e59c164881566d4c2b0d5decbe9eb0f3f30d34
2018-04-18 21:48:31 -04:00
Anthony Lin
b9b0e27de0 Add UCP Base Operator
1) Refactor Drydock Base Operator to make use of the
   UCP Base Operator instead

2) Dump logs from Drydock Pods when there are Exceptions

Change-Id: I3fbe03d13b5fc89a503cfb2c3c25751076718554
2018-04-18 14:19:16 +00:00
Anthony Lin
773fcd71cc [Fix] Update Shipyard Chart - Shipyard FQDN
The 'proxy_read_timeout' needs to be a string instead of integer

Change-Id: Iaddbb617bb50ddc0aa70649662816e6dfab3d713
2018-04-12 22:58:24 -04:00
Anthony Lin
9269caa227 Shipyard API for Airflow Logs Retrieval
Introduce a new endpoint to retrieve Airflow logs

- API path:
   GET /actions/{action_id}/steps/{step_id}/logs?try=2
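
A minimal sketch of how a client might assemble the URL for this endpoint. The base URL, identifiers, and the helper name `build_step_logs_url` are hypothetical and used for illustration only; only the path shape and the `try` query parameter come from the commit message.

```python
from urllib.parse import quote, urlencode


def build_step_logs_url(base_url, action_id, step_id, try_number=None):
    """Build the URL for GET /actions/{action_id}/steps/{step_id}/logs."""
    # Percent-encode the identifiers so they are safe as path segments.
    path = "/actions/{}/steps/{}/logs".format(
        quote(action_id, safe=""), quote(step_id, safe=""))
    url = base_url.rstrip("/") + path
    # The 'try' query parameter is optional; add it only when requested.
    if try_number is not None:
        url += "?" + urlencode({"try": try_number})
    return url


# Hypothetical base URL and identifiers, for illustration only.
print(build_step_logs_url("http://shipyard.example/api", "action-01", "step-a", try_number=2))
```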

Change-Id: I6a16cdab148a8a7a9f1bc5fb98a18bce1406cf9f
2018-04-12 09:25:42 -04:00
Bryan Strassner
2e780aef5a [fix] add labels to shipyard jobs
Adds the appropriate labels to the ks-user and ks-service jobs
to ensure they can be referenced for deletion.

Change-Id: I56d6f67d37e7293f596193a8bf7311e82cac3e7f
2018-04-11 17:23:28 -05:00
Scott Hussey
130eb26ab4 [400207] Fix shipyard FQDN
- Update the shipyard chart to leverage the HTK routine
  for producing the Ingress manifests to be compatible
  with Ingress public endpoints.

Change-Id: I864d0e787cd4cd1c3099894b27d22835b2177b7a
2018-04-09 13:52:43 -05:00
Anthony Lin
e178005143 Update kubernetes-entrypoint
This patch set updates the kubernetes-entrypoint image in line with
the chart used in OpenStack-Helm in [0]. This allows the chart to
use pod dependencies.

[0] https://review.openstack.org/#/c/554268/

Change-Id: I5a8bd741a2c7c58b5f110d827872a630953c9ae7
2018-04-02 17:53:54 +00:00
Anthony Lin
d40e9776d3 [398226] Add Resource limits for ks_service job
Checks on Shipyard/Airflow chart show that we are missing the
resource limits for ks_service job.

This patch set will add the resource limits and will also update
indentation for 'test-airflow-api' and 'test-shipyard-api'.

Change-Id: I0a3f11bb9cbb45a9c8994dbc226c080914a86a1c
2018-03-28 13:23:11 -04:00
Anthony Lin
7219519135 Add Airflow Worker Upgrade Workflow
This patch set is meant to create a workflow that will allow us
to upgrade the airflow worker without causing disruption to the
current running workflow.

Note that we will set the update strategy for airflow worker
to 'OnDelete'. The 'OnDelete' update strategy implements the legacy
(1.6 and prior) behavior. When we select this update strategy, the
StatefulSet controller will not automatically update Pods when a
modification is made to the StatefulSet's '.spec.template' field.
This strategy can be selected by setting '.spec.updateStrategy.type'
to 'OnDelete'. Refer to [0] for more information.

[0] https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
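
As a rough illustration, the strategy is a field on the StatefulSet spec itself ('.spec.updateStrategy.type' in the Kubernetes API). The sketch below only builds the patch body as a Python dict. The helper name `on_delete_patch` is hypothetical, and how the patch is applied to the airflow worker is not shown in this commit.

```python
import json


def on_delete_patch():
    """Return a StatefulSet patch body that selects the OnDelete strategy."""
    # With OnDelete, Pods are recreated from the new template only after
    # they have been deleted manually.
    return {"spec": {"updateStrategy": {"type": "OnDelete"}}}


print(json.dumps(on_delete_patch(), indent=2))
```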

Change-Id: I1f6c3564b7fba6abe422b86e36818eb2cd3454ea
2018-03-16 10:18:43 -04:00
Bryan Strassner
fa105e6da8 Change banners to restore attribution
Restores the historical attribution in the top-of-file banners.

Change-Id: I0bd673e18f0b6c6831c648d00474b1192d03b935
2018-03-15 16:57:20 -05:00
Anthony Lin
ba1e1439e4 Shipyard_API - Liveness and Readiness Probes
This patch set does the following to enhance health/status checks
on the shipyard-api pod:

1) Add Liveness Probe
2) Update Readiness Probe

Change-Id: Ifab63a8724f29fb38124f43d475bb022807a4cce
2018-03-12 04:54:46 +00:00
Pete Birley
74a3743fae Images: deprecate kolla heat-engine image for LOCI
This PS deprecates the kolla heat-engine image for its LOCI
replacement.

Change-Id: Ie6a445e48b87c30e334690d6e9b7298bbd360430
2018-03-08 22:05:48 -05:00
Bryan Strassner
9edcc7bc20 [383710] Add helm test to Shipyard
Also covers [383892] Add helm test to Airflow
Provides basic tests to run as helm test during deployment
of Shipyard/Airflow.

Change-Id: Icc4012f38b6162adf175702dd7f50de46dbfbe47
2018-03-07 22:08:51 -05:00
Anthony Lin
20bdce7137 Remove logging_config_class from values.yaml
We are seeing the following error [0] in the Airflow
Web GUI which prevents users from reading the workflow
logs from the GUI.

This is happening as the Airflow Web Pod is not able
to directly access the volume of the Airflow Worker
Pod.

This patch set will remove the parameters that are
causing this behavior and revert back to the default
system configuration which was shown to be working
properly in our local test environment.

[0] Error Message

Task log handler task does not support read logs.

Change-Id: I71cc9ebd5f6571b486af4d77dbd89f234e8dd3b3
2018-02-28 15:29:26 +00:00
Anthony Lin
6c6acbfc80 Add Log Rotate Side Car Container
We need a side car container to perform log rotation
on the log files. Logs shall be retained for 30 days.
This is the default setting and can be changed by updating
values.yaml

Also cleaned up README.md

Change-Id: I39a7797e96abd349160d753f8917f7f78f7d8797
2018-02-27 16:19:19 +00:00
Anthony Lin
80210df387 Remove airflow config template
This patch set removes the (pre)generated config ini file from
airflow. The configuration will now be pulled directly from
values.yaml which will be in line with OpenStack-Helm's approach.

This will do away with the need to maintain the verbose .conf.tpl
in the repository as mentioned by Tin in his comments for [0].

[0] https://review.gerrithub.io/#/c/400925/

Change-Id: I5a9766e52536ac9b143b397faa3563e69dfb6bf3
2018-02-27 10:18:25 -05:00
Anthony Lin
656d277975 Update Airflow Celery 'result_backend'
The current settings in Airflow are different from the recommended
one in [0]

This patch set is meant to align with the recommended configurations

Note also that due to the issue reported in [1], we are keeping the
variable 'celery_result_backend' for now and will remove it when
we upgrade airflow to Airflow v1.9.1

[0] http://docs.celeryproject.org/en/latest/userguide/configuration.html
[1] https://github.com/puckel/docker-airflow/issues/156

Change-Id: Ibead7c2ca76a984c09327579aedade036b959ab2
2018-02-25 22:15:09 -05:00
Anthony Lin
b162715f82 Update Airflow values.yaml
The dag will be turned off if 'dags_are_paused_at_creation' is
set to "True". This variable should be set to "False" so
that we can execute the workflow.

Change-Id: Ib9f7d20d2181861d31ad8a22c83ba3481de35eef
2018-02-24 02:54:26 +00:00
Anthony Lin
7ffc8637fc Update Airflow Config Template
There have been significant changes in the Airflow
code base with recent software updates. This has
resulted in huge changes in airflow.cfg

This patch set is meant to align the config file
with that of Airflow 1.9.0 [0]

[0] https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg

Change-Id: I796fb1803c0f80a7486155864fe0a2a87e7a5737
2018-02-22 09:17:15 +00:00
Anthony Lin
d3419123c3 Make Airflow Worker Stateful Set
There is a need to make the airflow worker a stateful set
so that the name of the pod will be consistent. This will
allow us to properly extract and correlate logs in the
database.

We are also adding pvc for the airflow worker pods so that
the logs persist.

Change-Id: I79917aa02b38672cac13d6148c4ed44007a78d32
2018-02-21 14:59:59 +00:00
Anthony Lin
258449d688 Remove RabbitMQ Admin User
The admin user is not used. We will remove it.

Change-Id: I2e62ee55599a0fb4f21e619a292de32e08af1550
2018-02-13 16:26:35 -05:00
Bryan Strassner
1c893ab3ef Shipyard DB init grant use admin user
Updates the db init job for Shipyard to use the DB admin user,
connect to the airflow db, and grant the privileges. Previously the
job tried to connect as the 'airflow' user with the admin user password.
Change-Id: Ib3dbac2b81129b0a849781175fcce4593df639df
2018-02-07 18:11:12 -06:00
Anthony Lin
cf1e822599 Make Ingress proxy-read-timeout Configurable
There is a need to make the proxy-read-timeout configurable
so that we can alter the value to handle requests that take
more than a minute (default timeout) to process

Also increase http-timeout for uwsgi to 600 seconds

Change-Id: I25dabc648822252a7918d6272c78fb8ebc236b6c
2018-02-07 18:37:12 +00:00
Anthony Lin
c9d6660d91 Bug Fix - Update Shipyard/Airflow Ingress Port
The port should be 80 instead as that is the port that is
opened on the Ingress Controller.

Change-Id: Ic63ff3601522f47cae15150c07e1a7e8beb7a84a
2018-02-07 10:05:35 -05:00
Anthony Lin
25236ac89b Make Request Timeout Configurable
As the size of the YAMLs increases, the amount of time needed
to process the request increases as well. Hence there is a need
to make 'timeout' configurable for the deckhand client.

Change-Id: Iab91091cd8b9a900ad0daeac22e435d4e5c9c97d
2018-02-07 01:32:10 +00:00
Anthony Lin
eb23a5a0d2 Update Shipyard/Airflow Chart - Database Configurability
- Support configured Postgres admin password
- Use secrets for database job environment setup

This patch set also updates a bunch of banners

Change-Id: I238cfd123b5aad31c9cb93864cff7641f719f3df
2018-01-30 10:26:50 -05:00
Krysta
5cc0b5b986 Enable Multi-Workers/Threads for Shipyard
Updates to entry.sh to allow for multi-workers/threads
Updates Shipyard chart to allow parameters to be configurable

Change-Id: I6ad9d198ac4df4c7c85dfcf5c04afd3c7966f0f0
2018-01-26 13:00:15 -06:00
Anthony Lin
14cdfca6d5 Bug Fix - Shipyard DB Sync
We are getting the following errors [0] after merging [1] as
we will need to use the shipyard image to execute db-sync.

This p.s. updates the default value for shipyard-db-sync

[0] shipyard-db-sync pod went into CrashLoopBackOff

root@labinstance:~# kubectl logs -f shipyard-db-sync-g7mdn -n ucp
+ upgrade_db
/tmp/shipyard-db-sync.sh: line 7: upgrade_db: command not found
root@labinstance:~#

[1] https://review.gerrithub.io/#/c/395502/

Change-Id: I4a8445ae9431121754b84f42e98192af36335487
2018-01-25 17:26:14 +00:00
Krysta
7fbc3dad25 Add database upgrade entrypoint
Removes the database upgrade from start shipyard and
instead adds it as an entrypoint, so the database upgrade
is only done once.

Change-Id: I8c087af58aa46051d0d1c47ba5f35e5e86c1acdc
2018-01-25 09:37:00 -05:00
Anthony Lin
08f228ed91 Merge "Redeploy Server - Dags & Operators"
2018-01-24 22:16:10 -05:00
Anthony Lin
3d88cf9e33 Redeploy Server - Dags & Operators
This patch set updates the required dags and operators
for the redeploy server workflow. It also introduces the
Promenade Operator.

Note that many of the required functionalities in DryDock
and Promenade are being worked on and are not ready at the
moment. As such, this patch set is mainly providing the
skeleton framework for the redeploy server workflow. The
dags and relevant Operators will be updated at a later date
when the features and functionalities are ready for usage.

Change-Id: I4baae76ea9d8cde9c2b0bab3feac896d01400868
2018-01-24 17:34:51 +00:00
Anthony Lin
4991d8f6ff Update RBAC rules for Airflow Workers
We are getting the following errors [0] while getting
Airflow worker to execute a health check on the underlying
K8s cluster.

This patch set is meant to grant watch/get/list pods rights
to the airflow worker so that it can perform health checks
on the K8s cluster.

[0] Error messages:

[2018-01-23 02:51:32,003] {base_task_runner.py:98} INFO - Subtask: HTTP response body:

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",
"message":"pods is forbidden: User \"system:serviceaccount:ucp:airflow-worker\"
cannot list pods at the cluster scope","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Change-Id: Iede29f605b5d508d0e58c0c2ae74d7d040d5b8ea
2018-01-24 03:13:49 +00:00
Anthony Lin
b379477236 RBAC: Update serviceaccount and k8s rbac for Airflow
This patch set brings the airflow/shipyard chart to be
in line with OSH* RBAC approach used in [0] and [1]

[0] https://review.openstack.org/#/c/526464/52
[1] https://review.openstack.org/#/c/529378/

Change-Id: Id2ff9f59028474601933196e1722b46c95f3a8ac
2018-01-22 16:47:47 +00:00
Anthony Lin
5190189a60 Update DryDock Operator
The following errors [0] were encountered during our end-to-end
testing. This is a result of extended execution of the workflow
that led to expiration of the keystone token.

It is also possible for the 'prepare_site' task to take more than
120 seconds to complete. Hence we are increasing the time out for
the 'prepare_site_task_timeout' variable to 300 seconds.

This P.S. addresses the above 2 observations

[0] Logs from DryDock

Authorization failed for token
Identity response: {"error": {"message": "Failed to validate token", "code": 404, "title": "Not Found"}}
Authorization failed for token

Change-Id: I4760e390822e6e8c9540216035e263d054fde400
2018-01-06 05:49:12 +00:00
Anthony Lin
5db6d42050 RBAC: Update serviceaccount and k8s rbac for shipyard
This patch set brings the shipyard chart to be in line with OSH* RBAC
approach used in [0] and [1].

[0] https://review.openstack.org/#/c/526464/52
[1] https://review.openstack.org/#/c/529378/

Change-Id: I608d00a69729e347b4121745e80f1e9760e5f6d4
2017-12-28 17:56:02 +00:00
Anthony Lin
768981df44 Refactor UCP Health Check Operator
There have been significant changes to the Shipyard code base
since the last major update to the UCP Health Check Operator.
This patch set is meant to align its implementation with the
rest of the Operators.

It removes the usage of 'urlopen' which can be a security
risk and makes use of the python 'requests' module instead.

We are also adding 'timeout' parameters to the other Operators
that are using 'requests.get' as failure to do so can cause
the Operator(s) to hang indefinitely. The default time out
has been set to 30 seconds. It is noted that nearly all production
code should use this parameter in nearly all requests.
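
A minimal sketch of the pattern described above, assuming a plain HTTP 200 check. The function name `check_endpoint` and the `http_get` injection point are hypothetical and not taken from the Operators themselves; the 30-second default mirrors the value stated in this message.

```python
def check_endpoint(url, timeout=30, http_get=None):
    """Return True if the endpoint answers with HTTP 200 within `timeout` seconds."""
    if http_get is None:
        # 'requests' is a third-party package; it is imported lazily so that
        # a stub can be injected in tests without requiring it.
        import requests
        http_get = requests.get
    try:
        # Without 'timeout', requests.get can block indefinitely.
        response = http_get(url, timeout=timeout)
    except Exception:
        # Covers timeouts and connection errors raised by the HTTP client.
        return False
    return response.status_code == 200
```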

Change-Id: I1205aab38ff120cd239c236dc9bdffd1660c9afb
2017-12-18 17:24:28 +00:00
Anthony Lin
ed8107baad Add Backoff time before checking cluster join
The current logic checks for nodes that started the join process
(based on the snapshot of the environment that was taken by the
operator at that point in time). It will not check the state of
nodes that it is not aware of, i.e. those that it did not capture
initially will not be checked. Hence there is a need to introduce
backoff time as it takes a while before all the nodes start to join
the Cluster.
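
The idea can be sketched roughly as follows. The callables `list_joining_nodes` and `all_joined`, as well as the 60-second default, are hypothetical placeholders and not the operator's actual code.

```python
import time


def wait_for_cluster_join(list_joining_nodes, all_joined,
                          backoff_seconds=60, sleep=time.sleep):
    """Sleep for a backoff period, then snapshot the joining nodes and check them."""
    # Give slower nodes time to start the join process first.
    sleep(backoff_seconds)
    # The snapshot is taken only after the backoff, so nodes that started
    # joining late are included in the check.
    nodes = list_joining_nodes()
    return all_joined(nodes)
```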

This is a short term stop gap approach until the Promenade API is ready
for consumption

Change-Id: I2bdf9c970ecb509fe833fd353e6648a97118d79b
2017-12-08 08:38:53 +00:00
Anthony Lin
55ae811742 Set default number of replicas to 2
There is a need to set the number of replicas to 2 for
redundancies/resiliency

Change-Id: I876c74d0a71d5d03c9158228eff9f819e227b837
2017-12-06 20:13:06 +00:00
portdirect
b28e08f0f1 Images: Remove Kolla-Toolbox image as not required
This ps removes the last references to Kolla-Toolbox which is not
required for keystone management jobs.

Change-Id: I7ca1b93a2485b8eafdd6a48fc4c26c049f20d9cd
2017-11-16 12:12:05 -05:00
Scott Hussey
c5d55677c5 Update to use latest entrypoint container image
Update the dep_check image to use the latest Stackanetes
entrypoint image.

Change-Id: I9f0720be3390109d3972a778816332e85323ab56
2017-11-15 10:00:56 -06:00
Anthony Lin
28c24eb221 Fix typo in Shipyard Chart
There was a need to make changes to the variable names when we
merged the Airflow and Shipyard charts. This resulted in a typo
in a variable name in the service-shipyard-ingress.yaml which stops
the service from being deployed.

This P.S. is meant to correct that behavior

We also need the public endpoint for shipyard to be on port 80
so that the CLI will work properly with the Ingress Controller

Change-Id: I0483e8ab9e3eb7839149413311abb8c1475f59fa
2017-11-04 15:32:31 +00:00
Anthony Lin
251bfff83e Update Shipyard Helm Chart
This patch set removes the shipyard config, policy and paste.ini
template from the existing Shipyard Helm Chart.  This is done to
align with the current approach in OpenStack Helm.

1) Remove shipyard config template
2) Remove shipyard policy.yaml template
3) Remove shipyard api-paste.ini template
4) Update related template files

There has also been a recent change to the Helm Toolkit which will
break the current implementation of the Shipyard Chart

The changes in Helm Toolkit were made to the 'images' definition
in values.yaml to facilitate adding the option to prefix image
name etc

This P.S. will also update the Shipyard Chart to align with the
recent changes in Helm Toolkit

Change-Id: Ie79fd9da2c9a577027dd0dddbcca6b7f7b3b4f6f
2017-10-24 15:23:15 +00:00
Anthony Lin
dfa7cedb19 Update DryDock Operator & Shipyard Chart
This Patch Set is meant to expose the 'query_interval' and
'task_timeout' parameters for Drydock tasks in Shipyard.
This will allow us to specify the values for a particular
site.

The corresponding changes for the Helm Chart are included in
this Patch Set as well.

It is also noted that the task has been updated to 'prepare_nodes'
and 'deploy_nodes' instead.

Task State can either be 'completed' or 'terminated'. These new
changes have been captured in this Patch Set as well.
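
A rough sketch of how 'query_interval' and 'task_timeout' might drive a polling loop that stops on either terminal state. The function `wait_for_task`, the `get_state` callable, and the default values are assumptions for illustration; the actual Drydock Operator logic is not shown in this commit message.

```python
import time


def wait_for_task(get_state, query_interval=30, task_timeout=300, sleep=time.sleep):
    """Poll get_state() until the task ends or task_timeout seconds have been waited."""
    waited = 0
    while True:
        state = get_state()
        # Both terminal states named in the commit message end the loop.
        if state in ("completed", "terminated"):
            return state
        if waited >= task_timeout:
            raise TimeoutError("task still '{}' after {}s".format(state, waited))
        sleep(query_interval)
        waited += query_interval
```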

Change-Id: I1b446f7bcf493bc8e5bbfdba842158797f0e3594
2017-10-24 01:53:58 +00:00
Anthony Lin
b002bd58fd Move Shipyard Chart
This PS migrates the Shipyard Chart into this repo

Change-Id: I2cf037ab662886a94c8439f43d248da9295a83b3
2017-10-20 02:34:03 +00:00