Also covers [383892] Add helm test to Airflow
Provides basic tests to run as helm test during deployment
of Shipyard/Airflow.
Change-Id: Icc4012f38b6162adf175702dd7f50de46dbfbe47
We are seeing the following error [0] in the Airflow
Web GUI, which prevents users from reading the workflow
logs from the GUI.
This is happening as the Airflow Web Pod is not able
to directly access the volume of the Airflow Worker
Pod.
This patch set will remove the parameters that are
causing this behavior and revert to the default
system configuration which was shown to be working
properly in our local test environment.
[0] Error Message
Task log handler task does not support read logs.
Change-Id: I71cc9ebd5f6571b486af4d77dbd89f234e8dd3b3
We need a sidecar container to perform log rotation
on the log files. Logs shall be retained for 30 days.
This is the default setting and can be changed by updating
values.yaml.
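A minimal Python sketch of age-based log retention, for illustration only. It assumes the sidecar simply deletes files whose modification time is older than the retention window; the real sidecar may use a different tool (e.g. logrotate), and the function name and parameters are hypothetical.

```python
import os
import time

# Hypothetical default matching the 30-day retention described above.
DEFAULT_RETENTION_DAYS = 30


def prune_old_logs(log_dir, retention_days=DEFAULT_RETENTION_DAYS, now=None):
    """Delete regular files under log_dir older than retention_days.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 24 * 60 * 60
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare the file's last modification time with the cutoff.
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

A sidecar would call this periodically, passing the retention value read from values.yaml.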
Also cleaned up README.md
Change-Id: I39a7797e96abd349160d753f8917f7f78f7d8797
This patch set removes the (pre)generated config ini file from
airflow. The configuration will now be pulled directly from
values.yaml, which will be in line with OpenStack-Helm's approach.
This will do away with the need to maintain the verbose .conf.tpl
in the repository as mentioned by Tin in his comments for [0].
[0] https://review.gerrithub.io/#/c/400925/
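The idea of rendering the configuration from values.yaml instead of a hand-maintained .conf.tpl can be sketched in Python. This assumes the values are already parsed into a {section: {key: value}} mapping; the "values_conf" tree below is a hypothetical excerpt, and the chart itself would do this rendering in a Helm template, not in Python.

```python
import configparser
import io


def render_ini(conf):
    """Render a {section: {key: value}} mapping into INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    for section, options in conf.items():
        # Convert every value to a string, as INI files only hold text.
        parser[section] = {key: str(value) for key, value in options.items()}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# Hypothetical excerpt of a values.yaml configuration tree, already parsed.
values_conf = {
    "core": {"dags_are_paused_at_creation": False, "parallelism": 32},
    "webserver": {"web_server_port": 8080},
}
print(render_ini(values_conf))
```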
Change-Id: I5a9766e52536ac9b143b397faa3563e69dfb6bf3
The current settings in Airflow are different from the recommended
ones in [0].
This patch set is meant to align with the recommended configurations.
Note also that, due to the issue reported in [1], we are keeping the
variable 'celery_result_backend' for now and will remove it when
we upgrade to Airflow v1.9.1.
[0] http://docs.celeryproject.org/en/latest/userguide/configuration.html
[1] https://github.com/puckel/docker-airflow/issues/156
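While both key names may be present, a reader of the [celery] section could prefer the newer key and fall back to the legacy 'celery_result_backend'. This Python sketch assumes the newer key is named 'result_backend' (Celery's lowercase setting name); the URLs are placeholders.

```python
import configparser


def get_result_backend(parser):
    """Return the Celery result backend URL from an airflow.cfg-style parser.

    Prefers the assumed newer 'result_backend' key and falls back to the
    legacy 'celery_result_backend' key that is kept for now.
    """
    for key in ("result_backend", "celery_result_backend"):
        if parser.has_option("celery", key):
            return parser.get("celery", key)
    return None


cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string("""
[celery]
broker_url = amqp://guest:guest@rabbitmq:5672//
celery_result_backend = db+postgresql://airflow:secret@postgresql:5432/airflow
""")
print(get_result_backend(cfg))
```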
Change-Id: Ibead7c2ca76a984c09327579aedade036b959ab2
The DAG will be turned off if 'dags_are_paused_at_creation' is
set to "True". This variable should be set to "False" so
that we can execute the workflow.
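A small Python check of the effective setting in an airflow.cfg-style file. The helper name is hypothetical; the fallback reflects Airflow's shipped default of True for this option.

```python
import configparser


def dags_start_paused(airflow_cfg_text):
    """Return the effective 'dags_are_paused_at_creation' setting."""
    cfg = configparser.ConfigParser()
    cfg.read_string(airflow_cfg_text)
    # Airflow's shipped default for this option is True, so fall back to that.
    return cfg.getboolean("core", "dags_are_paused_at_creation", fallback=True)


AIRFLOW_CFG = """
[core]
# False, so that newly created DAGs are not paused and the workflow can run.
dags_are_paused_at_creation = False
"""

print(dags_start_paused(AIRFLOW_CFG))  # False
```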
Change-Id: Ib9f7d20d2181861d31ad8a22c83ba3481de35eef
There is a need to make the Airflow worker a StatefulSet
so that the pod names will be consistent. This will
allow us to properly extract and correlate logs in the
database.
We are also adding PVCs for the Airflow worker pods so that
the logs persist.
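StatefulSet pods are named "<statefulset-name>-<ordinal>", and a rescheduled pod keeps its name, which is what makes the names usable as a correlation key. A minimal Python sketch of splitting such a hostname; "airflow-worker" is a hypothetical StatefulSet name and the helper is illustrative only.

```python
import re

# "<statefulset-name>-<ordinal>", e.g. the hypothetical "airflow-worker-0".
_POD_NAME = re.compile(r"^(?P<set>[a-z0-9]([-a-z0-9]*[a-z0-9])?)-(?P<ordinal>\d+)$")


def parse_statefulset_pod(hostname):
    """Split a StatefulSet pod hostname into (statefulset_name, ordinal)."""
    match = _POD_NAME.match(hostname)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {hostname!r}")
    return match.group("set"), int(match.group("ordinal"))


print(parse_statefulset_pod("airflow-worker-1"))  # ('airflow-worker', 1)
```

Pods from a Deployment carry a random suffix instead (e.g. "airflow-web-5d8f7-x2k9p"), which this pattern rejects.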
Change-Id: I79917aa02b38672cac13d6148c4ed44007a78d32
Updates the db init job for Shipyard to use the DB admin user,
connect to the airflow db, and grant the privileges. Previously, the
job tried to connect as the 'airflow' user with the admin user password.
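A Python sketch of the two pieces involved: a connection URI for the DB admin user pointing at the airflow database, and a GRANT statement. The host, port, and the exact privileges granted are assumptions, since they are not listed here.

```python
from urllib.parse import quote


def admin_dsn(admin_user, admin_password, host, port, database):
    """Build a PostgreSQL URI for the admin user, escaping the credentials."""
    user = quote(admin_user, safe="")
    password = quote(admin_password, safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def db_init_grants(database="airflow", user="airflow"):
    """SQL the admin connection could run (assumed privilege set)."""
    return [
        f"GRANT ALL PRIVILEGES ON DATABASE {quote_ident(database)} TO {quote_ident(user)};",
    ]
```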
Change-Id: Ib3dbac2b81129b0a849781175fcce4593df639df
There is a need to make the proxy-read-timeout configurable
so that we can alter the value to handle requests that take
more than a minute (the default timeout) to process.
Also increase the http-timeout for uwsgi to 600 seconds.
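A Python sketch of where the two timeouts end up. It assumes the NGINX ingress controller's "nginx.ingress.kubernetes.io/proxy-read-timeout" annotation (value in seconds) and shows only the uwsgi timeout flag, not a full uwsgi command line; the function names are hypothetical.

```python
DEFAULT_PROXY_READ_TIMEOUT = 60  # seconds, the one-minute default
UWSGI_HTTP_TIMEOUT = 600  # seconds


def ingress_annotations(proxy_read_timeout=DEFAULT_PROXY_READ_TIMEOUT):
    """Ingress annotations carrying a configurable proxy read timeout."""
    return {
        "nginx.ingress.kubernetes.io/proxy-read-timeout": str(int(proxy_read_timeout)),
    }


def uwsgi_timeout_args(http_timeout=UWSGI_HTTP_TIMEOUT):
    """The uwsgi flag that raises the HTTP socket timeout."""
    return ["--http-timeout", str(int(http_timeout))]
```

Both values need to be raised together; otherwise whichever hop has the shorter timeout cuts off the long request.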
Change-Id: I25dabc648822252a7918d6272c78fb8ebc236b6c
As the size of the YAMLs increases, the amount of time needed
to process the request increases as well. Hence there is a need
to make 'timeout' configurable for the Deckhand client.
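A Python sketch of reading a configurable Deckhand client timeout. The section name, option names, and default values are hypothetical; the (connect, read) tuple is the form the requests library accepts for its timeout argument.

```python
import configparser


def deckhand_timeout(parser, default_connect=5, default_read=300):
    """Return a (connect, read) timeout tuple for requests' timeout= argument."""
    section = "deckhand_client"  # hypothetical section name
    connect = parser.getint(section, "connect_timeout", fallback=default_connect)
    read = parser.getint(section, "read_timeout", fallback=default_read)
    return (connect, read)
```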
Change-Id: Iab91091cd8b9a900ad0daeac22e435d4e5c9c97d
- Support configured Postgres admin password
- Use secrets for database job environment setup
This patch set also updates a number of banners.
Change-Id: I238cfd123b5aad31c9cb93864cff7641f719f3df
Updates entry.sh to allow for multiple workers/threads
Updates the Shipyard chart to allow parameters to be configurable
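The worker/thread change can be sketched in Python as building the uwsgi process and thread flags from environment variables. The variable names and defaults are hypothetical, only these two flags are shown, and the real entry point is a shell script.

```python
import os


def uwsgi_worker_args(env=None):
    """Build uwsgi --processes/--threads flags from the environment."""
    env = os.environ if env is None else env
    workers = int(env.get("SHIPYARD_API_WORKERS", "4"))  # hypothetical variable
    threads = int(env.get("SHIPYARD_API_THREADS", "1"))  # hypothetical variable
    if workers < 1 or threads < 1:
        raise ValueError("workers and threads must be positive")
    return ["--processes", str(workers), "--threads", str(threads)]
```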
Change-Id: I6ad9d198ac4df4c7c85dfcf5c04afd3c7966f0f0
We are getting the following errors [0] after merging [1], as
we need to use the Shipyard image to execute db-sync.
This patch set updates the default value for shipyard-db-sync.
[0] shipyard-db-sync pod went into CrashLoopBackOff
root@labinstance:~# kubectl logs -f shipyard-db-sync-g7mdn -n ucp
+ upgrade_db
/tmp/shipyard-db-sync.sh: line 7: upgrade_db: command not found
root@labinstance:~#
[1] https://review.gerrithub.io/#/c/395502/
Change-Id: I4a8445ae9431121754b84f42e98192af36335487
Removes the database upgrade from the Shipyard start-up and
instead adds it as an entrypoint, so that the database upgrade
is only done once.
Change-Id: I8c087af58aa46051d0d1c47ba5f35e5e86c1acdc
This patch set updates the required dags and operators
for the redeploy server workflow. It also introduces the
Promenade Operator.
Note that many of the required functionalities in DryDock
and Promenade are being worked on and are not ready at the
moment. As such, this patch set is mainly providing the
skeleton framework for the redeploy server workflow. The
dags and relevant Operators will be updated at a later date
when the features and functionalities are ready for usage.
Change-Id: I4baae76ea9d8cde9c2b0bab3feac896d01400868
We are getting the following errors [0] when the Airflow
worker executes a health check on the underlying
K8s cluster.
This patch set is meant to grant watch/get/list rights on pods
to the Airflow worker so that it can perform health checks
on the K8s cluster.
[0] Error messages:
[2018-01-23 02:51:32,003] {base_task_runner.py:98} INFO - Subtask: HTTP response body:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",
"message":"pods is forbidden: User \"system:serviceaccount:ucp:airflow-worker\"
cannot list pods at the cluster scope","reason":"Forbidden","details":{"kind":"pods"},"code":403}
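The error says the service account cannot list pods "at the cluster scope", so a namespaced Role would not be enough; a ClusterRole plus ClusterRoleBinding is needed. A Python sketch that emits such manifests as JSON (which kubectl accepts); the object names are hypothetical, while the namespace and service account come from the error above.

```python
import json


def pod_reader_rbac(namespace="ucp", service_account="airflow-worker"):
    """ClusterRole + ClusterRoleBinding granting get/list/watch on pods."""
    role_name = f"{service_account}-pod-reader"  # hypothetical name
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [{"apiGroups": [""], "resources": ["pods"],
                   "verbs": ["get", "list", "watch"]}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": role_name},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "ClusterRole", "name": role_name},
        "subjects": [{"kind": "ServiceAccount", "name": service_account,
                      "namespace": namespace}],
    }
    return [role, binding]


print(json.dumps({"apiVersion": "v1", "kind": "List", "items": pod_reader_rbac()}, indent=2))
```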
Change-Id: Iede29f605b5d508d0e58c0c2ae74d7d040d5b8ea
The following errors [0] were encountered during our end-to-end
testing. They are a result of extended execution of the workflow,
which led to expiration of the Keystone token.
It is also possible for the 'prepare_site' task to take more than
120 seconds to complete. Hence we are increasing the timeout for
the 'prepare_site_task_timeout' variable to 300 seconds.
This P.S. addresses the above two observations.
[0] Logs from DryDock
Authorization failed for token
Identity response: {"error": {"message": "Failed to validate token", "code": 404, "title": "Not Found"}}
Authorization failed for token
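One common way to avoid using an expired token during a long workflow is to re-fetch it shortly before it expires. This Python helper is a hypothetical illustration of that general technique, not how Shipyard handles Keystone tokens; fetch() is assumed to return the token and its lifetime in seconds.

```python
import time


class CachedToken:
    """Re-fetch a token when it is within `margin` seconds of expiry."""

    def __init__(self, fetch, margin=60, clock=time.monotonic):
        # fetch() must return (token, lifetime_in_seconds).
        self._fetch = fetch
        self._margin = margin
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```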
Change-Id: I4760e390822e6e8c9540216035e263d054fde400
There have been significant changes to the Shipyard code base
since the last major update to the UCP Health Check Operator.
This patch set is meant to align its implementation with the
rest of the Operators.
It removes the usage of 'urlopen', which can be a security
risk, and makes use of the Python 'requests' module instead.
We are also adding 'timeout' parameters to the other Operators
that are using 'requests.get', as failure to do so can cause
the Operator(s) to hang indefinitely. The default timeout
has been set to 30 seconds. Note that nearly all production
code should use this parameter in nearly all requests.
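A minimal Python sketch of a health request with an explicit timeout. The function name is hypothetical; requests is imported inside the function only so that a stub getter can be passed in for testing.

```python
DEFAULT_TIMEOUT = 30  # seconds, the default described above


def fetch_health(url, timeout=DEFAULT_TIMEOUT, get=None):
    """GET url with a timeout and return the HTTP status code."""
    if get is None:
        import requests  # imported lazily so a stub can replace it in tests
        get = requests.get
    # Without timeout=, requests may wait indefinitely for a response.
    response = get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
    return response.status_code
```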
Change-Id: I1205aab38ff120cd239c236dc9bdffd1660c9afb
The current logic checks only the nodes that had started the join
process (based on the snapshot of the environment that was taken by
the operator at that point in time). It will not check the state of
nodes that it is not aware of, i.e. nodes that it did not capture
initially will not be checked. Hence there is a need to introduce a
backoff time, as it takes a while before all the nodes start to join
the cluster.
This is a short-term stopgap approach until the Promenade API is ready
for consumption.
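A Python sketch of waiting for nodes with a backoff between checks. The doubling delay (exponential backoff) is a choice made for this illustration, and the function name and defaults are hypothetical; the operator may simply use a fixed delay.

```python
import time


def wait_for_nodes(expected, list_joined, max_attempts=5, base_delay=30,
                   sleep=time.sleep):
    """Poll until every node in `expected` has joined, backing off between checks.

    Returns the set of nodes still missing (empty on success).
    """
    expected = set(expected)
    missing = expected
    for attempt in range(max_attempts):
        missing = expected - set(list_joined())
        if not missing:
            return missing
        if attempt < max_attempts - 1:
            # Wait 30 s, 60 s, 120 s, ... before checking again.
            sleep(base_delay * 2 ** attempt)
    return missing
```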
Change-Id: I2bdf9c970ecb509fe833fd353e6648a97118d79b
This patch set removes the last references to Kolla-Toolbox, which is not
required for Keystone management jobs.
Change-Id: I7ca1b93a2485b8eafdd6a48fc4c26c049f20d9cd
There was a need to change the variable names when we
merged the Airflow and Shipyard charts. This resulted in a typo
in a variable name in service-shipyard-ingress.yaml, which prevents
the service from being deployed.
This P.S. is meant to correct that behavior.
We also need the public endpoint for Shipyard to be on port 80
so that the CLI will work properly with the Ingress Controller.
Change-Id: I0483e8ab9e3eb7839149413311abb8c1475f59fa
This patch set removes the shipyard config, policy and paste.ini
templates from the existing Shipyard Helm Chart. This is done to
align with the current approach in OpenStack Helm.
1) Remove shipyard config template
2) Remove shipyard policy.yaml template
3) Remove shipyard api-paste.ini template
4) Update related template files
There has also been a recent change to the Helm Toolkit which will
break the current implementation of the Shipyard Chart.
The changes in Helm Toolkit were made to the 'images' definition
in values.yaml to facilitate adding the option to prefix image
names, etc.
This P.S. will also update the Shipyard Chart to align with the
recent changes in Helm Toolkit.
Change-Id: Ie79fd9da2c9a577027dd0dddbcca6b7f7b3b4f6f
This Patch Set is meant to expose the 'query_interval' and
'task_timeout' parameters for Drydock tasks in Shipyard.
This will allow us to specify the values for a particular
site.
The corresponding changes for the Helm Chart are included in
this Patch Set as well.
Note also that the tasks have been renamed to 'prepare_nodes'
and 'deploy_nodes'.
Task state can be either 'completed' or 'terminated'. These new
changes have been captured in this Patch Set as well.
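How 'query_interval' and 'task_timeout' typically interact can be sketched in Python as a polling loop that stops on either terminal state. get_state() is a hypothetical callable returning the Drydock task state, and the default values are illustrative, not the chart's.

```python
import time

TERMINAL_STATES = {"completed", "terminated"}


def wait_for_task(get_state, query_interval=10, task_timeout=3600,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() every query_interval seconds until a terminal state.

    Raises TimeoutError if the task is still running after task_timeout seconds.
    """
    deadline = clock() + task_timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"task still {state!r} after {task_timeout} seconds")
        sleep(query_interval)
```

Site-specific values for both parameters can then be passed in from the chart's configuration.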
Change-Id: I1b446f7bcf493bc8e5bbfdba842158797f0e3594