All UCP Operators will inherit from the UcpBaseOperator [0].
This patch set will align the rest of the Operators, i.e. the Armada,
Deckhand and Promenade Operators, with the UcpBaseOperator.
It also updates the name of the shipyard container to be
'shipyard-api' instead of 'shipyard'.
[0] https://review.gerrithub.io/#/c/407736/
Change-Id: I516590c492e9bb5554161119dade278d74197374
1) Refactor Drydock Base Operator to make use of the
UCP Base Operator instead
2) Dump logs from Drydock Pods when there are Exceptions
Change-Id: I3fbe03d13b5fc89a503cfb2c3c25751076718554
Introduce a new endpoint to retrieve Airflow logs
- API path:
GET /actions/{action_id}/steps/{step_id}/logs?try=2
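
A minimal sketch of how a client might build the request path for this
endpoint. The helper name 'build_step_logs_path' and its arguments are
hypothetical; only the path layout and the 'try' query parameter come
from the description above.

```python
from urllib.parse import quote, urlencode


def build_step_logs_path(action_id, step_id, try_number=None):
    """Return the relative path for the step-logs endpoint.

    Hypothetical helper: only the path layout and the 'try' query
    parameter are taken from the commit message.
    """
    path = "/actions/{}/steps/{}/logs".format(
        quote(str(action_id), safe=""), quote(str(step_id), safe="")
    )
    # The 'try' parameter is only appended when a specific try is requested.
    if try_number is not None:
        path += "?" + urlencode({"try": try_number})
    return path
```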
Change-Id: I6a16cdab148a8a7a9f1bc5fb98a18bce1406cf9f
Adds the appropriate labels to the ks-user and ks-service jobs
to ensure they can be referenced for deletion.
Change-Id: I56d6f67d37e7293f596193a8bf7311e82cac3e7f
- Update the shipyard chart to leverage the HTK routine
for producing the Ingress manifests to be compatible
with Ingress public endpoints.
Change-Id: I864d0e787cd4cd1c3099894b27d22835b2177b7a
This patch set updates the kubernetes-entrypoint image in line with
the chart used in OpenStack-Helm in [0]. This allows the chart to
use pod dependencies.
[0] https://review.openstack.org/#/c/554268/
Change-Id: I5a8bd741a2c7c58b5f110d827872a630953c9ae7
Checks on the Shipyard/Airflow chart show that we are missing the
resource limits for the ks_service job.
This patch set will add the resource limits and will also update
the indentation for 'test-airflow-api' and 'test-shipyard-api'.
Change-Id: I0a3f11bb9cbb45a9c8994dbc226c080914a86a1c
This patch set is meant to create a workflow that will allow us
to upgrade the airflow worker without causing disruption to the
current running workflow.
Note that we will set the update strategy for the airflow worker
to 'OnDelete'. The 'OnDelete' update strategy implements the legacy
(1.6 and prior) behavior. When we select this update strategy, the
StatefulSet controller will not automatically update Pods when a
modification is made to the StatefulSet’s '.spec.template' field.
This strategy can be selected by setting '.spec.updateStrategy.type'
to 'OnDelete'. Refer to [0] for more information.
[0] https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
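
As an illustration only, the setting can be expressed as a small patch
body. In the StatefulSet API the field lives at '.spec.updateStrategy.type'.
The variable names below are hypothetical, and the chart itself may set
this value in its template rather than through a patch.

```python
import json

# Patch body that switches a StatefulSet to the 'OnDelete' update strategy.
# With this strategy, Pods are only recreated from the new template after
# they are deleted manually.
on_delete_patch = {"spec": {"updateStrategy": {"type": "OnDelete"}}}

# JSON form, e.g. for use as the argument to 'kubectl patch ... -p'.
patch_json = json.dumps(on_delete_patch)
```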
Change-Id: I1f6c3564b7fba6abe422b86e36818eb2cd3454ea
This patch set does the following to enhance health/status checks
on the shipyard-api pod:
1) Add Liveness Probe
2) Update Readiness Probe
Change-Id: Ifab63a8724f29fb38124f43d475bb022807a4cce
Also covers [383892] Add helm test to Airflow
Provides basic tests to run as helm test during deployment
of Shipyard/Airflow.
Change-Id: Icc4012f38b6162adf175702dd7f50de46dbfbe47
We are seeing the following error [0] in the Airflow
Web GUI which prevents users from reading the workflow
logs from the GUI.
This is happening as the Airflow Web Pod is not able
to directly access the volume of the Airflow Worker
Pod.
This patch set will remove the parameters that are
causing this behavior and revert back to the default
system configuration which was shown to be working
properly in our local test environment.
[0] Error Message
Task log handler task does not support read logs.
Change-Id: I71cc9ebd5f6571b486af4d77dbd89f234e8dd3b3
We need a sidecar container to perform log rotation
on the log files. Logs shall be retained for 30 days.
This is the default setting and can be changed by updating
values.yaml.
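
The retention rule can be sketched as follows. This is not the sidecar's
actual script; the function name 'prune_old_logs' and its parameters are
hypothetical, and only the 30-day default comes from the text above.

```python
import os
import time


def prune_old_logs(log_dir, retention_days=30, now=None):
    """Delete files under log_dir last modified more than retention_days ago.

    Returns the list of removed paths. 'now' is injectable for testing.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 24 * 60 * 60
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare the file's modification time against the cutoff.
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```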
Also cleaned up README.md
Change-Id: I39a7797e96abd349160d753f8917f7f78f7d8797
This patch set removes the (pre)generated config ini file from
airflow. The configuration will now be pulled directly from
values.yaml, which will be in line with OpenStack-Helm's approach.
This will do away with the need to maintain the verbose .conf.tpl
in the repository as mentioned by Tin in his comments for [0].
[0] https://review.gerrithub.io/#/c/400925/
Change-Id: I5a9766e52536ac9b143b397faa3563e69dfb6bf3
The current settings in Airflow are different from the recommended
ones in [0].
This patch set is meant to align with the recommended configurations.
Note also that due to the issue reported in [1], we are keeping the
variable 'celery_result_backend' for now and will remove it when
we upgrade to Airflow v1.9.1.
[0] http://docs.celeryproject.org/en/latest/userguide/configuration.html
[1] https://github.com/puckel/docker-airflow/issues/156
Change-Id: Ibead7c2ca76a984c09327579aedade036b959ab2
The dag will be turned off if 'dags_are_paused_at_creation' is
set to "True". This variable should be set to "False" so
that we can execute the workflow.
Change-Id: Ib9f7d20d2181861d31ad8a22c83ba3481de35eef
There is a need to make the airflow worker a stateful set
so that the name of the pod will be consistent. This will
allow us to properly extract and correlate logs in the
database.
We are also adding pvc for the airflow worker pods so that
the logs persist.
Change-Id: I79917aa02b38672cac13d6148c4ed44007a78d32
Updates the db init job for Shipyard to use the DB admin user,
connect to the airflow db, and grant the privileges. This replaces
the previous behavior of trying to connect as the 'airflow' user
with the admin user's password.
Change-Id: Ib3dbac2b81129b0a849781175fcce4593df639df
There is a need to make the proxy-read-timeout configurable
so that we can alter the value to handle requests that take
more than a minute (the default timeout) to process.
Also increase http-timeout for uwsgi to 600 seconds.
Change-Id: I25dabc648822252a7918d6272c78fb8ebc236b6c
As the size of the YAMLs increases, the amount of time needed
to process the request increases as well. Hence there is a need
to make 'timeout' configurable for the deckhand client.
Change-Id: Iab91091cd8b9a900ad0daeac22e435d4e5c9c97d
- Support configured Postgres admin password
- Use secrets for database job environment setup
This patch set also updates a bunch of banners
Change-Id: I238cfd123b5aad31c9cb93864cff7641f719f3df
Updates to entry.sh to allow for multi-workers/threads
Updates Shipyard chart to allow parameters to be configurable
Change-Id: I6ad9d198ac4df4c7c85dfcf5c04afd3c7966f0f0
We are getting the following errors [0] after merging [1] as
we will need to use the shipyard image to execute db-sync.
This p.s. updates the default value for shipyard-db-sync
[0] shipyard-db-sync pod went into CrashLoopBackOff
root@labinstance:~# kubectl logs -f shipyard-db-sync-g7mdn -n ucp
+ upgrade_db
/tmp/shipyard-db-sync.sh: line 7: upgrade_db: command not found
root@labinstance:~#
[1] https://review.gerrithub.io/#/c/395502/
Change-Id: I4a8445ae9431121754b84f42e98192af36335487
Removes the database upgrade from start shipyard and
instead adds it as an entrypoint, so the database upgrade
is only done once.
Change-Id: I8c087af58aa46051d0d1c47ba5f35e5e86c1acdc
This patch set updates the required dags and operators
for the redeploy server workflow. It also introduces the
Promenade Operator.
Note that many of the required functionalities in DryDock
and Promenade are being worked on and are not ready at the
moment. As such, this patch set is mainly providing the
skeleton framework for the redeploy server workflow. The
dags and relevant Operators will be updated at a later date
when the features and functionalities are ready for usage.
Change-Id: I4baae76ea9d8cde9c2b0bab3feac896d01400868
We are getting the following errors [0] while getting
Airflow worker to execute a health check on the underlying
K8s cluster.
This patch set is meant to grant watch/get/list pods rights
to the airflow worker so that it can perform health checks
on the K8s cluster.
[0] Error messages:
[2018-01-23 02:51:32,003] {base_task_runner.py:98} INFO - Subtask: HTTP response body:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",
"message":"pods is forbidden: User \"system:serviceaccount:ucp:airflow-worker\"
cannot list pods at the cluster scope","reason":"Forbidden","details":{"kind":"pods"},"code":403}
Change-Id: Iede29f605b5d508d0e58c0c2ae74d7d040d5b8ea
The following errors [0] were encountered during our end-to-end
testing. This is a result of extended execution of the workflow
that led to expiration of the keystone token.
It is also possible for the 'prepare_site' task to take more than
120 seconds to complete. Hence we are increasing the timeout for
the 'prepare_site_task_timeout' variable to 300 seconds.
This P.S. addresses the above 2 observations.
[0] Logs from DryDock
Authorization failed for token
Identity response: {"error": {"message": "Failed to validate token", "code": 404, "title": "Not Found"}}
Authorization failed for token
Change-Id: I4760e390822e6e8c9540216035e263d054fde400
There have been significant changes to the Shipyard code base
since the last major update to the UCP Health Check Operator.
This patch set is meant to align its implementation with the
rest of the Operators.
It removes the usage of 'urlopen', which can be a security
risk, and makes use of the python 'requests' module instead.
We are also adding 'timeout' parameters to the other Operators
that are using 'requests.get', as failure to do so can cause
the Operator(s) to hang indefinitely. The default timeout
has been set to 30 seconds. It is noted that nearly all production
code should use this parameter in nearly all requests.
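
A minimal sketch of the pattern, assuming the 'requests' package is
installed. The wrapper name 'get_with_timeout' and the injectable
'http_get' argument are hypothetical and exist only so the sketch can be
exercised without a network; the Operators may call 'requests.get'
directly.

```python
def get_with_timeout(url, timeout=30, http_get=None):
    """Issue a GET request that cannot hang indefinitely.

    http_get defaults to requests.get; the 30 second default mirrors
    the value described in the commit message.
    """
    if http_get is None:
        import requests  # third-party package, assumed to be installed
        http_get = requests.get
    # Without 'timeout', requests waits for a response with no upper bound.
    return http_get(url, timeout=timeout)
```

When the timeout elapses, 'requests' raises an exception derived from
'requests.exceptions.Timeout', which the caller can catch and report.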
Change-Id: I1205aab38ff120cd239c236dc9bdffd1660c9afb
The current logic checks for nodes that started the join process
(based on the snapshot of the environment that was taken by the
operator at that point in time). It will not check the state of
nodes that it is not aware of, i.e. those that it did not capture
initially will not be checked. Hence there is a need to introduce
a backoff time, as it takes a while before all the nodes start to join
the Cluster.
This is a short-term stopgap approach until the Promenade API is ready
for consumption.
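
The idea can be sketched as follows. All names here
('wait_then_check_nodes', 'list_joining_nodes', 'node_is_ready',
'backoff_seconds') are hypothetical and do not reflect the Operator's
actual code; the 60 second default is an arbitrary placeholder.

```python
import time


def wait_then_check_nodes(list_joining_nodes, node_is_ready,
                          backoff_seconds=60, sleep=time.sleep):
    """Wait for a backoff period, then snapshot and check joining nodes.

    Returns the nodes from the snapshot that are not yet ready.
    """
    # Give slow nodes time to start joining before the snapshot is taken,
    # so that they are not missed by the check.
    sleep(backoff_seconds)
    snapshot = list(list_joining_nodes())
    return [node for node in snapshot if not node_is_ready(node)]
```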
Change-Id: I2bdf9c970ecb509fe833fd353e6648a97118d79b
This ps removes the last references to Kolla-Toolbox which is not
required for keystone management jobs.
Change-Id: I7ca1b93a2485b8eafdd6a48fc4c26c049f20d9cd
There was a need to make changes to the variable names when we
merged the Airflow and Shipyard charts. This resulted in a typo
in a variable name in service-shipyard-ingress.yaml, which stops
the service from being deployed.
This P.S. is meant to correct that behavior.
We also need the public endpoint for shipyard to be on port 80
so that the CLI will work properly with the Ingress Controller.
Change-Id: I0483e8ab9e3eb7839149413311abb8c1475f59fa
This patch set removes the shipyard config, policy and paste.ini
template from the existing Shipyard Helm Chart. This is done to
align with the current approach in OpenStack Helm.
1) Remove shipyard config template
2) Remove shipyard policy.yaml template
3) Remove shipyard api-paste.ini template
4) Update related template files
There has also been a recent change to the Helm Toolkit which will
break the current implementation of the Shipyard Chart.
The changes in Helm Toolkit were made to the 'images' definition
in values.yaml to facilitate adding the option to prefix image
names, etc.
This P.S. will also update the Shipyard Chart to align with the
recent changes in Helm Toolkit.
Change-Id: Ie79fd9da2c9a577027dd0dddbcca6b7f7b3b4f6f
This Patch Set is meant to expose the 'query_interval' and
'task_timeout' parameters for Drydock tasks in Shipyard.
This will allow us to specify the values for a particular
site.
The corresponding changes for the Helm Chart are included in
this Patch Set as well.
It is also noted that the tasks have been updated to 'prepare_nodes'
and 'deploy_nodes' instead.
Task State can be either 'completed' or 'terminated'. These new
changes have been captured in this Patch Set as well.
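
A rough sketch of how 'query_interval' and 'task_timeout' could drive a
polling loop, assuming 'completed' and 'terminated' are the two states
that end polling. The function 'wait_for_task', the 'get_state' callable
and the default values are hypothetical, not Shipyard's implementation.

```python
import time


def wait_for_task(get_state, query_interval=30, task_timeout=300,
                  sleep=time.sleep):
    """Poll get_state() until the task ends or task_timeout is reached.

    Returns the final state, or None if the timeout elapsed first.
    """
    end_states = ("completed", "terminated")
    elapsed = 0
    while True:
        state = get_state()
        if state in end_states:
            return state
        if elapsed >= task_timeout:
            return None
        # Wait one interval before querying the task again.
        sleep(query_interval)
        elapsed += query_interval
```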
Change-Id: I1b446f7bcf493bc8e5bbfdba842158797f0e3594