:title: Zuul

.. _zuul:

Zuul
####

Zuul is a pipeline-oriented project gating system. It facilitates
running tests and automated tasks in response to Code Review events.

At a Glance
===========

:Hosts:
  * https://zuul.opendev.org
  * zuul*.opendev.org
  * ze*.opendev.org
  * zm*.opendev.org
:Configuration:
  * :config:`zuul/main.yaml`
  * :config:`zuul.d`
:Projects:
  * https://opendev.org/zuul/zuul
:Bugs:
  * https://storyboard.openstack.org/#!/project/zuul/zuul
:Resources:
  * `Zuul Reference Manual`_
:Chat:
  * ``#zuul:opendev.org`` on Matrix

Overview
========

The OpenDev project uses a number of pipelines in Zuul:

**check**
  Newly uploaded patchsets enter this pipeline to receive an initial
  +/-1 Verified vote.

**gate**
  Changes that have been approved by core reviewers are enqueued in
  order in this pipeline, and if they pass tests, will be merged.

**post**
  This pipeline runs jobs that operate after each change is merged.

**pre-release**
  This pipeline runs jobs on projects in response to pre-release tags.

**release**
  When a commit is tagged as a release, this pipeline runs jobs that
  publish archives and documentation.

**silent**
  This pipeline is used for silently testing new jobs.

**experimental**
  This pipeline is used for on-demand testing of new jobs.

**periodic**
  This pipeline has jobs triggered on a timer, e.g. for testing for
  environmental changes daily.

**promote**
  This pipeline runs jobs that operate after each change is merged
  in order to promote artifacts generated in the gate
  pipeline.

Zuul watches events in Gerrit (using the Gerrit "stream-events"
command) and matches those events to the pipelines above. If a match
is found, it adds the change to the pipeline and starts running
related jobs.

The **gate** pipeline uses speculative execution to improve
throughput. Changes are tested in parallel under the assumption that
changes ahead in the queue will merge. If they do not, Zuul will
abort and restart tests without the affected changes. This means that
many changes may be tested in parallel while still ensuring that
each commit is correctly tested.

Zuul's current status may be viewed at
`<https://zuul.opendev.org/>`_.

Zuul's configuration is stored in :config:`zuul/main.yaml`. Anyone
may propose a change to the configuration by editing that file and
submitting the change to Gerrit for review.

For the full syntax of Zuul's configuration file format, see the `Zuul
reference manual`_.

Sysadmin
========

Zuul has three main subsystems:

* Zuul Scheduler
* Zuul Executors
* Zuul Web

that in OpenDev's deployment depend on four 'external' systems:

* Nodepool
* Zookeeper
* gear
* MySQL

Scheduler
---------

The Zuul Scheduler and gear are co-located on a single host,
referred to by the ``zuul.opendev.org`` CNAME in DNS.

Zuul is stateless, so the server does not need backing up. However,
Zuul talks through git and ssh, so you will need to manually check ssh
host keys as the zuul user.

e.g.::

  sudo su - zuul
  ssh -p 29418 review.opendev.org

The Zuul Scheduler talks to Nodepool using Zookeeper and distributes work to
the executors using gear.

OpenDev's Zuul installation is also configured to write job results into
a MySQL database via the SQL Reporter plugin. The database for that is a
Rackspace Cloud DB and is configured in the ``mysql`` entry of the
``zuul_connection_secrets`` entry for the ``zuul-scheduler`` group.

Executors
---------

The Zuul Executors are a horizontally scalable set of servers named
``ze*.opendev.org``. They perform git merging operations for the scheduler
and execute Ansible playbooks to actually run jobs.

Our jobs are configured to upload as much information as possible along with
their logs, but if there is an error which cannot be diagnosed in that
manner, logs are available in the ``executor-debug`` log file on
the executor host. You may use the Zuul build UUID to track
assignment of a given job from the Zuul scheduler to the Zuul executor
used by that job.

It is safe, although not free, to restart executors. If an executor goes away,
the scheduler will reschedule the jobs it was originally running.

Web
---

Zuul Web is a horizontally scalable service. It is currently running colocated
with the scheduler on ``zuul.opendev.org``. Zuul Web provides live console
streaming and is the home of various web dashboards such as the status
page.

Zuul Web is stateless, so it is safe to restart. However, restarting it will
result in a loss of connection for anyone watching a live-stream of a console
log when the restart happens.

Restarting Zuul Services
------------------------

Currently the safest way to restart the Zuul scheduler is to restart all
services at the same time. The reason for this is that if the scheduler is
restarted, but executors are not, then the executors and scheduler can get out
of sync with each other. Note that restarting Zuul Web or a single executor
should continue to be safe as noted above, but this process should generally
be preferred.

Zuul Scheduler restarts are disruptive, so non-emergency restarts should
always be scheduled for quieter times of the day, week and cycle. We should
attempt to be courteous and avoid restarts when project teams are cutting
releases or have other important changes that are about to land.

Since Zuul is stateless, some work needs to be done to save and then
re-enqueue patches when restarts are done. To accomplish this, start by
running the ``zuul-changes`` script to save the check and gate queues::

  root@zuul02# ~root/zuul-changes.py https://zuul.opendev.org >queues-$(date +%Y%m%d).sh

The resulting script will be executed when Zuul is up and running again to restore
the previous queue contents.

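Both the save command above and the re-enqueue command later in this
procedure build the file name from the current date, so a restart that spans
midnight would look for a file that does not exist. A minimal sketch of one way
to avoid that is to compute the name once and reuse it. The
``queue_script_name`` helper and the ``QUEUE_SCRIPT`` variable are hypothetical
names used for illustration only; they are not part of Zuul or system-config.

```shell
# Hypothetical helper: print the queue script name for a given date stamp,
# defaulting to today's date in the YYYYMMDD format used above.
queue_script_name() {
    printf 'queues-%s.sh\n' "${1:-$(date +%Y%m%d)}"
}

# Compute the name once, before the restart, and reuse the variable afterwards.
QUEUE_SCRIPT=$(queue_script_name)
# Save:       ~root/zuul-changes.py https://zuul.opendev.org >"$QUEUE_SCRIPT"
# Re-enqueue: bash "$QUEUE_SCRIPT"
```

The variable only survives within one shell session, so this assumes the save
and re-enqueue steps are run from the same session on the scheduler host.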
Before restarting all Zuul services, consider whether to update all of the
Zuul Docker images. This can be useful when restarting Zuul to pick up a bug
fix from the Zuul codebase. To do this, run the ``zuul_pull.yaml`` playbook
from bridge::

  root@bridge# ansible-playbook -f20 /home/zuul/src/opendev.org/opendev/system-config/playbooks/zuul_pull.yaml

Once ready to restart all Zuul services, run the ``zuul_restart.yaml``
playbook from bridge::

  root@bridge# ansible-playbook -f20 /home/zuul/src/opendev.org/opendev/system-config/playbooks/zuul_restart.yaml

Once this playbook is done running, the services will have been restarted, but
the Zuul system still needs to load its configs before it is ready to do work.
The `root <https://zuul.opendev.org/>`_ of the Zuul dashboard will show you
loaded tenants. Once all tenants show up on this page, it is safe to proceed
with re-enqueuing changes to pipelines with the script we generated earlier.
Note that the OpenStack tenant takes the most time. If you wait for it to
show up in the dashboard you should be ready to go. You can double check
this by loading the OpenStack Zuul `status
<https://zuul.opendev.org/t/openstack/status>`_ and ensuring it doesn't report
an error.

To re-enqueue, execute the previously generated script::

  root@zuul# bash queues-$(date +%Y%m%d).sh

When this has completed you are done with the Zuul restart. Please log
the restart and any Zuul version update with statusbot in IRC.

Secrets
-------

In some cases it may be warranted to compare the decrypted plaintext of
a secret from job configuration against a reference value while
troubleshooting, since random padding means encrypting the same
plaintext a second time will result in wholly different ciphertext. In
order to avoid unintentional disclosure this should only be done when
absolutely necessary, but it's possible to decrypt a secret locally on
the scheduler server. The first step is extracting the key data from
our daily key backups::

  root@zuul# jq --raw-output '.keys."/keystorage/gerrit/opendev/opendev%2Fsystem-config".keys[0].private_key' /var/lib/zuul/zuul-keys-backup.json

The name between the double quotes is the path to the project's keys in
ZooKeeper. To construct this you will need to know the Zuul connection name
and full project name. The connection name in the example above is ``gerrit``;
replace it with the appropriate connection name for the project you are looking
at. Next is the unique project name. In the example above we start with
``opendev/system-config`` and split it on ``/``. Everything before the first
``/`` becomes the next component of the path, in this case ``opendev``. Then we
take the entire name ``opendev/system-config`` and URL encode it to get
``opendev%2Fsystem-config``, which becomes the last component.

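The path construction described above can be sketched as a small shell
function. This is an illustration only: ``zuul_key_path`` is a hypothetical
helper name, and the encoding step handles only ``/`` characters, on the
assumption that the project name contains no other characters that need URL
encoding.

```shell
# Hypothetical helper: build the keystorage path for a project.
# $1 is the Zuul connection name, $2 is the full project name.
zuul_key_path() {
    connection=$1
    project=$2
    # Everything before the first "/" is the middle path component.
    prefix=${project%%/*}
    # Encode each "/" as %2F. Only slashes are handled here; a project name
    # with other reserved characters would need a complete URL encoder.
    encoded=$(printf '%s' "$project" | sed 's|/|%2F|g')
    printf '/keystorage/%s/%s/%s\n' "$connection" "$prefix" "$encoded"
}
```

For the example project, ``zuul_key_path gerrit opendev/system-config`` prints
the same path that appears between the double quotes in the ``jq`` command
above.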
Save the output of this jq command to a file, ``secret.pem``. Then copy the
secret ciphertext out of the job configuration into a file,
``ciphertext.txt``, leaving out the surrounding YAML (there is no need to
recombine split lines), and run the following command to decrypt::

  cat ciphertext.txt | sed 's/^ *//' | base64 -d | sudo openssl rsautl -decrypt -oaep -inkey \
    secret.pem

Debugging Problems
------------------

Occasionally you'll have a job enter an error state or an entire change that
appears to be stuck in a Zuul pipeline. Debugging these problems can be a bit
daunting at first, as Zuul's logs are quite verbose. The good news is that once
you learn a few tricks those verbose logs become a powerful tool.

Often the best place to start is grepping the Zuul scheduler debug log for
the pipeline entry identifier (e.g. change number, tag, or ref sha1)::

  you@zuul02$ grep 123456 /var/log/zuul/debug.log
  you@zuul02$ grep c6229660cda0af42ecd5afbe7fefdb51136a0436 /var/log/zuul/debug.log

In many of these log lines you'll see Zuul event IDs like
``[e: 1718628fe39643e1bd6a88a9a1477b4f]``. This ID identifies the event that
triggered Zuul to take action for these changes and is logged through all
the Zuul services. It can be very powerful to do a grep on this event ID and
trace through the actions that the scheduler took for this event::

  you@zuul02$ grep 1718628fe39643e1bd6a88a9a1477b4f /var/log/zuul/debug.log

This might lead you to look at executor logs, where you can use the same
ID to grep for actions related to this event on the executor::

  you@ze01$ grep 1718628fe39643e1bd6a88a9a1477b4f /var/log/zuul/executor-debug.log

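When a change has been through several events, it can help to list the
distinct event IDs first and then grep for each one. The following is a
minimal sketch, assuming event IDs always appear in the ``[e: <hex>]`` form
shown above. ``zuul_event_ids`` is a hypothetical helper name, not a tool
shipped with Zuul.

```shell
# Hypothetical helper: list the distinct event IDs found on log lines that
# match a pattern. $1 is the pattern (for example a change number) and $2 is
# the log file to search.
zuul_event_ids() {
    grep -e "$1" "$2" |
        grep -o '\[e: [0-9a-f]*\]' |
        sed 's/^\[e: //; s/\]$//' |
        sort -u
}
```

For example, ``zuul_event_ids 123456 /var/log/zuul/debug.log`` would print one
event ID per line, each of which can then be used with the ``grep`` commands
above on the scheduler or an executor.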
As you trace through the logs related to a change or event ID you can look for
``ERROR`` or ``Traceback`` messages to try and identify the underlying source of
the problem. Note that ``Traceback`` messages are not prefixed with the event
ID which means you'll have to grep with additional context, for example using
``grep -B20 -A20``.

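As an illustration of grepping with context, the sketch below wraps the
``grep -B20 -A20`` approach in a function so the amount of context can be
adjusted. ``zuul_tracebacks`` is a hypothetical helper name, and the default
of 20 lines simply mirrors the example above.

```shell
# Hypothetical helper: print each Traceback line with surrounding context so
# that nearby lines carrying an event ID are visible.
# $1 is the log file, $2 is the number of context lines (default 20).
zuul_tracebacks() {
    grep -n -B "${2:-20}" -A "${2:-20}" -e 'Traceback' "$1"
}
```

The ``-n`` flag adds line numbers, which makes it easier to jump to the same
spot in the full log afterwards, e.g.
``zuul_tracebacks /var/log/zuul/debug.log | less``.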
Another useful debugging tool is Zuul's SIGUSR2 handler. This signal handler
produces a thread dump in the debug log and toggles the yappi python profiler.
Each Zuul service supports the signal handler and it can be triggered via::

  you@zuul02$ sudo kill -USR2 $ZUUL_PID

To determine ``$ZUUL_PID`` you can run ``ps`` against the ``zuul-*`` service
that you are interested in getting information from. For example::

  you@zuul02$ ps -ef | grep zuul-scheduler
  zuuld 1893030 1893010 0 08:33 ? 00:00:00 /usr/bin/dumb-init -- /usr/local/bin/zuul-scheduler -f
  zuuld 1893052 1893030 69 08:33 ? 07:57:42 /usr/local/bin/python /usr/local/bin/zuul-scheduler -f
  zuuld 1893198 1893052 0 08:33 ? 00:03:22 /usr/local/bin/python /usr/local/bin/zuul-scheduler -f

All of the zuul services are run under ``dumb-init``. The process to send
SIGUSR2 to is the child of the ``dumb-init`` process. In the example above
``$ZUUL_PID`` would be ``1893052``.

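Picking the child of ``dumb-init`` out of the ``ps`` output can be sketched
with ``awk``. This is a hedged example: ``zuul_service_pid`` is a hypothetical
helper name, and it assumes the ``ps -ef`` column layout shown above (PID in
the second column, parent PID in the third).

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of the
# process whose parent is the dumb-init process for the given service.
# $1 is the service name, for example zuul-scheduler.
zuul_service_pid() {
    awk -v svc="$1" '
        index($0, svc) == 0 { next }                       # skip unrelated processes
        index($0, "dumb-init") > 0 { init[$2] = 1; next }  # remember dumb-init PIDs
        { parent[$2] = $3 }                                # candidate PID -> parent PID
        END {
            for (pid in parent)
                if (parent[pid] in init)
                    print pid
        }
    '
}
```

It would be used as ``ps -ef | zuul_service_pid zuul-scheduler``. If more than
one matching container is running on the host, more than one PID is printed,
so check the output before passing it to ``kill``.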
The first time you send the signal, it turns on the yappi profiler. This
profiler does incur a runtime cost which can significantly slow down Zuul's
processing of pipelines. Be sure to resend the signal once you have let Zuul
run long enough to collect a representative set of profiler data. In most cases
a minute or two should be sufficient. Slow memory leaks may require hours, but
running Zuul under yappi for hours isn't practical.

.. _zuul_github_projects:

GitHub Projects
===============

OpenStack does not use GitHub for development purposes, but there are some
non-OpenStack projects in the broader ecosystem that we care about which do.
When we are interested in setting up jobs in Zuul to test the interaction
between OpenStack projects and those ecosystem projects, we can add the
OpenDev Zuul GitHub app to those projects, then configure them in Zuul.

In order to add the GitHub app to a project, an admin on that project should
navigate to the `OpenDev Zuul`_ app in the GitHub UI. From there they can
click "Install", then choose the project or organization they want to install
the App on.

The repository then needs to be added to the ``zuul/main.yaml`` file before Zuul
can be configured to actually run jobs on it.

Information about the configuration of the OpenDev Zuul App itself can be
found on the :ref:`github` page at :ref:`openstack_zuul_app`.

.. _OpenDev Zuul: https://github.com/apps/opendev-zuul
.. _Zuul Reference Manual: https://zuul-ci.org/docs/zuul
.. _Zuul Status Page: https://zuul.opendev.org