zuul/doc/source/merger.rst
Antoine Musso d9ac4c18b1 Zuul references cleaner
Zuul mergers create a vast number of git references under /refs/zuul
which are never garbage collected.

With hundred of thousands of references, that makes git fetch operations
very slow since git uploads all references to Gerrit to synchronize the
Zuul maintained repository.  On one of Wikimedia busy repository
(mediawiki/core) we had 55000 such references and it can take up to 18
seconds for a fetch to complete.  I have seen occurences of a merge
taking 2 minutes to complete.

As such, this tiny script clears out references for which the commit date
of the pointed commit object is older than 360 days (the default).

It is not perfect since a recent reference can well point to an old
object.  That would be the case on repositories that are barely active.
In such case the ref will be gone despite it being recently created.

A better way would be to vary Zuul references by using month/day which
will let one easily garbage collect them.  But I am being lazy and that
would not let us clear out references using the current scheme.

Example usage:

 zuul-clear-refs.py --verbose --dry-run --until 90 /srv/zuul/git/project

Would show a list of references pointing to commit dates older than 90
days and output a message whenever the script would delete them.

Hint about the utility in our merger documentation.

Reference:
 https://phabricator.wikimedia.org/T70481

Change-Id: Id4e55f5d571ebd5e8271e516f53f8e05c1f78c1a
2015-07-20 18:57:04 +02:00

3.1 KiB

title

Merger

Merger

The Zuul Merger is a separate component which communicates with the main Zuul server via Gearman. Its purpose is to speculatively merge the changes for Zuul in preparation for testing. The resulting git commits also must be served to the test workers, and the server(s) running the Zuul Merger are expected to do this as well. Because both of these tasks are resource intensive, any number of Zuul Mergers can be run in parallel on distinct hosts.

Configuration

The Zuul Merger can read the same zuul.conf file as the main Zuul server and requires the gearman, gerrit, merger, and zuul sections (indicated fields only). Be sure the zuul_url is set appropriately on each host that runs a zuul-merger.

Zuul References

As the DependentPipelineManager may combine several changes together for testing when performing speculative execution, determining exactly how the workspace should be set up when running a Job can be complex. To alleviate this problem, Zuul performs merges itself, merging or cherry-picking changes as required and identifies the result with a Git reference of the form refs/zuul/<branch>/Z<random sha1>. Preparing the workspace is then a simple matter of fetching that ref and checking it out. The parameters that provide this information are described in launchers.

These references need to be made available via a Git repository that is available to workers (such as Jenkins). This is accomplished by serving Zuul's Git repositories directly.

Serving Zuul Git Repos

Zuul maintains its own copies of any needed Git repositories in the directory specified by git_dir in the merger section of zuul.conf (by default, /var/lib/zuul/git). To directly serve Zuul's Git repositories in order to provide Zuul refs for workers, you can configure Apache to do so using the following directives:

SetEnv GIT_PROJECT_ROOT /var/lib/zuul/git
SetEnv GIT_HTTP_EXPORT_ALL

AliasMatch ^/p/(.*/objects/[0-9a-f]{2}/[0-9a-f]{38})$ /var/lib/zuul/git/$1
AliasMatch ^/p/(.*/objects/pack/pack-[0-9a-f]{40}.(pack|idx))$ /var/lib/zuul/git/$1
ScriptAlias /p/ /usr/lib/git-core/git-http-backend/

Note that Zuul's Git repositories are not bare, which means they have a working tree, and are not suitable for public consumption (for instance, a clone will produce a repository in an unpredictable state depending on what the state of Zuul's repository is when the clone happens). They are, however, suitable for automated systems that respond to Zuul triggers.

Clearing old references

The references created under refs/zuul are not garbage collected. Since git fetch send them all to Gerrit to sync the repositories, the time spent on merge will slightly grow overtime and start being noticeable.

To clean them you can use the tools/zuul-clear-refs.py script on each repositories. It will delete Zuul references that point to commits for which the commit date is older than a given amount of days (default 360):

./tools/zuul-clear-refs.py /path/to/zuul/git/repo