The Gatekeeper, or a project gating system

James E. Blair df37ad2ce7		2018-02-02 11:36:49 -08:00
Executor: Don't start too many jobs at once

The metrics that we use to govern load on the executors are all trailing indicators. The executors are capable of accepting a large number of jobs in a batch and then, only after they begin to run, will the load indicators increase. To avoid the thundering herd problem, reduce the rate at which we accept jobs past a certain point.

That point is twice the number of jobs as the target load average. In practice that seems to be a fairly conservative but reasonable number of jobs for the executor to run, so, to facilitate a quick start, allow the executor to start up to that number all at once.

Once the number of jobs running is beyond that number, subsequent jobs will only be accepted one at a time, after each one completes its startup phase (cloning repos, establishing ansible connections), which is to say, at the point where the job begins running its first pre-playbook.

We will also wait until the next regular interval of the governor to accept the next job. That's currently 30 seconds, but to make the system a bit more responsive, it's lowered to 10 seconds in this change.

To summarize: after a bunch[1] of jobs are running, after each new job, we wait until that job has started running playbooks, plus up to an additional 10 seconds, before accepting a new job.

This is implemented by adding a 'starting jobs' metric to the governor so that we register or de-register the execute function based on whether too many jobs are in the startup phase. We add a forced call to the governor routine after each job starts so that we can unregister if necessary before picking up the next job, and wrap that routine in a lock since it is now called from multiple threads and its logic may not be entirely thread-safe.

Also, add tests for all three inputs to manageLoad.

[1] 2*target load average

Change-Id: I066bc539e70eb475ca2b871fb90644264d8d5bf4
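
The acceptance rule described in this commit message can be illustrated with a small, self-contained sketch. This is not Zuul's executor code: the class name `GovernorSketch`, its methods, and the exact rule (accept freely below twice the target load average, then only while no job is still in its startup phase) are assumptions made for illustration, and only the load-average input is modelled.

```python
import threading


class GovernorSketch:
    """Toy model of the job-acceptance rule; all names are hypothetical."""

    def __init__(self, target_load_avg):
        self.target_load_avg = target_load_avg
        # "A bunch" of jobs: twice the target load average.
        self.batch_limit = 2 * target_load_avg
        self.running = 0    # jobs accepted and not yet finished
        self.starting = 0   # running jobs still in their startup phase
        self.accepting = True
        # The governor is called from several threads, so guard its state.
        self._lock = threading.Lock()

    def _evaluate(self, load_avg):
        # Caller must hold the lock.
        load_ok = load_avg <= self.target_load_avg
        # Below the batch limit, accept freely; past it, accept a job
        # only when no other job is still starting up.
        startup_ok = self.running < self.batch_limit or self.starting == 0
        self.accepting = load_ok and startup_ok
        return self.accepting

    def manage_load(self, load_avg):
        """Periodic governor pass (every 10 seconds in the commit)."""
        with self._lock:
            return self._evaluate(load_avg)

    def job_accepted(self, load_avg):
        """Record a new job, then force an immediate governor pass."""
        with self._lock:
            self.running += 1
            self.starting += 1
            return self._evaluate(load_avg)

    def job_started(self):
        """The job reached its first pre-playbook; startup is over."""
        with self._lock:
            self.starting -= 1

    def job_finished(self):
        with self._lock:
            self.running -= 1
```

With `target_load_avg=2`, the sketch keeps accepting while fewer than four jobs are running and stops once the fourth is accepted. It accepts again only after every starting job has called `job_started()` and the next `manage_load()` pass has run, which mirrors the "wait for startup, plus up to one governor interval" behaviour described above.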
doc	Executor: Don't start too many jobs at once	2018-02-02 11:36:49 -08:00
etc	Remove large status header and tagline	2017-12-08 18:20:32 -06:00
playbooks	Changes for Ansible 2.4	2017-11-30 09:56:26 -05:00
tests	Executor: Don't start too many jobs at once	2018-02-02 11:36:49 -08:00
tools	Merge "Enable direct use of github driver in debug tool" into feature/zuulv3	2018-01-17 00:36:45 +00:00
zuul	Executor: Don't start too many jobs at once	2018-02-02 11:36:49 -08:00
.gitignore	Ignore .mypy_cache	2017-07-28 10:26:00 +02:00
.gitreview	Updated .gitreview location	2012-12-16 20:34:13 +00:00
.mailmap	Fix pep8 E127 violations	2012-09-26 14:23:10 +00:00
.testr.conf	Tests: store debug logs on error	2017-02-06 10:10:48 -08:00
.zuul.yaml	Update docs to use sphinx-build	2018-01-24 07:30:46 -06:00
LICENSE	Initial commit.	2012-05-29 14:49:32 -07:00
MANIFEST.in	Migrate to pbr.	2013-06-25 19:04:30 +00:00
NEWS.rst	Case sensitive label matching	2017-09-15 14:38:44 -07:00
README.rst	Remove feature/zuulv3 references from README	2018-01-18 13:05:44 -08:00
TESTING.rst	Rename zuul-launcher to zuul-executor	2017-03-15 12:21:24 -04:00
bindep.txt	Fix docs building	2017-12-18 22:21:59 +01:00
requirements.txt	Add memory awareness to system load governor	2018-01-31 06:54:03 +00:00
setup.cfg	Add finger gateway	2017-12-13 10:07:37 -05:00
setup.py	Partial sync with OpenStack requirements.	2013-09-25 15:30:37 -07:00
test-requirements.txt	Remove pep8 and pyflakes from test-requirements	2018-01-23 10:11:57 -06:00
tox.ini	Update docs to use sphinx-build	2018-01-24 07:30:46 -06:00

README.rst

Zuul

Zuul is a project gating system developed for the OpenStack Project.

We are currently engaged in a significant development effort in preparation for the third major version of Zuul. We call this effort Zuul v3 and it is described in more detail below.

The latest documentation for Zuul v3 is published at: https://docs.openstack.org/infra/zuul/

If you are looking for the Edge routing service named Zuul that is related to Netflix, it can be found here: https://github.com/Netflix/zuul

If you are looking for the JavaScript testing tool named Zuul, it can be found here: https://github.com/defunctzombie/zuul

Contributing

We are currently engaged in a significant development effort in preparation for the third major version of Zuul. We call this effort Zuul v3.

To browse the latest code, see: https://git.openstack.org/cgit/openstack-infra/zuul/tree/

To clone the latest code, use git clone git://git.openstack.org/openstack-infra/zuul

Bugs are handled at: https://storyboard.openstack.org/#!/project/679

Code reviews are, as you might expect, handled by Gerrit at https://review.openstack.org

Use git review to submit patches (after creating a Gerrit account that links to your Launchpad account). Example:

# Do your commits
$ git review
# Enter your username if prompted

Zuul v3

The Zuul v3 effort involves significant changes to Zuul, and its companion program, Nodepool. The intent is for Zuul to become more generally useful outside of the OpenStack community. This is the best way to get started with this effort:

Read the Zuul v3 spec: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html

We use specification documents like this to describe large efforts where we want to make sure that all the participants are in agreement about what will happen and generally how before starting development. These specs should contain enough information for people to evaluate the proposal generally, and sometimes include specific details that need to be agreed upon in advance. They are living documents which can change as work gets underway. However, not every change or detail needs to be reflected in the spec -- most work is simply done with patches (and revised if necessary in code review).
Read the Nodepool build-workers spec: http://specs.openstack.org/openstack-infra/infra-specs/specs/nodepool-zookeeper-workers.html
Review any proposed updates to these specs: https://review.openstack.org/#/q/status:open+project:openstack-infra/infra-specs+topic:zuulv3

Some of the information in the specs may be effectively superseded by changes here, which are still undergoing review.
Read developer documentation on the internal data model and testing: http://docs.openstack.org/infra/zuul/developer.html

The general philosophy for Zuul tests is to perform functional testing of either the individual component or the entire end-to-end system with external systems (such as Gerrit) replaced with fakes. Before adding additional unit tests with a narrower focus, consider whether they add value to this system or are merely duplicative of functional tests.
Review open changes: https://review.openstack.org/#/q/status:open

We find that the most valuable code reviews are ones that spot problems with the proposed change, or raise questions about how it might affect other systems or subsequent work. Reviewing is also a great way to stay involved as a team in work performed by others (for instance, by observing and asking questions about development while it is in progress). We try not to sweat the small things and don't worry too much about style suggestions or other nitpicky things (unless they are relevant -- for instance, a -1 vote on a change that introduces a YAML change out of character with existing conventions is useful because it makes the system more user-friendly; a -1 vote on a change which uses a sub-optimal line breaking strategy is probably not the best use of anyone's time).
Join #zuul on Freenode. Let others (especially jeblair who is trying to coordinate and prioritize work) know what you would like to work on.
Check storyboard for status of current work items: https://storyboard.openstack.org/#!/board/41

Work items tagged with low-hanging-fruit are tasks that have been identified as not requiring an expansive knowledge of the system. They may still require either some knowledge or investigation into a specific area, but should be suitable for a developer who is becoming acquainted with the system. Those items can be found at: https://storyboard.openstack.org/#!/story/list?tags=low-hanging-fruit&tags=zuulv3

Once you are up to speed on those items, it will be helpful to know the following:

Zuul v3 includes some substantial changes to Zuul, and in order to implement them quickly and simultaneously, we temporarily disabled most of the test suite. That test suite still has relevance, but tests are likely to need updating individually, with reasons ranging from something simple such as a test-framework method changing its name, to more substantial issues, such as a feature being removed as part of the v3 work. Each test will need to be evaluated individually. Feel free to claim a test name in this story at any time and work on re-enabling it: https://storyboard.openstack.org/#!/story/2000773
Because of the importance of external systems, as well as the number of internal Zuul components, actually running Zuul in a development mode quickly becomes unwieldy (imagine uploading changes to Gerrit repeatedly while altering Zuul source code). Instead, the best way to develop with Zuul is in fact to write a functional test. Construct a test to fully simulate the series of events you want to see, then run it in the foreground. For example:
```
.tox/py35/bin/python -m testtools.run tests.unit.test_scheduler.TestScheduler.test_jobs_executed
```
See TESTING.rst for more information.
On many occasions, when working on sweeping changes to Zuul v3, we left notes for future work items in the code marked with "TODOv3". These represent potentially serious missing functionality or other issues which must be resolved before an initial v3 release (unlike a more conventional TODO note, these really cannot be left indefinitely). These present an opportunity to identify work items not otherwise tracked. The names associated with TODO or TODOv3 items do not mean that only that person can address them -- they simply reflect who to ask to explain the item in more detail if it is too cryptic. In your own work, feel free to leave TODOv3 notes if a change would otherwise become too large or unwieldy.
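
The functional-testing approach described above, where an external system such as Gerrit is replaced with a fake and a test simulates the full series of events, might look roughly like the following. This is a self-contained illustration using only the standard library, not Zuul's own test framework: `FakeReviewSystem`, `TinyGate`, and their methods are hypothetical names invented for this sketch.

```python
import unittest


class FakeReviewSystem:
    """Hypothetical in-memory stand-in for an external review system."""

    def __init__(self):
        self.changes = {}
        self.events = []

    def add_change(self, project, subject):
        number = len(self.changes) + 1
        self.changes[number] = {
            "project": project, "subject": subject, "reports": []}
        return number

    def approve(self, number):
        # Queue the event that the component under test will consume.
        self.events.append(("approved", number))

    def report(self, number, message):
        self.changes[number]["reports"].append(message)


class TinyGate:
    """Hypothetical component under test: runs jobs for approved changes."""

    def __init__(self, review, jobs):
        self.review = review
        self.jobs = jobs  # maps job name -> callable(change) -> bool
        self.history = []

    def process_events(self):
        while self.review.events:
            kind, number = self.review.events.pop(0)
            if kind != "approved":
                continue
            change = self.review.changes[number]
            results = []
            for name, run in self.jobs.items():
                result = "SUCCESS" if run(change) else "FAILURE"
                self.history.append((name, number, result))
                results.append(result)
            ok = all(r == "SUCCESS" for r in results)
            self.review.report(number, "SUCCESS" if ok else "FAILURE")


class TestTinyGate(unittest.TestCase):
    def test_jobs_executed(self):
        review = FakeReviewSystem()
        jobs = {"project-merge": lambda change: True,
                "project-test1": lambda change: True}
        gate = TinyGate(review, jobs)
        # Simulate the series of events end to end.
        number = review.add_change("org/project", "A")
        review.approve(number)
        gate.process_events()
        self.assertEqual(
            gate.history,
            [("project-merge", 1, "SUCCESS"),
             ("project-test1", 1, "SUCCESS")])
        self.assertEqual(review.changes[number]["reports"], ["SUCCESS"])
```

Saved to a file, the test can be run in the foreground with `python -m unittest`. The point of the design is that the test drives the component only through the fake's event queue, so the whole flow is exercised without a real review server.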

Python Version Support

Zuul v3 requires Python 3. It does not support Python 2.

Because Ansible is used to execute jobs, note that while Ansible itself supports Python 3, not all of Ansible's modules do. Zuul currently sets ansible_python_interpreter to python2 so that remote content will be executed with Python 2.
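
As a rough illustration of what setting ansible_python_interpreter per host means, the sketch below builds an Ansible-style inventory structure in which every host carries that variable. The function name `build_inventory`, the node dictionaries, and the interpreter path `/usr/bin/python2` are assumptions for this example; it does not reproduce how Zuul actually writes its inventory.

```python
import json


def build_inventory(nodes, interpreter="/usr/bin/python2"):
    """Return an inventory dict that pins the remote Python interpreter.

    `nodes` is a list of dicts with hypothetical "name" and "address" keys.
    """
    hosts = {}
    for node in nodes:
        hosts[node["name"]] = {
            "ansible_host": node["address"],
            # Remote modules run under this interpreter on the target host.
            "ansible_python_interpreter": interpreter,
        }
    return {"all": {"hosts": hosts}}


def dump_inventory(inventory):
    """Serialize the inventory; JSON output is also valid YAML."""
    return json.dumps(inventory, indent=2, sort_keys=True)
```

For example, `build_inventory([{"name": "node1", "address": "192.0.2.10"}])` yields a single host whose variables include the pinned interpreter, while the controlling process itself can still run under Python 3.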

Roadmap

Begin using Zuul v3 to run jobs for Zuul itself
Implement a shim to translate Zuul v2 demand into Nodepool ZooKeeper launcher requests
Begin using ZooKeeper based Nodepool launchers with Zuul v2.5 in OpenStack Infra
Move OpenStack Infra to use Zuul v3
Implement GitHub support
Begin using Zuul v3 to run tests on Ansible repos
Implement support in Nodepool for non-OpenStack clouds
Add native container support to Zuul / Nodepool