Commit Graph

105 Commits (f88b81781397bf38c6be1d544378b6d10d711328)

Author SHA1 Message Date
James E. Blair e511d2f6c4 Reorganize connections into drivers
This change, while substantial, is mostly organizational.
Currently, connections, sources, triggers, and reporters are
discrete concepts, and yet are related by virtue of the fact that
the ConnectionRegistry is used to instantiate each of them.  The
method used to instantiate them is called "_getDriver", in
recognition that behind each "trigger", etc., which appears in
the config file, there is a class in the zuul.trigger hierarchy
implementing the driver for that trigger.  Connections also
specify a "driver" in the config file.

In this change, we redefine a "driver" as a single class that
organizes related connections, sources, triggers and reporters.

The connection, source, trigger, and reporter interfaces still
exist.  A driver class is responsible for indicating which of
those interfaces it supports and instantiating them when asked to
do so.

Zuul instantiates a single instance of each driver class it knows
about (currently hardcoded, but in the future, we will be able to
easily ask entrypoints for these).  That instance will be
retained for the life of the Zuul server process.

When Zuul is (re-)configured, it asks the driver instances to
create new connection, source, trigger, reporter instances as
necessary.  For instance, a user may specify a connection that
uses the "gerrit" driver, and the ConnectionRegistry would call
getConnection() on the Gerrit driver instance.
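A minimal sketch of the driver arrangement described above, assuming a registry that keeps one long-lived driver per system and asks it for fresh connection objects on every (re-)configuration. Only `getConnection()` and `ConnectionRegistry` come from the text; `GerritDriver`, `GerritConnection`, `configure()` and the config dict layout are hypothetical illustrations, not Zuul's actual API.

```python
class Driver:
    """One long-lived instance per external system (hypothetical base)."""

    name = None

    def getConnection(self, name, config):
        # Drivers that do not support connections leave this unimplemented.
        raise NotImplementedError


class GerritConnection:
    def __init__(self, driver, name, config):
        self.driver = driver  # back-reference to the long-lived driver
        self.name = name
        self.config = config


class GerritDriver(Driver):
    name = 'gerrit'

    def getConnection(self, name, config):
        return GerritConnection(self, name, config)


class ConnectionRegistry:
    def __init__(self):
        # Hardcoded list of drivers; these instances live as long as the
        # server process.
        self.drivers = {d.name: d for d in [GerritDriver()]}
        self.connections = {}

    def configure(self, connection_configs):
        # On each (re-)configuration, build new connection objects by
        # asking the appropriate driver instance.
        self.connections = {}
        for name, config in connection_configs.items():
            driver = self.drivers[config['driver']]
            self.connections[name] = driver.getConnection(name, config)
```

Because the driver objects survive reconfiguration, any state that spans tenants or pipelines can live on them rather than on the short-lived connection objects.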

This is done for two reasons: first, it allows us to organize all
of the code related to interfacing with an external system
together.  All of the existing connection, source, trigger, and
reporter classes are moved as follows:

  zuul.connection.FOO -> zuul.driver.FOO.FOOconnection
  zuul.source.FOO -> zuul.driver.FOO.FOOsource
  zuul.trigger.FOO -> zuul.driver.FOO.FOOtrigger
  zuul.reporter.FOO -> zuul.driver.FOO.FOOreporter

For instance, all of the code related to interfacing with Gerrit
is now in zuul.driver.gerrit.

Second, the addition of a single, long-lived object associated
with each of these systems allows us to better support some types
of interfaces.  For instance, the Zuul trigger maintains a list
of events it is required to emit -- this list relates to a tenant
as a whole rather than individual pipelines or triggers.  The
timer trigger maintains a single scheduler instance for all
tenants, but must be able to add or remove cron jobs based on an
individual tenant being reconfigured.  The global driver instance
for each of these can be used to accomplish this.
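The timer case can be sketched as a single driver object that owns one shared schedule but replaces only the entries of the tenant being reconfigured. This is a hypothetical stand-in: `TimerDriver`, `reconfigure()` and the in-memory dict are illustrations only and do not model a real cron scheduler.

```python
class TimerDriver:
    """Hypothetical sketch: one schedule shared by all tenants."""

    def __init__(self):
        # Stand-in for a single cron scheduler:
        # tenant -> list of (pipeline, cron spec) entries.
        self.jobs = {}

    def reconfigure(self, tenant, timer_specs):
        # Replace only this tenant's cron entries; other tenants keep
        # theirs even though the scheduler itself is shared.
        self.jobs.pop(tenant, None)
        if timer_specs:
            self.jobs[tenant] = list(timer_specs)

    def all_jobs(self):
        return [(tenant, pipeline, spec)
                for tenant, entries in sorted(self.jobs.items())
                for pipeline, spec in entries]
```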

As a result of using the driver interface to create new
connection, source, trigger and reporter instances, the
connection setup in ConnectionRegistry is much simpler, and can
easily be extended with entrypoints in the future.

The existing tests of connections, sources, triggers, and
reporters which only tested that they could be instantiated and
have names have been removed, as there are functional tests which
cover them.

Change-Id: Ib2f7297d81f7a003de48f799dc1b09e82d4894bc
2017-01-20 05:43:21 -08:00
James E. Blair 6ab79e0637 Handle nodepool allocation failure
When a request is either fulfilled or failed, pass it through to
the scheduler which will accept the request (which means deleting
it in the case of a failure) and pass it on to the pipeline manager
which will set the result of the requesting job to NODE_FAILURE
and cause any sub-jobs to be SKIPPED.
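The failure path can be sketched as marking the requesting job NODE_FAILURE and every job beneath it in the job tree SKIPPED. The function name and the `job_tree` dict (job name to names of dependent jobs) are hypothetical; only the two result strings come from the text.

```python
NODE_FAILURE = 'NODE_FAILURE'
SKIPPED = 'SKIPPED'


def handle_node_request_failure(job, job_tree, results):
    """Hypothetical sketch: record the failure and skip all descendants.

    job_tree maps a job name to the names of the jobs that depend on it.
    """
    results[job] = NODE_FAILURE
    pending = list(job_tree.get(job, []))
    while pending:
        child = pending.pop()
        if child not in results:
            results[child] = SKIPPED
            # Sub-jobs of a skipped job are skipped as well.
            pending.extend(job_tree.get(child, []))
    return results
```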

Adjust the request algorithm to only request nodes for jobs that
are ready to run.  The current behavior requests nodes for all jobs in
a build set as soon as possible, but that has two downsides: it may request and
return nodes more aggressively than necessary (if you have chosen
to create a job tree, you *probably* don't want to tie up nodes
until they are actually needed).  However, that's a grey area,
and we may want to adjust or make that behavior configurable later.
More pressing here is that it makes the logic of when to return
nodes *very* complicated (since SKIPPED jobs are represented by
fake builds, there is no good opportunity to return their nodes).
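Requesting nodes only for ready jobs can be sketched as: a job qualifies when it has no parent or its parent succeeded, and it has neither a result nor an existing node request. The function and its arguments are hypothetical; real job trees carry more state than this.

```python
def jobs_ready_for_nodes(job_parents, results, requested):
    """Hypothetical sketch: jobs whose nodes should be requested now.

    job_parents maps each job to its parent job (None for a root job);
    results maps finished jobs to their result; requested is the set of
    jobs that already have a node request.
    """
    ready = []
    for job, parent in job_parents.items():
        if job in requested or job in results:
            continue
        if parent is None or results.get(parent) == 'SUCCESS':
            ready.append(job)
    return sorted(ready)
```

With this approach a job whose parent fails never gets a node request, so a SKIPPED job never holds nodes that would need returning.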

This seems like a good solution for now, and if we want to make
the node request behavior more aggressive in the future, we can
work out a better model for knowing when to return nodes.

Change-Id: Ideab6eb5794a01d5c2b70cb87d02d61bb3d41cce
2017-01-06 16:47:02 -08:00
James E. Blair e18d460e47 Verify nodes and requests are not leaked
Check that at the end of every test, there are no outstanding
nodepool requests and no locked nodes.

Move final state assertions into the tearDown method so that
they run right after the end of the test but before any
cleanup handlers are called (which can interfere with the
assertion checking by, say, deleting the zookeeper tree we
are trying to check).  Move the cleanup in test_webapp to
tearDown so that it ends the paused job that the tests in
that class use before the assertion check.

Fix some bugs uncovered by this testing:

* Two typos.
* When we re-launch a job, we need a new nodeset, so make sure
  to remove the nodeset from the buildset after the build
  completes if we are going to retry the build.
* Always report build results to the scheduler even for non-current
  buildsets so that it can return used nodes for aborted builds.
* Have the scheduler return the nodeset for a completed build rather
  than the pipeline manager to avoid the edge case where a build
result is returned after a reconfiguration that removes the pipeline
  (and therefore, there is no longer a manager to return the nodeset).
* When canceling jobs, return nodesets for any jobs which do not yet
  have builds (such as jobs which have nodes but have not yet
  launched).
* Return nodes for skipped jobs.

Normalize the debug messages in nodepool.py.

Change-Id: I32f6807ac95034fc2636993824f4a45ffe7c59d8
2017-01-05 17:28:35 -08:00
James E. Blair ce001e11a8 Remove excess printing from stats test
This isn't necessary and makes the output of this test noisy.

Change-Id: I214d40fada9567ac6b9cee5ff9cdc748a472cbbb
2017-01-05 16:14:18 -08:00
James E. Blair a38c28efa3 Lock nodes when nodepool request is fulfilled
This is continuing work on implementing the Zuul<->Nodepool protocol
from the Zuulv3 spec.

Change-Id: Ic8477e607fd09b85a37f47cbee7da905c017c534
2017-01-04 16:08:43 -08:00
James E. Blair 15be0e1e11 Re-submit node requests on ZooKeeper disconnect
Change-Id: I689bf812c713fa6f5f37958b7001b0d5fb0a254b
2017-01-04 09:11:35 -08:00
James E. Blair 6ac368c57b Add a test printHistory function
I frequently add something like this ad-hoc when debugging a test.
Make it a convenient function that's easy to add when needed, and
also run it at the completion of every test so a developer can
easily survey the logs to see what happened.

Change-Id: I3d3810f51245d92855f086b875edfd52bdd86983
2016-12-22 18:07:20 -08:00
James E. Blair 10fc1eb487 Improve test output
* When a test timeout occurs, output the state debug information at
  error level so that it shows up in all logs.
* Add some more info to that output.
* Further restrict the (often not useful) chatty gear logs by default.

Change-Id: Ib275441172c5b1598593d0931cef0168d02e521d
2016-12-21 16:22:16 -08:00
James E. Blair dce6ceac8e Add FakeNodepool test fixture
Add a fake nodepool that immediately successfully fulfills all
requests, but actually uses the Nodepool ZooKeeper API.

Update the Zuul Nodepool facade to use the Nodepool ZooKeeper API.

Change-Id: If7859f0c6531439c3be38cc6ca6b699b3b5eade2
2016-12-21 14:16:51 -08:00
James E. Blair 498059ba28 Add Zookeeper to tests
Add a requirement on kazoo and add a Zookeeper chroot to the test
infrastructure.

This is based on similar code in Nodepool.

Change-Id: Ic05386aac284c5542721fa3dcb1cd1c8e52d4a1f
2016-12-20 14:14:15 -08:00
Clint Byrum ee0786dd8b Remove now-unused ZuulTestCase.resetGearmanServer
This was used in only one test, which no longer uses it, so it is now
untested, dead code.

Change-Id: Iff8c235583424a45926f273a88838d908381e237
2016-12-14 11:39:54 -08:00
Clint Byrum 69e4712574 Re-enable TestScheduler.test_rerun_on_error
This test is making sure that when something terrible goes wrong in the
launcher and it just returns 'None', the scheduler retries.

During refactoring of the launcher, some of the logic that handled this
special case of run_error was removed. Also the launcher wasn't properly
handling a return of None and would have failed with a TypeError instead
of sending a failure to the zuul client.

Change-Id: I6b063ba913bf72087d2cc027f08e02304310c2be
Story: 2000773
Task: 3403
2016-12-02 16:43:50 -08:00
Jenkins b4947125f8 Merge "Enable test_post*" into feature/zuulv3 2016-11-30 18:16:19 +00:00
Adam Gandelman c5e4f1d262 Enable test_post*
This requires some updating of how merger data construction gets
handled between Refs and Changes.

Change-Id: Icd81a95565ab137b98d6a8ac52e262487d412534
Story: 2000773
Task: 3389
2016-11-29 15:11:24 -08:00
Joshua Hesketh 3f7def3424 Merge branch 'master' into workingv3
This includes forward-porting changes to launcher/server.py with the
exception of the pre/post playbooks changes which will be done in a
follow up commit as they have deviated.

Change-Id: I13aa229c1460b748745babe178c0a745e52f841c
2016-11-22 11:15:24 +11:00
Clint Byrum 3343e3e7b4 Refactor test_zuul_refs and FakeBuild.hasChanges
In re-enabling test_zuul_refs and refactoring it to use
FakeBuild.hasChanges, a weakness was discovered in hasChanges where it
would not check the repositories of all the changes one is looking for.

After fixing that, the test passes and others should be able to be
refactored in the same way.

Change-Id: Iaf647412d2518c079c8b42ed670919f4e8ca0b63
Story: 2000773
Task: 3296
2016-11-16 00:20:17 -08:00
Paul Belanger 66e9596884 Re-enable test_repo_deleted test
Updated updateConfigLayout() to support zuulv3 syntax.

Story: #2000773

Change-Id: Ifd19604d42d3df90a9154e62c8dfbaee9931eeba
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-11-14 15:31:12 -05:00
Paul Belanger 71d9817406 Add attempts logic for jobs
Today, if a job is aborted, Zuul will keep relaunching it until it either
succeeds or fails.  If the job continues to abort, it will loop forever.  As
a result, we have now added the ability to limit this.  By default, we'll
try to launch an aborted job a total of 3 times before RETRY_LIMIT is
returned as the result.
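A minimal sketch of the attempts limit, assuming "a total of 3 times" means three launches in all. `run_with_attempts` and the `launch` callable are hypothetical; only the RETRY_LIMIT result name comes from the text.

```python
ABORTED = 'ABORTED'  # hypothetical result string for an aborted launch
RETRY_LIMIT = 'RETRY_LIMIT'


def run_with_attempts(launch, attempts=3):
    """Sketch: launch a job at most `attempts` times in total.

    `launch` is a hypothetical callable returning a result string.
    """
    for _ in range(attempts):
        result = launch()
        if result != ABORTED:
            # Success or an ordinary failure ends the loop immediately.
            return result
    return RETRY_LIMIT
```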

Change-Id: Ie26fdc29c07430ebfb3df8be8ac1786d63d7e0fe
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-11-11 18:25:28 -05:00
Paul Belanger 6ab6af7ad2 Re-enable test_queue_precedence test
Update FakeGearmanServer with new gear syntax for job names.

Change-Id: If3c4f8e66fa11d3de81b6acbda2be68c94a2fdad
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-11-09 12:22:37 -05:00
Jenkins 2ddeb8967e Merge "Default test log level to DEBUG except for testr" 2016-10-18 15:39:57 +00:00
James E. Blair 79e94b6bdd Default test log level to DEBUG except for testr
When running under testr, default the test log level to INFO,
but otherwise, default it to DEBUG.  This way when a developer
runs a test in the foreground it logs at DEBUG without any
further configuration needed.
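A sketch of the default, assuming the testr configuration exports an environment variable such as `OS_LOG_CAPTURE=1` that a directly-run test does not have. The variable name and the function are assumptions for illustration; the text does not say how the test base detects testr.

```python
import logging


def default_test_log_level(environ):
    """Sketch: INFO under testr, DEBUG when a developer runs a test directly.

    Assumes (hypothetically) that testr sets OS_LOG_CAPTURE to 1 or True.
    """
    if environ.get('OS_LOG_CAPTURE') in ('1', 'True'):
        return logging.INFO
    return logging.DEBUG
```

A test base class could then call `logging.basicConfig(level=default_test_log_level(os.environ))`.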

Change-Id: Ie7388ebf25669807a8c430b6908f9d18115b5dc6
2016-10-18 08:29:11 -07:00
Jenkins c5c1da19a4 Merge "Lower the log level in tests" 2016-10-18 15:26:25 +00:00
James E. Blair b536ecc6ea Re-enable test_failed_change_at_head
And "improve" it to use the new build/history assertions.

Add an "ordered" option to assertHistory so that we can assert
everything about the history except that the builds arrived in
the specified order.  In this case, aborted builds don't always
finish in order.

Also add the ordered option to test_failed_changes since it
is subject to the same issue.
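The effect of the "ordered" option can be sketched by comparing lists directly when order matters and comparing them as multisets otherwise. This `assert_history` is a hypothetical simplification of assertHistory that treats each build as a (job name, result) tuple.

```python
from collections import Counter


def assert_history(history, expected, ordered=True):
    """Sketch: compare build history with or without checking order.

    history and expected are lists of (job name, result) tuples. With
    ordered=False the same builds must appear, in any order.
    """
    if ordered:
        matches = history == expected
    else:
        matches = Counter(history) == Counter(expected)
    if not matches:
        raise AssertionError(
            'history %r does not match %r' % (history, expected))
```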

Change-Id: I7b3bec798b462568d4c44db8943daaeb27728735
2016-09-02 14:51:38 -07:00
James E. Blair 34776ee412 Pass assigned nodes to the launcher
This fleshes out the nodepool stub a little more.  It passes some
node information to the launcher after node requests have been
fulfilled.  It also corrects some logic errors in the node request
framework.  It moves data structures related to node requests into
the model.  Finally, it adds nodes to the configuration of some
tests to exercise the system, and adds a test to verify the correct
node is supplied on a job that has a branch variant.

Change-Id: I395ce23ae865df3a55436ee92d04e0eae07c963a
2016-08-30 11:21:54 -07:00
James E. Blair 3158e288f0 Improve debug output from tests
When a build or history sequence does not match what is expected,
log the whole sequence for better visibility while debugging.

Change-Id: I23b16d20d14697cb03943c00c66414c00f32b57e
2016-08-19 09:37:15 -07:00
James E. Blair 2b2a8ab67b Re-enable test_failed_changes
Also, improve the test so that it is more deterministic and
add some handy assert methods to make checking the build sequence
easier.

Change-Id: I2993187162b2d0446595315ef144e77d2a4b8360
2016-08-12 09:01:39 -07:00
James E. Blair 173029714a Stub out aborting jobs in ansible launcher
The scheduler tests need to be able to abort jobs.  Even though
our Ansible launcher is not capable of aborting jobs yet, implement
the method so that we can override it in the RecordingLaunchServer
and exercise the scheduler.

Instead of using name+number to specify which build to abort, use
the UUID of the build.

Remove the build number entirely, since it isn't required.  In some
places it was used to determine whether the job had started.  In those
cases, use the build URL instead.

Also correct the launch server used in tests so that it does not send
two initial WORK_DATA packets.

Change-Id: I75525a2b48eb2761d599c039ba3084f09609dfbe
2016-08-12 09:01:38 -07:00
James E. Blair f3156c9154 Tests: add an extra debug line when stalled
When Zuul is stuck, add another debug line indicating the potential
cause.

Change-Id: Ie16d95fb248ae5573e257bdf05514b7b58520f77
2016-08-12 09:01:37 -07:00
James E. Blair a5dba23e77 Fix addFailTest
The addFailTest method needed to be updated to catch up with some
test infrastructure changes.  Rather than consulting the merger's
git repo for the ZUUL_REF to find out if a job which is supposed to
fail is present for a given build, use the new FakeBuild.hasChanges
method to inspect the launcher's prepared git repo.

Also rename the method as 'Job' is more descriptive than 'Test' and
document it.

Change-Id: I3224b8a01d49cfa06b799a8028c1bf0d455d25b1
2016-08-12 08:42:51 -07:00
James E. Blair 8b5408c6e9 Update category mapping in tests
Use the newer form of spelled-out category names in tests since
that's what we're writing in config files now.

Change-Id: Ib679e54b53131280956cbda20b84f5602a4953c8
2016-08-10 13:06:40 -07:00
James E. Blair 7fc8daa372 Stop sharing Gerrit event queues in tests
When connections are set up in tests, multiple Gerrit connections which
are configured to connect to the same fake Gerrit server share a change
database so that changes sent to Gerrit via one connection are reflected
back to Zuul on another.  They also share an event queue so that events
injected on one are seen by another.

Unfortunately, that part doesn't work, and in fact, events are only seen
by one of the gerrit connections.  This happens to work since it doesn't
matter which gerrit connection receives an event, which is why we haven't
noticed the problem in tests.

Where we do see the problem in Zuulv3 is in shutting down the connections.
When a GerritConnection is stopped, a sentinel object (None) is added to
the event queue.  When the GerritConnection gets an event from the queue,
it first checks whether it has stopped before processing that event.
Because in tests (but not in production) multiple GerritConnections share
an event queue, the connection that adds the None object to the queue
may not be the one that receives it, which causes the test to raise an
exception and not stop correctly.

We did not notice this in v2 because the order in which the Queue.Queue
class decides to awaken a thread is deterministic enough that the thread
which submitted the sentinel was always the one that received it.  In
v3, the thread order is sufficiently different that the thread for the
*other* connection is reliably the one which receives it.

To correct this, stop using a shared queue between the differing
GerritConnection objects, and instead add a helper method to the testcase
class which will add an event to every connection for a given server.
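A sketch of the fix, assuming each fake connection owns its own `queue.Queue` and a testcase helper fans an event out to every connection configured for the same server. `FakeGerritConnection`, `drain()` and `add_event_to_server()` are hypothetical names, not the test suite's actual helpers.

```python
import queue


class FakeGerritConnection:
    """Sketch: each connection owns its own event queue."""

    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.event_queue = queue.Queue()
        self.received = []

    def stop(self):
        # The None sentinel can now only reach this connection's consumer.
        self.event_queue.put(None)

    def drain(self):
        # Process events until the sentinel is seen.
        while True:
            event = self.event_queue.get()
            if event is None:
                return
            self.received.append(event)


def add_event_to_server(connections, server, event):
    """Hypothetical helper: deliver the event to every connection that
    is configured for the given fake Gerrit server."""
    for conn in connections:
        if conn.server == server:
            conn.event_queue.put(event)
```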

Change-Id: Idd3238f5ab8f5e09e295c0fa028e140c089a2a3f
2016-08-10 09:11:44 -07:00
James E. Blair e7b99a0baa Add some documentation on functional testing tools
Change-Id: I4c694ba8da0ece7d8e94921edc8ff7b46242f705
2016-08-08 11:16:11 -07:00
James E. Blair 48d9a22ded Tests: remove FakeWorker
This class is no longer used.

Change-Id: Id7c4c8af48d13abd99ca1975ff1dfed50096f33f
2016-08-05 13:04:11 -07:00
James E. Blair ab7132bb62 Tests: ensure fake builds recorded in order
Alter the recording ansible launcher used in tests to record the
build in the launch method which is synchronous, rather than in
the runAnsible method which is run from inside of a thread started
by the launch method.  This way, all builds that show up in the
running_builds attribute appear in the order they arrived from
gearman (which is the order in which they were launched).
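The ordering guarantee can be sketched as recording the build in the synchronous `launch` method before starting the worker thread, since thread start-up order is not guaranteed. `RecordingLauncher` is a hypothetical simplification that reuses the names `launch`, `runAnsible` and `running_builds` from the text; it assumes `launch` is called from a single thread.

```python
import threading


class RecordingLauncher:
    """Sketch: record builds synchronously, then run them in threads."""

    def __init__(self):
        self.running_builds = []
        self.threads = []

    def launch(self, build_name):
        # Recorded here, in the order launch requests arrive, rather than
        # inside the thread, whose scheduling order is not deterministic.
        self.running_builds.append(build_name)
        t = threading.Thread(target=self.runAnsible, args=(build_name,))
        self.threads.append(t)
        t.start()

    def runAnsible(self, build_name):
        pass  # no-op stand-in for the (possibly skipped) ansible run
```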

Change-Id: I11d8c686f738b51797f9ac9cee0ac201800de383
2016-08-05 13:00:34 -07:00
James E. Blair 962220f87f Tests: Improve support in fake builds
This change makes two improvements to FakeBuild:

1) Restore support for examining commits present in a build

Since the merge for a commit is now performed on the launch server,
the previous method of examining the git repo state on the merger
no longer works.  Move the method that performs this function to
the FakeBuild and give the FakeBuild the context of the launch
server's JobDir.  Then it can inspect the repo states as created
in the JobDir on the launch server.  These are the repos that will
be pushed onto the test node.

The ZUUL_COMMIT env variable is no longer relevant, so remove that
from the check.

2) Restore support for releasing a held build.

Change-Id: I654a269d37c0bc323ed73afa68a73ddd558be7e2
2016-08-05 13:00:34 -07:00
James E. Blair e1767bc263 Use RecordingLaunchServer to run all tests
Instead of having an entirely fake launch server and an entirely
real one, use the real launch server for everything, but add an
option to not actually execute ansible.  This will exercise most
of the code in the launcher, remove unnecessary fakes, and still
maintain the speed benefit of not running ansible for every test
of scheduler behavior.

Some tests are still run with the launcher actually running ansible,
and that facility will continue to be available as we create tests
that validate actual ansible behavior.

Change-Id: Ie0fbba2b786a5aeb1c603597af30fcd728a8cec8
2016-08-02 16:52:35 -07:00
James E. Blair 43746ffe51 Remove build descriptions
These (the HTML formatted information we would display on Jenkins
about related builds) are no longer relevant in v3.

Change-Id: Id1c3ff353308f2732d223f63dec5fb743029ec2c
2016-08-02 16:49:01 -07:00
James E. Blair 3f876d5e71 Create AnsibleZuulTestCase
This test case base class maintains the current test case base
class behavior in v3, which is to run the actual ansible launcher.
However, that's not needed for all of the scheduler tests, some
of which want to have complete control of jobs in a way which may
be difficult in ansible.  Create this new test case base class so
that tests where we know we want to exercise ansible are easy, but
if we don't need it, we get the Zuul v2 behavior of a fake worker.

Change-Id: I836fec935979d90eb0eb3ca765c87bc8300920aa
2016-07-25 15:26:49 -07:00
James E. Blair 8b1dc3fb22 Add dynamic reconfiguration
If a change alters .zuul.yaml in a repo that is permitted to use in-repo
configuration, create a shadow configuration layout specifically for that
and any following changes with the new configuration in place.

Such configuration changes extend only to altering jobs and job trees.
More substantial changes such as altering pipelines will be ignored.  This
only applies to "project" repos (i.e., the repositories under test which may
incidentally have .zuul.yaml files) rather than "config" repos (repositories
specifically designed to hold Zuul configuration in zuul.yaml files).  This
is to avoid the situation where a user might propose a change to a config
repository (and Zuul would therefore run) that would perform actions that
the gatekeepers of that repository would not normally permit.
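A sketch of the restriction, assuming the parsed .zuul.yaml is a list of single-key dicts and that "jobs and job trees" corresponds to the `job` and `project` item types. The function, the allowed-key set and the repo-kind strings are hypothetical.

```python
# Assumed mapping of "jobs and job trees" onto top-level item types.
DYNAMIC_ALLOWED = {'job', 'project'}


def dynamic_config_items(repo_kind, items):
    """Hypothetical sketch: which in-repo items feed a shadow layout.

    items is a list of single-key dicts, e.g.
    [{'job': {...}}, {'pipeline': {...}}]. Returns None when the change
    should not get a shadow layout at all.
    """
    if repo_kind != 'project':
        # Changes to config repos are not applied speculatively.
        return None
    # Anything else, such as pipeline definitions, is ignored.
    return [item for item in items if set(item) <= DYNAMIC_ALLOWED]
```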

This change also corrects an issue with job inheritance in that the Job
instances attached to the project pipeline job trees (i.e., those that
represent the job as invoked in the specific pipeline configuration for
a project) were inheriting attributes at configuration time rather than
when job trees are frozen when a change is enqueued.  This could mean that
they would inherit attributes from the wrong variant of a job.

Change-Id: If3cd47094e6c6914abf0ffaeca45997c132b8e32
2016-07-18 09:58:19 -07:00
James E. Blair 8d692398f1 Add nodepool request framework
This does not actually talk to nodepool, but this adds the nodepool
request flow to the pipeline managers, and establishes a nodepool
class for zuul to interact with nodepool directly.

Change-Id: I41c4d8f86e140786d590698f1a0048c0011382dd
2016-07-18 09:55:29 -07:00
Morgan Fainberg 78c301afed Do not contest locks in PythonGit
The config writer should be used in a context manager form to ensure
locks are shared and the data is properly written out. In python 2 it
looks as if the locks were not properly handled and multiple config
writers would just lock the same file (and write) blindly. In python 3,
the context manager is required to not raise an exception when writing
the config due to the lock already being held by another config_writer.

Change-Id: I42a9804638c6065127ce31a1865b017bf969855f
2016-07-14 13:47:01 -07:00
Joshua Hesketh 0aa7e8bdbf Merge branch 'master' into v3_merge
Includes minor py3 fixes (for pep8 on py3).

 Conflicts:
	tests/base.py
	tests/test_model.py
	tests/test_scheduler.py
	tox.ini
	zuul/model.py
	zuul/reporter/__init__.py
	zuul/scheduler.py
	zuul/source/gerrit.py

Change-Id: I99daf9acd746767967b42396881a2dff82134a07
2016-07-14 22:36:59 +10:00
Jenkins c19d3d9eff Merge "Expose webapp listen_address and port" 2016-07-12 23:31:02 +00:00
Jenkins a5ddf547ac Merge "Support post jobs by supporting rev checkout" 2016-07-12 15:51:45 +00:00
Sachi King 9f16d522a9 Support post jobs by supporting rev checkout
Currently zuul-cloner does not support post jobs, as it does not know
what to checkout.  This adds the ability on a per project basis to
specify a revision to be checked out.  When specified zuul-cloner
will successfully check out the same repo as gerrit-git-prep.sh does
in post jobs.

Sample usage:
clonemap:
  - name: openstack/neutron
    dest: ./neu
  - name: openstack/requirements
    dest: ./reqs

export ZUUL_PROJECT="openstack/neutron"
export ZUUL_NEWREV="a2Fhc2Rma2FzZHNkZjhkYXM4OWZhc25pb2FzODkK"
export ZUUL_BRANCH="stable/liberty"

zuul-cloner -m map.yaml git://git.openstack.org $ZUUL_PROJECT \
openstack/requirements

This results in openstack/neutron checked out at rev a2Fhc2 and
openstack/requirements at 'heads/stable/liberty'
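The outcome of the sample above can be sketched as a per-project choice: the project named by ZUUL_PROJECT is checked out at ZUUL_NEWREV, and every other project at the head of ZUUL_BRANCH. `checkout_ref` is hypothetical and deliberately simplified; it models the described result, not zuul-cloner's actual code or its other fallbacks.

```python
def checkout_ref(project, zuul_project, zuul_newrev, zuul_branch):
    """Hypothetical sketch of what gets checked out in a post job."""
    if project == zuul_project and zuul_newrev:
        # The project under test goes to the exact new revision.
        return zuul_newrev
    # Other projects get the tip of the branch.
    return 'heads/' + zuul_branch
```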

Change-Id: Ie9b03508a44f04adfbe2696cde136439ebffb9a6
2016-07-12 12:51:51 +10:00
Jan Hruban b4f9c61e84 Lower the log level in tests
The subunit output size is already nearing 50MB, which is the maximum
allowed by jenkins jobs. Lower the log level to INFO, which should be
enough for normal test runs and lower the output size significantly.

Co-Authored-By: Joshua Hesketh <josh@nitrotech.org>
Change-Id: Ia6adc28a7bda482595df4b5f3b144f150e3a441e
2016-06-23 13:29:02 +10:00
Jenkins a413178204 Merge "Fix timeout debug print in tests" 2016-06-14 06:24:33 +00:00
Morgan Fainberg d34e0b4dc7 Reduce Log Size
To reduce the testrepository.subunit output, eliminate debugging logs
from gear.Server and gear.Client.

This is handled via an ENV defined in the tox.ini called `OS_LOG_DEFAULTS`.
Any module can be specified in the typical python logging format (e.g.
"gear.Server=INFO"). Each entry should be comma separated. For each valid
entry, a fake logger is created with the log level set to that level.

An invalid format will be skipped (expected: `<module name str>=<level_str>`).

An invalid logging level will default to logging.DEBUG.

Specifying OS_LOG_DEFAULTS as an ENV var prior to running tox will override
the default values defined in tox.ini.
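The parsing rules above can be sketched as follows. `parse_log_defaults` is a hypothetical helper that returns a logger-name-to-level mapping; where the original creates a fake logger for each entry, a simpler consumer could apply the mapping with `logging.getLogger(name).setLevel(level)`.

```python
import logging


def parse_log_defaults(value):
    """Sketch: parse e.g. "gear.Server=INFO,gear.Client=INFO".

    Entries not of the form <module name>=<level> are skipped; an
    unknown level name falls back to logging.DEBUG.
    """
    levels = {}
    for entry in value.split(','):
        parts = entry.strip().split('=')
        if len(parts) != 2 or not parts[0]:
            continue  # invalid format
        name, level_name = parts
        level = getattr(logging, level_name.strip().upper(), None)
        if not isinstance(level, int):
            level = logging.DEBUG  # invalid level
        levels[name.strip()] = level
    return levels
```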

Change-Id: I893418435c538bfcedb803d12b57832c8111f06f
2016-06-10 10:15:50 -07:00
James E. Blair 622c968737 Fix timeout debug print in tests
Commit 4c6a7744 introduced an error in formatting the queue status.
This corrects that and also changes the print statements to debug
logs so they are easier to follow.

Change-Id: I412ad6c2e460c5ee15cc0e5a3956a513b7cd7138
2016-06-09 16:36:25 -07:00
Monty Taylor 74fa3865ac Python 3 Fixes: Replace missing builtins
There are a few different missing builtins in python3 that we're using.
This shows up when running tox pep8 under python3 which is needed for
the streamer work.

Change-Id: I1b2ef0b7bdcd1a85895576682455745fe06e880b
2016-06-07 17:59:16 +00:00