If A is the head in A <- B <- C, and B failed, then C would be
correctly reparented to A. Then if A failed, B and C would be
restarted, but C would not be reparented back to B. This is
because the check around moving a change short-circuited if
there was no change ahead (which is the case if C is behind A
and A reports).
The solution to this is to still perform the move check even if
there is no change currently ahead (so that if there is a NNFI
change ahead, the current change will be moved behind it). This
effectively means we should remove the "not item ahead" part of
the conditional around the move.
This part of the conditional serves two additional purposes --
to make sure that we don't dereference an attribute on item_ahead
if it is None, and also to ensure that the NNFI algorithm is not
applied to independent queues.
So the fix moves that part of the conditional out so that we can
safely reference the needed attributes if there is a change ahead,
and also makes explicit that we ignore the situation if we are
working on an independent change queue.
This also adds a test that failed (at the indicated position) with
the previous code.
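A minimal sketch of the NNFI lookup the move check relies on; the class and attribute names (QueueItem, item_ahead, failed) are illustrative stand-ins, not Zuul's actual API.

```python
# Hypothetical model: walk toward the head of the queue, skipping failed
# items. Returning None means the item should sit at the head itself.
class QueueItem:
    def __init__(self, name, item_ahead=None, failed=False):
        self.name = name
        self.item_ahead = item_ahead
        self.failed = failed

def nearest_nonfailing_ahead(item):
    """Return the nearest non-failing item ahead, or None at the head."""
    ahead = item.item_ahead
    while ahead is not None and ahead.failed:
        ahead = ahead.item_ahead
    return ahead
```

Running this check even when nothing is directly ahead is what lets C move back behind B once B is restarted.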
Change-Id: I4cf5e868af7cddb7e95ef378abb966613ac9701c
When moving an item, we correctly reparented the items behind
the item to the item that was previously ahead. But we did
not remove the references to the items behind from the item
that was being moved. This could result in that item
maintaining references to items that were previously behind it.
Generally, those would be the same items, so the bug manifested as
double entries in items_behind.
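An illustrative sketch of the fix (Item and its attributes are hypothetical stand-ins): when reparenting the items behind a moved item, also clear the moved item's own items_behind list so it retains no stale references.

```python
class Item:
    def __init__(self, name):
        self.name = name
        self.item_ahead = None
        self.items_behind = []

def move_item(item, new_ahead):
    old_ahead = item.item_ahead
    if old_ahead is not None:
        old_ahead.items_behind.remove(item)
    # Reparent the followers to the item that was previously ahead.
    for behind in item.items_behind:
        behind.item_ahead = old_ahead
        if old_ahead is not None:
            old_ahead.items_behind.append(behind)
    item.items_behind = []  # the fix: drop references to former followers
    item.item_ahead = new_ahead
    if new_ahead is not None:
        new_ahead.items_behind.append(item)
```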
Change-Id: Ibc1447867df4c6fc7b4fe954770a06c7c24fadc4
Update the scheduler algorithm to NNFI -- Nearest Non-Failing Item.
A stateless description of the algorithm is that jobs for every
item should always be run based on the repository state(s) set by
the nearest non-failing item ahead of it in its change queue.
This means that should an item fail (for any reason -- failure to
merge, a merge conflict, or job failure) changes after it will
have their builds canceled and restarted with the assumption that
the failed change will not merge but the nearest non-failing
change ahead will merge.
This should mean that dependent queues will always be running
jobs and no longer need to wait for a failing change to merge or
not merge before restarting jobs.
This removes the dequeue-on-conflict behavior because there is
now no cost to keeping an item that cannot merge in the queue.
The documentation and associated test for this are removed.
This also removes the concept of severed heads because a failing
change at the head will not prevent other changes from proceeding
with their tests. If the jobs for the change at the head run
longer than following changes, it could still impact them while
it completes, but the reduction in code complexity is worth this
minor de-optimization.
The debugging representation of QueueItem is changed to make it
more useful.
Change-Id: I0d2d416fb0dd88647490ec06ed69deae71d39374
By default, tox passes the --pre option to pip install commands, so
prerelease packages, which may not be suitable for testing, get
installed, causing unforeseeable errors. For example:
http://logs.openstack.org/33/47233/1/check/gate-zuul-docs/7b794af/console.html
where the installed Sphinx package is a version 2 beta, causing the error in the log.
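A minimal tox.ini override along these lines (the exact stanza is an assumption, shown only to illustrate the mechanism):

```ini
# Hypothetical override: redefine install_command without --pre so pip
# installs only released versions of dependencies.
[testenv]
install_command = pip install -U {opts} {packages}
```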
fungi's proposal https://review.openstack.org/#/c/47239/1 also applies here.
Change-Id: I9108b534bb469211434a4abf22b25c983aa444ba
Utilises the new reporter plugin architecture to add support for
emailing success/failure messages based on layout.yaml.
This will assist in testing new gates: currently, after a job has
finished, if no report is sent back to gerrit, only the workers'
logs can be consulted to see if it was successful. This will allow
developers to see exactly what zuul would return if they turned on
gerrit reporting.
Change-Id: I47ac038bbdffb0a0c75f8e63ff6978fd4b4d0a52
Allows multiple reports per patchset to be sent to pluggable
destinations. These are configurable per pipeline and, if not
specified, default to the legacy behaviour of reporting back only
to gerrit.
Having multiple reporting methods means that only certain
success/failure/start parameters apply to certain reporters.
Reporters are listed as keys under each of those actions. Each key
under success/failure/start is therefore a reporter, and the
dictionary under it is passed to that reporter to handle.
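An illustrative layout.yaml fragment under this scheme; the reporter names and option keys shown here are examples, not a definitive schema:

```yaml
pipelines:
  - name: check
    success:
      gerrit:            # each key under success/failure/start is a reporter
        verified: 1
      smtp:              # the dict below is handed to the smtp reporter
        to: dev-list@example.org
    failure:
      gerrit:
        verified: -1
```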
Change-Id: I80d7539772e1485d5880132f22e55751b25ec198
The conditional that did a 'git remote update' for ref-updated
events (which otherwise don't affect the Merger) was wrong.
Change-Id: Icb2596df023279442613e10e13104a3621d867d9
Revert "Fix checkout when preparing a ref"
This reverts commit 6eeb24743a.
Revert "Don't reset the local repo"
This reverts commit 96ee718c4b.
Revert "Fetch specific refs on ref-updated events"
This reverts commit bfd5853957.
Change-Id: I50ae4535e3189350d3cc3a7527f89d5cb8eec01d
The new checkout method was relying on out of date information
stored in the remote which was not being updated by the fetch
command. Instead, just checkout FETCH_HEAD using git directly
so that the remote does not need to be kept up to date.
Also, reset and clean _before_ checking out, since that's supposed
to clean up from messy merges, etc.
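The sequence described above can be sketched as a list of git invocations (a hypothetical helper, not the merger's real code): reset and clean first, then fetch the ref and check out FETCH_HEAD directly so the remote's cached state never needs to be current.

```python
def checkout_ref_cmds(ref):
    """Build the git commands, in order, to check out a ref cleanly."""
    return [
        ['git', 'reset', '--hard'],          # clean up from messy merges first
        ['git', 'clean', '-x', '-f', '-d', '-q'],
        ['git', 'fetch', 'origin', ref],
        ['git', 'checkout', 'FETCH_HEAD'],   # bypass stale remote-tracking info
    ]
```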
Change-Id: Ie47b675512edc36e8aeb9b537ca945ad8d07b780
If an exception was received during a report, _reportItem would
erroneously indicate that it had been reported without error.
If a merge was expected, isMerged would be called which may then
raise a further exception which would stop queue processing.
Instead, set the default return value for _reportItem to True
because trigger.report returns a true value on error. This will
cause the change to be marked as reported (with a value of ERROR),
the merge check skipped, and the change will be quickly removed
from the pipeline.
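A hedged sketch of the corrected control flow; _reportItem's real signature differs, this only shows the default-to-True idea.

```python
def report_item(trigger, item):
    ret = True  # default: assume an error occurred while reporting
    try:
        ret = trigger.report(item)  # returns a true value on error
    except Exception:
        pass  # ret stays True, so the item is marked reported with ERROR
    return ret
```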
Change-Id: I08b7cee486111200ac9857644d478727c635908d
If a change is removed outside of the main process method (e.g.,
it is superseded), stats were not reported. Report them in that
case.
Change-Id: I9e753599dc3ecdf0d4bffc04f4515c04d882c8be
When the gearman server was added, the exit handler was not updated
correctly. It should tell the scheduler to exit, wait for the
scheduler to empty its pipelines, and then kill the gearman server
and exit.
Change-Id: Ie0532c2ea058ed56217e41641f8eec45080a9470
Instead of "resetting" the local repo (git remote update,
git checkout master, git reset --hard, git clean -xfdq) before
merging each change, just fetch the remote ref for the branch
and check that out (as a detached head). Or, if we are merging
a change that depends on another change in the queue, just check
that change out.
Change-Id: I0a9b839a0c75c04eca7393d7bb58cf89448b6494
The current behavior is that, for every event, we run
'git remote origin update', which adds quite a bit of overhead and
doesn't match what the comments say should be happening. The goal
is to ensure that when new tags arrive, we have them locally in
our repo. It's also not a bad idea for us to keep up with remote
branch movements as well.
This updates the event pre-processor to fetch the ref for each
ref-updated event as they are processed. This is much faster than
the git remote update that was happening before. It also adds
a git remote update to the repo initialization step so that when
Zuul starts, it will pick up any remote changes since it last ran.
Change-Id: I671bb43eddf41c7403de53bb4a223762101adc3c
We can more closely approximate Gerrit's behavior by using the
'resolve' git merge strategy. Make that the default, and leave
the previous behavior ('git merge') as an option. Also, finish
and correct the partially implemented plumbing for other merge
strategies (including cherry-pick).
(Note the previous unfinished implementation attempted to mimic
Gerrit's option names; the new implementation does not, but rather
documents the alignment. It's not a perfect translation anyway,
and this gives us more room to support other strategies not
currently supported by Gerrit).
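A hypothetical mapping between merge-mode names and git operations; the real option names documented with this change may differ, these are examples of the alignment.

```python
MERGE_STRATEGIES = {
    'merge-resolve': ['git', 'merge', '-s', 'resolve'],  # new default
    'merge': ['git', 'merge'],                           # previous behavior
    'cherry-pick': ['git', 'cherry-pick'],
}

def merge_command(mode, ref):
    """Build the git command for the given merge mode and ref."""
    return MERGE_STRATEGIES[mode] + [ref]
```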
Change-Id: Ie1ce4fde5980adf99bba69a5aa1d4e81026db676
We have a graph on our status page showing all of the jobs Zuul
launched, but it's built from more than 1000 graphite keys which
is a little inefficient. Add a key for convenience that rolls
up all of the job completions in a pipeline, so that such a graph
can be built with only about 10 keys.
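A sketch of the convenience key (the exact key layout is an assumption): each completion still gets its per-job key, plus one pipeline-wide rollup key.

```python
def completion_keys(pipeline, job):
    """Keys to increment when a job in this pipeline completes."""
    return [
        'zuul.pipeline.%s.job.%s' % (pipeline, job),  # per-job (>1000 keys)
        'zuul.pipeline.%s.all_jobs' % pipeline,       # rollup (~one per pipeline)
    ]
```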
Change-Id: Ie6dbcca68c8a118653effe90952c7921a9de9ad1
If a job is complete with no build result, it has failed to
run to completion. In this case, discard the previous build
and launch a replacement (in the next run of the queue processor).
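An illustrative check (Build and the dict of builds are stand-ins, not Zuul's structures): a completed build with no result never ran to completion, so it is forgotten and the next queue-processor pass launches a replacement.

```python
class Build:
    def __init__(self, completed, result):
        self.completed = completed
        self.result = result

def discard_if_lost(builds, job_name):
    """Drop a completed-but-resultless build so it gets relaunched."""
    build = builds.get(job_name)
    if build is not None and build.completed and build.result is None:
        del builds[job_name]  # a replacement launches on the next pass
        return True
    return False
```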
Change-Id: Ib8fc245a5becb1e7deb13f1ea0721fdb6ceb9f6f
This change causes the gearman launcher to completely ignore builds
after the stop job request has been sent. This should prevent
them from updating build status with confusing results. It will
also help us avoid restarting canceled builds in a subsequent
change.
Change-Id: Id31bcbfb6f24a7ec9f5f0a776d7d2c30f36685b4
Several assignments in updateChange would actually just keep
appending data causing immensely large data structures (which
are later traversed putting Zuul into a significant busy loop).
Make sure that data are replaced instead of augmented.
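A minimal illustration of the bug class fixed here (the field name is an example): assign a fresh list on each update instead of appending to the old one.

```python
def update_change(change, patchsets):
    change['patchsets'] = list(patchsets)  # replace (the fix)
    # change['patchsets'] += patchsets     # append (the bug: grows on every event)
    return change
```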
Change-Id: I8c6528adbbe24d30f8d5bb8b55bb731fefd9941a
* doc/source/zuul.rst: Document SIGUSR2 behavior.
* zuul/cmd/server.py: When SIGUSR2 is received log stack traces for all
active running threads. This is useful for debugging deadlock
situations. Note that this makes use of sys._current_frames which may
not play nice with all implementations of Python.
* tests/test_stack_dump.py: Test the stack dump signal handler with a new
test file, class, and test method.
* requirements.txt: Add argparse to requirements list so that py26 tests
can pass when zuul.cmd.server is imported.
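A sketch of a SIGUSR2 stack-dump handler like the one described above; sys._current_frames is CPython-specific and may not exist on other Python implementations.

```python
import signal
import sys
import traceback

def stack_dump_handler(signum, frame):
    # Print the current stack of every running thread; useful when the
    # process appears deadlocked.
    for thread_id, stack in sys._current_frames().items():
        print('Thread: %d' % thread_id)
        print(''.join(traceback.format_stack(stack)))

def install_handler():
    signal.signal(signal.SIGUSR2, stack_dump_handler)
```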
Change-Id: I8ad8155b16f324e832c191f0a619ff89ef804a87
If job_name_in_report is false, the 'name' variable may be
undefined. This makes sure it is always defined.
Change-Id: Ie544ccdf1661e08e2aa4c8055999f16e20d7584b
Add the ability for Zuul to accept inputs from multiple trigger
sources simultaneously.
Pipelines are associated with exactly one trigger, which must now
be named in the configuration file.
Co-Authored-By: Monty Taylor <mordred@inaugust.com>
Change-Id: Ief2b31a7b8d85d30817f2747c1e2635f71ea24b9
The multi-branch test occasionally picked the wrong change's commit
to inspect, resulting in a flaky test. Try to pick the right one
instead.
Change-Id: I1cefa1b96cac9e5a8fdfe0f4de4704202dd6d072
For every job completed, record the result of that job separately
to statsd. For successful and failed jobs, record the runtimes
of the jobs separately by result (others are not interesting).
Also, substitute '_' for '.' in job names in statsd keys.
This is backwards-incompatible with current statsd keys.
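A sketch of the per-result keys (the exact key names are assumptions): jobs are counted by result, runtimes are recorded only for SUCCESS and FAILURE, and '.' in job names becomes '_' so it cannot split statsd key segments.

```python
def result_keys(job_name, result):
    """statsd keys to emit for one completed job."""
    job = job_name.replace('.', '_')  # '.' is the statsd key separator
    keys = ['zuul.job.%s.%s' % (job, result)]
    if result in ('SUCCESS', 'FAILURE'):
        keys.append('zuul.job.%s.%s.runtime' % (job, result))
    return keys
```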
Change-Id: I7b6152bcc7ea5ce6e37bf90ed41aee89baa29309
Pass a reference to the current job about to be run to the custom
parameter function.
Remove the unneeded ZUUL_SHORT_* parameters.
Change-Id: I39538b3815ce89fae0b59c21c5cff588509cfe4e
Add an option to the syntax validator to check that jobs
referenced in the layout are defined in a file. Creating the
file with the list of jobs is an exercise for the user.
Change-Id: Iceb74440cb004e9ebe6fc08a4eedf7715de2d485