Cancel a build even if not found

Canceling a build is inherently subject to race conditions while
the build is starting, because the cancel path differs depending
on whether the build has started.  This has been handled in three
attempts: if the build has started, cancel the running build;
otherwise, remove it from the queue; and if that fails, check
whether the build has just started and, if so, cancel it.

But even that is still racy, because the build may have started
without Zuul having yet received the Gearman packet that says so.
To handle that, skip the "has it started" check on the third
attempt and cancel unconditionally.

(The reason we check at all before the first attempt is that
canceling a running build in Jenkins is somewhat expensive, since
it involves iterating over all the builds, so it's better to avoid
that if we think it won't work.)
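
As a rough illustration of the three attempts described above, here is a minimal, self-contained sketch. It is not the Zuul implementation: the `Build` and `Launcher` classes, the `cancelJobInQueue` helper, and the returned status strings are all hypothetical stand-ins. Only `build.number` and `cancelRunningBuild` appear in the diff for this change.

```python
class Build:
    """Hypothetical stand-in for a build record."""

    def __init__(self, name, number=None):
        self.name = name
        # Assumption: number is set only once a "build started"
        # notification has been received.
        self.number = number


class Launcher:
    """Toy launcher; queue handling is simplified for illustration."""

    def __init__(self):
        self.queue = []     # builds not yet handed to a worker
        self.canceled = []  # builds for which a running-cancel was issued

    def cancelJobInQueue(self, build):
        # Hypothetical helper: True only if the build was still queued.
        if build in self.queue:
            self.queue.remove(build)
            return True
        return False

    def cancelRunningBuild(self, build):
        # The expensive path; here it only records the request.
        self.canceled.append(build)

    def cancel(self, build):
        # Attempt 1: take the expensive path only if we believe the
        # build has started.
        if build.number is not None:
            self.cancelRunningBuild(build)
            return "running"
        # Attempt 2: try to remove it from the queue.
        if self.cancelJobInQueue(build):
            return "dequeued"
        # Attempt 3: the build is not queued, so it may have started
        # without us hearing about it yet. Cancel without checking.
        self.cancelRunningBuild(build)
        return "possibly-running"
```

The point of the third branch is that a missing "started" notification is not evidence that the build is idle, so the sketch issues the cancel regardless of `build.number`.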

Also, add an extra check in the unit test suite when deciding
whether the system has settled.  This should handle the case where
a trigger_event -> job transition happens during the
haveAllBuildsReported check (which only checks jobs).
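
The following toy model sketches why a second "all builds reported" check can matter. Everything in it (`FakeSystem`, `is_settled`, the `recheck` flag) is hypothetical and far simpler than the real test suite. It only assumes that an event can turn into a job after the first check has already passed.

```python
import queue


class FakeSystem:
    """Hypothetical model: trigger events become unreported jobs."""

    def __init__(self):
        self.trigger_events = queue.Queue()
        self.unreported_jobs = []

    def process_trigger_events(self):
        # Each pending event turns into a job that has not reported yet.
        while not self.trigger_events.empty():
            self.unreported_jobs.append(self.trigger_events.get())

    def have_all_builds_reported(self):
        # Only looks at jobs, not at events still sitting in the queue.
        return not self.unreported_jobs


def is_settled(system, recheck=True):
    # First check: passes while the event has not yet become a job.
    if not system.have_all_builds_reported():
        return False
    # Stand-in for draining the queues: the event now becomes a job.
    system.process_trigger_events()
    # Without the second check, the new job goes unnoticed.
    return (system.trigger_events.empty() and
            (not recheck or system.have_all_builds_reported()))
```

With `recheck=False` the model reports "settled" even though a job appeared after the first check. With the recheck enabled it does not.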

Change-Id: I60018a5215e7d8230bdf6ef67ec7bc9c719fc286
changes/91/79391/2
James E. Blair 8 years ago
parent
commit
9c2985a7e7
  1. 1
      tests/test_scheduler.py
  2. 8
      zuul/launcher/gearman.py

1
tests/test_scheduler.py

@@ -1112,6 +1112,7 @@ class TestScheduler(testtools.TestCase):
                     self.sched.result_event_queue.empty() and
                     self.fake_gerrit.event_queue.empty() and
                     not self.merge_client.build_sets and
+                    self.haveAllBuildsReported() and
                     self.areAllBuildsWaiting()):
                     self.sched.run_handler_lock.release()
                     self.worker.lock.release()

8
zuul/launcher/gearman.py

@@ -352,11 +352,11 @@ class Gearman(object):
         self.log.debug("Still unable to find build %s to cancel" % build)
         if build.number:
             self.log.debug("Build %s has just started" % build)
-            self.cancelRunningBuild(build)
-            self.log.debug("Canceled just running build %s" % build)
         else:
-            self.log.error("Build %s has not started but "
-                           "was not found in queue" % build)
+            self.log.error("Build %s has not started but was not"
+                           "found in queue; canceling anyway" % build)
+        self.cancelRunningBuild(build)
+        self.log.debug("Canceled possibly running build %s" % build)
 
     def onBuildCompleted(self, job, result=None):
         if job.unique in self.meta_jobs:
