Update gerrit changes more atomically
The following problem was observed: Change A depends-on change B, which is in turn the tip of a patch series of several changes. Drivers warm the change cache on events by querying information about changes related to those events. But they don't process depends-on headers, which means most drivers only warm one change, and while the gerrit driver will follow other types of dependency links which are unique to it, it stops at depends-on boundaries. So in the example above, the only change in the cach which was warm was change A. The triggering event was enqueued, forwarded, and processed by two responding pipelines simultaneously on two executors. Each of them noticed the depends-on link and started querying gerrit for change B and its dependencies. One of the schedulers was about 1 second ahead of the other in this process. In the gerrit driver, there is a two phase process for updating changes. First the change itself is updated in the normal way common to all drivers, and then gerrit-specific dependency links are updated. That means the change is added to the change cache with no dependencies, then mutated to add dependencies later. The first scheduler added change B to the cache with no dependencies. The second scheduler saw the update and refreshed its copy of B. The second scheduler begin updating B, saw that the ltime of its copy of B was sufficiently new it didn't need to update the cache and stopped updating. The second scheduler enqueued changes A and B, but no others in its pipeline. The first scheduler finished querying the stack of changes ending at B, added them to the change cache, and mutated the entry for B in the cache. The first scheduler enqueued A, B, and the rest of the stack in its pipeline. The second scheduler updated its copy of B to include the new dependencies. The second scheduler ran a pipeline processor, noticed that B lacked dependencies, and dequeued A and B, and reported an error to the user. The good news is that Zuul noticed the mistake and dequeued the changes. To correct this, we will now collect all of the information about a change and its gerrit-specific dependencies before writing any of that information to the change cache. This means that in our example above, the second scheduler would not have aborted its queries. Eventually, both schedulers would end up with the same information before enqueing anything. This process is still not quite optimal, in that we will have multiple schedulers racing to update the same changes at the same time, but they are designed to deal with collisions like that, so it should at least be correct. A future area of work might be to investigate whether we can optimize this case further. Change-Id: I647c2b54a55789e521fca71c8c3814907df65da6
This commit is contained in:
@@ -115,7 +115,8 @@ class DependentPipelineManager(SharedQueuePipelineManager):
|
||||
# cycle changes will otherwise be partially enqueued without
|
||||
# any error handling
|
||||
self.log.debug(
|
||||
" Skipping change %s due to dependency cycle"
|
||||
" Skipping change %s due to dependency cycle",
|
||||
other_change
|
||||
)
|
||||
continue
|
||||
|
||||
|
||||
Reference in New Issue
Block a user