
The following sequence would trigger this error:

* A project merges a change which renames a zuul.d/* config file
* The scheduler is restarted

When the scheduler starts, it needs to read in every in-repo config
file. Previously it would issue cat jobs for every project-branch in
the system, which would then get the list of files and their contents
from the git repos. Now it relies on the cache in ZooKeeper to provide
that list and their contents.

That cache is updated whenever a config change merges. When that
happens, Zuul knows that the cache is invalid for that project-branch,
so it issues a cat job and stores the results in the cache. This is how
the cache is populated (generally speaking; it is also populated on
startup if any new projects or branches have been added and are not
present in the cache).

Because we use the results of the cat job directly after the change
merges, the error is not observed immediately. It manifests only later,
when we rely on the values in ZK, because the contents in ZK are a
superset of all the files Zuul has ever seen, including files that have
since been renamed or deleted.

We did not simply delete the entire contents of the project-branch
cache when invalidating it because a cat job is run for a specific
tenant, with a specific tenant-project-config (TPC). This TPC may list
extra files to include for only this project in this tenant. Therefore,
two cat jobs run on the same project-branch but for different tenants
may return different sets of files. If we naively removed all the
files, we would end up with the smallest subset in the cache, which
would be incorrect.

We do still need to delete files that really don't exist in the repo.
We can do this safely if we delete files from the cache iff they do not
appear in the set returned by the cat job, but do match the set of
files we expect for this particular TPC. In that case we know that if
the file really existed, the cat job would have returned it.

That is what this change implements.
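The pruning rule described above can be sketched as follows. This is a
minimal illustration, not Zuul's actual code: TenantProjectConfig,
is_expected, update_cache, and the dict-based cache are hypothetical
names and shapes, and the default file and directory names are assumed
from the config files mentioned in the release note.

```python
from dataclasses import dataclass, field

# Assumed default in-repo config locations (hypothetical constants).
DEFAULT_FILES = {"zuul.yaml", ".zuul.yaml"}
DEFAULT_DIRS = {"zuul.d", ".zuul.d"}


@dataclass
class TenantProjectConfig:
    """Hypothetical stand-in for a TPC: extra paths one tenant loads."""
    extra_config_files: set = field(default_factory=set)
    extra_config_dirs: set = field(default_factory=set)


def is_expected(path, tpc):
    """True if a cat job for this TPC would return `path` if it existed."""
    if path in DEFAULT_FILES | tpc.extra_config_files:
        return True
    # Match on "dir/" so that "zuul.d" does not match "zuul.dx/file.yaml".
    dirs = DEFAULT_DIRS | tpc.extra_config_dirs
    return any(path.startswith(d + "/") for d in dirs)


def update_cache(cache, cat_result, tpc):
    """Store the cat job results, then prune entries known to be stale."""
    cache.update(cat_result)
    for path in list(cache):
        # Delete only files this TPC should have seen but did not.
        # Files outside this TPC's view may belong to another tenant.
        if path not in cat_result and is_expected(path, tpc):
            del cache[path]
    return cache


# A rename of zuul.d/old.yaml merges; the cat job runs for a tenant
# with no extra config paths.
cache = {"zuul.d/old.yaml": "jobs", "extra.d/a.yaml": "other-tenant"}
update_cache(cache, {"zuul.d/new.yaml": "jobs"}, TenantProjectConfig())
print(sorted(cache))  # ['extra.d/a.yaml', 'zuul.d/new.yaml']
```

In this sketch the renamed file is dropped because the tenant would have
seen it, while extra.d/a.yaml survives because this tenant's cat job
could not have returned it either way.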
A test is added which shuts down and restarts the scheduler in the
middle of the test. This is the first such test, so a little adjustment
in the test framework is needed to accommodate this.

Finally, a release note is included since operators may need to perform
a manual step after upgrading in order to reconcile the cache with
reality.

A small change is made to the file filter used when loading dynamic
configs in order to make the directory matching more correct and
consistent between the two cases.

Change-Id: I9a1ee94cf0b55ac04a8f0cc12ac7507cab18d44b
---
upgrade:
  - |
    An error was found in a change related to Zuul's internal
    configuration cache which could cause Zuul to use cached in-repo
    configuration files which no longer exist. If a ``zuul.yaml`` (or
    ``zuul.d/*`` or any related variant) file was deleted or renamed,
    Zuul would honor that change immediately, but would attempt to
    load both the old and new contents from its cache upon the next
    restart.

    This error was introduced in version 4.8.0.

    If upgrading from 4.8.0, run ``zuul-scheduler full-reconfigure``
    in order to correctly update the cache.