Fix error with config cache
The following sequence would trigger this error: * A project merges a change which renames a zuul.d/* config file * The scheduler is restarted When the scheduler starts, it needs to read in every in-repo config file. Previously it would issue cat jobs for every project-branch in the system which would then get the list of files and their contents from the git repos. Now it relies on the cache in ZooKeeper to provide that list and their contents. That cache is updated whenever a config change merges. When that happens, Zuul knows that the cache is invalid for that project-branch, so it issues a cat job and stores the results in the cache. This is how the cache is populated (generally speaking; it is also populated on startup if any new projects or branches have been added and are not present in the cache). Because we use the results of the cat job after the change merges, the error is not observed immediately. Only later when we rely on the values in ZK does the error manifest, and that is because the contents in ZK are a superset of all the files Zuul has seen. The reason that we did not simply delete the entire contents of the project-branch cache when we invalidate it is because a cat job is run for a specific tenant, with a specific tenant-project-config (TPC). This TPC may list extra files to include for only this project in this tenant. Therefore, two cat jobs run on the same project-branch but for different tenants may return a different set of files. If we naively removed all the files, we would end up with the smallest subset in the cache, which would be incorrect. Obviously we do need to delete files if they really don't exist in the repo. We can do this safely if we delete files from the cache iff they do not appear in the set returned by the cat job, but do match the set of files we expect for this particular TPC. In that case we know that if the file really existed, the cat job would have returned it. That is what this change implements. A test is added which shuts down and restarts the scheduler in the middle of the test. This is the first such test, so a little adjustment in the test framework is needed to accommodate this. Finally, a release note is included since operators may need to perform a manual step after upgrading in order to reconcile the cache with reality. A small change is made to the file filter used when loading dynamic configs in order to make the directory matching more correct and consistent between the two cases. Change-Id: I9a1ee94cf0b55ac04a8f0cc12ac7507cab18d44b
This commit is contained in:
parent
6eb84eb4bd
commit
598db8a78b
@ -0,0 +1,15 @@
|
||||
---
|
||||
upgrade:
|
||||
- |
|
||||
An error was found in a change related to Zuul's internal
|
||||
configuration cache which could cause Zuul to use cached in-repo
|
||||
configuration files which no longer exist. If a ``zuul.yaml`` (or
|
||||
``zuul.d/*`` or any related variant) file was deleted or renamed,
|
||||
Zuul would honor that change immediately, but would attempt to
|
||||
load both the old and new contents from its cache upon the next
|
||||
restart.
|
||||
|
||||
This error was introduced in version 4.8.0.
|
||||
|
||||
If upgrading from 4.8.0, run ``zuul-scheduler full-reconfigure``
|
||||
in order to correctly update the cache.
|
@ -4324,8 +4324,8 @@ class SchedulerTestManager:
|
||||
def __setitem__(self, key: int, value: SchedulerTestApp):
|
||||
raise Exception("Not implemented, use create method!")
|
||||
|
||||
def __delitem__(self, key, value):
|
||||
raise Exception("Not implemented!")
|
||||
def __delitem__(self, key):
|
||||
del self.instances[key]
|
||||
|
||||
def __iter__(self):
|
||||
return iter(self.instances)
|
||||
@ -4595,11 +4595,7 @@ class ZuulTestCase(BaseTestCase):
|
||||
self.builds = self.executor_server.running_builds
|
||||
|
||||
self.scheds = SchedulerTestManager(self.validate_tenants)
|
||||
self.scheds.create(
|
||||
self.log, self.config, self.changes,
|
||||
self.additional_event_queues, self.upstream_root, self.rpcclient,
|
||||
self.poller_events, self.git_url_with_auth,
|
||||
self.fake_sql, self.addCleanup, self.validate_tenants)
|
||||
self.createScheduler()
|
||||
|
||||
self.merge_server = None
|
||||
|
||||
@ -4611,6 +4607,13 @@ class ZuulTestCase(BaseTestCase):
|
||||
self.scheds.execute(
|
||||
lambda app: app.start(self.validate_tenants))
|
||||
|
||||
def createScheduler(self):
|
||||
self.scheds.create(
|
||||
self.log, self.config, self.changes,
|
||||
self.additional_event_queues, self.upstream_root, self.rpcclient,
|
||||
self.poller_events, self.git_url_with_auth,
|
||||
self.fake_sql, self.addCleanup, self.validate_tenants)
|
||||
|
||||
def __event_queues(self, matcher) -> List[Queue]:
|
||||
# TODO (felix): Can be removed when the nodes provisioned events are
|
||||
# switched to ZooKeeper.
|
||||
|
2
tests/fixtures/config/in-repo-dir/git/common-config/playbooks/job.yaml
vendored
Normal file
2
tests/fixtures/config/in-repo-dir/git/common-config/playbooks/job.yaml
vendored
Normal file
@ -0,0 +1,2 @@
|
||||
- hosts: all
|
||||
tasks: []
|
22
tests/fixtures/config/in-repo-dir/git/common-config/zuul.d/config.yaml
vendored
Normal file
22
tests/fixtures/config/in-repo-dir/git/common-config/zuul.d/config.yaml
vendored
Normal file
@ -0,0 +1,22 @@
|
||||
- pipeline:
|
||||
name: check
|
||||
manager: independent
|
||||
trigger:
|
||||
gerrit:
|
||||
- event: patchset-created
|
||||
success:
|
||||
gerrit:
|
||||
Verified: 1
|
||||
failure:
|
||||
gerrit:
|
||||
Verified: -1
|
||||
|
||||
- job:
|
||||
name: base
|
||||
parent: null
|
||||
run: playbooks/job.yaml
|
||||
|
||||
- project:
|
||||
name: common-config
|
||||
check:
|
||||
jobs: []
|
1
tests/fixtures/config/in-repo-dir/git/org_project/README
vendored
Normal file
1
tests/fixtures/config/in-repo-dir/git/org_project/README
vendored
Normal file
@ -0,0 +1 @@
|
||||
test
|
8
tests/fixtures/config/in-repo-dir/git/org_project/zuul.d/project.yaml
vendored
Normal file
8
tests/fixtures/config/in-repo-dir/git/org_project/zuul.d/project.yaml
vendored
Normal file
@ -0,0 +1,8 @@
|
||||
- job:
|
||||
name: project-test1
|
||||
|
||||
- project:
|
||||
name: org/project
|
||||
check:
|
||||
jobs:
|
||||
- project-test1
|
8
tests/fixtures/config/in-repo-dir/main.yaml
vendored
Normal file
8
tests/fixtures/config/in-repo-dir/main.yaml
vendored
Normal file
@ -0,0 +1,8 @@
|
||||
- tenant:
|
||||
name: tenant-one
|
||||
source:
|
||||
gerrit:
|
||||
config-projects:
|
||||
- common-config
|
||||
untrusted-projects:
|
||||
- org/project
|
@ -2441,6 +2441,60 @@ class TestInRepoConfig(ZuulTestCase):
|
||||
A.messages[0], "A should have debug info")
|
||||
|
||||
|
||||
class TestInRepoConfigDir(ZuulTestCase):
|
||||
# Like TestInRepoConfig, but the fixture test files are in zuul.d
|
||||
tenant_config_file = 'config/in-repo-dir/main.yaml'
|
||||
|
||||
def test_file_move(self):
|
||||
# Tests that a zuul config file can be renamed
|
||||
in_repo_conf = textwrap.dedent(
|
||||
"""
|
||||
- job:
|
||||
name: project-test2
|
||||
|
||||
- project:
|
||||
name: org/project
|
||||
check:
|
||||
jobs:
|
||||
- project-test2
|
||||
""")
|
||||
file_dict = {'zuul.d/project.yaml': None,
|
||||
'zuul.d/newfile.yaml': in_repo_conf}
|
||||
A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A',
|
||||
files=file_dict)
|
||||
A.setMerged()
|
||||
self.fake_gerrit.addEvent(A.getChangeMergedEvent())
|
||||
self.waitUntilSettled()
|
||||
|
||||
B = self.fake_gerrit.addFakeChange('org/project', 'master', 'B')
|
||||
self.fake_gerrit.addEvent(B.getPatchsetCreatedEvent(1))
|
||||
self.waitUntilSettled()
|
||||
self.assertHistory([
|
||||
dict(name='project-test2', result='SUCCESS', changes='2,1'),
|
||||
], ordered=True)
|
||||
|
||||
self.scheds[0].sched.stop()
|
||||
self.scheds[0].sched.join()
|
||||
del self.scheds[0]
|
||||
self.log.debug("Restarting scheduler")
|
||||
self.createScheduler()
|
||||
self.scheds[0].start(self.validate_tenants)
|
||||
|
||||
self.waitUntilSettled()
|
||||
|
||||
# The fake gerrit was lost with the scheduler shutdown;
|
||||
# restore the state we care about:
|
||||
self.fake_gerrit.change_number = 2
|
||||
|
||||
C = self.fake_gerrit.addFakeChange('org/project', 'master', 'C')
|
||||
self.fake_gerrit.addEvent(C.getPatchsetCreatedEvent(1))
|
||||
self.waitUntilSettled()
|
||||
self.assertHistory([
|
||||
dict(name='project-test2', result='SUCCESS', changes='2,1'),
|
||||
dict(name='project-test2', result='SUCCESS', changes='3,1'),
|
||||
], ordered=True)
|
||||
|
||||
|
||||
class TestGlobalRepoState(AnsibleZuulTestCase):
|
||||
config_file = 'zuul-connections-gerrit-and-github.conf'
|
||||
tenant_config_file = 'config/global-repo-state/main.yaml'
|
||||
|
@ -1787,8 +1787,10 @@ class TenantParser(object):
|
||||
pb_ltime = min_ltimes[project.canonical_name][branch]
|
||||
if files_cache.isValidFor(tpc, pb_ltime):
|
||||
self.log.debug(
|
||||
"Using files from cache for project %s @%s",
|
||||
project.canonical_name, branch)
|
||||
"Using files from cache for project "
|
||||
"%s @%s: %s",
|
||||
project.canonical_name, branch,
|
||||
list(files_cache.keys()))
|
||||
branch_cache = abide.getUnparsedBranchCache(
|
||||
project.canonical_name, branch)
|
||||
if branch_cache.isValidFor(tpc, files_cache.ltime):
|
||||
@ -1842,6 +1844,22 @@ class TenantParser(object):
|
||||
# Cache file in Zookeeper
|
||||
if content is not None:
|
||||
files_cache[fn] = content
|
||||
# If there are any files in the cache which match our
|
||||
# set of requested files but aren't in the set
|
||||
# returned by the cat job, then they must have been
|
||||
# deleted. Remove them from the cache so that the
|
||||
# file list in the cache is correct.
|
||||
ok_files = ('zuul.yaml', '.zuul.yaml') + tpc.extra_config_files
|
||||
ok_dirs = ('zuul.d', '.zuul.d') + tpc.extra_config_dirs
|
||||
for fn in files_cache.keys():
|
||||
if ((fn in ok_files) or
|
||||
any([fn.startswith(d + '/') for d in ok_dirs])):
|
||||
if fn not in job.files:
|
||||
self.log.debug(
|
||||
"Removing file from cache for project "
|
||||
"%s @%s: %s",
|
||||
project.canonical_name, branch, fn)
|
||||
del files_cache[fn]
|
||||
files_cache.setValidFor(job.extra_config_files,
|
||||
job.extra_config_dirs,
|
||||
job.ltime)
|
||||
@ -2405,10 +2423,10 @@ class ConfigLoader(object):
|
||||
if fn.startswith(".zuul.d/"):
|
||||
fns2.append(fn)
|
||||
for ef in tpc.extra_config_files:
|
||||
if fn.startswith(ef):
|
||||
if fn == ef:
|
||||
fns3.append(fn)
|
||||
for ed in tpc.extra_config_dirs:
|
||||
if fn.startswith(ed):
|
||||
if fn.startswith(ed + '/'):
|
||||
fns4.append(fn)
|
||||
fns = (["zuul.yaml"] + sorted(fns1) + [".zuul.yaml"] +
|
||||
sorted(fns2) + fns3 + sorted(fns4))
|
||||
|
Loading…
x
Reference in New Issue
Block a user