Remove implicit smart reconfig during startup

In the past (especially before the scale-out scheduler) the expectation
was that restarting a scheduler would also pick up changes to the local
tenant config.

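In a multi-scheduler deployment this becomes a problem when the tenant
config is updated on another scheduler while, e.g., a rolling update in
Kubernetes is in progress: a restarted scheduler whose local config is
outdated would push that stale config to ZooKeeper during startup.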

Schedulers should use the tenant config that is stored in ZooKeeper and
not rely on the local config being up to date, which the implicit
smart-reconfig on startup assumes.

Change-Id: I01f22888a08075a9efc86c3ed978b20e2df2c6bf
Simon Westphahl
2024-12-04 16:05:23 +01:00
parent 41845f9632
commit 8dab278845
3 changed files with 10 additions and 16 deletions

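To make the rolling-update problem concrete, here is a simplified Python model. `SharedState` and `ToyScheduler` are hypothetical stand-ins, not Zuul classes: `SharedState` plays the role of the tenant config stored in ZooKeeper, and the `smart_reconfig_on_start` flag toggles between the old startup behavior (a differing local file is pushed to shared state) and the new one (shared state is adopted as-is).

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Hypothetical stand-in for the tenant config stored in ZooKeeper."""

    tenant_config: dict
    ltime: int = 0  # bumped on every write, like a logical timestamp

    def write(self, config: dict) -> None:
        self.tenant_config = dict(config)
        self.ltime += 1


@dataclass
class ToyScheduler:
    """Hypothetical scheduler holding a possibly stale local tenant config."""

    local_config: dict
    active_config: dict = field(default_factory=dict)

    def start(self, shared: SharedState, smart_reconfig_on_start: bool) -> None:
        if smart_reconfig_on_start and self.local_config != shared.tenant_config:
            # Old behavior: the local file is treated as authoritative and
            # overwrites whatever another scheduler stored in shared state.
            shared.write(self.local_config)
        # Either way, the scheduler then runs with the shared config.
        self.active_config = dict(shared.tenant_config)


# Another scheduler already stored "v2"; this pod restarts with a stale "v1".
old_shared = SharedState({"tenant-a": "v2"})
ToyScheduler({"tenant-a": "v1"}).start(old_shared, smart_reconfig_on_start=True)
print(old_shared.tenant_config)  # {'tenant-a': 'v1'}: newer config rolled back

new_shared = SharedState({"tenant-a": "v2"})
ToyScheduler({"tenant-a": "v1"}).start(new_shared, smart_reconfig_on_start=False)
print(new_shared.tenant_config)  # {'tenant-a': 'v2'}: newer config kept
```

In the model, only the startup path differs; picking up a changed local file is left to an explicit reconfiguration, which is the behavior this commit moves to.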

@@ -0,0 +1,8 @@
---
upgrade:
  - |
    Schedulers no longer trigger a smart-reconfig after startup. As a
    result, restarting a scheduler no longer picks up modifications to the
    local tenant config or to the system attributes in the zuul config
    file. This means that changes to the tenant config or the zuul config
    always require an explicit (smart) reconfiguration.

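An explicit smart reconfiguration (in Zuul, `zuul-scheduler smart-reconfigure`) reloads the tenant config and, roughly speaking, reconfigures only the tenants whose configuration changed. The helper below is a hypothetical, simplified model of that selection step, not Zuul's implementation; it assumes the tenant config is a plain dict mapping tenant name to definition, compared by equality.

```python
def tenants_to_reconfigure(current: dict, updated: dict) -> set:
    """Names of tenants whose definition was added, removed, or changed.

    Simplified model: a tenant config is a mapping of tenant name to
    tenant definition, compared by plain equality.
    """
    added = updated.keys() - current.keys()
    removed = current.keys() - updated.keys()
    changed = {
        name
        for name in current.keys() & updated.keys()
        if current[name] != updated[name]
    }
    return added | removed | changed


current = {"tenant-a": {"projects": ["x"]}, "tenant-b": {"projects": ["y"]}}
updated = {"tenant-a": {"projects": ["x"]}, "tenant-b": {"projects": ["y", "z"]}}
print(sorted(tenants_to_reconfigure(current, updated)))  # ['tenant-b']
```

Unchanged tenants (here `tenant-a`) are left alone, which is what makes a smart reconfiguration cheaper than a full one.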

@@ -211,13 +211,8 @@ class TestScaleOutScheduler(ZuulTestCase):
== second_app.sched.unparsed_abide.ltime):
break
-# TODO (swestphahl): change this to assertEqual() when we remove
-# the smart reconfiguration during config priming.
-# Currently the smart reconfiguration during priming of the second
-# scheduler will update the system config in Zookeeper and the first
-# scheduler updates its config in return.
-self.assertNotEqual(second_app.sched.globals.max_hold_expiration,
-initial_max_hold_exp)
+self.assertEqual(second_app.sched.globals.max_hold_expiration,
+initial_max_hold_exp)
def test_reconfigure(self):
# Create a second scheduler instance

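The updated test waits until the second scheduler's `unparsed_abide.ltime` matches the first scheduler's before comparing `globals.max_hold_expiration`; without the startup smart-reconfig, the value is expected to still equal `initial_max_hold_exp`. A minimal sketch of that wait-then-compare pattern follows; `wait_until`, `second_caught_up` and the `SimpleNamespace` objects are hypothetical and not Zuul's test helpers.

```python
import time
from types import SimpleNamespace
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.01) -> None:
    """Poll predicate until it returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Hypothetical stand-ins for the two schedulers' state.
first = SimpleNamespace(ltime=5, max_hold_expiration=3600)
second = SimpleNamespace(ltime=3, max_hold_expiration=3600)


def second_caught_up() -> bool:
    # Simulate the second scheduler processing one more update per poll.
    if second.ltime < first.ltime:
        second.ltime += 1
    return second.ltime == first.ltime


wait_until(second_caught_up, timeout=1.0)
initial_max_hold_exp = 3600
# Only compare once both schedulers have seen the same logical time.
assert second.max_hold_expiration == initial_max_hold_exp
```

Waiting on the logical timestamp rather than sleeping a fixed time keeps the comparison from racing the second scheduler's config loading.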

@@ -1093,15 +1093,6 @@ class Scheduler(threading.Thread):
self.local_layout_state[tenant_name] = layout_state
self.connections.reconfigureDrivers(tenant)
-# TODO(corvus): Consider removing this implicit reconfigure
-# event with v5. Currently the expectation is that if you
-# stop a scheduler, change the tenant config, and start it,
-# the new tenant config should take effect. If we change that
-# expectation with multiple schedulers, we can remove this.
-event = ReconfigureEvent(smart=True)
-event.zuul_event_ltime = self.zk_client.getCurrentLtime()
-self._doReconfigureEvent(event)
# TODO(corvus): This isn't quite accurate; we don't really
# know when the last reconfiguration took place. But we
# need to set some value here in order for the cleanup