Remove implicit smart reconfig during startup
In the past (especially before the scale-out scheduler) the expectation was that restarting a scheduler would also pick up changes to the local tenant config. In a multi-scheduler deployment this can become a problem when the tenant config is updated on another scheduler while e.g. a rolling update in Kubernetes is in progress. Schedulers should use the tenant config that's stored in Zookeeper and not rely on the local config to be up-to-date, which a smart-reconfig implicitly assumes to be the case.

Change-Id: I01f22888a08075a9efc86c3ed978b20e2df2c6bf
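The behavior described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `SharedStore` and `ToyScheduler` are hypothetical stand-ins rather than Zuul classes, and the rule that the local file seeds the store only while the store is empty is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for the tenant config kept in ZooKeeper."""
    tenant_config: dict = field(default_factory=dict)
    ltime: int = 0  # logical timestamp, bumped on every update

    def update(self, config: dict) -> None:
        self.tenant_config = dict(config)
        self.ltime += 1


@dataclass
class ToyScheduler:
    """Toy scheduler for illustration; not Zuul's Scheduler class."""
    store: SharedStore
    local_config: dict
    active_config: dict = field(default_factory=dict)

    def prime(self) -> None:
        # Startup: adopt the shared config. The local file is only
        # consulted while the store is still empty (assumption).
        if self.store.ltime == 0:
            self.store.update(self.local_config)
        self.active_config = dict(self.store.tenant_config)

    def smart_reconfigure(self) -> None:
        # Explicit operator action: publish the local config if it differs.
        if self.local_config != self.store.tenant_config:
            self.store.update(self.local_config)
        self.active_config = dict(self.store.tenant_config)


store = SharedStore()
first = ToyScheduler(store, {"tenants": ["a"]})
first.prime()

# A second scheduler restarts with a different local file.
second = ToyScheduler(store, {"tenants": ["a", "b"]})
second.prime()
after_restart = dict(second.active_config)  # still the shared config
ltime_after_restart = store.ltime

# Only an explicit reconfiguration publishes the local change.
second.smart_reconfigure()
```

In this model a restart never changes the shared state. The differing local file takes effect only once `smart_reconfigure()` is called explicitly, which mirrors the upgrade note in this change.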
@@ -0,0 +1,8 @@
+---
+upgrade:
+  - |
+    Schedulers will no longer trigger a smart-reconfig after startup and with
+    that also not pick up modifications to the local tenant config and system
+    attributes in the zuul config file with a restart of a scheduler. This
+    means that changes to the tenant config or the zuul config always require
+    an explicit (smart) reconfiguration.
@@ -211,13 +211,8 @@ class TestScaleOutScheduler(ZuulTestCase):
 == second_app.sched.unparsed_abide.ltime):
 break

-# TODO (swestphahl): change this to assertEqual() when we remove
-# the smart reconfiguration during config priming.
-# Currently the smart reconfiguration during priming of the second
-# scheduler will update the system config in Zookeeper and the first
-# scheduler updates it's config in return.
-self.assertNotEqual(second_app.sched.globals.max_hold_expiration,
-initial_max_hold_exp)
+self.assertEqual(second_app.sched.globals.max_hold_expiration,
+initial_max_hold_exp)

 def test_reconfigure(self):
 # Create a second scheduler instance
@@ -1093,15 +1093,6 @@ class Scheduler(threading.Thread):
 self.local_layout_state[tenant_name] = layout_state
 self.connections.reconfigureDrivers(tenant)

-# TODO(corvus): Consider removing this implicit reconfigure
-# event with v5. Currently the expectation is that if you
-# stop a scheduler, change the tenant config, and start it,
-# the new tenant config should take effect. If we change that
-# expectation with multiple schedulers, we can remove this.
-event = ReconfigureEvent(smart=True)
-event.zuul_event_ltime = self.zk_client.getCurrentLtime()
-self._doReconfigureEvent(event)
-
 # TODO(corvus): This isn't quite accurate; we don't really
 # know when the last reconfiguration took place. But we
 # need to set some value here in order for the cleanup