df37ad2ce7
The metrics that we use to govern load on the executors are all trailing indicators. The executors are capable of accepting a large number of jobs in a batch, and only after those jobs begin to run will the load indicators increase.

To avoid the thundering herd problem, reduce the rate at which we accept jobs past a certain point. That point is when the number of running jobs is twice the target load average. In practice that seems to be a fairly conservative but reasonable number of jobs for the executor to run, so, to facilitate a quick start, allow the executor to start up to that number all at once.

Once the number of running jobs is beyond that point, subsequent jobs will only be accepted one at a time, after each one completes its startup phase (cloning repos, establishing Ansible connections), which is to say, at the point where the job begins running its first pre-playbook. We will also wait until the next regular interval of the governor to accept the next job. That interval is currently 30 seconds, but to make the system a bit more responsive, it is lowered to 10 seconds in this change.

To summarize: once a bunch[1] of jobs are running, after each new job we wait until that job has started running playbooks, plus up to an additional 10 seconds, before accepting another job.

This is implemented by adding a 'starting jobs' metric to the governor so that we register or de-register the execute function based on whether too many jobs are in the startup phase. We add a forced call to the governor routine after each job starts so that we can unregister if necessary before picking up the next job, and wrap that routine in a lock since it is now called from multiple threads and its logic may not be entirely thread-safe.

Also, add tests for all three inputs to manageLoad.

[1] 2 * target load average

Change-Id: I066bc539e70eb475ca2b871fb90644264d8d5bf4
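Below is a minimal sketch of the accounting this change describes, assuming a governor with a target load average and register/de-register hooks. Only manageLoad and the 'starting jobs' idea come from the change itself; the class and helper names (GovernorSketch, jobAccepted, jobStartedPlaybooks, accept_fn, reject_fn) are hypothetical and not the actual Zuul executor API.

```python
import threading


class GovernorSketch:
    """Illustrative sketch (not the real Zuul code) of gating job
    acceptance on a 'starting jobs' count in addition to load average."""

    INTERVAL = 10  # governor wakeup interval in seconds (was 30)

    def __init__(self, target_load_avg, accept_fn, reject_fn):
        self.target_load_avg = target_load_avg
        self.accept_fn = accept_fn    # registers the execute function
        self.reject_fn = reject_fn    # de-registers the execute function
        self.running_jobs = 0
        self.starting_jobs = 0        # jobs still cloning repos / connecting
        # manageLoad is now called from multiple threads, so guard it.
        self.lock = threading.Lock()

    def manageLoad(self):
        # Called both by the periodic governor thread and by a forced
        # call after each job starts; hence the lock.
        with self.lock:
            limit = 2 * self.target_load_avg
            if self.running_jobs < limit:
                # Quick start: accept freely until 2x the target load
                # average worth of jobs are running.
                self.accept_fn()
            elif self.starting_jobs > 0:
                # Past the limit, accept at most one job at a time: stop
                # accepting while any job is still in its startup phase.
                self.reject_fn()
            else:
                self.accept_fn()

    def jobAccepted(self):
        with self.lock:
            self.running_jobs += 1
            self.starting_jobs += 1
        # Forced governor call so we can de-register, if necessary,
        # before picking up the next job.
        self.manageLoad()

    def jobStartedPlaybooks(self):
        # Called once the job begins its first pre-playbook; the job is
        # no longer 'starting'.  The next job will be accepted at the
        # next regular interval (up to 10 seconds later).
        with self.lock:
            self.starting_jobs -= 1

    def jobCompleted(self):
        with self.lock:
            self.running_jobs -= 1

    def run(self, stop_event):
        # Periodic governor loop.
        while not stop_event.is_set():
            self.manageLoad()
            stop_event.wait(self.INTERVAL)
```

The forced manageLoad() call from jobAccepted is what lets the governor de-register the execute function before another job is picked up; the periodic run() loop, now on a 10-second interval, is what re-registers it once the previous job has moved past its startup phase.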
Files:
  ..
  git/common-config
  main.yaml