We can obtain the same information from the SQL database now, so do that and remove the filesystem-based time database. This will help support multiple schedulers (as they will all have access to the same data).

Nothing in the scheduler uses the state directory anymore, so clean up the docs around that. The executor still has a state dir where it may install ansible-related files.

The SQL query was rather slow in practice because it created a temporary table since it was filtering mostly by buildset fields then sorting by build.id. We can sort by buildset.id and get nearly the same results (equally valid from our perspective) much faster.

In some configurations under postgres, we may see a performance variation in the run-time of the query. In order to keep the time estimation out of the critical path of job launches, we perform the SQL query asynchronously. We may be able to remove this added bit of complexity once the scale-out-scheduler work is finished (and/or we further define/restrict our database requirements).

Change-Id: Id3c64be7a05c9edc849e698200411ad436a1334d
changes/41/808841/12
11 changed files with 163 additions and 202 deletions
@@ -0,0 +1,10 @@
---
upgrade:
  - |
    The scheduler time database has been removed.  This was stored in
    the scheduler state directory, typically ``/var/lib/zuul/times``.
    The entire state directory on the scheduler is no longer used and
    may now be removed.

    Zuul now derives its estimated build duration times from the SQL
    database.
@@ -0,0 +1,96 @@
# Copyright 2021 Acme Gating, LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import logging
import threading
import queue

import cachetools


class Times:
    """Perform asynchronous database queries to estimate build times.

    To avoid allowing the SQL database to become a bottleneck when
    launching builds, this class performs asynchronous queries against
    the DB and returns estimated build times.

    This is intended as a temporary hedge against performance
    regressions during Zuul v4 development and can likely be removed
    once multiple schedulers are supported and database requirements
    are possibly tightened.
    """

    log = logging.getLogger("zuul.times")

    def __init__(self, sql, statsd):
        self.sql = sql
        self.statsd = statsd
        self.queue = queue.Queue()
        self.cache = cachetools.TTLCache(8192, 3600)
        self.thread = threading.Thread(target=self.run)
        self.running = False

    def start(self):
        self.running = True
        self.thread.start()

    def stop(self):
        self.running = False
        self.queue.put(None)

    def join(self):
        return self.thread.join()

    def run(self):
        while self.running:
            key = self.queue.get()
            if key is None:
                continue
            try:
                # Double check that we haven't added this key since it
                # was requested
                if key in self.cache:
                    continue
                with self.statsd.timer('zuul.scheduler.time_query'):
                    self._getTime(key)
            except Exception:
                self.log.exception("Error querying DB for build %s", key)

    def _getTime(self, key):
        tenant, project, branch, job = key
        previous_builds = self.sql.getBuilds(
            tenant=tenant,
            project=project,
            branch=branch,
            job_name=job,
            final=True,
            result='SUCCESS',
            limit=10,
            sort_by_buildset=True)
        times = [x.duration for x in previous_builds if x.duration]
        if times:
            estimate = float(sum(times)) / len(times)
            self.cache.setdefault(key, estimate)
        # Don't cache a zero value, so that new jobs get an estimated
        # time ASAP.

    def getEstimatedTime(self, tenant, project, branch, job):
        key = (tenant, project, branch, job)
        ret = self.cache.get(key)
        if ret is not None:
            return ret

        self.queue.put(key)
        return None
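
The pattern this class follows (answer from the cache when possible, otherwise enqueue the key for a background worker and return `None` right away) can be illustrated with a stdlib-only sketch. This is a simplified stand-in, not Zuul's code: `EstimateCache`, `fake_lookup`, and `get_estimated_time` are hypothetical names, a plain dict replaces the TTL cache, and the lookup callable stands in for the SQL query.

```python
import queue
import threading


class EstimateCache:
    """Simplified stand-in for the asynchronous estimate pattern."""

    def __init__(self, lookup):
        # Hypothetical callable: key -> list of durations (may contain None).
        self.lookup = lookup
        # Plain dict for illustration; entries never expire here.
        self.cache = {}
        self.queue = queue.Queue()
        self.running = False
        self.thread = threading.Thread(target=self.run, daemon=True)

    def start(self):
        self.running = True
        self.thread.start()

    def stop(self):
        self.running = False
        # Wake the worker so it can notice that running is now False.
        self.queue.put(None)
        self.thread.join()

    def run(self):
        while self.running:
            key = self.queue.get()
            try:
                # Skip the wake-up sentinel and keys filled in meanwhile.
                if key is None or key in self.cache:
                    continue
                times = [d for d in self.lookup(key) if d]
                if times:
                    self.cache.setdefault(key, sum(times) / len(times))
            except Exception:
                # A real implementation would log the failure here.
                pass
            finally:
                self.queue.task_done()

    def get_estimated_time(self, key):
        ret = self.cache.get(key)
        if ret is not None:
            return ret
        # Cache miss: request a background lookup and return immediately.
        self.queue.put(key)
        return None


def fake_lookup(key):
    # Hypothetical data: three successful durations and one missing value.
    return [100, 200, None, 300]


estimates = EstimateCache(fake_lookup)
estimates.start()
key = ("tenant", "org/project", "master", "unit-tests")
first = estimates.get_estimated_time(key)   # miss: nothing cached yet
estimates.queue.join()                      # wait for the worker (demo only)
second = estimates.get_estimated_time(key)  # hit: average of known durations
estimates.stop()
```

The caller never blocks on the lookup: the first request for a key yields no estimate, and later requests are served from the cache once the worker has filled it. The `queue.join()` call exists only to make the demonstration deterministic.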