have realtime engine only search recent indexes
Elastic Recheck is really two things: real-time searching and bulk offline categorization. While the bulk categorization needs to look over the entire dataset, the real-time portion is deadline oriented, so it only cares about the last hour's worth of data. As such, we don't need to search *all* the indexes in ES, only the most recent one (and possibly the one before that if we are near rotation).

Implement this via a recent= parameter for our search feature. If set to true, we specify the most recent logstash index. If it turns out that we're within an hour of rotation, also search the one before that.

Adjust all the queries the bot uses to be recent=True. This will hopefully reduce the load generated by the bot on the ES cluster.

Change-Id: I0dfc295dd9b381acb67f192174edd6fdde06f24c
@@ -196,7 +196,10 @@ class RecheckWatch(threading.Thread):

             for job in event.failed_jobs:
                 job.bugs = set(classifier.classify(
-                    event.change, event.rev, job.build_short_uuid))
+                    event.change,
+                    event.rev,
+                    job.build_short_uuid,
+                    recent=True))
                 if not event.get_all_bugs():
                     self._read(event)
                 else:
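The hunk above only shows the caller passing recent=True; the index selection described in the commit message is not visible here. A minimal sketch of how that selection might look, assuming daily indexes named `logstash-YYYY.MM.DD` that rotate at midnight UTC (the Logstash default naming, not confirmed for this deployment). The function name `recent_indexes` and its `window` parameter are hypothetical and do not come from the commit.

```python
from datetime import datetime, timedelta, timezone


def recent_indexes(now=None, window=timedelta(hours=1)):
    """Return the index names a recent=True search might target.

    Hypothetical helper: assumes daily 'logstash-YYYY.MM.DD' indexes
    that rotate at midnight UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    # The index that is currently being written to.
    indexes = [now.strftime("logstash-%Y.%m.%d")]

    # If the start of the look-back window falls on the previous day,
    # rotation happened inside the window, so include the prior index.
    earlier = now - window
    if earlier.date() != now.date():
        indexes.append(earlier.strftime("logstash-%Y.%m.%d"))

    return indexes


if __name__ == "__main__":
    # Half an hour after rotation: both indexes are needed.
    print(recent_indexes(datetime(2014, 1, 15, 0, 30, tzinfo=timezone.utc)))
```

Elasticsearch accepts a comma-separated list of index names in a search request, so the returned list could be joined with `","` when building the query target.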