Support 'max_result_window' config for Elasticsearch indexes

Gerrit supports Elasticsearch as the index backend for the search API,
which lets users choose the index type they want. However, when an
Elasticsearch index contains a large number of documents, related
features such as the front-end pages and APIs become unusable.

This is because the Elasticsearch 'max_result_window' setting defaults
to 10000. When a query's result window (from + size) exceeds that value,
the query fails.
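
A minimal sketch of the window rule described above, for illustration
only. The class and method names are hypothetical and are not part of
Gerrit or the Elasticsearch client; only the default of 10000 and the
"from + size" rule come from the text.

```java
/** Hypothetical helper, not part of Gerrit: mirrors the result-window rule. */
final class ResultWindowCheck {
  /** Default value of the Elasticsearch 'max_result_window' setting. */
  static final int DEFAULT_MAX_RESULT_WINDOW = 10000;

  /** Returns true when a page starting at 'from' with 'size' hits exceeds the window. */
  static boolean exceedsWindow(int from, int size, int maxResultWindow) {
    // Widen to long so that very large offsets cannot overflow int.
    return (long) from + size > maxResultWindow;
  }

  public static void main(String[] args) {
    // from + size == 10000 still fits into the default window.
    System.out.println(exceedsWindow(9990, 10, DEFAULT_MAX_RESULT_WINDOW));
    // from + size == 10001 is one past the default window.
    System.out.println(exceedsWindow(9990, 11, DEFAULT_MAX_RESULT_WINDOW));
  }
}
```

With the default window, a page of 10 results starting at offset 9990
is still accepted, while a page of 11 results from the same offset is
not.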

When Gerrit executes a query on the index (for example, when a user
searches in the Gerrit web UI, which sends a request to the Gerrit
backend API), Gerrit sends an HTTP request to the Elasticsearch API to
fetch the relevant data and returns it to the user.

If the requested result window exceeds the default value of 10000, the
Elasticsearch API returns an error response telling the caller that the
query exceeds the result window, and the Gerrit query fails.

There are three different solutions to the problem:

    1. Support 'max_result_window' config for Elasticsearch indexes. It
is the simplest solution and works across multiple Gerrit releases (all
Elasticsearch versions supported by Gerrit support this setting).

    2. Use the Elasticsearch Scroll API [1] instead. This reduces the
cost of deep queries, but brings other problems: the scroll API does not
support traditional pagination and has some limitations around the
'_scroll_id', and it is not recommended for real-time user requests.

    3. Use the Elasticsearch Search After API [2] instead. It is the
best way in Elasticsearch to avoid costly deep pagination, but it is
only supported by Elasticsearch versions later than 6.2, while Gerrit
still supports 5.6.
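
As a rough illustration of option 1, the sketch below shows how a
configured window could be placed into the settings body used when an
index is created. The class and method names are hypothetical and this
is not Gerrit's actual implementation; it only assumes the standard
Elasticsearch index settings 'number_of_shards', 'number_of_replicas'
and 'max_result_window'.

```java
import java.util.Locale;

/** Hypothetical sketch, not Gerrit's code: index settings that carry the window. */
final class IndexSettingsSketch {
  /** Builds a JSON settings body for index creation with the given values. */
  static String indexSettingsJson(
      int numberOfShards, int numberOfReplicas, int maxResultWindow) {
    // Locale.ROOT keeps the digits ASCII regardless of the JVM default locale.
    return String.format(
        Locale.ROOT,
        "{\"settings\":{\"index\":{"
            + "\"number_of_shards\":%d,"
            + "\"number_of_replicas\":%d,"
            + "\"max_result_window\":%d}}}",
        numberOfShards,
        numberOfReplicas,
        maxResultWindow);
  }

  public static void main(String[] args) {
    // Example values only; a site would take them from its own configuration.
    System.out.println(indexSettingsJson(1, 1, 50000));
  }
}
```

Raising the window this way keeps the existing from/size pagination
unchanged; deep pages remain costly in Elasticsearch, which is the
trade-off against options 2 and 3.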

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-scroll
[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-after

Bug: Issue 12444
Change-Id: I39da7d1d50df7bbe9dc88411632bb029c77f9f36
Dyrone Teng
2020-03-18 11:20:03 +08:00
parent c5243ea8f5
commit 76fed530a0
3 changed files with 16 additions and 0 deletions

@@ -40,10 +40,13 @@ class ElasticConfiguration {
   static final String KEY_SERVER = "server";
   static final String KEY_NUMBER_OF_SHARDS = "numberOfShards";
   static final String KEY_NUMBER_OF_REPLICAS = "numberOfReplicas";
+  static final String KEY_MAX_RESULT_WINDOW = "maxResultWindow";
   static final String DEFAULT_PORT = "9200";
   static final String DEFAULT_USERNAME = "elastic";
   static final int DEFAULT_NUMBER_OF_SHARDS = 0;
   static final int DEFAULT_NUMBER_OF_REPLICAS = 1;
+  static final int DEFAULT_MAX_RESULT_WINDOW = 10000;
   private final Config cfg;
   private final List<HttpHost> hosts;
@@ -52,6 +55,7 @@ class ElasticConfiguration {
   final String password;
   final int numberOfShards;
   final int numberOfReplicas;
+  final int maxResultWindow;
   final String prefix;
   @Inject
@@ -68,6 +72,8 @@ class ElasticConfiguration {
         cfg.getInt(SECTION_ELASTICSEARCH, null, KEY_NUMBER_OF_SHARDS, DEFAULT_NUMBER_OF_SHARDS);
     this.numberOfReplicas =
         cfg.getInt(SECTION_ELASTICSEARCH, null, KEY_NUMBER_OF_REPLICAS, DEFAULT_NUMBER_OF_REPLICAS);
+    this.maxResultWindow =
+        cfg.getInt(SECTION_ELASTICSEARCH, null, KEY_MAX_RESULT_WINDOW, DEFAULT_MAX_RESULT_WINDOW);
     this.hosts = new ArrayList<>();
     for (String server : cfg.getStringList(SECTION_ELASTICSEARCH, null, KEY_SERVER)) {
       try {