The account staleness checker was disabled because of failures in the
Elasticsearch tests that looked like [1]. Investigation showed that the
failures are not caused by the account staleness checker; the same issue
could already occur without it. For an unknown reason the issue is hit
more often when the staleness checker is enabled. Change I7b20a0cee6
fixed this partly, but the problem can still occur. In general, the
unstable Elasticsearch tests should not prevent us from enabling
unrelated features like the staleness checker for accounts.
The CI already excludes the Elasticsearch tests since they are flaky.
Fixing the flakiness of the Elasticsearch tests is unrelated to the
account staleness checker and should be done in separate changes.
However, to avoid making the Elasticsearch tests any more flaky, the
staleness checker is disabled for all Elasticsearch tests.

A real issue with enabling the account staleness checker, however, was
that it made some account tests flaky. Some account tests verify the
number of times that an account is indexed. With the account staleness
checker these tests became flaky because the staleness checker caused
extra reindex events. This can happen if an account is reindexed several
times in a row, since the staleness check is done asynchronously. E.g.
if an account is reindexed twice, the following happens:
1. TestThread: update account to state A, index and trigger reindex if
stale
2. BatchExecutorThread: auto reindex if stale, triggered by 1.
3. TestThread: update account to state B, index and trigger reindex if
stale
4. BatchExecutorThread: auto reindex if stale, triggered by 3.
Since 2 and 3 are executed in parallel, it can happen that 2 reads state
B from NoteDb and state A from the index, and hence detects the account
as stale, which triggers another reindex. This is fine: it's an extra
reindex event, but the logic ensures that in the end the account is
never stale in the index.
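
The interleaving above can be replayed deterministically in a small
sketch. This is not Gerrit code: the class, fields and methods below are
hypothetical stand-ins for NoteDb, the account index and the staleness
check, and the two threads are simulated by ordering the steps by hand.
It only illustrates why two account updates can produce three index
events while the index still ends up consistent.

```java
import java.util.ArrayList;
import java.util.List;

public class StalenessRaceSketch {
  // Hypothetical stand-ins for the account state in NoteDb and in the index.
  static String noteDb = "";
  static String index = "";
  static final List<String> events = new ArrayList<>();

  // Writes the current NoteDb state to the index and records an index event.
  static void writeIndex(String thread) {
    index = noteDb;
    events.add(thread + " indexed " + index);
  }

  // Reindexes only if the state seen in the index differs from NoteDb.
  static void reindexIfStale(String seenInIndex, String thread) {
    if (!seenInIndex.equals(noteDb)) {
      writeIndex(thread);
    }
  }

  // Replays steps 1-4 with steps 2 and 3 overlapping.
  public static List<String> replay() {
    noteDb = "";
    index = "";
    events.clear();

    // 1. TestThread: update account to state A and index it.
    noteDb = "A";
    writeIndex("TestThread");

    // 2. BatchExecutorThread starts its check and reads state A from the index.
    String seenByStep2 = index;

    // 3. TestThread starts: NoteDb is updated to state B (not yet indexed).
    noteDb = "B";

    // 2. continues: reads state B from NoteDb, sees A != B, reindexes.
    reindexIfStale(seenByStep2, "BatchExecutorThread");

    // 3. continues: TestThread indexes state B.
    writeIndex("TestThread");

    // 4. BatchExecutorThread: index and NoteDb agree, nothing to do.
    reindexIfStale(index, "BatchExecutorThread");

    return new ArrayList<>(events);
  }

  public static void main(String[] args) {
    replay().forEach(System.out::println);
  }
}
```

A test that expects exactly two index events for the two updates would
fail on this interleaving, although the final index state matches
NoteDb.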
However, since the extra index events confuse the account tests, the
staleness checker is now explicitly disabled for all account tests.

There was also some flakiness observed when we expected only a single
index event [2], but I wasn't able to reproduce this. Disabling the
staleness checker for all account tests makes sure that we don't get
this flakiness either.

This reverts commit b316186826.
[1]
[accounts_0007] IndexAlreadyExistsException[already exists]
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validateIndexName(MetaDataCreateIndexService.java:136)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validate(MetaDataCreateIndexService.java:431)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.access$100(MetaDataCreateIndexService.java:95)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:190)
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:480)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2] https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-bazel/35988/consoleText
Change-Id: Ib0c69dd6805b2624679afa10d2ef2fd89dc5f8be
Signed-off-by: Edwin Kempin <ekempin@google.com>