gerrit/gerrit-server
Han-Wen Nienhuys 405a8f53d3 in WorkQueue, explicitly cancel Runnables that are Futures.
In LuceneChangeIndex, we schedule the future by calling (essentially)

  MoreExecutors.listeningDecorator(threadPool).submit()

This returns a TrustedListenableFutureTask, a Future implemented by
Guava, and this is the future we wait on.

The decorator passes this task off to ScheduledThreadPoolExecutor for
running. AbstractExecutorService#submit() hands it over as a plain
Runnable, and a Runnable has no method for cancellation. This means
that the Guava future is never canceled when the corresponding
ScheduledFutureTask is canceled.
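A JDK-only sketch of the effect, under stated assumptions: a plain `FutureTask` stands in for Guava's `TrustedListenableFutureTask` so that the example needs no Guava dependency, and `schedule()` is used instead of the decorator's `execute()` path only so the example gets a handle on the executor's own `ScheduledFutureTask`. The class name `RunnableCancelDemo` is hypothetical. Cancelling the executor's task leaves the inner future pending, so anyone blocked in `inner.get()` would wait forever.

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not Gerrit code.
class RunnableCancelDemo {
  /** Returns {outerCancelled, innerCancelled} after cancelling only the executor's task. */
  static boolean[] cancelWrapperOnly() {
    ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
    // Stand-in (assumption) for Guava's TrustedListenableFutureTask:
    // a Future that is also a Runnable.
    FutureTask<String> inner = new FutureTask<>(() -> "done");
    // The executor only sees a Runnable and wraps it in its own ScheduledFutureTask.
    ScheduledFuture<?> outer = pool.schedule(inner, 1, TimeUnit.HOURS);
    outer.cancel(false); // cancels the wrapper, not the Runnable inside it
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new boolean[] {outer.isCancelled(), inner.isCancelled()};
  }

  public static void main(String[] args) {
    boolean[] r = cancelWrapperOnly();
    System.out.println("outer cancelled: " + r[0]); // true
    System.out.println("inner cancelled: " + r[1]); // false
  }
}
```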

Server#stop shuts down all thread pools. Since the pools are created
with

    setExecuteExistingDelayedTasksAfterShutdownPolicy(false)

all pending work is canceled.
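A minimal sketch of the fix named in the subject line, assuming a `ScheduledThreadPoolExecutor` subclass: override `decorateTask` so that cancelling the executor's task also cancels the submitted `Runnable` when that `Runnable` is itself a `Future`. `ScheduledThreadPoolExecutor#execute()` routes through `schedule()` and therefore through `decorateTask`, which is the path Guava's decorator uses. The names `CancellingExecutor`, `CancellingTask` and `ShutdownDemo` are hypothetical, Gerrit's actual WorkQueue code differs, and `FutureTask` again stands in for Guava's future. The task is scheduled an hour out so that it is certainly still queued when `shutdown()` runs.

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical executor that forwards cancellation to Runnables that are Futures. */
class CancellingExecutor extends ScheduledThreadPoolExecutor {
  CancellingExecutor(int poolSize) {
    super(poolSize);
  }

  @Override
  protected <V> RunnableScheduledFuture<V> decorateTask(
      Runnable runnable, RunnableScheduledFuture<V> task) {
    return new CancellingTask<>(runnable, task);
  }

  /** Delegates to the executor's task, but also cancels a wrapped Future. */
  static final class CancellingTask<V> implements RunnableScheduledFuture<V> {
    private final Runnable runnable;
    private final RunnableScheduledFuture<V> task;

    CancellingTask(Runnable runnable, RunnableScheduledFuture<V> task) {
      this.runnable = runnable;
      this.task = task;
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
      boolean cancelled = task.cancel(mayInterruptIfRunning);
      if (runnable instanceof Future) {
        // The executor only knows this object as a Runnable; cancel it explicitly.
        ((Future<?>) runnable).cancel(mayInterruptIfRunning);
      }
      return cancelled;
    }

    @Override public void run() { task.run(); }
    @Override public boolean isPeriodic() { return task.isPeriodic(); }
    @Override public long getDelay(TimeUnit unit) { return task.getDelay(unit); }
    @Override public int compareTo(Delayed other) { return task.compareTo(other); }
    @Override public boolean isCancelled() { return task.isCancelled(); }
    @Override public boolean isDone() { return task.isDone(); }
    @Override public V get() throws InterruptedException, ExecutionException { return task.get(); }

    @Override
    public V get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException {
      return task.get(timeout, unit);
    }
  }
}

class ShutdownDemo {
  /** Returns whether the inner future ends up cancelled after shutdown(). */
  static boolean innerCancelledAfterShutdown(boolean propagate) {
    ScheduledThreadPoolExecutor pool =
        propagate ? new CancellingExecutor(1) : new ScheduledThreadPoolExecutor(1);
    // Same policy as the pools described above: drop pending delayed work on shutdown.
    pool.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
    FutureTask<String> inner = new FutureTask<>(() -> "done");
    pool.schedule(inner, 1, TimeUnit.HOURS); // still pending at shutdown
    pool.shutdown(); // cancels the queued task synchronously
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return inner.isCancelled();
  }

  public static void main(String[] args) {
    System.out.println("plain executor: " + innerCancelledAfterShutdown(false));
    System.out.println("with override:  " + innerCancelledAfterShutdown(true));
  }
}
```

Without the override, a thread waiting on the inner future after shutdown never wakes up; with it, the wait ends with a CancellationException.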

The problem was triggered under the following circumstances:

 * Tests that schedule two or more ref updates at the end of the
   test. Since the interactive pool has size 1, that could cause a
   piece of work to be delayed.

 * Executors are shut down in creation order, which is random. The
   problem only triggers if the interactive pool is shut down before
   the batch pool.

The problem could be reliably reproduced by building with Bazel,
setting shard_count=30 on
//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/rest/project:rest_project,
and running shard 10 (which exhibited the problem) 50-way parallel on
a 12 HT-core system.

Things to note:

* If we have to use ListenableFutures, then it would be nice if we
  could use an Executor that actually works well together with Guava.

* Server#stop discards pending work. In particular, work scheduled by
  ReindexAfterUpdate can be discarded, potentially leaving the index
  inconsistent.

* ReindexAfterUpdate runs in the batch executor, but then schedules
  its search work on the interactive executor, which is gratuitously
  parallel.

* WorkQueue.Executor adds a lot of cognitive overhead just to provide
  a list of processes. Can't administrators just run jstack?

* A randomized creation order for thread pools causes a randomized
  shutdown order, making problems harder to reproduce.

Bug: Issue 4466
Change-Id: I55c3b85c66433de7ee9e037fc243abe705080bbc
2016-10-13 18:31:28 +00:00
src in WorkQueue, explicitly cancel Runnables that are Futures. 2016-10-13 18:31:28 +00:00
BUCK Add support for secondary index with Elasticsearch 2016-09-27 23:27:37 +09:00
BUILD bazel: update for elasticsearch and lucene. 2016-09-28 21:17:08 +02:00