gerrit/gerrit-lucene
Edwin Kempin 8feeb02541 Properly await termination of all index executor threads on shutdown
When Gerrit is shutdown we must wait for all currently running index
threads to finish because otherwise these index events are lost and the
index becomes stale.

There are 2 executors for indexing, one for interactive indexing and one
for batch indexing.

A first attempt to fix this was done by change I7bda130585, but it broke
the site after online reindexing (see issue 4618) and hence it was
reverted by change I6ec90eb733.

Change Id332ec0215 fixed the issue partly. In addition to the
Index#close() method which closes the index it added a Index#stop()
method that stops the (interactive) index executor and awaits the
termination of all running index threads. The Index#stop() method is
only called when Gerrit is shutdown, while Index#close() is also invoked
when online reindexing is done. There are several issues with this fix:

1. It only cared about the interactive index executor, but not about the
   batch index executor.
2. The IndexCollection#stop() method which is called on Gerrit shutdown
   missed to call Index#stop() on the write index if it was the same
   Index instance as the read index.
3. Although documented developers were confused about Index having both
   a close() and a stop() method and there was a danger of calling
   stop() accidentally and thus break all further indexing.
4. Closing the interactive index executor when stopping the change index
   assumed that the inactive index executor is only used by the change
   index. This assumption is currently true, but only because the other
   indexes (account, group, project) missed to use this executor.
5. The interactive index executor was only stopped when Lucene was used,
   but not when ElasticSearch was used.

Due to 1. and 2. all query tests (e.g. LuceneQueryChangesTest) were
leaking threads. Each test case accumulated more running (interactive
and batch) index threads. These threads consumed memory and in the end
this possibly resulted in an OutOfMemoryError (only for running
LuceneQueryChangesTest on Mac, see issue 7611).

To fix this IndexModule adds a new LifecycleListener that is responsible
for shutting down both index executors:

- Both index executors are now shutdown (interactive and batch).
- The stop() method of LifecycleListeners is only invoked when Gerrit is
  shutdown and it's unlikely to be triggered during normal Gerrit
  runtime (e.g. when online reindexing is done).
- There is exactly one place that shuts down the index executors (as
  opposed to n indexes trying to take care of it). This makes sure that
  the executors can be shared across indexes.
- The stop() method from Index is removed so that there is less
  confusion about stop() vs. close().

One important thing is that the executors are shutdown before closing
the indexes. This ensures that all writes have been performed in-memory
before the final flush to disk is done. This means that the
LifecycleListener to stop the executors must be registered after the
LifecycleListeners that close the indexes (on Gerrit start the
LifecycleListeners are invoked in the order in which they are
registered, on shutdown of Gerrit the order is reversed).

IndexModule also allows to provide external executors for the
interactive and batch indexing (instead of creating new executors). In
this case IndexModule doesn't register a LifecycleListener to shutdown
the executors since they are externally managed. Providing external
executors is only used by Google for their multi-tenant setup. In this
setup the executors are global singletons in the process that are shared
by many sites and shutting down one site must not shut down the global
executors.

This fixes the OutOfMemoryError that was observed on Mac when running
the LuceneQueryChangesTest (see issue 7611).

Bug: Issue 7611
Change-Id: I093fb19b56aba90fbb7579f0e59da435c4c1b702
Signed-off-by: Edwin Kempin <ekempin@google.com>
2017-11-09 20:10:57 +00:00
..
2016-12-07 11:33:07 +00:00