gerrit/java
Edwin Kempin 577e8fc66c RepoSequence: Release counter lock while blocking for retry
If the update of the sequence in NoteDb fails with LOCK_FAILURE we use a
retryer to reattempt the update. The retryer uses wait strategies with
exponential and random wait time, giving up after 30s. While the retryer
waits until the next retry, the thread is blocked. If we keep the counter
lock while we are blocking for retry, any other thread that needs a
sequence number gets blocked until all retries happened and the counter
lock was released:

1. [thread A] RepoSequence.next() is called to get a sequence number.
2. [thread A] The counter lock is acquired.
3. [thread A] RepoSequence.acquire(int) is called to update the sequence
              in NoteDb.
4. [thread A] Retryer is used to update NoteDb.
5. [thread A - Retryer] The NoteDb update fails with LOCK_FAILURE.
6. [thread A - Retryer] The retryer blocks until the next retry.
7. [thread B] RepoSequence.next() is called to get a sequence number.
8. [thread B] The counter lock is still held by thread A, hence thread B
              is blocked until A is done.
9. [thread A - Retryer] The NoteDb update is retried and succeeds now.
10. [thread A] The counter lock is released.
11. [thread B] Thread B is unblocked and can get a sequence number now.

Blocking other threads while waiting for retry is bad. To avoid this,
acquire and release the counter lock from the code block that is
executed by the retryer:

1. [thread A] RepoSequence.next() is called to get a sequence number.
2. [thread A] Retryer is used to get the next sequence number.
3. [thread A - Retryer] The counter lock is acquired.
4. [thread A - Retryer] RepoSequence.acquire(int) is called to update
                        the sequence in NoteDb.
5. [thread A - Retryer] The NoteDb update fails with LOCK_FAILURE.
6. [thread A - Retryer] The counter lock is released.
7. [thread A - Retryer] The retryer blocks until the next retry.
8. [thread B] RepoSequence.next() is called to get a sequence number.
9. [thread B] Retryer is used to get the next sequence number. The
              retryer can acquire the counter lock because thread A
              released it before blocking. After getting the sequence
              number, the counter lock is released again.
10. [thread A - Retryer] Retryer retries:
                         a) acquire counter lock,
                         b) update NoteDb,
                         c) release counter lock
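
The sequence above can be sketched in plain Java. This is only an
illustration, not the RepoSequence code: the class
LockInsideRetrySketch, the LockFailureException type, the injected
failure count and the hand-written retry loop (standing in for the
retryer and its wait strategies) are all assumptions made for the
example. The point it shows is that the counter lock is taken and
released inside each attempt, so it is never held while waiting.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for RepoSequence; not the Gerrit implementation. */
class LockInsideRetrySketch {
  /** Hypothetical stand-in for a NoteDb update failing with LOCK_FAILURE. */
  static class LockFailureException extends Exception {}

  private final ReentrantLock counterLock = new ReentrantLock();
  private int counter = 1;
  private int failuresToInject; // number of simulated LOCK_FAILUREs

  LockInsideRetrySketch(int failuresToInject) {
    this.failuresToInject = failuresToInject;
  }

  /** Simulated NoteDb update; only called while the counter lock is held. */
  private int updateNoteDb() throws LockFailureException {
    if (failuresToInject > 0) {
      failuresToInject--;
      throw new LockFailureException();
    }
    return counter++;
  }

  /** One attempt: acquire the lock, update, release the lock (also on failure). */
  private int attempt() throws LockFailureException {
    counterLock.lock();
    try {
      return updateNoteDb();
    } finally {
      counterLock.unlock();
    }
  }

  /** Retries on a higher level than the locking, with a doubling wait time. */
  int next() {
    long waitMs = 1;
    for (int i = 0; i < 10; i++) {
      try {
        return attempt();
      } catch (LockFailureException e) {
        // The counter lock is not held here, so other threads can proceed.
        try {
          Thread.sleep(waitMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting", ie);
        }
        waitMs *= 2;
      }
    }
    throw new IllegalStateException("giving up after too many LOCK_FAILUREs");
  }

  boolean isCounterLockHeld() {
    return counterLock.isLocked();
  }
}
```

With two injected failures, the first call to next() still returns 1
after two waits, and the lock is free whenever the loop sleeps.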

Retrying is now done on a higher level (before locking) and needs to be
done in all public methods that need to acquire the counter lock (next()
and next(int)). Since both methods have different return types (int and
ImmutableList<Integer>) and the return value is computed within the
retryer block, we would need to have separate retryer instances for this
(because a retryer needs to be instantiated with the type that it should
return as result). To avoid this we let the next() method delegate to
next(int). This is not super nice since we now always wrap single
sequence numbers in an ImmutableList, but it's better than having to
have two retryer instances (Retryer<Integer> and
Retryer<ImmutableList<Integer>>). It's also an improvement to have only
a single point where locking/unlocking is done.
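
A rough sketch of this delegation, under stated assumptions: the
SimpleRetryer class below is a hypothetical, much reduced retryer (no
wait strategy), and java.util.List is used in place of ImmutableList so
that the example has no dependencies. It only illustrates why a single
retryer instance typed to the list result is enough once next()
delegates to next(int).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for RepoSequence; not the Gerrit implementation. */
class DelegatingSequenceSketch {
  /** Hypothetical minimal retryer, bound to one result type when created. */
  static class SimpleRetryer<T> {
    private final int maxAttempts;

    SimpleRetryer(int maxAttempts) {
      this.maxAttempts = maxAttempts;
    }

    T call(Callable<T> callable) {
      Exception last = null;
      for (int i = 0; i < maxAttempts; i++) {
        try {
          return callable.call();
        } catch (Exception e) {
          last = e; // a real retryer would wait here before the next attempt
        }
      }
      throw new IllegalStateException("giving up", last);
    }
  }

  // One retryer instance, typed to the list result, serves both methods.
  private final SimpleRetryer<List<Integer>> retryer = new SimpleRetryer<>(3);
  private final ReentrantLock counterLock = new ReentrantLock();
  private int counter = 1;

  /** The single place where the counter lock is acquired and released. */
  List<Integer> next(int count) {
    return retryer.call(
        () -> {
          counterLock.lock();
          try {
            List<Integer> ids = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
              ids.add(counter++);
            }
            return Collections.unmodifiableList(ids);
          } finally {
            counterLock.unlock();
          }
        });
  }

  /** Delegates to next(int) and unwraps the single-element list. */
  int next() {
    return next(1).get(0);
  }
}
```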

TryAcquire is inlined since this is no longer the entity that is
retried.

To test that the counter lock was properly released before blocking for
retry we do:

1. create a RepoSequence instance with
   a) a background update that causes a LOCK_FAILURE for the first
      attempt to update the sequence in NoteDb
   b) a retryer that has a custom block strategy that blocks until we
      flip an isBlocking flag
2. start a background thread that retrieves a sequence number; for this
   thread the first attempt to update NoteDb fails due to the background
   update and then the retry hangs until we flip the isBlocking flag
3. wait until the LOCK_FAILURE in the background thread has happened and
   the background thread is blocked for retry
4. verify that retrieving a sequence number from the test thread works
   while the background thread is blocking
5. verify that the background thread succeeds in retrieving a sequence
   number once the isBlocking flag is flipped

If RepoSequence didn't release the counter lock before blocking for
retry, the test would hang at step 4 and then time out.
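
The shape of such a test might look roughly like the following. It is a
self-contained sketch, not the actual test: RetryBlockingTestSketch, the
failNextUpdate flag (standing in for the background update that causes
the LOCK_FAILURE) and the two latches (standing in for the custom block
strategy and the isBlocking flag) are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the test idea; not the Gerrit test code. */
class RetryBlockingTestSketch {
  private final ReentrantLock counterLock = new ReentrantLock();
  // Step 1a: the first update attempt fails (simulated LOCK_FAILURE).
  private final AtomicBoolean failNextUpdate = new AtomicBoolean(true);
  private final CountDownLatch lockFailureHappened = new CountDownLatch(1);
  // Step 1b: the retry blocks until this latch (the "isBlocking flag") opens.
  private final CountDownLatch isBlocking = new CountDownLatch(1);
  private int counter = 1;

  int next() throws InterruptedException {
    while (true) {
      counterLock.lock();
      try {
        if (!failNextUpdate.getAndSet(false)) {
          return counter++; // update succeeded
        }
      } finally {
        counterLock.unlock();
      }
      // Simulated LOCK_FAILURE: the lock is already released at this point.
      lockFailureHappened.countDown();
      isBlocking.await(); // custom block strategy: wait for the test
    }
  }

  /** Runs steps 2 to 5; returns {test thread's number, background's number}. */
  static int[] run() {
    RetryBlockingTestSketch seq = new RetryBlockingTestSketch();
    AtomicInteger backgroundResult = new AtomicInteger(-1);
    Thread background =
        new Thread(
            () -> {
              try {
                backgroundResult.set(seq.next());
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
            });
    background.setDaemon(true);
    try {
      background.start(); // step 2
      if (!seq.lockFailureHappened.await(5, TimeUnit.SECONDS)) { // step 3
        throw new IllegalStateException("no LOCK_FAILURE observed");
      }
      int fromTestThread = seq.next(); // step 4: would hang if the lock were held
      seq.isBlocking.countDown(); // flip the flag
      background.join(5000); // step 5
      return new int[] {fromTestThread, backgroundResult.get()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch the test thread gets its number first, and the background
thread gets the following one once it is unblocked.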

Signed-off-by: Edwin Kempin <ekempin@google.com>
Change-Id: Ib17af223b63d655066ab538b724355316fa24ca1
2019-06-12 17:13:18 +02:00