Add a helper for retrying BatchUpdates under NoteDb

With a SQL or SQL-like backend for ReviewDb, two transactions on the
same Change entity initiated around the same time will generally both
succeed, due to the low-level implementation waiting for a lock or
retrying. NoteDb, being Git-backed, has no notion of locking, and the
only atomic operation is a compare-and-swap. This means that concurrent
writes carry a higher risk of exceptions in the Gerrit level when
compared with ReviewDb, and it will be worth it to implement a retrying
mechanism in Gerrit.

The question becomes: what is the appropriate unit of work to retry? The
implementation in this change encourages retrying at the highest level
of an entire end-user operation, like a REST API handler.

The main reason not to limit retrying to a lower level, like a single
BatchUpdate or its Ops, is that the op implementations may depend on
repository state that was read prior to entering the retry loop. This
potentially includes pretty much any caller of
BatchUpdate#setRepository, but the most notable is MergeOp: the initial
branch tips, which are ultimately used as old IDs in the final ref
updates, are read outside of the BatchUpdate. If we retried the
BatchUpdate on LOCK_FAILURE but not the outer code, retrying would be
guaranteed to fail.

The next question is: under what conditions should we retry? The safest
approach, implemented here, is to look specifically for LOCK_FAILUREs
only in the disabled-ReviewDb case, and only when the underlying ref
backend performs atomic multi-ref transactions. If transactions are not
atomic, then it is infeasible to find out which portions of the code
would need to be retried; if they are atomic, then we can assume that a
failed transaction means the operation had no side effects, so retrying
is safe.

There is certainly an argument to be made that it may be worth retrying
even after non-atomic partially-successful operations, under the
assumption that if an error propagates back to the user, probably the
next thing they were going to anyway is just retry manually. But
decisions about when to loosen up our initially tight safety assumptions
can be deferred.

Change-Id: Ic7a9df9ba1bfdb01784cd1fce2b2ce82511e1068
This commit is contained in:
Dave Borowitz
2017-04-20 10:48:29 -04:00
parent 521f78f700
commit 66ea2ed46e
3 changed files with 280 additions and 5 deletions

View File

@@ -0,0 +1,79 @@
// Copyright (C) 2017 The Android Open Source Project
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.gerrit.server.update;
import com.github.rholder.retry.RetryException;
import com.github.rholder.retry.RetryerBuilder;
import com.github.rholder.retry.StopStrategies;
import com.github.rholder.retry.WaitStrategies;
import com.google.common.base.Throwables;
import com.google.gerrit.extensions.restapi.RestApiException;
import com.google.gerrit.server.git.LockFailureException;
import com.google.gerrit.server.notedb.NotesMigration;
import com.google.inject.Inject;
import com.google.inject.Singleton;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
@Singleton
public class RetryHelper {
public interface Action<T> {
T call(BatchUpdate.Factory updateFactory) throws Exception;
}
private final BatchUpdate.Factory updateFactory;
@Inject
RetryHelper(
NotesMigration migration,
ReviewDbBatchUpdate.AssistedFactory reviewDbBatchUpdateFactory,
FusedNoteDbBatchUpdate.AssistedFactory fusedNoteDbBatchUpdateFactory,
UnfusedNoteDbBatchUpdate.AssistedFactory unfusedNoteDbBatchUpdateFactory) {
this.updateFactory =
new BatchUpdate.Factory(
migration,
reviewDbBatchUpdateFactory,
fusedNoteDbBatchUpdateFactory,
unfusedNoteDbBatchUpdateFactory);
}
public <T> T execute(Action<T> action) throws RestApiException, UpdateException {
try {
// TODO(dborowitz): Make configurable.
return RetryerBuilder.<T>newBuilder()
.withStopStrategy(StopStrategies.stopAfterDelay(20, TimeUnit.SECONDS))
.withWaitStrategy(
WaitStrategies.join(
WaitStrategies.exponentialWait(5, TimeUnit.SECONDS),
WaitStrategies.randomWait(50, TimeUnit.MILLISECONDS)))
.retryIfException(RetryHelper::isLockFailure)
.build()
.call(() -> action.call(updateFactory));
} catch (ExecutionException | RetryException e) {
if (e.getCause() != null) {
Throwables.throwIfInstanceOf(e.getCause(), UpdateException.class);
Throwables.throwIfInstanceOf(e.getCause(), RestApiException.class);
}
throw new UpdateException(e);
}
}
private static boolean isLockFailure(Throwable t) {
if (t instanceof UpdateException) {
t = t.getCause();
}
return t instanceof LockFailureException;
}
}