Fix GitOverHttpServletIT flakiness

The races here are non-obvious. First, in general, HTTP audit event
dispatching looks synchronous in GitOverHttpServlet, but is actually
async since it happens after writing data over an HTTP connection.
There's no way to know whether the thread on the other end has read the
data and continued executing before #doFilter returns.

More specifically in the test environment, GitOverHttpServletIT performs
some Git-over-HTTP operations during setup, namely cloning the repo into
an InMemoryRepository. We don't know whether the audit events generated
by this operation are processed before or after the @Test method starts
executing. As a result, we can't use the cleaner alternative of having a
test set up its own AuditListener binding, because the DynamicSet
addition may or may not be visible when GitOverHttpServlet calls the
audit listener.

Fundamentally, we can't avoid waiting an arbitrarily long amount of time
for the audit events to appear, because the JVM provides no time-bounded
guarantee about when the HTTP thread will continue executing after the
test thread has finished reading the HTTP response. So ultimately, this
change boils down to increasing the timeout we wait for audit events to
appear. (Also note that in the pessimal JVM thread scheduling case, we
are also subject to timeouts in HTTP communication, so even if we
somehow got rid of timeouts for audit event processing it's not like we
would be free from timeouts.)

That said, this change also improves the code to be more reliable and
easy to reason about in several ways. First, use BlockingQueue over
wait/notify, which is a better abstraction for pumping messages between
threads.

Second, we now keep track in GitOverHttpServlet of how many requests
were started, and this AtomicLong increment operation happens before any
HTTP response data is written. This means that we know exactly how many
audit events to expect from the BlockingQueue, and callers don't have to
either specify the expected count in advance. This reduces the risk of
a caller miscounting and waiting for an event that will never arrive. It
also means that test methods can easily drain an arbitrary number of
events from the setup code with a single call at the top of the method.
The remainder of events are generated from entirely within the method.

Now that we more reliably dequeue all events, it's clear that we
actually do expect two HTTP requests in both the upload-pack and
receive-pack cases. The previous code expecting 1 event in the
upload-pack case was wrong and only passed due to flakiness.

It's possible that the timeouts may still need to be adjusted to work in
other environments (slow laptops, CI). In the worst case, we can use
BlockingQueue#take(), which has no timeout at all.

As a side effect, move FakeGroupAuditService to the acceptance package,
since it needs a new dependency that we don't really need to add to the
testing package. Like other classes in the acceptance package, it's used
during acceptance test setup, not any smaller tests.

Considered the following alternatives to introducing a new counter in
GitOverHttpServlet:
 * Changing the GroupAuditService interface to explicitly record the
   start of requests. Getting the interface right is nontrivial and
   would require changes to multiple implementations, with no benefit
   outside of tests.
 * Hooking into existing HTTP request metrics, which doesn't currently
   work due to the bug identified by Ic7b38a01.

Verified that this reduces flakiness to <0.5% on my workstation with:

$ bazel test --runs_per_test=200 --test_filter='GitOverHttpServletIT#.*AuditEventLog' //javatests/com/google/gerrit/acceptance/git:git

Change-Id: Idd57cc7f5e0b6504403b10bd3676e78d9f1cd792
This commit is contained in:
Dave Borowitz
2019-01-24 10:07:09 -08:00
parent 018a3ecdb7
commit 54fe8e83e3
6 changed files with 168 additions and 97 deletions

View File

@@ -110,6 +110,7 @@ java_library2(
"//java/com/google/gerrit/pgm/init",
"//java/com/google/gerrit/reviewdb:server",
"//java/com/google/gerrit/server",
"//java/com/google/gerrit/server/audit",
"//java/com/google/gerrit/server/git/receive",
"//java/com/google/gerrit/server/restapi",
"//java/com/google/gerrit/server/schema",

View File

@@ -0,0 +1,95 @@
// Copyright (C) 2018 The Android Open Source Project
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.gerrit.acceptance;
import static java.util.Comparator.comparing;
import static java.util.concurrent.TimeUnit.SECONDS;
import com.google.common.collect.ImmutableList;
import com.google.common.primitives.Ints;
import com.google.gerrit.extensions.registration.DynamicSet;
import com.google.gerrit.httpd.GitOverHttpServlet;
import com.google.gerrit.server.AuditEvent;
import com.google.gerrit.server.audit.AuditListener;
import com.google.gerrit.server.audit.AuditService;
import com.google.gerrit.server.audit.HttpAuditEvent;
import com.google.gerrit.server.audit.group.GroupAuditListener;
import com.google.gerrit.server.group.GroupAuditService;
import com.google.gerrit.server.plugincontext.PluginSetContext;
import com.google.inject.AbstractModule;
import com.google.inject.Inject;
import com.google.inject.Singleton;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
@Singleton
public class FakeGroupAuditService extends AuditService {
public static class Module extends AbstractModule {
@Override
public void configure() {
DynamicSet.setOf(binder(), GroupAuditListener.class);
DynamicSet.setOf(binder(), AuditListener.class);
// Use this fake service at the Guice level rather than depending on tests binding their own
// audit listeners. If we used per-test listeners, then there would be a race between
// dispatching the audit events from HTTP requests performed during test setup in
// AbstractDaemonTest, and the later test setup binding the audit listener. Using a separate
// audit service implementation ensures all events get recorded.
bind(GroupAuditService.class).to(FakeGroupAuditService.class);
}
}
private final GitOverHttpServlet.Metrics httpMetrics;
private final BlockingQueue<HttpAuditEvent> httpEvents;
private final AtomicLong drainedSoFar;
@Inject
FakeGroupAuditService(
PluginSetContext<AuditListener> auditListeners,
PluginSetContext<GroupAuditListener> groupAuditListeners,
GitOverHttpServlet.Metrics httpMetrics) {
super(auditListeners, groupAuditListeners);
this.httpMetrics = httpMetrics;
this.httpEvents = new LinkedBlockingQueue<>();
this.drainedSoFar = new AtomicLong();
}
@Override
public void dispatch(AuditEvent action) {
super.dispatch(action);
if (action instanceof HttpAuditEvent) {
httpEvents.add((HttpAuditEvent) action);
}
}
public ImmutableList<HttpAuditEvent> drainHttpAuditEvents() throws Exception {
// Assumes that all HttpAuditEvents are produced by GitOverHttpServlet.
int expectedSize = Ints.checkedCast(httpMetrics.getRequestsStarted() - drainedSoFar.get());
List<HttpAuditEvent> result = new ArrayList<>();
for (int i = 0; i < expectedSize; i++) {
HttpAuditEvent e = httpEvents.poll(30, SECONDS);
if (e == null) {
throw new AssertionError(
String.format("Timeout after receiving %d/%d audit events", i, expectedSize));
}
drainedSoFar.incrementAndGet();
result.add(e);
}
return ImmutableList.sortedCopyOf(comparing(e -> e.when), result);
}
}

View File

@@ -47,7 +47,6 @@ import com.google.gerrit.server.ssh.NoSshModule;
import com.google.gerrit.server.util.SocketUtil;
import com.google.gerrit.server.util.SystemLog;
import com.google.gerrit.testing.FakeEmailSender;
import com.google.gerrit.testing.FakeGroupAuditService;
import com.google.gerrit.testing.InMemoryRepositoryManager;
import com.google.gerrit.testing.SshMode;
import com.google.inject.AbstractModule;

View File

@@ -14,6 +14,7 @@
package com.google.gerrit.httpd;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.cache.Cache;
import com.google.common.collect.ImmutableListMultimap;
import com.google.common.collect.ListMultimap;
@@ -54,6 +55,7 @@ import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
@@ -129,6 +131,25 @@ public class GitOverHttpServlet extends GitServlet {
.expireAfterWrite(Duration.ofMinutes(10));
}
});
// Don't bind Metrics, which is bound in a parent injector in tests.
}
}
@VisibleForTesting
@Singleton
public static class Metrics {
// Recording requests separately in this class is only necessary because of a bug in the
// implementation of the generic RequestMetricsFilter; see
// https://gerrit-review.googlesource.com/c/gerrit/+/211692
private final AtomicLong requestsStarted = new AtomicLong();
void requestStarted() {
requestsStarted.incrementAndGet();
}
public long getRequestsStarted() {
return requestsStarted.get();
}
}
@@ -317,22 +338,26 @@ public class GitOverHttpServlet extends GitServlet {
private final PermissionBackend permissionBackend;
private final Provider<CurrentUser> userProvider;
private final GroupAuditService groupAuditService;
private final Metrics metrics;
@Inject
UploadFilter(
UploadValidators.Factory uploadValidatorsFactory,
PermissionBackend permissionBackend,
Provider<CurrentUser> userProvider,
GroupAuditService groupAuditService) {
GroupAuditService groupAuditService,
Metrics metrics) {
this.uploadValidatorsFactory = uploadValidatorsFactory;
this.permissionBackend = permissionBackend;
this.userProvider = userProvider;
this.groupAuditService = groupAuditService;
this.metrics = metrics;
}
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
throws IOException, ServletException {
metrics.requestStarted();
// The Resolver above already checked READ access for us.
Repository repo = ServletUtils.getRepository(request);
ProjectState state = (ProjectState) request.getAttribute(ATT_STATE);
@@ -431,22 +456,26 @@ public class GitOverHttpServlet extends GitServlet {
private final PermissionBackend permissionBackend;
private final Provider<CurrentUser> userProvider;
private final GroupAuditService groupAuditService;
private final Metrics metrics;
@Inject
ReceiveFilter(
@Named(ID_CACHE) Cache<AdvertisedObjectsCacheKey, Set<ObjectId>> cache,
PermissionBackend permissionBackend,
Provider<CurrentUser> userProvider,
GroupAuditService groupAuditService) {
GroupAuditService groupAuditService,
Metrics metrics) {
this.cache = cache;
this.permissionBackend = permissionBackend;
this.userProvider = userProvider;
this.groupAuditService = groupAuditService;
this.metrics = metrics;
}
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
throws IOException, ServletException {
metrics.requestStarted();
boolean isGet = "GET".equalsIgnoreCase(((HttpServletRequest) request).getMethod());
AsyncReceiveCommits arc = (AsyncReceiveCommits) request.getAttribute(ATT_ARC);

View File

@@ -1,62 +0,0 @@
// Copyright (C) 2018 The Android Open Source Project
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.gerrit.testing;
import com.google.gerrit.extensions.registration.DynamicSet;
import com.google.gerrit.server.AuditEvent;
import com.google.gerrit.server.audit.AuditListener;
import com.google.gerrit.server.audit.AuditService;
import com.google.gerrit.server.audit.group.GroupAuditListener;
import com.google.gerrit.server.group.GroupAuditService;
import com.google.gerrit.server.plugincontext.PluginSetContext;
import com.google.inject.AbstractModule;
import com.google.inject.Inject;
import com.google.inject.Singleton;
import java.util.ArrayList;
import java.util.List;
@Singleton
public class FakeGroupAuditService extends AuditService {
public final List<AuditEvent> auditEvents = new ArrayList<>();
public static class Module extends AbstractModule {
@Override
public void configure() {
DynamicSet.setOf(binder(), GroupAuditListener.class);
DynamicSet.setOf(binder(), AuditListener.class);
bind(GroupAuditService.class).to(FakeGroupAuditService.class);
}
}
@Inject
FakeGroupAuditService(
PluginSetContext<AuditListener> auditListeners,
PluginSetContext<GroupAuditListener> groupAuditListeners) {
super(auditListeners, groupAuditListeners);
}
public void clearEvents() {
auditEvents.clear();
}
@Override
public void dispatch(AuditEvent action) {
super.dispatch(action);
synchronized (auditEvents) {
auditEvents.add(action);
auditEvents.notifyAll();
}
}
}

View File

@@ -16,13 +16,15 @@ package com.google.gerrit.acceptance.git;
import static com.google.common.truth.Truth.assertThat;
import com.google.common.collect.ImmutableList;
import com.google.gerrit.acceptance.FakeGroupAuditService;
import com.google.gerrit.acceptance.Sandboxed;
import com.google.gerrit.server.AuditEvent;
import com.google.gerrit.server.AnonymousUser;
import com.google.gerrit.server.audit.HttpAuditEvent;
import com.google.gerrit.testing.FakeGroupAuditService;
import com.google.inject.Inject;
import java.util.Collections;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jgit.junit.TestRepository;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.transport.CredentialsProvider;
import org.eclipse.jgit.transport.RefSpec;
import org.eclipse.jgit.transport.UsernamePasswordCredentialsProvider;
@@ -30,8 +32,6 @@ import org.junit.Before;
import org.junit.Test;
public class GitOverHttpServletIT extends AbstractPushForReview {
private static final long AUDIT_EVENT_TIMEOUT = 500L;
@Inject private FakeGroupAuditService auditService;
@Before
@@ -39,51 +39,60 @@ public class GitOverHttpServletIT extends AbstractPushForReview {
CredentialsProvider.setDefault(
new UsernamePasswordCredentialsProvider(admin.username, admin.httpPassword));
selectProtocol(AbstractPushForReview.Protocol.HTTP);
// Don't clear audit events here, since we can't guarantee all test setup has run yet.
}
@Test
@Sandboxed
public void receivePackAuditEventLog() throws Exception {
auditService.clearEvents();
auditService.drainHttpAuditEvents();
testRepo
.git()
.push()
.setRemote("origin")
.setRefSpecs(new RefSpec("HEAD:refs/for/master"))
.call();
waitForAudit(auditService);
// Git smart protocol makes two requests:
// https://github.com/git/git/blob/master/Documentation/technical/http-protocol.txt
assertThat(auditService.auditEvents.size()).isEqualTo(2);
ImmutableList<HttpAuditEvent> auditEvents = auditService.drainHttpAuditEvents();
assertThat(auditEvents).hasSize(2);
AuditEvent e = auditService.auditEvents.get(1);
assertThat(e.who.getAccountId()).isEqualTo(admin.id);
assertThat(e.what).endsWith("/git-receive-pack");
assertThat(e.params).isEmpty();
assertThat(((HttpAuditEvent) e).httpStatus).isEqualTo(HttpServletResponse.SC_OK);
HttpAuditEvent lsRemote = auditEvents.get(0);
assertThat(lsRemote.who.getAccountId()).isEqualTo(admin.id);
assertThat(lsRemote.what).endsWith("/info/refs?service=git-receive-pack");
assertThat(lsRemote.params).containsExactly("service", "git-receive-pack");
assertThat(lsRemote.httpStatus).isEqualTo(HttpServletResponse.SC_OK);
HttpAuditEvent receivePack = auditEvents.get(1);
assertThat(receivePack.who.getAccountId()).isEqualTo(admin.id);
assertThat(receivePack.what).endsWith("/git-receive-pack");
assertThat(receivePack.params).isEmpty();
assertThat(receivePack.httpStatus).isEqualTo(HttpServletResponse.SC_OK);
}
@Test
@Sandboxed
public void uploadPackAuditEventLog() throws Exception {
auditService.clearEvents();
testRepo.git().fetch().call();
waitForAudit(auditService);
assertThat(auditService.auditEvents.size()).isEqualTo(1);
AuditEvent e = auditService.auditEvents.get(0);
assertThat(e.who.toString()).isEqualTo("ANONYMOUS");
assertThat(e.params.get("service"))
.containsExactlyElementsIn(Collections.singletonList("git-upload-pack"));
assertThat(e.what).endsWith("service=git-upload-pack");
assertThat(((HttpAuditEvent) e).httpStatus).isEqualTo(HttpServletResponse.SC_OK);
}
private void waitForAudit(FakeGroupAuditService auditService) throws InterruptedException {
synchronized (auditService.auditEvents) {
auditService.auditEvents.wait(AUDIT_EVENT_TIMEOUT);
auditService.drainHttpAuditEvents();
// testRepo is already a clone. Make a server-side change so we have something to fetch.
try (Repository repo = repoManager.openRepository(project)) {
new TestRepository<>(repo).branch("master").commit().create();
}
testRepo.git().fetch().call();
ImmutableList<HttpAuditEvent> auditEvents = auditService.drainHttpAuditEvents();
assertThat(auditEvents).hasSize(2);
HttpAuditEvent lsRemote = auditEvents.get(0);
// Repo URL doesn't include /a, so fetching doesn't cause authentication.
assertThat(lsRemote.who).isInstanceOf(AnonymousUser.class);
assertThat(lsRemote.what).endsWith("/info/refs?service=git-upload-pack");
assertThat(lsRemote.params).containsExactly("service", "git-upload-pack");
assertThat(lsRemote.httpStatus).isEqualTo(HttpServletResponse.SC_OK);
HttpAuditEvent uploadPack = auditEvents.get(1);
assertThat(lsRemote.who).isInstanceOf(AnonymousUser.class);
assertThat(uploadPack.what).endsWith("/git-upload-pack");
assertThat(uploadPack.params).isEmpty();
assertThat(uploadPack.httpStatus).isEqualTo(HttpServletResponse.SC_OK);
}
}