gerrit/proto/cache.proto
Patrick Hiesel e945da3880 Serialize AccountCache
Data on googlesource.com suggests that we spend a significant amount of
time loading accounts from NoteDb. This is true for all Gerrit
installations, but especially for distributed setups or setups that
restart often.

This commit serializes the AccountCache using established mechanisms.
To do that, we decompose AccountState - the entity that we currently
cache - into smaller chunks that can be cached individually:

1) External IDs + user name (cached in ExternalIdCache)
2) CachedAccountDetails (newly cached)
3) Gerrit's default settings (we start caching this in a follow-up
   change)

CachedAccountDetails - a new class representing all information stored
under the user's ref (refs/users/<sharded-id>) - is now cached in the
'accounts' cache instead of AccountState. AccountState is constructed
on request from sources 1-3 and is not cached itself, as it is just a
plain wrapper around other state that we already cache.

This has the following advantages:
1) CachedAccountDetails contains only details from
   refs/users/<sharded-id>.
   Because of that, we can use the SHA1 of that ref as the cache key
   and start serializing the cache to eliminate the cold start penalty
   as well as the router assignment change penalty (for distributed
   setups). It also means that we don't have to do invalidation
   ourselves anymore.
2) When the server's default preferences change, we don't have to
   invalidate all accounts anymore. This is a shortcoming of the
   current approach.
3) The projected speed improvements that come from persisting the
   cache allow us to remove the logic to load accounts in parallel.

The new approach also means that:
1) We now need to get the SHA1 from refs/users/<sharded-id> for
   every account that we look up. Data suggests that this is not an
   issue for latency as ref lookups are cheap. We retain the method
   in AccountCacheImpl that allows the caller to load a
   Set<AccountState> so that in the cases where we want many
   accounts (change queries, ...) we have to open All-Users only once.
   In case we discover that - against our assumptions - this is a
   bottleneck, we can add a small in-memory cache for AccountState.
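
The keying scheme described above can be illustrated with a minimal
sketch. It is not Gerrit's implementation: AccountDetailsCache,
resolve_ref, load_details and the toy SHA1 values are all hypothetical
names chosen for illustration. The sketch assumes only what the message
states: every lookup resolves the user's ref (cheap), and the cached
value is addressed by the SHA1 that ref points at.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AccountKey:
    """Cache key: account ID plus the SHA1 that refs/users/<sharded-id> points at."""
    account_id: int
    ref_sha1: str


class AccountDetailsCache:
    """Hypothetical stand-in for the 'accounts' cache; not Gerrit's real API."""

    def __init__(self, resolve_ref: Callable[[int], str],
                 load_details: Callable[[AccountKey], dict]) -> None:
        self._resolve_ref = resolve_ref      # cheap ref lookup, done on every get()
        self._load_details = load_details    # expensive read, done only on a miss
        self._entries: Dict[AccountKey, dict] = {}
        self.loads = 0                       # counts expensive loads, for illustration

    def get(self, account_id: int) -> dict:
        key = AccountKey(account_id, self._resolve_ref(account_id))
        if key not in self._entries:
            self.loads += 1
            self._entries[key] = self._load_details(key)
        return self._entries[key]


# Toy data standing in for All-Users: ref SHA1 per account, content per SHA1.
refs = {1000: "aaa111"}
store = {"aaa111": {"full_name": "Old Name"}, "bbb222": {"full_name": "New Name"}}

cache = AccountDetailsCache(lambda account_id: refs[account_id],
                            lambda key: store[key.ref_sha1])
first = cache.get(1000)
second = cache.get(1000)     # same SHA1, served from the cache
refs[1000] = "bbb222"        # the user's ref moves to a new commit
third = cache.get(1000)      # new key, so the details are loaded again
```

Because the key changes whenever the ref moves, a stale entry is simply
never looked up again, which is why no explicit invalidation is needed
in this sketch. Evicting old entries is left to the cache's normal
size limits.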

Related prework:
The new approach shows that the way we handle user preferences is
suboptimal, because we:
1) Pipe API data types through to the storage
2) Overlay defaults directly in the storage
3) Use reflection to get/set fields

I considered and prototyped a rewrite of this and initially thought I
could get it done before serializing the account cache. However, it
turned out to be significantly more work, and the impact of that work
(besides being a much-desired cleanup) is rather low. So I decided to
get the cache serialized independently.

Change-Id: I61ae57802f37c62ee9e3552e4a0f19fe3d8d762b
2020-04-07 10:27:44 +02:00

326 lines
8.6 KiB
Protocol Buffer

// Copyright (C) 2018 The Android Open Source Project
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto3";
package gerrit.cache;
option java_package = "com.google.gerrit.server.cache.proto";
// Serialized form of com.google.gerrit.server.change.ChangeKindCacheImpl.Key.
// Next ID: 4
message ChangeKindKeyProto {
  bytes prior = 1;
  bytes next = 2;
  string strategy_name = 3;
}
// Serialized form of
// com.google.gerrit.server.change.MergeabilityCacheImpl.EntryKey.
// Next ID: 5
message MergeabilityKeyProto {
  bytes commit = 1;
  bytes into = 2;
  string submit_type = 3;
  string merge_strategy = 4;
}
// Serialized form of com.google.gerrit.extensions.auth.oauth.OAuthToken.
// Next ID: 6
message OAuthTokenProto {
  string token = 1;
  string secret = 2;
  string raw = 3;
  // Epoch millis.
  int64 expires_at_millis = 4;
  string provider_id = 5;
}
// Serialized form of com.google.gerrit.server.notedb.ChangeNotesCache.Key.
// Next ID: 4
message ChangeNotesKeyProto {
  string project = 1;
  int32 change_id = 2;
  bytes id = 3;
}
// Serialized form of com.google.gerrit.server.notedb.ChangeNotesState.
//
// Note on embedded protos: this is just for storing in a cache, so some formats
// were chosen for ease of coding the initial implementation. In particular, where
// there already exists another serialization mechanism in Gerrit for
// serializing a particular field, we use that rather than defining a new proto
// type. This includes types that can be serialized to proto using
// ProtoConverters as well as NoteDb and indexed types that are serialized using
// JSON. We can always revisit this decision later; it just requires bumping the
// cache version.
//
// Note on nullability: there are a lot of nullable fields in ChangeNotesState
// and its dependencies. It's likely we could make some of them non-nullable,
// but each one of those would be a potentially significant amount of cleanup,
// and there's no guarantee we'd be able to eliminate all of them. (For a less
// complex class, it's likely the cleanup would be more feasible.)
//
// Instead, we just take the tedious yet simple approach of having a "has_foo"
// field for each nullable field "foo", indicating whether or not foo is null.
//
// Next ID: 24
message ChangeNotesStateProto {
  // Effectively required, even though the corresponding ChangeNotesState field
  // is optional, since the field is only absent when NoteDb is disabled, in
  // which case attempting to use the ChangeNotesCache is programmer error.
  bytes meta_id = 1;
  int32 change_id = 2;
  // Next ID: 26
  message ChangeColumnsProto {
    string change_key = 1;
    // Epoch millis.
    int64 created_on_millis = 2;
    // Epoch millis.
    int64 last_updated_on_millis = 3;
    int32 owner = 4;
    string branch = 5;
    int32 current_patch_set_id = 6;
    bool has_current_patch_set_id = 7;
    string subject = 8;
    string topic = 9;
    bool has_topic = 10;
    string original_subject = 11;
    bool has_original_subject = 12;
    string submission_id = 13;
    bool has_submission_id = 14;
    reserved 15;  // assignee
    reserved 16;  // has_assignee
    string status = 17;
    bool has_status = 18;
    bool is_private = 19;
    bool work_in_progress = 20;
    bool review_started = 21;
    int32 revert_of = 22;
    bool has_revert_of = 23;
    string cherry_pick_of = 24;
    bool has_cherry_pick_of = 25;
  }
  // Effectively required, even though the corresponding ChangeNotesState field
  // is optional, since the field is only absent when NoteDb is disabled, in
  // which case attempting to use the ChangeNotesCache is programmer error.
  ChangeColumnsProto columns = 3;
  reserved 4;  // past_assignee
  repeated string hashtag = 5;
  // Raw PatchSet proto as produced by PatchSetProtoConverter.
  repeated bytes patch_set = 6;
  // Raw PatchSetApproval proto as produced by PatchSetApprovalProtoConverter.
  repeated bytes approval = 7;
  // Next ID: 4
  message ReviewerSetEntryProto {
    string state = 1;
    int32 account_id = 2;
    // Epoch millis.
    int64 timestamp_millis = 3;
  }
  repeated ReviewerSetEntryProto reviewer = 8;
  // Next ID: 4
  message ReviewerByEmailSetEntryProto {
    string state = 1;
    string address = 2;
    // Epoch millis.
    int64 timestamp_millis = 3;
  }
  repeated ReviewerByEmailSetEntryProto reviewer_by_email = 9;
  repeated ReviewerSetEntryProto pending_reviewer = 10;
  repeated ReviewerByEmailSetEntryProto pending_reviewer_by_email = 11;
  repeated int32 past_reviewer = 12;
  // Next ID: 5
  message ReviewerStatusUpdateProto {
    // Epoch millis.
    int64 timestamp_millis = 1;
    int32 updated_by = 2;
    int32 reviewer = 3;
    string state = 4;
  }
  repeated ReviewerStatusUpdateProto reviewer_update = 13;
  // JSON produced from
  // com.google.gerrit.server.index.change.ChangeField.StoredSubmitRecord.
  repeated string submit_record = 14;
  // Raw ChangeMessage proto as produced by ChangeMessageProtoConverter.
  repeated bytes change_message = 15;
  // JSON produced from com.google.gerrit.entities.Comment.
  repeated string published_comment = 16;
  reserved 17;  // read_only_until
  reserved 18;  // has_read_only_until
  // Number of updates to the change's meta ref.
  int32 update_count = 19;
  string server_id = 20;
  bool has_server_id = 21;
  message AssigneeStatusUpdateProto {
    // Epoch millis.
    int64 timestamp_millis = 1;
    int32 updated_by = 2;
    int32 current_assignee = 3;
    bool has_current_assignee = 4;
  }
  repeated AssigneeStatusUpdateProto assignee_update = 22;
  // An update to the attention set of the change. See class AttentionSetUpdate
  // for context.
  message AttentionSetUpdateProto {
    // Epoch millis.
    int64 timestamp_millis = 1;
    int32 account = 2;
    // Maps to enum AttentionSetUpdate.Operation
    string operation = 3;
    string reason = 4;
  }
  repeated AttentionSetUpdateProto attention_set_update = 23;
}
// Serialized form of com.google.gerrit.server.query.change.ConflictKey.
message ConflictKeyProto {
  bytes commit = 1;
  bytes other_commit = 2;
  string submit_type = 3;
  bool content_merge = 4;
}
// Serialized form of com.google.gerrit.server.query.git.TagSetHolder.
// Next ID: 3
message TagSetHolderProto {
  string project_name = 1;
  // Next ID: 4
  message TagSetProto {
    string project_name = 1;
    // Next ID: 3
    message CachedRefProto {
      bytes id = 1;
      int32 flag = 2;
    }
    map<string, CachedRefProto> ref = 2;
    // Next ID: 3
    message TagProto {
      bytes id = 1;
      bytes flags = 2;
    }
    repeated TagProto tag = 3;
  }
  TagSetProto tags = 2;
}
// Serialized form of
// com.google.gerrit.server.account.externalids.AllExternalIds.
// Next ID: 2
message AllExternalIdsProto {
  // Next ID: 6
  message ExternalIdProto {
    string key = 1;
    int32 accountId = 2;
    string email = 3;
    string password = 4;
    bytes blobId = 5;
  }
  repeated ExternalIdProto external_id = 1;
}
// Serialized form of a list of com.google.gerrit.entities.AccountGroup.UUID.
// Next ID: 2
message AllExternalGroupsProto {
  message ExternalGroupProto {
    string groupUuid = 1;
  }
  repeated ExternalGroupProto external_group = 1;
}
// Key for com.google.gerrit.server.git.PureRevertCache.
// Next ID: 4
message PureRevertKeyProto {
  string project = 1;
  bytes claimed_original = 2;
  bytes claimed_revert = 3;
}
// Key for com.google.gerrit.server.account.ProjectWatches.
// Next ID: 4
message ProjectWatchProto {
  string project = 1;
  string filter = 2;
  repeated string notify_type = 3;
}
// Serialized form of
// com.google.gerrit.entities.Account.
// Next ID: 9
message AccountProto {
  int32 id = 1;
  int64 registered_on = 2;
  string full_name = 3;
  string display_name = 4;
  string preferred_email = 5;
  bool inactive = 6;
  string status = 7;
  string meta_id = 8;
}
// Serialized form of com.google.gerrit.server.account.CachedAccountDetails.Key.
// Next ID: 3
message AccountKeyProto {
  int32 account_id = 1;
  bytes id = 2;
}
// Serialized form of com.google.gerrit.server.account.CachedAccountDetails.
// Next ID: 4
message AccountDetailsProto {
  AccountProto account = 1;
  repeated ProjectWatchProto project_watch_proto = 2;
  string user_preferences = 3;
}
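
The "has_foo" convention described in the ChangeNotesStateProto comment
exists because a plain proto3 scalar field without explicit presence
reads back as its default value (empty string, 0, false) when it was
never set, so null and empty cannot be told apart from the field alone.
The round trip can be sketched as follows; encode_nullable and
decode_nullable are hypothetical helpers written for illustration, not
part of Gerrit or the protobuf library.

```python
from typing import Optional, Tuple


def encode_nullable(value: Optional[str]) -> Tuple[str, bool]:
    """Return (foo, has_foo) as the pair of fields would be written."""
    if value is None:
        return "", False      # the field keeps its default; has_foo records the null
    return value, True


def decode_nullable(field: str, has_field: bool) -> Optional[str]:
    """Rebuild the nullable value from the pair of fields."""
    return field if has_field else None


topic_unset = decode_nullable(*encode_nullable(None))
topic_empty = decode_nullable(*encode_nullable(""))
topic_set = decode_nullable(*encode_nullable("release-3.2"))
```

With the companion flag, a null topic and an empty topic stay distinct
after deserialization, at the cost of one extra bool per nullable field.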