Files
gerrit/proto
Patrick Hiesel e945da3880 Serialize AccountCache
Data on googlesource.com suggests that we spend a significant amount of
time loading accounts from NoteDb. This is true for all Gerrit
installations, but especially for distributed setups or setups that
restart often.

This commit serializes the AccountCache using established mechanisms.
To do that, we decompose AccountState - the entity that we currently
cache - into smaller chunks that can be cached individually:

1) External IDs + user name (cached in ExternalIdCache)
2) CachedAccountDetails (newly cached)
3) Gerrit's default settings (we start caching this in a follow-up
   change)

CachedAccountDetails - a new class representing all information stored
under the user's ref (refs/users/<sharded-id>) is now cached in the
'accounts' cache instead of AccountState. AccountState is contructed
when requested from the sources 1-3 and not cached itself as it's
just a plain wrapper around other state, that we already cache.

This has the following advantages:
1) CachedAccountDetails contains only details from
   refs/users/<sharded-id>.
   By that, we can use the SHA1 of that ref as cache key and start
   serializing the cache to eliminate cold start penalty as well as
   router assignment change penalty (for distributed setups).
   It also means that we don't have to do invalidation ourselves
   anymore.
2) When the server's default preferences change, we don't have to
   invalidate all accounts anymore. This is a shortcoming of the
   current approach.
3) The projected speed improvements that come from persisting the
   cache makes it so that we can remove the logic to load accounts
   in parallel.

The new aproach also means that:
1) We now need to get the SHA1 from refs/users/<sharded-id> for
   every account that we look up. Data suggests that this is not an
   issue for latency as ref lookups are cheap. We retain the method
   in AccountCacheImpl that allows the caller to load a
   Set<AccountState> so that in the cases where we want many many
   accounts (change queries, ...) we have to open All-Users only once.
   In case we discover that - against our assumptions - this is a
   bottleneck we can add a small in-memory cache for AccountState.

Related prework:
The new aproach shows that the way we handle user preferences is
suboptimal, because:
1) We pipe through API data types to the storage
2) We overlay defaults directly in the storage
3) Use reflection to get/set fields.

I considered and prototyped a rewrite of this and initially thought I
could get it done before serializing the account cache. However it turned
out to be significantly more work and the impact of that work (besides
being a much desired cleanup) is rather low. So I decided to get the
cache serialized independently.

Change-Id: I61ae57802f37c62ee9e3552e4a0f19fe3d8d762b
2020-04-07 10:27:44 +02:00
..
2019-09-11 08:17:14 +09:00
2020-04-07 10:27:44 +02:00