Merge "Update designdoc to current state"

2021-03-11 16:13:44 +00:00
parent f253fa78be d7873e6071
commit ae8b54b741
1 changed files with 302 additions and 401 deletions
--- a/Documentation/dev-design.txt
+++ b/Documentation/dev-design.txt
@@ -18,83 +18,45 @@ centralized usage of Git.
 == Background
 Google developed Mondrian, a Perforce based code review tool to
 facilitate peer-review of changes prior to submission to the central
 code repository.  Mondrian is not open source, as it is tied to the
 use of Perforce and to many Google-only services, such as Bigtable.
 Google employees have often described how useful Mondrian and its
 peer-review process are to their day-to-day work.
 Guido van Rossum open sourced portions of Mondrian within Rietveld,
 a similar code review tool running on Google App Engine, but for
 use with Subversion rather than Perforce.  Rietveld is in common
 use by many open source projects, facilitating their peer reviews
 much as Mondrian does for Google employees.  Unlike Mondrian and
 the Google Perforce triggers, Rietveld is strictly advisory and
 does not enforce peer-review prior to submission.
 Git is a distributed version control system, wherein each repository
 is assumed to be owned/maintained by a single user.  There are no
 inherent security controls built into Git, so the ability to read
 from or write to a repository is controlled entirely by the host's
-filesystem access controls.  When multiple maintainers collaborate
+filesystem or network access controls.
 on a single shared repository a high degree of trust is required,
 as any collaborator with write access can alter the repository.
-Gitosis provides tools to secure centralized Git repositories,
+The objective of Gerrit is to facilitate Git development by larger
-permitting multiple maintainers to manage the same project at once,
+teams: it provides a means to enforce organizational policies around
-by restricting the access to only over a secure network protocol,
+code submissions, eg. "all code must be reviewed by another
-much like Perforce secures a repository by only permitting access
+developer", "all code shall pass tests". It achieves this by
 over its network port.
-The Android Open Source Project (AOSP) was founded by Google by the
+* providing fine-grained (per-branch, per-repository, inheriting)
-open source releasing of the Android operating system.  AOSP has
+  access controls, which allow a Gerrit admin to delegate permissions
-selected Git as its primary version control tool.  As many of the
+  to different team(-lead)s.
 engineers have a background of working with Mondrian at Google,
 there is a strong desire to have the same (or better) feature set
 available for Git and AOSP.
 Gerrit Code Review started as a simple set of patches to Rietveld,
 and was originally built to service AOSP. This quickly turned
 into a fork as we added access control features that Guido van
 Rossum did not want to see complicating the Rietveld code base. As
 the functionality and code were starting to become drastically
 different, a different name was needed. Gerrit calls back to the
 original namesake of Rietveld, Gerrit Rietveld, a Dutch architect.
 Gerrit 2.x is a complete rewrite of the Gerrit fork, completely
 changing the implementation from Python on Google App Engine, to Java
 on a J2EE servlet container and an SQL database.
 Since Gerrit 3.x link:note-db.html[NoteDb] replaced the SQL database
 and all metadata is now stored in Git.
 * link:http://video.google.com/videoplay?docid=-8502904076440714866[Mondrian Code Review On The Web,role=external,window=_blank]
 * link:https://github.com/rietveld-codereview/rietveld[Rietveld - Code Review for Subversion,role=external,window=_blank]
 * link:http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD[Gitosis README,role=external,window=_blank]
 * link:http://source.android.com/[Android Open Source Project,role=external,window=_blank]
 * facilitate code review: Gerrit offers a web view of pending code
  changes, that allows for easy reading and commenting by humans. The
  web view can offer data coming out of automated QA processes (eg.
  CI). The permission system also includes fine grained control of who
  can approve pending changes for submission to further facilitate
  delegation of code ownership.
 == Overview
 Developers create one or more changes on their local desktop system,
 then upload them for review to Gerrit using the standard `git push`
-command line program, or any GUI which can invoke `git push` on
+command line program, or any GUI which can invoke `git push` on behalf
-behalf of the user.  Authentication and data transfer are handled
+of the user. Authentication and data transfer are handled through SSH
-through SSH.  Users are authenticated by username and public/private
+and HTTPS. Uploads are protected by the authentication,
-key pair, and all data transfer is protected by the SSH connection
+confidentiality and integrity offered by the transport (SSH, HTTPS).
 and Git's own data integrity checks.
-Each Git commit created on the client desktop system is converted
+Each Git commit created on the client desktop system is converted into
-into a unique change record which can be reviewed independently.
+a unique change record which can be reviewed independently.
 Change records are stored in NoteDb.
 A summary of each newly uploaded change is automatically emailed
 to reviewers, so they receive a direct hyperlink to review the
 change on the web.  Reviewer email addresses can be specified on the
-`git push` command line, but typically reviewers are automatically
+`git push` command line, but typically reviewers are added in the web
-selected by Gerrit by identifying users who have change approval
+interface.
 permissions in the project.
 Reviewers use the web interface to read the side-by-side or unified
 diff of a change, and insert draft inline/file comments where
@@ -103,20 +65,16 @@ they publish those comments.  Published comments are automatically
 emailed to the change author by Gerrit, and are CC'd to all other
 reviewers who have already commented on the change.
-When publishing comments reviewers are also given the opportunity
+Reviewers can score the change ("vote"), indicating whether they feel the
-to score the change, indicating whether they feel the change is
+change is ready for inclusion in the project, needs more work, or
-ready for inclusion in the project, needs more work, or should be
+should be rejected outright. These scores provide direct feedback to
-rejected outright.  These scores provide direct feedback to Gerrit's
+Gerrit's change submit function.
 change submit function.
-After a change has been scored positively by reviewers, Gerrit
+After a change has been scored positively by reviewers, Gerrit enables
-enables a submit button on the web interface.  Authorized users
+a submit button on the web interface. Authorized users can push the
-can push the submit button to have the change enter the project
+submit button to have the change enter the project repository. The
-repository.  The equivalent in Subversion or Perforce would be
+user pressing the submit button does not need to be the author of the
-that Gerrit is invoking `svn commit` or `p4 submit` on behalf of
+change.
 the web user pressing the button.  Due to the way Git audit trails
 are maintained, the user pressing the submit button does not need
 to be the author of the change.
 == Infrastructure
@@ -125,18 +83,30 @@ End-user web browsers make HTTP requests directly to Gerrit's
 HTTP server.  As nearly all of the user interface is implemented
 through PolyGerrit, the majority of these requests are transmitting
 compressed JSON payloads, with all HTML being generated within the
-browser.  Most responses are under 1 KB.
+browser.
-Gerrit's HTTP server side component is implemented as a standard
+Gerrit's HTTP server side component is implemented as a standard Java
-Java servlet, and thus runs within any J2EE servlet container.
+servlet, and thus runs within any link:install-j2ee.html[J2EE servlet
-Popular choices for deployments would be Tomcat or Jetty, as these
+container]. The standard install will run inside Jetty, which is
-are high-quality open-source servlet containers that are readily
+included in the binary.
 available for download.
-End-user uploads are performed over SSH, so Gerrit's servlets also
+End-user uploads are performed over SSH or HTTP, so Gerrit's servlets
-start up a background thread to receive SSH connections through
+also start up a background thread to receive SSH connections through
-an independent SSH port.  SSH clients communicate directly with
+an independent SSH port. SSH clients communicate directly with this
-this port, bypassing the HTTP server used by browsers.
+port, bypassing the HTTP server used by browsers.
 User authentication is handled by identity realms. Gerrit supports the
 following types of authentication:
 * OpenId (see link:http://openid.net/developers/specs/[OpenID Specifications,role=external,window=_blank])
 * OAuth2
 * LDAP
 * Google accounts (on googlesource.com)
 * SAML
 * Kerberos
 * 3rd party SSO
 === NoteDb
 Server side data storage for Gerrit is broken down into two different
 categories:
@@ -156,28 +126,119 @@ namespace.  Remote filesystems are likely to perform worse than
 local ones, due to Git disk IO behavior not being optimized for
 remote access.
-The Gerrit metadata contains a summary of the available changes,
+The Gerrit metadata contains a summary of the available changes, all
-all comments (published and drafts), and individual user account
+comments (published and drafts), and individual user account
-information.  The metadata is mostly housed in the database (*1),
+information.
 which can be located either on the same server as Gerrit, or on
 a different (but nearby) server.  Most installations would opt to
 install both Gerrit and the metadata database on the same server,
 to reduce administration overheads.
-User authentication is handled by OpenID, and therefore Gerrit
+Gerrit metadata is also stored in Git, with the commits marking the
-requires that the OpenID provider selected by a user must be
+historical state of metadata. Data is stored in the trees associated
-online and operating in order to authenticate that user.
+with the commits, typically using Git config file or JSON as the base
 format. For metadata, there are 3 types of data: changes, accounts and
 groups.
-* link:http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html[Git Repository Format,role=external,window=_blank]
+Accounts are stored in a special Git repository `All-Users`.
 * link:http://openid.net/developers/specs/[OpenID Specifications,role=external,window=_blank]
-*1  Although an effort is underway to eliminate the use of the
+Accounts can be grouped in groups. Gerrit has a built-in group system,
-database altogether, and to store all the metadata directly in
+but can also interface to external group system (eg. Google groups,
-the git repositories themselves.  So far, as of Gerrit 2.2.1, of
+LDAP). The built-in groups are stored in `All-Users`.
 all Gerrit's metadata, only the project configuration metadata
 has been migrated out of the database and into the git
 repositories for each project.
 Draft comments are stored in `All-Users` too.
 Permissions are stored in Git, in a branch `refs/meta/config` for the
 repository. Repository configuration (including permissions) supports
 single inheritance, with the `All-Projects` repository containing
 site-wide defaults.
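
The single-inheritance lookup described above can be pictured with a short sketch. This is only an illustration in Python, not Gerrit's implementation: the `PARENTS` and `CONFIG` tables, the repository name `tools/build`, and the option names are hypothetical.

```python
# Hypothetical data: each repository names at most one parent (single inheritance).
PARENTS = {"tools/build": "All-Projects", "All-Projects": None}

# Hypothetical settings; All-Projects holds the site-wide defaults.
CONFIG = {
    "All-Projects": {"submit_type": "merge", "max_object_size": "10m"},
    "tools/build": {"submit_type": "rebase"},
}


def resolve(repo, key):
    """Walk up the parent chain and return the first value found for key."""
    seen = set()
    while repo is not None:
        if repo in seen:
            raise ValueError("inheritance cycle at " + repo)
        seen.add(repo)
        value = CONFIG.get(repo, {}).get(key)
        if value is not None:
            return value
        repo = PARENTS.get(repo)
    return None
```

A repository-level value overrides the inherited one, and anything not set locally falls back to the `All-Projects` default.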
 Code review metadata is stored in Git, alongside the code under
 review. Metadata includes change status, votes, comments. This review
 metadata is stored in NoteDb along with the submitted code and code
 under review. Hence, the review history can be exported with `git
 clone --mirror` by anyone with sufficient permissions.
 == Permissions
 Permissions are specified on branch names, and given to groups. For
 example,
 ```
 [access "refs/heads/stable/*"]
        push = group Release-Engineers
 ```
 this provides a rule, granting Release-Engineers push permission for
 stable branches.
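
As a rough sketch of how such a rule could be evaluated, the Python below checks a permission for a ref and a set of groups. It is a simplification under stated assumptions: the `ACCESS` table and both function names are hypothetical, and only exact patterns and trailing `/*` prefix patterns are handled.

```python
# Hypothetical in-memory form of the rule above: pattern -> permission -> groups.
ACCESS = {
    "refs/heads/stable/*": {"push": {"Release-Engineers"}},
}


def matches(pattern, ref):
    """Simplified matching: a trailing '/*' matches any ref under that prefix;
    any other pattern must match the ref exactly."""
    if pattern.endswith("/*"):
        return ref.startswith(pattern[:-1])
    return pattern == ref


def is_allowed(permission, ref, user_groups):
    """Return True if a matching section grants the permission to one of the groups."""
    for pattern, perms in ACCESS.items():
        if matches(pattern, ref) and perms.get(permission, set()) & set(user_groups):
            return True
    return False
```

With this table, a member of Release-Engineers may push to `refs/heads/stable/3.3`, while other users and other branches are refused.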
 There are fundamentally two types of permissions:
 * Write permissions (who can vote, push, submit etc.)
 * Read permissions (who can see data)
 Read permissions need special treatment across Gerrit, because Gerrit
 should only surface data (including repository existence) if a user
 has read permission. This means that
 * The git wire protocol support must omit references from
  advertisement if the user lacks read permissions
 * Uploads through the git wire protocol must refuse commits that are
  based on SHA1s for data that the user can't see.
 * Tags are only visible if their commits are visible to user through a
  non-tag reference.
 Metadata (eg. OAuth credentials) is also stored in Git. Existing
 endpoints must refuse creating branches or changes that expose these
 metadata or allow changes to them.
 === Indexing
 Almost all data is stored in Git, but Git only supports fast lookup by
 SHA1 or by ref (branch) name. Therefore Gerrit also has an indexing
 system (powered by Lucene by default) for other types of queries.
 There are 4 indices:
 * Project index - find repositories by name, parent project, etc.
 * Account index - find accounts by name, email, etc.
 * Group index - find groups by name, owner, description etc.
 * Change index - find changes by file, status, modification date etc.
 The base entities are characterized by SHA1s. Storing the
 characterizing SHA1s allows detection of stale index entries.
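
The stale-entry check can be sketched as a comparison between the SHA1 recorded in an index document and the SHA1 currently in Git. The Python below is a minimal illustration; the entity keys, the `sha1` field name, and the function name are assumptions, not Gerrit's index schema.

```python
# Hypothetical state: the current SHA1 of each entity in Git.
git_state = {"change-1": "a" * 40, "change-2": "b" * 40}

# Hypothetical index documents; each records the SHA1 it was built from.
index = {
    "change-1": {"sha1": "a" * 40, "status": "NEW"},
    "change-2": {"sha1": "0" * 40, "status": "NEW"},
}


def stale_entries(index, git_state):
    """Return the keys whose indexed SHA1 no longer matches the SHA1 in Git."""
    return sorted(key for key, doc in index.items() if git_state.get(key) != doc["sha1"])
```

Entries reported as stale would then be reindexed from Git, which remains the source of truth.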
 == Plug-in architecture
 Gerrit has a plug-in architecture. Plugins can be installed by
 dropping them into $site_directory/plugins, or at runtime through
 plugin SSH commands, or the plugin REST API.
 === Backend plugins
 At runtime, code can be loaded from a `.jar` file. This code can hook
 into predefined extension points. A common use of plugins is to have
 Gerrit interoperate with site-specific tools, such as CI-systems or
 issue trackers.
 // list some notable extension points, and notable plugins
 // link to plugin development
 Some backend plugins expose the JVM for scripting use (eg. Groovy,
 Scala), so plugins can be written without having to set up a Java
 development environment.
 // Luca to expand: how do script plugins load their scripts?
 === Frontend plugins
 The UI can be extended using Frontend plugins. This is useful for
 changing the look & feel of Gerrit, but it can also be used to surface
 data from systems that aren't integrated with the Gerrit backend, eg.
 CI systems or code coverage providers.
 // FE team to write a bit more:
 // * how to load ?
 // * XSRF, CORS ?
 == Internationalization and Localization
@@ -189,14 +250,11 @@ The majority of Gerrit's users will be writing change descriptions
 and comments in English, and therefore an English user interface
 is usable by the target user base.
 Right-to-left (RTL) support is only barely considered within the
 Gerrit code base.  Some portions of the code have tried to take
 RTL into consideration, while others probably need to be modified
 before translating the UI to an RTL language.
 == Accessibility Considerations
 // UI team to rewrite this.
 Whenever possible Gerrit displays raw text rather than image icons,
 so screen readers should still be able to provide useful information
 to blind persons accessing Gerrit sites.
@@ -215,7 +273,9 @@ provide hints to screen readers.
 == Browser Compatibility
-Supporting non-JavaScript enabled browsers is a non-goal for Gerrit.
+Gerrit requires a JavaScript enabled browser.
 // UI team to add section on minimum browser requirements.
 As Gerrit is a pure JavaScript application on the client side, with
 no server side rendering fallbacks, the browser must support modern
@@ -223,54 +283,19 @@ JavaScript semantics in order to access the Gerrit web application.
 Dumb clients such as `lynx`, `wget`, `curl`, or even many search engine
 spiders are not able to access Gerrit content.
-There are number of web browsers available with full JavaScript
+All of the content stored within Gerrit is also available through
-support, and nearly every operating system (including any PDA-like
+other means, such as gitweb or the `git://` protocol. Any existing
-mobile phone) comes with one standard.  Users who are committed
+search engine crawlers can index the server-side HTML served by a code
-to developing changes for a Gerrit managed project can be expected
+browser, and thus can index the majority of the changes which might
-to be able to run a JavaScript enabled browser, as they also would
+appear in Gerrit. Therefore the lack of support for most search engine
-need to be running Git in order to contribute.
+crawlers is a non-issue for most Gerrit deployments.
 There are a number of open source browsers available, including
 Firefox and Chromium.  Users have some degree of choice in their
 browser selection, including being able to build and audit their
 browser from source.
 The majority of the content stored within Gerrit is also available
 through other means, such as gitweb or the `git://` protocol.
 Any existing search engine spider can crawl the server-side HTML
 produced by gitweb, and thus can index the majority of the changes
 which might appear in Gerrit.  Some engines may even choose to
 crawl the native version control database, such as ohloh.net does.
 Therefore the lack of support for most search engine spiders is a
 non-issue for most Gerrit deployments.
 == Product Integration
-Gerrit integrates with an existing gitweb installation by optionally
+Gerrit optionally surfaces links to HTML pages in a code browser. The
-creating hyperlinks to reference changes on the gitweb server.
+links are configurable, and Gerrit comes with a built-in code browser,
-
+called Gitiles.
 Gerrit integrates with an existing git-daemon installation by
 optionally displaying `git://` URLs for users to download a
 change through the native Git protocol.
 Gerrit integrates with any OpenID provider for user authentication,
 making it easier for users to join a Gerrit site and manage their
 authentication credentials to it.  To make use of Google Accounts
 as an OpenID provider easier, Gerrit has a shorthand "Sign in with
 a Google Account" link on its sign-in screen.  Gerrit also supports
 a shorthand sign in link for Yahoo!.  Other providers may also be
 supported more directly in the future.
 Site administrators may limit the range of OpenID providers to
 a subset of "reliable providers".  Users may continue to use
 any OpenID provider to publish comments, but granted privileges
 are only available to a user if the only entry point to their
 account is through the defined set of "reliable OpenID providers".
 This permits site administrators to require HTTPS for OpenID,
 and to use only large main-stream providers that are trustworthy,
 or to require users to only use a custom OpenID provider installed
 alongside Gerrit Code Review.
 Gerrit integrates with some types of corporate single-sign-on (SSO)
 solutions, typically by having the SSO authentication be performed
@@ -290,16 +315,17 @@ they choose.
 Gerrit does not integrate with any Google service, or any other
 services other than those listed above.
 Plugins (see above) can be used to drive product integrations from the
 Gerrit side. Products that support Gerrit explicitly can use the REST
 API or the SSH API to contact Gerrit.
 == Privacy Considerations
 Gerrit stores the following information per user account:
 * Full Name
 * Preferred Email Address
 * Mailing Address '(Optional, Encrypted)'
 * Country '(Optional, Encrypted)'
 * Phone Number '(Optional, Encrypted)'
 * Fax Number '(Optional, Encrypted)'
 The full name and preferred email address fields are shown to any
 site visitor viewing a page containing a change uploaded by the
@@ -325,271 +351,145 @@ project's mailing list archives.
 The user's name and email address are stored unencrypted in the
 link:config-accounts.html#all-users[All-Users] repository.
 The snail-mail mailing address, country, and phone and fax numbers
 are gathered to help project leads contact the user should there
 be a legal question regarding any change they have uploaded.
 These sensitive fields are immediately encrypted upon receipt with
 a GnuPG public key, and stored "off site" in another data store,
 isolated from the main Gerrit change data.  Gerrit does not have
 access to the matching private key, and as such cannot decrypt the
 information.  Therefore these fields are write-once in Gerrit, as not
 even the account owner can recover the values they previously stored.
 It is expected that the address information would only need to be
 decrypted and revealed with a valid court subpoena, but this is
 really left to the discretion of the Gerrit site administrator as
 to when it is reasonable to reveal this information to a 3rd party.
 == Spam and Abuse Considerations
-Gerrit makes no attempt to detect spam changes or comments.  The
+There is no spam protection for the Git protocol upload path.
-somewhat high barrier to entry makes it unlikely that a spammer
+Uploading a change successfully requires a pre-existing account, and a
-will target Gerrit.
+lot of up-front effort.
-To upload a change, the client must speak the native Git protocol
+Gerrit makes no attempt to detect spam changes or comments in the web
-embedded in SSH, with some custom Gerrit semantics added on top.
+UI. To post and publish a comment a client must sign in and then use
-The client must have their public key already stored in the Gerrit
+the XSRF protected JSON-RPC interface to publish the draft on an
-database, which can only be done through the XSRF protected
+existing change record.
 JSON-RPC interface.  The level of effort required to construct
 the necessary tools to upload a well-formatted change that isn't
 rejected outright by the Git and Gerrit checksum validations is
 too high for a spammer to get any meaningful return.
-To post and publish a comment a client must sign in with an OpenID
+The absence of spam handling is based upon the idea that Gerrit caters to
-provider and then use the XSRF protected JSON-RPC interface to
+a niche audience, and will therefore be unattractive to spammers. In
-publish the draft on an existing change record.  Again, the level of
+addition, it is not a factor for corporate, on-premise deployments.
 effort required to implement the Gerrit specific XSRF protections
 and the JSON-RPC payload format necessary to post a draft and then
 publish that draft is simply too high for a spammer to bother with.
 Both of these assumptions are also based upon the idea that Gerrit
 will be a lot less popular than blog software, and thus will be
 running on a lot fewer websites.  Spammers therefore have very little
 returned benefit for getting over the protocol hurdles.
 These assumptions may need to be revisited in the future if any
 public Gerrit site actually notices spam.
 == Latency
 Gerrit targets sub-250 ms latency per page request, mostly by using
 very compact JSON payloads between client and server.  However, as
 most of the serving stack (network, hardware, metadata
 database) is out of control of the Gerrit developers, no real
 guarantees can be made about latency.
 == Scalability
-Gerrit is designed for a very large scale open source project, or
+Gerrit supports the Git wire protocol, and an API (one API for HTTP,
-large commercial development project.  Roughly this amounts to
+and one for SSH).
 parameters such as the following:
-.Design Parameters
+The git wire protocol does a client/server negotiation to avoid
-[options="header"]
+sending too much data. This negotiation occupies a CPU, so the number
-|======================================================
+of concurrent push/fetch operations should be capped by the number of
-|Parameter        | Default Maximum | Estimated Maximum
+CPUs.
 |Projects         |         1,000   | 10,000
 |Contributors     |         1,000   | 50,000
 |Changes/Day      |           100   |  2,000
 |Revisions/Change |            20   |     20
 |Files/Change     |            50   | 16,000
 |Comments/File    |           100   |    100
 |Reviewers/Change |             8   |      8
 |======================================================
-Out of the box, Gerrit will handle the "Default Maximum". Site
+Clients on slow network connections may be network bound rather than
-administrators may reconfigure their servers by editing gerrit.config
+server side CPU bound, in which case a core may be effectively shared
-to run closer to the estimated maximum if sufficient memory is made
+with another user. Possible core sharing due to network bottlenecks
 available to the JVM and the relevant cache.*.memoryLimit variables
 are increased from their defaults.
 === Discussion
 Very few, if any open source projects have more than a handful of
 Git repositories associated with them.  Since Gerrit treats each
 Git repository as a project, an upper limit of 10,000 projects
 is reasonable.  If a site has more than 1,000 projects, administrators
 should increase
 link:config-gerrit.html#cache.name.memoryLimit[`cache.projects.memoryLimit`]
 to match.
 Almost no open source project has 1,000 contributors over all time,
 let alone on a daily basis.  This default figure of 1,000 was WAG'd by
 looking at PR statements published by cell phone companies picking
 up the Android operating system.  If all of the stated employees in
 those PR statements were working on *only* the open source Android
 repositories, we might reach the 1,000 estimate listed here.  Knowing
 these companies as being very closed-source minded in the past, it
 is very unlikely all of their Android engineers will be working on
 the open source repository, and thus 1,000 is a very high estimate.
 The upper maximum of 50,000 contributors is based on existing
 installations that are already handling quite a bit more than the
 default maximum of 1,000 contributors. Given how the user data is
 stored and indexed, supporting 50,000 contributor accounts (or more)
 is easily possible for a server. If a server has more than 1,000
 *active* contributors,
 link:config-gerrit.html#cache.name.memoryLimit[`cache.accounts.memoryLimit`]
 should be increased by the site administrator, if sufficient RAM
 is available to the host JVM.
 The estimate of 100 changes per day was WAG'd off some estimates
 originally obtained from Android's development history.  Writing a
 good change that will be accepted through a peer-review process
 takes time.  The average engineer may need 4-6 hours per change just
 to write the code and unit tests.  Proper design consideration and
 additional but equally important tasks such as meetings, interviews,
 training, and eating lunch will often pad the engineer's day out
 such that suitable changes are only posted once a day, or once
 every other day.  For reference, the entire Linux kernel has an
 average of only 79 changes/day. If more than 100 changes are active
 per day, site administrators should consider increasing the
 link:config-gerrit.html#cache.name.memoryLimit[`cache.diff.memoryLimit`]
 and `cache.diff_intraline.memoryLimit`.
 On average any given change will need to be modified once to address
 peer review comments before the final revision can be accepted by the
 project.  Executing these revisions also eats into the contributor's
 time, and is another factor limiting the number of changes/day
 accepted by the Gerrit instance.  However, even though this implies
 only 2 revisions/change, many existing Gerrit installations have seen
 20 or more revisions/change, when new contributors are learning the
 project's style and conventions.
 On average, each change will have 2 reviewers, a human and an
 automated test bed system.  Usually this would be the project lead, or
 someone who is familiar with the code being modified.  The time
 required to comment further reduces the time available for writing
 one's own changes.  However, existing Gerrit installations have seen 8
 or more reviewers frequently show up on changes that impact many
 functional areas, and therefore it is reasonable to expect 8 or more
 reviewers to be able to work together on a single change.
 Existing installations have successfully processed change reviews with
 more than 16,000 files per change. However, since 16,000 modified/new
 files is a massive amount of code to review, it is more typical to see
 less than 10 files modified in any single change. Changes larger than
 10 files are typically merges, for example integrating the latest
 version of an upstream library, where the reviewer has little to do
 beyond verifying the project compiles and passes a test suite.
 === CPU Usage - Web UI
 Gerrit's web UI would require on average `4+F+F*C` HTTP requests to
 review a change and post comments.  Here `F` is the number of files
 modified by the change, and `C` is the number of inline/file comments
 left by the reviewer per file.  The constant 4 accounts for the request
 to load the reviewer's dashboard, to load the change detail page,
 to publish the review comments, and to reload the change detail
 page after comments are published.
 This WAG'd estimate boils down to 216,000 HTTP requests per day
 (QPD). Assuming these are evenly distributed over an 8 hour work day
 in a single time zone, we are looking at approximately 7.5 queries
 per second (QPS).
 ----
  QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 +  F +  F * C)
      = 2,000       * 2                * 1                * (4 + 10 + 10 * 4)
      = 216,000
  QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds
      = 7.5
 ----
 Gerrit serves most requests in under 60 ms when using the loopback
 interface and a single processor.  On a single CPU system there is
 sufficient capacity for 16 QPS.  A dual processor system should be
 more than sufficient for a site with the estimated load described above.
 A more realistic estimate of 79 changes per day (from the
 Linux kernel) suggests only 8,532 queries per day, and a much lower
 0.29 QPS when spread out over an 8 hour work day.
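
The arithmetic above can be reproduced with a few lines of Python. The function and parameter names are illustrative only; the inputs (2 revisions, 1 reviewer, 10 files, 4 comments per file, an 8 hour day) are the ones used in the text.

```python
def queries_per_day(changes_per_day, revisions, reviewers, files, comments_per_file):
    """QPD = changes * revisions * reviewers * (4 + F + F * C)."""
    return changes_per_day * revisions * reviewers * (4 + files + files * comments_per_file)


def queries_per_second(qpd, work_hours=8):
    """Spread the daily total evenly over one work day."""
    return qpd / (work_hours * 60 * 60)


estimated_max = queries_per_day(2000, 2, 1, 10, 4)  # 216000
kernel_rate = queries_per_day(79, 2, 1, 10, 4)      # 8532

print(estimated_max, queries_per_second(estimated_max))
print(kernel_rate, queries_per_second(kernel_rate))
```

The first line gives 216,000 requests per day and 7.5 QPS; the second gives 8,532 requests per day and just under 0.3 QPS.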
 === CPU Usage - Git over SSH/HTTP
 A 24 core server is able to handle ~25 concurrent `git fetch`
 operations per second. The issue here is each concurrent operation
 demands one full core, as the computation is almost entirely server
 side CPU bound. 25 concurrent operations is known to be sufficient to
 support hundreds of active developers and 50 automated build servers
 polling for updates and building every change.  (This data was derived
 from an actual installation's performance.)
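
Because each concurrent operation needs roughly one core, a server can be pictured as handing out a fixed number of slots. The Python sketch below shows that idea with a bounded semaphore; `FetchLimiter` and its methods are hypothetical names, and this is not how Gerrit itself schedules Git requests.

```python
import os
import threading


class FetchLimiter:
    """Caps the number of concurrent operations, by default at the CPU count."""

    def __init__(self, max_concurrent=None):
        self.max_concurrent = max_concurrent or os.cpu_count() or 1
        self._slots = threading.BoundedSemaphore(self.max_concurrent)

    def try_start(self):
        """Claim a slot without blocking; return False when all slots are busy."""
        return self._slots.acquire(blocking=False)

    def finish(self):
        """Give the slot back once the operation has completed."""
        self._slots.release()
```

A caller that gets `False` from `try_start()` would queue or reject the request instead of oversubscribing the CPUs.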
 Because of the distributed nature of Git, end-users don't need to
 contact the central Gerrit Code Review server very often. For `git
 fetch` traffic, link:pgm-daemon.html[replica mode] is known to be an
 effective way to offload traffic from the main server, permitting it
 to scale to a large user base without needing an excessive number of
 cores in a single system.
 Clients on very slow network connections (for example home office
 users on VPN over home DSL) may be network bound rather than server
 side CPU bound, in which case a core may be effectively shared with
 another user. Possible core sharing due to network bottlenecks
 generally holds true for network connections running below 10 MiB/sec.
-If the server's own network interface is 1 Gib/sec (Gigabit Ethernet),
+Deployments for large, distributed companies can replicate Git data to
-the system can really only serve about 10 concurrent clients at the
+read-only replicas to offload fetch traffic. The read-only replicas
-10 MiB/sec speed, no matter how many cores it has.
+should also serve this data using Gerrit to ensure that permissions
 are obeyed.
-=== Disk Usage
+The API serves requests of varying costs. Requests that originate in
 the UI can block productivity, so care has been taken to optimize
 these for latency, using the following techniques:
-The average size of a revision in the Linux kernel once compressed by
+* Async calls: the UI becomes responsive before some UI elements
-Git is 2,327 bytes, or roughly 2 KiB.  Over the course of a year a
+  have finished loading
-Gerrit server running with the estimated maximum parameters above might
+
-see an introduction of 1.4 GiB over the total set of 10,000 projects
+* Caching: metadata is stored in Git, which is relatively expensive to
-hosted in that server.  This figure assumes the majority of the content
+  access. This is sped up by multiple caches. Metadata entities are
-is human written source code, and not large binary blobs such as disk
+  stored in Git, and can therefore be seen as immutable values keyed
-images or media files.
+  by SHA1, which is very amenable to caching. All SHA1 keyed caches
  can be persisted on local disk.
  The size (memory, disk) of these caches should be adapted to the
  instance size (number of users, size and quantity of repositories)
  for optimal performance.
 Git does not impose fundamental limits (e.g. number of files per
 change) on data. To ensure stability, Gerrit configures a number of
 default limits for these.
 // add a link to the default settings.
 === Scaling team size
 A team of size N has N^2 possible interactions. As a result, features
 that expose interactions with activities of other team members have a
 quadratic cost in aggregate. The following features scale poorly with
 large team sizes:
 * the change screen shows conflicting changes by default. This data is
  cached, but updates to pending changes cause cache misses. For a
  single change, the amount of work is proportional to the number of
  pending changes, so in aggregate, the cost of this feature is
  quadratic in the team size.
 * the change screen shows if a change is mergeable to the target
  branch. If the target branch moves quickly (large developer team),
  this causes cache misses. In aggregate, the cost of this feature is
  also quadratic.
 Both features should be turned off for repositories that involve
 thousands of developers.
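 A minimal sketch of why the aggregate cost grows quadratically,
 assuming (hypothetically) that every developer has exactly one pending
 change and that each pending change is compared against every other
 one. The function name is illustrative and does not correspond to any
 Gerrit API.

```python
def pairwise_checks(pending_changes: int) -> int:
    """Comparisons needed when each pending change is checked against all others."""
    return pending_changes * (pending_changes - 1)


# Assumption: one pending change per developer.
for team_size in (10, 100, 1000):
    print(team_size, pairwise_checks(team_size))
```

 Under these assumptions, growing the team by a factor of 10 multiplies
 the aggregate work by roughly 100.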
 === Browser performance
 // say something about browser performance tuning.
 === Real life numbers
 Gerrit is designed for very large projects, both open source and
 proprietary commercial projects. For a single Gerrit process, the
 following limits are known to work:
 .Observed maximums
 [options="header"]
 |======================================================
 |Parameter        |         Maximum | Deployment
 |Projects         |         50,000  | gerrithub.io
 |Contributors     |        150,000  | eclipse.org
 |Bytes/repo       |        100G     | Qualcomm internal
 |Changes/repo     |        300k     | Qualcomm internal
 |Revisions/Change |        300      | Qualcomm internal
 |Reviewers/Change |        87       | Qualcomm internal
 |======================================================
 // find some numbers for these stats:
 // |Files/repo       |        ? |
 // |Files/Change     |        ? |
 // |Comments/Change  |        ? |
 // |max QPS/CPU      |        ? |
 Google runs a horizontally scaled deployment. We have seen the
 following per-JVM maximums:
 .Observed maximums (googlesource.com)
 [options="header"]
 |======================================================
 |Parameter        |         Maximum | Deployment
 |Files/repo       |        500,000  | chromium-review
 |Bytes/repo       |         12G     | chromium-review
 |Changes/repo     |          500k   | chromium-review
 |Revisions/Change |          1900   | chromium-review
 |Files/Change     |           10,000| android-review
 |Comments/Change  |           1,200 | chromium-review
 |======================================================
 Production Gerrit installations have been tested, and are known to
 handle Git repositories in the multi-gigabyte range, storing binary
 files ranging in size from a few kilobytes (for example compressed
 icons) to 800+ megabytes (firmware images, large uncompressed original
 artwork files).  Best practices encourage breaking very large binary
 files into their own Git repositories based on access, to prevent
 desktop clients from needing to clone unnecessary materials (for
 example a C developer does not need every 800+ megabyte firmware image
 created by the product's quality assurance team).
 == Redundancy & Reliability
-Gerrit largely assumes that the local filesystem where Git repository
+Gerrit is structured as a single JVM process, reading and writing to a
-data is stored is always available.  Important data written to disk
+single file system. If there are hardware failures in the machine
-is also forced to the platter with an `fsync()` once it has been
+running the JVM, or the storage holding the repositories, there is no
-fully written.  If the local filesystem fails to respond to reads
+recourse; on failure, errors will be returned to the client.
 or becomes corrupt, Gerrit has no provisions to fallback or retry
 and errors will be returned to clients.
-Gerrit largely assumes that the metadata database is online and
+Deployments needing more stringent uptime guarantees can use
-answering both read and write queries.  Query failures immediately
+a replication/multi-master setup, which ensures availability and
-result in the operation aborting and errors being returned to the
+geographical distribution, at the cost of slower write actions.
 client, with no retry or fallback provisions.
-Due to the relatively small scale described above, it is very likely
+// TODO: link.
 that the Git filesystem and metadata database are all housed on the
 same server that is running Gerrit.  If any failure arises in one of
 these components, it is likely to manifest in the others too.  It is
 also likely that the administrator cannot be bothered to deploy a
 cluster of load-balanced server hardware, as the scale and expected
 load do not justify the hardware or management costs.
 Most deployments caring about reliability will set up a warm-spare
 standby system and use a manual fail-over process to switch from the
 failed system to the warm-spare.
 As Git is a distributed version control system, and open source
 projects tend to have contributors from all over the world, most
 contributors will be able to tolerate a Gerrit downtime of several
 hours while the administrator is notified, signs on, and brings the
 warm-spare up.  Pending changes are likely to need at least 24 hours
 of time on the Gerrit site anyway in order to ensure any interested
 parties around the world have had a chance to comment.  This expected
 lag largely allows for some downtime in a disaster scenario.
 === Backups
@@ -603,7 +503,8 @@ Amazon S3 blob storage service.
 == Logging Plan
-Gerrit does not maintain logs on its own.
+Gerrit stores Apache-style HTTPD logs, as well as ERROR/INFO messages
 from the Java logger, under `$site_dir/logs/`.
 Published comments contain a publication date, so users can judge
 when the comment was posted and decide if it was "recent" or not.