Add a document describing Gerrit's high level design
Signed-off-by: Shawn O. Pearce <sop@google.com>
This commit is contained in:
630
Documentation/dev-design.txt
Normal file
630
Documentation/dev-design.txt
Normal file
@@ -0,0 +1,630 @@
|
||||
Gerrit2 - System Design
|
||||
=======================
|
||||
|
||||
Objective
|
||||
---------
|
||||
|
||||
Gerrit is a web based code review system, facilitating online code
|
||||
reviews for projects using the Git version control system.
|
||||
|
||||
Gerrit makes reviews easier by showing changes in a side-by-side
|
||||
display, and allowing inline comments to be added by any reviewer.
|
||||
|
||||
Gerrit simplifies Git based project maintainership by permitting
|
||||
any authorized user to submit changes to the master Git repository,
|
||||
rather than requiring all approved changes to be merged in by
|
||||
hand by the project maintainer. This functionality enables a more
|
||||
centralized usage of Git.
|
||||
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
Google developed Mondrian, a Perforce based code review tool to
|
||||
facilitate peer-review of changes prior to submission to the central
|
||||
code repository. Mondrian is not open source, as it is tied to the
|
||||
use of Perforce and to many Google-only services, such as Bigtable.
|
||||
Google employees have often described how useful Mondrian and its
|
||||
peer-review process is to their day-to-day work.
|
||||
|
||||
Guido van Rossum open sourced portions of Mondrian within Rietveld,
|
||||
a similar code review tool running on Google App Engine, but for
|
||||
use with Subversion rather than Perforce. Rietveld is in common
|
||||
use by many open source projects, facilitating their peer reviews
|
||||
much as Mondrian does for Google employees. Unlike Mondrian and
|
||||
the Google Perforce triggers, Rietveld is strictly advisory and
|
||||
does not enforce peer-review prior to submission.
|
||||
|
||||
Git is a distributed version control system, wherein each repository
|
||||
is assumed to be owned/maintained by a single user. There are no
|
||||
inherit security controls built into Git, so the ability to read
|
||||
from or write to a repository is controlled entirely by the host's
|
||||
filesystem access controls. When multiple maintainers collaborate
|
||||
on a single shared repository a high degree of trust is required,
|
||||
as any collaborator with write access can alter the repository.
|
||||
|
||||
Gitosis provides tools to secure centralized Git repositories,
|
||||
permitting multiple maintainers to manage the same project at once,
|
||||
by restricting the access to only over a secure network protocol,
|
||||
much like Perforce secures a repository by only permitting access
|
||||
over its network port.
|
||||
|
||||
The Android Open Source Project (AOSP) was founded by Google by the
|
||||
open source releasing of the Android operating system. AOSP has
|
||||
selected Git as its primary version control tool. As many of the
|
||||
engineers have a background of working with Mondrian at Google,
|
||||
there is a strong desire to have the same (or better) feature set
|
||||
available for Git and AOSP.
|
||||
|
||||
* link:http://video.google.com/videoplay?docid=-8502904076440714866[Mondrian Code Review On The Web]
|
||||
* link:http://code.google.com/p/rietveld/[Rietveld - Code Review for Subversion]
|
||||
* link:http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD[Gitosis README]
|
||||
* link:http://source.android.com/[Android Open Source Project]
|
||||
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
Developers create one or more changes on their local desktop system,
|
||||
then upload them for review to Gerrit using the standard `git push`
|
||||
command line program, or any GUI which can invoke `git push` on
|
||||
behalf of the user. Authentication and data transfer are handled
|
||||
through SSH. Users are authenticated by username and public/private
|
||||
key pair, and all data transfer is protected by the SSH connection
|
||||
and Git's own data integrity checks.
|
||||
|
||||
Each Git commit created on the client desktop system is converted
|
||||
into a unique change record which can be reviewed independently.
|
||||
Change records are stored in PostgreSQL, where they can be queried to
|
||||
present customized user dashboards, enumerating any pending changes.
|
||||
|
||||
A summary of each newly uploaded change is automatically emailed
|
||||
to reviewers, so they receive a direct hyperlink to review the
|
||||
change on the web. Reviewer email addresses can be specified on the
|
||||
`git push` command line, but typically reviewers are automatically
|
||||
selected by Gerrit by identifying users who have change approval
|
||||
permissions in the project.
|
||||
|
||||
Reviewers use the web interface to read the side-by-side or unified
|
||||
diff of a change, and insert draft inline comments where appropriate.
|
||||
A draft comment is visible only to the reviewer, until they publish
|
||||
those comments. Published comments are automatically emailed to
|
||||
the change author by Gerrit, and are CC'd to all other reviewers
|
||||
who have already commented on the change.
|
||||
|
||||
When publishing comments reviewers are also given the opportunity
|
||||
to score the change, indicating whether they feel the change is
|
||||
ready for inclusion in the project, needs more work, or should be
|
||||
rejected outright. These scores provide direct feedback to Gerrit's
|
||||
change submit function.
|
||||
|
||||
After a change has been scored positively by reviewers, Gerrit
|
||||
enables a submit button on the web interface. Authorized users
|
||||
can push the submit button to have the change enter the project
|
||||
repository. The equivilant in Subversion or Perforce would be
|
||||
that Gerrit is invoking `svn commit` or `p4 submit` on behalf of
|
||||
the web user pressing the button. Due to the way Git audit trails
|
||||
are maintained, the user pressing the submit button does not need
|
||||
to be the author of the change.
|
||||
|
||||
|
||||
Infrastructure
|
||||
--------------
|
||||
|
||||
End-user web browsers make HTTP requests directly to Gerrit's
|
||||
HTTP server. As nearly all of the user interface is implemented
|
||||
through Google Web Toolkit (GWT), the majority of these requests
|
||||
are transmitting compressed JSON payloads, with all HTML being
|
||||
generated within the browser. Most responses are under 1 KB.
|
||||
|
||||
Gerrit's HTTP server side component is implemented as a standard
|
||||
Java servlet, and thus runs within any J2EE servlet container.
|
||||
Popular choices for deployments would be Tomcat or Jetty, as these
|
||||
are high-quality open-source servlet containers that are readily
|
||||
available for download.
|
||||
|
||||
End-user uploads are performed over SSH, so Gerrit's servlets also
|
||||
start up a background thread to receive SSH connections through
|
||||
an independent SSH port. SSH clients communicate directly with
|
||||
this port, bypassing the HTTP server used by browsers.
|
||||
|
||||
Server side data storage for Gerrit is broken down into two different
|
||||
categories:
|
||||
|
||||
* Git repository data
|
||||
* Gerrit metadata
|
||||
|
||||
The Git repository data is the Git object database used to store
|
||||
already submitted revisions, as well as all uploaded (proposed)
|
||||
changes. Gerrit uses the standard Git repository format, and
|
||||
therefore requires direct filesystem access to the repositories.
|
||||
All repository data is stored in the filesystem and accessed through
|
||||
the JGit library. Repository data can be stored on remote servers
|
||||
accessible through NFS or SMB, but the remote directory must
|
||||
be mounted on the Gerrit server as part of the local filesystem
|
||||
namespace. Remote filesystems are likely to perform worse than
|
||||
local ones, due to Git disk IO behavior not being optimized for
|
||||
remote access.
|
||||
|
||||
The Gerrit metadata contains a summary of the available changes,
|
||||
all comments (published and drafts), and individual user account
|
||||
information. The metadata is housed in a PostgreSQL database,
|
||||
which can be located either on the same server as Gerrit, or on
|
||||
a different (but nearby) server. Most installations would opt to
|
||||
install both Gerrit and PostgreSQL on the same server, to reduce
|
||||
administration overheads.
|
||||
|
||||
User authentication is handled by OpenID, and therefore Gerrit
|
||||
requires that the OpenID provider selected by a user must be
|
||||
online and operating in order to authenticate that user.
|
||||
|
||||
* link:http://code.google.com/webtoolkit/[Google Web Toolkit (GWT)]
|
||||
* link:http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html[Git Repository Format]
|
||||
* link:http://www.postgresql.org/about/[About PostgreSQL]
|
||||
* link:http://openid.net/developers/specs/[OpenID Specifications]
|
||||
|
||||
|
||||
Project Information
|
||||
-------------------
|
||||
|
||||
Gerrit is developed as a self-hosting open source project:
|
||||
|
||||
* link:http://code.google.com/p/gerrit/[Project Homepage]
|
||||
* link:http://code.google.com/p/gerrit/downloads/list[Release Versions]
|
||||
* link:http://code.google.com/p/gerrit/wiki/Source?tm=4[Source]
|
||||
* link:http://code.google.com/p/gerrit/wiki/Issues?tm=3[Issue Tracking]
|
||||
* link:http://review.source.android.com/[Change Review]
|
||||
|
||||
|
||||
Internationalization and Localization
|
||||
-------------------------------------
|
||||
|
||||
As a source code review system for open source projects, where the
|
||||
commonly preferred language for communication is typically English,
|
||||
Gerrit does not make internationalization or localization a priority.
|
||||
|
||||
The majority of Gerrit's users will be writing change descriptions
|
||||
and comments in English, and therefore an English user interface
|
||||
is usable by the target user base.
|
||||
|
||||
Gerrit uses GWT's i18n support to externalize all constant strings
|
||||
and messages shown to the user, so that in the future someone who
|
||||
really needed a translated version of the UI could contribute new
|
||||
string files for their locale(s).
|
||||
|
||||
Right-to-left (RTL) support is only barely considered within the
|
||||
Gerrit code base. Some portions of the code have tried to take
|
||||
RTL into consideration, while others probably need to be modified
|
||||
before translating the UI to an RTL language.
|
||||
|
||||
* link:i18n-readme.html[Gerrit's i18n Support]
|
||||
|
||||
|
||||
Accessibility Considerations
|
||||
----------------------------
|
||||
|
||||
Whenever possible Gerrit displays raw text rather than image icons,
|
||||
so screen readers should still be able to provide useful information
|
||||
to blind persons accessing Gerrit sites.
|
||||
|
||||
Standard HTML hyperlinks are used rather than HTML div or span tags
|
||||
with click listeners. This provides two benefits to the end-user.
|
||||
The first benefit is that screen readers are optimized to locating
|
||||
standard hyperlink anchors and presenting them to the end-user as
|
||||
a navigation action. The second benefit is that users can use
|
||||
the 'open in new tab/window' feature of their browser whenever
|
||||
they choose.
|
||||
|
||||
When possible, Gerrit uses the ARIA properties on DOM widgets to
|
||||
provide hints to screen readers.
|
||||
|
||||
|
||||
Browser Compatibility
|
||||
---------------------
|
||||
|
||||
Supporting non-JavaScript enabled browsers is a non-goal for Gerrit.
|
||||
|
||||
As Gerrit is a pure-GWT application with no server side rendering
|
||||
fallbacks, the browser must support modern JavaScript semantics in
|
||||
order to access the Gerrit web application. Dumb clients such as
|
||||
`lynx`, `wget`, `curl`, or even many search engine spiders are not
|
||||
able to access Gerrit content.
|
||||
|
||||
As Google Web Toolkit (GWT) is used to generate the browser
|
||||
specific versions of the client-side JavaScript code, Gerrit works
|
||||
on any JavaScript enabled browser which GWT can produce code for.
|
||||
This covers the majority of the popular browsers.
|
||||
|
||||
The Gerrit project wants to offer offline support via the HTML 5
|
||||
standard and/or Google Gears plugin, both of which would require
|
||||
the UI to be rendered in JavaScript on the client side.
|
||||
|
||||
The Gerrit project does not have the development resources necessary
|
||||
to support two parallel UI implementations (GWT based JavaScript
|
||||
and server-side rendering). Consequently only one is implemented.
|
||||
|
||||
There are number of web browsers available with full JavaScript
|
||||
support, and nearly every operating system (including any PDA-like
|
||||
mobile phone) comes with one standard. Users who are committed
|
||||
to developing changes for a Gerrit managed project can be expected
|
||||
to be able to run a JavaScript enabled browser, as they also would
|
||||
need to be running Git in order to contribute.
|
||||
|
||||
There are a number of open source browsers available, including
|
||||
Firefox and Chromium. Users have some degree of choice in their
|
||||
browser selection, including being able to build and audit their
|
||||
browser from source.
|
||||
|
||||
The majority of the content stored within Gerrit is also available
|
||||
through other means, such as gitweb or the `git://` protocol.
|
||||
Any existing search engine spider can crawl the server-side HTML
|
||||
produced by gitweb, and thus can index the majority of the changes
|
||||
which might appear in Gerrit. Some engines may even choose to
|
||||
crawl the native version control database, such as ohloh.net does.
|
||||
Therefore the lack of support for most search engine spiders is a
|
||||
non-issue for most Gerrit deployments.
|
||||
|
||||
|
||||
Product Integration
|
||||
-------------------
|
||||
|
||||
Gerrit integrates with an existing gitweb installation by optionally
|
||||
creating hyperlinks to reference changes on the gitweb server.
|
||||
|
||||
Gerrit integrates with an existing git-daemon installation by
|
||||
optionally displaying `git://` URLs for users to download a
|
||||
change through the native Git protocol.
|
||||
|
||||
Gerrit integrates with any OpenID provider for user authentication,
|
||||
making it easier for users to join a Gerrit site and manage their
|
||||
authentication credentials to it. To make use of Google Accounts
|
||||
as an OpenID provider easier, Gerrit has a shorthand "Sign in with
|
||||
a Google Account" link on its sign-in screen. Gerrit also supports
|
||||
a shorthand sign in link for Yahoo!. Other providers may also be
|
||||
supported more directly in the future.
|
||||
|
||||
Gerrit integrates with some types of corporate single-sign-on (SSO)
|
||||
solutions, typically by having the SSO authentication be performed
|
||||
in a reverse proxy web server and then blindly trusting that all
|
||||
incoming connections have been authenticated by that reverse proxy.
|
||||
When configured to use this form of authentication, Gerrit does
|
||||
not integrate with OpenID providers.
|
||||
|
||||
When installing Gerrit, administrators may optionally include an
|
||||
HTML header or footer snippet which may include user tracking code,
|
||||
such as that used by Google Analytics. This is a per-instance
|
||||
configuration that must be done by hand, and is not supported
|
||||
out of the box. Other site trackers instead of Google Analytics
|
||||
can be used, as the administrator can supply any HTML/JavaScript
|
||||
they choose.
|
||||
|
||||
Gerrit does not integrate with any Google service, or any other
|
||||
services other than those listed above.
|
||||
|
||||
|
||||
Standards / Developer APIs
|
||||
--------------------------
|
||||
|
||||
Gerrit uses an XSRF protected variant of JSON-RPC 1.1 to communicate
|
||||
between the browser client and the server.
|
||||
|
||||
As the protocol is not the GWT-RPC protocol, but is instead a
|
||||
self-describing standard JSON format it is easily implemented by
|
||||
any 3rd party client application, provided the client has a JSON
|
||||
parser and HTTP client library available.
|
||||
|
||||
As the entire command set necessary for the standard web browser
|
||||
based UI is exposed through JSON-RPC over HTTP, there are no other
|
||||
data feeds or command interfaces to the server.
|
||||
|
||||
Commands requiring user authentication may require the user agent to
|
||||
complete a sign-in cycle through the user's OpenID provider in order
|
||||
to establish the HTTP cookie Gerrit uses to track user identity.
|
||||
Automating this sign-in process for non-web browser agents is
|
||||
outside of the scope of Gerrit, as each OpenID provider uses its own
|
||||
sign-in sequence. Use of OpenID providers which have difficult to
|
||||
automate interfaces may make it impossible for non-browser agents
|
||||
to be used with the JSON-RPC interface.
|
||||
|
||||
* link:http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html[JSON-RPC 1.1]
|
||||
* link:http://android.git.kernel.org/?p=tools/gwtjsonrpc.git;a=blob;f=README;hb=HEAD[XSRF JSON-RPC]
|
||||
|
||||
|
||||
Privacy Considerations
|
||||
----------------------
|
||||
|
||||
Gerrit stores the following information per user account:
|
||||
|
||||
* Full Name
|
||||
* Preferred Email Address
|
||||
* Mailing Address '(Optional)'
|
||||
* Country '(Optional)'
|
||||
* Phone Number '(Optional)'
|
||||
* Fax Number '(Optional)'
|
||||
|
||||
The full name and preferred email address fields are shown to any
|
||||
site visitor viewing a page containing a change uploaded by the
|
||||
account owner, or containing a published comment written by the
|
||||
account owner.
|
||||
|
||||
Showing the full name and preferred email is approximately the same
|
||||
risk as the `From` header of an email posted to a public mailing
|
||||
list that maintains archives, and Gerrit treats these fields in
|
||||
much the same way that a mailing list archive might handle them.
|
||||
Users who don't want to expose this information should either not
|
||||
participate in a Gerrit based online community, or open a new email
|
||||
address dedicated for this use.
|
||||
|
||||
As the Gerrit UI data is only available through XSRF protected
|
||||
JSON-RPC calls, "screen-scraping" for email addresses is difficult,
|
||||
but not impossible. It is unlikely a spammer will go through the
|
||||
effort required to code a custom scraping application necessary
|
||||
to cull email addresses from published Gerrit comments. In most
|
||||
cases these same addresses would be more easily obtained from the
|
||||
project's mailing list archives.
|
||||
|
||||
The snail-mail mailing address, country, and phone and fax numbers
|
||||
are gathered to help project leads contact the user should there
|
||||
be a legal question regarding any change they have uploaded.
|
||||
This data is only visible to the account owner and to the Gerrit
|
||||
site administrator. It is expected that the information would only
|
||||
be revealed with a valid court subpoena, but this is really left
|
||||
to the discretion of the Gerrit site administrator as to when it
|
||||
is reasonable to reveal this information to a 3rd party.
|
||||
|
||||
All user account information is stored unencrypted in the Gerrit
|
||||
metadata store, typically a PostgreSQL database.
|
||||
|
||||
|
||||
Spam and Abuse Considerations
|
||||
-----------------------------
|
||||
|
||||
Gerrit makes no attempt to detect spam changes or comments. The
|
||||
somewhat high barrier to entry makes it unlikely that a spammer
|
||||
will target Gerrit.
|
||||
|
||||
To upload a change, the client must speak the native Git protocol
|
||||
embedded in SSH, with some custom Gerrit semantics added on top.
|
||||
The client must have their public key already stored in the Gerrit
|
||||
database, which can only be done through the XSRF protected
|
||||
JSON-RPC interface. The level of effort required to construct
|
||||
the necessary tools to upload a well-formatted change that isn't
|
||||
rejected outright by the Git and Gerrit checksum validations is
|
||||
too high to for a spammer to get any meaningful return.
|
||||
|
||||
To post and publish a comment a client must sign in with an OpenID
|
||||
provider and then use the XSRF protected JSON-RPC interface to
|
||||
publish the draft on an existing change record. Again, the level of
|
||||
effort required to implement the Gerrit specific XSRF protections
|
||||
and the JSON-RPC payload format necessary to post a draft and then
|
||||
publish that draft is simply too high for a spammer to bother with.
|
||||
|
||||
Both of these assumptions are also based upon the idea that Gerrit
|
||||
will be a lot less popular than blog software, and thus will be
|
||||
running on a lot less websites. Spammers therefore have very little
|
||||
returned benefit for getting over the protocol hurdles.
|
||||
|
||||
These assumptions may need to be revisited in the future if any
|
||||
public Gerrit site actually notices spam.
|
||||
|
||||
|
||||
Latency
|
||||
-------
|
||||
|
||||
Gerrit targets for sub-250 ms per page request, mostly by using
|
||||
very compact JSON payloads bewteen client and server. However, as
|
||||
most of the serving stack (network, hardware, PostgreSQL metadata
|
||||
database) is out of control of the Gerrit developers, no real
|
||||
guarantees can be made about latency.
|
||||
|
||||
|
||||
Scalability
|
||||
-----------
|
||||
|
||||
Gerrit is designed for an open source project. Roughly this
|
||||
amounts to parameters such as the following:
|
||||
|
||||
.Design Parameters
|
||||
[grid="all"]
|
||||
`-----------------'----------------
|
||||
Parameter Estimated Maximum
|
||||
-----------------------------------
|
||||
Projects 500
|
||||
Contributors 2,000
|
||||
Changes/Day 400
|
||||
Revisions/Change 2.0
|
||||
Files/Change 4
|
||||
Comments/File 2
|
||||
Reviewers/Change 1.0
|
||||
-----------------------------------
|
||||
|
||||
CPU Usage
|
||||
~~~~~~~~~
|
||||
|
||||
Very few, if any open source projects have more than a handful of
|
||||
Git repositories associated with them. Since Gerrit treats one
|
||||
Git repository as a project, an assumed limit of 500 projects
|
||||
is reasonable. Only an operating system distribution project
|
||||
would really need to be tracking more than a handful of discrete
|
||||
Git repositories.
|
||||
|
||||
Almost no open source project has 2,000 contributors over all time,
|
||||
let alone on a daily basis. This figure of 2,000 was WAG'd by
|
||||
looking at PR statements published by cell phone companies picking
|
||||
up the Android operating system. If all of the stated employees in
|
||||
those PR statements were working on *only* the open source Android
|
||||
repositories, we might reach the 2,000 estimate listed here. Knowing
|
||||
these companies as being very closed-source minded in the past, it
|
||||
is very unlikely all of their Android engineers will be working on
|
||||
the open source repository, and thus 2,000 is a very high estimate.
|
||||
|
||||
The estimate of 400 changes per day was WAG'd off some estimates
|
||||
originally obtained from Android's development history. Writing a
|
||||
good change that will be accepted through a peer-review process
|
||||
takes time. The average engineer may need 4-6 hours per change just
|
||||
to write the code and unit tests. Proper design consideration and
|
||||
additional but equally important tasks such as meetings, interviews,
|
||||
training, and eating lunch will often pad the engineer's day out
|
||||
such that suitable changes are only posted once a day, or once
|
||||
every other day. For reference, the entire Linux kernel has an
|
||||
average of only 79 changes/day.
|
||||
|
||||
The estimate of 2 revisions/change means that on average any
|
||||
given change will need to be modified once to address peer review
|
||||
comments before the final revision can be accepted by the project.
|
||||
Executing these revisions also eats into the contributor's time,
|
||||
and is another factor limiting the number of changes/day accepted
|
||||
by the Gerrit instance.
|
||||
|
||||
The estimate of 1 reviewer/change means that on average only one
|
||||
person will comment on a change. Usually this would be the project
|
||||
lead, or someone who is familiar with the code being modified.
|
||||
The time required to comment further reduces the time available
|
||||
for writing one's own changes.
|
||||
|
||||
Gerrit's web UI would require on average `4+F+F*C` HTTP requests to
|
||||
review a change and post comments. Here `F` is the number of files
|
||||
modified by the change, and `C` is the number of inline comments left
|
||||
by the reviewer per file. The constant 4 accounts for the request
|
||||
to load the reviewer's dashboard, to load the change detail page,
|
||||
to publish the review comments, and to reload the change detail
|
||||
page after comments are published.
|
||||
|
||||
This WAG'd estimate boils down to <12,800 HTTP requests per day
|
||||
(QPD). Assuming these are evenly distributed over an 8 hour work day
|
||||
in a single time zone, we are looking at approximately 26 queries
|
||||
per second (QPS).
|
||||
|
||||
----
|
||||
QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)
|
||||
= 400 * 2.0 * 1.0 * (4 + 4 + 4 * 2)
|
||||
= 12,800
|
||||
QPS = QPD / 8_Hours / 60_Seconds
|
||||
= 26
|
||||
----
|
||||
|
||||
Gerrit serves most requests in under 60 ms when using the loopback
|
||||
interface and a single processor. On a single CPU system there is
|
||||
sufficient capacity for 16 QPS. A dual processor system should be
|
||||
sufficient for a site with the estimated load described above.
|
||||
|
||||
Given a more realistic estimate of 79 changes per day (from the
|
||||
Linux kernel) suggests only 2,528 queries per day, and a much lower
|
||||
5.2 QPS when spread out over an 8 hour work day.
|
||||
|
||||
Disk Usage
|
||||
~~~~~~~~~~
|
||||
|
||||
The average size of a revision in the Linux kernel once compressed
|
||||
by Git is 2,327 bytes, or roughly 2 KB. Over the course of a year
|
||||
a Gerrit server running with the parameters above might see an
|
||||
introduction of 570 MB over the total set of 500 projects hosted in
|
||||
that server. This figure assumes the majorty of the content is human
|
||||
written source code, and not large binary blobs such as disk images.
|
||||
|
||||
|
||||
Redundancy & Reliability
|
||||
------------------------
|
||||
|
||||
Gerrit largely assumes that the local filesystem where Git repository
|
||||
data is stored is always available. Important data written to disk
|
||||
is also forced to the platter with an `fsync()` once it has been
|
||||
fully written. If the local filesystem fails to respond to reads
|
||||
or becomes corrupt, Gerrit has no provisions to fallback or retry
|
||||
and errors will be returned to clients.
|
||||
|
||||
Gerrit largely assumes that the metadata PostgreSQL database is
|
||||
online and answering both read and write queries. Query failures
|
||||
immediately result in the operation aborting and errors being
|
||||
returned to the client, with no retry or fallback provisions.
|
||||
|
||||
Due to the relatively small scale described above, it is very likely
|
||||
that the Git filesystem and PostgreSQL based metadata database
|
||||
are all housed on the same server that is running Gerrit. If any
|
||||
failure arises in one of these components, it is likely to manifest
|
||||
in the others too. It is also likely that the administrator cannot
|
||||
be bothered to deploy a cluster of load-balanced server hardware,
|
||||
as the scale and expected load does not justify the hardware or
|
||||
management costs.
|
||||
|
||||
Most deployments caring about reliability will setup a warm-spare
|
||||
standby system and use a manual fail-over process to switch from the
|
||||
failed system to the warm-spare.
|
||||
|
||||
As Git is a distributed version control system, and open source
|
||||
projects tend to have contributors from all over the world, most
|
||||
contributors will be able to tolerate a Gerrit down time of several
|
||||
hours while the administrator is notified, signs on, and brings the
|
||||
warm-spare up. Pending changes are likely to need at least 24 hours
|
||||
of time on the Gerrit site anyway in order to ensure any interested
|
||||
parties around the world have had a chance to comment. This expected
|
||||
lag largely allows for some downtime in a disaster scenario.
|
||||
|
||||
Backups
|
||||
~~~~~~~
|
||||
|
||||
PostgreSQL can be configured to save its write-ahead-log (WAL)
|
||||
and ship these logs to other systems, where they are applied to
|
||||
a warm-standby backup in real time. Gerrit instances which care
|
||||
about reduduncy will setup this feature of PostgreSQL to ensure
|
||||
the warm-standby is reasonably current should the master go offline.
|
||||
|
||||
Gerrit can be configured to replicate changes made to the local
|
||||
Git repositories over any standard Git transports. This can be
|
||||
configured in `'site_path'/replication.conf` to send copies of
|
||||
all changes over SSH to other servers, or to the Amazon S3 blob
|
||||
storage service.
|
||||
|
||||
|
||||
Logging Plan
|
||||
------------
|
||||
|
||||
Gerrit does not maintain logs on its own.
|
||||
|
||||
Published comments contain a publication date, so users can judge
|
||||
when the comment was posted and decide if it was "recent" or not.
|
||||
Only the timestamp is stored in the database, the IP address of
|
||||
the comment author is not stored.
|
||||
|
||||
Changes uploaded over the SSH daemon from `git push` have the
|
||||
standard Git reflog updated with the date and time that the upload
|
||||
occurred, and the Gerrit account identity of who did the upload.
|
||||
Changes submitted and merged into a branch also update the
|
||||
Git reflog. These logs are available only to the Gerrit site
|
||||
administrator, and they are not replicated through the automatic
|
||||
replication noted earlier. These logs are primarly recorded for an
|
||||
"oh s**t" moment where the administrator has to rewind data. In most
|
||||
installations they are a waste of disk space. Future versions of
|
||||
JGit may allow disabling these logs, and Gerrit may take advantage
|
||||
of that feature to stop writing these logs.
|
||||
|
||||
A web server positioned in front of Gerrit (such as a reverse proxy)
|
||||
or the hosting servlet container may record access logs, and these
|
||||
logs may be mined for usage information. This is outside of the
|
||||
scope of Gerrit.
|
||||
|
||||
|
||||
Testing Plan
|
||||
------------
|
||||
|
||||
Gerrit is currently manually tested through its web UI.
|
||||
|
||||
JGit has a fairly extensive automated unit test suite. Most new
|
||||
changes to JGit are rejected unless corresponding automated unit
|
||||
tests are included.
|
||||
|
||||
|
||||
Caveats
|
||||
-------
|
||||
|
||||
Reitveld can't be used as it does not provide the "submit over the
|
||||
web" feature that Gerrit provides for Git.
|
||||
|
||||
Gitosis can't be used as it does not provide any code review
|
||||
features, but it does provide basic access controls.
|
||||
|
||||
Email based code review does not scale to a project as large and
|
||||
complex as Android. Most contributors at least need some sort of
|
||||
dashboard to keep track of any pending reviews, and some way to
|
||||
correlate updated revisions back to the comments written on prior
|
||||
revisions of the same logical change.
|
||||
@@ -33,6 +33,7 @@ Developer Documentation
|
||||
|
||||
* link:dev-readme.html[Developer Setup]
|
||||
* link:dev-eclipse.html[Eclipse Setup]
|
||||
* link:dev-design.html[System Design]
|
||||
* link:i18n-readme.html[i18n Support]
|
||||
|
||||
Gerrit resources:
|
||||
|
||||
Reference in New Issue
Block a user