A fork of Jonathan Corbet's gitdm for OpenStack


The code in this directory makes up the "git data miner," a simple hack
which attempts to figure things out from the revision history in a git
repository.


gitdm is a Python script and doesn't need to be properly installed like
other programs. You just have to adjust your PATH variable to point to
the gitdm directory or, alternatively, create a symbolic link to the
script inside /usr/bin.

Before actually running gitdm, you may also want to update the
configuration file (gitdm.config) with the needed information.


Run it like this:

git log -p -M [details] | gitdm [options]

Alternatively, you can run with:

git log --numstat -M [details] | gitdm -n [options]

The [details] tell git which changesets are of interest; the [options] can
be:

-a If a patch contains signoff lines from both Andrew Morton
and Linus Torvalds, omit Linus's.

-b dir Specify the base directory to fetch the configuration files.

-c file Specify the name of the gitdm configuration file.
By default, "./gitdm.config" is used.

-d Omit the developer reports, giving employer information only.

-D Rather than create the usual statistics, create a file (datelc.csv)
providing lines changed per day, where the first column gives the
lines changed on that day alone and the second gives the running
total up to and including that day. This option is suitable for
feeding to a tool like gnuplot.

-h file Generate HTML output to the given file

-l num Only list the top <num> entries in each report.

-n Use --numstat instead of generated patches to get the statistics.

-o file Write text output to the given file (default is stdout).

-p prefix Dump out the database categorized by changeset and by file type.
It requires -n, otherwise it is not possible to get separated results.

-r pat Only generate statistics for changes to files whose
name matches the given regular expression.

-s Ignore Signed-off-by lines which match the author of
each patch.

-t Generate a report by type of contribution (code, documentation, etc.).
It requires -n, otherwise this option is ignored silently.

-u Group all unknown developers under the "(Unknown)" employer.

-x file Export raw statistics as CSV.

-w Aggregate the data by weeks instead of months in the
CSV file when -x is used.

-z Dump out the hacker database to "database.dump".
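
To illustrate the two columns described for -D, the following minimal
sketch derives a per-day count and a running total from a list of
(date, lines changed) pairs. It is not gitdm's own code: the function name
and the row layout are assumptions made for illustration, and the exact
layout of datelc.csv should be checked against a real run.

```python
from itertools import accumulate

def daily_and_cumulative(per_day):
    """Return (date, changed_that_day, running_total) rows.

    per_day is a list of (date_string, lines_changed) pairs. The name and
    the row layout are hypothetical, not taken from gitdm.
    """
    ordered = sorted(per_day)  # ISO dates sort chronologically as strings
    totals = accumulate(changed for _, changed in ordered)
    return [(day, changed, total)
            for (day, changed), total in zip(ordered, totals)]

rows = daily_and_cumulative([("2007-01-02", 40), ("2007-01-01", 100),
                             ("2007-01-03", 5)])
for row in rows:
    print("%s,%d,%d" % row)
```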

A typical command line used to generate the "who wrote 2.6.x" LWN articles
looks like:

git log -p -M v2.6.19..v2.6.20 | \
gitdm -u -s -a -o results -h results.html

or:

git log --numstat -M v2.6.19..v2.6.20 | \
gitdm -u -s -a -n -o results -h results.html
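
The same pipeline can be driven from a script. The sketch below builds the
two command lines and connects them with a pipe; the function names are
hypothetical, and it assumes that both git and gitdm are on PATH and that
the current directory is a git repository. Only flags documented above
are used.

```python
import subprocess

def build_pipeline(rev_range, numstat=False):
    """Build the argument lists for `git log ... | gitdm ...`."""
    git_cmd = ["git", "log", "--numstat" if numstat else "-p", "-M", rev_range]
    gitdm_cmd = ["gitdm", "-u", "-s", "-a"]
    if numstat:
        gitdm_cmd.append("-n")  # tell gitdm to expect --numstat input
    gitdm_cmd += ["-o", "results", "-h", "results.html"]
    return git_cmd, gitdm_cmd

def run_pipeline(rev_range, numstat=False):
    """Pipe git log into gitdm and return gitdm's exit status."""
    git_cmd, gitdm_cmd = build_pipeline(rev_range, numstat)
    git = subprocess.Popen(git_cmd, stdout=subprocess.PIPE)
    gitdm = subprocess.Popen(gitdm_cmd, stdin=git.stdout)
    git.stdout.close()  # so git is signalled if gitdm exits early
    gitdm.wait()
    git.wait()
    return gitdm.returncode

# Show the commands without running them.
print(build_pipeline("v2.6.19..v2.6.20", numstat=True))
```

Calling run_pipeline("v2.6.19..v2.6.20", numstat=True) inside a kernel
tree would then correspond to the second command line above.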


The main purpose of the configuration file is to direct the mapping of
email addresses onto employers. Please note that the config file parser is
exceptionally stupid and unrobust at this point, but it gets the job done.

Blank lines and lines beginning with "#" are ignored. Everything else
specifies a file with some sort of mapping:

EmailAliases file

Developers often post code under a number of different email
addresses, but it can be desirable to group them all together in
the statistics. An EmailAliases file just contains a bunch of
lines of the form:

alias@address canonical@address

Any patches originating from alias@address will be treated as if
they had come from canonical@address.
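
As a rough illustration of how such a file can be read, here is a minimal
sketch of an alias lookup. It is not gitdm's parser: the function names
and the sample addresses are hypothetical. shlex is used so that a quoted
alias containing spaces stays a single field; as a known limitation, a
stray apostrophe in a line would make shlex raise an error.

```python
import shlex

def parse_alias_lines(lines):
    """Map each alias (an address or a quoted name) to its canonical address."""
    aliases = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        fields = shlex.split(line)  # keeps a quoted name as one field
        if len(fields) != 2:
            raise ValueError("expected 'alias canonical': %r" % line)
        alias, canonical = fields
        aliases[alias] = canonical
    return aliases

def canonical_address(aliases, address):
    """Return the canonical address, or the input if no alias applies."""
    return aliases.get(address, address)

sample = [
    "# hypothetical alias file contents",
    "joe@home.example joe.hacker@acme.org",
    '"Joe Hacker" joe.hacker@acme.org',
]
aliases = parse_alias_lines(sample)
print(canonical_address(aliases, "joe@home.example"))
```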

It may happen that some people set their git user data in the
following form: "joe.hacker@acme.org <Joe Hacker>". "Joe Hacker" is
then taken to be the email address, which gitdm reports as a
"Funky" email. An alias line of the following form can be used to
map these commits to the correct email address:

"Joe Hacker" joe.hacker@acme.org

EmailMap file

Map email addresses onto employers. These files contain lines
like:

[user@]domain employer [< yyyy-mm-dd]

If the "user@" portion is missing, all email from the given domain
will be treated as being associated with the given employer. If a
date is provided, the entry is only valid up to that date;
otherwise it is considered valid into the indefinite future. This
feature can be useful for properly tracking developers' work when
they change employers but do not change email addresses.
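
The lookup rules above can be sketched as follows. This is an
illustration, not gitdm's implementation: the function names and sample
employers are hypothetical, a full-address entry is assumed to win over a
domain-wide one, domains are compared exactly, and the end date is treated
as exclusive.

```python
from datetime import date

def parse_email_map(lines):
    """Parse '[user@]domain employer [< yyyy-mm-dd]' lines into tuples."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        end = None
        if "<" in line:
            line, _, stamp = line.partition("<")
            end = date.fromisoformat(stamp.strip())
        address, employer = line.split(None, 1)
        entries.append((address.lower(), employer.strip(), end))
    return entries

def employer_for(entries, email, when):
    """Return the employer for `email` on date `when`, or None."""
    email = email.lower()
    domain = email.split("@", 1)[-1]
    # Assumption: try the full address first, then the bare domain.
    for key in (email, domain):
        matches = [(end, emp) for addr, emp, end in entries
                   if addr == key and (end is None or when < end)]
        if matches:
            # Dated entries come first (earliest end date), open-ended last.
            matches.sort(key=lambda m: (m[0] is None, m[0] or date.max))
            return matches[0][1]
    return None

entries = parse_email_map([
    "acme.org Acme Corp",
    "joe.hacker@acme.org Widgets Inc < 2008-06-01",
    "joe.hacker@acme.org Gadgets LLC",
])
print(employer_for(entries, "joe.hacker@acme.org", date(2008, 1, 15)))
print(employer_for(entries, "joe.hacker@acme.org", date(2009, 1, 1)))
```

With this sample data, the same address maps to one employer before
2008-06-01 and to another afterwards, which is the employer-change case
the date field is meant to cover.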

GroupMap file employer

This is a variant of EmailMap provided for convenience; it contains
email addresses only, all of which are associated with the given
employer.

VirtualEmployer name
nn% employer1
...
end

This construct (which appears in the main configuration file)
causes the creation of a fake employer with the given
"name". It directs that any contributions attributed to that
employer should be split to other (real) employers using the given
percentages. The functionality works, but is primitive - there is,
for example, no check to ensure that the percentages add up to
something rational.
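
A minimal sketch of the split, with the sanity check that the text above
says is missing. The function name and the employer names are
hypothetical, and gitdm's own arithmetic may differ (for example in
rounding).

```python
def split_contribution(lines_changed, shares):
    """Split a virtual employer's line count among real employers.

    shares maps employer name -> integer percentage. Unlike the behaviour
    described above, this sketch rejects totals other than 100.
    """
    total = sum(shares.values())
    if total != 100:
        raise ValueError("percentages add up to %d, not 100" % total)
    return {name: lines_changed * pct / 100.0
            for name, pct in shares.items()}

result = split_contribution(200, {"Acme Corp": 75, "Widgets Inc": 25})
print(result)
```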

FileTypeMap file

Map file names/extensions onto file types. These files contain lines
like:

order <type1>,<type2>,...,<typeN>

filetype <type> <regex>

This construct allows fine-grained reports by type of contribution
(build, code, image, multimedia, documentation, etc.)

Order is important because a filename can match more than one type.
For instance, ltmain.sh fits better as 'build' than as 'code'
(matching on the filename rather than on '\.sh$'). The first element
in order takes precedence over the later ones.
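
The precedence rule can be sketched like this. The type names follow the
examples above, but the patterns, the function name, and the "unknown"
fallback label are assumptions for illustration rather than gitdm's
shipped configuration or code.

```python
import re

def classify(filename, order, patterns):
    """Return the first type in `order` with a regex matching filename."""
    for filetype in order:
        for regex in patterns.get(filetype, []):
            if re.search(regex, filename):
                return filetype
    return "unknown"  # hypothetical fallback label

# 'build' is listed before 'code', so it is tried first.
order = ["build", "code", "documentation"]
patterns = {
    "code": [r"\.sh$", r"\.py$", r"\.c$"],
    "build": [r"ltmain\.sh$", r"Makefile$"],
    "documentation": [r"README$", r"\.txt$"],
}

print(classify("ltmain.sh", order, patterns))  # build
print(classify("do-it.sh", order, patterns))   # code
```

If 'code' were listed before 'build' in order, ltmain.sh would be caught
by '\.sh$' and counted as code.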


A few other tools have been added to this repository:

treeplot

Reads a set of commits, then generates a graphviz file charting the
flow of patches into the mainline. Needs to be smarter, but, then,
so does everything else in this directory.

findoldfiles

Simple brute-force crawler which outputs the names of any files
which have not been touched since the original (kernel) commit.

committags

I needed to be able to quickly associate a given commit with the
major release which contains it. First attempt used
"git tags --contains="; after it ran for a solid week, I concluded
there must be a better way. This tool just reads through the repo,
remembering tags, and creating a Python dictionary containing the
association. The result is an ugly 10MB pickle file, but, even so,
it's still a better way.

linetags

Crawls through a directory hierarchy, counting how many lines of
code are associated with each major release. Needs the pickle file
from committags to get the job done.


Gitdm was written by Jonathan Corbet; many useful contributions have come
from Greg Kroah-Hartman.

Please note that this tool is provided in the hope that it will be useful,
but it is not put forward as an example of excellence in design or
implementation. Hacking on gitdm tends to stop the moment it performs
whatever task is required of it at the moment. Patches to make it less
hacky, less ugly, and more robust are welcome.

Jonathan Corbet