92726d0df5
When using -w option together with the -x file option, the data exported in the CSV file are aggregated by weeks instead of months (the default). This is useful to extract meaningful stats on short periods.
165 lines
5.4 KiB
Plaintext
165 lines
5.4 KiB
Plaintext
The code in this directory makes up the "git data miner," a simple hack
|
|
which attempts to figure things out from the revision history in a git
|
|
repository.
|
|
|
|
|
|
INSTALLING GITDM
|
|
|
|
gitdm is a python script and doesn't need to be proper installed like other
|
|
normal programs. You just have to adjust your PATH variable, pointing it to
|
|
the directory of gitdm or alternatively create a symbolic link of the script
|
|
inside /usr/bin.
|
|
|
|
Before actually run gitdm you may want also to update the configuration file
|
|
(gitdm.config) with the needed information.
|
|
|
|
|
|
RUNNING GITDM
|
|
|
|
Run it like this:
|
|
|
|
git log -p -M [details] | gitdm [options]
|
|
|
|
The [details] tell git which changesets are of interest; the [options] can
|
|
be:
|
|
|
|
-a If a patch contains signoff lines from both Andrew Morton
|
|
and Linus Torvalds, omit Linus's.
|
|
|
|
-b dir Specify the base directory to fetch the configuration files.
|
|
|
|
-c file Specify the name of the gitdm configuration file.
|
|
By default, "./gitdm.config" is used.
|
|
|
|
-d Omit the developer reports, giving employer information
|
|
only.
|
|
|
|
-D Rather than create the usual statistics, create a
|
|
file (datelc) providing lines changed per day, where the first column
|
|
displays the changes happened only on that day and the second sums
|
|
the day it happnened with the previous ones. This option is suitable
|
|
for feeding to a tool like gnuplot.
|
|
|
|
-h file Generate HTML output to the given file
|
|
|
|
-l num Only list the top <num> entries in each report.
|
|
|
|
-o file Write text output to the given file (default is stdout).
|
|
|
|
-r pat Only generate statistics for changes to files whose
|
|
name matches the given regular expression.
|
|
|
|
-s Ignore Signed-off-by lines which match the author of
|
|
each patch.
|
|
|
|
-u Group all unknown developers under the "(Unknown)"
|
|
employer.
|
|
|
|
-x file Export raw statistics as CSV.
|
|
|
|
-w Aggregate the data by weeks instead of months in the
|
|
CSV file when -x is used.
|
|
|
|
-z Dump out the hacker database to "database.dump".
|
|
|
|
A typical command line used to generate the "who write 2.6.x" LWN articles
|
|
looks like:
|
|
|
|
git log -p -M v2.6.19..v2.6.20 | \
|
|
gitdm -u -s -a -o results -h results.html
|
|
|
|
|
|
CONFIGURATION FILE
|
|
|
|
The main purpose of the configuration file is to direct the mapping of
|
|
email addresses onto employers. Please note that the config file parser is
|
|
exceptionally stupid and unrobust at this point, but it gets the job done.
|
|
|
|
Blank lines and lines beginning with "#" are ignored. Everything else
|
|
specifies a file with some sort of mapping:
|
|
|
|
EmailAliases file
|
|
|
|
Developers often post code under a number of different email
|
|
addresses, but it can be desirable to group them all together in
|
|
the statistics. An EmailAliases file just contains a bunch of
|
|
lines of the form:
|
|
|
|
alias@address canonical@address
|
|
|
|
Any patches originating from alias@address will be treated as if
|
|
they had come from canonical@address.
|
|
|
|
It may happen that some people set their git user data in the
|
|
following form: "joe.hacker@acme.org <Joe Hacker>". The
|
|
"Joe Hacker" is then considered as the email... but gitdm says
|
|
it is a "Funky" email. An alias line in the following form can
|
|
be used to alias these commits aliased to the correct email
|
|
address:
|
|
|
|
"Joe Hacker" joe.hacker@acme.org
|
|
|
|
|
|
EmailMap file
|
|
|
|
Map email addresses onto employers. These files contain lines
|
|
like:
|
|
|
|
[user@]domain employer [< yyyy-mm-dd]
|
|
|
|
If the "user@" portion is missing, all email from the given domain
|
|
will be treated as being associated with the given employer. If a
|
|
date is provided, the entry is only valid up to that date;
|
|
otherwise it is considered valid into the indefinite future. This
|
|
feature can be useful for properly tracking developers' work when
|
|
they change employers but do not change email addresses.
|
|
|
|
|
|
GroupMap file employer
|
|
|
|
This is a variant of EmailMap provided for convenience; it contains
|
|
email addresses only, all of which are associated with the given
|
|
employer.
|
|
|
|
OTHER TOOLS
|
|
|
|
A few other tools have been added to this repository:
|
|
|
|
treeplot
|
|
Reads a set of commits, then generates a graphviz file charting the
|
|
flow of patches into the mainline. Needs to be smarter, but, then,
|
|
so does everything else in this directory.
|
|
|
|
findoldfiles
|
|
Simple brute-force crawler which outputs the names of any files
|
|
which have not been touched since the original (kernel) commit.
|
|
|
|
committags
|
|
I needed to be able to quickly associate a given commit with the
|
|
major release which contains it. First attempt used
|
|
"git tags --contains="; after it ran for a solid week, I concluded
|
|
there must be a better way. This tool just reads through the repo,
|
|
remembering tags, and creating a Python dictionary containing the
|
|
association. The result is an ugly 10mb pickle file, but, even so,
|
|
it's still a better way.
|
|
|
|
linetags
|
|
Crawls through a directory hierarchy, counting how many lines of
|
|
code are associated with each major release. Needs the pickle file
|
|
from committags to get the job done.
|
|
|
|
|
|
NOTES AND CREDITS
|
|
|
|
Gitdm was written by Jonathan Corbet; many useful contributions have come
|
|
from Greg Kroah-Hartman.
|
|
|
|
Please note that this tool is provided in the hope that it will be useful,
|
|
but it is not put forward as an example of excellence in design or
|
|
implementation. Hacking on gitdm tends to stop the moment it performs
|
|
whatever task is required of it at the moment. Patches to make it less
|
|
hacky, less ugly, and more robust are welcome.
|
|
|
|
Jonathan Corbet
|
|
corbet@lwn.net
|