Statistical reports for Mercurial repos
Dmitry Dzhus
dima at sphinx.net.ru
Mon Apr 5 21:54:32 UTC 2010
Currently there are several solutions for generating various reports
for Hg repos:
- activity extension (http://labs.freehackers.org/projects/hgactivity):
generates an activity chart for a repository by convolving the number
of commits or changed lines; uses matplotlib for output.
- churn extension: generates reports of changesets or changed lines per
user, optionally grouped by date; uses text output.
- chart extension (http://www.bitbucket.org/mg/hgchart/): generates
graphs of changes per user, line count, function count and file size,
plotted against revision count; flexibly configured via a YAML file;
uses matplotlib for output.
- chart extension
(http://www.bitbucket.org/Ry4an/hg-chart-extension/wiki/Home):
produces graphs of commit or changed-line rates grouped by date; uses
the Google Charts API (pygooglechart) for output.
Some time ago I wrote another one, in which I tried to combine several
existing solutions into one flexible tool; the code is available at
http://sphinx.net.ru/hg/hgstats/. We process a stream of repository
changesets through a combination of different filters, piping them one
after another. I've written several filters which produce various
meaningful reports, and reports for different repos can be combined
onto one graph. Some usage examples with results are available at
http://sphinx.net.ru/stats-test/. User docs are scarce, but most of the
stuff is documented in the code.
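To illustrate the filter-piping idea, here is a minimal sketch using Python generators. The Changeset record and the filter names (by_user, changed_lines, sum_per_day, pipe, format_gnuplot) are hypothetical and are not hgstats' actual API; real changesets would be read from the repository rather than from a hard-coded list.

```python
import sys
from collections import namedtuple
from datetime import date
from itertools import groupby

# Hypothetical changeset record; hgstats' real data model may differ.
Changeset = namedtuple("Changeset", "rev user day added removed")


def by_user(changesets, user):
    """Filter: keep only changesets committed by `user`."""
    return (cs for cs in changesets if cs.user == user)


def changed_lines(changesets):
    """Filter: map each changeset to a (day, changed lines) pair."""
    return ((cs.day, cs.added + cs.removed) for cs in changesets)


def sum_per_day(pairs):
    """Filter: sum values per day; assumes the input is ordered by day."""
    for day, group in groupby(pairs, key=lambda pair: pair[0]):
        yield day, sum(value for _, value in group)


def pipe(source, *filters):
    """Chain filters lazily: each one consumes the previous one's output."""
    stream = source
    for f in filters:
        stream = f(stream)
    return stream


def format_gnuplot(rows):
    """Yield whitespace-separated "day value" lines for gnuplot."""
    for day, value in rows:
        yield f"{day.isoformat()} {value}\n"


if __name__ == "__main__":
    log = [
        Changeset(0, "alice", date(2010, 4, 1), 10, 2),
        Changeset(1, "bob", date(2010, 4, 1), 5, 0),
        Changeset(2, "alice", date(2010, 4, 2), 3, 3),
    ]
    rows = pipe(log, lambda s: by_user(s, "alice"), changed_lines, sum_per_day)
    # Lines are written one at a time; the full report is never held in memory.
    sys.stdout.writelines(format_gnuplot(rows))
```

Because every stage is a generator, adding a new report is just a matter of inserting another filter into the chain. Output in this form can be plotted in gnuplot after `set xdata time` and `set timefmt "%Y-%m-%d"`.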
Currently the only frontend is hgstats.py, a command-line tool which
supports both plain text output (usable with gnuplot) and Google
Charts.
Planned features:
- a web control panel built into hgwebdir, providing an interface to
filters and their settings;
- new filters for splitting/filtering by committers, skipping commits
before/after given dates, and running external programs for every
revision. More code metrics may be added (e.g. "bugs introduced within
this period");
- to process large repositories efficiently, it should be possible to
incrementally update repository stats stored in a database (this
should be easy once we have a min/max revision switch), so that the
data can be quickly visualized or processed further. Currently, when
the text output backend is used, the full list of stats data is never
built in memory; it is written to stdout incrementally, so memory use
stays low even on large repositories;
- for the web version, a new output backend based on Canvas with richer
features. This could also serve as a visualization tool for data
stored in a database.
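The incremental-update idea from the list above could look roughly like the following sketch, which uses Python's built-in sqlite3 module. The table layout, the Changeset record and the read_changesets(min_rev) callable (standing in for the planned min/max revision switch) are all assumptions for illustration, not part of hgstats.

```python
import sqlite3
from collections import namedtuple

# Hypothetical per-revision stats record; the real schema is undecided.
Changeset = namedtuple("Changeset", "rev day lines")


def open_store(path=":memory:"):
    """Open (or create) the stats database; defaults to an in-memory one."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats "
        "(rev INTEGER PRIMARY KEY, day TEXT, lines INTEGER)"
    )
    return conn


def last_rev(conn):
    """Return the highest revision already stored, or -1 if none."""
    (rev,) = conn.execute("SELECT COALESCE(MAX(rev), -1) FROM stats").fetchone()
    return rev


def update(conn, read_changesets):
    """Store stats only for revisions newer than the last stored one.

    `read_changesets(min_rev)` is a hypothetical source that honours a
    min-revision switch, so old history is never re-read.
    """
    start = last_rev(conn) + 1
    rows = [(cs.rev, cs.day, cs.lines) for cs in read_changesets(start)]
    with conn:  # commits the inserts as one transaction
        conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    history = [Changeset(0, "2010-04-01", 12), Changeset(1, "2010-04-01", 5)]
    source = lambda min_rev: (cs for cs in history if cs.rev >= min_rev)
    conn = open_store()
    print(update(conn, source))  # first run stores revs 0 and 1
    history.append(Changeset(2, "2010-04-02", 6))
    print(update(conn, source))  # second run stores only rev 2
    # Per-day totals come straight from the database, without rescanning history.
    print(conn.execute(
        "SELECT day, SUM(lines) FROM stats GROUP BY day ORDER BY day"
    ).fetchall())
```

Keying the table on the revision number makes "what is new since last time" a single MAX query, and a Canvas-based web backend could then read the aggregated rows directly.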
Is anyone interested in mentoring such a project for the upcoming GSoC?
Any feedback on my code and this project is very welcome.
--
Happy Hacking.
http://sphinx.net.ru
む
More information about the Mercurial mailing list