[VOTE] git versus mercurial (for DragonflyBSD)
Jakub Narebski
jnareb at gmail.com
Mon Oct 27 01:52:22 UTC 2008
On Mon, 27 Oct 2008, Arne Babenhauserheide wrote:
> On Sunday, 26 October 2008 19:55:09, Jakub Narebski wrote:
> >
> > I agree, and I think it is at least partially because of Git having
> > a cleaner design, even if you have to understand more terms at first.
>
> What do you mean by "cleaner design"?
A clean _underlying_ design. Git has a very nice underlying model: a
graph (DAG) of commits (revisions), with branches and tags as pointers
into this graph.
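
As a minimal sketch of this model (Python, with made-up commit ids and
ref names; this is an illustration, not Git's actual storage format),
the graph and the pointers into it could be represented as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A node in the history graph; `parents` are ids of earlier commits."""
    id: str
    parents: tuple = ()


# Hypothetical four-commit history with one merge:
#   a <- b <- d   (d merges b and c)
#   a <- c
commits = {
    "a": Commit("a"),
    "b": Commit("b", ("a",)),
    "c": Commit("c", ("a",)),
    "d": Commit("d", ("b", "c")),
}

# Branches and tags are just names pointing into the graph.
refs = {
    "refs/heads/master": "d",
    "refs/heads/topic": "c",
    "refs/tags/v1.0": "b",
}


def ancestors(commit_id):
    """Return every commit reachable from commit_id, itself included."""
    seen = set()
    stack = [commit_id]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid].parents)
    return seen


def history(ref_name):
    """The history of a branch or tag is whatever its pointer can reach."""
    return ancestors(refs[ref_name])
```

Creating a branch in this picture means adding one entry to `refs`;
the graph itself is not touched.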
> From what I see (and in my definition of "design"), Mercurial is designed as
> a VCS with a very clear and clean design, which even keeps things like
> streaming disk access in mind.
I have read the description of Mercurial's repository format, and in my
opinion it is not very clear. File changesets, bound together by a
manifest, bound in turn by the changerev / changelog.
Mercurial relies on transactions and O_TRUNC support, while Git relies
on atomic writes and on updating the data first, then updating the
reference to that data.
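
The "write the data, then update the reference" ordering can be
sketched roughly as below. This is a simplified illustration in Python,
not Git's real on-disk layout (Git compresses objects, adds a header
and shards the object directory); the helper names `atomic_write`,
`store_object`, `update_ref` and `read_ref` are made up for this
example.

```python
import hashlib
import os
import tempfile


def atomic_write(path, data):
    """Write to a temp file in the target directory, then rename into place."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the complete new one.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def store_object(repo, data):
    """Store immutable data under its SHA-1 and return that id."""
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(repo, "objects", oid)
    if not os.path.exists(path):  # objects never change once written
        atomic_write(path, data)
    return oid


def update_ref(repo, name, oid):
    """Point a reference at an object that has already been stored."""
    atomic_write(os.path.join(repo, name), (oid + "\n").encode("ascii"))


def read_ref(repo, name):
    with open(os.path.join(repo, name), "rb") as f:
        return f.read().decode("ascii").strip()
```

If the process dies between `store_object` and `update_ref`, the
reference still points at the old data and the new object is simply
unreferenced, which is the kind of leftover that garbage collection
later removes.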
I don't quite understand the comment about streaming disk access...
> Also, looking at git, git users still have to garbage collect regularly, which
> shows to me that the design wasn't really cleaner.
Well, they have to do so a lot less than they used to, and there is
"git gc --auto", which can safely be put in crontab.
Explicit garbage collection was a design _decision_, not a sign of
unclear design. We can argue whether it was a good or a bad decision,
but one should consider the following issues:
* Rolling back the last commit to correct it, or equivalently amending
  the last commit (for example because we forgot some last-minute
  change, or forgot to sign off a commit), or backing out of changes to
  the last commit: in Mercurial this relies on transactions (and
  locking) and a correct O_TRUNC, while in Git it leaves dangling
  objects to be garbage collected later.
* Mercurial relies on transaction support. Git relies on atomic write
  support and on the fact that objects are immutable; those that are
  no longer needed are garbage collected later. Besides, IIRC some ways
  of implementing transactions in databases also lead to garbage
  collection.
* Explicit packing and having two repository "formats", loose and
  packed, is partly there for historical reasons: at the beginning
  there was only the loose format. The pack format was IIRC invented
  for network transport, and was then used for on-disk storage (the
  same format!) for better I/O patterns[1]. Having packs as 'rewrite
  to pack' instead of 'append to pack' allows preferring recency
  order. This results in faster access, as objects from newer commits
  are earlier in the delta chain, and in reduced size in the usual
  case of size growing with time, as recency order allows the use of
  delete deltas. Also, _choosing_ the base object allows a further
  reduction in size, especially in the presence of nonlinear history.
* From what I understand, Mercurial by default uses a packed format
  for branches and tags; Git uses the "loose" format for recent
  branches (meaning one file per branch), while packing older
  references. Using the loose format affects performance (and size)
  only for an insane number of references, and only for some
  operations such as listing all references, while using a packed
  format is IMHO a bit error-prone when updating.
* Git has reflogs, which are pruned (expired) during garbage
  collection so that they do not grow without bound; AFAIK Mercurial
  doesn't have an equivalent of this feature.
  (Reflogs store the _local_ history of a branch tip, noting commits,
  fetches, merges, rewinding the branch, switching branches, etc.)
[1] You wrote about "streaming disk access". Git relies (for reading)
    on a good mmap implementation.
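
To make the points above about amending, reflogs and garbage
collection concrete, here is a toy sketch in Python. All names and ids
are hypothetical, and the reflog is expired by entry count rather than
by age purely to keep the example short; it only illustrates the idea
that an amended commit stays on disk until nothing refers to it any
more.

```python
# Toy object store: commit id -> tuple of parent ids (made-up ids).
objects = {"a": (), "b": ("a",)}
refs = {"master": "b"}
reflog = {"master": ["a", "b"]}  # local history of the branch tip


def move_ref(name, new_tip):
    """Point a branch at a new tip and record the move in its reflog."""
    refs[name] = new_tip
    reflog[name].append(new_tip)


def amend(name, new_id):
    """Replace the tip with a corrected commit that has the same parents."""
    old_tip = refs[name]
    objects[new_id] = objects[old_tip]  # new object; the old one is untouched
    move_ref(name, new_id)


def reachable(roots):
    """Return all object ids reachable from the given starting ids."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid])
    return seen


def expire_reflog(name, keep_last):
    """Keep only the most recent keep_last (>= 1) reflog entries."""
    reflog[name] = reflog[name][-keep_last:]


def gc():
    """Delete objects reachable from neither a ref nor a reflog entry."""
    roots = list(refs.values())
    roots += [tip for log in reflog.values() for tip in log]
    removed = set(objects) - reachable(roots)
    for oid in removed:
        del objects[oid]
    return removed
```

After `amend("master", "b2")`, a call to `gc()` removes nothing,
because the reflog still mentions the old tip `b`. Only after
`expire_reflog("master", 1)` does `gc()` delete `b`.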
> As an example: If I want some revision in hg, my repository just reads the
> files in the store, jumps to the latest snapshots, adds the changes after
> these and has the data.
If you want to show some revision in Git, meaning the commit message
and a diff in patch format (the result of "git show"), Git just reads
the commit, outputs the commit message, reads the parent, reads the
trees and performs a diff.
If you want to check out a specific revision, Git just reads the
commit, reads the tree, and writes this tree (via the index) to the
working area.
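
A rough sketch of these two lookups, in Python with made-up ids and
flat (single-directory) trees. It skips the index and is not Git's
real object format; the point is only that neither operation walks the
per-file history.

```python
import os

# Toy content store keyed by made-up ids (real Git uses SHA-1 hashes).
blobs = {"b1": b"hello\n", "b2": b"hello world\n", "b3": b"readme\n"}
trees = {
    "t1": {"hello.txt": "b1", "README": "b3"},
    "t2": {"hello.txt": "b2", "README": "b3"},
}
commits = {
    "c1": {"tree": "t1", "parent": None, "message": "initial"},
    "c2": {"tree": "t2", "parent": "c1", "message": "greet the world"},
}


def checkout(commit_id, workdir):
    """Read the commit, read its tree, write the tree's files to workdir."""
    tree = trees[commits[commit_id]["tree"]]
    for name, blob_id in tree.items():
        with open(os.path.join(workdir, name), "wb") as f:
            f.write(blobs[blob_id])


def changed_paths(commit_id):
    """Compare the commit's tree with its parent's tree, entry by entry."""
    commit = commits[commit_id]
    new = trees[commit["tree"]]
    parent = commit["parent"]
    old = trees[commits[parent]["tree"]] if parent else {}
    # Equal blob ids mean equal content, so no file has to be opened here.
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))
```

Here `changed_paths("c2")` needs only two tree lookups to find that
`hello.txt` differs; a real diff would then read just the two blobs
for that path.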
> In git it has to check all changesets which affect the file.
I don't understand you here... if I understand the above correctly,
then you are wrong about Git.
> If you read the hgbook, you'll find one especially nice comment:
>
> "Unlike many revision control systems, the concepts upon which Mercurial is
> built are simple enough that it’s easy to understand how the software really
> works. Knowing this certainly isn’t necessary, but I find it useful to have a
> “mental model” of what’s going on."
> - http://hgbook.red-bean.com/hgbookch4.html
>
> I really like that, and in my opinion it is a great compliment to hg, for two
> reasons:
>
> 1) Hg is easy to understand
Because it is simple... and less feature-rich, cf. multiple local
branches in a single repository.
> 2) You don't have to understand it to use it
You don't have to understand the details of Git's design (pack format,
index, stages, refs, ...) to use it either.
>
> And both are indications of a good design, the first of the core, the second
> of the UI.
Well, Git is built around the concept of a DAG of commits, with
branches as references into it. Without understanding this you can
still use Git, but it is hard. But if you understand it, you can
easily understand most of the advanced Git features.
I agree that Mercurial's UI is better; as usual in the "Worse is
Better" case... :-)
--
Jakub Narebski
Poland