What happens if we have a hash collision

Isaac Jurado diptongo at gmail.com
Sat May 8 00:21:00 UTC 2010


Replying to Jesus Cea:
>
> With regular commands (like -r parameter, or "hg log", etc), Mercurial
> only shows 48 bits of the hash.
>
> According to the birthday paradox, a repository with a few thousand
> changesets would have a pretty high (statistical) probability of
> collision if we only use 48 bits of the hash:
> <http://en.wikipedia.org/wiki/Birthday_problem>.
>
> I know that internally Mercurial uses 160 bits (for instance, in
> tags), but what would happen if I did a "hg log" or a "hg pull -r"
> with a truncated hash that has a collision?
>
> Does Mercurial recognize this and force you to use the full 160 bits
> in that case?

Quick answer: yes.  If you program for a living (or as a student), you
may want to keep reading.
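
As an aside on the estimate quoted above, here is a rough sketch of the
birthday arithmetic.  It assumes the 48 visible bits behave like
uniformly random values and uses the usual approximation
p ~ 1 - exp(-n(n-1)/(2d)); the function name collision_probability is
made up for this example.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    uniformly random values of the given bit width."""
    space = 2.0 ** bits  # number of possible truncated hashes (d)
    # Birthday approximation: 1 - exp(-n(n-1) / (2d)).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


if __name__ == "__main__":
    for n in (5_000, 1_000_000, 20_000_000):
        print(n, collision_probability(n, 48))
```

Under that approximation, a few thousand changesets give a probability
far below one in a million; it takes on the order of 2^24 (tens of
millions of) changesets before a collision among 48-bit prefixes
becomes more likely than not.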

I know from my own experience, as a lazy ass, that asking is much easier
and more comfortable than researching.  I think I have almost learnt the
lesson by now, so I would like to nudge you towards the same practice.

If you follow the Mercurial code starting from commands.py (looking for
the log function), a careful read will bring you to the _partialmatch
method in revlog.py.  There you can see how a nodeid specified with
fewer than 40 characters (40 hexadecimal characters being the full
160-bit hash) is searched for in the revlog index.  If multiple
matching entries are found, an exception is raised with the message
"ambiguous identifier".
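
To illustrate the idea, here is a simplified sketch.  It is not the
actual revlog.py code: the names partial_match and AmbiguousIdentifier
are made up for this example, and a plain list of 40-character hex
strings stands in for the revlog index.

```python
class AmbiguousIdentifier(LookupError):
    """Hypothetical error type, used only in this sketch."""


def partial_match(node_ids, prefix):
    """Return the one full node ID that starts with prefix.

    Returns None when nothing matches and raises AmbiguousIdentifier
    when more than one node ID shares the prefix.
    """
    matches = [node for node in node_ids if node.startswith(prefix)]
    if len(matches) > 1:
        raise AmbiguousIdentifier("ambiguous identifier: %s" % prefix)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Made-up node IDs; two of them share the prefix "ab1".
    nodes = ["ab12" + "0" * 36, "ab13" + "0" * 36, "cd00" + "0" * 36]
    print(partial_match(nodes, "cd"))  # unique prefix, full ID returned
    try:
        partial_match(nodes, "ab1")
    except AmbiguousIdentifier as err:
        print(err)
```

The point is that a short prefix is only accepted while it is unique;
once two changesets share it, you have to type more characters.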

Cheers.

-- 
Isaac Jurado

"The noblest pleasure is the joy of understanding."
                                  Leonardo da Vinci


