speed up relink script
Matt Mackall
mpm at selenic.com
Tue Mar 20 15:32:20 UTC 2007
On Tue, Mar 20, 2007 at 06:11:26AM -0500, TK Soh wrote:
> On 3/19/07, Brendan Cully <brendan at kublai.com> wrote:
> >If you want to speed it up, you might try searching from the back to
> >the front (differences should show up faster that way), or perhaps
> >forking off md5sum for the candidate lists and comparing by that
> >(possibly hand-checking matches for md5 collisions). I can't convince
> >myself that it's safe to assume that a match in the last chunk is
> >sufficient.
>
> Coming to think about it again, perhaps reading from back to front
> isn't going to help much either. Apart from the fact that it will be
> slow to read that way as pointed out by Bryan, most files in the repos
> are likely to stay unchanged over time. So it may actually slow down
> the comparison.
>
> I wonder if we can somehow compare the latest chunk of the index or
> data files checked in. Or, perhaps the last rev data in the index file
> will be representative? I'm not confident enough in my understanding of
> the internals of hg to decide. Any input?

Some notes:

Two revlogs are identical if their indices are the same.
Two revlogs don't match if they have different numbers of entries.
Two revlogs are identical if their heads are the same.
Two revlogs may still be identical if their sizes are different, if
their last records are different, etc.

The first observation says we can avoid ever reading .d files.
This suggests the following approach:

For each .i file in repo A:
  record size, MD5 hash, number of entries, and sorted list of heads

For each .i file in repo B:
  if sizes match:
    if hashes match:
      relink files
  else:
    read index
    if counts don't match:
      continue
    find heads
    if heads match:
      relink files

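
The approach above can be sketched in Python roughly as follows. This is
only a sketch under assumptions: the names Summary, find_heads, summarize
and should_relink are made up for illustration, the (node, p1, p2) entries
are assumed to come from an index parser that isn't shown here, and the
slower count-and-heads comparison is tried whenever the size-and-hash
shortcut fails, which is just one reading of the pseudocode.

```python
import hashlib
from collections import namedtuple

# Hypothetical per-file summary; the field names are illustrative only.
Summary = namedtuple("Summary", "size md5 count heads")


def find_heads(entries):
    """Return the sorted nodes that are not a parent of any other entry.

    entries is a list of (node, p1, p2) tuples, assumed to have been
    parsed from the .i file by code that is not part of this sketch.
    """
    parents = set()
    for _node, p1, p2 in entries:
        parents.add(p1)
        parents.add(p2)
    return sorted(node for node, _p1, _p2 in entries if node not in parents)


def summarize(raw_index, entries):
    """Record size, MD5 hash, number of entries and sorted heads."""
    return Summary(
        size=len(raw_index),
        md5=hashlib.md5(raw_index).hexdigest(),
        count=len(entries),
        heads=find_heads(entries),
    )


def should_relink(a, b):
    """Decide whether two index summaries describe the same revlog."""
    # Cheap check first: byte-identical indices need no further work.
    if a.size == b.size and a.md5 == b.md5:
        return True
    # Different numbers of entries can never match.
    if a.count != b.count:
        return False
    # Same number of entries: compare the sorted head lists.
    return a.heads == b.heads
```

In a real script the summaries for repo A would be computed once and kept
in a dict keyed by relative path, so each .i file in repo B is compared
against a single stored record and no .d file is ever read.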
--
Mathematics is the supreme nostalgia of our time.