Troubleshooting SHA1 Failures with Mercurial Repositories

Paul Boddie paul at boddie.org.uk
Sat Jun 13 13:24:00 UTC 2020


Hello,

I have very recently retired a rather old computer that has been my main 
development machine for a very long time, but in the last few months it has 
exhibited some unreliable behaviour in various respects. There is probably an 
interesting detective story here somewhere, and I welcome insights into the 
underlying system issues, but my motivation for sending this message is 
obviously to assess the impacts on my Mercurial repositories.

(To skip the background, jump ahead to the part specifically related to Mercurial!)

The cause of this unreliable behaviour became more apparent when obtaining DVD 
images to use with the new computer that will now become my development 
machine. Upon running md5sum, sha1sum, sha256sum or sha512sum on the 
downloaded DVD image files, it was almost impossible to generate the correct 
digests. Moreover, the digests were typically different on each invocation of 
the chosen program on the same file, producing something new each time. And 
yet, two separately downloaded copies of the same file would compare (using 
the cmp program) and be shown to be identical!
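
The symptom described above (different digests on each run over the same file) can be checked for with a short script. This is only a minimal sketch, assuming Python 3 and its standard hashlib module; the function names are illustrative and do not come from any existing tool.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of the file at path, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, runs=5, algorithm="sha1"):
    """Hash the same file several times; return the set of distinct digests."""
    return {file_digest(path, algorithm) for _ in range(runs)}


def is_stable(path, runs=5, algorithm="sha1"):
    """True if every run produced the same digest (as a healthy system should)."""
    return len(repeated_digests(path, runs, algorithm)) == 1
```

On a healthy machine, is_stable("image.iso") should always be true (the filename is just a placeholder). More than one distinct digest for an unchanged file points at the computation or the data path rather than the file itself.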

Diagnosis of the situation involved writing fairly simple programs to generate 
large files with predictable but varying content and then reading them back, 
which seemed to yield the expected content each time. I also investigated 
other message digest tools and found that the Java-based jacksum tool did 
function more reliably with MD5 digests but not all of the time. OpenSSL-based 
tools did not fare any better than those which presumably use the C library 
digest functions. I ran memtest86+ for some time without any indication of 
memory failure, and there was no obvious indication of disk failure, although 
I shall aim to run more extensive smartctl tests to be sure.
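
A write-then-read-back test of the kind mentioned above might look roughly like the following. It is a sketch under stated assumptions (Python 3, a hypothetical block pattern built from a 64-bit counter), not the original test programs. Note that a file read back straight after writing may be served from the operating system's cache rather than from the disk.

```python
import os
import struct


def expected_block(index, block_size=4096):
    """Block number `index`: a 64-bit counter repeated to fill the block.

    block_size is assumed to be a multiple of 8.
    """
    return struct.pack("<Q", index) * (block_size // 8)


def write_pattern(path, blocks, block_size=4096):
    """Write `blocks` blocks of predictable but varying content to path."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(expected_block(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks, block_size=4096):
    """Read the file back; return the indexes of blocks that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != expected_block(i, block_size):
                bad.append(i)
    return bad
```

An empty list from verify_pattern means every block read back as written.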

Generally, I have not experienced obvious problems with my data, but I have 
experienced frustration with distribution updates (Debian's apt complaining 
about hash sum mismatches) and it has been largely impossible to clone large 
Git repositories ("index pack failed"), although I assumed that this was just 
Git making increasing demands on system capabilities (and being typically 
unhelpful). I doubt that anyone else runs hardware this old - Pentium 4, 3.0 
GHz, "Prescott" generation - and support for 32-bit x86 is gradually 
disappearing, so I don't know what level of experience other people are likely 
to have with these issues (other than remarks about the system being old and 
needing replacement).

(Here comes the bit specifically related to Mercurial.)

Anyway, I find myself with Mercurial repositories that I have been updating
during periods of unreliability. I have practically never had a problem
updating or accessing repositories (maybe once, and not recently), but I
wondered what kind of effects this unreliability might have had on
repository integrity. The Mercurial Wiki and other documentation do not
readily explain the implications of faulty digests, although I found the
following interesting remarks:

"The repository owner may continue committing to the heads of the repository, 
but attempts to view the repository at any changeset containing the sensitive 
file data will fail due to the hash mismatch (examples: hg update, hg diff, hg 
annotate). "hg verify" will fail due to the hash mismatch as well."

https://www.mercurial-scm.org/wiki/CensorPlan

Now, having copied repositories to my new machine, I have successfully 
verified the repositories using hg verify. However, using hg convert reveals 
differing nodeids between the original and converted repositories. I have 
tried hg convert with both --branchsort and --sourcesort options. Then, I have 
generated readily comparable logs as follows:

hg log --template '{node}\n' > logfile

Running diff on the logs for the original and converted repositories reveals 
considerable differences in nodeids for some repositories, even ones which 
haven't been touched in years, but no differences for others. It appears that 
--sourcesort replicates history more accurately (as suggested by the 
documentation). For validation, converting the converted repositories again 
(using --sourcesort) produces identical histories, as one might expect.
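
As an alternative to running diff, the two logs can be compared position by position. The following is a minimal sketch, assuming Python 3, one nodeid per line (as produced by the hg log command above), and that both logs list changesets in the same order. The function names are illustrative.

```python
def read_nodes(path):
    """Read one nodeid per line, ignoring blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def compare_node_logs(original_path, converted_path):
    """Compare two nodeid logs line by line.

    Positions are zero-based line indexes within the logs. Lines beyond
    the end of the shorter log are not compared; the counts show whether
    the logs have the same length.
    """
    original = read_nodes(original_path)
    converted = read_nodes(converted_path)
    differing = [
        i for i, (a, b) in enumerate(zip(original, converted)) if a != b
    ]
    return {
        "original_count": len(original),
        "converted_count": len(converted),
        "differing_positions": differing,
    }
```

Since a changeset's nodeid is derived in part from its parents' nodeids, a single differing changeset would be expected to change the nodeids of all of its descendants, so the differing positions are likely to form runs rather than isolated lines.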

I suppose I am left wondering about a few things. Are such simple comparisons 
of repository histories useful in assessing the prevalence of faulty nodeids? 
How may faulty nodeids affect the integrity of repositories (considering the 
quote about censored changesets above)? Are there any compelling practical
arguments for converting these faulty repositories if they otherwise appear
to function normally? (I realise that combining faulty and converted
repositories will result in divergence in the graph at inappropriate places.)

Sorry for the long message, but any insights would be much appreciated!

Paul



