Troubleshooting SHA1 Failures with Mercurial Repositories
Augie Fackler
raf at durin42.com
Sat Jun 13 20:53:19 UTC 2020
> On Jun 13, 2020, at 4:47 PM, Paul Boddie <paul at boddie.org.uk> wrote:
>
> On Saturday, 13 June 2020 22:31:06 CEST Augie Fackler wrote:
>>> On Jun 13, 2020, at 9:24 AM, Paul Boddie <paul at boddie.org.uk> wrote:
>>>
>>> I suppose I am left wondering about a few things. Are such simple
>>> comparisons of repository histories useful in assessing the prevalence of
>>> faulty nodeids? How may faulty nodeids affect the integrity of
>>> repositories (considering the quote about censored changesets above)? Are
>>> there any compelling practical arguments for converting these faulty
>>> repositories if they otherwise function apparently normally? (I realise
>>> that combining faulty and converted repositories will result in
>>> divergence in the graph at inappropriate places.)
>>
>> Over the years we’ve gotten a lot pickier about ordering of metadata in
>> changeset objects we produce. My guess is that if the original repo passes
>> `hg verify` nothing is wrong in the source repo, and that the differences
>> you’re seeing are entirely metadata-ordering related (which is to say
>> harmless).
>
> So is the page about censored changesets now inaccurate with regard to nodeids
> causing some kind of failure if they do not "encode" the stored content
> according to the fundamental rules of Mercurial? Or did I misunderstand the
> intended message of that text? It sounds like there is nothing corrupt with
> regard to the stored content, merely the metadata (which happened to be used
> to construct the history initially) that is corrupt in some way.
It’s more nuanced than that, actually. The censor page is only relevant if you’ve used censor, which you’d know. As I said, if you’ve got a repo passing `hg verify` then it’s definitely _not_ corrupt. The metadata ordering can change, and that’ll (by the nature of a content-addressed system) change the node ID, but the content is the same. E.g.

{'branch': 'foo', 'rebase_src': 'some_hash_here'}
{'rebase_src': 'some_hash_here', 'branch': 'foo'}

are the same key/value pairs, but in a different order. If they are stored in hg in different orders, you’ll get different hashes. If you use `hg convert` on an older repository, the ordering of key/value pairs in various metadata regions gets normalized, which changes hashes.
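
To make the ordering point concrete, here is a minimal Python sketch. It is not Mercurial's real changeset encoding: the `toy_node_id` helper and its "key:value joined by NUL bytes" serialization are assumptions made up for illustration. It only shows that hashing the same pairs in a different order yields a different SHA-1, and that normalizing the order first makes the hashes agree.

```python
import hashlib


def toy_node_id(pairs):
    """Hash key/value pairs in the order given.

    Hypothetical serialization for illustration only; this is NOT
    how Mercurial actually encodes changeset metadata.
    """
    payload = b"\0".join(f"{k}:{v}".encode("utf-8") for k, v in pairs)
    return hashlib.sha1(payload).hexdigest()


# The same key/value pairs, stored in two different orders.
a = [("branch", "foo"), ("rebase_src", "some_hash_here")]
b = [("rebase_src", "some_hash_here"), ("branch", "foo")]

id_a = toy_node_id(a)
id_b = toy_node_id(b)

# Normalizing (here: sorting by key) before hashing removes the difference.
id_a_sorted = toy_node_id(sorted(a))
id_b_sorted = toy_node_id(sorted(b))

print(dict(a) == dict(b))          # True: the content is the same
print(id_a == id_b)                # False: the order changed the hash
print(id_a_sorted == id_b_sorted)  # True: normalized order, same hash
```

So a tool that rewrites history with a normalized ordering will produce new hashes even though no content changed.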
>
>> Was there any specific thing that motivated using `hg convert`?
>
> I think the wiki mentions it as a tool to investigate repository corruption.
> My reasoning was that repository conversion using "hg convert" would rebuild
> the history and recompute the nodeids. In doing so in an environment where
> SHA1 libraries are not generating something arbitrary, I figured that I would
> obtain the "true" nodeids and see where they diverged in the original history
> from what they should have been.
Ah. `hg convert` is a useful tool for recovering from corrupt repos, but it doesn’t sound like you’ve got a corrupt repo.
>
> Obviously, if there is a better way of "replaying" the history to see where
> and when the nodeids became bad, I would like to hear about it.
>
> Thanks for the reply!
>
> Paul
>
>