SHA1 replacement steps

Augie Fackler raf at durin42.com
Sat Feb 27 03:49:45 UTC 2021



> On Feb 15, 2021, at 9:18 AM, Joerg Sonnenberger <joerg at bec.de> wrote:
> 
> Hello all,
> to help the review process along, here are the rough steps I see in
> preparation for supporting 256bit hashes:
> 
> (1) Move the current 160bit constants from mercurial.node into a
> subclass. Instead of a global constant, derive the correct constant from
> the repo/revlog/... instance and pass it down as necessary. The API
> change itself is in D9750. The expectation for this step is that a
> repository has one hash size and one set of magic values, but it doesn't
> change anything regarding the hash function itself. A follow-up change
> is necessary to replace the global constants (approximately D9465 minus
> D9750).

I was somewhat assuming we’d alter various codepaths to always emit 256-bit hashes, even if they end in all-NUL padding. Your way sounds a little more complicated but is also fine; I don’t feel strongly.
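
For concreteness, the "always emit 256-bit hashes" alternative could look roughly like the sketch below. The helper name and constants are made up for illustration and are not existing Mercurial APIs.

```python
SHA1_LEN = 20  # bytes in a current 160-bit node
WIDE_LEN = 32  # bytes in a 256-bit node


def pad_node(node: bytes) -> bytes:
    """Right-pad a 160-bit node with NUL bytes so it is 256 bits wide.

    Hypothetical helper: nodes that are already 256 bits wide are
    returned unchanged, anything else is rejected.
    """
    if len(node) == WIDE_LEN:
        return node
    if len(node) != SHA1_LEN:
        raise ValueError("unexpected node length: %d" % len(node))
    return node + b"\x00" * (WIDE_LEN - SHA1_LEN)
```

With this approach every caller sees one node width, at the cost of carrying 12 padding bytes for SHA1-only repositories.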

> 
> (2) Adjust various on-disk formats to switch between the current 160bit
> and 256bit encoding based on the node constants in use. This would be a
> non-functional change for existing repositories.

Are any on-disk formats not already using 256-bit storage? I know revlogs are, so I _think_ this is only going to matter for caches.
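
As a quick sanity check of the revlog claim, the sketch below uses the RevlogNG index entry layout as recalled from mercurial/revlog.py. Treat the exact format string as an assumption and verify it against the source before relying on it.

```python
import struct

# Assumed RevlogNG index entry layout (recalled, not copied from the source):
# one 64-bit offset/flags field, six 32-bit integers, a 20-byte node,
# and 12 bytes of padding after the node.
INDEX_ENTRY = struct.Struct(">Qiiiiii20s12x")

# The node plus its padding occupy 32 bytes, i.e. room for a 256-bit hash.
NODE_FIELD_BYTES = 20 + 12

print(INDEX_ENTRY.size, NODE_FIELD_BYTES * 8)
```

If that layout is right, the index already has space for 256-bit nodes, and only formats that store bare 20-byte nodes would need to change.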

> 
> (3) Introduce the tagged 256(*) hash function. My plan here is to use
> Blake2b configured for 248bit output and a suffix of b'\x01'. It is a
> bit wasteful to reserve 8bit for the tag, but simplifies code. Biggest
> downside is that the full Blake2b support is not available in Python 2.

Honestly I think new hash functions are exactly the kind of thing we should gate on Python 3. If someone is _really_ enthusiastic they can write a backport extension or something, for users who (bafflingly) care about modern hashes but are stuck on an ancient Python.
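
A minimal sketch of the tagged hash from step (3), using the blake2b support in Python 3's hashlib (available since 3.6). The function name is hypothetical, and a real node hash would also cover the parent nodes; this only shows the 248-bit digest plus the one-byte suffix.

```python
import hashlib

BLAKE2B_TAG = b"\x01"  # suffix proposed in step (3)


def tagged_blake2b(data: bytes) -> bytes:
    """Return a 256-bit node: a 248-bit Blake2b digest followed by the tag.

    Hypothetical helper for illustration, not an existing Mercurial API.
    """
    digest = hashlib.blake2b(data, digest_size=31).digest()  # 31 bytes = 248 bits
    return digest + BLAKE2B_TAG
```

The digest_size parameter is part of the Blake2b parameter block, so a 248-bit digest is not simply a truncated 512-bit one.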

> 
> The tag would allow different hash functions to co-exist and embed
> existing SHA1 hashes by zero padding.
> 
> (4) Adjust hash verification logic to derive the hash function from the
> tag of a node, not just hard-coding it.
> 
> At the end of step 4, most repositories can be converted in a mostly
> transparent way.

Notably, you can allow people to use the new hash only for new commits if they’re so inclined, which leaves existing hashes untouched and lets you preserve gpg signatures etc.

> Some additional changes might be necessary for allowing
> "short" node ids for things like .hgtags, but overall, existing hashes
> should just continue to work as before.

Overall +1. We can arm-wrestle later about whether allowing a “new commits are blake2b” mode (vs convert-the-repo mode) is reasonable; I don’t think it’ll take a ton of code either way.

One request: I think we should reserve a couple of suffixes (0xff and 0xfe, perhaps?) for “private use”. This would be useful for large installations that do strange things with hashing out of necessity.
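
To illustrate the reservation, a tag table might look like the sketch below. The 0xfe/0xff values are only the tentative suggestion above, the 0x00 entry assumes zero-padded SHA1 nodes end in a NUL byte, and every name here is hypothetical.

```python
# Tags with an agreed meaning (assumed assignments, see the lead-in).
ASSIGNED_TAGS = {0x00: "sha1 (zero-padded)", 0x01: "blake2b-248"}

# Tags core code would never assign, left for site-specific schemes.
PRIVATE_USE_TAGS = frozenset({0xFE, 0xFF})


def classify_tag(node: bytes) -> str:
    """Describe the hash scheme named by the last byte of a 256-bit node."""
    tag = node[-1]
    if tag in PRIVATE_USE_TAGS:
        return "private use"
    return ASSIGNED_TAGS.get(tag, "unassigned")
```

Core code would refuse to verify private-use nodes itself and leave that to whatever extension the installation provides.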

Sorry for taking so long to respond to this - this is well thought out and I was just too busy with other work stuff for a couple weeks straight.

> 
> Joerg



