Thoughts on diff extensions

Christopher Li hg at chrisli.org
Sun Jun 19 00:07:16 UTC 2005


On Sat, Jun 18, 2005 at 07:05:30PM -0700, Matt Mackall wrote:
> The requirements for replacing mdiff.diff are:
> 
> - significantly faster (obviously)
> - roughly the same size output or better
> - works on line boundaries (so verify continues to work)
> - doesn't choke on binary data including embedded nulls
> - is reasonably simple.
> 
> Unfortunately both of the implementations that got posted have issues
> with the above. Chris M's xdiff-derived version is block-based and
> Chris L's diffutils-derived version is quite large and presumably has
> trouble with embedded nulls in the same way that regular diff does.

That is not true. GNU diff just hashes the buffer into lines; in fact,
you can control where the lines are cut. The core diff engine works
on the hashed line values, i.e. pointers. It doesn't care about nulls at all.
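
To illustrate the idea, here is a rough sketch in Python. It is not the
GNU diff code: difflib stands in for the real engine, and the names
split_lines, line_ids and diff_ids are made up for this example. The point
is only that once each line is replaced by an id, the matcher never looks
at the bytes inside a line, so an embedded null is just another character.

```python
import difflib

def split_lines(buf, sep="\n"):
    """Cut buf after each sep (hypothetical helper; sep is configurable)."""
    lines = []
    start = 0
    while True:
        end = buf.find(sep, start)
        if end == -1:
            break
        lines.append(buf[start:end + len(sep)])
        start = end + len(sep)
    if start < len(buf):
        lines.append(buf[start:])  # trailing text with no separator
    return lines

def line_ids(old_lines, new_lines):
    """Map each distinct line to a small integer; equal lines share an id."""
    table = {}
    def ids(lines):
        return [table.setdefault(line, len(table)) for line in lines]
    return ids(old_lines), ids(new_lines)

def diff_ids(old, new):
    """Diff two buffers by comparing line ids rather than line contents."""
    old_lines, new_lines = split_lines(old), split_lines(new)
    a, b = line_ids(old_lines, new_lines)
    # The matcher only sees lists of integers here.
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, old_lines[i1:i2], new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]
```

With the same inputs as my test case, diff_ids("a\n\x00c\nde\n", "a\nde\n")
reports a single deleted line, the one containing the null.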

I bypass GNU diff's input and output functions and use only its engine.
It works fine with nulls for me. If there is a problem, I will fix it.

Show me a case where it breaks. Here is my test:

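import smlib

# Old text has a line with an embedded null; new text drops that line.
old = "a\n\x00c\nde\n"
new = "a\nde\n"
data = smlib.diff(old, new)
print repr(data)

# Swap the two and diff in the other direction.
new, old = old, new
data = smlib.diff(old, new)
print repr(data)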

Chris



