eolext LF CRLF surprises.

Tom Udale tom at ionoptix.com
Sun Aug 28 22:27:17 UTC 2011


Hello All,

One of the things that surprised me most about eol was the behavior 
of LF and CRLF.  I assumed that they controlled only the format in the 
working directory, not the format in the repository.  So it took me 
quite a while to figure out that LF and CRLF force the same endings 
both inside and outside the repo.

This is basically an alias for BIN - that is, no translation - with the 
single exception that you can somewhat painfully force a change in line 
endings if you need to.  The painful aspect is that you must commit, 
because any actual change will make the file look modified to the repo.

The reality is that you don't really care what the internal 
representation of the file is _unless_ it is different from the 
repository canonical format when eol is enabled.  And then you only care 
for the practical consideration of preventing spurious commits, not 
because of some deep-seated concern for the repo internals.

Another practical need is for the working directory to be in some form 
or another based on the unix/DOSness of the host computer.

So to maximize the utility of eol in the face of differing paths to your 
hg repo (which will result in various states of files in the repo) and 
in the face of differing working directory needs, you want to be able 
to specify both sides of the equation for each file, the repo and the 
working directory.

You specify the repo side to contend with files already checked in 
which are not in the canonical format, and you specify the working 
directory side as needed for the host system.

It turns out that you need only about one hurricane's worth of time to 
implement this.  Today I managed to get hg building from sources and 
then added six new specifiers that, along with the three existing ones, 
fill out all combinations of native, LF and CRLF.

I chose them as follows (decode as repo-working where N means "native"):

N-LF
N-CRLF
[existing Native is the same as N-N]

LF-N
LF-CRLF
[existing LF is the same as LF-LF]

CRLF-N
CRLF-LF
[existing CRLF is the same as CRLF-CRLF]
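
To make the repo-working decoding concrete, here is a minimal Python 
sketch.  It is not the actual patch and does not use the eol extension's 
internals; the names resolve and convert, and the repo_native and 
host_native parameters, are hypothetical.  It assumes that N on the 
left means the repo's canonical format and N on the right means the 
host's native format.

```python
import os
import re

# Matches a CRLF pair or a lone LF, so mixed input is normalized too.
# Lone CR characters are left untouched.
_EOL = re.compile(rb"\r?\n")

# The three existing specifiers written in repo-working form.
ALIASES = {"Native": "N-N", "LF": "LF-LF", "CRLF": "CRLF-CRLF"}


def resolve(spec, repo_native="LF", host_native=None):
    """Return (repo_eol, working_eol) as concrete 'LF' or 'CRLF' names.

    repo_native is the repo's canonical format (assumed configurable);
    host_native defaults to whatever os.linesep says about this machine.
    """
    if host_native is None:
        host_native = "CRLF" if os.linesep == "\r\n" else "LF"
    repo, working = ALIASES.get(spec, spec).split("-")
    if repo == "N":
        repo = repo_native
    if working == "N":
        working = host_native
    return repo, working


def convert(data, eol):
    """Rewrite every line ending in data (bytes) to the named format."""
    return _EOL.sub(b"\r\n" if eol == "CRLF" else b"\n", data)
```

For example, resolve("N-CRLF", repo_native="LF") gives ("LF", "CRLF"): 
convert(data, "LF") would apply on the way into the repo and 
convert(data, "CRLF") on the way out to the working directory.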


The envisioned use is as follows:

If your hg repo is completely homogeneous in text file eols, you set up 
your repo to be canonical in that eol and then _always_ pick one of the 
N- variants (N-LF, N-CRLF, or Native) to get the files into your working 
directory as needed.  This way you don't have to concern yourself with 
whatever the repo format is and you can still maintain repo homogeneity.

If your repo is heterogeneous in eols, you set up your repo to be 
canonical in the most common eol and then pick one of the 9 specifiers 
to set up your working directory.  You pick the left side based on how 
the file is in the repo and the right side based on how it needs to be 
in the working directory.
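
The choice of specifier described above could be sketched in Python as 
follows.  This is only an illustration: detect_eol and pick_specifier 
are hypothetical helpers, not part of hg or the eol extension.  Using N 
on the left when the stored file already matches the canonical format 
(or has mixed or no line endings) is my assumption about the intended 
use.

```python
def detect_eol(data):
    """Return 'LF' or 'CRLF' if data (bytes) uses one style, else None."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF
    if crlf and not lf:
        return "CRLF"
    if lf and not crlf:
        return "LF"
    return None  # no line endings at all, or a mixture


def pick_specifier(stored, repo_canonical, working):
    """Pick a repo-working specifier for one file.

    stored         -- the file's bytes as they sit in the repo
    repo_canonical -- 'LF' or 'CRLF', the repo's canonical format
    working        -- 'N', 'LF' or 'CRLF', what the working dir needs
    """
    found = detect_eol(stored)
    # Assumption: files already canonical (or undetectable) use N.
    left = "N" if found in (None, repo_canonical) else found
    spec = left + "-" + working
    # Prefer the existing short names where they mean the same thing.
    short = {"N-N": "Native", "LF-LF": "LF", "CRLF-CRLF": "CRLF"}
    return short.get(spec, spec)
```

In a homogeneous repo every file takes the first branch, so only the 
N- variants are ever produced, which matches the first case above.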

The changes needed to add new specifications and converters are, thanks 
to the design of eol, trivial (assuming I am not missing some corner 
cases).  It appears you don't even need to understand how the filters 
are ultimately called :)

If anyone is interested, I would be happy to send them along.



Best regards,

Tom



