Features in Mercurial 1.6

Paul Boddie paul.boddie at biotek.uio.no
Fri Jun 18 11:38:25 UTC 2010


Martin Geisler wrote:
> Excellent, thanks for providing this example! (I'm not an Ant user myself.)
>
> So to make it clear, Ant transcodes the filenames like I expected. This
> is on a terminal that only understands Latin-1:
>
>   % cat build.xml
>   <?xml version="1.0" encoding="utf-8"?>
>   <project name="My Project" default="hello">
>           <target name="hello">
>                   <touch file = "æbler.txt" />
>           </target>
>   </project>
>
>   % LC_ALL=en_US.UTF-8 ant
>   Buildfile: /home/mg/tmp/build.xml
>
>   hello:
>       [touch] Creating /home/mg/tmp/æbler.txt
>
>   BUILD SUCCESSFUL
>   Total time: 0 seconds
>
>   % LC_ALL=en_US.ISO-8859-1 ant
>   Buildfile: /home/mg/tmp/build.xml
>
>   hello:
>       [touch] Creating /home/mg/tmp/æbler.txt
>
>   BUILD SUCCESSFUL
>   Total time: 0 seconds
>   

I think it would be somewhat bizarre for Ant to behave any differently.
As I wrote before, given a filename and a well-defined encoding, the
tool can "know" the actual characters in the filename (as opposed to a
byte sequence). Suppose Ant were instead to take the byte sequence
written in the XML file (if it could: an XML parser is not likely to
want to expose this) and use it directly when creating a new file. The
XML abstraction would then effectively be broken: the filename would
need to be written in the XML file using precisely the bytes to be
presented to the filesystem, and so the encoding of the XML file would
need to match the presumed encoding for filenames in the filesystem (as
seen by a particular user with their specific locale, of course).
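
To sketch the distinction between characters and bytes in Python (the
names below are hypothetical, and this is only an illustration of the
idea, not a description of how Ant is implemented): once the characters
are known, the bytes given to the filesystem can be derived from
whichever encoding the locale indicates.

```python
# The filename as characters, as an XML parser would hand it over
# after decoding the file ("\u00e6" is the letter ae, so this is aebler.txt
# with the ligature, as in the build.xml above).
name = "\u00e6bler.txt"

# How those characters are stored in a build.xml declared as UTF-8.
xml_bytes = name.encode("utf-8")


def filename_bytes(characters, fs_encoding):
    """Encode known characters for a filesystem assumed to use fs_encoding.

    fs_encoding stands in for whatever the user's locale indicates.
    """
    return characters.encode(fs_encoding)


# Same characters, different bytes, depending on the locale's encoding.
utf8_name = filename_bytes(name, "utf-8")         # b'\xc3\xa6bler.txt'
latin1_name = filename_bytes(name, "iso-8859-1")  # b'\xe6bler.txt'

# Copying the XML bytes straight through is only right when the two
# encodings happen to match.
print(xml_bytes == utf8_name)    # True
print(xml_bytes == latin1_name)  # False
```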

As I also noted before, given knowledge of the actual characters in a
filename, there needs to be a way of indicating which encoding filenames
should use. In the above example the locale provides this information,
although I don't regard this as being particularly satisfactory in a
collaborative environment. Where there is no such knowledge in the
tool, the encoding is implicit: each user may know how they intend their
filenames to appear, but such knowledge is more or less withheld from
the system. Although you can get quite far without such knowledge, the
task of presenting filenames (and other textual data) in the right way
for everyone becomes difficult, if not impossible.
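
A small Python illustration of that difficulty (again only a sketch,
with hypothetical variable names): when a filename is stored as bare
bytes, a reader who has to guess the encoding gets a different result
for each guess.

```python
# Bytes stored by a user whose locale uses UTF-8; they meant the name
# "\u00e6bler.txt" (the letter ae followed by "bler.txt").
stored_by_utf8_user = b"\xc3\xa6bler.txt"

# Bytes stored by a user whose locale uses Latin-1, meaning the same name.
stored_by_latin1_user = b"\xe6bler.txt"

# A Latin-1 reader of the UTF-8 user's bytes sees two wrong characters.
seen_as_latin1 = stored_by_utf8_user.decode("iso-8859-1")

# A UTF-8 reader of the Latin-1 user's bytes cannot decode them at all;
# with errors="replace" the undecodable byte becomes U+FFFD.
seen_as_utf8 = stored_by_latin1_user.decode("utf-8", errors="replace")

print(repr(seen_as_latin1))  # '\xc3\xa6bler.txt' as characters, shown as mojibake
print(repr(seen_as_utf8))    # '\ufffdbler.txt'

# Only with the encoding known do both byte strings yield the same name.
intended = "\u00e6bler.txt"
print(stored_by_utf8_user.decode("utf-8") == intended)          # True
print(stored_by_latin1_user.decode("iso-8859-1") == intended)   # True
```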

Paul


