Features in Mercurial 1.6
Paul Boddie
paul.boddie at biotek.uio.no
Fri Jun 18 11:38:25 UTC 2010
Martin Geisler wrote:
> Excellent, thanks for providing this example! (I'm not an Ant user myself.)
>
> So to make it clear, Ant transcodes the filenames like I expected. This
> is on a terminal that only understands Latin-1:
>
> % cat build.xml
> <?xml version="1.0" encoding="utf-8"?>
> <project name="My Project" default="hello">
> <target name="hello">
> <touch file = "æbler.txt" />
> </target>
> </project>
>
> % LC_ALL=en_US.UTF-8 ant
> Buildfile: /home/mg/tmp/build.xml
>
> hello:
> [touch] Creating /home/mg/tmp/æbler.txt
>
> BUILD SUCCESSFUL
> Total time: 0 seconds
>
> % LC_ALL=en_US.ISO-8859-1 ant
> Buildfile: /home/mg/tmp/build.xml
>
> hello:
> [touch] Creating /home/mg/tmp/æbler.txt
>
> BUILD SUCCESSFUL
> Total time: 0 seconds
>
I think it would be somewhat bizarre for Ant to behave any differently.
As I wrote before, given a filename and a well-defined encoding, the
tool can "know" the actual characters in the filename (as opposed to
merely a byte sequence). If Ant were to take the byte sequence written
in the XML file (if it could: an XML parser is unlikely to expose it)
and use it directly to create a new file, the XML abstraction would
effectively be broken. The filename would have to be written in the XML
file using precisely the bytes to be presented to the filesystem, so
the encoding of the XML file would have to match the presumed encoding
of filenames in the filesystem (as seen by a particular user with their
specific locale, of course).
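
To make the distinction concrete, here is a minimal Python sketch of the
behaviour in the quoted transcript. It is not how Ant is implemented:
Python's xml.etree.ElementTree merely stands in for Ant's XML parser, and
the helper names touch_filename and filesystem_bytes are hypothetical. The
parser yields characters, and only the encoding step chosen from the
locale decides which bytes would reach the filesystem.

```python
import xml.etree.ElementTree as ET

# The build file as it would sit on disk: bytes encoded as UTF-8,
# matching the encoding named in its XML declaration.
BUILD_XML = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<project name="My Project" default="hello">\n'
    '<target name="hello">\n'
    '<touch file="\u00e6bler.txt" />\n'
    '</target>\n'
    '</project>\n'
).encode("utf-8")


def touch_filename(xml_bytes):
    """Return the 'file' attribute of the touch task as characters (str)."""
    root = ET.fromstring(xml_bytes)
    return root.find("./target/touch").get("file")


def filesystem_bytes(name, locale_encoding):
    """Encode the characters the way a given locale's encoding would."""
    return name.encode(locale_encoding)


name = touch_filename(BUILD_XML)

# Same characters, different bytes depending on the locale's encoding.
utf8_bytes = filesystem_bytes(name, "utf-8")
latin1_bytes = filesystem_bytes(name, "iso-8859-1")

print(repr(name))
print(utf8_bytes)
print(latin1_bytes)
```

The parser never hands the raw bytes from the file to the caller, which
is why the XML file's own encoding and the filesystem's encoding can
differ without any harm.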

As I also noted before, given knowledge of the actual characters in a
filename, there needs to be a way of indicating which encoding filenames
should use. In the above example the locale provides this information,
although I don't regard this as being particularly satisfactory in a
collaborative environment. Where the tool has no such knowledge, the
encoding is implicit: each user may know how they intend their filenames
to appear, but that knowledge is more or less withheld from the system.
Although you can get quite far without it, the task of presenting
filenames (and other textual data) in the right way for everyone becomes
difficult, if not impossible.

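
As a rough illustration of the implicit-encoding case, the following
Python sketch assumes a tool that stores only the filename bytes and
leaves decoding to each viewer's locale. The function name present and
the two "users" are hypothetical, invented for this example.

```python
def present(raw_name, viewer_encoding):
    """Decode stored filename bytes the way a viewer's locale would.

    Undecodable bytes are shown as U+FFFD instead of raising an error.
    """
    return raw_name.decode(viewer_encoding, errors="replace")


# Two users create "the same" filename under different locales.
stored_by_utf8_user = "\u00e6bler.txt".encode("utf-8")
stored_by_latin1_user = "\u00e6bler.txt".encode("iso-8859-1")

# Each sees their own file correctly...
own_view = present(stored_by_utf8_user, "utf-8")

# ...but not the other's, because the bytes carry no encoding label.
utf8_name_seen_as_latin1 = present(stored_by_utf8_user, "iso-8859-1")
latin1_name_seen_as_utf8 = present(stored_by_latin1_user, "utf-8")

print(own_view)
print(utf8_name_seen_as_latin1)
print(latin1_name_seen_as_utf8)
```

Neither mismatched view can be repaired by the tool on its own, since
nothing in the stored bytes says which encoding was intended.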
Paul