Umlauts in filenames on Windows
Martin Geisler
mg at daimi.au.dk
Wed Jan 28 12:46:34 UTC 2009
Stefan Rusek <stefan at rusek.org> writes:
> If they are bytes, why then are there values that are not allowed to
> be stored in a filename? Consider the following code:
>
> def touch(f): open(f, "w").close()
> touch("01\x2f28\x2f2009.log")
>
> This will crash on my ubuntu box, because 0x2f (which is a perfectly
> valid byte) has some reserved meaning and is not allowed in a
> filename. It also just so happens that 0x2f is '/' in every codepage.
> This means that Unix is indeed looking at the name as if it were
> characters.
No, it looks for the byte 0x2f -- that particular byte was chosen since
it represents '/' in ASCII :-) The APIs for Linux filesystems know only
two special bytes: 0x2f (the directory separator) and 0x00 (the C
string terminator).
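To make the byte-level view concrete, here is a minimal sketch, assuming
Python 3 on a Linux box with a native filesystem such as ext4. The helper
name can_create and the sample filenames are made up for illustration.
It tries to create files whose names are raw byte strings and records
which ones are accepted:

```python
import os
import tempfile

def can_create(directory, name):
    """Return True if an empty file with the byte-string `name` can be
    created inside `directory` (hypothetical helper for illustration)."""
    path = os.path.join(os.fsencode(directory), name)
    try:
        with open(path, "wb"):
            pass
    except (OSError, ValueError):
        # OSError: the kernel refused the path, e.g. because 0x2f was read
        # as a directory separator and the parent directory does not exist.
        # ValueError: Python itself rejects an embedded 0x00, since the
        # underlying C API takes NUL-terminated strings.
        return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    results = {
        "ascii": can_create(tmp, b"2009-01-28.log"),
        # 0xe9 alone is not valid UTF-8; typical Linux filesystems are
        # expected to accept it anyway, other platforms may not.
        "non_utf8": can_create(tmp, b"caf\xe9.log"),
        # Interpreted as the path 01/28/2009.log below tmp.
        "slash": can_create(tmp, b"01\x2f28\x2f2009.log"),
        "nul": can_create(tmp, b"a\x00b.log"),
    }
    print(results)
```

The "slash" case fails only because the directories 01/28 do not exist;
the byte is not rejected, it is treated as a separator. Filesystems that
impose their own encoding rules may also refuse the "non_utf8" name.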
> [...]. Yes, it might be a non-trivial change to add proper Unicode
> support, but Python has had full Unicode support for a long time, and
> so the work of handling Unicode properly is to a great extent taken
> care of for us.
I agree that such an extension should be written, and that Python takes
us part of the way. But there are also bugs and strange corner cases,
such as os.listdir(u'.') returning some filenames as byte strings and
some as Unicode strings:
http://bugs.python.org/issue2856
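One way to avoid the mixed return types is to ask for bytes and decode
explicitly. The following is a sketch under assumptions, not what
Mercurial does: it assumes Python 3, where os.listdir returns byte
strings when given a byte-string path, and the names split_names and
list_names are hypothetical.

```python
import os

def split_names(raw_names, encoding="utf-8"):
    """Split byte-string filenames into (decoded, undecodable) lists."""
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so the caller can still open the file.
            undecodable.append(raw)
    return sorted(decoded), sorted(undecodable)

def list_names(directory, encoding="utf-8"):
    """List `directory` as bytes, then decode each name explicitly."""
    return split_names(os.listdir(os.fsencode(directory)), encoding)
```

Calling list_names(".") then yields Unicode strings for every name that
is valid in the chosen encoding, and leaves the rest as bytes in a
separate list instead of mixing the two types.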
This page has more about how Unicode, Python and Windows interact:
http://boodebr.org/main/python/all-about-python-and-unicode#PLAT_WIN
--
Martin Geisler
VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.