[PATCH 1 of 4] convert/bzr: expect unicode metadata, encode in UTF-8 (issue3232)

Patrick Mézard pmezard at gmail.com
Thu Feb 2 12:33:40 UTC 2012


On 02/02/12 13:12, Mads Kiilerich wrote:
> On 02/02/2012 10:17 AM, Patrick Mezard wrote:
>> # HG changeset patch
>> # User Patrick Mezard <pmezard at gmail.com>
>> # Date 1328174104 -3600
>> # Branch stable
>> # Node ID d4716c94801ad7d0495c8395bbc94bf6d5c86716
>> # Parent 0620421044a2bcaafd054a6ee454614888699de8
>> convert/bzr: expect unicode metadata, encode in UTF-8 (issue3232)
>> 
>> Before this patch, metadata and file names were interpreted like:
>> - unicode objects were converted to UTF-8
>> - non unicode objects were left unchanged
>> 
>> Looking at the code and bzr being known for transcoding filenames,
>> we expect everything to be returned as unicode objects, and we want
>> to encode them in UTF-8, like the subversion source does. To do
>> that, we just remove the custom implementation of .recode().
> 
> http://selenic.com/pipermail/mercurial-devel/2010-August/024140.html
> proposed what seems to be the opposite approach. I think it is better
> to stick to The Mercurial Way and convert from unicode to the local
> encoding as early as possible.

Right. What I meant by "Looking at the code and bzr being known for transcoding filenames, we expect everything to be returned as unicode objects" is that the comment "# bzr returns bytestrings or unicode, depending on the content" seems wrong to me, so there is no need to treat non-unicode strings differently than in any other converter. What the current patch does is use the default .recode() method immediately on all unicode inputs. The difference from the former code is that we now always try to re-encode non-unicode strings to UTF-8 instead of letting them pass through.
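
To make the behavioural difference concrete, here is a minimal sketch in Python 3 terms, where `str` stands in for Python 2's `unicode`. Both functions are hypothetical stand-ins written for illustration, not the actual convert code, and the latin-1 fallback is an assumption about how undecodable byte strings could be handled.

```python
def recode_old(s, encoding="utf-8"):
    """Hypothetical stand-in for the removed bzr-specific override."""
    if isinstance(s, str):
        # Text objects are encoded.
        return s.encode(encoding)
    # Anything else (byte strings, None) passes through untouched.
    return s


def recode_default(s, encoding="utf-8"):
    """Hypothetical stand-in for a generic 're-encode everything' recode."""
    if isinstance(s, str):
        return s.encode("utf-8")
    try:
        # Byte strings are decoded with the source encoding, then re-encoded.
        return s.decode(encoding).encode("utf-8")
    except UnicodeError:
        # Assumed fallback: latin-1 maps every byte, so this cannot fail.
        return s.decode("latin-1").encode("utf-8")


name = "Mézard"
latin1_bytes = name.encode("latin-1")

# Text input: both variants agree.
print(recode_old(name) == recode_default(name))

# Byte-string input: the old variant passes it through unchanged,
# the default variant re-encodes it to UTF-8.
print(recode_old(latin1_bytes) == latin1_bytes)
print(recode_default(latin1_bytes) == name.encode("utf-8"))
```

The two variants only diverge on non-text input, which is exactly the case the old comment claimed bzr could produce.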

Now, we might want to change the everything-to-UTF-8 behaviour, but in that case I would fix everything at once.
--
Patrick Mézard


