[PATCH 1 of 4] convert/bzr: expect unicode metadata, encode in UTF-8 (issue3232)
Mads Kiilerich
mads at kiilerich.com
Thu Feb 2 12:12:15 UTC 2012
On 02/02/2012 10:17 AM, Patrick Mezard wrote:
> # HG changeset patch
> # User Patrick Mezard<pmezard at gmail.com>
> # Date 1328174104 -3600
> # Branch stable
> # Node ID d4716c94801ad7d0495c8395bbc94bf6d5c86716
> # Parent 0620421044a2bcaafd054a6ee454614888699de8
> convert/bzr: expect unicode metadata, encode in UTF-8 (issue3232)
>
> Before this patch, metadata and file names were handled as follows:
> - unicode objects were converted to UTF-8
> - non-unicode objects were left unchanged
>
> Judging from the code, and since bzr is known for transcoding filenames, we
> expect everything to be returned as unicode objects, and we want to encode it
> in UTF-8, as the subversion source does. To do that, we simply remove the
> custom implementation of .recode().
The message at http://selenic.com/pipermail/mercurial-devel/2010-August/024140.html
proposed what seems to be the opposite approach. I think it is better to
stick to The Mercurial Way and convert from unicode to the local
encoding as early as possible.
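To make the strategies concrete, here is a minimal sketch of the removed override, the patch's UTF-8 expectation, and early conversion to a local encoding. It assumes Python 3, with str standing in for Python 2 `unicode` and bytes for Python 2 `str`. All function names are hypothetical, and 'latin-1' with errors='replace' is only an illustrative local encoding and fallback, not Mercurial's actual rules.

```python
# Hypothetical sketch, Python 3: str stands in for Python 2 `unicode`,
# bytes for Python 2 `str`. None of these names are Mercurial's API.


def old_recode(s, encoding='utf-8'):
    """Behavior of the removed bzr override: encode text, return
    anything else (bytes, None, ...) untouched."""
    if isinstance(s, str):
        return s.encode(encoding)
    return s  # leave it alone, whatever its encoding


def patched_recode(s):
    """What the patch expects: bzr always returns text, stored as UTF-8."""
    return s.encode('utf-8')


def to_local_early(s, localenc='latin-1'):
    """Alternative: leave unicode for the local encoding as soon as the
    data enters Mercurial. 'latin-1' and errors='replace' are assumed
    for illustration only."""
    return s.encode(localenc, 'replace')


if __name__ == '__main__':
    print(old_recode('café'))      # text is encoded as UTF-8
    print(old_recode(b'caf\xe9'))  # bytes pass through unchanged
    print(patched_recode('café'))
    print(to_local_early('café'))
```

The second call shows the weakness the patch addresses: under the old override, bytestrings were left in whatever encoding bzr happened to return.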
/Mads
> diff --git a/hgext/convert/bzr.py b/hgext/convert/bzr.py
> --- a/hgext/convert/bzr.py
> +++ b/hgext/convert/bzr.py
> @@ -143,7 +143,6 @@
> return commit(parents=parents,
> date='%d %d' % (rev.timestamp, -rev.timezone),
> author=self.recode(rev.committer),
> - # bzr returns bytestrings or unicode, depending on the content
> desc=self.recode(rev.message),
> rev=version)
>
> @@ -231,7 +230,11 @@
> continue
>
> # we got unicode paths, need to convert them
> - path, topath = [self.recode(part) for part in paths]
> + path, topath = paths
> + if path is not None:
> + path = self.recode(path)
> + if topath is not None:
> + topath = self.recode(topath)
> seen.add(path or topath)
>
> if topath is None:
> @@ -260,19 +263,3 @@
> parentmap = self.sourcerepo.get_parent_map(ids)
> parents = tuple([parent for parent in ids if parent in parentmap])
> return parents
> -
> - def recode(self, s, encoding=None):
> - """This version of recode tries to encode unicode to bytecode,
> - and preferably using the UTF-8 codec.
> - Other types than Unicode are silently returned, this is by
> - intention, e.g. the None-type is not going to be encoded but instead
> - just passed through
> - """
> - if not encoding:
> - encoding = self.encoding or 'utf-8'
> -
> - if isinstance(s, unicode):
> - return s.encode(encoding)
> - else:
> - # leave it alone
> - return s
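The removed override's docstring notes that it passed None through unchanged, so the path hunk now has to skip None itself. Below is a minimal sketch of that guard. It assumes Python 3 and a hypothetical text-only `recode` stand-in (not Mercurial's function). It also assumes that either side of the `(path, topath)` pair may be None, as the `if topath is None` check in the diff suggests.

```python
# Hypothetical sketch, Python 3. `recode` is a stand-in that, like the
# patched bzr source expects, only accepts text; it is not Mercurial's.


def recode(s):
    return s.encode('utf-8')  # None would raise AttributeError here


def recode_paths(paths):
    """Recode a (path, topath) pair the way the patched hunk does,
    leaving a missing (None) side as None."""
    path, topath = paths
    if path is not None:
        path = recode(path)
    if topath is not None:
        topath = recode(topath)
    return path, topath


if __name__ == '__main__':
    print(recode_paths(('old.txt', 'new.txt')))
    print(recode_paths(('gone.txt', None)))
```

With this stand-in, the pre-patch comprehension `[recode(part) for part in paths]` raises AttributeError as soon as one side is None. That is why the explicit checks are needed once the pass-through override is gone.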
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel