Patch: Regular expression support for convert extension's 'filemap' option.

Martin Blais blais at furius.ca
Wed Jul 7 00:14:57 UTC 2010


Hi,

Here is a patch that adds support for regular expressions in
the convert command's 'filemap' option.

I was attempting to convert a two-year-old Subversion
repository which unfortunately has had unnecessarily large
files committed to it in the past (e.g. Java jar files). I
needed to exclude these files by name, but the filenames
varied over the history of the SVN repo; what I needed was to
be able to say "exclude all the Java jar files and the .so
files" and so on. The 'filemap' option allows one to exclude
files by name, but not by pattern.

This patch solves that problem by adding a new matching
expression 'exclude_re' to the filemap format, which
interprets its argument as a regular expression::

  exclude_re  .*\\.(jar|zip|gz|tgz|bz2|tar|as|cs|exe|so|a|dll|swf|swc|swz|png|xsd|jpg|jpeg|gif|ttf|mp3|fla|pdb|pdf|pem|vcproj|html|chm|ppt|sxi|log)$
  exclude_re  .*\\.so\\..*$
  exclude     bin
  exclude     ThirdParty
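
To illustrate how such lines turn into compiled patterns, here is a minimal sketch. It assumes the filemap is tokenized with shlex in POSIX mode (the patch shows shlex and get_token() in use; POSIX mode and the whitespace_split setting are assumptions of this sketch), which is why the backslashes are doubled in the file. The helper name parse_exclude_re is hypothetical, and the first pattern is shortened from the example above.

```python
import io
import re
import shlex

# Filemap text as it would appear on disk (raw string keeps the doubled
# backslashes). The first pattern is a shortened version of the example.
FILEMAP_TEXT = r"""
exclude_re  .*\\.(jar|zip|so)$
exclude_re  .*\\.so\\..*$
"""


def parse_exclude_re(text):
    """Return compiled patterns for every 'exclude_re' line in text.

    Hypothetical helper; it only mimics the cmd/argument token loop.
    """
    lex = shlex.shlex(io.StringIO(text), posix=True)
    lex.whitespace_split = True  # assumption: tokens are split on whitespace only
    patterns = []
    cmd = lex.get_token()
    while cmd is not None:  # in POSIX mode get_token() returns None at EOF
        arg = lex.get_token()
        if cmd == 'exclude_re':
            # POSIX-mode shlex has already turned '\\.' into '\.'
            patterns.append(re.compile(arg))
        cmd = lex.get_token()
    return patterns


patterns = parse_exclude_re(FILEMAP_TEXT)
for path in ('lib/foo.jar', 'lib/libfoo.so.1', 'src/Main.java'):
    print(path, any(p.match(path) for p in patterns))
```

With a single backslash in the file, POSIX-mode shlex would drop it and the dot would match any character, so the doubling matters.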

I frankly don't see how I could have moved the team off
the Subversion repository without this patch; without
excluding the large files from the history, the converted
Mercurial repository is upwards of 6GB, which is too large. By
excluding the large files I brought it down to 350MB (which
is a very reasonable size given the project).

(The patch is trivial; I'm extremely busy and this was going
to get lost in the day-to-day grind, but I figured someone
might find it attractive enough and merge it in, so here it is
on the mailing-list. I normally don't monitor the list.)

I love Mercurial! Keep up the amazing work.
cheers,




(Why we need this patch)

  Because it makes importing old repositories with large
  files possible. This expands the usable domain of
  Mercurial. I could not have converted this repository
  without it.


(How you've implemented it)

  I've modified a single file: 'hgext/convert/filemap.py' to
  add a new recognized pattern: 'exclude_re'.


(What file formats and data structures you've used)

  Similar to what was there. 


(What choices you've made)

  Implemented a regexp variant of the lookup function in that
  module.
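
As a rough illustration of what the regexp lookup does, the sketch below reproduces lookup_re from the patch as a free function. The rpairs here is a simplified stand-in for the helper in filemap.py (an assumption: it just yields the full path first, then each shorter directory prefix), so treat this as a sketch of the idea rather than the module's exact behaviour.

```python
import re


def rpairs(name):
    """Simplified stand-in: yield (prefix, suffix) pairs, longest prefix first."""
    end = len(name)
    while end != -1:
        yield name[:end], name[end + 1:]
        end = name.rfind('/', 0, end)


def lookup_re(name, regexps):
    """Same logic as the patch's lookup_re, without the class around it."""
    for pre, suf in rpairs(name):
        # re.match anchors at the start of 'pre' only, not at the end
        if any(r.match(pre) for r in regexps):
            return pre, pre, suf
    return '', name, ''


patterns = [re.compile(r'.*\.jar$')]
print(lookup_re('lib/vendor/foo.jar', patterns))
print(lookup_re('src/main.c', patterns))
```

Because re.match only anchors at the beginning, a bare pattern such as 'lib' would also match 'library/x.c'. Starting patterns with '.*' and ending them with '$', as in the example filemap, avoids that kind of surprise.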


(Why the choices you've made are the right ones)

  I kept it as simple as possible.


(Why the choices you didn't make are the wrong ones)

  N/A


(What shortcomings exist)

  I did not implement the corresponding 'include_re'; it
  would make sense to do so.


(What compatibility issues exist)

  I've only added a new exclude pattern to the filemap and
  did not remove any. The file format should support all the
  previous 'filemap commands' and be backwards compatible.
  This should have no impact on compat.


(What's missing, if anything)

  An option to exclude files by size would also be useful
  (i.e., exclude if size is larger than X), but it was
  non-trivial to implement, and this did the job wonderfully.


(Testing)

  I ran the test suite against hg-stable using 'make tests'.
  All tests pass. (I did not add a new test, however.)

    tangerine:~/src/hg-stable$ make tests
    cd tests && python run-tests.py 
    ................................s......................s............s.......sss.........s.....................................................................................................................................................................................s.................................................................................................................
    Skipped test-casefolding: missing feature: case insensitive file system
    Skipped test-convert-baz: missing feature: GNU Arch baz client
    Skipped test-convert-darcs: missing feature: darcs client
    Skipped test-convert-mtn: missing feature: monotone client (> 0.31)
    Skipped test-convert-p4: missing feature: Perforce server and client
    Skipped test-convert-p4-filetypes: missing feature: Perforce server and client
    Skipped test-convert-tla: missing feature: GNU Arch tla client
    Skipped test-no-symlinks: system supports symbolic links
    # Ran 384 tests, 8 skipped, 0 failed.
    tangerine:~/src/hg-stable$ 
    
    



Below is the patch:
--------------------------------------------------------------------------------


util02:~/src/hg-stable$ /usr/bin/hg export -r 10791
# HG changeset patch
# User Martin Blais <blais at furius.ca>
# Date 1275675257 14400
# Node ID 594f38d73da1829badca578b0881f1cf64e564c5
# Parent  efd3b71fc29315e79a29033fdd0d149b309eb398
Added support for regular expressions.

diff -r efd3b71fc293 -r 594f38d73da1 hgext/convert/filemap.py
--- a/hgext/convert/filemap.py	Thu Mar 04 13:10:48 2010 +0100
+++ b/hgext/convert/filemap.py	Fri Jun 04 14:14:17 2010 -0400
@@ -4,7 +4,7 @@
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
-import shlex
+import shlex, re
 from mercurial.i18n import _
 from mercurial import util
 from common import SKIPREV, converter_source
@@ -25,6 +25,7 @@
         self.ui = ui
         self.include = {}
         self.exclude = {}
+        self.exclude_re = []
         self.rename = {}
         if path:
             if self.parse(path):
@@ -51,6 +52,9 @@
                 errs += check(name, self.include, 'include')
                 errs += check(name, self.rename, 'rename')
                 self.exclude[name] = name
+            elif cmd == 'exclude_re':
+                regexp = lex.get_token()
+                self.exclude_re.append(re.compile(regexp))
             elif cmd == 'rename':
                 src = lex.get_token()
                 dest = lex.get_token()
@@ -73,15 +77,24 @@
                 pass
         return '', name, ''
 
+    def lookup_re(self, name, remapping):
+        for pre, suf in rpairs(name):
+            if any(r.match(pre) for r in remapping):
+                return pre, pre, suf
+        return '', name, ''
+
     def __call__(self, name):
         if self.include:
             inc = self.lookup(name, self.include)[0]
         else:
             inc = name
+
+        exc = ''
         if self.exclude:
             exc = self.lookup(name, self.exclude)[0]
-        else:
-            exc = ''
+        if self.exclude_re:
+            exc = self.lookup_re(name, self.exclude_re)[0]
+
         if (not self.include and exc) or (len(inc) <= len(exc)):
             return None
         newpre, pre, suf = self.lookup(name, self.rename)
@@ -94,7 +107,7 @@
         return name
 
     def active(self):
-        return bool(self.include or self.exclude or self.rename)
+        return bool(self.include or self.exclude or self.exclude_re or self.rename)
 
 # This class does two additional things compared to a regular source:
 #
util02:~/src/hg-stable$ 
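
One interaction in the __call__ hunk is worth keeping in mind: when a filemap has both 'exclude' and 'exclude_re' entries, the second assignment to exc replaces the result of the first, so a plain 'exclude' match can be lost if no regexp matches. A minimal sketch of one possible way to combine the two, keeping whichever excluded prefix is longer, follows. The helpers are simplified stand-ins (rpairs in particular is an assumption, not the module's code), and excluded_prefix is a hypothetical name.

```python
import re


def rpairs(name):
    """Simplified stand-in: yield (prefix, suffix) pairs, longest prefix first."""
    end = len(name)
    while end != -1:
        yield name[:end], name[end + 1:]
        end = name.rfind('/', 0, end)


def lookup(name, mapping):
    """Return the longest prefix of name that is a key in mapping, or ''."""
    for pre, _suf in rpairs(name):
        if pre in mapping:
            return pre
    return ''


def lookup_re(name, regexps):
    """Return the longest prefix of name matched by any regexp, or ''."""
    for pre, _suf in rpairs(name):
        if any(r.match(pre) for r in regexps):
            return pre
    return ''


def excluded_prefix(name, exclude, exclude_re):
    """Hypothetical combination: keep the longer of the two matches."""
    exc = lookup(name, exclude) if exclude else ''
    exc_re = lookup_re(name, exclude_re) if exclude_re else ''
    return max(exc, exc_re, key=len)


exclude = {'bin': 'bin'}
exclude_re = [re.compile(r'.*\.jar$')]
for path in ('bin/tool', 'lib/x.jar', 'src/a.py'):
    print(path, repr(excluded_prefix(path, exclude, exclude_re)))
```

Taking the longer prefix keeps the result compatible with the existing len(inc) <= len(exc) comparison, but other ways of merging the two results are possible.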



