[PATCH 3 of 3] use per-directory clustered stat calls even in cases where known tree is walked

Petr Kodl petrkodl at gmail.com
Mon Oct 6 10:59:52 UTC 2008

> Wouldn't it be cleaner to call normpath on those filenames ?

normcase is probably what you want in this case. I submitted a patch
based on Benoit's suggestions a while ago, but here it is again, just
in case. It is reasonably clean except for one thing: the handling of
errors coming from os.listdir, which has to work around an
incompatibility between Python versions.
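
To illustrate the normcase/normpath distinction mentioned above, here is a small sketch. It uses the standard ntpath and posixpath modules directly so the result does not depend on the platform it runs on; the file name is made up for the example. normpath collapses redundant path components but keeps case, while the Windows normcase lowercases and converts slashes, which is what makes names usable as keys on a case-insensitive filesystem.

```python
import ntpath      # Windows flavour of os.path
import posixpath   # POSIX flavour of os.path

name = "Src/./Util.PY"  # hypothetical file name, for illustration only

# normpath removes the redundant "." component but preserves case
print(ntpath.normpath(name))     # Src\Util.PY

# normcase lowercases and converts "/" to "\", but keeps the "." component
print(ntpath.normcase(name))     # src\.\util.py

# on POSIX, normcase returns the name unchanged
print(posixpath.normcase(name))  # Src/./Util.PY
```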

use per-directory clustered stat when appropriate

The util module implements two versions of the statfiles function:

_statfiles_direct calls lstat once per file

_statfiles_clustered takes advantage of the optimizations in osutil.c and
stats all files in a directory at once when a new directory is hit

util.statfiles is bound to the appropriate version at module load time.

At the moment win32 uses the clustered stat; all other platforms use the
direct version.

diff -r 8b58cb4b4395 mercurial/dirstate.py
--- a/mercurial/dirstate.py	Wed Oct 01 18:23:25 2008 -0400
+++ b/mercurial/dirstate.py	Thu Oct 02 12:56:13 2008 -0400
@@ -535,17 +535,11 @@
                          results[nf] = None

          # step 3: report unseen items in the dmap hash
-        visit = [f for f in dmap if f not in results and match(f)]
-        for nf in util.sort(visit):
-            results[nf] = None
-            try:
-                st = lstat(join(nf))
-                kind = getkind(st.st_mode)
-                if kind == regkind or kind == lnkkind:
-                    results[nf] = st
-            except OSError, inst:
-                if inst.errno not in (errno.ENOENT, errno.ENOTDIR):
-                    raise
+        visit = util.sort([f for f in dmap if f not in results and match(f)])
+        for nf, st in zip(visit, util.statfiles([join(i) for i in visit])):
+            if not st is None and not getkind(st.st_mode) in (regkind, lnkkind):
+                st = None
+            results[nf] = st

          del results['.hg']
          return results
diff -r 8b58cb4b4395 mercurial/util.py
--- a/mercurial/util.py	Wed Oct 01 18:23:25 2008 -0400
+++ b/mercurial/util.py	Thu Oct 02 12:56:13 2008 -0400
@@ -798,6 +798,55 @@
  def openhardlinks():
      '''return true if it is safe to hold open file handles to hardlinks'''
      return True
+def _statfiles_direct(files):
+    '''Stat each file in the list
+    Yield None for a file that does not exist'''
+    lstat = os.lstat
+    for nf in files:
+        try:
+            st = lstat(nf)
+        except OSError, err:
+            if err.errno not in (errno.ENOENT, errno.ENOTDIR):
+                raise
+            st = None
+        yield st
+def _statfiles_clustered(files):
+    '''Stat each file in the list
+    Yield None for a file that does not exist
+    Cluster stat per directory to improve performance'''
+    lstat = os.lstat
+    ncase = os.path.normcase
+    sep   = os.sep
+    dircache = {} # dirname->filename->status
+    for nf in files:
+        nf  = ncase(nf)
+        pos = nf.rfind(sep)
+        if pos == -1:
+            dir, base = '.', nf
+        else:
+            dir, base = nf[:pos], nf[pos+1:]
+        cache = dircache.get(dir, None)
+        if cache is None:
+            try:
+                dmap = dict([(ncase(n), s)
+                    for n, k, s in osutil.listdir(dir, True)])
+            except OSError, err:
+                # 3 is required to work around Win32 errno handling
+                # of directory not found
+                # Python <= 2.4 returns errno native winerr 3
+                # starting Python 2.5 this behaves correctly and returns ENOENT
+                if err.errno not in (3, errno.ENOENT, errno.ENOTDIR):
+                    raise
+                dmap = {}
+            cache = dircache.setdefault(dir, dmap)
+        yield cache.get(base, None)
+if sys.platform == 'win32':
+    statfiles = _statfiles_clustered
+else:
+    statfiles = _statfiles_direct

  getuser_fallback = None
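
For readers who want to try the idea outside Mercurial, the following is a minimal Python 3 sketch of the two strategies, not the code from the patch. It assumes os.scandir as a stand-in for osutil.listdir(dir, True), and the names statfiles_direct and statfiles_clustered are chosen here for illustration. Both generators yield one result per input path, in order, with None for paths that do not exist.

```python
import os


def statfiles_direct(files):
    """Yield os.lstat() for each path, or None if the path is missing."""
    for path in files:
        try:
            yield os.lstat(path)
        except (FileNotFoundError, NotADirectoryError):
            yield None


def statfiles_clustered(files):
    """Same results as statfiles_direct, but each directory is listed once."""
    ncase = os.path.normcase
    dircache = {}  # normalized dirname -> {normalized basename: stat result}
    for path in files:
        dirname, base = os.path.split(ncase(path))
        dirname = dirname or "."
        cache = dircache.get(dirname)
        if cache is None:
            # first time this directory is hit: stat all of its entries
            cache = {}
            try:
                with os.scandir(dirname) as entries:
                    for entry in entries:
                        st = entry.stat(follow_symlinks=False)
                        cache[ncase(entry.name)] = st
            except (FileNotFoundError, NotADirectoryError):
                # missing directory: every file under it is reported as None
                pass
            dircache[dirname] = cache
        yield cache.get(base)
```

The clustered version pays for one directory listing per directory instead of one lstat per file, so it only helps when many of the requested files share directories and listing with stat data is cheap, which is the case the patch targets on win32.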
