[PATCH 3 of 3] use per-directory clustered stat calls even in cases where known tree is walked
Petr Kodl
petrkodl at gmail.com
Mon Oct 6 10:59:52 UTC 2008
>
> Wouldn't it be cleaner to call normpath on those filenames ?
>
normcase is probably what you want in this case. I submitted a patch
based on Benoit's suggestions a while ago, but here it is again, just
in case. It is reasonably clean except for one thing: the handling of
errors raised by osutil.listdir, which has to work around an
incompatibility between Python versions.
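As a minimal illustration of why both sides of the lookup go through normcase, the Python 3 sketch below builds a directory listing keyed by normcased names and tolerates a missing directory the way the patch does. `listing_by_normcase` is a hypothetical helper, os.listdir stands in for Mercurial's osutil.listdir, and ntpath.normcase is passed explicitly so Windows case folding can be observed on any platform.

```python
import errno
import ntpath
import os

# Per the patch comment, Python <= 2.4 on Win32 reported the native error
# code 3 (ERROR_PATH_NOT_FOUND) instead of ENOENT for a missing directory.
MISSING_DIR_ERRNOS = (3, errno.ENOENT, errno.ENOTDIR)


def listing_by_normcase(dirname, normcase=os.path.normcase):
    """Hypothetical helper: map normcase(name) -> name for each entry in
    dirname, or return {} if dirname is missing or not a directory."""
    try:
        names = os.listdir(dirname)
    except OSError as err:
        if err.errno not in MISSING_DIR_ERRNOS:
            raise
        return {}
    return {normcase(name): name for name in names}


# ntpath.normcase lowercases and converts '/' to '\\', as on Windows,
# so a query with different case still finds the cached entry.
print(ntpath.normcase("Docs/README.txt"))  # docs\readme.txt
```

On POSIX, os.path.normcase is the identity, so the same lookup is case-sensitive there.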
use per-directory clustered stat when appropriate
The util module implements two versions of the statfiles function:
_statfiles_direct calls lstat once per file, while
_statfiles_clustered takes advantage of the optimizations in osutil.c
and stats all files in a directory at once the first time that
directory is encountered.
util.statfiles dispatches to the appropriate version when the module
is loaded. At the moment win32 uses the clustered version; all other
platforms use the direct version.
diff -r 8b58cb4b4395 mercurial/dirstate.py
--- a/mercurial/dirstate.py Wed Oct 01 18:23:25 2008 -0400
+++ b/mercurial/dirstate.py Thu Oct 02 12:56:13 2008 -0400
@@ -535,17 +535,11 @@
results[nf] = None
# step 3: report unseen items in the dmap hash
- visit = [f for f in dmap if f not in results and match(f)]
- for nf in util.sort(visit):
- results[nf] = None
- try:
- st = lstat(join(nf))
- kind = getkind(st.st_mode)
- if kind == regkind or kind == lnkkind:
- results[nf] = st
- except OSError, inst:
- if inst.errno not in (errno.ENOENT, errno.ENOTDIR):
- raise
+ visit = util.sort([f for f in dmap if f not in results and match(f)])
+ for nf, st in zip(visit, util.statfiles([join(i) for i in visit])):
+ if st is not None and getkind(st.st_mode) not in (regkind, lnkkind):
+ st = None
+ results[nf] = st
del results['.hg']
return results
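The reworked step 3 zips the sorted path list with the generator returned by util.statfiles, which relies on statfiles yielding exactly one result per input path, in order, and then discards anything that is not a regular file or symlink. A Python 3 sketch of that contract, assuming a per-file lstat loop like _statfiles_direct and using the stat module's S_ISREG/S_ISLNK in place of dirstate's getkind, regkind and lnkkind; `statfiles_direct` and `report_unseen` are hypothetical names.

```python
import errno
import os
import stat


def statfiles_direct(paths):
    """Yield os.lstat(path) for each path, or None if it does not exist."""
    for path in paths:
        try:
            st = os.lstat(path)
        except OSError as err:
            if err.errno not in (errno.ENOENT, errno.ENOTDIR):
                raise
            st = None
        yield st


def report_unseen(paths):
    """Map each path to its stat result, keeping only regular files and
    symlinks; everything else (missing, directory, ...) maps to None."""
    paths = sorted(paths)
    results = {}
    # zip() pairs results positionally, so statfiles must yield exactly
    # one item per input path, in the same order.
    for path, st in zip(paths, statfiles_direct(paths)):
        if st is not None and not (stat.S_ISREG(st.st_mode) or
                                   stat.S_ISLNK(st.st_mode)):
            st = None
        results[path] = st
    return results
```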
diff -r 8b58cb4b4395 mercurial/util.py
--- a/mercurial/util.py Wed Oct 01 18:23:25 2008 -0400
+++ b/mercurial/util.py Thu Oct 02 12:56:13 2008 -0400
@@ -798,6 +798,55 @@
def openhardlinks():
'''return true if it is safe to hold open file handles to hardlinks'''
return True
+
+def _statfiles_direct(files):
+ '''Stat each file in the list.
+ Yield None for files that do not exist.'''
+ lstat = os.lstat
+ for nf in files:
+ try:
+ st = lstat(nf)
+ except OSError, err:
+ if err.errno not in (errno.ENOENT, errno.ENOTDIR):
+ raise
+ st = None
+ yield st
+
+def _statfiles_clustered(files):
+ '''Stat each file in the list.
+ Yield None for files that do not exist.
+ Cluster stat calls per directory to improve performance.'''
+ lstat = os.lstat
+ ncase = os.path.normcase
+ sep = os.sep
+ dircache = {} # dirname->filename->status
+ for nf in files:
+ nf = ncase(nf)
+ pos = nf.rfind(sep)
+ if pos == -1:
+ dir, base = '.', nf
+ else:
+ dir, base = nf[:pos], nf[pos+1:]
+ cache = dircache.get(dir, None)
+ if cache is None:
+ try:
+ dmap = dict([(ncase(n), s)
+ for n, k, s in osutil.listdir(dir, True)])
+ except OSError, err:
+ # errno 3 must be accepted to work around Win32 reporting
+ # of "directory not found": Python <= 2.4 returns the native
+ # Windows error code 3 (ERROR_PATH_NOT_FOUND), while
+ # Python 2.5 and later correctly return ENOENT
+ if err.errno not in (3, errno.ENOENT, errno.ENOTDIR):
+ raise
+ dmap = {}
+ cache = dircache.setdefault(dir, dmap)
+ yield cache.get(base, None)
+
+if sys.platform == 'win32':
+ statfiles = _statfiles_clustered
+else:
+ statfiles = _statfiles_direct
getuser_fallback = None
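The clustering technique can be sketched in Python 3 with os.scandir standing in for osutil.listdir(dir, True). This illustrates the idea rather than reproducing the patch: `statfiles_clustered` and `_scan` are hypothetical names, os.path.split replaces the manual rfind, and entry.stat(follow_symlinks=False) provides lstat semantics. According to the Python documentation, DirEntry.stat() on Windows is answered from the directory scan itself in this case, while on POSIX it costs one system call per entry scanned, which is consistent with the patch enabling clustering only on win32.

```python
import errno
import os


def _scan(dirname):
    """Return {normcase(name): lstat-style result} for every entry in
    dirname, or {} if dirname is missing or not a directory."""
    normcase = os.path.normcase
    try:
        it = os.scandir(dirname)
    except OSError as err:
        # Python 3 maps the Windows "path not found" error to ENOENT,
        # so the errno 3 workaround from the patch is not needed here.
        if err.errno not in (errno.ENOENT, errno.ENOTDIR):
            raise
        return {}
    listing = {}
    with it:
        for entry in it:
            try:
                # follow_symlinks=False gives lstat semantics.
                listing[normcase(entry.name)] = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                pass  # entry removed between the scan and the stat
    return listing


def statfiles_clustered(paths):
    """Yield an lstat-style result per path (None if missing), scanning
    each directory only the first time one of its files is requested."""
    normcase = os.path.normcase
    dircache = {}  # normcased dirname -> {normcased basename: stat result}
    for path in paths:
        dirname, base = os.path.split(normcase(path))
        dirname = dirname or os.curdir
        listing = dircache.get(dirname)
        if listing is None:
            listing = dircache[dirname] = _scan(dirname)
        yield listing.get(base)
```

A module could then choose between this and a per-file lstat loop once at import time, as the patch does with `sys.platform == 'win32'`.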