RFC: exact change detection for non append-only files

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Tue Nov 17 17:10:56 UTC 2015


Currently, 'filecache' detects file changes via 'cachestat.__eq__()'
in posix.py on POSIX platforms, which examines:

  - st_size:

    This works as expected for append-only files (like revlog) in all
    cases (doesn't it?)

    But some status files (e.g. dirstate, bookmarks and so on) may
    keep the same size even if their content actually changes.

  - st_mtime:

    For non append-only files, this works as expected in many cases.
    But 'st_mtime' doesn't have enough resolution for the speed of
    recent computers and I/O, even if it is represented as a float
    (see also issue4836 for more detail).

  - st_ino:

    This can compensate for 'st_mtime', because writing a file with
    copy-on-write semantics always changes 'st_ino'.
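
The three checks above can be sketched roughly as follows. This is a
simplified illustration, not the actual 'cachestat' code: the class
name 'StatKey' is hypothetical, and it compares only the three fields
discussed here.

```python
import os


class StatKey:
    """Hypothetical stand-in for a stat-based cache key.

    Two keys are equal only if size, mtime and inode all match.
    """

    def __init__(self, path):
        st = os.stat(path)
        self.size = st.st_size
        self.mtime = st.st_mtime
        self.ino = st.st_ino

    def _fields(self):
        return (self.size, self.mtime, self.ino)

    def __eq__(self, other):
        if not isinstance(other, StatKey):
            return NotImplemented
        return self._fields() == other._fields()


def is_changed(path, cached_key):
    """Return True if the file at path looks different from cached_key."""
    return StatKey(path) != cached_key
```

A reader would keep the 'StatKey' taken when it loaded the file, and
reload only when 'is_changed()' returns True.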

Therefore, 'st_ino' is the last bastion for change detection of
dirstate and so on.

But inode numbers are quickly reused on some filesystems (perhaps for
performance reasons), and this prevents the 'st_ino' check from
detecting changes as expected.
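
A minimal sketch of how this can happen, assuming the file is
replaced by writing a temporary file and renaming it over the
original. The helper names are hypothetical, and truncating
'st_mtime' to whole seconds is an assumption made only to mimic a
coarse timestamp. Whether the inode number is actually reused depends
on the filesystem, so no particular result is guaranteed.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a temp file in the same directory, then rename it."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def stat_triple(path):
    """Return (size, whole-second mtime, inode) for path."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime), st.st_ino)


def reuse_demo(path):
    """Rewrite path twice with same-sized data and compare stat results."""
    atomic_write(path, b"state-A")
    before = stat_triple(path)
    atomic_write(path, b"state-B")  # the inode of "state-A" is freed here
    atomic_write(path, b"state-C")  # the temp file may get that inode back
    after = stat_triple(path)
    return before, after
```

If 'before == after' on a given filesystem, a reader that cached
"state-A" cannot tell from size, mtime and inode alone that the file
now contains "state-C".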

My quick ideas for detecting changes correctly even in such a
situation are:

  - ignore this very very rare case :-)

    Because this issue occurs only when the inode previously used
    for status file X is reused for X again.

  - writer: save also hash of data at writing data out
    reader: check hash, if 'st_ino' can't detect changes

    (e.g. '.hg/dirstate.hash' for '.hg/dirstate')

    This requires reading the whole data file to calculate the hash
    value, and it can easily decrease performance.

  - writer: increment and write "generation id" at writing data out
    reader: check "generation id", if 'st_ino' can't detect changes

    (e.g. '.hg/dirstate.genid' for '.hg/dirstate')
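
The last two ideas can be sketched roughly as below. This is only an
illustration under several assumptions: the function names are
hypothetical, SHA-1 is an arbitrary choice of hash, the side files
are plain text, and atomicity and locking between writer and reader
are ignored entirely.

```python
import hashlib


def _read_text(path, default):
    """Return the stripped content of path, or default if it is missing."""
    try:
        with open(path, "r") as f:
            return f.read().strip()
    except FileNotFoundError:
        return default


def read_genid(path):
    """Return the generation id stored next to path (0 if none yet)."""
    return int(_read_text(path + ".genid", "0"))


def read_hash(path):
    """Return the hash stored next to path ('' if none yet)."""
    return _read_text(path + ".hash", "")


def write_state(path, data):
    """Writer side: store data, its hash, and an incremented genid."""
    genid = read_genid(path) + 1
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".hash", "w") as f:
        # Hashing has to process every byte of data.
        f.write(hashlib.sha1(data).hexdigest())
    with open(path + ".genid", "w") as f:
        # The generation id only needs a counter increment.
        f.write(str(genid))
    return genid


def changed_by_hash(path, cached_hash):
    """Reader side, to be called only when the stat check sees no change."""
    return read_hash(path) != cached_hash


def changed_by_genid(path, cached_genid):
    """Reader side, to be called only when the stat check sees no change."""
    return read_genid(path) != cached_genid
```

In this sketch both readers only read a small side file, and the cost
difference lies in computing the hash over the full content.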

Or are there other reasonable approaches?

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

