Repo corrupted again, no idea why

Luis Navarro lunadesign at gmail.com
Sat Oct 2 05:56:55 UTC 2010


On Fri, Oct 1, 2010 at 1:37 PM, Adrian Buehlmann <adrian at cadifra.com> wrote:

>
> The error scenario for the hardlink problem mentioned is described at
> http://mercurial.selenic.com/bts/issue761
>
> In that scenario, as I understand matters, there are no error messages
> at the moment the problem starts happening. The corruption is silent.
>
> The problem is, if a 'hg clone' was done (without --pull option), then
> the destination and the source repo share files inside .hg/store by
> using hardlinks [1], if the filesystem provides the hardlinking feature
> (NTFS does).
>
> Mercurial is designed to break such hardlinks inside .hg if a commit or
> push is done to one of the clones. The prerequisite for this is, that
> the Windows API mercurial is using should give a correct answer, if
> mercurial is asking "how many hardlinks are on this file?".
>
> We found out that this answer is almost always wrong (always reporting
> 1, even if it is in fact >1) iff the hg process is running on one
> Windows computer and the repository files are on a network share on a
> different Windows computer.
>

Considering hard links are all about reference counting, it amazes me that
the OS can't be bothered to provide an accurate reference count.  Sigh.


> Due to that Windows API implementation silliness (even with the latest
> and greatest Windows 7), mercurial failed to break hardlinks for
> repositories on network shares for quite a long time now (years, that
> is). This means older mercurial versions that don't have the fix may
> corrupt repositories pretty seriously when committing or pushing to a
> repo on a network share.
>
> The fix now detects if the repo is write-accessed *over a network share*
> and then unconditionally assumes all files are hardlinked. So it
> unconditionally executes the "breaking hardlink" method for all files it
> is writing to on a network share (making it slower but safe now).
>
> Breaking a hardlink in this situation means creating a normal full blown
> copy of the file before writing to it, completely separating both files
> from each other (writes to one file no longer affect the other).
>
> Not copying such a file if it is hardlinked means the file modification
> appears in all clones (even though it should be done only on the clone
> where the commit or push is made to).
>
> Clones that were done using 'hg clone --pull' or using Windows Explorer
> should not be affected.
>

I'm not sure whether --pull was used since all of the clones were created
using TortoiseHg 1.1.1.  I didn't check the "Use pull protocol to copy
metadata" box, but from the docs I've seen, it seems that Mercurial uses hard
links in some situations but not in others.  How do I check if hard links
were used?  Both servers are Windows 2008 x64.  I've seen mentions of some
older Windows Resource Kit tools for searching for hard links but can't seem
to find anything for Win 2008.
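
For reference, a minimal sketch of how such a check could look in Python,
assuming a Python 3 interpreter is available on the machine that hosts the
repository. The function name find_hardlinked and the example path are
hypothetical, not part of Mercurial or TortoiseHg. It walks .hg/store and
reports every file whose link count is greater than 1. Given the problem
described above, it would have to be run locally on the server that holds the
files, not across the network share, since the count reported over a share may
be wrong.

```python
import os


def find_hardlinked(store_dir):
    """Return the paths under store_dir whose hard link count is > 1.

    Hypothetical helper: run it locally on the machine that holds the
    files, because link counts read over a network share may be reported as 1.
    """
    linked = []
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_nlink is the number of directory entries pointing at this file
            if os.stat(path).st_nlink > 1:
                linked.append(path)
    return linked


if __name__ == "__main__":
    # Example path only; substitute the real location of the repository.
    for path in find_hardlinked(r"D:\repos\staging\.hg\store"):
        print(path)
```

An empty result would suggest the store shares no files with another clone on
the same volume.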

Also what do you mean by "using Windows Explorer should not be affected"?
Does this include a Windows Explorer extension like THG?


> >
> >     Y: looks like a network drive. Do you use Mercurial 1.6.2 or later
> >     which contains the fix http://selenic.com/repo/hg//rev/50523b4407f6?
>
> I think the fix was first released with 1.6.3, according to
> http://mercurial.selenic.com/wiki/WhatsNew (Mads confirmed this on IRC
> in the meantime).
>
> >     If that doesn't explain and fix it: Can you check if files in the
> >     two repositories are hardlinked to each other?
> >
> >
> > Sorry if this is a silly question but how would I do that?
>
> The question is: did you do a 'hg clone' without --pull? In your
> situation, to be safe, I would assume this happened and act accordingly.
>

I'm not sure...see above.


> Stop write access to the share, restore the corrupted repositories from
> backup and upgrade all Mercurial installs on all workstations and
> servers to a version that has the fix.
>


> Then enable write access on the share again.
>

The "restore from backup" part might be tricky.  I believe the initial repo
was the staging one and the rest were cloned from it but I can't be sure.
If I can find some way to check whether a repo is using hard links or not, I
could use that as the "source" repo for re-creating the rest.
Alternatively, is there some way to take a current repo that passes "verify"
and break all the hard links to avoid this silliness in the Windows API?
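
As a rough illustration of what "breaking all the hard links" could mean, here
is a minimal Python 3 sketch. The function name break_hardlinks and the
".unlink-tmp" suffix are hypothetical, and this is not how Mercurial itself
does it. It follows the idea quoted above: make a full copy of every file and
put the copy in place of the original, without trusting the reported link
count. It assumes nobody is writing to the repository while it runs and that a
backup exists.

```python
import os
import shutil


def break_hardlinks(store_dir):
    """Replace every file under store_dir with an independent copy.

    Hypothetical helper: it copies unconditionally because the link count
    cannot be trusted over a network share. Returns the number of files copied.
    """
    copied = 0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            path = os.path.join(root, name)
            tmp = path + ".unlink-tmp"  # hypothetical temporary name
            shutil.copy2(path, tmp)     # full copy with its own storage
            os.replace(tmp, path)       # the name now points at the copy
            copied += 1
    return copied
```

After this, a write to a file in one repository should no longer show up in
the other clones. Re-cloning with 'hg clone --pull', which is mentioned above
as unaffected, would be the other way to get a repository without shared files.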

> I think 1.6.3 or newer *should* be safe to use for committing and
> pushing to Windows network shares.
>
> Although I do not consider pushing and committing to a share
> particularly robust.
>
> [1] http://en.wikipedia.org/wiki/Hard_link
>

I'm not sure how to parse this...you say committing/pushing to a network
share "should be safe" but then say you don't consider it "particularly
robust".  With something as important as source control, it's either got to
work 100% or it's not worth the hassle of losing data.  As a result, I'd love
to see some official guidance from the Mercurial devs on whether or not
pushes/commits using network shares are reliable.

Thanks for your help Adrian!