[PATCH 0 of 3] Transactional support for rebase/strip to avoid permanent repository corruptions

Henrik Stuart hg at hstuart.dk
Wed Apr 15 20:34:29 UTC 2009


The following three patches solve a rather serious corruption problem
that we have experienced several times on NTFS file systems. We posit that
it is a general problem, but that it manifests infrequently on ext3, since
ext3 tries to guard against truncate failures (among other things).

The first patch fixes the problem that the transaction journal (typically
.hg/store/journal) is always deleted, regardless of whether an abort has
succeeded or not. Currently, there is no way to recover from partially
completed revlog updates, since the journal is always deleted.
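
A minimal sketch of the behaviour the first patch aims for, not the patch
itself. The abort() function, its arguments, and the (path, offset) entry
layout are hypothetical names chosen for illustration; the only point is
that the journal is removed when every truncate succeeded and kept otherwise.

```python
import os


def abort(journal_path, entries):
    """Roll back by truncating each file to its recorded offset.

    Hypothetical helper: `entries` is a list of (path, offset) pairs.
    Returns True if the rollback completed, False if it failed.
    """
    try:
        for path, offset in entries:
            with open(path, "r+b") as f:
                f.truncate(offset)
    except OSError:
        # Rollback failed part-way: keep the journal so that a later
        # recovery attempt still knows what has to be truncated.
        return False
    # Only a fully successful rollback makes the journal obsolete.
    os.unlink(journal_path)
    return True
```

With this shape, a failed truncate leaves the journal on disk instead of
silently discarding the only record of the intended offsets.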

The second patch solves two things. First, it fixes a race condition in the
use of a weakref, whose liveness was tested without first assigning the
referent to a strong reference. Second, the use of a weakref here does not
guarantee that a transaction that is no longer running will not be reused
the next time a transaction is requested. This poses a problem in the
following scenario:

  tr = repo.transaction()
  ...
  tr.close()

  tr = repo.transaction()
  ...
  tr.close()

as the second repo.transaction() may yield the previous transaction.
Furthermore, localrepository.transaction defines an "after" function that
renames journal* to undo*. If the second call to transaction returns the
first transaction, the "after" function will be called again, but by then
there are no journal files left to rename. This is solved by checking not
only whether the weakref is still alive, but also whether the transaction
is still running; if it is not, a new transaction is returned.
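
The check described above can be sketched roughly as follows. The
Transaction and Repo classes are simplified, hypothetical stand-ins and not
Mercurial's actual classes; the sketch only shows dereferencing the weakref
once into a strong reference and then testing whether that transaction is
still running.

```python
import weakref


class Transaction:
    """Hypothetical stand-in for a transaction object."""

    def __init__(self):
        self.count = 1  # nesting depth; 0 means finished

    def running(self):
        return self.count > 0

    def nest(self):
        self.count += 1
        return self

    def close(self):
        self.count -= 1


class Repo:
    """Hypothetical repository keeping only a weak reference."""

    def __init__(self):
        self._transref = None

    def transaction(self):
        # Dereference exactly once and hold the result in a strong
        # reference; testing self._transref() and then calling it again
        # would allow the object to be collected in between.
        tr = self._transref() if self._transref is not None else None
        if tr is not None and tr.running():
            return tr.nest()
        # No transaction, or the previous one has finished: start anew.
        tr = Transaction()
        self._transref = weakref.ref(tr)
        return tr
```

In the scenario above, the first tr is still referenced by the caller after
tr.close(), so the weakref alone would still be alive; it is the running()
test that forces a fresh transaction.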

The alternative would have been to delete the transaction inside repair.strip,
but this would mean that we would never allow strip to be called nested
inside another transaction, something that we were not comfortable with
just decreeing (addchangegroup has this limit currently, but is it
intentional?).

The last patch depends on the first two and solves a repository corruption
issue when using strip or rebase that we have been seeing too frequently on
Windows. It does this by collecting all the revlog filenames that need to
be stripped, adding these to a transaction, then truncating the files and
committing. If we fail along the way, the transaction's __del__ method will
be invoked and the abort will run (note that the abort does exactly the
same thing: it truncates each file). The difference is that if the abort
also fails, the journal will still be there, and hg recover can, and must,
be run until the repository is correctly stripped and in a good state
again.
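
To illustrate why rerunning the recovery is safe, here is a rough sketch of
the journal-then-truncate flow. All function names are hypothetical, and
the one-entry-per-line journal layout is an assumption made for the
example, not a statement about the on-disk format. Since truncating a file
to an offset it has already been cut to is a no-op, the replay can be
repeated until it completes.

```python
import os


def write_journal(journal_path, entries):
    """Record every (path, offset) pair before touching any file."""
    with open(journal_path, "w") as j:
        for path, offset in entries:
            j.write("%s\0%d\n" % (path, offset))
        j.flush()
        os.fsync(j.fileno())


def truncate_all(entries):
    for path, offset in entries:
        with open(path, "r+b") as f:
            f.truncate(offset)


def strip(journal_path, entries):
    write_journal(journal_path, entries)
    truncate_all(entries)
    # "Commit": the journal is only removed once every file is truncated.
    os.unlink(journal_path)


def recover(journal_path):
    """Replay the journal; safe to rerun after a partial failure."""
    entries = []
    with open(journal_path) as j:
        for line in j:
            path, offset = line.rstrip("\n").split("\0")
            entries.append((path, int(offset)))
    truncate_all(entries)
    os.unlink(journal_path)
```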

We make no effort to unbundle the saved bundle when hg recover is run on
something that is only partially stripped. This would require that the 
transaction journal be extended for these types of changes as well and 
that would needlessly complicate matters right now. One can, in this 
instance, merely unbundle the saved bundle and get to the right state.

I am afraid there is no easy way to test this in an automated script,
so no new hg tests have been added.

-- 
Kind regards,
  Henrik Stuart

