[PATCH 3 of 3 RFC] localrepository: add caches to transaction (issue4255)
Gregory Szorc
gregory.szorc at gmail.com
Wed May 28 18:37:30 UTC 2014
On 5/28/14, 10:40 AM, Durham Goode wrote:
> On 5/27/14, 11:00 PM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1401253147 25200
>> # Tue May 27 21:59:07 2014 -0700
>> # Node ID 12c2a7ae1d443e6ea89ea6f8f1032bf2178af15d
>> # Parent 02939296b8052833d4c8bfa6e2e2f1371fe5c517
>> localrepository: add caches to transaction (issue4255)
>>
>> localrepository.transaction() now creates a chained transaction bound
>> to the .hg opener in addition to the .hg/store opener. This new
>> transaction creates backups of most files under .hg/cache.
>>
>> With this change, transaction rollbacks will restore caches, thus
>> preventing cache invalidation and recomputation. This will result
>> in considerable performance wins on repositories where cache
>> generation is slow.
>>
>> On a repository with 12,421 heads and manifests with near 100,000
>> entries, branch caches can take ~17 minutes to generate
>> (259s for base, 68s for immutable, 723s for served). On a server
>> that had hooks causing branch cache updates, transaction rollbacks
>> resulted in branch cache invalidation and an expensive computation.
>> This patch makes the problem go away.
>>
>> THIS PATCH CURRENTLY CAUSES A REGRESSION IN test-fncache.t.
>> THIS PATCH SHOULD ALSO HAVE EXPLICIT TEST COVERAGE THAT CACHES ARE
>> RESTORED.
>>
>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>> --- a/mercurial/localrepo.py
>> +++ b/mercurial/localrepo.py
>> @@ -848,8 +848,16 @@ class localrepository(object):
>>
>> def wwritedata(self, filename, data):
>> return self._filter(self._decodefilterpats, filename, data)
>>
>> + def _cachetransactionblacklist(self):
>> + """Set of cache files that are not safe to include in
>> transactions.
>> +
>> + Files in .hg/cache/ are included in transactions by default. To
>> + prevent that from happening, add an entry to this set.
>> + """
>> + return set(['storehash'])
>> +
>> def transaction(self, desc, report=None):
>> tr = self._transref and self._transref() or None
>> if tr and tr.running():
>> return tr.nest()
>> @@ -870,8 +878,27 @@ class localrepository(object):
>> "journal",
>> aftertrans(renames),
>> self.store.createmode,
>> onclose)
>> +
>> + # Add caches to a chained transaction.
>> + if self.ui.configbool("format", "usestore", True):
>> + try:
>> + def noreport(msg):
>> + pass
>> + tr2 = transaction.transaction(noreport, self.opener,
>> + "journal",
>> +
>> createmode=self.store.createmode)
>> + blacklist = self._cachetransactionblacklist()
>> + for filename, mode in self.vfs.readdir("cache"):
>> + if filename not in blacklist:
>> + tr2.addbackup("cache/%s" % filename,
>> hardlink=False)
>> + tr.chaintransaction(tr2)
>> +
>> + except OSError, e:
>> + if e.errno != errno.ENOENT:
>> + raise
>> +
>> self._transref = weakref.ref(tr)
>> return tr
>>
>> def _journalfiles(self):
>
>
> I’m not sure I like the chained transaction concept. How is the chain
> represented on disk (i.e. if the user ctrl+c’s, how does rollback handle
> it)? What happens if the _abort for one transaction fails or the process
> is killed before all the chained transactions are aborted? What if the
> same file is added to two transactions (perhaps by a poorly written
> extension)? These caches can be written outside the lock right (like
> during read only operations)? So what happens if someone modifies a cache
> outside the lock while another process is in the middle of transacting it?
>
> I like how elegantly this addresses your problem, but I think the
> transaction model is already fragile and difficult enough to reason about,
> and I think adding multiple simultaneous transactions will make it even
> harder to reason about transaction behavior in the future. If we want the
> caches to be transacted, I think we need to refactor the existing
> transaction to support them, make sure they are never written outside of a
> transaction, and make sure they are never written during read only
> operations.
>
> I have a local commit where I’ve started doing this for the phase cache,
> but that’s one of the easier ones since it is mostly modified inside locks
> already.
Thank you for the feedback and for confirming some of the concerns I had.
Since it sounds like you are already going down this road, I will hold
off and build on top of whatever you land.
FWIW, the issue of cache population outside of locks is bothersome. If a
cache is invalidated, you may have multiple processes (e.g. hgweb)
rebuilding the cache simultaneously. This can easily bring a server to
its knees. Perhaps I should work on a patch series to address that?
More information about the Mercurial-devel
mailing list