[PATCH 05 of 14] obsstore: add a method to incrementally retrieve obsmarkers
Augie Fackler
raf at durin42.com
Fri Jul 14 18:07:16 UTC 2017
On Sun, Jul 09, 2017 at 07:55:17PM +0200, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld at octobus.net>
> # Date 1497674383 -7200
> # Sat Jun 17 06:39:43 2017 +0200
> # Node ID 7c41d1484f53c53d516153d19a90c331c3f87c16
> # Parent e73d330f6fdb0cf1635920f4302b61b321ec58e2
> # EXP-Topic obs-cache
> obsstore: add a method to incrementally retrieve obsmarkers
This one and its follow-up both look interesting, but I don't know if
we have any imminent callers?
>
> Parsing the full obsstore is slow, so cache that depends on obsstore content
> wants a way to know if the obsstore changed, and it this change was append
> only.
>
> For this purpose we introduce an official cache-key for the obsstore. This
> cache-key work in a way similar to the '(tiprev, tipnode)' pair used for the
> changelog. We use the size of the obsstore file and the hash of its tail. That
> way, we can check if the obsstore grew and if the content we knew is still
> present in the obsstore.
>
> The cachekey comes with a method that only returns the obsmarkers that changed
> between the last seen "cache-key" value and the current state of the
> repository. That method is race free and should be used by caches. See
> documentation for details.
>
> note: we are reading the full obsstore file from disk as part of the key
> validation process. This could be avoided but we keep the implementation
> simple for now. Once it is in place and running we can iterate to make it
> better.
>
> diff -r e73d330f6fdb -r 7c41d1484f53 mercurial/obsolete.py
> --- a/mercurial/obsolete.py Sun Jun 04 10:02:09 2017 -0700
> +++ b/mercurial/obsolete.py Sat Jun 17 06:39:43 2017 +0200
> @@ -70,6 +70,7 @@
> from __future__ import absolute_import
>
> import errno
> +import hashlib
> import struct
>
> from .i18n import _
> @@ -513,6 +514,10 @@
> # parents: (tuple of nodeid) or None, parents of precursors
> # None is used when no data has been recorded
>
> + # how much data to read at the end of file,
> + # 1024 byte should be about 10 markers in average
> + _obskeyspan = 1024
> +
> def __init__(self, svfs, defaultformat=_fm1version, readonly=False):
> # caches for various obsolescence related cache
> self.caches = {}
> @@ -540,6 +545,72 @@
>
> __bool__ = __nonzero__
>
> + def getmarkerssince(self, previouscontentkey):
> + """retrieve all new markers (since "contentkey") + updated content key
> +
> + This function is to be used by cache depending on obsstore content. It
> + make sure cache can incrementally update themselves without fear
> + obsstore stripping or race condition. If the content key is invalid
> + (some obsstore content got stripped), all markers in the obsstore will
> + be returned.
> +
> + return: (reset, obsmarkers, contentkey)
> +
> + :reset: boolean, True if previouscontentkey was invalided. Previously
> + stored data are invalid and should be discarded. Full markers
> + content is return in this case.
> +
> + :obsmarkers: the list of new obsmarkers.
> +
> + :contentkey: a key matching the content of 'previouscontentkey' +
> + 'obsmarkers'.
> +
> + The content key is a pair of:
> +
> + (obsstore size, hash of last N bytes of the obsstore)
> +
> + It must be kept around by cache and provided again for the next
> + incremental read of new obsmarkers.
> + """
> + # XXX This function could avoid loading the whole data from disk (and
> + # only read new markers). It currently use 'self._data' to keep the
> + # code simpler.
> + keysize, keyhash = previouscontentkey
> + fulldata = self._data
> +
> + # check the existing cache key
> + reset = False
> + if len(fulldata) < keysize: # less data than expected, this is a strip
> + reset = True
> + else:
> + if keysize == 0: # no obsstore
> + actualhash = node.nullid
> + else:
> + first = max(0, keysize - self._obskeyspan)
> + keydata = fulldata[first:keysize]
> + actualhash = hashlib.sha1(keydata).digest()
> + reset = actualhash != keyhash
> + newsize = len(fulldata)
> +
> + # read the new data
> + if reset:
> + keysize = None # read all data
> + elif keysize == newsize:
> + # no update since last change, exist early
> + return False, [], previouscontentkey
> + if newsize:
> + start = max(0, newsize - self._obskeyspan)
> + newhash = hashlib.sha1(fulldata[start:newsize]).digest()
> + __, obsmarkers = _readmarkers(fulldata, keysize, newsize)
> + else:
> + # obsstore is empty, use a generic hash and skip reading markers.
> + newhash == node.nullid
> + obsmarkers = []
> +
> + # for now and for the sake of simplicity make sure obsmarkers is a list
> + obsmarkers = list(obsmarkers)
> + return reset, obsmarkers, (newsize, newhash)
> +
> @property
> def readonly(self):
> """True if marker creation is disabled
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
More information about the Mercurial-devel
mailing list