[PATCH 01 of 15] hgext: `speedy` - the history query accelerator

Tomasz Kłeczek tomasz.kleczek at gmail.com
Tue Dec 11 23:28:23 UTC 2012


I accidentally based my patches on stable; I'll fix it in V2.


On Tue, Dec 11, 2012 at 10:38 AM, Tomasz Kleczek <tkleczek at fb.com> wrote:

> # HG changeset patch
> # User Tomasz Kleczek <tkleczek at fb.com>
> # Date 1355250659 28800
> # Branch stable
> # Node ID 13c6bcb8dd900dc7dbf5e3da9ef68d56fed250b3
> # Parent  8973e7dd92d5afdeb82e91d5b66934d53a74e8da
> hgext: `speedy` - the history query accelerator
>
> This is the first in a series of patches that address the problem
> of hg log being slow on a big repo in most cases.
>
>
> Many history queries have performance linear in the number of commits
> or, even worse, in the size of the manifest.
>
> As a result, almost every hg log query with a non-trivial rev range or
> with a directory specified is painfully slow on big repositories,
> and its poor performance doesn't depend on the actual output size.
>
> Here are some frequently used commands that just print a handful of
> log messages:
>   * hg log dir_with_few_commits
>   * hg log --rev "user(MyLazyFriend)"
>
> They may take more than 30 secs on a big repo (with a couple hundred
> thousand commits).
>
> This extension addresses this problem by introducing a server component
> that maintains indices over the history and uses them to respond to queries
> in an efficient manner.
>
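
To make the idea of an index over the history concrete, here is a minimal
standalone sketch of an author index. It is not the server code from this
series: `build_author_index`, `author_revs` and the (rev, author) input
format are hypothetical names chosen for illustration, and the substring
match only approximates what the `author` revset does.

```python
from collections import defaultdict

def build_author_index(changelog):
    """Map each author name to the list of revision numbers they committed.

    `changelog` is a hypothetical iterable of (rev, author) pairs.
    """
    index = defaultdict(list)
    for rev, author in changelog:
        index[author].append(rev)
    return index

def author_revs(index, subset, needle):
    """Return the revisions in `subset` whose author contains `needle`.

    The match is a case-insensitive substring test.
    """
    needle = needle.lower()
    matching = set()
    for author, revs in index.items():
        if needle in author.lower():
            matching.update(revs)
    # Keep the order of the incoming subset.
    return [rev for rev in subset if rev in matching]

history = [(0, "testuser1"), (1, "testuser2"), (2, "testuser1")]
index = build_author_index(history)
result = author_revs(index, range(3), "testuser1")
```

With such an index the query cost depends on the number of distinct authors
and the size of the subset, not on reading every changeset.
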
> What is going on:
>
> * the client component forwards certain history queries to the server
>   and waits for a response
> * it takes into account that its history may have diverged from
>   the server's, and still gives correct answers
> * if the server doesn't respond fast enough or crashes, the client falls
>   back to computing the answer locally using the normal code path
> * extension setup time is negligible and there is no overhead for queries
>   that cannot be accelerated by the server
>
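
The fallback behaviour in the list above can be sketched as follows. This is
a standalone illustration in modern Python, not code from this series:
`query_with_fallback`, `slow_server`, `crashed_server` and the timeout values
are all assumptions made for the example.

```python
import concurrent.futures
import time

def query_with_fallback(remote_query, local_query, timeout):
    """Run remote_query, falling back to local_query on timeout or error."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(remote_query)
    try:
        return future.result(timeout=timeout)
    except Exception:
        # Covers both a slow server (timeout) and a failed one (any error).
        return local_query()
    finally:
        # Do not wait for a request that is still in flight.
        executor.shutdown(wait=False)

def slow_server():
    # Hypothetical server that answers too late.
    time.sleep(0.3)
    return "remote"

def crashed_server():
    # Hypothetical server that fails outright.
    raise IOError("server is down")

def local():
    return "local"

fast = query_with_fallback(lambda: "remote", local, timeout=1.0)
late = query_with_fallback(slow_server, local, timeout=0.05)
down = query_with_fallback(crashed_server, local, timeout=1.0)
```

Note that the abandoned request is not cancelled here; its thread simply runs
to completion in the background while the local answer is returned.
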
> The server can be run:
> * locally in the same process as client or
> * remotely, using a custom protocol over http to communicate
>
> Sample performance gains (while running a remote history server).
> All commands are run with the -l1 option so that displaying output in the
> terminal doesn't affect the measurements.
>
> Each command takes roughly 30 secs without the extension enabled.
>
>   hg log somedir -l1
>     -> time reduced to 2.4 sec
>
>   hg log --rev "author(someuser)" -l1
>     -> time reduced to 1.6 sec
>
>   hg log --rev "date(10/1/2012)" -l1
>     -> time reduced to 1.7 sec
>
>   hg log "relglob:**.html" -l1
>     -> time reduced to 2.4 sec
>
>   hg log . -l1
>     -> time reduced to 11.9 sec
>
> This patch introduces support for `author` revset query. More queries
> will be added in subsequent patches.
>
> diff --git a/hgext/speedy/__init__.py b/hgext/speedy/__init__.py
> new file mode 100644
> --- /dev/null
> +++ b/hgext/speedy/__init__.py
> @@ -0,0 +1,10 @@
> +# Copyright 2012 Facebook
> +#
> +# This software may be used and distributed according to the terms of the
> +# GNU General Public License version 2 or any later version.
> +
> +import client
> +
> +def uisetup(ui):
> +    if ui.configbool('speedy', 'client', False):
> +        client.uisetup(ui)
> diff --git a/hgext/speedy/client.py b/hgext/speedy/client.py
> new file mode 100644
> --- /dev/null
> +++ b/hgext/speedy/client.py
> @@ -0,0 +1,33 @@
> +# Copyright 2012 Facebook
> +#
> +# This software may be used and distributed according to the terms of the
> +# GNU General Public License version 2 or any later version.
> +
> +from mercurial import extensions, commands
> +from mercurial import revset
> +
> +def patchedauthor(repo, subset, x):
> +    """Return the revisions committed by the user whose name matches x
> +
> +    Used to monkey patch revset.author function.
> +    """
> +    # In the subsequent patches here we are going to forward the query
> +    # to the server
> +    return revset.author(repo, subset, x)
> +
> +def _speedysetup(ui, repo):
> +    """Initialize speedy client."""
> +    revset.symbols['author'] = patchedauthor
> +
> +def uisetup(ui):
> +    # Perform patching and most of the initialization inside log wrapper,
> +    # as this is only needed if log command is being used
> +    initialized = [False]
> +    def logwrapper(cmd, *args, **kwargs):
> +        repo = args[1]
> +        if not initialized[0]:
> +            initialized[0] = True
> +            _speedysetup(ui, repo)
> +        return cmd(*args, **kwargs)
> +
> +    extensions.wrapcommand(commands.table, 'log', logwrapper)
> diff --git a/tests/test-speedy.t b/tests/test-speedy.t
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-speedy.t
> @@ -0,0 +1,44 @@
> +Global config file
> +  $ cat >> $HGRCPATH <<EOF_END
> +  > [ui]
> +  > logtemplate = "{desc}\n"
> +  >
> +  > [extensions]
> +  > speedy=
> +  > EOF_END
> +
> +Preparing local repo
> +
> +  $ hg init localrepo
> +  $ cd localrepo
> +
> +  $ mkdir d1
> +  $ echo chg0 > d1/chg0
> +  $ hg commit -Am chg0 -u testuser1
> +  adding d1/chg0
> +  $ echo chg1 > d1/chg1
> +  $ hg commit -Am chg1 -u testuser2 --date "10/20/2012"
> +  adding d1/chg1
> +  $ echo chg2 > d1/chg2
> +  $ hg commit -Am chg2 -u testuser1
> +  adding d1/chg2
> +  $ mkdir d2
> +  $ echo chg3 > d2/chg3.py
> +  $ hg commit -Am chg3 -u testuser1
> +  adding d2/chg3.py
> +  $ echo chg4 > d2/chg4
> +  $ hg commit -Am chg4 -u testuser1
> +  adding d2/chg4
> +  $ echo chg5 > chg5.py
> +  $ hg commit -Am chg5 -u testuser1 --date "10/20/2012"
> +  adding chg5.py
> +
> +  $ hg log -r "reverse(user(testuser1))"
> +  chg5
> +  chg4
> +  chg3
> +  chg2
> +  chg0
> +
> +  $ hg log -r "reverse(author(testuser2))"
> +  chg1