A hacker's view of .hg [SOLVED]
Nathan Goldbaum
nathan12343 at gmail.com
Tue Jul 10 14:25:26 UTC 2018
You might find the fsmonitor extension interesting:
https://www.mercurial-scm.org/wiki/FsMonitorExtension
On Tue, Jul 10, 2018 at 9:13 AM, Bob Hood <bhood2 at comcast.net> wrote:
> In the absence of an immediate response from anybody with intimate
> knowledge of the internals here, I took it upon myself to do some cursory
> research.
>
> I looked through the dirstate.py file in the Mercurial distribution, and
> did some tests, changing the state of a file and watching the reaction in
> the .hg/dirstate file. My conclusion is that any state change that
> Mercurial itself has to effect -- add, remove, etc. -- is recorded in the
> .hg/dirstate file. Simply modifying a managed file, however, does not
> involve a Mercurial command, so nothing is noted in .hg/dirstate at that
> point. The only way Mercurial would then know about the modified
> file is as a result of a side effect of some other Mercurial command --
> status, id, etc.
>
> Mercurial commands that require knowledge of complete repository states
> will trigger a traversal of the contents of the .hg/dirstate file, checking
> for status changes that are not implicitly cached by other commands. So,
> the overhead I'm experiencing is a direct function of the *number* of the
> cached entries in the .hg/dirstate file. Mercurial must traverse all
> managed files, checking for state changes (in the case of modification, I'm
> guessing it's probably checking time stamps). So, the amount of time it
> takes to perform a "status" will vary depending on the *count* of the
> managed files in the working copy. Hence, a repository with a single file
> will perform a "status" much faster than one with tens of thousands of
> files. Of course, the I/O speed of the drive then also plays a role.
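
To make the cost described above concrete, here is a minimal sketch of a
stat-based check in Python. It is not Mercurial's implementation: the
function names snapshot() and changed_files() are hypothetical, and the
in-memory dict only stands in for a cached state file such as .hg/dirstate.
The point is that every tracked path needs at least one stat() call, so the
work grows with the number of entries.

```python
import os


def snapshot(paths):
    """Record (size, mtime_ns) for each path.

    Hypothetical stand-in for a cached state file; not Mercurial's format.
    """
    cached = {}
    for path in paths:
        st = os.stat(path)
        cached[path] = (st.st_size, st.st_mtime_ns)
    return cached


def changed_files(cached):
    """Return the paths whose current stat data differs from the snapshot.

    One os.stat() call is made per cached entry, so the running time is
    proportional to the number of tracked files.
    """
    changed = []
    for path, recorded in cached.items():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # A file that disappeared counts as a change.
            changed.append(path)
            continue
        if (st.st_size, st.st_mtime_ns) != recorded:
            changed.append(path)
    return changed
```

A check like this can miss an edit that leaves both size and timestamp
untouched, which is one reason a real tool may need to fall back to
comparing file contents in some cases.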
>
> I would guess that the overhead of the Python runtime in this is pretty
> much constant, taking as much time for a working copy with a single file as
> it does for one with thousands.
>
> So it seems that, without some persistent process running in the
> background monitoring what is otherwise the *unmonitored* state of a
> working copy (like the command server, or the TortoiseHG Overlay Icon
> Server), the overhead incurred is simply the result of the design of
> Mercurial, and cannot be avoided. There's no way to determine the accurate
> "dirtiness" of a working copy without traversing it.
>
>
> On 7/7/2018 1:33 PM, Bob Hood wrote:
>
> I recently discovered a pretty nice FLOSS shell system for Windows. Being
> a UN*X geek, I tend to do a lot of my work from the command line, and for
> years, used TCC LE as a result. However, the newest version of Visual
> Studio has kind of broken TCC LE for me, so in searching for a replacement,
> I came across Cmder[1], and kind of fell in love. :)
>
> As you can imagine, most of the customization efforts on projects like
> this are for git. The Mercurial support that Cmder has is horribly
> inefficient, with the prompt code that displays the branch and working copy
> status taking between 5 and 10 seconds to complete on my working copy each
> time the prompt was being generated. They were using some horrendous
> command that generated tons of output just to determine if there were
> changes pending in the working copy. I refactored that code to use "id"
> instead, since it provides both the branch name and an indication of the
> status of pending changes within the working copy. (I have issued a pull
> request on the main project for these changes.)
>
> This reduced the huge time overhead down to about 1 second on my
> repository. However, when I went into a git working copy, the info
> response for its branch and status within Cmder was just about instant.
> Certainly, 1 second is better than 10, but even that 1 second gets a bit
> tiresome when you have many things to do from the command line.
>
> So, my question here has to do with the internals of the .hg folder. I
> see that there is a "branch" file inside it that contains the value of the
> active branch in the working copy, so reading that directly (instead of
> trying to parse it from "hg id") would be very fast. However, is there
> some corresponding indicator within the .hg folder that holds the state of
> the working copy with regard to pending changes (i.e., if it's "dirty" or
> "clean")? Reading something like that directly would hopefully also be
> *much* faster than waiting for the output of the "id" command.
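
Reading the branch file directly, as suggested above, could look roughly
like this in Python. This is a sketch under assumptions: read_branch() is a
hypothetical helper, repo_root is assumed to be the directory that contains
.hg (no search through parent directories is done), and a missing or empty
branch file is treated as the "default" branch.

```python
from pathlib import Path


def read_branch(repo_root):
    """Return the working copy's branch name by reading .hg/branch.

    Assumes repo_root is the directory that contains the .hg folder.
    A missing or empty file is treated as the "default" branch.
    """
    branch_file = Path(repo_root) / ".hg" / "branch"
    try:
        name = branch_file.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return "default"
    return name or "default"
```

This only covers the branch name. It says nothing about whether the working
copy has pending changes, which is the more expensive half of the question.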
>
> Thanks in advance for any insights, or alternative approaches.
>
> (btw, I'm using the hg.exe that comes with TortoiseHG, and it is v4.6.1)
>
>
> [1] https://github.com/cmderdev/cmder
>
>
>
> _______________________________________________
> Mercurial mailing list
> Mercurial at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial