A hacker's view of .hg [SOLVED]

Bob Hood bhood2 at comcast.net
Tue Jul 10 14:13:31 UTC 2018


In the absence of an immediate response from anybody with intimate knowledge 
of the internals here, I took it upon myself to do some cursory research.

I looked through the dirstate.py file in the Mercurial distribution, and did 
some tests, changing the state of a file and watching the reaction in the 
.hg/dirstate file.  My conclusion is that any state change Mercurial itself 
must carry out (add, remove, etc.) is recorded in the .hg/dirstate file.  
Simply modifying a managed file, however, does not involve a Mercurial 
command, so nothing is noted in .hg/dirstate.  The only way Mercurial then 
learns about the modified file is as a side effect of some other Mercurial 
command (status, id, etc.).
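To make the observation above concrete, here is a minimal sketch that reads the dirstate and lists each tracked file's recorded state.  It assumes the "dirstate-v1" layout used by Mercurial 4.x (two 20-byte parent hashes, then one big-endian ">cllll" header plus file name per entry).  That layout is an internal format, not a stable API, and parse_dirstate is a hypothetical helper name.  Files that were only edited keep their "n" (normal) state here, which is exactly why the dirstate alone cannot tell you the working copy is dirty.

```python
import struct

# Assumed dirstate-v1 layout (Mercurial 4.x era; internal, may change):
#   40 bytes: parent 1 and parent 2 changeset hashes, 20 bytes each
#   then per tracked file: state (1 byte), mode, size, mtime, name length
#   (4 bytes each, big-endian), followed by the name ("name\0source" if copied)
_ENTRY = struct.Struct(">cllll")

STATES = {"n": "normal", "a": "added", "r": "removed", "m": "merged"}


def parse_dirstate(data):
    """Return (p1, p2, entries); entries maps name -> (state, mode, size, mtime, copy_source)."""
    if len(data) < 40:
        raise ValueError("dirstate too short")
    p1, p2 = data[:20], data[20:40]
    entries = {}
    pos = 40
    while pos < len(data):
        state, mode, size, mtime, flen = _ENTRY.unpack_from(data, pos)
        pos += _ENTRY.size
        raw = data[pos:pos + flen]
        pos += flen
        # A copied file stores "name\0source"; partition splits the two parts.
        name, _, source = raw.partition(b"\0")
        entries[name] = (state.decode("ascii"), mode, size, mtime, source or None)
    return p1, p2, entries
```

Usage, from a working copy root: `p1, p2, entries = parse_dirstate(open(".hg/dirstate", "rb").read())`, then `[n for n, e in entries.items() if e[0] in "arm"]` gives the changes Mercurial already recorded itself.  Plain edits will not appear in that list.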

Mercurial commands that require knowledge of the complete repository state 
trigger a traversal of the entries in the .hg/dirstate file, checking for 
status changes that are not implicitly cached by other commands.  So, the 
overhead I'm experiencing is a direct function of the /number/ of cached 
entries in the .hg/dirstate file.  Mercurial must traverse all managed files, 
checking for state changes (in the case of modification, I'm guessing it 
compares time stamps).  So, the amount of time it takes to perform a 
"status" varies with the /count/ of managed files in the working copy.  
Hence, a repository with a single file will perform a "status" much faster 
than one with tens of thousands of files.  Of course, the I/O speed of the 
drive also plays a role.
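The traversal described above can be sketched as one stat() call per tracked file, compared against the size and mtime recorded at the last Mercurial operation.  This is a simplified illustration, not Mercurial's exact algorithm.  As far as I know, real Mercurial treats a size mismatch as a definite modification but falls back to comparing file contents when only the timestamp differs, whereas this sketch reports any mismatch as "possibly modified".  The entry shape `(state, size, mtime)` and the name possibly_modified are assumptions for illustration.

```python
import os


def possibly_modified(root, entries):
    """Return tracked names whose on-disk stat no longer matches the recorded one.

    `entries` (hypothetical shape) maps a relative name to (state, size, mtime).
    Only "n" (normal) entries need a stat(); other states are already known changes.
    """
    result = []
    for name, (state, size, mtime) in entries.items():
        if state != "n":
            result.append(name)  # the dirstate itself already records a change
            continue
        try:
            st = os.stat(os.path.join(root, name))
        except FileNotFoundError:
            result.append(name)  # tracked, but missing from disk
            continue
        # One stat() per tracked file: this loop is why cost grows with file count.
        if st.st_size != size or int(st.st_mtime) != mtime:
            result.append(name)
    return result
```

The loop has no shortcut: the only way to prove a file is unchanged is to look at it, so the cost is linear in the number of tracked files and bounded below by filesystem metadata speed.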

I would guess that the overhead of the Python runtime in this is pretty much 
constant, taking as much time for a working copy with a single file as it does 
for one with thousands.
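One rough way to check this guess is to time the two components separately: a fixed cost (starting and exiting a bare Python interpreter) and a per-file cost (stat() over N files).  This is only a sketch under stated assumptions.  A bare interpreter is a lower bound on hg's startup, since hg also imports its own modules.  Freshly created files are likely in the OS cache, so the stat timings underestimate a cold-disk walk.  The function names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import time


def interpreter_startup_seconds(runs=5):
    """Average wall time to start and exit a bare Python interpreter (fixed cost)."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def stat_walk_seconds(n_files):
    """Wall time to stat() n_files freshly created empty files (per-file cost)."""
    with tempfile.TemporaryDirectory() as root:
        paths = [os.path.join(root, "f%d" % i) for i in range(n_files)]
        for p in paths:
            open(p, "w").close()
        start = time.perf_counter()
        for p in paths:
            os.stat(p)
        return time.perf_counter() - start


if __name__ == "__main__":
    print("interpreter startup: %.3f s" % interpreter_startup_seconds())
    for n in (1, 1000, 5000):
        print("stat %5d files: %.4f s" % (n, stat_walk_seconds(n)))
```

If the startup figure stays flat while the stat figure grows roughly with N, that is consistent with a constant runtime overhead plus a traversal cost proportional to the file count.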

So it seems that, without some persistent process running in the background 
monitoring what is otherwise the /unmonitored/ state of a working copy (like 
the command server, or the TortoiseHG Overlay Icon Server), the overhead 
incurred is simply the result of the design of Mercurial, and cannot be 
avoided. There's no way to determine the accurate "dirtiness" of a working 
copy without traversing it.
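For the persistent-process route, a prompt could keep one `hg serve --cmdserver pipe` process alive and send it commands, so the interpreter and Mercurial startup cost is paid once instead of on every prompt.  Each `id` or `status` still walks the working copy, though.  The sketch below frames messages the way I understand the command server's pipe protocol: a 1-byte channel plus a 4-byte big-endian length, with "runcommand" requests carrying NUL-separated arguments.  The helper names are hypothetical, and how to keep such a process alive from a Cmder prompt is not covered here.

```python
import struct
import subprocess


def read_message(stream):
    """Read one (channel, payload) frame from a Mercurial command server.

    Data channels such as 'o' (output), 'e' (error) and 'r' (result) carry
    `length` payload bytes; input requests ('I', 'L') carry none, and the
    length is the amount of input requested.
    """
    header = stream.read(5)
    if len(header) < 5:
        raise EOFError("command server closed the pipe")
    channel, length = struct.unpack(">cI", header)
    if channel in (b"I", b"L"):
        return channel, length
    return channel, stream.read(length)


def encode_runcommand(args):
    """Frame a 'runcommand' request: NUL-joined args prefixed by their length."""
    data = b"\0".join(a.encode("utf-8") for a in args)
    return b"runcommand\n" + struct.pack(">I", len(data)) + data


def run(server, args):
    """Send one command over an open server pipe; return (exit_code, stdout_bytes)."""
    server.stdin.write(encode_runcommand(args))
    server.stdin.flush()
    out = []
    while True:
        channel, payload = read_message(server.stdout)
        if channel == b"o":
            out.append(payload)
        elif channel == b"r":
            return struct.unpack(">i", payload)[0], b"".join(out)
        elif channel in (b"I", b"L"):
            raise RuntimeError("command unexpectedly asked for input")
        # other channels (e.g. 'e' for stderr) are ignored in this sketch


def start_server(repo):
    """Start one long-lived command server and consume its hello frame."""
    server = subprocess.Popen(["hg", "serve", "--cmdserver", "pipe"], cwd=repo,
                              stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    read_message(server.stdout)  # hello message on channel 'o' (capabilities, encoding)
    return server
```

Usage: `server = start_server(r"C:\work\repo")` once, then `code, out = run(server, ["id", "-i"])` on each prompt.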


On 7/7/2018 1:33 PM, Bob Hood wrote:
> I recently discovered a pretty nice FLOSS shell system for Windows.  Being a 
> UN*X geek, I tend to do a lot of my work from the command line, and for 
> years, used TCC LE as a result.  However, the newest version of Visual 
> Studio has kind of broken TCC LE for me, so in searching for a replacement, 
> I came across Cmder[1], and kind of fell in love. :)
>
> As you can imagine, most of the customization efforts on projects like this 
> are for git.  The Mercurial support that Cmder has is horribly inefficient, 
> with the prompt code that displays the branch and working copy status taking 
> between 5 and 10 seconds to complete on my working copy each time the prompt 
> was generated.  They were using some horrendous command that generated 
> tons of output just to determine if there were changes pending in the 
> working copy.  I refactored that code to use "id" instead, since it provides 
> both the branch name and an indication of the status of pending changes 
> within the working copy.  (I have issued a pull request on the main project 
> for these changes.)
>
> This reduced the huge time overhead down to about 1 second on my 
> repository.  However, when I went into a git working copy, the info 
> response for its branch and status within Cmder was just about instant.  
> Certainly, 1 second is better than 10, but even that 1 second gets a bit 
> tiresome when you have many things to do from the command line.
>
> So, my question here has to do with the internals of the .hg folder.  I see 
> that there is a "branch" file inside it that contains the value of the 
> active branch in the working copy, so reading that directly (instead of 
> trying to parse it from "hg id") would be very fast.  However, is there some 
> corresponding indicator within the .hg folder that holds the state of the 
> working copy with regard to pending changes (i.e., if it's "dirty" or 
> "clean")?  Reading something like that directly would hopefully also be 
> /much/ faster than waiting for the output of the "id" command.
>
> Thanks in advance for any insights, or alternative approaches.
>
> (btw, I'm using the hg.exe that comes with TortoiseHG, and it is v4.6.1)
>
>
> [1] https://github.com/cmderdev/cmder
>
>
>
> _______________________________________________ Mercurial mailing list 
> Mercurial at mercurial-scm.org 
> https://www.mercurial-scm.org/mailman/listinfo/mercurial
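Putting the quoted idea together with the conclusion above, a prompt helper could read the branch name cheaply from .hg/branch and still rely on `hg id -i` for dirtiness, where a trailing "+" on the printed id marks uncommitted changes.  This is a sketch under stated assumptions.  It treats a missing or empty .hg/branch as "default", and the function names are hypothetical.  Only the `hg id` call walks the working copy.

```python
import os
import subprocess


def read_branch(repo_root):
    """Return the working-copy branch from .hg/branch (one small file read).

    Assumption: a missing or empty .hg/branch means the "default" branch.
    """
    try:
        with open(os.path.join(repo_root, ".hg", "branch"), "rb") as fh:
            name = fh.read().strip()
    except FileNotFoundError:
        return "default"
    return name.decode("utf-8", "replace") or "default"


def is_dirty(id_output):
    """True if `hg id -i` output (e.g. 'a1b2c3d4e5f6+') ends with the '+' marker."""
    return id_output.strip().endswith("+")


def prompt_info(repo_root):
    """Branch from the file; dirtiness from `hg id -i` (this part still walks the tree)."""
    out = subprocess.run(["hg", "id", "-i"], cwd=repo_root, capture_output=True,
                         text=True, check=True).stdout
    return read_branch(repo_root), is_dirty(out)
```

Reading the branch file saves one parse step but not the traversal.  The dirty check remains the expensive part.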
