A few questions about dirstate / 00changelog.i (revlogng)
Wujek Srujek
wujek.srujek at googlemail.com
Sun Apr 8 22:50:23 UTC 2012
Hi. I am trying to play around with these files in Mercurial. It is going
pretty well, but I have a few questions to which I cannot find an easy
answer, and before I attempt to reverse engineer the files, I would like to
try the easier way of asking ;d
Dirstate:
1. What encoding do the filenames use? I'm on Linux with a UTF-8 locale. I
created and added a file with Polish characters in its name, read the bytes
that constitute the file name, and (somewhat surprisingly) the bytes object
just printed out correctly - does this mean these names are stored as UTF-8,
or did the bytes only print nicely by chance because my local encoding is
UTF-8 as well?
00changelog.i (revlogng)
2. What does 'offset' mean in this file? It seems to be just the running sum
of the previous changeset's offset plus its compressed length (see the clen
column in question 6), and it makes a lot of sense for the separate
00changelog.d file, but I don't really understand how it fits the inline
scheme - it definitely isn't a position in the file, as it doesn't take the
64-byte index entries into account. Why does the inline revlogng need this
piece of information anyway (again, I perfectly understand the need when the
index and data files are separate)?
3. http://mercurial.selenic.com/wiki/RevlogNG says at the very bottom:
"RevlogNG also supports interleaving of index and data. This can greatly
reduce storage overhead for smaller revlogs. In this format, the data chunk
immediately follows its index entry. The position of the next index entry
is calculated by adding the compressed length to the offset."
The last sentence seems somewhat ambiguous: the position of the next
index entry is calculated by adding the compressed length to the offset,
plus 64 for the index entry, right? Unless 'offset' is not the offset field
from the entry but rather the file position at which one starts reading the
data, but that is even more confusing. Maybe cleaning up this part
would be a good idea (unless I am completely mistaken, ofc ;d).
4. What does the 'flags' field mean? Currently (in the inline changelog) I
see just zeros there.
5. When I read the data, how do I uncompress it? Is it gzip, bz or
something proprietary (doubt that)?
6. I have the following changelog (output of my changelog reader):
rev  offset  flags  clen  ulen  base  par_a  par_b  nodeid
  0       0      0   476  2049     0     -1     -1  f6cd931021b6e33eab7bec2b5d5f3676cd1d9a2c
  1     476      0   237   292     1      0     -1  fbbde301c402fb6d6b19008e38ecd2263be0eac6
  2     713      0   289   395     1      1     -1  eb71a711801ede89d6c27f7b86d4899c12f3912b
I don't fully understand what the base means yet, could anybody fill in the
gaps?
rev 0: base is 0, so the full data is stored in this very revision.
rev 1: base is 1, so again the full data should be stored in this very
revision - but then why is ulen (uncompressed length) so small (292 bytes)
compared to 2049 for rev 0? That is strange, as the revlog files are
append-only by default. It can't possibly be the full data, so I must be
misunderstanding something.
7. What is this 'delta'? What does it look like?
8. The 'link revision' - this just increments for each index entry - is
this the same as the 'local revision numbers'? I assumed so, and so in my
output it is the 'rev' column, but could anybody confirm this?
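To make questions 2, 3 and 6 concrete, here is a minimal sketch of the index layout and the inline position arithmetic as I understand them. It assumes the 64-byte entry layout described on the RevlogNG wiki page and that the offset field counts data bytes only. The names ENTRY, unpack_entry and inline_entry_position are made up for illustration and are not Mercurial APIs.

```python
import struct

# Assumed RevlogNG index entry (64 bytes, big-endian):
# 6-byte offset + 2-byte flags packed into one 64-bit value, then
# clen, ulen, base, linkrev, parent 1, parent 2 (4 bytes each),
# then a 20-byte nodeid padded with 12 zero bytes.
ENTRY = struct.Struct(">Qiiiiii20s12x")


def unpack_entry(raw, rev):
    """Decode one 64-byte index entry into a tuple of fields."""
    packed, clen, ulen, base, link, p1, p2, node = ENTRY.unpack(raw)
    offset, flags = packed >> 16, packed & 0xFFFF
    if rev == 0:
        # Assumption: in the first entry the leading bytes hold the
        # revlog version header, so the real offset is taken to be 0.
        offset = 0
    return offset, flags, clen, ulen, base, link, p1, p2, node


def inline_entry_position(offset, rev, entry_size=ENTRY.size):
    """File position of the index entry for `rev` in an inline revlog.

    Assumption: `offset` counts only data bytes, so the index entries
    of all earlier revisions have to be added on top of it.
    """
    return offset + rev * entry_size


# (rev, offset, clen) taken from the table in question 6.
rows = [(0, 0, 476), (1, 476, 237), (2, 713, 289)]
positions = [inline_entry_position(off, rev) for rev, off, _ in rows]
```

With the three rows from question 6 this places the index entries at file positions 0, 540 and 841, and each entry's data at its position plus 64. If that reading is right, the next entry sits at "offset + compressed length + 64 for every index entry so far", not at the bare offset plus compressed length.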
Are such questions out of place on this mailing list? Should I try the
'developers' list instead? (I guess not, as these are very basic questions,
but feel free to direct me there.)
Regards,
wujek