[Updated] [++-- ] D11625: dirstate-v2: Document flags/mode/size/mtime fields of tree nodes
SimonSapin
phabricator at mercurial-scm.org
Tue Oct 12 17:00:35 UTC 2021
SimonSapin updated this revision to Diff 30716.
REPOSITORY
rHG Mercurial
CHANGES SINCE LAST UPDATE
https://phab.mercurial-scm.org/D11625?vs=30712&id=30716
BRANCH
default
CHANGES SINCE LAST ACTION
https://phab.mercurial-scm.org/D11625/new/
REVISION DETAIL
https://phab.mercurial-scm.org/D11625
AFFECTED FILES
mercurial/helptext/internals/dirstate-v2.txt
mercurial/pure/parsers.py
rust/hg-core/src/dirstate_tree/on_disk.rs
CHANGE DETAILS
diff --git a/rust/hg-core/src/dirstate_tree/on_disk.rs b/rust/hg-core/src/dirstate_tree/on_disk.rs
--- a/rust/hg-core/src/dirstate_tree/on_disk.rs
+++ b/rust/hg-core/src/dirstate_tree/on_disk.rs
@@ -64,44 +64,24 @@
uuid: &'on_disk [u8],
}
+/// Fields are documented in the *Tree metadata in the docket file*
+/// section of `mercurial/helptext/internals/dirstate-v2.txt`
#[derive(BytesCast)]
#[repr(C)]
struct TreeMetadata {
root_nodes: ChildNodes,
nodes_with_entry_count: Size,
nodes_with_copy_source_count: Size,
-
- /// How many bytes of this data file are not used anymore
unreachable_bytes: Size,
-
- /// Current version always sets these bytes to zero when creating or
- /// updating a dirstate. Future versions could assign some bits to signal
- /// for example "the version that last wrote/updated this dirstate did so
- /// in such and such way that can be relied on by versions that know to."
unused: [u8; 4],
- /// If non-zero, a hash of ignore files that were used for some previous
- /// run of the `status` algorithm.
- ///
- /// We define:
- ///
- /// * "Root" ignore files are `.hgignore` at the root of the repository if
- /// it exists, and files from `ui.ignore.*` config. This set of files is
- /// then sorted by the string representation of their path.
- /// * The "expanded contents" of an ignore files is the byte string made
- /// by concatenating its contents with the "expanded contents" of other
- /// files included with `include:` or `subinclude:` files, in inclusion
- /// order. This definition is recursive, as included files can
- /// themselves include more files.
- ///
- /// This hash is defined as the SHA-1 of the concatenation (in sorted
- /// order) of the "expanded contents" of each "root" ignore file.
- /// (Note that computing this does not require actually concatenating byte
- /// strings into contiguous memory, instead SHA-1 hashing can be done
- /// incrementally.)
+ /// See *Optional hash of ignore patterns* section of
+ /// `mercurial/helptext/internals/dirstate-v2.txt`
ignore_patterns_hash: IgnorePatternsHash,
}
+/// Fields are documented in the *The data file format*
+/// section of `mercurial/helptext/internals/dirstate-v2.txt`
#[derive(BytesCast)]
#[repr(C)]
pub(super) struct Node {
@@ -114,45 +94,6 @@
children: ChildNodes,
pub(super) descendants_with_entry_count: Size,
pub(super) tracked_descendants_count: Size,
-
- /// Depending on the bits in `flags`:
- ///
- /// * If any of `WDIR_TRACKED`, `P1_TRACKED`, or `P2_INFO` are set, the
- /// node has an entry.
- ///
- /// - If `HAS_MODE_AND_SIZE` is set, `data.mode` and `data.size` are
- /// meaningful. Otherwise they are set to zero
- /// - If `HAS_MTIME` is set, `data.mtime` is meaningful. Otherwise it is
- /// set to zero.
- ///
- /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO`, or `HAS_MTIME`
- /// are set, the node does not have an entry and `data` is set to all
- /// zeros.
- ///
- /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO` are set, but
- /// `HAS_MTIME` is set, the bytes of `data` should instead be
- /// interpreted as the `Timestamp` for the mtime of a cached directory.
- ///
- /// The presence of this combination of flags means that at some point,
- /// this path in the working directory was observed:
- ///
- /// - To be a directory
- /// - With the modification time as given by `Timestamp`
- /// - That timestamp was already strictly in the past when observed,
- /// meaning that later changes cannot happen in the same clock tick
- /// and must cause a different modification time (unless the system
- /// clock jumps back and we get unlucky, which is not impossible but
- /// but deemed unlikely enough).
- /// - All direct children of this directory (as returned by
- /// `std::fs::read_dir`) either have a corresponding dirstate node, or
- /// are ignored by ignore patterns whose hash is in
- /// `TreeMetadata::ignore_patterns_hash`.
- ///
- /// This means that if `std::fs::symlink_metadata` later reports the
- /// same modification time and ignored patterns haven’t changed, a run
- /// of status that is not listing ignored files can skip calling
- /// `std::fs::read_dir` again for this directory, iterate child
- /// dirstate nodes instead.
flags: Flags,
data: Entry,
}
diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
--- a/mercurial/pure/parsers.py
+++ b/mercurial/pure/parsers.py
@@ -55,7 +55,7 @@
- p1_tracked: is the file tracked in working copy first parent
- p2_info: the file has been involved in some merge operation. Either
because it was actually merged, or because the p2 version was
- ahead, or because some renamed moved it there. In either case
+ ahead, or because some rename moved it there. In either case
`hg status` will want it displayed as modified.
# about the file state expected from p1 manifest:
diff --git a/mercurial/helptext/internals/dirstate-v2.txt b/mercurial/helptext/internals/dirstate-v2.txt
--- a/mercurial/helptext/internals/dirstate-v2.txt
+++ b/mercurial/helptext/internals/dirstate-v2.txt
@@ -371,6 +371,114 @@
(For example, `hg rm` makes a file untracked.)
This counter is used to implement `has_tracked_dir`.
-* Offset 30 and more:
- **TODO:** docs not written yet
- as this part of the format might be changing soon.
+* Offset 30:
+ Some boolean values packed as bits of a single byte.
+ Starting from least-significant, bit masks are::
+
+ WDIR_TRACKED = 1 << 0
+ P1_TRACKED = 1 << 1
+ P2_INFO = 1 << 2
+ HAS_MODE_AND_SIZE = 1 << 3
+ HAS_MTIME = 1 << 4
+
+ Other bits are unset. The meanings of these bits are:
+
+ `WDIR_TRACKED`
+ Set if the working directory contains a tracked file at this node’s path.
+ This is typically set and unset by `hg add` and `hg rm`.
+
+ `P1_TRACKED`
+ Set if the working directory’s first parent changeset
+ (whose node identifier is found in tree metadata)
+ contains a tracked file at this node’s path.
+ This is a cache to reduce manifest lookups.
+
+ `P2_INFO`
+ Set if the file has been involved in some merge operation.
+ Either because it was actually merged,
+ or because the version in the second parent (p2) was ahead,
+ or because some rename moved it there.
+ In any of these cases `hg status` will want it displayed as modified.
+
+ Files that would be mentioned at all in the `dirstate-v1` file format
+ have a node with at least one of the above three bits set in `dirstate-v2`.
+ Let’s call these files "tracked anywhere",
+ and "untracked" the nodes with all three of these bits unset.
+ Untracked nodes are typically for directories:
+ they hold child nodes and form the tree structure.
+ Although implementations should strive to clean up nodes
+ that are entirely unused, other untracked nodes may also exist.
+ For example, a future version of Mercurial might in some cases
+ add nodes for untracked files and/or ignored files in the working directory
+ in order to optimize `hg status`
+ by enabling it to skip `readdir` in more cases.
+
+ When a node is for a file tracked anywhere,
+ the rest of the node data is three fields:
+
+ * Offset 31:
+ If `HAS_MODE_AND_SIZE` is unset, four zero bytes.
+ Otherwise, a 32-bit integer for the Unix mode (as in `stat_result.st_mode`)
+ expected for this file to be considered clean.
+ Only the `S_IXUSR` bit (owner has execute permission) is considered.
+
+ * Offset 35:
+ If `HAS_MTIME` is unset, four zero bytes.
+ Otherwise, a 32-bit integer for the expected modification time of the file
+ (as in `stat_result.st_mtime`),
+ truncated to its 31 least-significant bits.
+ Unlike in dirstate-v1, negative values are not used.
+
+ * Offset 39:
+ If `HAS_MODE_AND_SIZE` is unset, four zero bytes.
+ Otherwise, a 32-bit integer for the expected size of the file
+ truncated to its 31 least-significant bits.
+ Unlike in dirstate-v1, negative values are not used.
+
+ If an untracked node has `HAS_MTIME` *unset*, this space is unused:
+
+ * Offset 31:
+ 12 bytes set to zero
+
+ If an untracked node has `HAS_MTIME` *set*,
+ what follows is the modification time of a directory
+ represented with separated second and sub-second components
+ since the Unix epoch:
+
+ * Offset 31:
+ The number of seconds as a signed (two’s complement) 64-bit integer.
+
+ * Offset 39:
+ The number of nanoseconds as a 32-bit integer.
+ Always greater than or equal to zero, and strictly less than a billion.
+ Increasing this component makes the modification time
+ go forward or backward in time depending
+ on the sign of the integral seconds component.
+ (Note: this is buggy because there is no negative zero integer,
+ but will be changed soon.)
+
+ The presence of a directory modification time means that at some point,
+ this path in the working directory was observed:
+
+ - To be a directory
+ - With the given modification time
+ - That time was already strictly in the past when observed,
+ meaning that later changes cannot happen in the same clock tick
+ and must cause a different modification time
+ (unless the system clock jumps back and we get unlucky,
+ which is not impossible but deemed unlikely enough).
+ - All direct children of this directory
+ (as returned by `std::fs::read_dir`)
+ either have a corresponding dirstate node,
+ or are ignored by ignore patterns whose hash is in tree metadata.
+
+ This means that if `std::fs::symlink_metadata` later reports
+ the same modification time
+ and ignored patterns haven’t changed,
+ a run of status that is not listing ignored files
+ can skip calling `std::fs::read_dir` again for this directory,
+ and iterate child dirstate nodes instead.
+
+
+* (Offset 43: end of this node)
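
For readers of this patch, the layout documented above for offsets 30 to 43 can be sketched as a small decoder. This is only an illustration, not code from the revision: the function name `decode_node_tail` and the returned dictionary shape are hypothetical, and big-endian byte order and unsigned 32-bit reads are assumptions that the text above does not state.

```python
import struct

# Bit masks as listed in the documentation added by this patch.
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2
HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME = 1 << 4

# A node is for a file "tracked anywhere" if any of these three bits is set.
TRACKED_ANYWHERE = WDIR_TRACKED | P1_TRACKED | P2_INFO


def decode_node_tail(tail):
    """Decode bytes 30..43 of a node: one flags byte plus 12 data bytes.

    Hypothetical helper. Big-endian integers are an assumption.
    """
    if len(tail) != 13:
        raise ValueError("expected exactly 13 bytes (offsets 30 to 43)")
    flags = tail[0]
    data = tail[1:]

    if flags & TRACKED_ANYWHERE:
        # Offsets 31, 35, 39: mode, mtime, size (each 32 bits).
        mode, mtime, size = struct.unpack(">III", data)
        has_ms = bool(flags & HAS_MODE_AND_SIZE)
        return {
            "kind": "file",
            "mode": mode if has_ms else None,
            "size": size if has_ms else None,
            "mtime": mtime if flags & HAS_MTIME else None,
        }

    if flags & HAS_MTIME:
        # Offset 31: signed 64-bit seconds; offset 39: 32-bit nanoseconds.
        seconds, nanoseconds = struct.unpack(">qI", data)
        return {
            "kind": "cached-directory",
            "seconds": seconds,
            "nanoseconds": nanoseconds,
        }

    # Untracked node without HAS_MTIME: the 12 data bytes are unused zeros.
    return {"kind": "unused"}
```

Fields whose flag is unset are returned as `None` rather than zero, so that a caller cannot mistake the zero padding for a real mode, size, or mtime.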
To: SimonSapin, #hg-reviewers, mharbison72
Cc: mharbison72, mercurial-patches
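
The directory modification time cache described at the end of the patch amounts to a check that `hg status` could make before listing a directory. The sketch below is an assumption-laden illustration, not the Rust implementation: `can_skip_readdir` and its parameters are hypothetical names, `os.lstat` stands in for `std::fs::symlink_metadata`, and only non-negative timestamps are handled, since the patch itself notes that the negative case is buggy.

```python
import os
import stat

NANOS_PER_SECOND = 1_000_000_000


def can_skip_readdir(path, cached_seconds, cached_nanoseconds,
                     ignore_hash_unchanged, listing_ignored):
    """Return True if the cached directory mtime still allows skipping readdir.

    Hypothetical helper illustrating the conditions in the documentation.
    """
    # The cache says nothing about ignored files, and it is only valid
    # for the ignore patterns whose hash is stored in tree metadata.
    if listing_ignored or not ignore_hash_unchanged:
        return False
    try:
        st = os.lstat(path)  # does not follow symlinks
    except OSError:
        return False
    if not stat.S_ISDIR(st.st_mode):
        return False
    # Split the current mtime into the same two components as the node
    # (assumes a non-negative timestamp).
    seconds, nanoseconds = divmod(st.st_mtime_ns, NANOS_PER_SECOND)
    return (seconds, nanoseconds) == (cached_seconds, cached_nanoseconds)
```

When this returns `True`, a caller would iterate the child dirstate nodes instead of reading the directory; in every other case it would fall back to listing the directory as usual.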