D11095: dirstate-v2: Reuse existing nodes when appending to a data file
SimonSapin
phabricator at mercurial-scm.org
Thu Jul 15 15:28:36 UTC 2021
SimonSapin created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.
REVISION SUMMARY
When writing a dirstate in v2 format by appending to an existing data file,
nodes that are still "unparsed" from the previous on-disk representation
have been unchanged and can therefore be reused.
This makes the new data point to previously-existing data for tree nodes.
REPOSITORY
rHG Mercurial
BRANCH
default
REVISION DETAIL
https://phab.mercurial-scm.org/D11095
AFFECTED FILES
rust/hg-core/src/dirstate_tree/on_disk.rs
CHANGE DETAILS
diff --git a/rust/hg-core/src/dirstate_tree/on_disk.rs b/rust/hg-core/src/dirstate_tree/on_disk.rs
--- a/rust/hg-core/src/dirstate_tree/on_disk.rs
+++ b/rust/hg-core/src/dirstate_tree/on_disk.rs
@@ -590,8 +590,17 @@
&mut self,
nodes: dirstate_map::ChildNodesRef,
) -> Result<ChildNodes, DirstateError> {
- // `dirstate_map::ChildNodes` is a `HashMap` with undefined iteration
- // order. Sort to enable binary search in the written file.
+ if self.append {
+ if let dirstate_map::ChildNodesRef::OnDisk(nodes_slice) = nodes {
+ let start = self.offset_of(nodes_slice);
+ let len = child_nodes_len_from_usize(nodes_slice.len());
+ return Ok(ChildNodes { start, len });
+ }
+ }
+
+ // `dirstate_map::ChildNodes::InMemory` contains a `HashMap` which has
+ // undefined iteration order. Sort to enable binary search in the
+ // written file.
let nodes = nodes.sorted();
let nodes_len = nodes.len();
@@ -664,32 +673,65 @@
// ⦠so we can write them contiguously, after writing everything else
// they refer to.
let start = self.curent_offset();
- let len = u32::try_from(nodes_len)
- // Could only panic with over 4 billion nodes
- .expect("dirstate-v2 path length overflow")
- .into();
+ let len = child_nodes_len_from_usize(nodes_len);
self.out.extend(on_disk_nodes.as_bytes());
Ok(ChildNodes { start, len })
}
+ /// Takes a slice of items within `on_disk` and returns its offset for the
+ /// start of `on_disk`.
+ ///
+ /// Panics if the given slice is not within `on_disk`.
+ fn offset_of<T>(&self, slice: &[T]) -> Offset
+ where
+ T: BytesCast,
+ {
+ fn address_range(slice: &[u8]) -> std::ops::RangeInclusive<usize> {
+ let start = slice.as_ptr() as usize;
+ let end = start + slice.len();
+ start..=end
+ }
+ let slice_addresses = address_range(slice.as_bytes());
+ let on_disk_addresses = address_range(self.dirstate_map.on_disk);
+ assert!(on_disk_addresses.contains(slice_addresses.start()));
+ assert!(on_disk_addresses.contains(slice_addresses.end()));
+ let offset = slice_addresses.start() - on_disk_addresses.start();
+ offset_from_usize(offset)
+ }
+
fn curent_offset(&mut self) -> Offset {
let mut offset = self.out.len();
if self.append {
offset += self.dirstate_map.on_disk.len()
}
- u32::try_from(offset)
- // Could only panic for a dirstate file larger than 4 GiB
- .expect("dirstate-v2 offset overflow")
- .into()
+ offset_from_usize(offset)
}
fn write_path(&mut self, slice: &[u8]) -> PathSlice {
let start = self.curent_offset();
- let len = u16::try_from(slice.len())
- // Could only panic for paths over 64 KiB
- .expect("dirstate-v2 path length overflow")
- .into();
+ let len = path_len_from_usize(slice.len());
self.out.extend(slice.as_bytes());
PathSlice { start, len }
}
}
+
+fn offset_from_usize(x: usize) -> Offset {
+ u32::try_from(x)
+ // Could only panic for a dirstate file larger than 4 GiB
+ .expect("dirstate-v2 offset overflow")
+ .into()
+}
+
+fn child_nodes_len_from_usize(x: usize) -> Size {
+ u32::try_from(x)
+ // Could only panic with over 4 billion nodes
+ .expect("dirstate-v2 slice length overflow")
+ .into()
+}
+
+fn path_len_from_usize(x: usize) -> PathSize {
+ u16::try_from(x)
+ // Could only panic for paths over 64 KiB
+ .expect("dirstate-v2 path length overflow")
+ .into()
+}
To: SimonSapin, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
More information about the Mercurial-devel
mailing list