[Updated] D8506: rust-regex: fix issues with regex anchoring and performance

Alphare (Raphaël Gomès) phabricator at mercurial-scm.org
Wed May 13 00:24:26 UTC 2020


Closed by commit rHGeb301282bacc: rust-regex: fix issues with regex anchoring and performance (authored by Alphare).
This revision was automatically updated to reflect the committed changes.

CHANGED PRIOR TO COMMIT
  https://phab.mercurial-scm.org/D8506?vs=21324&id=21368#toc

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8506?vs=21324&id=21368

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8506/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8506

AFFECTED FILES
  rust/hg-core/src/filepatterns.rs
  rust/hg-core/src/matchers.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/matchers.rs b/rust/hg-core/src/matchers.rs
--- a/rust/hg-core/src/matchers.rs
+++ b/rust/hg-core/src/matchers.rs
@@ -347,7 +347,9 @@
 ) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
     use std::io::Write;
 
-    let mut escaped_bytes = vec![];
+    // The `regex` crate adds `.*` to the start and end of expressions if there
+    // are no anchors, so add the start anchor.
+    let mut escaped_bytes = vec![b'^', b'(', b'?', b':'];
     for byte in pattern {
         if *byte > 127 {
             write!(escaped_bytes, "\\x{:x}", *byte).unwrap();
@@ -355,6 +357,7 @@
             escaped_bytes.push(*byte);
         }
     }
+    escaped_bytes.push(b')');
 
     // Avoid the cost of UTF8 checking
     //
diff --git a/rust/hg-core/src/filepatterns.rs b/rust/hg-core/src/filepatterns.rs
--- a/rust/hg-core/src/filepatterns.rs
+++ b/rust/hg-core/src/filepatterns.rs
@@ -176,9 +176,7 @@
         return vec![];
     }
     match syntax {
-        // The `regex` crate adds `.*` to the start and end of expressions
-        // if there are no anchors, so add them.
-        PatternSyntax::Regexp => [b"^", &pattern[..], b"$"].concat(),
+        PatternSyntax::Regexp => pattern.to_owned(),
         PatternSyntax::RelRegexp => {
             // The `regex` crate accepts `**` while `re2` and Python's `re`
             // do not. Checking for `*` correctly triggers the same error all
@@ -196,15 +194,14 @@
         }
         PatternSyntax::RootFiles => {
             let mut res = if pattern == b"." {
-                vec![b'^']
+                vec![]
             } else {
                 // Pattern is a directory name.
-                [b"^", escape_pattern(pattern).as_slice(), b"/"].concat()
+                [escape_pattern(pattern).as_slice(), b"/"].concat()
             };
 
             // Anything after the pattern must be a non-directory.
             res.extend(b"[^/]+$");
-            res.push(b'$');
             res
         }
         PatternSyntax::RelGlob => {
@@ -216,7 +213,7 @@
             }
         }
         PatternSyntax::Glob | PatternSyntax::RootGlob => {
-            [b"^", glob_to_re(pattern).as_slice(), GLOB_SUFFIX].concat()
+            [glob_to_re(pattern).as_slice(), GLOB_SUFFIX].concat()
         }
         PatternSyntax::Include | PatternSyntax::SubInclude => unreachable!(),
     }



To: Alphare, #hg-reviewers, indygreg
Cc: mercurial-patches
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial-patches/attachments/20200513/5fab8fb9/attachment-0002.html>


More information about the Mercurial-patches mailing list