win32text and excluding patterns

Mark Hammond skippy.hammond at gmail.com
Tue Apr 14 00:49:02 UTC 2009


On 14/04/2009 8:43 AM, Mads Kiilerich wrote:
> Mark Hammond wrote, On 04/10/2009 06:16 AM:
>> Hi all,
>> I'm trying to get the win32text extension to ignore certain patterns
>> and I'm having trouble. What I want is something like the following:
>>
>> [encode]
>> **.dsp = !
>> **.dsw = !
>> ** = cleverencode:
>>
>> IOW, all files _except_ *.dsp and *.dsw should use clever encoding.
>> I've dug around the mailing lists and the sources and it seems some
>> attempt is indeed made to handle '!' - and it seems to have been
>> introduced with a similar motivation:
>>
>> http://markmail.org/message/g55ev2ka7yseaept
>>
>> But best I can tell, it's not working in my case as the '**' still
>> matches. IOW, using '**=!' is useful - it temporarily disables all
>> encodings - but it's not useful for any other extension when a '**' is
>> in force. Am I misunderstanding something?
>>
>> So I thought maybe a way forward was to define new "pass-through"
>> encoders called, eg, 'exact' - they just return exactly what was passed
>> to them. Then I could do something like:
>>
>> [encode]
>> **.dsp = exact:
>> **.dsw = exact:
>> ** = cleverencode:
>>
>> But this falls over for a similar reason; ordering in the sections is
>> not maintained, so the '**' may still match first.
>>
>> An easy solution that avoids trying to capture "full" ordering might be
>> to have the code classify the filters into 3 categories - those without
>> a wildcard, those with a wildcard, and '**'/'*', and ensure the filters
>> are applied in that order.
>>
>> The handling of '!' still seems suspect to me though - it acts more like
>> "pretend this filter line doesn't exist" than the expected "record that
>> this extension explicitly wants no filtering". Am I missing the intent
>> of '!' (and therefore my idea of a new 'exact' encoding makes sense), or
>> is the implementation of '!' suboptimal, meaning I could implement my
>> requirements by changing the handling of '!'?
>>
>> I'm happy to make a patch for this, but thought I'd check here first
>> that I'm not missing anything obvious and what the best way forward is.
>
> I think it "works as designed": The "!" notation is only for disabling
> all filtering for _a_specific_ pattern

Thanks for the reply.

My point is that it does *not* disable all filtering for a specific 
pattern.  If I have the configuration:

  [encode]
  **.dsp=!
  ** = cleverencode:

filtering is *not* disabled for the pattern '**.dsp' - it gets clever 
encoding.

All '!' does is disable that specific rule - almost identical to
commenting out the line (though I understand it's not identical to
commenting, because of how the different config files are merged).  So
what it actually does is 'allow you to disable a previously configured
rule for a specific pattern'.

I'm not trying to nitpick, but 'disabling a previously defined rule' is 
quite a bit different to the user than 'disabling filtering for a 
pattern' - I'm after a way of disabling *all* filtering for a specific 
pattern.
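
To make the difference concrete, here is a small stand-alone model of what I believe happens today.  It is only a sketch: 'current_filter' is a made-up name, fnmatch stands in for util.matcher, and I'm assuming the first matching rule is the one that gets applied.

```python
from fnmatch import fnmatchcase

def current_filter(filename, items):
    """Model of today's behaviour: a '!' rule is skipped, nothing more."""
    for pat, cmd in items:
        if cmd == '!':
            continue  # same effect as the rule not being there at all
        if fnmatchcase(filename, pat):  # stand-in for util.matcher
            return cmd
    return None

config = [('**.dsp', '!'), ('**', 'cleverencode:')]
print(current_filter('project.dsp', config))  # cleverencode:
print(current_filter('readme.txt', config))   # cleverencode:
```

The .dsp file falls through to '**' exactly as if the first line had been commented out.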

> only be one filter specification for each pattern. I think ordering _is_
> preserved for the filters, but while an early "!" disables that pattern
> it doesn't stop the filter engine from continuing with the next filter
> and match and apply that.

Yes, I understand that - it disables a single rule.

Also, I've re-confirmed that ordering is *not* preserved.  A simple 
'print' statement, eg:

--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -539,6 +539,7 @@
         if filter not in self.filterpats:
             l = []
             for pat, cmd in self.ui.configitems(filter):
+                print "CHECK", pat, cmd
                 if cmd == '!':
                     continue
                 mf = util.matcher(self.root, "", [pat], [], [])[1]

shows that the order in which the patterns are printed bears no relation
to their order in the INI file.  If you look at the impl of
ConfigParser, you will note a dictionary is used, which is why the
ordering is lost.

Further, if you check the rest of the impl of _filter() in localrepo.py, 
it uses a dictionary to remember the filters it has loaded, so even if 
the config kept the order for us, _filter()'s current impl would lose it.
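
Since neither layer keeps the order, one option is to impose an order that doesn't depend on the config at all, along the lines of the three categories from my first mail.  A rough sketch, where 'specificity' and 'ordered_rules' are names I've made up and the wildcard test is deliberately simplistic:

```python
def specificity(pattern):
    """0 = no wildcard, 1 = contains a wildcard, 2 = catch-all ('*' or '**')."""
    if pattern in ('*', '**'):
        return 2
    if any(ch in pattern for ch in '*?['):
        return 1
    return 0

def ordered_rules(items):
    """Return (pattern, command) pairs with the most specific patterns first."""
    # sorted() is stable, so rules within one category keep their relative order
    return sorted(items, key=lambda item: specificity(item[0]))

rules = [('**', 'cleverencode:'), ('**.dsp', 'exact:'), ('Makefile', 'exact:')]
for pat, cmd in ordered_rules(rules):
    print(pat, cmd)
```

With this, '**' is always tried last no matter what order the config layer hands the patterns over in.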

> What you ask for could be a nice feature, but it isn't obvious to me how
> it would fit what currently is implemented. And note that a patch must
> preserve the current behavior.

The current behaviour as described or as implemented <wink>?  I can see 
a use case for '!' meaning 'disable all filtering for this pattern', but 
I *can't* see a use-case where people would want the INI file like I 
posted above to use clever encoding on the .dsp file.  Am I missing 
something?
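
For comparison, this is roughly the behaviour I'd expect from '!'.  Again only a sketch under the same assumptions as before: 'proposed_filter' is a made-up name, fnmatch stands in for the real matcher, and this is not a patch against _filter().

```python
from fnmatch import fnmatchcase

def proposed_filter(filename, items):
    """Model of the proposed behaviour: '!' means 'never filter this pattern'."""
    excluded = [pat for pat, cmd in items if cmd == '!']
    if any(fnmatchcase(filename, pat) for pat in excluded):
        return None  # explicitly excluded, so no other rule is consulted
    for pat, cmd in items:
        if cmd != '!' and fnmatchcase(filename, pat):
            return cmd
    return None

config = [('**.dsp', '!'), ('**', 'cleverencode:')]
print(proposed_filter('project.dsp', config))  # None
print(proposed_filter('readme.txt', config))   # cleverencode:
```

Note that '**=!' would still turn everything off, so as far as I can tell the existing use of '!' keeps working.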

> Take a look at mercurial.localrepo.localrepository._filter
>
> Hmm. Perhaps you can do something close to what you ask for with
> something like:
>
> [encode]
> ** = cleverencode:
> **.dsp = cleverdecode:
> **.dsw = cleverdecode:
>
> But the simplest solution would perhaps be to create an extension with
> your own custom "smarterencode" which does exactly what you want.

Yes - that is exactly where I started the email from (but I called it 
'exact' instead of 'smarterencode'.)  But as I noted above, I'm stuck 
here with the fact ordering is lost, so the '**' may get used before the 
more specific pattern.
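
For reference, the pass-through filter itself would be trivial.  A sketch, assuming filters are called with the data and the command string plus keyword arguments, and assuming they are registered the way win32text registers its own filters; the 'exact:' name is mine:

```python
def exact(s, cmd, **kwargs):
    """Pass-through filter: hand the data back exactly as it was given."""
    return s

def reposetup(ui, repo):
    # Assumed registration hook, modelled on win32text; not checked against
    # every Mercurial version.
    if not repo.local():
        return
    repo.adddatafilter('exact:', exact)
```

On its own this doesn't help, of course: it only works if the 'exact:' rule is guaranteed to be tried before '**'.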

I think I must be missing something though, as the current behaviour 
doesn't really seem useful for people using Windows all day.  My 
situation is:

* I've many repos which use exclusively text; **=cleverencode seems a 
perfectly good fit for me, as the docs suggest.  I want this setting 
'globally', not per-repo, so I don't forget to configure a repo and 
accidentally create mixed line endings etc in repos I use day-to-day.

* I've a handful of repos with a few files with windows line endings in
the repo - the .dsp and .dsw are the obvious ones, but I note that
mozilla also has a fair number of .html, .js etc files, so entire
directory trees should be excluded there.

Every single time I work with one of the second class of repos, I get 
loads of warnings about the mixed line endings, and 'hg diff' shows the 
files as being changed.

All attempts to avoid this have been fruitless.  The only solution seems 
to *not* use **=cleverencode, but as mentioned, I believe there are good 
reasons to ensure hg does the right thing *by default* and not rely on 
me remembering some process each time I clone/create a repo.

What do others do here?

Cheers,

Mark


