[PATCH 2 of 3] lfs: add a small language to filter files

Matt Harbison mharbison72 at gmail.com
Mon Jan 8 04:09:02 UTC 2018


On Sun, 07 Jan 2018 03:17:12 -0500, Yuya Nishihara <yuya at tcha.org> wrote:

> On Thu, 04 Jan 2018 23:58:55 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1514704880 18000
>> #      Sun Dec 31 02:21:20 2017 -0500
>> # Node ID 8c20ade835ce43441c61e56e63d9bf92deaacd55
>> # Parent  2798cb4faacdae2db46e84ba0f3beaf506848915
>> lfs: add a small language to filter files
>>
>> This diff adds a small language for that. It's self-explained, and deals
>> with both simple and complex cases. For example:
>>
>>   always                 # everything
>>   >20MB                  # larger than 20MB
>>   !.txt                  # except for .txt files
>>   .zip | .tar.gz | .7z   # some types of compressed files
>>   /bin                   # files under "bin" in the project root
>>   (.php & >2MB) | (.js & >5MB) | .tar.gz | (/bin & !/bin/README) | >1GB
>>
>> [1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-December/109387.html
>
> Can't we make it a subset of the fileset language so we can eventually
> switch to it if O(n) issue is solved?
>
> i.e. _compile() the result of fileset.parse(), but abort if unsupported
> element found.

I think the answer is yes. I cobbled together enough in
filterlang._compile() to make this size() query work:

-check('!!.bin | >20B | /bin | !>10 | !always',
+check('!!.bin | size(">20B") | /bin | !size(">10") | !always',
        [('a.bin', 11), ('b.txt', 21), ('bin/abc', 11)],
        [('a.notbin', 11), ('b.txt', 11), ('bin2/abc', 11)])

But I basically had to copy/paste the implementation of fileset.size() for
its various operators, and I generally had trouble because of my very
limited understanding of the Python code that backs it. This would
definitely be better handled by someone more familiar with the code for
these mini-languages.
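For reference, the operator handling involved looks roughly like this. It is a standalone sketch, not Mercurial's fileset.size() or util.sizetoint(): the operators are modeled on the kinds of expressions size() accepts (e.g. `>= .5MB`, `4k - 1MB`), units are assumed to be 1024-based, and a bare value is treated as an exact match, which is a simplification. `size_to_int()` and `size_predicate()` are hypothetical names.

```python
import re

# Assumed 1024-based unit suffixes (case-insensitive).
_UNITS = {"": 1, "b": 1, "k": 1024, "kb": 1024, "m": 1024 ** 2,
          "mb": 1024 ** 2, "g": 1024 ** 3, "gb": 1024 ** 3}


def size_to_int(text):
    """Convert '20', '20B', '2MB' or '.5k' into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d*)?|\.\d+)\s*([a-zA-Z]*)\s*", text)
    if not m or m.group(2).lower() not in _UNITS:
        raise ValueError("couldn't parse size: %r" % text)
    return int(float(m.group(1)) * _UNITS[m.group(2).lower()])


def size_predicate(expr):
    """Return a size -> bool function for e.g. '>20MB' or '1k - 5k'."""
    expr = expr.strip()
    if "-" in expr:  # inclusive range, e.g. "4k - 1MB"
        lo, hi = (size_to_int(part) for part in expr.split("-", 1))
        return lambda size: lo <= size <= hi
    for op in ("<=", ">=", "<", ">", "="):  # two-char operators first
        if expr.startswith(op):
            n = size_to_int(expr[len(op):])
            return {
                "<=": lambda size: size <= n,
                ">=": lambda size: size >= n,
                "<": lambda size: size < n,
                ">": lambda size: size > n,
                "=": lambda size: size == n,
            }[op]
    n = size_to_int(expr)  # bare value: exact match in this sketch
    return lambda size: size == n
```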

If we are trying to mimic filesets, why not also mimic the existing pattern
matching? It's overkill in some sense, but it's also less for a user to
learn. IDK what the end goal is here.
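To illustrate the pattern-matching idea: the two path atoms in the proposed language map fairly directly onto Mercurial's existing pattern kinds, `.txt` onto `glob:**.txt` and `/bin` onto `path:bin`. The helpers below (`to_hg_pattern()`, `matches()`) are hypothetical, and `matches()` only approximates Mercurial's semantics, assuming paths relative to the repository root.

```python
import fnmatch


def to_hg_pattern(atom):
    """Translate one atom of the proposed filter language into a
    Mercurial-style pattern string (hypothetical helper)."""
    if atom.startswith("."):  # ".txt": that suffix, at any depth
        return "glob:**" + atom
    if atom.startswith("/"):  # "/bin": "bin" in the root and everything under it
        return "path:" + atom[1:].rstrip("/")
    raise ValueError("no pattern equivalent for %r" % atom)


def matches(pattern, path):
    """Rough approximation of how such a pattern would match a repo path."""
    kind, pat = pattern.split(":", 1)
    if kind == "glob":
        # fnmatch's "*" also crosses "/", so "**" behaves roughly like
        # hg's "**" here (hg's single "*" does not cross "/").
        return fnmatch.fnmatchcase(path, pat)
    if kind == "path":
        return path == pat or path.startswith(pat + "/")
    raise ValueError("unsupported pattern kind: %r" % kind)
```

Usage: `matches(to_hg_pattern("/bin"), "bin/abc")` is true, while `"bin2/abc"` does not match.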


