bfiles filename encoding

Greg Ward greg-hg at gerg.ca
Mon Jun 7 13:28:37 UTC 2010


On Sat, Jun 5, 2010 at 6:29 PM, Benjamin Pollack <benjamin at bitquabit.com> wrote:
> Greg: the more I play with this, and with bfiles on Windows, the more I'm thinking that at least the push destinations should be encoded using the fncache naming strategy.

Well, I *know* that the structure of bfiles' central store will have
to change the minute someone tries to bfput a file called "aux" to a
central store running on Windows.  Or even "foo" and "Foo".  In fact
the case-sensitivity issue will almost certainly bite on OS X just as
soon as I write a test for it.  The only solution I can see is to
encode filenames on the central store, and reusing Mercurial's code
for doing that seems very desirable.
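
To make the problem concrete, here is a minimal sketch of the kind of
encoding I mean.  It is loosely modelled on the ideas behind Mercurial's
store encoding (escape uppercase letters, escape Windows device names),
but it is not Mercurial's actual code: the names WINDOWS_RESERVED,
encode_component and encode_path are made up for illustration, and it
ignores other cases such as trailing dots, trailing spaces and control
characters.

```python
# Hypothetical sketch only: NOT Mercurial's real store encoding.

# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED = {"con", "prn", "aux", "nul"} | {
    "%s%d" % (dev, n) for dev in ("com", "lpt") for n in range(1, 10)
}


def encode_component(name):
    """Encode one path component so that names differing only in case
    stay distinct and Windows device names are avoided."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")              # escape the case-escape character
        elif ch == "~":
            out.append("~7e")             # escape the hex-escape character
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())  # "Foo" -> "_foo"
        else:
            out.append(ch)
    encoded = "".join(out)

    # "aux" and "aux.txt" are both reserved, so look at the part before
    # the first dot and hex-escape its third character if it matches.
    base = encoded.split(".", 1)[0]
    if base in WINDOWS_RESERVED:
        encoded = encoded[:2] + "~%02x" % ord(encoded[2]) + encoded[3:]
    return encoded


def encode_path(path):
    """Encode every component of a '/'-separated path."""
    return "/".join(encode_component(c) for c in path.split("/"))
```

With this sketch, "foo" and "Foo" map to "foo" and "_foo", which remain
distinct on a case-insensitive filesystem, and "aux" maps to "au~78".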

But I don't understand what you mean by "the push destinations should
be encoded".  Are you talking about wire protocol changes?  That seems
unnecessary; this is all about dealing with filesystems that are not
100% traditional Unix filesystems: HFS+ and NTFS.

> What are your feelings on changing this for HTTP store? What about for the SSH store?

Those just select a different protocol to access the same underlying
central store.  Same as the relationship between Mercurial's wire
protocols and the repo that you're talking to.  bfiles needs to fix
its relationship with the filesystem, not the network.

> (There's an argument for .hgbfiles being that way, too, due to file name length limits on Windows, but I'm happy to discuss that issue separately.)

Oh crap, I hadn't thought about that.  But is it really a problem?  I
mean, if you have

  .hgbfiles/really/long/deep/path/to/bigfile

then that represents

  really/long/deep/path/to/bigfile

which is only slightly shorter than the path in .hgbfiles.  So
mangling paths in .hgbfiles to work around Windows brain damage only
buys, what, 10 more bytes of headroom in the path?  Not worth it,
IMHO.
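
The arithmetic, as a quick sketch (the helper name tracking_path is
hypothetical; only the ".hgbfiles/" prefix comes from the layout above):

```python
# The only extra length comes from the ".hgbfiles/" prefix.
PREFIX = ".hgbfiles/"


def tracking_path(path):
    """Return the path under .hgbfiles that stands for a big file
    (hypothetical helper, for illustration only)."""
    return PREFIX + path


path = "really/long/deep/path/to/bigfile"
overhead = len(tracking_path(path)) - len(path)
print(overhead)  # 10
```

The overhead is constant regardless of how deep the real path is, so
the file in the working copy hits the path limit almost as soon as its
entry in .hgbfiles does.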

> The fncache code also currently only escapes files that are located in .hg/store, which has been frustrating for me on other occasions when I've wanted to reuse the logic for other locations. (E.g., Kiln caches annotation output, which required some copy-paste coding unless we wanted to store the annotation data in .hg/store). What are the feelings on abstracting that code so that it can provide names for files in other directories?

Hmmm.  If fncache is not factored for reusability, then the options are
1) don't encode filenames that way, 2) submit refactoring patches to
Mercurial and make bfiles require Mercurial 1.6, or 3) copy the code
for now and remove the copy once bfiles requires Mercurial 1.6.  Yuck.

Consider also that there are two known cases not covered by fncache encoding:

1) Windows Vista and 7 mangle leading whitespace in filenames, which
corrupts hg repos
2) bad stuff happens if you commit .DS_Store on OS X

So if we reuse fncache in bfiles, either by copying or refactoring,
then bfiles will inherit those two bugs.

Perhaps we should cook up a new filename encoding algorithm for
bfiles.  If it works, we could even propose it for core Mercurial once
people have the appetite for yet another change there.

Greg


