[PATCH] who: remove OpenJDK

Josef 'Jeff' Sipek jeffpc at josefsipek.net
Sun Jul 26 15:12:25 UTC 2020


On Sun, Jul 26, 2020 at 04:11:06 +0200, Joerg Sonnenberger wrote:
> On Sat, Jul 25, 2020 at 01:36:32PM -0400, Josef 'Jeff' Sipek wrote:
> > First off, the clone itself.  I cloned it from the official upstream repos.
> > My internet connection is 150 Mbit/s, the storage is a 3-way ZFS mirror.  I
> > used hg 4.9.1 (py27), and git 2.21.0.  (I know, I need to update both.  This
> > is on a box that has a solid network connection but is harder to update.  If
> > there is interest I can spend the effort to update them and re-run it with
> > newer versions.)
> 
> It should be noted that for all intents and purposes, a git clone is
> much more comparable to hg clone --stream.

I don't know if this is a temporary error or if the java.net server
disallows it, but:

$ hg clone --stream https://hg.openjdk.java.net/jdk/jdk jdk-stream
streaming all changes
abort: locking the remote repository failed

It'd make sense for this to be disabled by policy, because you don't want
someone doing a slow streaming pull to lock the server's repo for hours
preventing other pushes (assuming that's the same lock).


Doing the clone over the LAN (gigabit ethernet) took 1m26s total (including
the checkout):

$ hg clone --stream http://server-host:8000 test-hg
streaming all changes
187754 files to transfer, 1.07 GB of data
transferred 1.07 GB in 45.5 seconds (24.0 MB/sec)
updating to branch default
65415 files updated, 0 files merged, 0 files removed, 0 files unresolved

The client host was running at 99% CPU while receiving the data, while the
server was at around 80-90%.  So, I'm concluding that in this local case I
was CPU bound on the client, but the server wasn't exactly lightly loaded.

For comparison, git cloning (including checkout) over the same LAN took 60
seconds.  So, faster than hg streaming clone, but only by ~26 seconds.
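
A rough sanity check on the numbers above, as a small Python sketch.  The
variable names are made up for illustration, and it assumes hg reports binary
units (1 GB = 1024 MB), which is what makes the reported rate come out at
roughly 24 MB/sec:

```python
# Figures copied from the hg and git runs above.
transferred_mb = 1.07 * 1024      # assumption: hg's "GB" means 1024 MB
transfer_secs = 45.5              # time hg reported for the stream transfer
hg_total_secs = 1 * 60 + 26       # 1m26s total, including the checkout
git_total_secs = 60               # git clone total, including the checkout

rate_mb_per_sec = transferred_mb / transfer_secs
hg_outside_transfer_secs = hg_total_secs - transfer_secs
git_lead_secs = hg_total_secs - git_total_secs

print(f"stream rate: {rate_mb_per_sec:.1f} MB/sec")
print(f"hg time outside the transfer: {hg_outside_transfer_secs:.1f} s")
print(f"git lead: {git_lead_secs} s")
```

By this arithmetic, about 40 seconds of the hg run fall outside the stream
transfer itself (working copy update, startup and so on), and the gap to git
is the 26 seconds mentioned above.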

> > Now, hg specifics.  It looks like the manifest is huge.  This corresponds to
> > how long it took to download.
> > 
> > -rw-r--r--   1 jeffpc   jeffpc     25.2M Jul 25 12:16 00changelog.d
> > -rw-r--r--   1 jeffpc   jeffpc     3.68M Jul 25 12:01 00changelog.i
> > -rw-r--r--   1 jeffpc   jeffpc      434M Jul 25 12:09 00manifest.d
> > -rw-r--r--   1 jeffpc   jeffpc     3.67M Jul 25 12:09 00manifest.i
> 
> I have similar reservations about the way manifests are handled for the
> NetBSD repository. It's been a topic of discussion recently on IRC. The
> manifest processing itself currently takes nearly half of the total
> clone time and that looks ...suspicious at best.

Indeed.  I don't have the knowledge/experience to suggest improvements, but
I can run benchmarks :)

> > I'm guessing that they would have benefited from treemanifest.
> 
> From my testing, treemanifests don't help at all.

They seemed to help with the jdk repo.  I'm guessing that jdk has more deeply
nested directories with longer file names, because the conversion certainly
seemed to help (tm == treemanifest):

$ hg --config extensions.convert= convert ../jdk-hg . ../tm-map
$ cd ..
$ du -sAh */.{git,hg}
452M    jdk-git/.git 
1.11G   jdk-hg/.hg
784M    jdk-tm/.hg

Not amazing, but it is about 70% of the "monolithic" manifest repo.  The
manifest part itself:

$ ls -lh 00*
-rw-r--r--   1 jeffpc   jeffpc     25.2M Jul 25 20:46 00changelog.d
-rw-r--r--   1 jeffpc   jeffpc     3.68M Jul 25 20:47 00changelog.i
-rw-r--r--   1 jeffpc   jeffpc     4.08M Jul 25 20:46 00manifest.d
-rw-r--r--   1 jeffpc   jeffpc     3.67M Jul 25 20:47 00manifest.i

$ du -sAh meta    
89.4M   meta

So, the (treemanifest) manifest data is about 97M total vs. 437M total with
the monolithic manifest.  That is, about 22% of the original manifest size.
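
The two ratios can be reproduced from the listings above with a few lines of
Python.  This is only a sketch: the variable names are invented here, and it
assumes that ls -lh and du -sAh print binary units, so 1.11G is taken as
1.11 * 1024 M:

```python
# Sizes in M as printed by ls -lh and du -sAh above.
flat_store = 1.11 * 1024               # jdk-hg/.hg (assumption: 1G = 1024M)
tree_store = 784                       # jdk-tm/.hg

flat_manifest = 434 + 3.67             # 00manifest.d + 00manifest.i
tree_manifest = 4.08 + 3.67 + 89.4     # 00manifest.d + 00manifest.i + meta

store_ratio = tree_store / flat_store
manifest_ratio = tree_manifest / flat_manifest

print(f"store:    {store_ratio:.0%} of the monolithic repo")
print(f"manifest: {manifest_ratio:.0%} of the monolithic manifest")
```

This gives about 69% for the whole store and about 22% for the manifest data,
matching the figures above within rounding.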

...
> > I just kicked off a conversion to treemanifest.  It'll take a while.
> 
> Did you convert to generaldelta and etc already?

'hg clone' produced a reasonable repo without conversion.  The only
requirement added during the conversion was treemanifest.

$ cat jdk-hg/.hg/requires
dotencode
fncache
generaldelta
revlogv1
sparserevlog
store
$ diff jdk-{hg,tm}/.hg/requires
6a7
> treemanifest
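
The same comparison can be scripted.  The sketch below only relies on
.hg/requires being a plain text file with one requirement per line, as the
cat output above shows; the two function names are hypothetical helpers, not
part of Mercurial:

```python
from pathlib import Path


def read_requires(repo):
    """Return the set of requirement names listed in <repo>/.hg/requires."""
    text = Path(repo, ".hg", "requires").read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def added_requirements(old_repo, new_repo):
    """Requirements present in new_repo but not in old_repo, sorted."""
    return sorted(read_requires(new_repo) - read_requires(old_repo))
```

With the two repos above, added_requirements("jdk-hg", "jdk-tm") should
return ['treemanifest'], which is the same answer the diff gives.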

I can try other requirements, but I think the manifest problem the jdk people
saw was the huge size caused by data duplication inside the manifest data -
duplication that goes away once unchanged manifest subtrees are "deduped"
between revisions.
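
To illustrate the subtree "dedup" guess, here is a toy model in Python.  It is
NOT how Mercurial stores tree manifests on disk; it is a simple
content-addressed store with made-up file names and hashes, meant only to show
that a one-file change creates new entries along the path to that file and
reuses every other directory:

```python
import hashlib


def store_tree(tree, store):
    """Store a nested dict {name: subtree-or-file-hash} bottom-up.

    Each directory becomes one entry keyed by the hash of its listing, so
    an unchanged directory maps to a key that is already in the store.
    Returns the key of the stored directory.
    """
    lines = []
    for name in sorted(tree):
        value = tree[name]
        if isinstance(value, dict):
            lines.append(f"{name}/ {store_tree(value, store)}")
        else:
            lines.append(f"{name} {value}")
    listing = "\n".join(lines)
    key = hashlib.sha1(listing.encode()).hexdigest()
    store[key] = listing
    return key


# Two hypothetical revisions: only src/a/x.java changes between them.
rev1 = {"src": {"a": {"x.java": "h1", "y.java": "h2"},
                "b": {"z.java": "h3"}},
        "doc": {"README": "h4"}}
rev2 = {"src": {"a": {"x.java": "h9", "y.java": "h2"},
                "b": {"z.java": "h3"}},
        "doc": {"README": "h4"}}

store = {}
store_tree(rev1, store)
after_rev1 = len(store)                 # one entry per directory in rev1
store_tree(rev2, store)
new_for_rev2 = len(store) - after_rev1  # entries rev2 had to add

print(after_rev1, new_for_rev2)         # prints "5 3"
```

The second revision adds entries only for the root, src and src/a; doc and
src/b are shared with the first revision.  Whether this is what actually
explains the jdk numbers is still a guess.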

Jeff.

-- 
UNIX is user-friendly ... it's just selective about who its friends are


