D8076: worker: Manually buffer reads from pickle stream

yuja (Yuya Nishihara) phabricator at mercurial-scm.org
Fri Feb 7 15:52:26 UTC 2020


yuja added a comment.


  >   > Might be better to optimize the common case `len(wrapped.read(size)) == size`.
  >   I thought my code was already pretty optimized: We allocate the buffer to read into just once (no matter how many reads we issue) and give it to the unpickler without copying the data.
  
  My gut feeling is that running Python code is generally slower than calling
  a C function, but yeah, that wouldn't matter here.
  
  Queued for stable since the previous attempt was in stable. Thanks.
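
  For readers following along, the manual-buffering idea under discussion can be
  sketched roughly like this. This is a simplified illustration, not the actual
  D8076 patch; `BlockingReader` and `DribbleStream` are made-up names, and the
  final `bytes(buf)` copy is included only because the C unpickler requires
  `read()` to return bytes.

```python
import io
import pickle


class BlockingReader:
    """Sketch: satisfy each Unpickler read() from one buffer allocated
    once per call, looping readinto() so short reads never surface."""

    def __init__(self, wrapped):
        self._wrapped = wrapped  # anything with readinto()/readline()

    def readline(self):
        return self._wrapped.readline()

    def read(self, size):
        buf = bytearray(size)  # single allocation for this read()
        view = memoryview(buf)
        pos = 0
        while pos < size:
            n = self._wrapped.readinto(view[pos:])
            if not n:  # EOF: return whatever was collected
                break
            pos += n
        del view       # release the export before resizing the bytearray
        del buf[pos:]  # trim if the stream ended early
        return bytes(buf)  # the C unpickler insists on bytes here


class DribbleStream:
    """Hypothetical test double: returns at most 2 bytes per readinto(),
    mimicking short reads from a pipe."""

    def __init__(self, data):
        self._bio = io.BytesIO(data)

    def readinto(self, view):
        return self._bio.readinto(memoryview(view)[:2])

    def readline(self):
        return self._bio.readline()


payload = pickle.dumps({"a": 1, "b": [2, 3]})
obj = pickle.Unpickler(BlockingReader(DribbleStream(payload))).load()
print(obj)  # → {'a': 1, 'b': [2, 3]}
```

  Note the wrapper deliberately does not expose `readinto()`, so the Unpickler
  falls back to `read()` and always sees full-sized results.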
  
  >   > FWIW, if we don't mind issuing extra `read()` syscalls, maybe we can abuse
  >   > BufferedReader of `buffer_size=1`.
  >   My gut says that using just read(1) will be way worse when we try to transmit large objects. That said, I don't know if the Unpickler tries to do such large reads.
  
  Unpickler does issue a few small read()s plus one read(serialized_obj_size), so
  using unbuffered I/O (as the original implementation did) isn't optimal. Using
  buffered I/O with buffer_size=1 appears to split read(serialized_obj_size)
  into read(1) + read(serialized_obj_size - 1) on Python 3.7, but that is
  an implementation detail, and I don't know how it differs across Python versions.
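
  That read pattern can be observed with a small probe. This is illustrative
  only; `CountingRaw` is a made-up name, and since the exact split is a CPython
  implementation detail, the example records the requests rather than asserting
  a particular pattern.

```python
import io


class CountingRaw(io.RawIOBase):
    """Raw stream that records the size of every read request it receives
    from the buffered layer above it."""

    def __init__(self, data):
        self._bio = io.BytesIO(data)
        self.requests = []

    def readable(self):
        return True

    def readinto(self, b):
        self.requests.append(len(b))  # how much the buffered layer asked for
        return self._bio.readinto(b)


raw = CountingRaw(b"x" * 1000)
buffered = io.BufferedReader(raw, buffer_size=1)
data = buffered.read(1000)  # one large read, like read(serialized_obj_size)
assert data == b"x" * 1000
print(raw.requests)  # the exact split is an implementation detail
```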
  
  >   That would be an option, too, although `bytearray.extend` + `BytesIO.read` + `bytearray.__delitem__` looks a bit heavy on the memcpys/memmoves. We also have to restart the unpickling whenever the data was truncated. And remember when we're actually EOF.
  
  Supposing repeated `os.read()`s are much slower than a memcpy, I'd guess this
  approach is faster, but I agree it's a bit more complex.
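
  For comparison, the chunk-and-retry scheme being weighed here could look
  roughly like this. It is a hypothetical sketch (`iter_unpickled` is a made-up
  helper), written to make the extra copies and the restart-on-truncation and
  EOF bookkeeping visible.

```python
import io
import pickle


def iter_unpickled(chunks):
    """Sketch: extend a bytearray with incoming chunks, retry unpickling
    while the data is still truncated, then delete the consumed prefix."""
    buf = bytearray()
    it = iter(chunks)
    while True:
        bio = io.BytesIO(bytes(buf))  # copy #1: snapshot the buffer
        try:
            obj = pickle.Unpickler(bio).load()
        except (EOFError, pickle.UnpicklingError):
            # Data was truncated: fetch more and restart the unpickling.
            chunk = next(it, None)
            if chunk is None:  # remember that we actually hit EOF
                if buf:
                    raise ValueError("truncated pickle stream")
                return
            buf.extend(chunk)  # copy #2: append the new chunk
            continue
        del buf[:bio.tell()]  # copy #3: memmove of the unconsumed tail
        yield obj


data = pickle.dumps("hello") + pickle.dumps([1, 2, 3])
objs = list(iter_unpickled(data[i:i + 3] for i in range(0, len(data), 3)))
print(objs)  # → ['hello', [1, 2, 3]]
```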

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8076/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8076

To: heftig, #hg-reviewers, yuja
Cc: yuja, mjpieters, mercurial-devel

