[PATCH 1 of 6] util: add iterfile to workaround a fileobj.__iter__ issue with EINTR

Jun Wu quark at fb.com
Mon Nov 14 23:35:53 UTC 2016


# HG changeset patch
# User Jun Wu <quark at fb.com>
# Date 1479166374 0
#      Mon Nov 14 23:32:54 2016 +0000
# Node ID 7e3bb7754d338399dee3ee41770b7d6624b81fc1
# Parent  038547a14d850f14ecd2671852093dc07848a134
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 7e3bb7754d33
util: add iterfile to workaround a fileobj.__iter__ issue with EINTR

The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset
45d4cea97b04) is buggy: it cannot handle EINTR correctly.

In Objects/fileobject.c:

    size_t Py_UniversalNewlineFread(....) {
        ....
        if (!f->f_univ_newline)
            return fread(buf, 1, n, stream);
        ....
    }

According to the "fread" man page:

    If an error occurs, or the end of the file is reached, the return value
    is a short item count (or zero).

Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to
return a positive value while errno is set to EINTR and ferror(stream)
changes from zero to non-zero.

There are multiple "Py_UniversalNewlineFread": "file_read", "file_readinto",
"file_readlines", "readahead". While the first 3 have code to handle the
EINTR case, the last one "readahead" doesn't:

    static int readahead(PyFileObject *f, Py_ssize_t bufsize) {
        ....
        chunksize = Py_UniversalNewlineFread(
            f->f_buf, bufsize, f->f_fp, (PyObject *)f);
        ....
        if (chunksize == 0) {
            if (ferror(f->f_fp)) {
                PyErr_SetFromErrno(PyExc_IOError);
                ....
            }
        }
        ....
    }

It means "readahead" could ignore EINTR, if "Py_UniversalNewlineFread"
returns a non-zero value. And at the next time "readahead" got executed, if
"Py_UniversalNewlineFread" returns 0, "readahead" would raise a Python error
without a incorrect errno - could be 0 - thus "IOError: [Errno 0] Error".

The only user of "readahead" is "readahead_get_line_skip".
The only user of "readahead_get_line_skip" is "file_iternext", aka.
"fileobj.__iter__", which should be avoided.

There are multiple places where the pattern "for x in fp" is used. This
patch adds a "iterfile" method in "util.py" so we can migrate our code from
"for x in fp" to "fox x in util.iterfile(fp)".

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2191,4 +2191,9 @@ def wrap(line, width, initindent='', han
     return wrapper.fill(line).encode(encoding.encoding)
 
+def iterfile(fp):
+    """like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is
+    known to have such issues."""
+    return iter(fp.readline, '')
+
 def iterlines(iterator):
     for chunk in iterator:


More information about the Mercurial-devel mailing list