Created attachment 136569
Check return code of getChar(), abort reading on error.

Hi, I ran into an issue when using pdftotext on files stored on a CIFS mount. A problem with the CIFS server was causing read() calls to fail with EIO, and each read() blocked for about 1 minute before failing. This caused pdftotext to run for around 16 hours before finally failing.

I tracked this down to PDFDoc::checkHeader(), which attempts to read 1024 chars into a buffer using FileStream::getChar() and uses this buffer for file type detection. The problem is that it does not check the return code of getChar(). getChar() returns EOF in response to the EIO, but that EOF is just placed into the buffer and another read() is attempted. This is repeated 1024 times, leaving the process blocked in uninterruptible sleep for 16 hours or so.

The fix is to check this return code and stop reading. I am attaching a patch.
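For readers without the attachment at hand, the idea of the fix can be sketched roughly as follows. This is not the attached patch or poppler's actual code: `GetChar`, `readHeader` and `kHeaderSearchSize` are hypothetical names used only for illustration, and the only behaviour assumed is what the report describes, namely that getChar() returns EOF both at end of file and on a read error.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>      // EOF
#include <functional>

// Hypothetical stand-in for a stream's getChar(): returns a byte value
// (0..255) on success, or EOF on end of file or on a read error.
using GetChar = std::function<int()>;

// Buffer size quoted in the report (1024 chars).
constexpr std::size_t kHeaderSearchSize = 1024;

// Fills buf with up to kHeaderSearchSize bytes and returns how many were read.
// The loop stops at the first EOF instead of storing it in the buffer and
// calling getChar() again.
std::size_t readHeader(const GetChar &getChar,
                       std::array<char, kHeaderSearchSize> &buf) {
    std::size_t n = 0;
    while (n < buf.size()) {
        const int c = getChar();
        if (c == EOF) {
            break;  // end of file or I/O error: stop reading here
        }
        buf[n++] = static_cast<char>(c);
    }
    return n;
}
```

With a source that fails on every call, this loop performs a single read attempt rather than 1024, which is the saving discussed in this report.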
Patch looks reasonable, after all we're not really doing much in that function. But it only makes the failure faster, right? I mean, you can't really read the file anyway because your CIFS server is broken, no?
Right, this just avoids 1023 additional reads that will fail. After this function returns, the next read fails and pdftotext exits.
Pushed, thanks :)
For very small PDF files, EOF will be reached when reading the header. This patch causes spurious warnings when reading such (valid) files.
So open a new bug and attach a patch?