Bug 104502

Summary: I/O errors during checkheader() cause hang
Product: poppler Reporter: Ben Timby <btimby>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: minor    
Priority: medium CC: evangelos, jwilk
Version: unspecified   
Hardware: All   
OS: Linux (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=105674
Attachments: Check return code of getChar(), abort reading on error.

Description Ben Timby 2018-01-05 14:57:41 UTC
Created attachment 136569
Check return code of getChar(), abort reading on error.

Hi, I ran into an issue when using pdftotext on files stored on a CIFS mount. A problem with the CIFS server was causing read() calls to fail with EIO, and each read() blocked for about 1 minute before failing.

This caused pdftotext to run for around 16 hours before finally failing. I tracked this down to PDFDoc::checkHeader(), which attempts to read 1024 chars using FileStream::getChar() into a buffer. It uses this buffer for file type detection.

The problem is that it does not check the return code of getChar(). getChar() returns EOF in response to the EIO, but that EOF is just placed into the buffer and another read() is attempted, up to 1024 times in total. At about 1 minute per failed read(), this makes the process block in uninterruptible sleep for 16 hours or so.

The fix is to check this return code and stop reading. I am attaching a patch.
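
For illustration, the idea of the fix can be sketched as follows. This is a minimal, self-contained sketch and not the attached patch or poppler's actual code: FakeStream, readHeader() and kHeaderSearchSize are hypothetical names, and FakeStream only imitates a stream whose getChar() returns EOF when it cannot deliver a byte.

```cpp
#include <cstddef>
#include <cstdio>   // for the EOF macro
#include <string>
#include <utility>

// Hypothetical stand-in for a byte stream (NOT poppler's FileStream).
// getChar() returns the next byte, or EOF once no more data can be read,
// which is also what a failed read would look like to the caller.
class FakeStream {
public:
    explicit FakeStream(std::string data) : data_(std::move(data)) {}

    int getChar() {
        ++calls_;  // count every attempt so wasted reads are visible
        if (pos_ >= data_.size()) {
            return EOF;
        }
        return static_cast<unsigned char>(data_[pos_++]);
    }

    std::size_t calls() const { return calls_; }

private:
    std::string data_;
    std::size_t pos_ = 0;
    std::size_t calls_ = 0;
};

// Assumed size of the header search window (1024, as in the report).
constexpr std::size_t kHeaderSearchSize = 1024;

// Reads at most kHeaderSearchSize bytes and stops at the first EOF,
// instead of storing EOF in the buffer and attempting further reads.
std::string readHeader(FakeStream &str) {
    std::string buf;
    buf.reserve(kHeaderSearchSize);
    for (std::size_t i = 0; i < kHeaderSearchSize; ++i) {
        const int c = str.getChar();
        if (c == EOF) {
            break;  // abort reading on error or end of data
        }
        buf.push_back(static_cast<char>(c));
    }
    return buf;
}
```

With this shape, a stream that fails on its first read costs one getChar() call rather than 1024.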
Comment 1 Albert Astals Cid 2018-01-07 12:08:20 UTC
Patch looks reasonable; after all, we're not really doing much in that function.

But it only makes failing faster, right? I mean, you can't really see the file either, because your CIFS server is broken, no?
Comment 2 Ben Timby 2018-01-07 21:00:02 UTC
Right, this just avoids 1023 additional reads that will fail. After this function returns, the next read fails and pdftotext exits.
Comment 3 Albert Astals Cid 2018-01-08 22:43:05 UTC
Pushed, thanks :)
Comment 4 Jakub Wilk 2018-03-07 12:55:45 UTC
For very small PDF files, EOF will be reached when reading the header.
This patch causes spurious warnings when reading such (valid) files.
Comment 5 Albert Astals Cid 2018-03-09 21:53:06 UTC
So open a new bug and attach a patch?
