Created attachment 13775 [details] [review] Patch to poppler-0.6.3 to check fseek() and similar calls pdftotext fails uncleanly if its input is not seekable. For example cat some.pdf | pdftotext /dev/stdin out This produces errors such as Error: PDF file is damaged - attempting to reconstruct xref table... Error: Couldn't find trailer dictionary Error: Couldn't read xref table But the PDF file is not damaged - just the file is not seekable. pdftotext doesn't check that its fseek() calls succeed. In general I suggest it should check the result of all file I/O calls and do something half-sensible on failure. (Instead of plain fseek() the code uses large-file-capable seek functions on platforms that have them - those too need to be checked.) (If you need a PDF file that reproduces this bug, you can create one with 'echo | enscript -p - | ps2pdf - some.pdf'. But I think it happens with pretty much any input file.) I attach a patch which simplemindedly solves the problem by checking all fseek() calls and throwing an exception on failure, which produces an 'Illegal seek' message on stderr. This is a bit better than ignoring the failure. You might not want to apply this particular patch because it depends on C++ exception support, but something similar will be needed to check fseek() and related calls for success or failure.
Hi, I'd be interested to know what you think of this bug report and patch - firstly whether you agree that it's a bug that the return value of fseek() is being ignored, and secondly whether this patch is an acceptable fix.
Do you read minds or something? Just yesterday i was having a look at this bug report and really i don't know what to do. Using exceptions is a no go, as the whole project does not uses exceptions. So one has to resort to other solutions. For FoFiBase::readFile and GfxFont::readExtFontFile it is easy because they return a * that can be NULL in case of error, but all the Stream functions return void, changing them to bool is not acceptable since that would mean checking for errors in hundreds of places where that's not checked currently. So the solution i liked most was something like return NULL in FoFiBase::readFile and GfxFont::readExtFontFile along with an error() call when we can not seek, and for all the others i think one should try to do a setPos(1), getPos in PDFDoc::setup and if getPos returns != 1 we just assume the stream is not seekable and return a false in GBool. Could you try coding such a patch?
>but all the Stream functions >return void, changing them to bool is not acceptable since that would mean >checking for errors in hundreds of places where that's not checked currently. I feel that you just have to knuckle down and put in all the error checks, tedious though it may be... of course, this is the reason why exceptions were added to the language. However, your compromise may be enough: >So the solution i liked most was something like return NULL in >FoFiBase::readFile and GfxFont::readExtFontFile along with an error() call when >we can not seek, and for all the others i think one should try to do a >setPos(1), getPos in PDFDoc::setup and if getPos returns != 1 we just assume >the stream is not seekable and return a false in GBool. This would take care of the /dev/stdin case. It wouldn't work correctly when fseek() really does fail part way through, as might happen on network filesystems. But you might ignore such cases - most Unix software is so buggy anyway with checking the success of file I/O that everyone mounts NFS as 'hard' mounts and failing operations just hang rather than reporting error codes. >Could you try coding such a patch? Certainly. Do you require copyright assignment?
Created attachment 15382 [details] [review] Patch to check some fseek() calls, and check file is seekable on initialization Hi, here's a patch that does roughly what you suggested.
Hi, i used the first part of your patch, for the second i went a different way you can see at http://cgit.freedesktop.org/poppler/poppler/commit/?id=066595dd06c930997d5ec65a06c822616af9baa0 So that will be fixed next week when we release 0.8, hope that's good enough for you. And no, we don't requiere copyright assignment.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.