The pdftotext man page says the following about the output file specified on the command line:

"If text-file is '-', the text is sent to stdout."

This does not work with the -bbox option. The HTML header and footer are correctly written to stdout, but the contents of the PDF file are appended to a file actually named '-' in the current directory.

If I specify "/dev/stdout" as the output file, I get the expected behaviour.
Can you please write the exact command line you are using, what is the real output and what is the expected output?
(In reply to comment #1)
> Can you please write the exact command line you are using, what is the real
> output and what is the expected output?

Sure...

This is the output without -bbox, everything works correctly:

$ pdftotext test.pdf -
Hello!
This is a sample PDF file.

Same command with -bbox added, note how the body element has no content:

$ pdftotext -bbox test.pdf -
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta name="Creator" content="Writer"/>
<meta name="Producer" content="LibreOffice 3.4"/>
<meta name="CreationDate" content=""/>
</head>
<body>
</body>
</html>

Where did the content go? Into a file literally named '-':

$ cat ./-
<doc>
  <page width="612.000000" height="792.000000">
    <word xMin="56.800000" yMin="57.208000" xMax="88.084000" yMax="70.492000">Hello!</word>
    <word xMin="56.800000" yMin="71.008000" xMax="78.064000" yMax="84.292000">This</word>
    <word xMin="81.184000" yMin="71.008000" xMax="89.152000" yMax="84.292000">is</word>
    <word xMin="92.176000" yMin="71.008000" xMax="97.492000" yMax="84.292000">a</word>
    <word xMin="100.480000" yMin="71.008000" xMax="134.392000" yMax="84.292000">sample</word>
    <word xMin="137.464000" yMin="71.008000" xMax="159.424000" yMax="84.292000">PDF</word>
    <word xMin="162.436000" yMin="71.008000" xMax="181.336000" yMax="84.292000">file.</word>
  </page>
</doc>

The expected output is to have the <doc>...</doc> content inside the body element that is sent to stdout, and no file named '-' generated.

I can get the expected output with 'pdftotext -bbox test.pdf /dev/stdout' instead, but that is not very portable.

Basically, the code that writes the header and footer has a special case to convert a filename of '-' to stdout, but the code that writes the bbox content lacks the special case, so they interpret the output filename differently.
(For some reason the output file is closed and reopened by these different components, instead of being left open.)
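To illustrate the mismatch described above: if every component that opens the output went through one shared helper, the '-' special case could not be missed by one of them. The following is only a rough sketch of that idea, not the attached patch and not poppler's actual code; the names openOutput and closeOutput are hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper (not poppler's real API): maps the conventional "-"
// filename to stdout and opens anything else as a regular file.
static FILE *openOutput(const char *fileName, const char *mode) {
    if (std::strcmp(fileName, "-") == 0) {
        return stdout;
    }
    return std::fopen(fileName, mode);
}

// Releases a stream obtained from openOutput(). stdout is only flushed,
// never closed, so a later writer (e.g. the HTML footer) can still use it.
static void closeOutput(FILE *f) {
    if (f == nullptr) {
        return;
    }
    if (f == stdout) {
        std::fflush(f);
    } else {
        std::fclose(f);
    }
}
```

With something like this, the header/footer writer and the bbox writer would both call openOutput(textFileName, "a") or similar and get the same stream for '-', even if the file is closed and reopened in between.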
Created attachment 83776
Proposed patch
Committed, thanks.