Summary: | pdftotext -bbox fails to write to stdout | ||
---|---|---|---|
Product: | poppler | Reporter: | awendt |
Component: | utils | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: | Proposed patch |
Description
awendt
2012-01-23 23:33:23 UTC
Can you please write the exact command line you are using, what is the real output and what is the expected output?

(In reply to comment #1)
> Can you please write the exact command line you are using, what is the real
> output and what is the expected output?

Sure...

This is the output without -bbox, everything works correctly:

    $ pdftotext test.pdf -
    Hello!
    This is a sample PDF file.

Same command with -bbox added, note how the body element has no content:

    $ pdftotext -bbox test.pdf -
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title></title>
    <meta name="Creator" content="Writer"/>
    <meta name="Producer" content="LibreOffice 3.4"/>
    <meta name="CreationDate" content=""/>
    </head>
    <body>
    </body>
    </html>

Where did the content go? Into a file literally named '-':

    $ cat ./-
    <doc>
      <page width="612.000000" height="792.000000">
        <word xMin="56.800000" yMin="57.208000" xMax="88.084000" yMax="70.492000">Hello!</word>
        <word xMin="56.800000" yMin="71.008000" xMax="78.064000" yMax="84.292000">This</word>
        <word xMin="81.184000" yMin="71.008000" xMax="89.152000" yMax="84.292000">is</word>
        <word xMin="92.176000" yMin="71.008000" xMax="97.492000" yMax="84.292000">a</word>
        <word xMin="100.480000" yMin="71.008000" xMax="134.392000" yMax="84.292000">sample</word>
        <word xMin="137.464000" yMin="71.008000" xMax="159.424000" yMax="84.292000">PDF</word>
        <word xMin="162.436000" yMin="71.008000" xMax="181.336000" yMax="84.292000">file.</word>
      </page>
    </doc>

The expected output is to have the <doc>...</doc> content inside the body element that is sent to stdout, and no file named '-' generated.

I can get the expected output with 'pdftotext -bbox test.pdf /dev/stdout' instead, but that is not very portable.
Basically, the code that writes the header and footer has a special case to convert a filename of '-' to stdout, but the code that writes the bbox content lacks the special case, so they interpret the output filename differently. (For some reason the output file is closed and reopened by these different components, instead of being left open.)

Created attachment 83776: Proposed patch

Committed, thanks
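
For readers who want to see the shape of the problem, here is a minimal Python sketch of the pattern described above. It is not poppler's code (poppler is C++) and it is not the attached patch. The names `open_output` and `write_bbox_document`, and the extra `stdout` parameter, are hypothetical and exist only for illustration. The idea is that every component that reopens the output goes through one helper, so '-' cannot be treated as stdout in one place and as a file name in another.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_output(name, mode="w", stdout=None):
    """Yield a writable text stream; the name '-' means standard output.

    `stdout` is a hypothetical hook so the sketch can be tested without
    touching the real sys.stdout.
    """
    if name == "-":
        stream = stdout if stdout is not None else sys.stdout
        yield stream
        # Flush but do not close: later writers still need stdout.
        stream.flush()
    else:
        with open(name, mode, encoding="utf-8") as stream:
            yield stream


def write_bbox_document(name, words, stdout=None):
    """Write header, bbox content and footer in three separate steps.

    Each step reopens the output, mirroring the structure described in
    the report. Because all three use open_output, they agree on what
    '-' means.
    """
    with open_output(name, "w", stdout) as out:
        out.write("<body>\n")
    with open_output(name, "a", stdout) as out:
        out.write("<doc>\n")
        for word in words:
            out.write(f"  <word>{word}</word>\n")
        out.write("</doc>\n")
    with open_output(name, "a", stdout) as out:
        out.write("</body>\n")
```

In this sketch, the reported behaviour corresponds to the middle step calling plain `open(name, ...)` instead of the shared helper, which would create a file literally named '-' while the header and footer still went to stdout.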