Created attachment 114932
Adds -bbox-layout command to pdftotext

We're looking to generate ALTO-compatible XML (http://en.wikipedia.org/wiki/ALTO_%28XML%29) from PDFs. The current -bbox flag almost does what we need, but it omits some important data: blocks and lines.

I have written a patch against 0.22.5 (to ensure compatibility with our CentOS 7 system). It appears to apply cleanly to current master and, as far as I can tell, produces the same output as my 0.22.5 version.

The change adds a new flag, -bbox-layout. Its output is still very generic, but it is sufficient for us to transform as needed.
Can you please update the man page too? pdftotext.1
Created attachment 117147
Adds -bbox-layout command with man page update

Adds the new flag and updates the man page to document it. (This patch was created with git format-patch.)
/home/tsdgeos/devel/poppler/utils/pdftotext.cc: In function ‘void printLine(FILE*, TextLine*)’:
/home/tsdgeos/devel/poppler/utils/pdftotext.cc:512:35: warning: format not a string literal and no format arguments [-Wformat-security]
  fprintf(f, wordXML.str().c_str());
                                  ^

Please fix.
Created attachment 117280
Adds -bbox-layout command with man page update

Adds the new flag, "-bbox-layout", and the man page addition, and fixes the -Wformat-security warning by using fputs instead of fprintf where there is no format string.
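For anyone reading later, here is a minimal sketch of why fputs is the safer call here. It is not the code from the attachment: makeWordXML and writeWord are hypothetical helpers invented for illustration, and only the fprintf/fputs distinction reflects the change described in this bug. Passing a string built at run time as the format argument means any '%' in the extracted text would be treated as a conversion specifier; fputs writes the bytes verbatim.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): wraps extracted text in a
// <word> element, similar in spirit to building output with a stringstream.
std::string makeWordXML(const std::string &text) {
    std::ostringstream wordXML;
    wordXML << "<word>" << text << "</word>\n";
    return wordXML.str();
}

// Hypothetical helper: writes an already-formatted string to a stream.
void writeWord(FILE *f, const std::string &xml) {
    // Flagged by -Wformat-security, because xml is used as the format:
    //   fprintf(f, xml.c_str());
    // fputs performs no format parsing, so '%' in the text is harmless.
    fputs(xml.c_str(), f);
    // An equivalent alternative would be: fprintf(f, "%s", xml.c_str());
}
```

With this shape, a word such as "100%s" is written out unchanged instead of being read as a format directive.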
Pushed.