Created attachment 23324
The cropping patch for pdftotext, based on version 0.8.7; it applies to the current version as well.

pdftotext has problems extracting text properly from multi-column PDFs. To solve this, I added support for the -x, -y, -W, -H, and -r options to pdftotext, taking the corresponding code sections from pdftoppm. This way, I can crop out parts of a page and process each column separately.

I would greatly appreciate it if this minor change became available to all users. If you need me to reformat or improve the patch in any way, feel free to contact me.

I made the patch against version 0.8.7, which I know is not the current one, but it also applies to the current version without errors, with only a few lines of offset.

Thanks for considering the patch!
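As a rough illustration of the column-by-column workflow described above, the following sketch builds one pdftotext command line per column using the -x, -y, -W, -H, and -r options. The helper name `column_crop_commands`, the file names, the US Letter page size (612 x 792 pt), and the assumption of equal-width columns are all hypothetical; real column boundaries would have to be measured on the page.

```python
def column_crop_commands(pdf_path, page_width, page_height, columns=2, resolution=72):
    """Build one pdftotext argument list per column (hypothetical helper).

    Assumes equal-width columns spanning the full page height. Coordinates
    are in pixels at `resolution` DPI; at 72 DPI one pixel equals one point.
    """
    col_width = page_width // columns
    commands = []
    for i in range(columns):
        out_path = f"column{i + 1}.txt"  # hypothetical output file name
        commands.append([
            "pdftotext",
            "-r", str(resolution),      # resolution used to interpret the crop box
            "-x", str(i * col_width),   # left edge of the crop area
            "-y", "0",                  # top edge of the crop area
            "-W", str(col_width),       # width of the crop area
            "-H", str(page_height),     # height of the crop area
            pdf_path,
            out_path,
        ])
    return commands


# Example: a two-column US Letter page (612 x 792 pt), file name is made up.
cmds = column_crop_commands("paper.pdf", 612, 792)
for cmd in cmds:
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run`, provided a pdftotext build that includes these options is installed.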
Any specific reason you need the resolution parameter?
(In reply to comment #1)
> Any specific reason you need the resolution parameter?

None, if you stick to pt (1/72 in) as the unit for -x, -y, etc. But if you want to specify -x, -y, etc. in a different scale, the resolution parameter helps. I use pdftoppm to find the right area, then transfer the same options (including the resolution) over to the patched pdftotext, and voilà: the text appears, without me needing to recalculate offsets.
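For reference, the recalculation that the resolution option avoids is a simple scaling by 72 / resolution, since 1 pt is 1/72 in. A minimal sketch, with hypothetical function names and made-up example values:

```python
def pixels_to_points(value_px, resolution_dpi):
    """Convert a length in pixels at `resolution_dpi` to points (1 pt = 1/72 in)."""
    return value_px * 72.0 / resolution_dpi


def crop_box_in_points(x, y, width, height, resolution_dpi):
    """Convert a crop box given in pixels at `resolution_dpi` to points."""
    return tuple(pixels_to_points(v, resolution_dpi) for v in (x, y, width, height))


# Example: a crop area found at 150 DPI, expressed in points.
box = crop_box_in_points(300, 150, 600, 900, 150)
print(box)  # (144.0, 72.0, 288.0, 432.0)
```

Passing the same resolution to both tools makes this conversion unnecessary, which is the convenience described in the reply above.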
Patch will be released as part of poppler 0.12