Bug 20331 - Patch for pdftotext to accept cropping options like pdftoppm
Summary: Patch for pdftotext to accept cropping options like pdftoppm
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2009-02-26 08:09 UTC by Jan Jockusch
Modified: 2009-03-08 05:45 UTC


Attachments
The cropping patch for pdftotext, based on version 0.8.7, works on the current version as well. (2.16 KB, patch)
2009-02-26 08:09 UTC, Jan Jockusch

Description Jan Jockusch 2009-02-26 08:09:58 UTC
Created attachment 23324

pdftotext has trouble extracting text correctly from multi-column PDFs.

To solve this problem, I added support for the -x, -y, -W, -H, and -r options to pdftotext, taking the corresponding code sections from pdftoppm.

This way, I can crop out parts of a page and process columns separately.
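
The column workflow described above can be sketched as follows. This is a minimal illustration, not part of the patch: it assumes a pdftotext that accepts the crop options discussed here (-r, -x, -y, -W, -H, shipped in poppler 0.12 and later) plus the standard -f/-l page options, and it assumes the page size is already known (for example from the "Page size" line printed by pdfinfo). The helper names, the file name paper.pdf, and the output naming scheme are hypothetical.

```python
import shlex
from typing import List


def column_crop_args(page_width: int, page_height: int, columns: int,
                     resolution: int = 72) -> List[List[str]]:
    """Return one set of pdftotext crop options per vertical column.

    page_width/page_height are in pixels at `resolution` DPI (the same units
    pdftoppm uses); at the default 72 DPI one pixel equals one point.
    """
    if columns < 1:
        raise ValueError("columns must be >= 1")
    result = []
    for i in range(columns):
        # Integer boundaries so adjacent columns neither overlap nor leave gaps.
        left = i * page_width // columns
        right = (i + 1) * page_width // columns
        result.append(["-r", str(resolution),
                       "-x", str(left), "-y", "0",
                       "-W", str(right - left), "-H", str(page_height)])
    return result


def column_commands(pdf: str, page: int, page_width: int, page_height: int,
                    columns: int, resolution: int = 72) -> List[List[str]]:
    """Build (but do not run) one pdftotext command per column of `page`."""
    cmds = []
    crops = column_crop_args(page_width, page_height, columns, resolution)
    for i, crop in enumerate(crops, start=1):
        out = f"page{page}-col{i}.txt"  # hypothetical output naming scheme
        cmds.append(["pdftotext", "-f", str(page), "-l", str(page),
                     *crop, pdf, out])
    return cmds


if __name__ == "__main__":
    # A US Letter page is 612 x 792 pt; split page 1 into two columns.
    for cmd in column_commands("paper.pdf", 1, 612, 792, columns=2):
        print(shlex.join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

For a two-column Letter page this yields one command cropping x = 0..306 and another cropping x = 306..612, each writing its column to a separate text file. Equal-width columns are only a simplifying assumption; real layouts with margins or a gutter would need the boundaries measured, for example by rendering the page with pdftoppm first.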

I would greatly appreciate it if this minor change could be made available to all users.

If you need me to reformat or improve the patch in any way, feel free to contact me.

I based the patch on version 0.8.7, which I know is not the current one, but it also applies cleanly to the current version, with only a few lines of offset.

Thanks for considering the patch!
Comment 1 Albert Astals Cid 2009-03-03 15:31:34 UTC
Any specific reason you need the resolution parameter?
Comment 2 Jan Jockusch 2009-03-04 08:00:48 UTC
(In reply to comment #1)
> Any specific reason you need the resolution parameter?
> 

None, if you stick to points (pt, 1/72 in) as the unit for -x, -y, etc.

But if you want to specify -x, -y, etc. at a different scale, the resolution parameter helps. I use pdftoppm to find the right area, then pass the same options (including the resolution) to the patched pdftotext, and voilà: the text appears, without my having to recalculate offsets.
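
The reasoning in the reply above comes down to one conversion: a pixel at r DPI spans 72/r points, so at the default 72 DPI pixels and points coincide, and -r only matters when the crop values come from a higher-resolution pdftoppm render. A minimal sketch of that arithmetic follows, assuming pdftotext interprets -x/-y/-W/-H as pixels at -r DPI the same way pdftoppm does; the function names are hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 in


def pixels_to_points(value_px: float, resolution_dpi: float) -> float:
    """Convert a coordinate in pixels at `resolution_dpi` to PDF points."""
    return value_px * POINTS_PER_INCH / resolution_dpi


def points_to_pixels(value_pt: float, resolution_dpi: float) -> float:
    """Convert a coordinate in PDF points to pixels at `resolution_dpi`."""
    return value_pt * resolution_dpi / POINTS_PER_INCH


def crop_in_points(x: float, y: float, w: float, h: float,
                   resolution_dpi: float) -> Tuple[float, ...]:
    """Express a pdftoppm-style crop box (-x -y -W -H at -r DPI) in points."""
    return tuple(pixels_to_points(v, resolution_dpi) for v in (x, y, w, h))


if __name__ == "__main__":
    # A box picked from a 150 DPI pdftoppm render...
    print(crop_in_points(300, 0, 600, 1650, 150))
    # ...covers the same area as -r 72 -x 144 -y 0 -W 288 -H 792.
```

With the resolution option available, the 150 DPI values can be passed unchanged as `-r 150 -x 300 -y 0 -W 600 -H 1650`; without it, they would first have to be scaled by 72/150 as shown.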
Comment 3 Albert Astals Cid 2009-03-08 05:45:19 UTC
The patch will be released as part of poppler 0.12.

