Bug 94504 - pdftotext and pdftohtml fail to extract columns
Summary: pdftotext and pdftohtml fail to extract columns
Status: RESOLVED INVALID
Alias: None
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
Reported: 2016-03-11 22:13 UTC by John Damm Sørensen
Modified: 2016-03-12 10:37 UTC

Attachments
PDF file with columns that is not processed correctly (7.87 MB, text/plain)
2016-03-11 22:13 UTC, John Damm Sørensen

Description John Damm Sørensen 2016-03-11 22:13:29 UTC
Created attachment 122241
PDF file with columns that is not processed correctly

pdftotext and pdftohtml fail to correctly process certain PDF pages with three columns.
For the attached PDF file, the error occurs on page 5, where the rendered text is not in the correct order.

Rendered text (XXXX stands in for the masked digits of the social security numbers in the file; the actual output renders those 4 digits correctly):
S08032016-17
Alle og enhver, der har noget til gode
i nedennævnte bo, indkaldes herved
til at anmelde og dokumentere deres
krav inden 8 uger

S08032016-21
Alle og enhver, der har noget til gode
i nedennævnte bo, indkaldes herved
til at anmelde og dokumentere deres
krav inden 8 uger

S08032016-26
Alle og enhver, der har noget til gode
i nedennævnte bo, indkaldes herved
til at anmelde og dokumentere deres
krav inden 8 uger

Afdøde
Cpr.nr. 190521XXXX
Dødsdato 11.02.2016
Frede Jensen
Hyldevej 12
9300 Sæby

Afdøde
Cpr.nr. 150733XXXX
Dødsdato 04.01.2016
Inger Kathrine Simonsen
Gl. Tingvej 40F, 1 th.
9600 Aars

Afdøde
Cpr.nr. 300121XXXX
Dødsdato 26.01.2016
Anna Hartlev
Gulkrog 16, st
7100 Vejle
Comment 1 John Damm Sørensen 2016-03-11 22:15:37 UTC
Forgot to mention that the version of poppler is poppler-0.41.0, compiled on Fedora 22.
Comment 2 John Damm Sørensen 2016-03-12 10:37:50 UTC
The -raw option seems to work.
Sorry John
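
For anyone who lands here with the same symptom, the workaround above can be sketched roughly as follows. This is only a minimal illustration, not part of poppler: the helper names `build_pdftotext_cmd` and `extract_page` and the file names are hypothetical, and it assumes `pdftotext` is installed and on the PATH. It uses the `-f`/`-l` options to limit extraction to one page and `-raw` to keep the text in content stream order.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, out_path, page, raw=True):
    """Build a pdftotext command line that extracts a single page.

    With raw=True the -raw flag is added, which keeps the text in
    content stream order instead of the default reading-order output.
    (Hypothetical helper, for illustration only.)
    """
    cmd = ["pdftotext", "-f", str(page), "-l", str(page)]
    if raw:
        cmd.append("-raw")
    cmd += [pdf_path, out_path]
    return cmd


def extract_page(pdf_path, out_path, page, raw=True):
    """Run pdftotext for one page; raises CalledProcessError on failure."""
    subprocess.run(build_pdftotext_cmd(pdf_path, out_path, page, raw), check=True)


# Hypothetical usage; "attachment.pdf" stands in for the attached file:
# extract_page("attachment.pdf", "page5.txt", page=5)
```

Running the same page with `raw=False` and comparing the two output files should make the difference in text order easy to see.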