Bug 105867 - small characters and extra spaces
Summary: small characters and extra spaces
Status: RESOLVED INVALID
Alias: None
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2018-04-03 15:35 UTC by Nupur Patel
Modified: 2018-04-03 17:01 UTC



Attachments
Original PDF (3.05 MB, application/pdf)
2018-04-03 15:35 UTC, Nupur Patel
Docparser Rendering View 1 (400.03 KB, image/png)
2018-04-03 15:36 UTC, Nupur Patel
Docparser_rendering_view_2 (6.87 KB, image/png)
2018-04-03 15:37 UTC, Nupur Patel
attachment-1149-0.html (7.21 KB, text/html)
2018-04-03 16:06 UTC, Nupur Patel

Description Nupur Patel 2018-04-03 15:35:39 UTC
Created attachment 138551
Original PDF

We use Docparser to parse PDFs, and Docparser uses Poppler's utilities to render the text from the PDF as plain text. You can find out which version they use by emailing support@docparser.com.

Docparser shows extra spaces and changes in character size when the document is rendered, while Adobe, Foxit, SodaPDF, and xpdftools do not show the same issue.

I am attaching two files.

One is the PDF file that should not have any spaces between characters.
If I copy/paste from Adobe, the text reads:
REBATE#: 82632-PIP % VARIES FROM: TO: 12302017
PO NBR
ITEM #
ITMPK
ITMSIZE
DESCRIPTION
PO QTY
PO WGT
RBT AMNT
PO COST
PO EXT COST
RBT EXT AMNT
7387200
1086186
1
9KG
CHEESE CREAM PLAIN LT
3
27.00
.0250
73.7500
221.25
5.53
7387200
1139571
12
250G
CHEESE PARMESAN SHAKER
12
36.00
.0350
76.0800
912.96
31.95
7387200
1139586
4
2.3KG
CHEESE CHED MED COL
9
82.80
.0250
117.0000
1053.00
26.33
7387200
1139596

Copying and pasting the same text from Docparser shows:
 R       E   B   A   T   E   # :         8 2 6 3 2 - P               I P               %                  V       A       R       I E   S                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      P       A       G       E    :                               2
 
 
              F   R       O   M   :                                                       T   O     :       1 2 3 0 2 0 1 7
 
 
 
 
  P   O   N       B       R                     I T   E   M       #                                              I T       M       P    K                        I T       M       S       I Z       E   D     E    S    C       R           I P         T       I O             N                                                                                                                                                   P   O   Q    T       Y         P   O     W       G   T   R   B   T     A         M       N       T                                                   P   O       C       O       S   T                             P       O           E       X   T        C       O       S   T                       R   B   T   E   X   T    A   M    N    T
 
 
7 3 8 7 2 0 2                                   1 0 7 0 4 7 2                                                                  1 2 0                             4 3       M           L                 D     R     E   S       S       I N             G               I T         A           L           S           P       R       I N                 G               H           E       R       B                                            4                 2 0 . 8 4                         . 0 2 3 5                                                                   3 5 . 3 2 0 0                                                                                          1 4 1 . 2 8                                                                  3 . 3 2
 
 
7 3 8 7 7 1 1                                   1 0 8 9 0 5 8                                                                              6                     2 . 8 4 L                               J U       I C       E           T       O           M           A       T           O               C       A           N                                                                                                               4 0               6 9 7 . 6 0                            . 0 3 6 5                                                                   2 4 . 7 7 0 0                                                                                          9 9 0 . 8 0                                                               3 6 . 1 6
 
 
7 3 8 7 7 1 1                                   1 2 3 5 3 5 6                                                                              6                     2 . 8 4 L                               K     E    T    C       H           U           P           B       I G                 R           E       D               P       L           A           S                                                                           4 0               7 7 4 . 8 0                            . 0 4 6 5                                                                   4 3 . 3 9 0 0                                                                                     1 7 3 5 . 6 0                                                                  8 0 . 7 1
 
 
7 3 8 7 7 1 1                                   1 2 6 3 3 1 3                                                                          2 4                       1 5 6 M                   L             J U       I C       E           T       O           M           A       T           O               C       A           N                                                                                                               1 6                    6 1 . 2 8                         . 0 3 6 5                                                                   1 4 . 0 1 0 0                                                                                          2 2 4 . 1 6                                                                  8 . 1 8
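
The symptom in the output above can be checked mechanically. The following is a minimal sketch, not part of Docparser or poppler: it flags a line as "letter-spaced" when most of its whitespace-separated tokens are single characters. The function names, the 0.8 threshold, and the 5-token minimum are assumptions chosen for illustration.

```python
def single_char_token_ratio(line: str) -> float:
    """Return the fraction of whitespace-separated tokens that are one character long."""
    tokens = line.split()
    if not tokens:
        return 0.0
    return sum(len(token) == 1 for token in tokens) / len(tokens)


def looks_letter_spaced(line: str, threshold: float = 0.8, min_tokens: int = 5) -> bool:
    """Heuristic: a line is suspicious if it has enough tokens and most are single characters.

    threshold and min_tokens are illustrative defaults, not tuned values.
    """
    tokens = line.split()
    return len(tokens) >= min_tokens and single_char_token_ratio(line) >= threshold


# Sample lines modeled on the two pastes in this report.
good = "REBATE#: 82632-PIP % VARIES FROM: TO: 12302017"
bad = "7 3 8 7 7 1 1   1 0 8 9 0 5 8   6   2 . 8 4 L   J U   I C   E"

print(looks_letter_spaced(good))
print(looks_letter_spaced(bad))
```

Running such a check over the text produced by each tool would show whether the extra spaces are already present in the extracted text or only appear in one particular viewer.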
Comment 1 Nupur Patel 2018-04-03 15:36:25 UTC
Created attachment 138552
Docparser Rendering View 1
Comment 2 Nupur Patel 2018-04-03 15:37:39 UTC
Created attachment 138553
Docparser_rendering_view_2
Comment 3 Jason Crain 2018-04-03 16:02:02 UTC
If this is part of some software you've purchased, you should contact Docparser for support.

I'm guessing this is using poppler's pdftohtml to display the PDFs, though I can't reproduce this issue. pdftohtml, pdftocairo, and pdftotext from poppler 0.62 are working correctly for me with no extra spaces. What version of poppler are you using?
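
For anyone trying to reproduce this locally, here is a rough sketch of how the installed version could be checked and the text extracted. It assumes `pdftotext` is on the PATH, that `pdftotext -v` prints a banner containing "pdftotext version X.Y.Z" (the parser is written against that assumed format), and that the PDF path is supplied by the caller. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_poppler_version(banner: str) -> Optional[str]:
    """Pull a dotted version number out of a 'pdftotext version X.Y.Z' banner (assumed format)."""
    match = re.search(r"pdftotext version (\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def pdftotext_version() -> Optional[str]:
    """Return the version reported by the local pdftotext, or None if it is unavailable."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(["pdftotext", "-v"], capture_output=True, text=True)
    # The banner is normally written to stderr; stdout is included as a fallback.
    return parse_poppler_version(result.stderr + result.stdout)


def extract_text(pdf_path: str) -> str:
    """Run pdftotext in layout mode and return the extracted text ('-' sends it to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("pdftotext version:", pdftotext_version())
```

Comparing the output of `extract_text` on the attached PDF with what Docparser displays would help narrow down whether the spacing comes from pdftotext itself or from a later processing step.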
Comment 4 Nupur Patel 2018-04-03 16:06:03 UTC
Created attachment 138554
attachment-1149-0.html

Hi Jason,

I have asked Docparser support for the version.  They said they were using pdftotext.

I did contact them first, and they said they use Poppler's utilities to render the text. Thus, I contacted you. Docparser also said it could be how the file was created, so I contacted ABBYY. ABBYY said it is a rendering issue, since no application other than Docparser shows those spaces.

Do you see any problem with how the file is written?

Nupur

Comment 5 Jason Crain 2018-04-03 17:01:45 UTC
Poppler's pdftotext program is working fine for me. Whatever the issue is, it seems to be specific to something Docparser is doing, and it's their responsibility to provide support for their services.

