Created attachment 138551: Original PDF

We use Docparser to parse PDFs, and Docparser uses the Poppler utilities to render the text in a PDF as plain text. You can find out which version they use by emailing support@docparser.com.

When Docparser renders the document, it shows extra spaces between characters and changes in character size. Adobe, Foxit, SodaPDF, and xpdftools do not show the same issue.

I am attaching two files. One is the PDF file, which should not have any spaces between characters. If I copy/paste from Adobe, the text reads:

REBATE#: 82632-PIP % VARIES
FROM: TO: 12302017
PO NBR ITEM # ITMPK ITMSIZE DESCRIPTION PO QTY PO WGT RBT AMNT PO COST PO EXT COST RBT EXT AMNT
7387200 1086186 1 9KG CHEESE CREAM PLAIN LT 3 27.00 .0250 73.7500 221.25 5.53
7387200 1139571 12 250G CHEESE PARMESAN SHAKER 12 36.00 .0350 76.0800 912.96 31.95
7387200 1139586 4 2.3KG CHEESE CHED MED COL 9 82.80 .0250 117.0000 1053.00 26.33
7387200 1139596

Copy/pasting from Docparser shows:

R E B A T E # : 8 2 6 3 2 - P I P % V A R I E S P A G E : 2
F R O M : T O : 1 2 3 0 2 0 1 7
P O N B R I T E M # I T M P K I T M S I Z E D E S C R I P T I O N P O Q T Y P O W G T R B T A M N T P O C O S T P O E X T C O S T R B T E X T A M N T
7 3 8 7 2 0 2 1 0 7 0 4 7 2 1 2 0 4 3 M L D R E S S I N G I T A L S P R I N G H E R B 4 2 0 . 8 4 . 0 2 3 5 3 5 . 3 2 0 0 1 4 1 . 2 8 3 . 3 2
7 3 8 7 7 1 1 1 0 8 9 0 5 8 6 2 . 8 4 L J U I C E T O M A T O C A N 4 0 6 9 7 . 6 0 . 0 3 6 5 2 4 . 7 7 0 0 9 9 0 . 8 0 3 6 . 1 6
7 3 8 7 7 1 1 1 2 3 5 3 5 6 6 2 . 8 4 L K E T C H U P B I G R E D P L A S 4 0 7 7 4 . 8 0 . 0 4 6 5 4 3 . 3 9 0 0 1 7 3 5 . 6 0 8 0 . 7 1
7 3 8 7 7 1 1 1 2 6 3 3 1 3 2 4 1 5 6 M L J U I C E T O M A T O C A N 1 6 6 1 . 2 8 . 0 3 6 5 1 4 . 0 1 0 0 2 2 4 . 1 6 8 . 1 8
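If you want to flag letter-spaced output like the Docparser paste automatically, one rough check is to measure how many whitespace-separated tokens are a single character. This is a hypothetical heuristic sketch, not part of Poppler or Docparser; the function names and the 0.8 threshold are assumptions chosen for illustration. Note that it only detects the symptom: it cannot restore the original word boundaries, because the spaced-out output no longer distinguishes real spaces from inserted ones.

```python
def single_char_token_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that are one character long."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if len(t) == 1) / len(tokens)


def looks_letter_spaced(text: str, threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: flag text in which most tokens are single characters.

    The 0.8 default threshold is an arbitrary assumption, not a Poppler setting.
    """
    return single_char_token_ratio(text) >= threshold


if __name__ == "__main__":
    # Samples taken from the two copy/paste results in this report.
    adobe = "REBATE#: 82632-PIP % VARIES FROM: TO: 12302017"
    docparser = "R E B A T E # : 8 2 6 3 2 - P I P % V A R I E S"
    print(looks_letter_spaced(adobe))      # False: only "%" is a single character
    print(looks_letter_spaced(docparser))  # True: every token is a single character
```

On the Adobe sample only one of seven tokens is a single character, while on the Docparser sample every token is, so the two cases are far apart and the exact threshold matters little here.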
Created attachment 138552: Docparser Rendering View 1
Created attachment 138553: Docparser_rendering_view_2
If this is part of some software you've purchased, you should contact Docparser for support. I'm guessing this is using poppler's pdftohtml to display the PDFs, though I can't reproduce this issue. pdftohtml, pdftocairo, and pdftotext from poppler 0.62 are working correctly for me with no extra spaces. What version of poppler are you using?
Created attachment 138554: attachment-1149-0.html

Hi Jason,

I have asked Docparser support for the version. They said they were using pdftotext. I contacted them first, and they said they use the Poppler utilities to render the text, which is why I contacted you. Docparser also said the problem could be in how the file was created, so I contacted Abbyy. Abbyy said it is a rendering issue, since no application other than Docparser shows those spaces. Do you see any problem with how the file is written?

Nupur

From: bugzilla-daemon@freedesktop.org
Sent: Tuesday, April 03, 2018 12:02 PM
To: Nupur Patel <Nupur.Patel@blacksmithapplications.com>
Subject: [Bug 105867] small characters and extra spaces
Comment #3 (https://bugs.freedesktop.org/show_bug.cgi?id=105867#c3) on bug 105867 (https://bugs.freedesktop.org/show_bug.cgi?id=105867) from Jason Crain
Poppler's pdftotext program is working fine for me. Whatever the issue is it seems to be specific to something Docparser is doing, and it's their responsibility to provide support for their services.