Bug 107450 - Glyphs in PDFs produced by Tesseract OCR render as white boxes when selected
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: glib frontend
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2018-08-01 21:20 UTC by James R Barlow
Modified: 2018-08-20 22:00 UTC (History)

Attachments
Test file (149.73 KB, application/pdf)
2018-08-01 21:20 UTC, James R Barlow

Description James R Barlow 2018-08-01 21:20:34 UTC
Created attachment 140931
Test file

Tesseract OCR uses a glyphless font (a font with a single glyph that occupies empty space) in the PDFs it produces.

When PDFs produced by Tesseract are rendered and text is selected, Poppler draws white boxes over the background image that contains the text. The Tesseract team has worked hard on PDF viewer support and compatibility. To my knowledge, the Tesseract glyphless font works correctly in Acrobat, Pdfium, PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and Ghostscript, with testing on multiple platforms, including mobile. Other PDF viewers do not attempt to render the glyphless font on top of the background.
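
For readers unfamiliar with hidden OCR text layers, here is a minimal sketch of the general technique, not Tesseract's actual code. In PDF, text rendering mode 3 (`3 Tr`) means the text is neither filled nor stroked, so it can be selected and searched without painting anything over the page image. It is an assumption here that the attached file relies on this mode in addition to the glyphless font. The function name `invisible_text_object`, the font resource name `F1`, and the use of a plain literal string instead of an encoded CID string are all simplifications made for illustration.

```python
def invisible_text_object(text: str, x: float, y: float,
                          font_name: str = "F1", font_size: float = 12.0) -> str:
    """Build a PDF text object that positions text without painting it.

    Hypothetical helper for illustration only; a real producer would
    also embed the font and encode the string to match that font.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = (text.replace("\\", "\\\\")
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    operators = [
        "BT",                              # begin text object
        "3 Tr",                            # rendering mode 3: no fill, no stroke
        f"/{font_name} {font_size:g} Tf",  # select font resource and size
        f"{x:g} {y:g} Td",                 # move to the text position
        f"({escaped}) Tj",                 # show the (invisible) string
        "ET",                              # end text object
    ]
    return "\n".join(operators)


if __name__ == "__main__":
    print(invisible_text_object("Hello (OCR)", 72, 700))
```

A viewer that honours the rendering mode while drawing a selection highlights the region without painting glyph shapes on top of the scanned image, which is the behaviour described for the other viewers above.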

This was first reported against Evince, which claims the issue is in Poppler.
https://gitlab.gnome.org/GNOME/evince/issues/953

See that issue for screenshots, since they cannot easily be added here.

Related issues:
* https://github.com/jbarlow83/OCRmyPDF/issues/249
* https://github.com/jbarlow83/OCRmyPDF/issues/178

The design notes of the glyphless font may be relevant.
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/pdfrenderer.cpp
Comment 1 GitLab Migration User 2018-08-20 22:00:47 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/157.