Created attachment 140931
Tesseract OCR uses a glyphless font (a font with a single glyph that occupies empty space) in the PDFs it produces.
When PDFs produced by Tesseract are rendered in a Poppler-based viewer and text is selected, Poppler draws white boxes on top of the background image that contains the text. The Tesseract team has worked hard on PDF viewer support and compatibility. To my knowledge, the Tesseract glyphless font works correctly in Acrobat, Pdfium, PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and Ghostscript, with testing on multiple platforms, including mobile. Other PDF viewers do not attempt to render the glyphless font on top of the background.
This was first reported against Evince, which claims the issue is in Poppler.
See that issue for screenshots as no screenshots can be added easily here.
The design notes of the glyphless font may be relevant.
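For anyone who wants to reproduce this locally, here is a minimal sketch of generating a test PDF. It assumes the tesseract command-line tool is installed; the file names scan.png and out are hypothetical placeholders, and the helper function names are mine, not part of Tesseract or Poppler.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base):
    """Return the argv for `tesseract <image> <outputbase> pdf`.

    The trailing "pdf" selects Tesseract's PDF output config, which
    writes <outputbase>.pdf with the page image plus a text layer.
    """
    return ["tesseract", image_path, output_base, "pdf"]


def make_searchable_pdf(image_path, output_base):
    """Run Tesseract if it is on PATH and return the PDF path, else None."""
    if shutil.which("tesseract") is None:
        # Tesseract is not installed; nothing to run.
        return None
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    return output_base + ".pdf"


if __name__ == "__main__":
    # Hypothetical input image and output base name.
    pdf_path = make_searchable_pdf("scan.png", "out")
    print(pdf_path or "tesseract not found on PATH")
```

Opening the resulting PDF in a Poppler-based viewer and selecting text should show whether the white boxes appear.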
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/157.