Poppler does not support ActualText. The ActualText entry specifies replacement text for content that does translate to text but is represented in a nonstandard way (e.g. glyphs for ligatures). ActualText support is required for text to be extracted correctly from a PDF.

Some examples of PDFs that use ActualText are at http://www.unicode.org/udhr/
One of the PDFs that I tested is http://www.unicode.org/udhr/d/udhr_san.pdf

A patch to implement ActualText support is attached.
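To illustrate the behaviour being requested, here is a minimal sketch of how an extractor might honour ActualText. It is not the attached patch and does not use Poppler's API. The event tuples ("BDC", "TEXT", "EMC") and the function name extract_text are hypothetical simplifications of a content stream, assuming ActualText arrives in the property list of a marked-content sequence.

```python
# Hypothetical, simplified model of a content stream (not Poppler's API):
#   ("BDC", props)  opens a marked-content sequence with a property dict
#   ("TEXT", s)     is the Unicode text mapped from the drawn glyphs
#   ("EMC", None)   closes the innermost open sequence

def extract_text(events):
    out = []
    stack = []      # ActualText (or None) for each open sequence
    suppress = 0    # number of enclosing sequences that carry ActualText
    for kind, payload in events:
        if kind == "BDC":
            actual = payload.get("ActualText")
            # Emit the replacement once, unless an outer span already did.
            if actual is not None and suppress == 0:
                out.append(actual)
            stack.append(actual)
            if actual is not None:
                suppress += 1
        elif kind == "TEXT":
            # Glyph text inside an ActualText span is replaced, so skip it.
            if suppress == 0:
                out.append(payload)
        elif kind == "EMC":
            if stack and stack.pop() is not None:
                suppress -= 1
    return "".join(out)


# A ligature glyph (U+FB03) wrapped in a span whose ActualText is "ffi".
events = [
    ("TEXT", "o"),
    ("BDC", {"ActualText": "ffi"}),
    ("TEXT", "\ufb03"),
    ("EMC", None),
    ("TEXT", "ce"),
]
print(extract_text(events))
```

Without the ActualText entry, the same stream would yield the ligature code point itself rather than the three letters it stands for.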
Created attachment 13005: ActualText patch
Patch to implement ActualText
Patch committed, thanks a lot. Are you subscribed to the poppler mailing list? If not, we would be happy to have people like you there :-)