54268 – problem copy/pasting CID? / Identity-H? text

Bug 54268 - problem copy/pasting CID? / Identity-H? text

Summary: problem copy/pasting CID? / Identity-H? text

Status:	RESOLVED NOTOURBUG

Alias:	None

Product:	poppler
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other (platform), All (OS)

Importance:	medium (priority), normal (severity)
Assignee:	poppler-bugs

Reported:	2012-08-30 14:50 UTC by Frederic Peters
Modified:	2012-08-30 15:52 UTC
CC List:	0 users

Description Frederic Peters 2012-08-30 14:50:29 UTC

I have a large number of PDF files on which poppler fails (example at <http://people.gnome.org/~fpeters/pdf-identity-h-bug.pdf>).

The first page is fine, but it is followed by a second page containing a single word in a monospace font (poppler's document properties list it as "FreeMono, Truetype (CID), encoded as Identity-H"). That word is displayed correctly, but it turns into something entirely different when copy/pasted from Evince or extracted with the pdftotext or pdftohtml utilities.
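Some background on what "encoded as Identity-H" implies: under the predefined Identity-H CMap, each 2-byte big-endian code in a shown string is used directly as the CID, and for an Adobe-Identity CID font there is no standard CID-to-Unicode table, so text extraction depends on the file's ToUnicode CMap. The sketch below is hypothetical: the byte string was not read from the attached PDF but chosen to be consistent with the extracted text, and the fallback of treating each CID as a Unicode code point (`naive_text`, a made-up helper) is only an illustration that reproduces the observed output, not poppler's actual code path.

```python
from __future__ import annotations

# Hypothetical string bytes: NOT read from the attached PDF, only chosen to be
# consistent with the text poppler extracted ("WDSLUDLHQW").
raw = bytes.fromhex("0057 0044 0053 004C 0055 0044 004C 0048 0051 0057")


def identity_h_cids(data: bytes) -> list[int]:
    """Under Identity-H, each 2-byte big-endian code is used directly as the CID."""
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def naive_text(cids: list[int]) -> str:
    """Illustrative fallback with no ToUnicode CMap: treat each CID as a code point."""
    return "".join(chr(cid) for cid in cids)


cids = identity_h_cids(raw)
print([hex(c) for c in cids])
print(naive_text(cids))  # WDSLUDLHQW
```

If the file carried a usable ToUnicode CMap, an extractor would look each CID up there instead, which is presumably what is missing here.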

The displayed word is "tapiraient", while the word extracted as text is "WDSLUDLHQW". In the series of documents I have, other examples give:

  DQJRLVVHUD -> angoissera
  HQDPRXUHU -> enamourer
  FRQWUHFDUUDLW -> contrecarrait

It looks like the mapping is always the same and the letter order is preserved (e.g. D->a, E->?, F->c, G->?, H->e...). I checked poppler-data, which contains CMap/Identity-H, but I couldn't figure out whether it is used here, or whether it is relevant.
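The four examples are consistent with a constant shift: each extracted letter sits 29 code points below the displayed one (`ord("t") - ord("W") == 29`). One plausible explanation, not verified against the file, is that the CIDs are TrueType glyph indices in a font whose glyphs follow the standard Macintosh glyph order, where glyph 3 is the space (U+0020), so printable ASCII characters sit at glyph index = code point minus 29. The hypothetical `recover` helper below only undoes that shift for these particular files; it is a workaround sketch, not a fix for the PDFs.

```python
# Shift measured from the first example pair in this report.
OFFSET = ord("t") - ord("W")  # 29


def recover(extracted: str) -> str:
    """Undo the constant shift by adding OFFSET to every code point."""
    return "".join(chr(ord(ch) + OFFSET) for ch in extracted)


# Pairs quoted in the report: extracted text -> displayed word.
examples = {
    "WDSLUDLHQW": "tapiraient",
    "DQJRLVVHUD": "angoissera",
    "HQDPRXUHU": "enamourer",
    "FRQWUHFDUUDLW": "contrecarrait",
}

for extracted, shown in examples.items():
    assert recover(extracted) == shown, (extracted, recover(extracted))
    print(f"{extracted} -> {recover(extracted)}")
```

Applied to "EG", the same shift gives "bd", which would fill in the two "?" entries in the mapping, assuming the shift also holds for letters that do not occur in these samples.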

Comment 1 Frederic Peters 2012-08-30 15:52:22 UTC

As suggested on IRC by Carlos, I had a friend try it with Adobe Reader, and the result is much the same.

It gives (not sure whether the Unicode characters will survive): 􀁗􀁄􀁓􀁌􀁕􀁄􀁌􀁈􀁑􀁗, i.e. U+100057, U+100044, ... If the high 1 is ignored, the output is identical to what I get with poppler.
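The Adobe Reader result can be checked the same way. Code points U+100000 to U+10FFFD form Supplementary Private Use Area-B (plane 16), so ignoring the "high 1" amounts to subtracting 0x100000. Only the first two values are quoted in the comment; the remaining ones in this sketch are reconstructed from the statement that the rest matches poppler's output, so treat them as an assumption. `strip_plane16` is a hypothetical helper name.

```python
PLANE_16 = 0x100000  # first code point of Supplementary Private Use Area-B

# U+100057 and U+100044 are quoted in the comment; the rest are assumed from
# the claim that the output matches poppler once the high 1 is dropped.
adobe = [0x100057, 0x100044, 0x100053, 0x10004C, 0x100055,
         0x100044, 0x10004C, 0x100048, 0x100051, 0x100057]


def strip_plane16(code_points):
    """Map plane-16 private-use code points back to their low 16 bits."""
    return "".join(
        chr(cp - PLANE_16) if cp >= PLANE_16 else chr(cp) for cp in code_points
    )


print(strip_plane16(adobe))  # WDSLUDLHQW
```

Both viewers thus seem to expose the raw CID values, differing only in where they place them in Unicode.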

I suppose there is nothing to do here but to consider the file invalid :/
