Bug 8986

Summary: Adobe Glyph Naming convention (was: Fonts with _-separated ligatures in mapping tables)
Product: poppler Reporter: Ed Catmur <ed>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: high CC: uws+freedesktop
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
URL: http://www.adobe.com/devnet/opentype/archives/glyph.html
Whiteboard:
Attachments: mapping_tables.patch
mapping_tables.patch
mapping_tables.patch
mapping_tables.patch
mapping_tables.patch
mapping_tables.patch
mapping_tables_r2.patch

Description Ed Catmur 2006-11-11 08:28:07 UTC
http://bugzilla.gnome.org/show_bug.cgi?id=341947 and bug 7002 have attached a
PDF where the font (Minion) has entries of the form A_a in its mapping tables.

Some of these (e.g. f_l, f_f_i) are in Unicode and would be easy to add to
nameToUnicodeTab, but others (T_h, f_f_t, f_j) are not in Unicode and can only
be represented as strings. This will probably require some rearchitecting.
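
A minimal sketch of the "represent as strings" idea, assuming each underscore-separated component is looked up separately and the results are concatenated. NAME_TO_UNICODE and ligature_to_text are hypothetical names for illustration; this is not the code in the attached patch, and the real nameToUnicodeTab is far larger.

```python
# Hypothetical miniature of a name-to-Unicode table. Values are strings so
# that a glyph can map to more than one code point.
NAME_TO_UNICODE = {
    "T": "\u0054",
    "h": "\u0068",
    "f": "\u0066",
    "i": "\u0069",
    "j": "\u006A",
    "l": "\u006C",
    "t": "\u0074",
}


def ligature_to_text(glyph_name):
    """Map an underscore-separated ligature name to a string, or None."""
    chars = []
    for part in glyph_name.split("_"):
        mapped = NAME_TO_UNICODE.get(part)
        if mapped is None:
            return None  # unknown component: give up rather than guess
        chars.append(mapped)
    return "".join(chars)
```

With this table, ligature_to_text("T_h") gives "Th" and ligature_to_text("f_f_t") gives "fft", neither of which has a single Unicode code point.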
Comment 1 Ed Catmur 2006-11-11 16:54:24 UTC
Created attachment 7750
mapping_tables.patch

Patch, also fixes bug 8985.
Comment 2 Ed Catmur 2006-11-13 02:00:01 UTC
See also bug 9001.
Comment 3 Ed Catmur 2006-11-13 02:02:07 UTC
Hm. That patch sucks; it can't handle ligated codepoint references (e.g.
A_uni030B for A with U+030B COMBINING DOUBLE ACUTE ACCENT). I'll put together a
better patch when I have time.
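
A rough sketch of what handling a ligated codepoint reference could look like, assuming a component of the form "uni" plus four uppercase hex digits names one code point. BASE_NAMES, component_to_text and name_to_text are hypothetical helpers, not functions from poppler or from the patch.

```python
import re

# "uni" followed by exactly four uppercase hex digits, e.g. uni030B.
_UNI_COMPONENT = re.compile(r"uni([0-9A-F]{4})")

# Hypothetical stand-in for the full glyph name table.
BASE_NAMES = {"A": "A"}


def component_to_text(component):
    """Translate one component, or return None if it is not understood."""
    match = _UNI_COMPONENT.fullmatch(component)
    if match:
        return chr(int(match.group(1), 16))
    return BASE_NAMES.get(component)


def name_to_text(glyph_name):
    """Translate every underscore-separated component and join the results."""
    pieces = [component_to_text(c) for c in glyph_name.split("_")]
    if any(piece is None for piece in pieces):
        return None
    return "".join(pieces)
```

Here name_to_text("A_uni030B") yields the two-code-point string "A" followed by U+030B.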
Comment 4 Ed Catmur 2006-11-13 02:02:56 UTC
Also, we should probably complain to stderr when we can't understand a mapping
table character name. It's only polite.
Comment 5 Ed Catmur 2006-11-13 05:40:00 UTC
Idea for improved handling:

Change the 2-pass flow (currently, recognised names then numeric codes, writing
into a Unicode[256] for initialising in CharCodeToUnicode::make8BitToUnicode) to
have a preliminary pass deciding whether to use hex or decimal, and a single
main pass which reads character names, numeric codes and ligatures, by passing
components to a small function which takes the component and "hex" bool and
writes into uBuf if successful.

This will mean initialising CharCodeToUnicode::make8BitToUnicode with an empty
Unicode[256] (because the main pass needs to use ::setMapping), but that's just
a performance hack and ::setMapping is fast for the single-character case anyway.

Hopefully the code will be clearer than the above.
Comment 6 Ed Catmur 2006-11-15 01:19:51 UTC
Created attachment 7796
mapping_tables.patch

Right, that's better.

Addresses comment 3 and comment 4.

The suggestion in comment 5 can't work exactly as stated, because that runs the
risk of false positives (e.g. "ae" pushing the parser into hex mode where
decimal is intended). The supplied patch avoids any change to behaviour except
in parsing ligatures, also cutting down on patch size.

The warnings to stderr should help in future diagnosing why text isn't being
copied properly and in driving future improvements to the mapping parser.
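
To illustrate the false-positive risk, here is a deliberately naive hex-detection pass. It is a simplified illustration with hypothetical function names, not the heuristic poppler actually uses: it treats any two-character name made of hex digits with at least one letter as evidence of hex mode.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def looks_like_hex_code(name):
    """Naive test: two hex digits, at least one of them a letter."""
    return (
        len(name) == 2
        and all(ch in HEX_DIGITS for ch in name)
        and any(ch.isalpha() for ch in name)
    )


def guess_hex_mode(names):
    """Preliminary pass: switch to hex if any name looks like a hex code."""
    return any(looks_like_hex_code(name) for name in names)
```

A font whose numeric names are decimal ("65", "66") but which also contains the ordinary glyph name "ae" (U+00E6) would be pushed into hex mode by this pass, since "ae" is also a valid pair of hex digits.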
Comment 7 Ed Catmur 2006-11-22 15:08:40 UTC
Created attachment 7867
mapping_tables.patch

Updated patch.

Also handles bug 9128: glyph variants.
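
A small sketch of one way to handle glyph variants, assuming the variants in question are period-suffixed names such as "a.sc" and following the rule in the Adobe glyph naming document that everything from the first period onward is dropped before lookup. strip_variant_suffix is a hypothetical helper name, not taken from the patch.

```python
def strip_variant_suffix(glyph_name):
    """Return the part of the glyph name before the first period, if any."""
    base, _, _ = glyph_name.partition(".")
    return base
```

Note that a name beginning with a period, such as ".notdef", reduces to the empty string, so a caller would have to treat an empty base name as having no mapping.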
Comment 8 Jeff Muizelaar 2006-12-08 23:09:21 UTC
Instead of printing to stderr it would probably be better to use the error()
function like the other users in GfxFont.cc. Using error() means that
applications like KPDF can display the error messages in the user interface.
Comment 9 Ed Catmur 2006-12-09 11:50:10 UTC
Created attachment 8042
mapping_tables.patch

Use error() instead of fprintf.
Comment 10 Ed Catmur 2007-06-19 15:48:31 UTC
Created attachment 10380
mapping_tables.patch

for 0.5.9
Comment 11 Albert Astals Cid 2007-12-15 08:38:07 UTC
Hi Ed, sorry for taking so long to react to this. I'm not sure I understand what this patch is about. As I understand it, A_a is something like an "Aa" ligature in the Minion font, and because this "Aa" ligature does not "exist" in the Unicode standard, our current code has problems rendering those ligatures, and your patch fixes that?
Comment 12 Adrian Johnson 2007-12-16 03:26:19 UTC
There is an Adobe document [1] that contains the procedure for mapping glyph names to a sequence of Unicode characters to support text extraction. The document specifies some additional forms of glyph naming that the patch does not appear to handle.

[1] http://www.adobe.com/devnet/opentype/archives/glyph.html
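
For reference, a sketch of the numeric component forms described in that document, as I read it: "uni" followed by one or more groups of four uppercase hex digits, and "u" followed by four to six uppercase hex digits, in both cases excluding surrogates. The function names are hypothetical, and the lookup of ordinary glyph names that the procedure performs first is left out.

```python
import re

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")  # uniXXXX, uniXXXXYYYY, ...
_U = re.compile(r"u([0-9A-F]{4,6})")         # uXXXX up to uXXXXXX


def _is_scalar(value):
    """True for Unicode scalar values, i.e. no surrogates, max U+10FFFF."""
    return value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF


def numeric_component_to_text(component):
    """Map a uni.../u... component to a string, or to "" if invalid."""
    match = _UNI.fullmatch(component)
    if match:
        digits = match.group(1)
        values = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(v) for v in values):
            return "".join(chr(v) for v in values)
        return ""
    match = _U.fullmatch(component)
    if match:
        value = int(match.group(1), 16)
        if _is_scalar(value):
            return chr(value)
    return ""
```

Under this reading, "uni0041030B" maps to "A" followed by U+030B, "u1F600" maps to a single code point outside the BMP, and "uniD800" maps to the empty string.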
Comment 13 Ed Catmur 2007-12-16 14:37:11 UTC
@comment 11: Yes, precisely; high-end fonts will often contain ligatures that are not present in Unicode.

@comment 12: Thanks, I didn't know about that specification. I'll work on getting the whole of the spec into the patch.
Comment 14 Albert Astals Cid 2007-12-16 14:54:31 UTC
Great :-)
Comment 15 Ed Catmur 2007-12-17 16:07:27 UTC
Created attachment 13172
mapping_tables.patch

OK, this should work.
Comment 16 Albert Astals Cid 2007-12-18 11:20:44 UTC
The patch is in for the next non-bugfix release.
Comment 17 Ed Catmur 2007-12-21 18:15:01 UTC
Created attachment 13303
mapping_tables_r2.patch

Supplementary patch to fix issues discussed on-list:
http://lists.freedesktop.org/archives/poppler/2007-December/003236.html
http://lists.freedesktop.org/archives/poppler/2007-December/003238.html
