Summary: | [patch] to fix Syntax Warning: Could not parse ligature component "BE" of "S_BE" in parseCharName | ||
---|---|---|---|
Product: | poppler | Reporter: | William Bader <williambader> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | cwolfe, williambader |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: |
patch to add three glyph name mappings
Sample PDF that gets Could not parse ligature component "BE" of "S_BE" in parseCharName |
Description
William Bader
2015-07-09 21:11:15 UTC
Can we have that pdf?

Created attachment 117031
Sample PDF that gets Could not parse ligature component "BE" of "S_BE" in parseCharName
atril, pdftops, pdftoppm, pdftocairo and others show the diagnostic below multiple times when processing this file.
Syntax Warning: Could not parse ligature component "BE" of "S_BE" in parseCharName
The PDF was made with Scribus 1.4.2.dfsg+r18267-1ubuntu2 on Ubuntu 14.04 LTS.

What's the patch useful for?

1) The patch stops a stream of cryptic warnings from poppler utilities, and from viewers such as evince that use libpoppler, when opening a PDF that uses a font package included in the LTS release of a major Linux distribution.
2) Presumably the patch makes those three glyph names accessible.

Acrobat Reader does not complain about the PDF. The question is whether you want poppler to be lenient or strict about processing glyph names that are valid but don't conform to naming recommendations.

William, what do you mean by "makes those three glyph names accessible"?

poppler follows a convention (which is not a required rule) of splitting glyph names at underscores to handle ligatures; see parseCharName() in GfxFont.cc:

    // Step 2: split the remaining string into a sequence of components, using
    // underscore (U+005F LOW LINE) as the delimiter.
    if (ligatures && strchr(charName, '_')) {
      // parse names of the form A_a (e.g. f_i, T_h, l_quotesingle)

If a glyph name with an embedded underscore is not in nameToUnicodeTextTab[], poppler splits it at the underscore and then cannot find it. When poppler-based applications run on the attached PDF (which has a glyph named "S_BE"), they display

Syntax Warning: Could not parse ligature component "BE" of "S_BE" in parseCharName

because parseCharName() treats "S_BE" as an "S" followed by a ligature component called "BE". There is no glyph called "BE": the glyph is named "S_BE", with the underscore as part of its name. When parseCharName() prints that syntax warning, it has failed to parse the glyph name and has placed either nothing or the wrong value in uBuf[]. That is what I meant by "S_BE" being inaccessible.
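To make the failure mode concrete, here is a minimal C++ sketch of the lookup-then-split behavior described above. It is not poppler's parseCharName(): the function name parseGlyphName, the GlyphTable map standing in for nameToUnicodeTextTab[], and the omission of AGL Step 3 (uniXXXX/uXXXX names) are all simplifying assumptions. It tries the whole name first and only then splits at underscores, so giving "S_BE" its own table entry (presumably what the patch's three mappings do) avoids the warning.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical table type standing in for poppler's nameToUnicodeTextTab[]
// (glyph name -> single Unicode code point).
using GlyphTable = std::map<std::string, unsigned>;

// Simplified sketch, NOT poppler's parseCharName(): maps a glyph name to
// code points. Returns an empty vector on failure and, if `warning` is
// non-null, stores a message modelled on poppler's syntax warning.
std::vector<unsigned> parseGlyphName(const std::string& name,
                                     const GlyphTable& table,
                                     std::string* warning) {
    // AGL Step 1: drop everything from the first period ("f.alt" -> "f").
    const std::string base = name.substr(0, name.find('.'));

    // Whole-name lookup first: a table entry that contains an underscore
    // (e.g. "S_BE") is found here, before any ligature splitting.
    GlyphTable::const_iterator whole = table.find(base);
    if (whole != table.end()) {
        return {whole->second};
    }

    // No underscore and no table entry: give up (Step 3 forms not handled).
    if (base.find('_') == std::string::npos) {
        return {};
    }

    // AGL Step 2: split at '_' and map each component on its own.
    std::vector<unsigned> codes;
    std::istringstream parts(base);
    std::string part;
    while (std::getline(parts, part, '_')) {
        GlyphTable::const_iterator it = table.find(part);
        if (it == table.end()) {
            // "S_BE" without its own entry ends up here: "BE" is not a glyph.
            if (warning != nullptr) {
                *warning = "Could not parse ligature component \"" + part +
                           "\" of \"" + name + "\"";
            }
            return {};
        }
        codes.push_back(it->second);
    }
    return codes;
}
```

With a table holding only "S", "f" and "i", parseGlyphName("S_BE", table, &warning) returns an empty vector and sets the warning, while "f_i" yields two code points. After adding an "S_BE" entry (the sketch's tests use the private-use placeholder U+E000, since the real mapping is not given here), the same call returns a single code point without splitting.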
The comment for parseCharName() says

    // This function is in part a derived work of the Adobe Glyph Mapping
    // Convention: http://www.adobe.com/devnet/opentype/archives/glyph.html
    // Algorithmic comments are excerpted from that document to aid
    // maintainability.

but Acrobat displays the attached file without showing error messages, so Adobe's document (the basis of the code in parseCharName()) does not fully describe how Acrobat behaves. Very few fonts have glyph names with embedded underscores, because such names violate Adobe's recommendations. The attached patch adds mappings for three of them that are relatively widespread.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/102.