Bug 92597 - Fix showing of some non-ASCII characters in forms
Summary: Fix showing of some non-ASCII characters in forms
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2015-10-22 14:34 UTC by Marek Kasik
Modified: 2016-05-05 10:00 UTC


Attachments
Find correct glyph or return 0 (1.31 KB, patch)
2015-10-22 14:34 UTC, Marek Kasik
Fix showing of some non-ASCII characters (1.01 KB, patch)
2015-10-22 14:34 UTC, Marek Kasik
Find correct glyph or return 0 (901 bytes, patch)
2015-10-23 08:17 UTC, Marek Kasik

Description Marek Kasik 2015-10-22 14:34:08 UTC
Poppler renders some characters in forms as "fi" instead of showing the correct glyph or nothing (if the font doesn't contain necessary glyphs). This can be reproduced by opening the form from https://bugzilla.gnome.org/show_bug.cgi?id=756805 and typing "šč" into a text field. Actual result is "fifi".
I've found two minor bugs which, if fixed, improve the situation a little.

The first one is that there is a misplaced continue statement in "CharCodeToUnicode::mapToCharCode()". It should apply to the parent loop instead of the nested one. After fixing this, no characters are shown, which should be the case if it doesn't find them.
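
For illustration, a minimal sketch of how a `continue` inside the nested comparison loop can produce a false match. This is a simplified model, not poppler's actual code: `SeqEntry`, `findBuggy`, `findFixed` and the sample table are hypothetical names and data chosen for the example. The fixed variant uses the `break` that the later comments settle on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Unicode = std::uint32_t;
using CharCode = std::uint32_t;

// Hypothetical table entry: a character code and the Unicode sequence it stands for.
struct SeqEntry {
    CharCode c;
    std::vector<Unicode> u;
};

// Buggy variant: the inner `continue` only advances the inner loop, so j always
// reaches len and the first entry of matching length is reported as a hit.
int findBuggy(const std::vector<SeqEntry>& sMap, const Unicode* u, int usize, CharCode* c) {
    for (const SeqEntry& e : sMap) {
        const int len = static_cast<int>(e.u.size());
        if (len != usize) continue;
        int j;
        for (j = 0; j < len; j++) {
            if (e.u[j] != u[j]) continue;  // has no effect on the outer loop
        }
        if (j == len) { *c = e.c; return 1; }
    }
    return 0;
}

// Fixed variant: `break` leaves the inner loop early, so j == len holds only
// when every element compared equal.
int findFixed(const std::vector<SeqEntry>& sMap, const Unicode* u, int usize, CharCode* c) {
    for (const SeqEntry& e : sMap) {
        const int len = static_cast<int>(e.u.size());
        if (len != usize) continue;
        int j;
        for (j = 0; j < len; j++) {
            if (e.u[j] != u[j]) break;
        }
        if (j == len) { *c = e.c; return 1; }
    }
    return 0;
}

// Usage: a table holding the sequences "fi" and "fl" (codes are illustrative).
void exampleUsage() {
    const std::vector<SeqEntry> sMap = {{0x93, {0x66, 0x69}}, {0x94, {0x66, 0x6C}}};
    const Unicode query[2] = {0x0161, 0x010D};  // "šč", not in the table
    CharCode c = 0;
    assert(findBuggy(sMap, query, 2, &c) == 1 && c == 0x93);  // false match on "fi"
    assert(findFixed(sMap, query, 2, &c) == 0);               // correctly not found
}
```

In this model the buggy variant returns the first two-element entry for any two-element query, which is one way a stray "fi" could appear in place of the typed text.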

The second one is that "Annot::Layout()" calls "ccToUnicode->mapToCharCode(&uChar, &c, 2)" with the wrong parameter. The last one should be "1" because uChar is a single Unicode value, so &uChar points to an array with just one member, not 2. After fixing this, at least the "š" is shown.
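
The length argument can be illustrated with a small stand-alone model. This is only a sketch under assumptions: `ReverseMap`, `lookupSingle` and the sample code value are hypothetical, and only the parameter order of `mapToCharCode` mirrors the call quoted above. The point is that the callee reads exactly `usize` elements starting at `u`, so a pointer to one variable must be paired with a length of 1.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Unicode = std::uint32_t;
using CharCode = std::uint32_t;

// Hypothetical reverse map from a Unicode sequence to a character code.
class ReverseMap {
public:
    void add(std::vector<Unicode> seq, CharCode code) {
        table_[std::move(seq)] = code;
    }

    // Reads exactly usize elements starting at u; the caller must guarantee
    // that many elements exist.
    bool mapToCharCode(const Unicode* u, CharCode* c, int usize) const {
        const std::vector<Unicode> key(u, u + usize);
        const auto it = table_.find(key);
        if (it == table_.end()) return false;
        *c = it->second;
        return true;
    }

private:
    std::map<std::vector<Unicode>, CharCode> table_;
};

// uChar is a single value, so &uChar is an array of one element: pass 1.
// Passing 2 here would make the callee read past uChar.
bool lookupSingle(const ReverseMap& m, Unicode uChar, CharCode* c) {
    return m.mapToCharCode(&uChar, c, 1);
}

void exampleUsage() {
    ReverseMap m;
    m.add({0x0161}, 0x9D);  // "š" mapped to an illustrative code
    CharCode c = 0;
    assert(lookupSingle(m, 0x0161, &c) && c == 0x9D);
    assert(!lookupSingle(m, 0x010D, &c));  // "č" is not in this table
}
```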

See the attached patches.
Comment 1 Marek Kasik 2015-10-22 14:34:37 UTC
Created attachment 119072
Find correct glyph or return 0
Comment 2 Marek Kasik 2015-10-22 14:34:52 UTC
Created attachment 119073
Fix showing of some non-ASCII characters
Comment 3 Albert Astals Cid 2015-10-22 22:00:01 UTC
"It should apply to the parent loop instead of the nested one. After fixing this, no characters are shown, which should be the case if it doesn't find them."

Why? Doesn't "if (j==sMap[i].len) {" exactly do what your new variable does?
Comment 4 Jose Aliste 2015-10-23 00:16:32 UTC
I think Albert is right. Instead of adding the variable to make the continue statement, probably just changing the continue to a break statement should work.
Comment 5 Marek Kasik 2015-10-23 08:17:45 UTC
Created attachment 119132
Find correct glyph or return 0

You are right, the break is enough.
Comment 6 Albert Astals Cid 2015-10-26 18:50:51 UTC
Both patches pushed, thanks!
Comment 7 Kamil Páral 2016-05-04 13:15:02 UTC
I can now write "š" into the form from attachment 313655, but I can't write "ě", "č", "ř", "ď", "ť", "ň". And maybe some others. Is that expected?

poppler-0.41.0-1.fc24.x86_64
Comment 8 Kamil Páral 2016-05-04 13:17:10 UTC
A working link to the sample form:
https://bugzilla.gnome.org/attachment.cgi?id=313655

The characters are no longer converted to "fi", they simply disappear when being rendered.
Comment 9 Marek Kasik 2016-05-04 13:32:50 UTC
(In reply to Kamil Páral from comment #7)
> I can now write "š" into the form from attachment 313655, but I can't write
> "ě", "č", "ř", "ď", "ť", "ň". And maybe some others. Is that expected?

This is expected because the other characters are not part of PDFDocEncoding.
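
As a rough illustration of that answer, the sketch below checks a code point against a deliberately partial list of letters that, as far as I know, PDFDocEncoding can represent. It is not poppler code and not the full encoding table: `isKnownPdfDocLetter` is a hypothetical helper, and a `false` result for a non-letter symbol proves nothing because symbols are not covered here.

```cpp
#include <cassert>

// Partial, illustrative check: printable ASCII, the Latin-1 letter block, and
// the handful of extra Latin letters that PDFDocEncoding is understood to add.
bool isKnownPdfDocLetter(char32_t cp) {
    if (cp >= 0x20 && cp <= 0x7E) return true;  // printable ASCII
    if (cp >= 0xC0 && cp <= 0xFF) return true;  // Latin-1 letters (plus × and ÷)
    switch (cp) {
        case 0x0131:               // ı
        case 0x0141: case 0x0142:  // Ł ł
        case 0x0152: case 0x0153:  // Œ œ
        case 0x0160: case 0x0161:  // Š š
        case 0x0178:               // Ÿ
        case 0x017D: case 0x017E:  // Ž ž
            return true;
        default:
            return false;
    }
}

void exampleUsage() {
    assert(isKnownPdfDocLetter(0x0161));   // š is representable
    assert(!isKnownPdfDocLetter(0x011B));  // ě
    assert(!isKnownPdfDocLetter(0x010D));  // č
    assert(!isKnownPdfDocLetter(0x0159));  // ř
    assert(!isKnownPdfDocLetter(0x010F));  // ď
    assert(!isKnownPdfDocLetter(0x0165));  // ť
    assert(!isKnownPdfDocLetter(0x0148));  // ň
}
```

Under this assumption, "š" passes while the six letters from comment 7 do not, which matches the behaviour reported there.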
Comment 10 Kamil Páral 2016-05-05 08:45:15 UTC
Thanks, Marek. Is there some chance of having this "fixed" (i.e. worked around the same way Adobe Reader does it, by taking a system font and attaching it to the pdf document) in the foreseeable future? Because currently most government- or business-related pdf forms (in my country) can't be filled out on Linux; the only way is to run Windows and Adobe Reader. And that's a strong reason for people to say they can't use Linux, which makes me sad. I understand those pdf forms are not well created; they should have contained the correct font covering all needed glyphs. But I can complain as much as I want to the form authors (I did, many times): as long as it works for 99% of people, they're not going to waste their time on this.
Comment 11 Marek Kasik 2016-05-05 10:00:40 UTC
(In reply to Kamil Páral from comment #10)
> Thanks, Marek. Is there some chance of having this "fixed" (i.e. worked
> around the same way Adobe Reader does it, by taking a system font and
> attaching it to the pdf document) in the foreseeable future? Because
> currently most government- or business-related pdf forms (in my country)
> can't be filled out on Linux; the only way is to run Windows and Adobe
> Reader. And that's a strong reason for people to say they can't use Linux,
> which makes me sad. I understand those pdf forms are not well created; they
> should have contained the correct font covering all needed glyphs. But I can
> complain as much as I want to the form authors (I did, many times): as long
> as it works for 99% of people, they're not going to waste their time on this.

I would like to work on this but I don't have time for it now. If I find some time for it (or some of my colleagues' time) I'll look at it, but I cannot promise anything.

