Summary: | Problem with Cyrillic and Japanese at PDF forms (evince, okular) | ||
---|---|---|---|
Product: | poppler | Reporter: | Stanislav German-Evtushenko <ginermail> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | akerkunov, alex.serpuxov, Andrej.Skvortzov, artem-terehin, asceriwili.nino, bugs-freedesktop, bycel, dafagazova, davidmelkihan, djashlar, doarten.ruslan, doityourselfteam, dority, drewtov, ertash, fedya.retunov, g.gmx, grakic, iveand, jumpjet68, kparal, lenar.shakirov, lioncub, matvey, mpsuzuki, mschmidt, oliver.joos, petr.pisar, remenkin, reminov.tolyan, reva.wertuna, rozelak, sergei.rasder, slex.bi, tamir.r, tayinin, tonal.promsoft, tropikhajma, vasda, vovik-wfa, wertyhes, xavabova, zyx1984 |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux (All) | ||
See Also: | https://bugzilla.gnome.org/show_bug.cgi?id=725808 | ||
Whiteboard: | |||
Attachments: |
Test forms with Kanji and Cyrillic
screenshot of font property of sample PDF 19393 |
Description
Stanislav German-Evtushenko
2008-10-05 06:07:17 UTC
Hi Stanislav, can you please attach a document showing the issue?
Created attachment 19393
Test forms with Kanji and Cyrillic
Reading the comments at http://qa.openoffice.org/issues/show_bug.cgi?id=42985, it looks like the issue is that the embedded font does not have Unicode characters.
Is there any news about this bug?
Created attachment 38442
screenshot of font property of sample PDF 19393
The sample PDF has no embedded fonts, and the referred font
is Helvetica under WinAnsiEncoding only. So giving Unicode
text for Japanese & Cyrillic codepoint is wrong. When it is
opened by Adobe Reader, strangely, the font tab in property
window shows KozMinPr6N-Regular (Adobe Reader's default font for
Adobe-Japan1) as if it's subsetted & embedded. If I hide
KozMinPr6N-Regular.otf from Adobe Reader, then AdobeMingStd-Light
(Adobe Reader's default font for Adobe-CNS1) is shown as if
it is subsetted & embedded. If I hide it, AdobeSongStd-Light
(for Adobe-GB1) is shown as if ...
I'm afraid that Adobe Reader shows Kanji & Cyrillic text by
some fallback feature, and poppler's behaviour is not a bug,
although it is not inconvenient than Adobe Reader.
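The encoding limit described in the comment above can be illustrated with a rough sketch. It assumes that Python's `cp1252` codec is a close enough stand-in for PDF's WinAnsiEncoding (the two are very similar but not identical), and `unencodable_chars` is a hypothetical helper name, not anything from poppler.

```python
def unencodable_chars(text, encoding="cp1252"):
    """Return the distinct characters of `text` that `encoding` cannot represent.

    Assumption: cp1252 approximates the WinAnsiEncoding used by the form font.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Keep first-seen order and skip repeats.
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    for sample in ("Hello", "Привет", "日本語"):
        print(sample, "->", unencodable_chars(sample))
```

Plain Latin text comes back with an empty list, while every Cyrillic or Kanji character is reported as unrepresentable, which matches the observation that a Helvetica/WinAnsiEncoding field has no way to carry such text.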
Sorry for the typo. I meant "although it is more inconvenient than Adobe Reader".
I confirm this bug (or missing feature). It does not only affect Arabic or Hebrew characters! I tried to insert a "line separator" (Unicode character '\u2028') into a text field of a PDF form. In Gnome (e.g. in a Terminal) this works by pressing Ctrl+Shift+u 2 0 2 8. In acroread it works similarly (just keep pressing Ctrl+Shift while typing the digits). Evince displays a PDF form with '\u2028' as expected. But a '\u2028' cannot be inserted, and is even removed whenever the containing text field is edited in any way! For details about why and how PDF forms may contain '\u2028' or '\u2029' see http://blogs.adobe.com/formfeed/2009/01/paragraph_breaks_in_plain_text.html (This is a copy of my comment https://bugzilla.gnome.org/show_bug.cgi?id=627024#c4)
Confirming the bug in Evince 3.10.3 (uses poppler / cairo (0.24.1)). Really looking forward to a solution to the problem.
Confirming the bug in Evince 3.10.3 x64.
Confirming the bug. Any progress?
I confirm this bug.
Confirming the bug, looking forward to a solution to the problem.
Confirming the bug in Evince 3.10.3, Okular 0.18.
Confirming the bug in Evince 3.10.3, Okular 0.18 (uses poppler).
Confirm this bug.
Stop saying "confirm this bug". Do you think it helps? No, it does not.
Then please enlighten us: what do you need in order to solve this bug?
A patch that fixes it would be nice. If you can't provide that, infinite amounts of time would be nice too. If you can't provide that, maybe you can hire someone to code the patch for you.
Okay, I will do my best.
Confirm this bug - tried in evince and okular, the best readers on Linux :(
It doesn't show Cyrillic comments in Okular.
Still present in Evince 3.18.2.
I confirm this bug.
Compiled the latest poppler trunk from the Git repository - 0.40.0, including commits up to commit ab3c9ccb630004be049cb59f303612aa2a35f408 - on Ubuntu 16.04 daily with updates. Also compiled the latest trunk of qpdfview - 0.4.16.99 - including all the funny libraries and passing all tests. No improvement.
Solution: Install the latest Windows 10 (64-bit) version of Foxit Reader using Wine (staging or devel from the Wine PPA on Ubuntu) and open the file in it. Works like a charm, looks beautiful.
Alternative solution: Is there any alternative to poppler on Linux? Note that Foxit for Linux 1.0.1.0925 does not even support forms! You can add text via comments, but then again, the "special" characters are broken or invisible (amusingly, in a different way than with poppler).
Affected readers (confirmed): evince (called Document Viewer on Ubuntu), okular, qpdfview, Foxit Reader for Linux
Not affected: Foxit Reader for Windows on wine-1.9.3 (Staging)
Not working (crashes on wine-1.9.3 (Staging)): Adobe Acrobat Reader XI for Windows on Wine
To demonstrate the problem, type this into a PDF form: Příliš žľuťoučký kůň úpěl ďábelské ódy
Some characters disappear, some are replaced with blank spaces and become invisible. If entered using Foxit Reader for Windows, however, the whole phrase displays correctly in Linux PDF viewers. It becomes malformed when copied into any other form.
So 8 years passed and no fix for this?
I tested with this form: http://archive.mid.ru//bul_ns_en.nsf/uvedomlenie.pdf
Foxit Reader for Linux works fine, no need to use Wine. So does google-chrome, it even edits the forms. Firefox fails, but prints a ? for every Cyrillic letter. I'm quite sure that this problem can be solved with some font knowledge. Somehow it must be possible to find out which Cyrillic font is used in the document, and then make some kind of font substitution in the system. But well, I found two working alternatives, so I do not need to fill in these documents by hand this summer :-)
Is there any hope of fixing it properly?
Confirming the issue in Gnome 3.20.1 in Debian/Testing.
(In reply to Alexander from comment #24)
> So 8 years passed and no fix for this?
These are the GNOME devs, and they break my Caps Lock switch with every new release )) so the PDF issue is much harder to get fixed here; just use mupdf.
Guys, do you have any updates?
I am experiencing the same issue with okular. Okular version 0.25.0 (KDE 4.14.24). I can assist with anything that may help you fix this bug.
poppler version 0.45.0
The same problem is reproduced when displaying the PDF form in fresh Firefox and Chrome web browsers, which use JS rendering. So the problem is in the PDF document: the PDF document does not embed the font glyphs for these letters. In such cases it is better to display the symbols via some default font instead of leaving the letters missing. So can we add a workaround for "warning: layoutText: cannot convert U+65E5" and replace the font for this symbol with the system default?
The same issue exists on Fedora 25 even if the latest poppler 0.52 is compiled and installed.
Guys, can we set some reward for the hero who will solve this? This bug seems rather difficult, since it has remained open for so many years. And, of course, it causes lots of pain for Russian and Japanese users. Is there something like the Mozilla-Russia «Bug Bounty» program? http://mozilla-russia.org/contribute/bounty-en.html
You can create a bounty for this, for example here: https://freedomsponsors.org/
Andrej.Skvortzov, thanks for this link! I created an issue and invite everyone to sponsor it: https://freedomsponsors.org/issue/807/problem-with-cyrillic-and-japanese-at-pdf-forms-evince-okular
(In reply to fidanjan-karen from comment #35)
> https://freedomsponsors.org/issue/807/problem-with-cyrillic-and-japanese-at-pdf-forms-evince-okular
I've added myself as a sponsor to the bounty.
There's not even a link to this bugzilla in the bounty description. You're not making it easy for people to understand what that bounty is about. Also, this is a universal issue about characters not available in the embedded fonts or default pdf fonts. This is not just about cyrillic and japanese.
I'm also hit by this issue very often, and I use central european characters. But as long as the bounty description says only cyrillic and japanese, I'm not going to increase the reward. Please adjust.
(In reply to Kamil Páral from comment #37)
> There's not even a link to this bugzilla in the bounty description. You're not making it easy for people to understand what that bounty is about.
>
> Also, this is a universal issue about characters not available in the embedded fonts or default pdf fonts. This is not just about cyrillic and japanese. I'm also hit by this issue very often, and I use central european characters. But as long as the bounty description says only cyrillic and japanese, I'm not going to increase the reward. Please adjust.
Kamil, thanks for your comment! Unfortunately, I don't see a way to change the title of the issue, but I wrote a more general description and added links to this thread and to bug 627024 at bugzilla.gnome.org.
Confirming the issue in the latest Archlinux: KDE Frameworks 5.37.0, Qt 5.9.1 (built against 5.9.1), Okular 1.2.0
(In reply to nick from comment #39)
> Confirm the issue in last Archlinux
> KDE Frameworks 5.37.0
> Qt 5.9.1 (built against 5.9.1)
> Okular 1.2.0
The latest Foxit Reader for Linux (2.4.1.0609) can correctly fill forms with Cyrillic.
Confirming this on current Debian Testing with libpoppler 0.57.0, in frontends: evince, qpdfview, mupdf.
This bug is 9 years old, confirmed a hundred times and is still in status "NEW"? This is ridiculous.
You are wasting the precious time of people working on poppler with such a useless comment; please refrain from commenting in the future unless you can add something useful to the discussion.
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/463. |
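The per-character font fallback proposed earlier in this thread (use a default system font for symbols that the form font cannot represent) can be sketched roughly as follows. This is only an illustration of the idea, not poppler's code: `split_font_runs` and the labels `"primary"` and `"fallback"` are hypothetical names, and `cp1252` is again assumed as an approximation of WinAnsiEncoding.

```python
from itertools import groupby


def split_font_runs(text, primary_encoding="cp1252"):
    """Split `text` into consecutive runs tagged "primary" or "fallback".

    A character goes to the "fallback" run when the (assumed) form font
    encoding cannot represent it, so a renderer could draw that run with
    a different font instead of dropping the characters.
    """
    def needs_fallback(ch):
        try:
            ch.encode(primary_encoding)
            return False
        except UnicodeEncodeError:
            return True

    return [
        ("fallback" if fallback else "primary", "".join(chars))
        for fallback, chars in groupby(text, key=needs_fallback)
    ]


if __name__ == "__main__":
    for label, run in split_font_runs("Name: 日本 (Japan)"):
        print(label, repr(run))
```

A real fix would also have to choose an actual fallback font and write it into the field's appearance, which this sketch leaves out.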