Bug 17913 - Problem with Cyrillic and Japanese at PDF forms (evince, okular)
Summary: Problem with Cyrillic and Japanese at PDF forms (evince, okular)
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All Linux (All)
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-05 06:07 UTC by Stanislav German-Evtushenko
Modified: 2019-04-01 05:37 UTC
CC List: 43 users



Attachments
Test forms with Kanji and Cyrillic (3.28 KB, application/pdf)
2008-10-05 14:06 UTC, Stanislav German-Evtushenko
screenshot of font property of sample PDF 19393 (40.31 KB, image/png)
2010-09-04 10:38 UTC, suzuki toshiya

Description Stanislav German-Evtushenko 2008-10-05 06:07:17 UTC
messages in console:

Japanese (日本国):
warning: layoutText: cannot convert U+65E5
warning: layoutText: cannot convert U+672C
warning: layoutText: cannot convert U+56FD

Cyrillic (привет):
warning: layoutText: cannot convert U+043F
warning: layoutText: cannot convert U+0440
warning: layoutText: cannot convert U+0438
warning: layoutText: cannot convert U+0432
warning: layoutText: cannot convert U+0435
warning: layoutText: cannot convert U+0442
Comment 1 Pino Toscano 2008-10-05 09:03:00 UTC
Hi Stanislav,

can you please attach a document showing the issue?
Comment 2 Stanislav German-Evtushenko 2008-10-05 14:06:21 UTC
Created attachment 19393
Test forms with Kanji and Cyrillic
Comment 3 Goran Rakic 2010-03-30 08:45:50 UTC
Reading the comments at http://qa.openoffice.org/issues/show_bug.cgi?id=42985, it looks like the issue is that the embedded font does not have the Unicode characters. Is there any news about this bug?
Comment 4 suzuki toshiya 2010-09-04 10:38:21 UTC
Created attachment 38442
screenshot of font property of sample PDF 19393 

The sample PDF has no embedded fonts, and the referred font
is Helvetica under WinAnsiEncoding only. So giving Unicode
text for Japanese & Cyrillic codepoint is wrong. When it is
opened by Adobe Reader, strangely, the font tab in property
window shows KozMinPr6N-Regular (Adobe Reader's default font for
Adobe-Japan1) as if it's subsetted & embedded. If I hide
KozMinPr6N-Regular.otf from Adobe Reader, then AdobeMingStd-Light
(Adobe Reader's default font for Adobe-CNS1) is shown as if
it is subsetted & embedded. If I hide it, AdobeSongStd-Light
(for Adobe-GB1) is shown as if ...

I'm afraid that Adobe Reader shows Kanji & Cyrillic text by
some fallback feature, and poppler's behaviour is not a bug,
although it is not inconvenient than Adobe Reader.
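
The mismatch described in this comment can be illustrated with a short Python sketch. It assumes that Python's cp1252 codec is a reasonable stand-in for the PDF WinAnsiEncoding (the two are close but not identical), and the helper name unencodable is made up for illustration. It does not reproduce what poppler does internally; it only shows that the code points from the console warnings have no slot in that single-byte encoding.

```python
# Assumption: cp1252 approximates PDF's WinAnsiEncoding; it is not an exact match.
def unencodable(text, codec="cp1252"):
    """Return the characters of `text` that `codec` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# The two samples from the bug description.
for sample in ("日本国", "привет"):
    codepoints = ["U+%04X" % ord(c) for c in unencodable(sample)]
    print(sample, codepoints)
```

Every character of both samples is reported as unencodable, matching the list of code points in the layoutText warnings, while plain ASCII text passes through unchanged.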
Comment 5 suzuki toshiya 2010-09-04 10:41:57 UTC
Sorry for typo.
I meant "although it is inconvenient than Adobe Reader"
Comment 6 Oliver Joos 2013-02-18 14:51:03 UTC
I confirm this bug (or missing feature).

It not only affects Arabic or Hebrew characters! I tried to insert a "line
separator" (Unicode character '\u2028') into a text field of a PDF form. In
Gnome (e.g. in a Terminal) this works by pressing Ctrl+Shift+u 2 0 2 8. In
acroread it works similarly (just keep pressing Ctrl+Shift while typing the
digits). Evince displays a PDF form with '\u2028' as expected. But a '\u2028'
cannot be inserted and is even removed whenever the containing text field is
edited somehow!

For details about why and how PDF forms may contain '\u2028' or '\u2029' see
http://blogs.adobe.com/formfeed/2009/01/paragraph_breaks_in_plain_text.html

( This is a copy of my comment https://bugzilla.gnome.org/show_bug.cgi?id=627024#c4 )
Comment 7 Tamir 2014-03-15 06:26:53 UTC
Confirming the bug in Evince 3.10.3 (uses poppler / cairo 0.24.1). Really looking forward to a fix.
Comment 8 lioncub 2014-03-16 05:02:37 UTC
Confirms bug Evince 3.10.3 x64
Comment 9 alex 2014-03-16 07:19:57 UTC
Confirms bug.
Any progress?
Comment 10 zeonchameleon 2014-03-17 06:37:39 UTC
I confirm this bug.
Comment 11 Toly 2014-03-17 06:49:34 UTC
Confirming the bug, looking forward to a fix.
Comment 12 Ruslan 2014-03-17 07:06:30 UTC
Confirms bug Evince 3.10.3, Okular 0.18
Comment 13 Nino 2014-03-17 07:20:51 UTC
Confirms bug Evince 3.10.3, Okular 0.18 (uses poppler)
Comment 14 Elena 2014-03-17 10:40:34 UTC
Confirm this bug.
Comment 15 Albert Astals Cid 2014-03-17 10:50:48 UTC
Stop saying "confirm this bug". Do you think it helps? No it does not.
Comment 16 Tamir 2014-03-17 12:25:37 UTC
Then please enlighten us: what do you need in order to solve this bug?
Comment 17 Albert Astals Cid 2014-03-17 12:31:02 UTC
A patch that fixes it would be nice. If you can't provide that, infinite amounts of time would be nice too. If you can't provide that, maybe you can hire someone to code the patch for you.
Comment 18 Tamir 2014-03-17 14:25:40 UTC
Okay, I will do my best
Comment 19 Alex 2014-03-17 19:43:12 UTC
I confirm this bug. Tried it in Evince and Okular, the best readers on Linux :(
Comment 20 Viktar 2014-11-28 09:31:54 UTC
It doesn't show Cyrillic comments in Okular.
Comment 21 Dmitriy 2015-12-10 18:50:23 UTC
Still present in Evince 3.18.2
Comment 22 vladimir 2016-02-03 09:27:39 UTC
I confirm this bug.
Comment 23 slazer 2016-02-15 19:53:12 UTC
Compiled the latest poppler trunk from the GIT repository - 0.40.0 including commits up to commit ab3c9ccb630004be049cb59f303612aa2a35f408 - on Ubuntu 16.04 daily with updates. Also compiled the latest trunk of qpdfview - 0.4.16.99 - including all the funny libraries and passing all tests.

No improvement.

Solution:
Install the latest Windows 10 (64bit) version of Foxit Reader using wine (staging or devel from wine PPA on Ubuntu) and open the file in it. Works like a charm, looks beautiful.

Alternative solution:
Is there any alternative to poppler on Linux? Note that Foxit for Linux 1.0.1.0925 does not even support forms! You can add text via comments, but then again, the "special" characters are broken or invisible (amusingly in a different way than using poppler).

Affected readers (confirmed):
evince (called Document viewer on Ubuntu), okular, qpdfview, Foxit Reader for Linux

Not affected:
Foxit Reader for Windows on wine-1.9.3 (Staging)

Not working (crashes on wine-1.9.3 (Staging)):
Adobe Acrobat Reader XI for Windows on Wine

To demonstrate the problem, type this into a PDF form:
Příliš žľuťoučký kůň úpěl ďábelské ódy

Some characters disappear, some are replaced with blank spaces and become invisible. If entered using Foxit Reader for Windows, however, the whole phrase displays correctly in Linux PDF viewers. It becomes malformed when copied into any other form.
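
One possible way to see why only some of the characters are affected is the sketch below. It assumes, as suggested in comment 4, that the form field is limited to something like WinAnsiEncoding, approximated here by Python's cp1252 codec; the function name split_by_codec is hypothetical. Whether poppler drops or blanks a given character depends on its own code path, which this sketch does not model.

```python
import unicodedata

# Assumption: cp1252 approximates the WinAnsiEncoding of the form font.
def split_by_codec(text, codec="cp1252"):
    """Return (kept, dropped): the text without unencodable characters,
    and the list of characters that the codec cannot represent."""
    text = unicodedata.normalize("NFC", text)  # use precomposed letters
    kept, dropped = [], []
    for ch in text:
        try:
            ch.encode(codec)
            kept.append(ch)
        except UnicodeEncodeError:
            dropped.append(ch)
    return "".join(kept), dropped


phrase = "Příliš žľuťoučký kůň úpěl ďábelské ódy"
kept, dropped = split_by_codec(phrase)
print("kept:   ", kept)
print("dropped:", " ".join(dropped))
```

Under this assumption, letters such as í, š, ž and ý survive because cp1252 contains them, while ř, ľ, ť, č, ů, ň, ě and ď do not, which is consistent with only part of the phrase going missing.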
Comment 24 Alexander 2016-04-18 08:57:07 UTC
So 8 years passed and no fix for this?
Comment 25 Kjeld Flarup 2016-06-05 00:09:44 UTC
I tested with this form: http://archive.mid.ru//bul_ns_en.nsf/uvedomlenie.pdf

Foxit Reader for Linux works fine, no need to use Wine.
So does google-chrome, it even edits the forms.
Firefox fails, but prints a ? for every Cyrillic letter.

I'm quite sure that this problem can be solved with some font knowledge. Somehow it must be possible to find out which Cyrillic font is used in the document, and then make some kind of font substitution in the system.

But well, I found two working alternatives, so I do not need to fill in these documents by hand this summer :-)
Comment 26 Anton Kochkov 2016-07-03 13:47:56 UTC
Is there any hope to fix it properly?
Comment 27 Andrey Skvortsov 2016-07-11 07:44:46 UTC
Confirm the issue in Gnome 3.20.1 in Debian/Testing.
Comment 28 Ivan 2016-08-01 17:23:25 UTC
(In reply to Alexander from comment #24)
> So 8 years passed and no fix for this?

these are the GNOME devs, and they break my caps switch with every new release ))
so the PDF issue is much harder to get fixed here, just use mupdf
Comment 29 pkozlov 2016-10-26 15:11:02 UTC
Guys, do you have any updates?
I am experiencing the same issue with okular.

Okular version 0.25.0 (KDE 4.14.24)

I can assist with anything that may help you fix this bug.
Comment 30 pkozlov 2016-10-26 15:12:10 UTC
poppler version 0.45.0
Comment 31 Murz 2017-02-02 07:47:11 UTC
The same problem is reproduced when displaying the PDF form in recent Firefox and Chrome web browsers, which use JS rendering. So the problem is in the PDF document: it doesn't embed font glyphs for the non-Latin letters. In such cases it would be better to display the symbols using some default font instead of leaving the letters missing.

So can we add a workaround for "warning: layoutText: cannot convert U+65E5": replace the font for this symbol with the system default?
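
The workaround proposed here could look roughly like the following Python sketch. It is not poppler code and not an existing API: the function names encodable and font_runs, the font label "system-default", and the use of cp1252 as a stand-in for the form font's encoding are all assumptions made for illustration. The idea is to split the field text into runs, keeping the form's own font where it can encode the characters and switching to a fallback font elsewhere.

```python
from itertools import groupby

# Assumption: cp1252 approximates what the form font (Helvetica under
# WinAnsiEncoding in the sample PDF) can represent.
def encodable(ch, codec="cp1252"):
    """True if the form font's encoding has a slot for this character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def font_runs(text, form_font="Helvetica", fallback_font="system-default"):
    """Split text into (font, run) pairs; hypothetical fallback scheme."""
    return [
        (form_font if ok else fallback_font, "".join(chars))
        for ok, chars in groupby(text, key=encodable)
    ]


for font, run in font_runs("Name: привет!"):
    print(font, repr(run))
```

A real implementation would also need to pick a fallback font that actually covers the missing characters and write the matching font resources into the field's appearance stream; the sketch only covers the run-splitting step.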
Comment 32 Andrey Alekseenkov 2017-03-13 14:34:50 UTC
The same issue exists on Fedora 25 even if latest poppler 0.52 is compiled and installed.
Comment 33 Karen Fidanyan 2017-04-11 14:05:13 UTC
Guys, can we set up some reward for the hero who solves this?
This bug seems rather difficult, since it has remained open for so many years. And, of course, it causes a lot of pain for Russian and Japanese users.
Is there something like the Mozilla-Russia «Bug Bounty» program? http://mozilla-russia.org/contribute/bounty-en.html
Comment 34 Andrey Skvortsov 2017-04-11 14:14:53 UTC
You can create a bounty for this, for example here: https://freedomsponsors.org/
Comment 35 Karen Fidanyan 2017-04-21 08:21:12 UTC
Andrej.Skvortzov, thanks for this link!
I created an issue and invite everyone to sponsor it:
https://freedomsponsors.org/issue/807/problem-with-cyrillic-and-japanese-at-pdf-forms-evince-okular
Comment 36 Andrey Skvortsov 2017-04-21 08:44:34 UTC
(In reply to fidanjan-karen from comment #35)
> https://freedomsponsors.org/issue/807/problem-with-cyrillic-and-japanese-at-pdf-forms-evince-okular

I've added myself as a sponsor to the bounty.
Comment 37 Kamil Páral 2017-04-21 08:49:43 UTC
There's not even a link to this bugzilla in the bounty description. You're not making it easy for people to understand what that bounty is about.

Also, this is a universal issue about characters not available in the embedded fonts or default pdf fonts. This is not just about cyrillic and japanese. I'm also hit by this issue very often, and I use central european characters. But as long as the bounty description says only cyrillic and japanese, I'm not going to increase the reward. Please adjust.
Comment 38 Karen Fidanyan 2017-04-21 09:25:15 UTC
(In reply to Kamil Páral from comment #37)
> There's not even a link to this bugzilla in the bounty description. You're
> not making it easy for people to understand what that bounty is about.
> 
> Also, this is a universal issue about characters not available in the
> embedded fonts or default pdf fonts. This is not just about cyrillic and
> japanese. I'm also hit by this issue very often, and I use central european
> characters. But as long as the bounty description says only cyrillic and
> japanese, I'm not going to increase the reward. Please adjust.

Kamil, thanks for your comment!
Unfortunately, I don't see a way to change the title of the issue, but I wrote a more general description and added links to this thread and to bug 627024 at bugzilla.gnome.org.
Comment 39 nick 2017-09-04 11:49:54 UTC
Confirm the issue in last Archlinux 
KDE Frameworks 5.37.0
Qt 5.9.1 (built against 5.9.1)
Okular 1.2.0
Comment 40 nick 2017-09-04 12:06:09 UTC
(In reply to nick from comment #39)
> Confirm the issue in last Archlinux 
> KDE Frameworks 5.37.0
> Qt 5.9.1 (built against 5.9.1)
> Okular 1.2.0

The latest Foxit Reader for Linux (2.4.1.0609) can correctly fill in forms with Cyrillic.
Comment 41 Vladimir 2017-11-28 17:10:11 UTC
Confirming this on current Debian Testing with libpoppler 0.57.0, in frontends: evince, qpdfview, mupdf.

This bug is 9 years old, confirmed a hundred times and is still in status "NEW"? This is ridiculous.
Comment 42 Albert Astals Cid 2017-12-02 00:30:49 UTC
You are wasting the precious time of people working on poppler with such a useless comment. Please refrain from commenting in the future unless you can add something useful to the discussion.
Comment 43 GitLab Migration User 2018-08-21 10:59:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/463.

