Bug 36111 (forms_unicode)

Summary: text in Russian in pdf forms
Product: poppler Reporter: Misha <mikekoltsov>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium CC: adam.reichold, alex.serpuxov, artem-terehin, butirsky, cfeck, dafagazova, davidmelkihan, denis.rackitniy, djashlar, doityourselfteam, dority, dr0x29a, drewtov, ertash, fedya.retunov, fixot, g.gmx, k644, kparal, mschmidt, radic, remenkin, reva.wertuna, sergei.rasder, speedytux, tayinin, terentev.mn, vasda, wertyhes
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugs.kde.org/show_bug.cgi?id=193234
https://bugs.freedesktop.org/show_bug.cgi?id=17913
Attachments: sample file

Description Misha 2011-04-10 03:34:56 UTC
Russian text in PDF forms is neither displayed (unless "show forms" is checked) nor printed.

To reproduce the bug, copy and paste the text "Привет!" ("hello!" in Russian) into a form field, then uncheck "show forms". The text will disappear.

Okular's sample output:
...
warning: layoutText: cannot convert U+041A
warning: layoutText: cannot convert U+043E
...

The problem exists at least in poppler 0.14.5-1.
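
To see which characters the warnings refer to, the code points in the log can be turned back into text. This is only a small helper sketch for reading such logs, not part of poppler or Okular; the function name and the regular expression are assumptions based on the warning format quoted above.

```python
import re

# Matches warning lines of the form quoted above,
# e.g. "warning: layoutText: cannot convert U+041A".
WARNING_RE = re.compile(r"layoutText: cannot convert U\+([0-9A-Fa-f]{4,6})")


def dropped_characters(log_text: str) -> str:
    """Return the characters named in 'cannot convert' warnings, in log order."""
    return "".join(chr(int(code, 16)) for code in WARNING_RE.findall(log_text))


log = """\
warning: layoutText: cannot convert U+041A
warning: layoutText: cannot convert U+043E
"""

# Prints the two Cyrillic letters that were dropped from the form field.
print(dropped_characters(log))
```

Both code points lie in the Cyrillic block, well above U+00FF.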
Comment 1 Misha 2011-04-10 03:51:01 UTC
Created attachment 45457
sample file

The file contains Russian and English text in the "family name(surename)" field.
Comment 2 Alexis-Emmanuel Haeringer 2011-04-15 16:16:22 UTC
Hello,
+I can confirm this behavior with poppler-utils (0.16.3-1) and okular 4:4.4.5-2.
It affects some Esperanto characters, as well as Chinese, Russian, etc. (but not š ß þ ú ù û ü ý ÿ ž, for example :/)

+Okular's sample output:
[..]
Annotation Widget not supported. 
Annotation Widget not supported. 
warning: layoutText: cannot convert U+041F
warning: layoutText: cannot convert U+0440
warning: layoutText: cannot convert U+0438
[..]

+Please have a look at the relevant section of a PDF file
(the interesting sentence is "Les dév de Poppler sont fabuleux !!!! FIN Привет FIN"):
========
8 0 obj <</T (selvstendig) /FT /Btn /Kids [448 0 R 453 0 R ] /V /D >> endobj
427 0 obj <</Rect [271.406 719.42 539.126 736.278 ] /Subtype /Widget /F 4 /P 425 0 R /T (land) /DA (/TiRo 11 Tf 0 g) /FT /Tx /Type /Annot /Ff 4194304 /MK <<>> /V (   L e s   d   v   d e   P o p p l e r   s o n t   f a b u l e u x   ! ! ! !   S U I T E  ???@?8?2?5?B   S U I T E) /AP <</N 563 0 R >> >> endobj
448 0 obj <</Rect [139.117 427.699 155.689 442.557 ] /Subtype /Widget /Parent 8 0 R /F 4 /P 425 0 R /Type /Annot /MK <</CA (8) >> /AP <</N 564 0 R >> /AS /D >> endobj
453 0 obj <</Rect [207.69 427.699 224.262 442.557 ] /Subtype /Widget /Parent 8 0 R /F 4 /P 425 0 R /Type /Annot /MK <</CA (8) >> /AP <</N 565 0 R >> /AS /Off >> endobj
563 0 obj <</Length 118 /Subtype /Form /BBox [0 0 267.72 16.858 ] /Resources <</Font <</Arial 526 0 R /Helv 5 0 R /ZaDb 6 0 R /TiRo 7 0 R >> /Encoding <</PDFDocEncoding 4 0 R >> >> >> stream
/Tx BMC
q
BT
/TiRo 11 Tf 0 g 1 0 0 1 2.00 4.03 Tm
(Les d\351v de Poppler sont fabuleux !!!! SUITE  SUITE) Tj
ET
Q
EMC

endstream
endobj
564 0 obj <</Length 57 /Subtype /Form /BBox [0 0 16.572 14.858 ] /Resources <</Font <</Arial 526 0 R /Helv 5 0 R /ZaDb 6 0 R /TiRo 7 0 R >> /Encoding <</PDFDocEncoding 4 0 R >> >> >> stream
q
BT
/ZaDb 14.00 Tf 0 g 1 0 0 1 3.55 1.83 Tm
(8) Tj
ET
Q

endstream
endobj
========
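
A note on reading the dump above: in PDF, a text string is either PDFDocEncoding or UTF-16BE introduced by the byte-order mark FE FF. The /V value of object 427 looks like the second case, with the unprintable bytes shown as spaces and "?": the visible "@", "8", "2", "5", "B" would be the low bytes of U+0440, U+0438, U+0432, U+0435, U+0442. The sketch below shows that decoding rule. The function name is made up for illustration, and latin-1 is used as a stand-in for PDFDocEncoding, which is an approximation.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string: UTF-16BE if it starts with the FE FF mark."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    # Assumption: latin-1 stands in for PDFDocEncoding here. The two differ
    # for some byte values, so this branch is only an approximation.
    return raw.decode("latin-1")


# Hypothetical bytes for a /V value holding the Russian word from the report.
raw = codecs.BOM_UTF16_BE + "Привет".encode("utf-16-be")

# Every character is stored as a 0x04 high byte followed by a low byte.
print(raw[2:])
print(decode_pdf_text_string(raw))
```

By contrast, the appearance stream (object 563) is a single-byte string drawn with /TiRo, and the Russian word is missing from it.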

+ You may also notice that when you fill in a form and save it, then fill it in again and save it again,
the whole revision history remains available in the PDF content!
I suppose this is a bug in the form editing (is Okular responsible?).

Best regards
Comment 3 Max Filippov 2011-11-07 13:52:15 UTC
Hi. I'm also hit by this bug and I'd like to fix it.

As I see it, the issue is the following:
- the unicode-to-charcode map is not loaded from an external font, even when the font contains a cmap table;
- form text conversion via ccToUnicode->mapToCharCode fails in AnnotWidget::layoutText for symbols with codes > 255.

Am I right that providing a GfxFont subclass that can load an appropriate cmap table and initialize its CharCodeToUnicode instance with it is the right way to deal with this bug, or is there a better way?
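
To illustrate the two points above, here is a rough sketch of the reverse lookup in Python. It is not poppler code: the function names, the ASCII-only table and the made-up char codes for the Cyrillic range are all assumptions chosen to show how the lookup fails without cmap data and succeeds with it.

```python
from typing import Dict, List, Tuple


def build_reverse_map(code_to_unicode: Dict[int, int]) -> Dict[int, int]:
    """Invert a char-code -> Unicode table so text can be mapped to char codes."""
    reverse: Dict[int, int] = {}
    for code, code_point in code_to_unicode.items():
        # Keep the first char code seen for each code point.
        reverse.setdefault(code_point, code)
    return reverse


def layout_text(text: str, reverse: Dict[int, int]) -> Tuple[List[int], List[str]]:
    """Map text to char codes, collecting a warning for each unmapped character."""
    codes: List[int] = []
    warnings: List[str] = []
    for ch in text:
        code = reverse.get(ord(ch))
        if code is None:
            warnings.append(f"cannot convert U+{ord(ch):04X}")
        else:
            codes.append(code)
    return codes, warnings


# Hypothetical table without cmap data: printable ASCII only.
ascii_only = {c: c for c in range(0x20, 0x7F)}

# Hypothetical table extended from a font's cmap: U+0410..U+044F mapped
# to invented char codes starting at 0x100.
with_cmap = dict(ascii_only)
with_cmap.update({0x100 + i: 0x0410 + i for i in range(64)})

print(layout_text("Hi П", build_reverse_map(ascii_only)))
print(layout_text("Hi П", build_reverse_map(with_cmap)))
```

With the first table the Cyrillic letter is dropped with a warning, and with the second it gets a char code above 255, so whatever consumes the result must also cope with multi-byte codes.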
Comment 4 Vladimir 2012-02-21 00:31:25 UTC
Any news?
Comment 5 Vladimir 2012-02-21 00:35:42 UTC
Sorry to be a nuisance, but is there any progress on this?
Maybe a rough estimate of when LibreOffice will be able to embed Unicode fonts in forms?
Comment 6 Misha 2012-02-25 00:39:14 UTC
The same happens with poppler 0.16.7-1.
If I have some free time next week, I'll dive into the sources and try to understand (and possibly fix) the problem.
Comment 7 Alexander 2012-05-26 23:36:25 UTC
My friends and I need a fix for this as well. The bug prevents a huge number of users from using Okular (and Linux in general).
Comment 8 Adam Reichold 2012-12-05 19:14:35 UTC
*** Bug 57817 has been marked as a duplicate of this bug. ***
Comment 9 lioncub 2014-03-06 10:18:40 UTC
Progress?
Comment 10 Albert Astals Cid 2014-03-06 10:35:19 UTC
Don't play with the priority.
Comment 11 zeonchameleon 2014-03-11 07:36:13 UTC
The problem IS STILL RELEVANT!
Any progress?
Comment 12 Denis 2014-03-12 13:11:01 UTC
I confirm the bug (Ubuntu 14.04, Evince 3.10.3), x64 with the latest updates. Really looking forward to a fix.
Comment 13 Denis 2014-03-12 13:12:41 UTC
(In reply to comment #12)
I confirm the bug (Ubuntu 14.04, Evince 3.10.3, which uses poppler/cairo 0.24.1), x64 with the latest updates. Really looking forward to a fix.
Comment 14 Denis 2014-03-12 13:14:15 UTC
I confirm the bug (Ubuntu 14.04, Evince 3.10.3, which uses poppler/cairo 0.24.1),
x32 with the latest updates. Really looking forward to a fix.
Comment 15 Tamir 2014-03-13 16:43:42 UTC
I confirm the bug with poppler/cairo (0.24.1). Looking forward to a fix.
Comment 16 FIX 2014-03-13 17:16:26 UTC
I confirm the bug in poppler.
Comment 17 alex 2014-03-16 07:10:22 UTC
I confirm the bug.
Any progress?
Comment 18 Andrew 2015-04-15 09:03:28 UTC
Also hit this bug.
Comment 19 dr0x29a 2016-03-20 16:16:00 UTC
I confirm the bug.
Comment 20 Anton Kochkov 2016-07-03 13:53:31 UTC
Isn't this a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=17913 ?
Comment 21 Murz 2017-02-02 07:50:10 UTC
As a workaround, could we use the system default font for symbols that currently trigger "warning: layoutText: cannot convert"? That would be better than displaying empty space instead of the character.
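
A rough sketch of how that workaround could split the field text, written in Python for brevity. Nothing here is poppler API: `split_runs`, `latin1_only` and the "form"/"fallback" labels are invented for illustration, and the coverage test (code point <= 0xFF) is only a stand-in for asking the real form font what it can encode.

```python
from itertools import groupby
from typing import Callable, List, Tuple


def split_runs(text: str, form_font_has: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Split text into (font label, run) pairs.

    Consecutive characters the form font covers are labelled 'form';
    the rest are labelled 'fallback' and would be drawn with a system font.
    """
    runs: List[Tuple[str, str]] = []
    for covered, chars in groupby(text, key=form_font_has):
        runs.append(("form" if covered else "fallback", "".join(chars)))
    return runs


def latin1_only(ch: str) -> bool:
    """Assumed coverage test: the form font encodes only U+0000..U+00FF."""
    return ord(ch) <= 0xFF


print(split_runs("FIN Привет FIN", latin1_only))
```

Each "fallback" run would then need its own font resource in the appearance stream, which is the part a real patch would still have to solve.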
Comment 22 GitLab Migration User 2018-08-20 22:12:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/230.
