Bug 81746 - Japanese text not rendered from a pdf
Summary: Japanese text not rendered from a pdf
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2014-07-25 14:40 UTC by Jehan
Modified: 2014-08-12 20:37 UTC

Attachments
Katakana pdf (490.85 KB, text/plain)
2014-07-25 14:40 UTC, Jehan
poppler-data: add a pkg-config file. (2.28 KB, patch)
2014-07-26 17:52 UTC, Jehan
poppler: use poppler-data pkg-config (1.22 KB, patch)
2014-07-26 17:54 UTC, Jehan
poppler: use poppler-data pkg-config for both cmake and autotools builds. (2.78 KB, patch)
2014-07-28 12:14 UTC, Jehan
poppler-data: Adding a pkg-config file (2.78 KB, patch)
2014-08-12 17:14 UTC, Jehan
Template poppler-data.pc.in forgotten. (640 bytes, patch)
2014-08-12 19:45 UTC, Jehan

Description Jehan 2014-07-25 14:40:22 UTC
Created attachment 103448
Katakana pdf

I am a GIMP developer, and we use libpoppler as our PDF-import backend. A user reported a PDF containing Japanese characters that do not render on Windows (the rest of the PDF renders well, and everything works fine on Linux).
See https://bugzilla.gnome.org/show_bug.cgi?id=733525

Note that the thumbnail displays the characters (as far as one can see on a thumbnail at least), but I guess that's normal since poppler_page_get_thumbnail() would just return the embedded thumbnail image rather than render the page.

I've seen that you had bug 5690 about Type 1 fonts not rendering. I'm not sure it's the same issue, so I filed a separate report. Feel free to merge them if you consider it the same thing.

I see 3 embedded fonts in the pdf:
$ pdffonts katakana.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
GQUPXH+KozGoPr6N-Medium              CID Type 0C       Identity-H       yes yes no       5  0
WJSCNB+AdobeMingStd-Light            CID Type 0C       Identity-H       yes yes no       6  0
MPSUDV+KristenITC-Regular            TrueType          WinAnsi          yes yes no       7  0


Though I couldn't find any tool that tells which font is used for the specific Katakana and Kanji characters, according to a preview website I found, KristenITC does not have Japanese glyphs. So it would be one of the CID Type 0C fonts (or both), I guess.
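The narrowing-down above (picking the CID fonts out of the `pdffonts` listing) can be sketched in Python. This is only an illustration: `parse_pdffonts` and `cid_fonts` are hypothetical helper names, and the parser assumes that font names and encoding names contain no spaces, which holds for the listing in this report but is not guaranteed in general.

```python
# Sample taken from the pdffonts listing in this report.
SAMPLE = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
GQUPXH+KozGoPr6N-Medium              CID Type 0C       Identity-H       yes yes no       5  0
WJSCNB+AdobeMingStd-Light            CID Type 0C       Identity-H       yes yes no       6  0
MPSUDV+KristenITC-Regular            TrueType          WinAnsi          yes yes no       7  0
"""


def parse_pdffonts(output):
    """Parse the table printed by pdffonts into a list of dicts, one per font.

    Assumption: the font name and the encoding are single tokens, so the
    multi-word "type" column is whatever lies between them.
    """
    fonts = []
    in_body = False
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not in_body:
            # The table body starts after the ruler line made of dashes.
            if set(stripped) <= {"-", " "}:
                in_body = True
            continue
        tokens = stripped.split()
        if len(tokens) < 8:
            continue  # not a complete font row
        fonts.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),
            "encoding": tokens[-6],
            "emb": tokens[-5],
            "sub": tokens[-4],
            "uni": tokens[-3],
            "object_id": (int(tokens[-2]), int(tokens[-1])),
        })
    return fonts


def cid_fonts(fonts):
    """Return only the fonts whose type starts with "CID"."""
    return [f for f in fonts if f["type"].startswith("CID")]


if __name__ == "__main__":
    for font in cid_fonts(parse_pdffonts(SAMPLE)):
        print(font["name"], "|", font["type"], "|", font["encoding"])
```

This only lists candidate fonts; it does not tell which font a given glyph actually uses.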

Note: to be sure that the issue is not in GIMP code, I also reproduced the problem with poppler-0.26.3 and the test code found here: http://www.gtkforums.com/viewtopic.php?p=9086
All cross-compiled from a GNU system for 64-bit Windows with mingw-w64.

Also, I don't use Windows on a regular basis, but I already have a setup to cross-compile for Windows on my Linux machine. So if you need me to test patches, just ask.
Comment 1 Albert Astals Cid 2014-07-25 22:00:34 UTC
Do people on Windows have poppler-data installed?
Comment 2 Jehan 2014-07-26 17:52:41 UTC
Created attachment 103512
poppler-data: add a pkg-config file.

Oh! Well indeed when I install this in my dev build, it works. I guess our official release does not have it either, then.

Could I propose an improvement to the poppler-data package? I am submitting a pkg-config file. This would allow third parties to decide whether they want a strong dependency on poppler-data (or at least a warning for the packager if poppler-data is missing).
I know pkg-config is mostly known for being used with libraries, but it is actually very commonly used for data packages as well (check your /usr/share/pkgconfig/, you will likely see a lot of .pc files for data packages there).
If you were to add this pkg-config file, we would likely add at least a check for poppler-data, with a warning, to the GIMP configure script.

As a side feature, it will allow the poppler package to detect the right path for the datadir. Right now, poppler determines the data root either from the --datarootdir option, if the packager explicitly gives it, or by assuming by default that it is under the same prefix as poppler itself. Using pkg-config simplifies this and gives flexibility.
Attached patch: 0001-Adding-a-pkg-config-file-for-poppler-data.patch
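
The lookup described in this comment can be sketched as follows. This is a minimal sketch, not what the attached patches do: `pkg_config_variable` is a hypothetical helper, and the variable name `poppler_datadir` is an assumption about what the .pc file exports, since the thread does not show the file's contents. Only the standard `pkg-config --variable=NAME MODULE` invocation is relied on.

```python
import shutil
import subprocess


def pkg_config_variable(module, variable):
    """Return the value of `variable` from the module's .pc file, or None.

    None is returned when pkg-config is not installed, when the module is
    unknown, or when the variable is not defined.
    """
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--variable=" + variable, module],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    # "poppler_datadir" is an assumed variable name, see above.
    datadir = pkg_config_variable("poppler-data", "poppler_datadir")
    if datadir is None:
        print("warning: poppler-data not found; Japanese text may not render")
    else:
        print("poppler data directory:", datadir)
```

A packager-facing check of this kind only warns; it does not make poppler-data a hard dependency.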
Comment 3 Jehan 2014-07-26 17:54:53 UTC
Created attachment 103514
poppler: use poppler-data pkg-config

And a second patch, for the poppler repo this time, which would use the new pkg-config of poppler-data if it exists to determine the datadir path.
Comment 4 Albert Astals Cid 2014-07-26 22:12:22 UTC
Honestly, I don't see the need; the method we have now works. Changing it will probably break some other people's stuff.
Comment 5 Jehan 2014-07-27 09:52:49 UTC
Well, it's your choice. But pkg-config won't ever break any existing script. pkg-config has always been just a *plus*, not a replacement. If someone doesn't care and doesn't want to use pkg-config, this won't break anything that currently exists.

On the other hand, if someone would like to check for the package's existence, pkg-config is the cleanest and safest existing way. A lot of data packages use pkg-config (checking my /usr/share/pkgconfig, I see: iso codes, GNOME themes, udev, X Keyboard configuration data, Freedesktop common MIME db...). That's just the best current way to link a program to a set of independent data which could be anywhere in the system.

There is also another advantage: it allows managing versioning. If you improve your encoding packages, add new ones, or fix existing ones, right now there is no way for a third party to ensure it has the right version. With pkg-config, you could add dependencies based on versions.
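
A version-based dependency check of the kind mentioned above can be sketched with pkg-config's `--atleast-version` option. This is an illustration only: `has_at_least` is a hypothetical helper, and the version string used in the example is an arbitrary placeholder, not a version named anywhere in this report.

```python
import shutil
import subprocess


def has_at_least(module, version):
    """Return True if pkg-config knows `module` at `version` or newer."""
    if shutil.which("pkg-config") is None:
        return False
    result = subprocess.run(
        ["pkg-config", "--atleast-version=" + version, module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # pkg-config exits with 0 only when the module exists and is new enough.
    return result.returncode == 0


if __name__ == "__main__":
    # "0.0.1" is a placeholder minimum version chosen for illustration.
    if has_at_least("poppler-data", "0.0.1"):
        print("poppler-data is recent enough")
    else:
        print("warning: poppler-data is missing or too old")
```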

In any case, I repeat: that won't break any existing stuff. It just adds a .pc file in the datadir. If third parties don't care, they just don't use it. The existing script will still work. I know that in GIMP I will care, so that for the next Windows (or other) builds, the packager does not forget the package.
Comment 6 Albert Astals Cid 2014-07-27 22:15:31 UTC
Well, at least it seems to me you're ignoring the CMake build system side; that needs fixing before any discussion of whether this needs to go in or not.
Comment 7 Jehan 2014-07-28 12:14:10 UTC
Created attachment 103586
poppler: use poppler-data pkg-config for both cmake and autotools builds.

You are right! Patch updated, which takes both autotools and cmake into account.
Comment 8 Jehan 2014-08-05 19:45:48 UTC
Hi again,

Did you consider my patches? :-) Thanks!
Comment 9 Albert Astals Cid 2014-08-12 15:58:47 UTC
Can you fix the poppler-data patch so that make dist actually creates poppler-data.pc if it's not there?
Comment 10 Albert Astals Cid 2014-08-12 15:58:57 UTC
And adds it to the tarball.
Comment 11 Jehan 2014-08-12 17:14:56 UTC
Created attachment 104513
poppler-data: Adding a pkg-config file

Hi,

Sure! Actually we want to include poppler-data.pc.in, since $pkgdatadir may change depending on the make install invocation. Well, I included both in the dist.

Also, since you don't have a configure step, I realized we want to force-generate poppler-data.pc at each `make install` for the same reason. So I made a small update to this script to do this. Attached.
Thanks!
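
The reason for shipping a .pc.in template and regenerating the .pc file at install time can be illustrated with a small substitution sketch. Everything here is an assumption made for illustration: the template contents, the placeholder names (`@prefix@`, `@datadir@`, `@version@`), and the `render_pc` helper are hypothetical and are not taken from the attached patch, which is not shown in this thread.

```python
import re

# Hypothetical template; the real poppler-data.pc.in is not shown in this report.
TEMPLATE = """\
prefix=@prefix@
datadir=@datadir@
poppler_datadir=${datadir}/poppler

Name: poppler-data
Description: Encoding files for use with poppler
Version: @version@
"""


def render_pc(template, values):
    """Replace every @name@ placeholder with values[name].

    A missing key raises KeyError, so a forgotten value fails loudly instead
    of leaving a placeholder in the installed file. pkg-config's own ${...}
    references are left untouched.
    """
    return re.sub(r"@(\w+)@", lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    # The paths depend on how `make install` is invoked, which is why the
    # file has to be regenerated on each install rather than generated once.
    print(render_pc(TEMPLATE, {
        "prefix": "/usr/local",
        "datadir": "/usr/local/share",
        "version": "0.0.0",  # placeholder version
    }))
```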
Comment 12 Albert Astals Cid 2014-08-12 19:06:22 UTC
Pushed. Thanks.
Comment 13 Jehan 2014-08-12 19:13:56 UTC
Hi,

Thanks. But it seems that you removed the poppler-data.pc.in from my patch when you amended it. :-) So that's broken right now.
Comment 14 Jehan 2014-08-12 19:45:23 UTC
Created attachment 104519
Template poppler-data.pc.in forgotten.

Hey, just to be sure this is not forgotten, I am reopening.
I am also reattaching a patch for just the forgotten file (though you could also get it from my previous patch; it's the same).
Thanks.
Comment 15 Albert Astals Cid 2014-08-12 20:37:52 UTC
I forgot to push the file

