Bug 32206 - pdftotext from poppler-0.12.4 on Windows (KDE4) cannot generate a text file because of a missing CJK font map
Status: RESOLVED INVALID
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86 (IA32) Windows (All)
Importance: medium major
Assignee: poppler-bugs
 
Reported: 2010-12-07 18:01 UTC by dangbinghoo
Modified: 2010-12-09 11:31 UTC

Attachments
testing pdfs and my result (224.97 KB, application/zip)
2010-12-07 18:01 UTC, dangbinghoo

Description dangbinghoo 2010-12-07 18:01:52 UTC
Created attachment 40889
testing pdfs and my result

I use the pdftotext command from the poppler package for KDE4 on Windows. pdftotext
cannot extract the text because the CJK font map is missing:

$ pdftotext -raw CrystalReport.pdf

Error: Missing language pack for 'Adobe-GB1' mapping
Error: Missing language pack for 'Adobe-GB1' mapping
Error: Unknown font tag 'TT2'
Error (1471): No font in show
Error: Unknown font tag 'TT2'
Error (1483): No font in show
Error (1497): No font in show
Error (1535): No font in show
Error (1567): No font in show/space
Error (1624): No font in show/space
Error (1636): No font in show
Error (1655): No font in show
Error (1673): No font in show
Error (1682): No font in show
Error (1717): No font in show
Error (1735): No font in show
Error (1747): No font in show/space
Error (1758): No font in show/space
Error (1769): No font in show
Error (1774): No font in show
Error (1783): No font in show
Error (1811): No font in show
Error (1823): No font in show
Error (1835): No font in show
Error (1872): No font in show/space
Error (1884): No font in show
Error (1938): No font in show/space
Error (1971): No font in show/space
Error (1988): No font in show
Error (2052): No font in show
Error (2081): No font in show/space
Error (2103): No font in show
Error (2139): No font in show/space
Error (2166): No font in show/space
Error (2175): No font in show
Error (2206): No font in show
Error (2242): No font in show/space
Error (2286): No font in show/space
Error (2330): No font in show/space
Error (2382): No font in show/space
Error (2394): No font in show
Error (2407): No font in show/space
Error (2419): No font in show
Error (2429): No font in show
Error (2442): No font in show/space
Error (2464): No font in show/space
Error (2484): No font in show
Error (2512): No font in show/space
Error (2550): No font in show/space
Error (2582): No font in show/space
Error (2602): No font in show/space
Error (2620): No font in show
Error (2623): No font in show
Error (2630): No font in show/space
Error (2633): No font in show/space
Error (2637): No font in show
Error (2642): No font in show
Error (2648): No font in show
Error (2648): No font in show
Error (2656): No font in show
Error (2661): No font in show/space
Error (2666): No font in show/space
Error (2675): No font in show/space
Error (2677): No font in show
Error (2683): No font in show
Error (2686): No font in show
Error (2690): No font in show
Error (2697): No font in show
Error (2699): No font in show
Error (2706): No font in show
Error (2711): No font in show/space
Error (2719): No font in show
Error (2739): No font in show/space
Error (2750): No font in show
Error (2765): No font in show
Error (2780): No font in show/space
Error (2791): No font in show
Error (2810): No font in show
Error (2813): No font in show
Error (2838): No font in show/space
Error (2857): No font in show
Error (2870): No font in show/space
Error (2887): No font in show/space
Error (2899): No font in show/space
Error (2916): No font in show/space
Error (2920): No font in show
Error (2927): No font in show
Error (2935): No font in show
Error (2954): No font in show
Error (2970): No font in show
Error (3011): No font in show/space
Error: Unknown font tag 'TT2'
Error (3073): No font in show
Error: Unknown font tag 'TT2'
Error (3165): No font in show
Error (3174): No font in show
Error (3176): No font in show
Error (3179): No font in show
Error (3191): No font in show
Error (3209): No font in show
Error (3232): No font in show/space
Error (3249): No font in show
Error (3253): No font in show
Error (3255): No font in show
Error (3265): No font in show
Error (3279): No font in show/space
Error (3287): No font in show/space
Error (3301): No font in show/space
Error (3304): No font in show
Error: Unknown font tag 'TT7'
Error (3331): No font in show/space
Error: Unknown font tag 'TT2'
Error (3352): No font in show
Error (3376): No font in show
Error (3383): No font in show
Error (3425): No font in show
Error (3443): No font in show
Error (3450): No font in show
Error (3461): No font in show
Error (3479): No font in show
Error (3482): No font in show
Error: No font in show

With different Chinese PDFs the result can be quite different.

With the UTF-8 encoded PDF files no error is reported, but the generated text
file contains only a single unreadable character (see the attached files):

$ pdftotext -raw CrystalReport-utf8-OO.pdf

$ pdftotext -raw CrystalReport-gsPDF-win.pdf

Both of these files give the same result.
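
The long error output above consists of only a few distinct messages. A minimal sketch for tallying them, assuming the stderr of pdftotext has been captured as text; the function name `tally_errors` is illustrative and not part of poppler:

```python
import re
from collections import Counter

# Matches both "Error: message" and "Error (1471): message".
# The number in parentheses is ignored so identical messages group together.
ERROR_LINE = re.compile(r"^Error(?: \(\d+\))?: (.*)$")

def tally_errors(stderr_text):
    """Count how often each distinct error message occurs."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Called on the captured output, `tally_errors(text).most_common()` lists the messages by frequency, which makes it easier to see that the "No font in show" errors follow from the missing 'Adobe-GB1' mapping and the unknown font tags.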
Comment 1 Albert Astals Cid 2010-12-08 00:54:37 UTC
Not a bug, install poppler-data
Comment 2 dangbinghoo 2010-12-08 01:17:37 UTC
(In reply to comment #1)
> Not a bug, install poppler-data

Of course I have poppler-data installed. I just discussed this on the #poppler IRC
channel, and someone told me to file this bug. I attached the documents; if you run
a quick test with them, you will find that this is a problem.

thanks!
Comment 3 Albert Astals Cid 2010-12-08 11:40:05 UTC
I do have poppler-data and it works for me. So probably your poppler-data is not being found correctly.
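
Whether poppler-data is actually present where it is expected can be checked on disk. A minimal sketch, assuming the directory layout of the poppler-data package, in which each CJK character collection has a file under `cidToUnicode/`. The function name and the example path in the comment are hypothetical, and where a particular Windows build looks for its data directory depends on how it was built:

```python
import os

# Character collections for which poppler-data ships a cidToUnicode file.
CJK_COLLECTIONS = ["Adobe-GB1", "Adobe-CNS1", "Adobe-Japan1", "Adobe-Korea1"]

def missing_cid_maps(data_dir, collections=CJK_COLLECTIONS):
    """Return the collections that have no file under data_dir/cidToUnicode."""
    cid_dir = os.path.join(data_dir, "cidToUnicode")
    return [name for name in collections
            if not os.path.isfile(os.path.join(cid_dir, name))]

# Example call with a hypothetical install location:
#   missing_cid_maps(r"C:\KDE\share\poppler")
```

An empty result only shows that the files exist in that directory; it does not prove that the pdftotext binary searches the same directory. If 'Adobe-GB1' is reported as missing, the error in the description is expected.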
Comment 4 dangbinghoo 2010-12-09 06:57:35 UTC
Did you test with my attached documents? I just use the default KDE4 Windows
installation and configuration. Maybe the font configuration is wrong. My PDF
was generated by PDFCreator. If the document contains Chinese characters,
pdftotext either reports errors or produces a text file in which some of the
Chinese characters repeat for each statement. I will check the pdftotext source
to find out how it extracts characters and how it writes them to the file.
Maybe then I can find out what is wrong with my fontconfig.

Thanks!

Comment 5 Albert Astals Cid 2010-12-09 11:31:32 UTC
Yes, I have tested with your document.

