Bug 34057

Summary: Font info not getting properly into html when using pdftohtml
Product: poppler
Reporter: Sushant Sinha <sushant354>
Component: utils
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: koleygr
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: the example document

Description Sushant Sinha 2011-02-08 17:57:05 UTC
Created attachment 43145
the example document

I have attached a PDF document that mixes English and Hindi. The Hindi
text uses the Aryan2 font. When I run pdftohtml on this document, the
HTML file contains no font information. With the "-xml" or "-c" option,
the Aryan2 font is still output as Times. So there seems to be a problem
with how embedded fonts are handled.

The PDF document is attached for your analysis.

$ pdffonts 2211.pdf 

name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
CFFEEL+TimesNewRoman                 TrueType          yes yes no    1852  0
CFFEGM+TimesNewRoman,Bold            TrueType          yes yes no    1854  0
CFFFEJ+TimesNewRoman,Italic          TrueType          yes yes no      93  0
CFFFHI+SymbolMT                      CID TrueType      yes yes yes     94  0
CFFGDG+Aryan2-Bold                   TrueType          yes yes no      95  0
CFFGEI+Aryan2-Normal                 TrueType          yes yes no      97  0
CFFGEH+Aryan2-Normal                 CID TrueType      yes yes yes     96  0
CFFGII+Tahoma,Bold                   TrueType          yes yes no      98  0
CFFGLJ+Tahoma                        TrueType          yes yes no      99  0

Comment 1 Albert Astals Cid 2012-05-28 14:45:47 UTC
You can use the -fontfullname option of pdftohtml once poppler 0.22 is released.
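
A minimal sketch of how that option might be tried on the attached document, assuming poppler 0.22 or later is installed and 2211.pdf is in the current directory. The output file name (2211.xml) and the check on the fontspec elements are assumptions about the default -xml output, not something confirmed in this report:

```shell
# Assumes poppler >= 0.22 (where -fontfullname is available) and that
# 2211.pdf, the document attached to this report, is in the current directory.
pdf="2211.pdf"

if command -v pdftohtml >/dev/null 2>&1 && [ -f "$pdf" ]; then
    # -xml writes XML output; -fontfullname asks pdftohtml to emit the
    # font name without substituting a generic family such as Times.
    pdftohtml -fontfullname -xml "$pdf"

    # Assumption: the XML lands in 2211.xml and declares its fonts in
    # <fontspec> elements; list them to check whether Aryan2 shows up.
    grep '<fontspec' "${pdf%.pdf}.xml"
else
    echo "pdftohtml or $pdf not found; nothing to do"
fi
```

If the option behaves as described, the listed font families should include the Aryan2 names rather than Times.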
