Bug 71063

Summary: Wrong text encoding of title extracted from properties
Product: poppler Reporter: Germán Poo-Caamaño <gpoo+bfdo>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED INVALID QA Contact:
Severity: normal    
Priority: medium CC: kurt.pfeifle
Version: unspecified   
Hardware: Other   
OS: All   
See Also: https://bugzilla.gnome.org/show_bug.cgi?id=711154
Whiteboard:
Attachments: PDF Test case

Description Germán Poo-Caamaño 2013-10-30 20:15:45 UTC
Created attachment 88379 [details]
PDF Test case

This was reported on Evince: https://bugzilla.gnome.org/show_bug.cgi?id=711154

---
Characters in the title seem not to be shown in UTF-8. For example an ü will
be shown as ü.
---

Checking the properties using the poppler-glib-demo shows the issue.
However, when trying to uncompress the stream I got:

$ qpdf --stream-data=uncompress ü.pdf u-un.pdf
WARNING: ü.pdf (object 13 0, file position 1409): empty object treated as null
qpdf: operation succeeded with warnings; resulting file may have some problems

This warning might be related to the issue. If so, I am unsure whether Poppler can do anything about it, so I am reporting it here in case you have any clue.
Comment 1 kurt.pfeifle 2013-11-07 20:51:52 UTC
In the Adobe edition of the official ISO 32000-1:2008 PDF-1.7 specification, the table on page 652 says this about PDFDocEncoding:

    "Encoding for text strings in a PDF document outside the document’s
     content streams."

So the assumption that the title property of the document (which clearly is "outside the document's content streams") should be displayed as UTF-8 does not seem to be valid.

* BTW and FWIW, Preview.app, callas pdfToolbox, and Adobe Acrobat Pro (on
  a Mac) also display this title string as 'ü' -- as they should! So
  Poppler for once agrees with Adobe and Apple about how to treat this file...
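
To illustrate the point about PDFDocEncoding, here is a minimal Python sketch of how a PDF 1.x reader might decode a text string such as the title. It rests on assumptions this report does not confirm: that the title in the test case is stored as raw UTF-8 bytes without a byte order mark, and that Latin-1 is an acceptable stand-in for PDFDocEncoding (the two agree for the bytes used here but differ elsewhere). The function name `decode_pdf_text_string` is hypothetical and is not Poppler's API.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF 1.x text string (hypothetical helper, not Poppler code)."""
    # A leading FE FF byte order mark marks the string as UTF-16BE.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Otherwise the bytes are PDFDocEncoding. Assumption: Latin-1 is used
    # here as an approximation; it matches PDFDocEncoding for 0xC3 and 0xBC.
    return raw.decode("latin-1")


# Raw UTF-8 bytes for U+00FC (u with diaeresis), with no byte order mark.
utf8_title = "\u00fc".encode("utf-8")          # b"\xc3\xbc"
mojibake = decode_pdf_text_string(utf8_title)  # two characters, not one

# The same character stored as UTF-16BE with a BOM decodes as intended.
utf16_title = b"\xfe\xff" + "\u00fc".encode("utf-16-be")
correct = decode_pdf_text_string(utf16_title)
```

Under these assumptions, a reader that follows the specification turns the two UTF-8 bytes into two separate characters, which would explain why several viewers agree on the same garbled title.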

----

The exercise with the qpdf command line to uncompress some streams in the 'ü.pdf' file does not lead anywhere in relation to this bug report. It just reveals part of what a more complete check of ü.pdf shows:

 qpdf --check ü.pdf 
   checking ü.pdf
   PDF Version: 1.3
   File is not encrypted
   File is not linearized
   WARNING: ü.pdf (object 13 0, file position 1409): empty object treated as null 

A closer look at the file reveals that object no. 7 is also 'empty'.
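
As a rough illustration of what 'empty' means here, the following Python sketch scans raw PDF bytes for indirect objects whose `obj` keyword is followed directly by `endobj`. It is a naive regular-expression scan under stated assumptions (no object streams, no matches inside stream data), not how qpdf performs its check. The helper name `find_empty_objects` and the sample bytes are made up for the example and are not taken from the attached file.

```python
import re

# "N G obj" followed only by whitespace and then "endobj".
EMPTY_OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\s*endobj")


def find_empty_objects(pdf_bytes: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation) pairs for empty indirect objects."""
    return [
        (int(m.group(1)), int(m.group(2)))
        for m in EMPTY_OBJ.finditer(pdf_bytes)
    ]


# Synthetic sample, not the real test case.
sample = (
    b"7 0 obj\nendobj\n"
    b"8 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"13 0 obj\nendobj\n"
)
empty = find_empty_objects(sample)
```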

This PDF does not fully conform to the spec.

The status of this report should be set to INVALID.
Comment 2 Albert Astals Cid 2013-12-12 21:21:37 UTC
Agreeing with Kurt
