Bug 71063 - Wrong text encoding of title extracted from properties
Status: RESOLVED INVALID
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2013-10-30 20:15 UTC by Germán Poo-Caamaño
Modified: 2013-12-12 21:21 UTC


Attachments
PDF Test case (2.04 KB, application/pdf)
2013-10-30 20:15 UTC, Germán Poo-Caamaño

Description Germán Poo-Caamaño 2013-10-30 20:15:45 UTC
Created attachment 88379
PDF Test case

This was reported on Evince: https://bugzilla.gnome.org/show_bug.cgi?id=711154

---
Characters in the title seem not to be shown in UTF-8. For example, an ü will
be shown as Ã¼.
---

Checking the properties using the poppler-glib-demo shows the issue.
However, when trying to uncompress the stream I got:

$ qpdf --stream-data=uncompress ü.pdf u-un.pdf
WARNING: ü.pdf (object 13 0, file position 1409): empty object treated as null
qpdf: operation succeeded with warnings; resulting file may have some problems

It might be related to the issue. If so, I am unsure whether Poppler can do anything about it, so I am reporting it here in case you have any clue.
Comment 1 kurt.pfeifle 2013-11-07 20:51:52 UTC
In the Adobe edition of the official ISO 32000-1:2008 PDF-1.7 specification, the table on page 652 says this about PDFDocEncoding:

    "Encoding for text strings in a PDF document outside the document’s
     content streams."

So the assumption that the title property of the document (which clearly is "outside the document's content streams") should be displayed as UTF-8 seems not to be valid.

* BTW and FWIW, Preview.app, callas pdfToolbox, and Adobe Acrobat Pro (on
  a Mac) also display this title string as 'Ã¼', as they should! So
  Poppler for once agrees with Adobe and Apple about how to treat this file...
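
To illustrate the point about PDFDocEncoding, here is a minimal Python sketch of how a viewer might decode a text string such as the title. It assumes (this was not verified against the attached file) that the title is stored as raw UTF-8 bytes without a byte order mark. The function name `decode_pdf_text_string` is hypothetical, and Latin-1 is used only as an approximation of PDFDocEncoding; the two agree for the bytes 0xC3 and 0xBC but not for every code.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Rough sketch of PDF 1.7 text string decoding (not Poppler's code)."""
    # A text string starting with the bytes FE FF is UTF-16BE.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Otherwise it is PDFDocEncoding. ASSUMPTION: latin-1 stands in for
    # PDFDocEncoding here; they match for the bytes used in this example.
    return raw.decode("latin-1")


# Assumed content of the title: "ü" written as UTF-8, no byte order mark.
utf8_title = "ü".encode("utf-8")                      # b'\xc3\xbc'
shown = decode_pdf_text_string(utf8_title)

# The same title written as UTF-16BE with the leading FE FF marker.
utf16_title = b"\xfe\xff" + "ü".encode("utf-16-be")
expected = decode_pdf_text_string(utf16_title)

print(repr(shown), repr(expected))
```

Under these assumptions the first string decodes to 'Ã¼' and the second to 'ü', which is consistent with the viewers treating the file correctly and the producer having written the title in an encoding the specification does not define for text strings.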

----

The exercise with the qpdf command line to uncompress some streams in the 'ü.pdf' file does not lead anywhere in relation to this bug report. It just reveals part of what a more complete check of ü.pdf shows:

 qpdf --check ü.pdf 
   checking ü.pdf
   PDF Version: 1.3
   File is not encrypted
   File is not linearized
   WARNING: ü.pdf (object 13 0, file position 1409): empty object treated as null 

A closer look at the file reveals that object no. 7 is also 'empty'.

This PDF is not fully conforming to the spec.

The status of this report should be set to INVALID.
Comment 2 Albert Astals Cid 2013-12-12 21:21:37 UTC
Agreeing with Kurt