Bug 16939

Summary: man page wrong about default text encoding for pdftotext
Product: poppler Reporter: Sebastien Bacher <seb128>
Component: general    Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:

Description Sebastien Bacher 2008-08-01 04:37:18 UTC
the bug has been opened on https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/251002

"The man page for pdftotext(1) says -enc defaults to Latin1, but my testing shows that I get identical output with no -enc and with -enc UTF-8. -enc Latin1 gives different output. I'm using a French PDF, and viewing the text with less(1). In a LANG=en_CA xterm, the -enc Latin1 text looks right. In a LANG=en_CA.utf8 gnome-terminal, the default/-enc UTF-8 output looks right. When it's mismatched, you see an inverse-video question-mark sort of glyph, or less's highlighting of control characters, depending on what locale less is using.

xpdfrc(5) says the default for textEncoding is Latin1. pdftotext(1) says this config option corresponds to -enc.

Anyway, UTF-8 output seems to work properly, it's just the documentation that says it's not the default."
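A minimal Python sketch of why the mismatch described above is visible, assuming a made-up French sample string (not text from the reporter's PDF) and using Python's codecs as a rough stand-in for what the terminal and less do with the bytes. Latin-1 stores each accented letter in one byte, UTF-8 in two, so bytes written in one encoding and read in the other come out wrong:

```python
# Hypothetical French sample text; not taken from the reporter's PDF.
text = "Élève à l'école"

latin1_bytes = text.encode("latin-1")  # one byte per accented letter
utf8_bytes = text.encode("utf-8")      # two bytes per accented letter here

# Approximates a UTF-8 locale reading Latin-1 output: a lone byte such as
# 0xE9 ("é") is invalid UTF-8, so it becomes the replacement glyph U+FFFD.
shown_in_utf8_locale = latin1_bytes.decode("utf-8", errors="replace")

# Approximates a Latin-1 locale reading UTF-8 output: each two-byte sequence
# turns into two Latin-1 characters, e.g. "é" (C3 A9) -> "Ã©", and "É"
# (C3 89) yields 0x89, a C1 control character.
shown_in_latin1_locale = utf8_bytes.decode("latin-1")

# ascii() keeps the output printable on any terminal.
print(len(latin1_bytes), len(utf8_bytes))
print(ascii(shown_in_utf8_locale))
print(ascii(shown_in_latin1_locale))
```

The stray 0x89 byte in the second case is one plausible source of the "highlighting of control characters" the reporter saw in less. The U+FFFD characters in the first case correspond to the question-mark glyph.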
Comment 1 Albert Astals Cid 2008-08-01 08:50:43 UTC
Man page fixed. By the way, we don't use xpdfrc at all.
