Bug 16939 - man page wrong about default text encoding for pdftotext
Summary: man page wrong about default text encoding for pdftotext
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-01 04:37 UTC by Sebastien Bacher
Modified: 2008-08-01 08:50 UTC




Description Sebastien Bacher 2008-08-01 04:37:18 UTC
This bug was originally reported at https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/251002

"The man page for pdftotext(1) says -enc defaults to Latin1, but my testing shows that I get identical output with no -enc and with -enc UTF-8. -enc Latin1 gives different output. I'm using a French PDF and viewing the text with less(1). In a LANG=en_CA xterm, the -enc Latin1 text looks right. In a LANG=en_CA.utf8 gnome-terminal, the default/-enc UTF-8 output looks right. When the encoding and the locale are mismatched, you see an inverse-video question-mark sort of glyph, or less's highlighting of control characters, depending on what locale less is using.

xpdfrc(5) says the default for textEncoding is Latin1. pdftotext(1) says this config option corresponds to -enc.

Anyway, UTF-8 output seems to work properly; it's just the documentation that wrongly says it is not the default."
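
The mismatch symptoms described in the report can be reproduced at the byte level without pdftotext. The following is a minimal sketch, not taken from the original report: the sample string is an assumption standing in for the French PDF text, and the two decode calls only imitate a viewer that reads the bytes with the wrong encoding.

```python
# Illustrative sample only; this is not text from the reporter's PDF.
# Escapes are used so the accented letters are unambiguous: é = U+00E9, à = U+00E0.
sample = "R\u00e9sum\u00e9 \u00e0 Montr\u00e9al"

# The same text as it would be written with -enc Latin1 versus -enc UTF-8.
latin1_bytes = sample.encode("latin-1")
utf8_bytes = sample.encode("utf-8")

# Any non-ASCII character makes the two outputs differ,
# which is how the reporter could tell the encodings apart.
outputs_differ = latin1_bytes != utf8_bytes

# Latin1 bytes read as UTF-8: each accented byte is an invalid sequence,
# so it is replaced by U+FFFD (typically drawn as a question-mark glyph).
latin1_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin1: each accented letter turns into two characters.
utf8_as_latin1 = utf8_bytes.decode("latin-1")

print(outputs_differ)
print(repr(latin1_as_utf8))
print(repr(utf8_as_latin1))
```

For pure ASCII input the two encodings produce identical bytes, so a comparison like the one in the report is only meaningful on text with accented characters.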
Comment 1 Albert Astals Cid 2008-08-01 08:50:43 UTC
Man page fixed. By the way, we don't use xpdfrc at all.

