Bug 60517 - pdf* utilities cannot open files with non-ASCII characters in file path on Windows
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium minor
Assignee: poppler-bugs
Reported: 2013-02-08 21:46 UTC by aurimas.dev
Modified: 2018-08-20 21:47 UTC

Attachments
PDF test files (17.41 KB, text/plain)
2013-02-08 21:46 UTC, aurimas.dev

Description aurimas.dev 2013-02-08 21:46:40 UTC
Created attachment 74453
PDF test files

pdfinfo and pdftotext (I have not tested the others) cannot open PDF files whose path contains non-ASCII characters.

Environment:

Windows 7 Pro (x64)

poppler.0.22.0_win32: I've been having trouble compiling poppler myself, so I used the poppler.0.22.0_win32 binaries from http://blog.alivate.com.au/poppler-windows/ (perhaps the problem does not occur with official binaries, but I could not find any).

Steps to reproduce:
1. Download the PDF test files (or create a PDF file with a non-ASCII character in its name, e.g. testα.pdf)
2. Open command prompt and navigate to the directory with PDF file
3. Run `chcp 65001` to switch the console to the UTF-8 code page
4. Run `pdfinfo.exe testα.pdf`

Outcome:
`I/O Error: Couldn't open file 'testa.pdf': No such file or directory.`

Note that "testα.pdf" has been converted to "testa.pdf".

Opening the same file from the command line with Adobe Acrobat Reader 11.0 works fine, so the characters are being passed correctly from the command line to the program.
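The error message already shows the reduced name `testa.pdf`, which suggests (an assumption, not confirmed in this thread) that the utility received a best-fit-mapped narrow `argv`: on Windows the C runtime builds `argv` from the UTF-16 command line using the ANSI code page, not the console code page set by `chcp`. A minimal sketch of one workaround, assuming the utilities could work with UTF-8 file names internally, is to re-read the command line with `GetCommandLineW`/`CommandLineToArgvW` and convert each argument with `WideCharToMultiByte(CP_UTF8, ...)`. The helper name `utf8Args` is hypothetical and not part of poppler.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with Shell32 (-lshell32 on MinGW)
#endif

// Hypothetical helper: returns the program arguments as UTF-8 strings.
// On Windows the narrow argv is ignored; the original UTF-16 command line is
// re-parsed and each argument converted to UTF-8, so characters outside the
// ANSI code page (such as U+03B1) are not best-fit mapped to ASCII.
// On other platforms argv is assumed to already be UTF-8 and is copied as-is.
std::vector<std::string> utf8Args(int argc, char *argv[])
{
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc;
    (void)argv;
    int wargc = 0;
    LPWSTR *wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv == nullptr)
        return args;
    for (int i = 0; i < wargc; ++i) {
        // First call returns the required size, including the terminating NUL.
        int len = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, nullptr, 0, nullptr, nullptr);
        std::string s;
        if (len > 1) {
            s.resize(len);
            WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, &s[0], len, nullptr, nullptr);
            s.resize(len - 1);  // drop the NUL written by the API
        }
        args.push_back(s);
    }
    LocalFree(wargv);
#else
    for (int i = 0; i < argc; ++i)
        args.emplace_back(argv[i]);
#endif
    return args;
}
```

A `main` would call `utf8Args(argc, argv)` once and use the returned strings instead of `argv` when opening files.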

Here is a summary of my tests:

>poppler.0.22.0_win32\bin\pdfinfo.exe test.pdf
Tagged:         no
Form:           none
Pages:          1
Encrypted:      no
Page size:      612 x 792 pts (letter)
Page rot:       0
File size:      14622 bytes
Optimized:      no
PDF version:    1.4

>poppler.0.22.0_win32\bin\pdfinfo.exe testα.pdf
I/O Error: Couldn't open file 'testa.pdf': No such file or directory.

>poppler.0.22.0_win32\bin\pdftotext.exe test.pdf

>poppler.0.22.0_win32\bin\pdftotext.exe testα.pdf
I/O Error: Couldn't open file 'testa.pdf': No such file or directory.


Looking at poppler's code, a Win32-specific PDFDoc constructor appears to be defined in PDFDoc.cc:
```
#ifdef _WIN32
PDFDoc::PDFDoc(wchar_t *fileNameA, int fileNameLen, GooString *ownerPassword,
	       GooString *userPassword, void *guiDataA) {
```
However, LocalPDFDocBuilder.cc always calls the generic PDFDoc constructor, because it always passes a GooString* rather than a wchar_t*. I can't test this, though, since I haven't managed to compile poppler.
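If that reading is right, the builder would need to select the wide-character overload on Windows. The sketch below is a hedged illustration only: `DocStub` and `buildDoc` are hypothetical stand-ins for `PDFDoc` and the builder (the real constructors also take passwords and GUI data), and it assumes the incoming file name is UTF-8. It converts with `MultiByteToWideChar(CP_UTF8, ...)` and passes the buffer plus its length, mirroring the `(wchar_t *, int)` shape of the constructor quoted above.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical stand-in for PDFDoc: it only records which constructor ran.
struct DocStub {
    bool openedViaWidePath;
    explicit DocStub(const std::string &path) : openedViaWidePath(false) { (void)path; }
    DocStub(const wchar_t *path, int len) : openedViaWidePath(true)
    {
        (void)path;
        (void)len;
    }
};

// Hypothetical builder: on Windows, convert the UTF-8 name to UTF-16 and use
// the wide constructor; elsewhere, keep using the narrow one.
DocStub buildDoc(const std::string &utf8Path)
{
#ifdef _WIN32
    // With an explicit input length the result excludes any terminating NUL.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8Path.data(), (int)utf8Path.size(), nullptr, 0);
    std::wstring wide(len > 0 ? len : 0, L'\0');
    if (len > 0)
        MultiByteToWideChar(CP_UTF8, 0, utf8Path.data(), (int)utf8Path.size(), &wide[0], len);
    return DocStub(wide.c_str(), len);
#else
    return DocStub(utf8Path);
#endif
}
```

Keeping the `#ifdef` in the builder leaves non-Windows callers unchanged, which seems to match how PDFDoc.cc already guards its wide constructor.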
Comment 1 Albert Astals Cid 2013-02-08 21:51:51 UTC
a) There are no official binaries in the poppler project.
b) These are not xpdf utilities but poppler utilities; for xpdf, you are in the wrong Bugzilla.
c) A patch would be most welcome.
Comment 2 aurimas.dev 2013-02-08 22:00:15 UTC
(In reply to comment #1)
> a) There are no official binaries in the poppler project.
> b) These are not xpdf utilities but poppler utilities; for xpdf, you are in
> the wrong Bugzilla.

Fair point. I was simply referring to pdfinfo, pdftotext, and the other pdf* utilities in poppler that are based on the xpdf project. The xpdf project seems to be less actively maintained than poppler, so I'm reporting it here.

> c) A patch would be most welcome.

I'll try to put one together if I can get poppler to compile.

Thanks for the prompt reply.
Comment 3 Yury G. Kudryashov 2013-08-07 11:56:15 UTC
I think the source of the problem is in the first PDFDoc constructor (see the #ifdef'd code), which assumes that the file name is an ASCII string. It should be changed to use something like mbstowcs (but what is the right way to guess the encoding?).
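One catch with `mbstowcs` is that its result depends on the current C locale. A minimal locale-independent sketch, under the assumption that the narrow file name is UTF-8 (if it were in the ANSI code page instead, `MultiByteToWideChar(CP_ACP, ...)` would be the Windows-side choice), decodes UTF-8 by hand into a `std::wstring`. `utf8ToWide` is a hypothetical helper, not a poppler function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into a std::wstring without
// relying on the C locale. Where wchar_t is 16 bits (Windows), code points above
// U+FFFF become surrogate pairs; elsewhere they are stored directly.
// Each malformed byte is replaced by U+FFFD and decoding resumes at the next byte.
std::wstring utf8ToWide(const std::string &in)
{
    static const char32_t minForLen[] = {0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;
        if (c < 0x80) { cp = c; extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; continue; }

        // All continuation bytes must be present and of the form 10xxxxxx.
        bool ok = i + extra < in.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if (ok && (cp < minForLen[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (!ok) { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; continue; }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

The resulting buffer and its `size()` could then be handed to a `(wchar_t *, int)` style constructor such as the one in PDFDoc.cc.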
Comment 4 konstantin.klein 2015-08-04 08:56:43 UTC
In particular, this also affects importing PDFs from file paths containing German umlauts (Windows 7).
Comment 5 GitLab Migration User 2018-08-20 21:47:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/75.

