Created attachment 135691 [details] Source file I split the attached pdf with `pdfseparate`. If I then merge some of the resulted files again with `pdfunite`, "Syntax Error" / "Illegal character"(s) are found by `pdfinfo` on the resulted files. The problem could be triggered by page 2 and 3, other pages seem fine. Version: 0.61.1 Steps to reproduce: $ pdfinfo pst-book-gog.pdf >/dev/null $ mkdir sep uni $ cd sep/ $ pdfseparate -f 2 -l 3 ../pst-book-gog.pdf %03d.pdf $ pdfinfo 002.pdf >/dev/null $ pdfinfo 003.pdf >/dev/null $ cd ../uni/ $ pdfunite ../sep/002.pdf 002.pdf $ pdfunite ../sep/003.pdf 003.pdf $ pdfinfo 002.pdf >/dev/null Syntax Error (69): Illegal character '}' Syntax Error (70): Illegal character '}' Syntax Error (79): Illegal character '>' $ pdfinfo 003.pdf >/dev/null Syntax Error (120): Illegal character '{'
I can't reproduce that with this PDF and git master, fetched with git clone https://anongit.freedesktop.org/git/poppler/poppler.git
Can't reproduce either, did you self compile? Which options did you use? Otherwise which distribution are you in?
Well it's Termux https://github.com/termux/termux-packages/tree/master/packages/poppler
not going to install that, FWIW i also tried the packaged 0.61.1 in ArchLinux (besides my own self compiled one) and can't reproduce this either so I guess unless you can help us reproduce the problem you're on your own
Created attachment 135987 [details] Patch to use %c instead of \x to output binary
From https://github.com/termux/termux-packages/issues/1900: "prior to Android P (i.e. AOSP master), Android's printf didn't accept invalid UTF8 sequences in the format string, and so can't be used to print arbitrary bytes like this. you'd need to use %c%c%c%c -- as your other example does -- or (hilariously) %s (because only the format string is checked). this is from BSD code, so it's not just Android that has/had this limitation, so you may be able to get upstream to switch to the same %c%c%c%c workaround as your other example." Would the above patch work for you?
Created attachment 135988 [details] [review] Patch marked with the correct mime type
(In reply to Fredrik Fornwall from comment #6) > From https://github.com/termux/termux-packages/issues/1900: > > "prior to Android P (i.e. AOSP master), Android's printf didn't accept > invalid UTF8 sequences in the format string, and so can't be used to print > arbitrary bytes like this. you'd need to use %c%c%c%c -- as your other > example does -- or (hilariously) %s (because only the format string is > checked). this is from BSD code, so it's not just Android that has/had this > limitation, so you may be able to get upstream to switch to the same > %c%c%c%c workaround as your other example." > > Would the above patch work for you? Yup I am on Android N and can confirm that the \x is the problem: $ cat test.c #include <stdio.h> int main() { printf("%%\xE2\xE3\xCF\xD3\n"); printf("%%%s\n", "\xE2\xE3\xCF\xD3"); printf("%%%c%c%c%c\n", 0xe2, 0xe3, 0xcf, 0xd3); } $ clang test.c $ ./a.out %%���� %���� $ ./arch/startarch [21:37 home ]$ clang test.c [21:38 home ]$ ./a.out %���� %���� %���� And pdfinfo no longer complains about any file produced by pdfunite in poppler 0.61.1-2 on Termux (which has the patch applied).
Before commiting this patch, anyone can explain the "printf didn't accept invalid UTF8 sequences in the format string" part a bit more? What's invalid in that code?
(In reply to Albert Astals Cid from comment #9) > Before commiting this patch, anyone can explain the > > "printf didn't accept invalid UTF8 sequences in the format string" > > part a bit more? What's invalid in that code? Only the first byte in a UTF-8 sequence can start with '11' (in binary). The second and subsequent bytes must start with '10'. 0xE2, 0xE3 is 11100010, 11100011 I assume the printf implementation is trying to parse the format string one character at a time in the current locale in order to pick off the conversion specifiers. I'm not sure what the standards say about this but it is probably best to avoid putting arbitrary binary sequences in the format string.
Pushed
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.