Created attachment 35151: proposed patch

I have a 9 MB file that makes pdftops print the "Illegal entry in bfrange block in ToUnicode CMap" message a large number of times. It looks like the file has <00##> where pdftops expects a two-digit value of the form <##>. The attached patch to CharCodeToUnicode.cc fixes the problem by accepting either <##> or <00##>.

Several places in CharCodeToUnicode.cc produce the identical error message. The patch also makes each message slightly different to help localize the problem.
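The acceptance rule described above could be sketched roughly as follows. This is only an illustration, not the code in the attached patch: `isCodeToken` is a hypothetical helper name, and `nDigits` is assumed to be the number of hex digits the parser expects (2 for the file in this report). It checks only the length, the angle brackets, and the optional 00 prefix; it does not validate the hex digits themselves.

```cpp
#include <cstring>

// Hypothetical helper, not part of poppler: returns true when tok is
// delimited by '<' and '>' and holds either nDigits characters (<##>)
// or nDigits characters preceded by two zeros (<00##>).
static bool isCodeToken(const char *tok, int nDigits) {
    const int n = static_cast<int>(std::strlen(tok));
    if (n < 2 || tok[0] != '<' || tok[n - 1] != '>') {
        return false; // missing angle brackets
    }
    if (n == 2 + nDigits) {
        return true; // plain form, e.g. <41>
    }
    // padded form, e.g. <0041>: two extra characters, both '0'
    return n == 4 + nDigits && tok[1] == '0' && tok[2] == '0';
}
```

Running both tokens of a bfrange entry through one helper, for example `isCodeToken(tok1, nDigits) && isCodeToken(tok2, nDigits)`, also avoids accidentally testing one token's characters while validating the other.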
Can you put a link to the file?
I copied a file that shows the problem to http://williambader.com/LFW_20100401-bfrange.pdf

I made it by running ps2pdf from gs 8.71 on http://williambader.com/LFW_20100401-orig.pdf (which does not have the problem).

I have limited space on this server, so I will eventually remove the files.
It removes the warnings, but the rendering stays the same, right?
Yes, I think that the output is the same. I posted the original file as LFW_20100401-orig.pdf (which doesn't get the errors) if you want to compare visually.

pdftops from the patched svn snapshot from Apr 18 matches pdftops from an unpatched snapshot from Jan 10. You could try the same test before and after applying the patch to confirm that it does not break anything.

I would have expected differences. Maybe gs generated mappings for characters that aren't used.

William

$ pdftops-poppler-12jan10 LFW_20100401-bfrange.pdf xold.ps
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
...
Error: Illegal entry in bfrange block in ToUnicode CMap
$ pdftops LFW_20100401-bfrange.pdf xnew.ps
$ ls -l xnew.ps xold.ps
-rw-rw-rw- 1 william users 38832030 2010-04-20 21:06 xnew.ps
-rw-rw-rw- 1 william users 38832030 2010-04-20 21:06 xold.ps
$ cmp xnew.ps xold.ps
$
If you apply the parts of the patch that add more text to the error messages, note that printing the tokens is probably a security problem. If you want, I can resubmit the patch without printing the tokens, or without changing any of the messages.
Can you rephrase your last comment? I'm not sure I understand what you mean.
The replacement

- if (!(n1 == 2 + nDigits && tok1[0] == '<' && tok1[n1 - 1] == '>' &&
-       n2 == 2 + nDigits && tok2[0] == '<' && tok2[n2 - 1] == '>')) {
+ if (!(((n1 == 2 + nDigits && tok1[0] == '<' && tok1[n1 - 1] == '>') ||
+        (n1 == 4 + nDigits && tok1[0] == '<' && tok1[n1 - 1] == '>' && tok1[1] == '0' && tok1[2] == '0')) &&
+       ((n2 == 2 + nDigits && tok2[0] == '<' && tok2[n2 - 1] == '>') ||
+        (n2 == 4 + nDigits && tok2[0] == '<' && tok2[n2 - 1] == '>' && tok1[1] == '0' && tok1[2] == '0')))) {

is what stops the "Illegal entry in bfrange block in ToUnicode CMap" error.

Five places in CharCodeToUnicode.cc printed the identical error message. To figure out which of the five places was causing the problem, I added some identifying text to the end of each message. I left the changed messages in my patch.

In some of the messages, I also printed the tokens that caused the error, for example,

- error(-1, "Illegal entry in bfrange block in ToUnicode CMap");
+ error(-1, "Illegal entry in bfrange block in ToUnicode CMap, found '%s' '%s'", tok1, tok2);

Printing the token is a security hole because it passes unfiltered user data to the screen. For example, if the message goes to an xterm, it might be possible to write an invalid pdf where the data in the tok1 string in the error message makes the xterm run a command by using these codes
http://invisible-island.net/xterm/ctlseqs/ctlseqs.html
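If the tokens were to be kept in the messages, one possible mitigation would be to escape them before printing. The sketch below is only an illustration and is not part of the patch: `sanitizeForTerminal` is a hypothetical helper name, and the 32-byte limit is an arbitrary assumption. It replaces every byte outside printable ASCII with a visible `\xNN` escape, so control sequences such as ESC never reach the terminal.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of poppler: returns a copy of at most the
// first maxLen bytes of tok, with every byte outside printable ASCII
// (0x20..0x7e) replaced by a four-character "\xNN" escape.
static std::string sanitizeForTerminal(const char *tok, std::size_t maxLen = 32) {
    std::string out;
    for (std::size_t i = 0; tok[i] != '\0' && i < maxLen; ++i) {
        const unsigned char c = static_cast<unsigned char>(tok[i]);
        if (c >= 0x20 && c <= 0x7e) {
            out += static_cast<char>(c); // printable, keep as is
        } else {
            char buf[5]; // backslash, 'x', two hex digits, terminating NUL
            std::snprintf(buf, sizeof(buf), "\\x%02x", c);
            out += buf;
        }
    }
    return out;
}
```

With such a helper, the extended message could pass `sanitizeForTerminal(tok1).c_str()` and `sanitizeForTerminal(tok2).c_str()` to `error()` in place of the raw tokens. Dropping the tokens from the messages entirely remains the simpler option.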
pushed to master