Bug 27728 - pdftops gets "Error: Illegal entry in bfrange block in ToUnicode CMap"
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2010-04-18 20:43 UTC by William Bader
Modified: 2010-04-23 14:53 UTC


Attachments
proposed patch (2.04 KB, application/octet-stream)
2010-04-18 20:43 UTC, William Bader

Description William Bader 2010-04-18 20:43:15 UTC
Created attachment 35151
proposed patch

I have a 9 MB file that produces this message a large number of times.
It looks like the file has <00##> where pdftops expects a two-digit value.
The attached patch to CharCodeToUnicode.cc fixes the problem by allowing either <##> or <00##>.
Several places in CharCodeToUnicode.cc produce the identical error message. My patch also makes each message slightly different to help localize the problem.
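
The relaxed check described above can be sketched as a small standalone helper. This is only an illustration under stated assumptions: isAcceptedHexToken is a hypothetical name that does not exist in poppler, and nDigits stands for the expected number of hex digits (2 in the case reported here). Like the check it illustrates, it looks only at the length, the angle brackets, and the optional "00" prefix, and does not validate that the remaining characters are hex digits.

```cpp
#include <cstring>

// Hypothetical helper, not part of poppler: returns true when tok is a
// token of the form <##...> holding exactly nDigits characters between the
// brackets, or the same token with an extra leading "00" (e.g. <00##>).
static bool isAcceptedHexToken(const char *tok, int nDigits) {
    const int n = static_cast<int>(std::strlen(tok));
    if (n < 2 || tok[0] != '<' || tok[n - 1] != '>') {
        return false;  // not wrapped in angle brackets
    }
    if (n == 2 + nDigits) {
        return true;  // plain form, e.g. <41> when nDigits == 2
    }
    // padded form, e.g. <0041> when nDigits == 2
    return n == 4 + nDigits && tok[1] == '0' && tok[2] == '0';
}
```

With nDigits == 2, this accepts both "<41>" and "<0041>" and rejects "<1141>", whose extra leading digits are not "00".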
Comment 1 Albert Astals Cid 2010-04-20 11:09:25 UTC
Can you put a link to the file?
Comment 2 William Bader 2010-04-20 11:42:16 UTC
I copied a file that shows the problem to http://williambader.com/LFW_20100401-bfrange.pdf
I made the file from running ps2pdf from gs 8.71 on http://williambader.com/LFW_20100401-orig.pdf (which does not have the problem).
I have limited space on this server, so I will eventually remove the files.
Comment 3 Albert Astals Cid 2010-04-20 12:47:05 UTC
It removes the warnings, but the rendering stays the same, right?
Comment 4 William Bader 2010-04-20 13:39:33 UTC
Yes, I think that the output is the same.  I posted the original file as LFW_20100401-orig.pdf (which doesn't get the errors) if you want to compare visually.  pdftops from the patched svn snapshot from Apr 18 matches pdftops from an unpatched snapshot from Jan 10.  You could try the same test before and after applying the patch to confirm that it does not break anything.  I would have expected differences.  Maybe gs generated mappings for characters that aren't used.

William

$ pdftops-poppler-12jan10 LFW_20100401-bfrange.pdf xold.ps
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
...
Error: Illegal entry in bfrange block in ToUnicode CMap
$ pdftops LFW_20100401-bfrange.pdf xnew.ps
$ ls -l xnew.ps xold.ps
-rw-rw-rw- 1 william users 38832030 2010-04-20 21:06 xnew.ps
-rw-rw-rw- 1 william users 38832030 2010-04-20 21:06 xold.ps
$ cmp xnew.ps xold.ps
$
Comment 5 William Bader 2010-04-21 06:01:46 UTC
If you take the parts of the patches that add more text to the error messages, writing the tokens is probably a security problem.  If you want, I can resubmit the patches without printing the tokens or without changing any of the messages.
Comment 6 Albert Astals Cid 2010-04-21 11:01:57 UTC
Can you rephrase your last comment? I'm not sure I understand what you mean.
Comment 7 William Bader 2010-04-21 11:36:29 UTC
The replacement

-       if (!(n1 == 2 + nDigits && tok1[0] == '<' && tok1[n1 - 1] == '>' &&
-             n2 == 2 + nDigits && tok2[0] == '<' && tok2[n2 - 1] == '>')) {
+       if (!(((n1 == 2 + nDigits && tok1[0] == '<' && tok1[n1 - 1] == '>') ||
+              (n1 == 4 + nDigits && tok1[0] == '<' && tok1[n1 - 1] == '>' && tok1[1] == '0' && tok1[2] == '0')) &&
+             ((n2 == 2 + nDigits && tok2[0] == '<' && tok2[n2 - 1] == '>') ||
+              (n2 == 4 + nDigits && tok2[0] == '<' && tok2[n2 - 1] == '>' && tok2[1] == '0' && tok2[2] == '0')))) {

is what stops the "Illegal entry in bfrange block in ToUnicode CMap" error.
Five places in CharCodeToUnicode.cc printed the identical error message.
To figure out which of the five places was causing the problem, I added some identifying text to the end of each message.
I left the changed messages in my patch.
In some of the messages, I also printed the tokens that caused the error, for example,
-         error(-1, "Illegal entry in bfrange block in ToUnicode CMap");
+         error(-1, "Illegal entry in bfrange block in ToUnicode CMap, found '%s' '%s'", tok1, tok2);
Printing the tokens is a security hole because it passes unfiltered user data to the screen. For example, if the message goes to an xterm, it might be possible to craft an invalid PDF in which the data in the tok1 string of the error message makes the xterm run a command by using the control sequences documented at http://invisible-island.net/xterm/ctlseqs/ctlseqs.html
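
One way to keep the tokens in the message without passing raw bytes to the terminal is to escape everything outside printable ASCII first. The following is a minimal sketch of that idea, not what was pushed to poppler: sanitizeToken is a hypothetical helper, and the choice of \xNN escapes is an assumption made for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of poppler: returns a copy of tok in which
// every byte outside printable ASCII (0x20..0x7e) is replaced by a \xNN
// escape, so that control sequences embedded in a crafted PDF are shown as
// text instead of being interpreted by the terminal.
static std::string sanitizeToken(const char *tok) {
    std::string out;
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(tok); *p; ++p) {
        if (*p >= 0x20 && *p < 0x7f) {
            out += static_cast<char>(*p);  // printable: copy unchanged
        } else {
            char buf[5];  // room for "\xNN" plus the terminating NUL
            std::snprintf(buf, sizeof(buf), "\\x%02x", static_cast<unsigned>(*p));
            out += buf;
        }
    }
    return out;
}
```

An error call could then print sanitizeToken(tok1).c_str() in place of tok1, so an ESC byte in the token appears as the four characters \x1b.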
Comment 8 Albert Astals Cid 2010-04-23 14:53:11 UTC
pushed to master

