Bug 54090

Summary: Problems sent by Mateusz Jurczyk and Gynvael Coldwind
Product: poppler Reporter: Thomas Freitag <Thomas.Freitag>
Component: generalAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: williambader
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: 37.pdf.asan.2d.686, 61.pdf.asan.13.953, 68.pdf.asan.7.1030 72.pdf.asan.2d.1075, 72.pdf.asan.42.1075
patch for 100.pdf.asan.38.2
Patch for 158.pdf.asan.d.451
Patch for 195.pdf.asan.7c.492
557.pdf.asan.47.894
Patch for 589.pdf.SIGSEGV.8b1.929
Patch for 829. and 839. asan and sigsegv series
remove patch for 37.pdf.asan.2d.686 from first path
solves 1066.pdf.asan.38.75, extends patch for 37.pdf.asan.2d.686
solves 100.pdf.asan.38.2, extends patch for 100.pdf.asan.38.2
solves 1043.pdf.asan.47.50, extends solution for 557.pdf.asan.47.894
Patch for 569.pdf.SIGSEGV.c1.907
Partial patch for 682.pdf.SIGFPE.f3.1033
Patch for 782.pdf.SIGSEGV.68.1144
Patch for 1028.pdf.SIGSEGV.ae6.33
solves 1255.pdf.SIGSEGV.56f.285, extends Patch for 195.pdf.asan.7c.492
solves 682.pdf.SIGFPE.f3.1033 and 569.pdf.SIGSEGV.c1.907, extends Patch for 829. and 839. asan and sigsegv series
solves 121.pdf.asan.6f.235, extends 682.pdf.SIGFPE.f3.1033 and 569.pdf.SIGSEGV.c1.907, extends Patch for 829. and 839. asan and sigsegv series
rebased patch for 1001.pdf.asan.2a.4, extends patch for 100.pdf.asan.38.2
solves 1258.pdf.SIGSEGV.dee.288 and 1255.pdf.asan.38.285, extends 1043.pdf.asan.47.50 and 557.pdf.asan.47.894
add several sanity checks for JPEG 2000 problems
solves 1162.pdf.SIGSEGV.28e.182
solves 1106.pdf.asan.30.120.patch
remove solution for 1106.pdf.asan.30.120 and 1162.pdf.SIGSEGV.28e.182 from JPEG 2000 patch

Description Thomas Freitag 2012-08-26 16:50:32 UTC
Created attachment 66142 [details] [review]
37.pdf.asan.2d.686, 61.pdf.asan.13.953, 68.pdf.asan.7.1030 72.pdf.asan.2d.1075, 72.pdf.asan.42.1075

The attached patch solves some problems found by the AddressSanitizer
Comment 1 Thomas Freitag 2012-08-27 14:27:13 UTC
SplashClip.cc and SplashXPathScanner.cc solves 37.pdf.asan.2d.686, 72.pdf.asan.2d.1075 and 72.pdf.asan.42.1075, all have the same reason
FoFiType1C.cc and *.h solves 61.pdf.asan.13.953
Gfx.cc solves 68.pdf.asan.7.1030
So it's quite easy to divide that patch to have one for each PDF
Comment 2 Thomas Freitag 2012-08-27 14:30:31 UTC
Created attachment 66175 [details] [review]
patch for 100.pdf.asan.38.2
Comment 3 Thomas Freitag 2012-08-27 14:31:50 UTC
Created attachment 66176 [details] [review]
Patch for 158.pdf.asan.d.451
Comment 4 Thomas Freitag 2012-08-27 14:32:50 UTC
Created attachment 66177 [details] [review]
Patch for 195.pdf.asan.7c.492
Comment 5 Thomas Freitag 2012-08-27 14:33:35 UTC
Created attachment 66178 [details] [review]
557.pdf.asan.47.894
Comment 6 Thomas Freitag 2012-08-27 14:37:00 UTC
All patches run sucessfully through the regression test.
It's SO annoying looking at code problems caused by the fact, that trying to repair a damaged PDF produces an invalid PDF, so I need the break where I regtest them :-)
Comment 7 Thomas Freitag 2012-08-29 14:16:11 UTC
Created attachment 66274 [details] [review]
Patch for  589.pdf.SIGSEGV.8b1.929
Comment 8 Thomas Freitag 2012-08-29 14:17:31 UTC
Created attachment 66275 [details] [review]
Patch for 829. and 839. asan and sigsegv series
Comment 9 Thomas Freitag 2012-09-01 08:59:24 UTC
Created attachment 66433 [details] [review]
remove patch for 37.pdf.asan.2d.686 from first path

solves 61.pdf.asan.13.953, 68.pdf.asan.7.1030 72.pdf.asan.2d.1075,
72.pdf.asan.42.1075
Comment 10 Thomas Freitag 2012-09-01 09:01:53 UTC
Created attachment 66434 [details] [review]
solves 1066.pdf.asan.38.75, extends patch for 37.pdf.asan.2d.686
Comment 11 Thomas Freitag 2012-09-01 09:05:01 UTC
Created attachment 66435 [details] [review]
solves 100.pdf.asan.38.2, extends patch for 100.pdf.asan.38.2
Comment 12 Thomas Freitag 2012-09-01 09:07:21 UTC
(In reply to comment #11)
> Created attachment 66435 [details] [review] [review]
> solves 100.pdf.asan.38.2, extends patch for 100.pdf.asan.38.2

sorry, meant "solves 1001.pdf.asan.2a.4, extends patch for 100.pdf.asan.38.2"
Comment 13 Thomas Freitag 2012-09-01 09:09:13 UTC
Created attachment 66436 [details] [review]
solves 1043.pdf.asan.47.50, extends solution for 557.pdf.asan.47.894
Comment 14 William Bader 2012-09-03 12:29:08 UTC
Created attachment 66538 [details] [review]
Patch for 569.pdf.SIGSEGV.c1.907
Comment 15 William Bader 2012-09-03 19:53:58 UTC
Created attachment 66577 [details] [review]
Partial patch for 682.pdf.SIGFPE.f3.1033

This patch fixes a non-fatal bad access in the test to show the "Weird tile index in JPX stream" warning by moving a test of an index before using the index.

With libopenjpeg, it fails with a divide by 0 in j2k_decode.

Without libopenjpeg, it gets stuck trying to parse a 622236252 x 19 bitmap in JBIG2Stream::readGenericBitmap() called by JBIG2Stream::readSymbolDictSeg().
Is there a way to detect an unrealistically large bitmap?
Comment 16 William Bader 2012-09-03 23:38:32 UTC
Created attachment 66582 [details] [review]
Patch for 782.pdf.SIGSEGV.68.1144

Check for EOF in JPXStream::getImageParams() when skipping over unknown box types in the header.  An invalid header in this pdf has an item with a length of 3255930980, and the valgrind run wouldn't complete in a reasonable time.

Initialize img.ySize in JPXStream::readBoxes().  If the header turned out to be invalid, JPXStream::reset() set curY to an uninitialized value, which caused valgrind to complain on a test at the top of JPXStream::fillReadBuf().
Comment 17 William Bader 2012-09-03 23:56:22 UTC
1022.pdf.SIGSEGV.422.27 does not crash for me, but it calls JBIG2Stream::readGenericBitmap() with h=2010370370 so it does not complete in a reasonable time.
Comment 18 William Bader 2012-09-04 00:47:31 UTC
Created attachment 66586 [details] [review]
Patch for 1028.pdf.SIGSEGV.ae6.33

The JBIG2Bitmap constructor could crash if it was called with a null bitmap.
Comment 19 Thomas Freitag 2012-09-04 08:01:22 UTC
(In reply to comment #15)
> Created attachment 66577 [details] [review] [review]
> Partial patch for 682.pdf.SIGFPE.f3.1033
> 
> This patch fixes a non-fatal bad access in the test to show the "Weird tile
> index in JPX stream" warning by moving a test of an index before using the
> index.
> 
> With libopenjpeg, it fails with a divide by 0 in j2k_decode.
> 
> Without libopenjpeg, it gets stuck trying to parse a 622236252 x 19 bitmap in
> JBIG2Stream::readGenericBitmap() called by JBIG2Stream::readSymbolDictSeg().
> Is there a way to detect an unrealistically large bitmap?

I don't think that we need a way to detect it, it just takes a while (but TOO long with valgrind). But if You're patient enough, it will later exit with "Out of memory" with another JBIG2Bitmap bitmap, therefore I'll change the constructor in the following way:
  // need to allocate one extra guard byte for use in combine()
  data = (Guchar *)gmalloc_checkoverflow(h * line + 1);
  if (data != NULL) {
    data[h * line] = 0;
  }
This works for this PDF, and it finishes now without any further problems, but I want to regtest it first before upload it (next weekend)!
Comment 20 Thomas Freitag 2012-09-04 08:17:21 UTC
(In reply to comment #16)
> Created attachment 66582 [details] [review] [review]
> Patch for 782.pdf.SIGSEGV.68.1144
> 
> Check for EOF in JPXStream::getImageParams() when skipping over unknown box
> types in the header.  An invalid header in this pdf has an item with a length
> of 3255930980, and the valgrind run wouldn't complete in a reasonable time.
> 
> Initialize img.ySize in JPXStream::readBoxes().  If the header turned out to be
> invalid, JPXStream::reset() set curY to an uninitialized value, which caused
> valgrind to complain on a test at the top of JPXStream::fillReadBuf().

The problem here seems to be, that the loop in JBIG2Stream::readSymbolDictSeg

  while (i < numNewSyms) {
:::

never will end, because

      arithDecoder->decodeInt(&dh, iadhStats);

works and add dh to symHeight, but

arithDecoder->decodeInt(&dw, iadwStats) 

will return false and therefore it loops forever. If I add something like

    symHeight += dh;
    if (symHeight > 0x40000000) {
      error(errSyntaxError, curStr->getPos(), "Bad height value in JBIG2 symbol dictionary");
      goto syntaxError;
    }

it loops a long time, but it will finish, and the PDF finishes without any further problems. Once again, I want to regtest that first.
Comment 21 Thomas Freitag 2012-09-04 08:23:15 UTC
(In reply to comment #17)
> 1022.pdf.SIGSEGV.422.27 does not crash for me, but it calls
> JBIG2Stream::readGenericBitmap() with h=2010370370 so it does not complete in a
> reasonable time.

Hmmh, under Windows it works now in a short time with my actual code, but I need to test it under Unix once again. I'll send You my actual complete patch (including Your changes) and the actual progress list in private.
Comment 22 Thomas Freitag 2012-09-09 04:53:46 UTC
Created attachment 66858 [details] [review]
solves 1255.pdf.SIGSEGV.56f.285, extends Patch for 195.pdf.asan.7c.492
Comment 23 Thomas Freitag 2012-09-09 05:00:35 UTC
Created attachment 66859 [details] [review]
solves 682.pdf.SIGFPE.f3.1033 and 569.pdf.SIGSEGV.c1.907, extends Patch for 829. and 839. asan and sigsegv series
Comment 24 Thomas Freitag 2012-09-09 05:05:52 UTC
Created attachment 66860 [details] [review]
solves 121.pdf.asan.6f.235, extends 682.pdf.SIGFPE.f3.1033 and 569.pdf.SIGSEGV.c1.907, extends Patch for 829. and 839. asan and sigsegv series
Comment 25 Thomas Freitag 2012-09-09 05:22:07 UTC
Created attachment 66861 [details] [review]
rebased patch for 1001.pdf.asan.2a.4, extends patch for 100.pdf.asan.38.2
Comment 26 Thomas Freitag 2012-09-09 05:28:37 UTC
Created attachment 66862 [details] [review]
solves 1258.pdf.SIGSEGV.dee.288 and 1255.pdf.asan.38.285, extends 1043.pdf.asan.47.50 and 557.pdf.asan.47.894
Comment 27 Thomas Freitag 2012-09-09 05:37:04 UTC
Comment on attachment 66538 [details] [review]
Patch for 569.pdf.SIGSEGV.c1.907

included in patch for 121.pdf.asan.6f.235, 682.pdf.SIGFPE.f3.1033, 569.pdf.SIGSEGV.c1.907, patch for 829. and 839. asan and sigsegv series
Comment 28 Thomas Freitag 2012-09-09 05:39:16 UTC
Comment on attachment 66577 [details] [review]
Partial patch for 682.pdf.SIGFPE.f3.1033

will be inlcuded in JPEG2000 patch
Comment 29 Thomas Freitag 2012-09-09 05:40:21 UTC
Comment on attachment 66582 [details] [review]
Patch for 782.pdf.SIGSEGV.68.1144

will be included in JPEG2000 patch
Comment 30 Thomas Freitag 2012-09-09 05:42:14 UTC
Comment on attachment 66586 [details] [review]
Patch for 1028.pdf.SIGSEGV.ae6.33

included in patch for 121.pdf.asan.6f.235, 682.pdf.SIGFPE.f3.1033,
569.pdf.SIGSEGV.c1.907, patch for 829. and 839. asan and sigsegv series
Comment 31 Thomas Freitag 2012-09-09 05:50:00 UTC
Created attachment 66868 [details] [review]
add several sanity checks for JPEG 2000 problems

This patch solves the JPEG 2000 problems in the PDF sent by Mateusz Jurczyk and Gynvael Coldwind, when poppler is compiled without libopenjpeg. Be aware that it DOESN'T solve any problems in the JPXStream implementation itself. I.e. the output for "PlaySOM and PocketSOMPlayer, Alternative Interfaces to Large Music Collections .pdf" is still wrong when compiled without libopenjpeg how it was without these sanity checks.
Comment 32 Thomas Freitag 2012-09-09 06:05:17 UTC
Summmary:

With the just uploaded JPEG 2000 patch my examination of the PDF sent by Mateusz Jurczyk and Gynvael Coldwind is complete. All address sanitizer problems are solved by the several patches. But three PDF still have now other problems:

168.pdf.SIGSEGV.598.462 crashes now in libfreetype.so.6.3.22, open because it is not in the scope of poppler to solve freetype crashes.

530.pdf.SIGSEGV.75.865.pdf fixed, but it dies now under Unix with clang. Core dumps of clang compiled programs cannot be examined with gdb in the moment, but because poppler runs under Windows and under Unix with gcc for this PDF I didn't care anymore.
 
682.pdf.SIGFPE.f3.1033 fixed, after fix it crashes now in clang itself (out of memory in clang!), runs under Windows and under Unix with gcc
Comment 33 Thomas Freitag 2012-09-09 06:11:49 UTC
Created attachment 66870 [details] [review]
solves 1162.pdf.SIGSEGV.28e.182
Comment 34 Thomas Freitag 2012-09-09 06:14:22 UTC
Created attachment 66871 [details] [review]
solves 1106.pdf.asan.30.120.patch
Comment 35 Thomas Freitag 2012-09-09 06:16:25 UTC
Created attachment 66872 [details] [review]
remove solution for 1106.pdf.asan.30.120 and 1162.pdf.SIGSEGV.28e.182 from JPEG 2000 patch
Comment 36 Albert Astals Cid 2012-09-09 10:51:56 UTC
Patch for 589.pdf.SIGSEGV.8b1.929 commited
Comment 37 Albert Astals Cid 2012-09-09 21:23:25 UTC
Commited 

Patch for 589.pdf.SIGSEGV.8b1.929
Patch for 1028.pdf.SIGSEGV.ae6.33
solves 1162.pdf.SIGSEGV.28e.182
remove patch for 37.pdf.asan.2d.686 from first path
solves 1066.pdf.asan.38.75, extends patch for 37.pdf.asan.2d.686
solves 1255.pdf.SIGSEGV.56f.285, extends Patch for 195.pdf.asan.7c.492
solves 1106.pdf.asan.30.120.patch
Comment 38 Albert Astals Cid 2012-09-26 13:03:25 UTC
Pushed all the other patches, so closing this bug.

Thomas can you make sure i did not miss any of the patches?
Comment 39 Thomas Freitag 2012-09-27 06:49:04 UTC
(In reply to comment #38)
> Pushed all the other patches, so closing this bug.
> 
> Thomas can you make sure i did not miss any of the patches?

I've double checked it: 
1. after merging git master there are no more differences in my copy to git master.
2. testing all PDF sent by Mateusz Jurczyk and Gynvael Coldwind with the actual clang compiled pdftoppm I got again only the problems already mentioned in comment 32.
So I think we've definitely solved it.
Comment 40 Albert Astals Cid 2012-09-28 16:58:33 UTC
Neat, thanks for double checking and thanks to all involved in the fixing!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.