Summary: | syntax errors reported on PDFs created by xsane | ||
---|---|---|---|
Product: | poppler | Reporter: | Larry Myerscough <hippostech> |
Component: | utils | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | clark |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
two-page music score as PDF
fix inline images in pdfimages
attachment-29513-0.html
the file that crashes with the proposed patch
Fix crash |
A 4800x6959 inline image! The PDF standard says "Because the inline format gives the reader less flexibility in managing the image data, it shall be used only for small images (4 KB or less)." But it should work. It is just less efficient than an image stream.

It is a bug in pdfimages. I've found the cause and can make it work. I'll post a patch when I write a proper fix.

Thanks for the prompt response. From my point of view, it's great to hear that it's a bug, since I have so many would-be done-and-dusted PDFs exhibiting this phenomenon! I guess I ought also to have a quiet word with the xsane team about their dubious use of (strictly too) big in-line images. Thanks!

[Perhaps off-topic so don't feel compelled to reply ...] Is there an easy way (with poppler tooling?) to re-style my PDFs to use a more standard construction without changing the actual image part of the data? (I would prefer our official archive to contain unarguably valid PDFs with no bending of the standard.)

(In reply to Larry Myerscough from comment #2)
> Is there an easy way (with poppler tooling?) to re-style my PDFs to use a
> more standard construction without changing the actual image part of the
> data. (I would prefer our official archive to contain unarguably valid PDFs
> with no bending of the standard.)

I tried running it through ghostscript:

$ pdf2ps African-Symphony-Baritone-C.pdf out.ps
$ ps2pdf out.ps out.pdf
$ pdfimages -list out.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    4800  6959  gray    1   1  ccitt  no         8  0   600   600  105K 2.6%
   2     1 image    4800  6959  gray    1   1  ccitt  no        14  0   600   600 86.4K 2.1%

Not only has it converted it to a standard image, it has also encoded the images with CCITT, which gives better compression for 1 bpc images compared with Flate.

Thanks! I'd never thought of converting via '.ps', even though I'd used ps2pdf a lot in the past. If the space saving is typical for the whole bunch, this will also require much less space in the cloud for the official archive.

Created attachment 135162 [details] [review]
fix inline images in pdfimages

Created attachment 135449 [details]
attachment-29513-0.html

Hi Adrian & Co.

Please confirm whether any action is required of me. I wasn't able to apply the patch, probably because I had the wrong base version (my git knowledge is sketchy!). I don't urgently need a fixed version so, unless advised otherwise, I'll wait for it to make it into the main release.

Thanks,
Larry

2017-10-30 10:00 GMT+01:00 <bugzilla-daemon@freedesktop.org>:
> Comment # 5 <https://bugs.freedesktop.org/show_bug.cgi?id=103446#c5> on
> bug 103446 <https://bugs.freedesktop.org/show_bug.cgi?id=103446> from
> Adrian Johnson <ajohnson@redneon.com>
>
> Created attachment 135162 <https://bugs.freedesktop.org/attachment.cgi?id=135162>
> fix inline images in pdfimages

Adrian, sorry it took me so long to come back to this, but this patch makes pdfimages crash with 104418018297-AttenInSuspensionsIrregularlyShapedSedimentParticles.pdf

tsdgeos@xps:~/okularfiles/pdf/scripts:$ valgrind ~/devel/poppler/build-new/utils/pdfimages -png ../104418018297-AttenInSuspensionsIrregularlyShapedSedimentParticles.pdf old-pdfimages/bla
==18590== Memcheck, a memory error detector
==18590== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==18590== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==18590== Command: /home/tsdgeos/devel/poppler/build-new/utils/pdfimages -png ../104418018297-AttenInSuspensionsIrregularlyShapedSedimentParticles.pdf old-pdfimages/bla
==18590==
==18590== Invalid write of size 8
==18590==    at 0x4C320C3: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1017)
==18590==    by 0x4FB14A9: memcpy (string3.h:53)
==18590==    by 0x4FB14A9: EmbedStream::getChars(int, unsigned char*) (Stream.cc:1140)
==18590==    by 0x4FB2391: doGetChars (Stream.h:120)
==18590==    by 0x4FB2391: ImageStream::getLine() (Stream.cc:512)
==18590==    by 0x10F449: ImageOutputDev::writeImageFile(ImgWriter*, ImageOutputDev::ImageFormat, char const*, Stream*, int, int, GfxImageColorMap*) (ImageOutputDev.cc:476)
==18590==    by 0x10FB5C: ImageOutputDev::writeImage(GfxState*, Object*, Stream*, int, int, GfxImageColorMap*, bool) (ImageOutputDev.cc:671)
==18590==    by 0x4F610EB: Gfx::doImage(Object*, Stream*, bool) (Gfx.cc:4592)
==18590==    by 0x4F618F9: Gfx::opBeginImage(Object*, int) (Gfx.cc:4895)
==18590==    by 0x4F59F30: Gfx::go(bool) (Gfx.cc:738)
==18590==    by 0x4F5A47E: Gfx::display(Object*, bool) (Gfx.cc:700)
==18590==    by 0x4FA613A: Page::displaySlice(OutputDev*, double, double, int, bool, bool, int, int, int, int, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) (Page.cc:560)
==18590==    by 0x4FA63C7: Page::display(OutputDev*, double, double, int, bool, bool, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) (Page.cc:483)
==18590==    by 0x4FAAB68: PDFDoc::displayPages(OutputDev*, int, int, double, double, int, bool, bool, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*) (PDFDoc.cc:516)
==18590==  Address 0xe2281a8 is 984 bytes inside a block of size 989 alloc'd
==18590==    at 0x4C2DB2F: malloc (vg_replace_malloc.c:299)
==18590==    by 0x4F004E3: gmalloc (gmem.cc:110)
==18590==    by 0x4F004E3: gmallocn (gmem.cc:192)
==18590==    by 0x4F004E3: gmallocn_checkoverflow (gmem.cc:200)
==18590==    by 0x4FB20BC: ImageStream::ImageStream(Stream*, int, int, int) (Stream.cc:454)
==18590==    by 0x10F21A: ImageOutputDev::writeImageFile(ImgWriter*, ImageOutputDev::ImageFormat, char const*, Stream*, int, int, GfxImageColorMap*) (ImageOutputDev.cc:384)
==18590==    by 0x10FB5C: ImageOutputDev::writeImage(GfxState*, Object*, Stream*, int, int, GfxImageColorMap*, bool) (ImageOutputDev.cc:671)
==18590==    by 0x4F610EB: Gfx::doImage(Object*, Stream*, bool) (Gfx.cc:4592)
==18590==    by 0x4F618F9: Gfx::opBeginImage(Object*, int) (Gfx.cc:4895)
==18590==    by 0x4F59F30: Gfx::go(bool) (Gfx.cc:738)
==18590==    by 0x4F5A47E: Gfx::display(Object*, bool) (Gfx.cc:700)
==18590==    by 0x4FA613A: Page::displaySlice(OutputDev*, double, double, int, bool, bool, int, int, int, int, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) (Page.cc:560)
==18590==    by 0x4FA63C7: Page::display(OutputDev*, double, double, int, bool, bool, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) (Page.cc:483)
==18590==    by 0x4FAAB68: PDFDoc::displayPages(OutputDev*, int, int, double, double, int, bool, bool, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*) (PDFDoc.cc:516)
==18590==

Created attachment 136438 [details]
the file that crashes with the proposed patch

Created attachment 136509 [details] [review]
Fix crash

*** Bug 104453 has been marked as a duplicate of this bug. ***

Pushed. |
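For anyone wanting to apply the Ghostscript round trip from this thread (pdf2ps followed by ps2pdf) to a whole archive, the following is a minimal Python sketch. The function names, the directory arguments and the choice to keep the intermediate .ps file next to the output are assumptions made for illustration; only the two commands themselves come from the thread. Note that the round trip re-encodes the images (CCITT in the listing above), so it is worth checking a few converted files before trusting it for an archive.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def roundtrip_commands(src: Path, out_dir: Path) -> List[List[str]]:
    """Build the pdf2ps / ps2pdf command lines for one input PDF.

    The output PDF keeps the input's file name but lives in out_dir
    (a hypothetical layout, chosen so the originals are never overwritten).
    """
    ps_path = out_dir / (src.stem + ".ps")
    pdf_path = out_dir / src.name
    return [
        ["pdf2ps", str(src), str(ps_path)],
        ["ps2pdf", str(ps_path), str(pdf_path)],
    ]


def convert(src: Path, out_dir: Path) -> None:
    """Run both steps, stopping at the first one that fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in roundtrip_commands(src, out_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Usage (assumed): python roundtrip.py <input-dir> <output-dir>
    in_dir, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    for pdf in sorted(in_dir.glob("*.pdf")):
        convert(pdf, out_dir)
```

Running `pdfimages -list` on a converted file afterwards, as done in the thread, shows whether the images are now ordinary image objects rather than inline ones.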
Created attachment 135032 [details]
two-page music score as PDF

Using poppler-0.60.1 with libpoppler.so.71 on PDFs created by xsane v0.99, I get errors from the pdfimages -list <filename> command like:

...
gill@happy ~/MEW_Archive/A/African Symphony/African Symphony Bari-Euph $ pdfimages -list African-Symphony-Baritone-C.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
Syntax Error (174623): Unexpected end of file in flate stream
   1     0 image    4800  6959  gray    1   1  image  no   [inline]     600   600 4078K 100%
Syntax Error (175257): Unknown compression method in flate stream
Syntax Error (147236): Unexpected end of file in flate stream
   2     1 image    4800  6959  gray    1   1  image  no   [inline]     600   600 4078K 100%
Syntax Error (322879): Unknown compression method in flate stream
gill@happy ~/MEW_Archive/A/African Symphony/African Symphony Bari-Euph $
...

We have created a few thousand such files using xsane in the past months and distributed these to about 50 people (mainly Windows users using Adobe Acrobat Reader) who have had no problems viewing and printing these files.

As "one last check", however, I decided to run them through pdfimages, hoping to pick up whether e.g. any had been scanned in colour by mistake.

The stdout data looks OK (comp=1 means zlib compression?), so I suppose I could just ignore the stderr stuff... but I'm concerned in case there is a problem in there which will come back to bite me later!

I am attaching the file to which the above error messages relate (rather large, alas!).

I am willing to dig deeper myself if necessary to track down what's happening, but I would like some advice on how to proceed.

Thanks.
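The "one last check" described above (spotting pages scanned in colour by mistake) can be automated by reading the stdout listing and ignoring the interleaved "Syntax Error" lines. Below is a minimal Python sketch under stated assumptions: the column order is taken from the listing in this report, the names `ImageRow`, `parse_listing` and `needs_attention` are made up for illustration, and treating every non-gray or inline image as suspect is just one possible policy.

```python
from typing import List, NamedTuple


class ImageRow(NamedTuple):
    page: int
    num: int
    width: int
    height: int
    color: str
    bpc: int
    enc: str
    inline: bool


def parse_listing(text: str) -> List[ImageRow]:
    """Extract image rows from `pdfimages -list` output.

    Data rows start with two integers (page, num); the header, the dashed
    rule and any interleaved "Syntax Error" lines do not, so they are skipped.
    """
    rows = []
    for line in text.splitlines():
        t = line.split()
        if len(t) < 11 or not (t[0].isdigit() and t[1].isdigit()):
            continue
        rows.append(ImageRow(
            page=int(t[0]), num=int(t[1]),
            width=int(t[3]), height=int(t[4]),
            color=t[5], bpc=int(t[7]), enc=t[8],
            # Inline images show "[inline]" where the object ID would be.
            inline=(t[10] == "[inline]"),
        ))
    return rows


def needs_attention(rows: List[ImageRow], expected_color: str = "gray") -> List[ImageRow]:
    """Return rows that are inline images or not in the expected colour space."""
    return [r for r in rows if r.inline or r.color != expected_color]


# Listing copied from this report, including the interleaved error lines.
SAMPLE = """\
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
Syntax Error (174623): Unexpected end of file in flate stream
1 0 image 4800 6959 gray 1 1 image no [inline] 600 600 4078K 100%
Syntax Error (175257): Unknown compression method in flate stream
Syntax Error (147236): Unexpected end of file in flate stream
2 1 image 4800 6959 gray 1 1 image no [inline] 600 600 4078K 100%
Syntax Error (322879): Unknown compression method in flate stream
"""

if __name__ == "__main__":
    for row in needs_attention(parse_listing(SAMPLE)):
        print(row)
```

In practice the text would come from running `pdfimages -list` on each file and capturing its stdout; keeping the parser separate from the command invocation makes it easy to check against a saved listing like the one above.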