Created attachment 113337 [details] [review] patch to fix a FoFiTrueType test and to use alloca in GooString::appendfv() I looked at a report of slowness on the PDF at https://www.gnu.org/software/autoconf/manual/autoconf-2.60/html_node/Particular-Functions.html It gets two warnings due to a recently added test in fofi/FoFiTrueType.cc that is < and should be <= Syntax Warning: length bigger than vheaTab size Syntax Warning: length bigger than vmtxTab size The attached patch fixes the test. This file also gets the warnings below for each page because the page offset table has a length of 0. See Hints::readPageOffsetTable(). ghostscript does not complain, but I think that the page offset table should have at least 1 item, so the PDF might be bad. The Creator is "PDFCreator 2.0.2.0". Syntax Warning: Invalid least number of objects reading page offset hints table Syntax Warning: Failed to get object num from hint tables for page 1 Syntax Warning: Failed parsing page 1 using hint tables Syntax Warning: Failed to get object num from hint tables for page 1 Syntax Warning: Failed parsing page 1 using hint tables Then I did a build with profiling. I have some of the results below. pdftops makes a lot of use of GooString::formatDouble() and GooString::appendfv() to write numbers to the output postscript file. appendfv() is called a lot and allocates a buffer each time. I changed the allocation from gmallocn() to alloca() (including updating configure.ac to check for AC_FUNC_ALLOCA). alloca() is a little risky because it does not do overflow checking, but I think that appendfv() is never fed huge strings. After that, it would be nice to be able to inline FlateStream::getChar(), but I think that it would require a lot of special case code for each type of stream. On my Fedora 20, 2.5 GHz i5-2450M laptop, running on a RAM disk, "pdftops 2-DESCR_648853-IT-EN-DE_MANITOU_RIGENERATO.pdf x" takes 3.6s and "pdftocairo -jpeg 2-DESCR_648853-IT-EN-DE_MANITOU_RIGENERATO.pdf x" takes 4.6s. This document has 68 pages, so that is about 15 pages per second. On page 11 and some other pages, valgrind with pdftocairo complains jpeg_write_scanlines (in /usr/lib64/libjpeg.so.62.1.0) JpegWriter::writeRow(unsigned char**) (JpegWriter.cc:166) writePageImage(GooString*) (pdftocairo.cc:418) main (pdftocairo.cc:660) In pdftocairo, height = 1755, stride = 4960, width = 1240, transp = 0, tiff = 0, gray = 0, mono = 0, so it is in the case for RGB. pdftoppm -jpeg is clean, so it might be an interaction between cairo and libjpeg. pdftops 2-DESCR_648853-IT-EN-DE_MANITOU_RIGENERATO.pdf xxx % cumulative self self total time seconds seconds calls ms/call ms/call name 14.97 0.28 0.28 2968599 0.00 0.00 GooString::formatDouble(double, char*, int, int, bool, char**, int*) 12.83 0.52 0.24 2057501 0.00 0.00 GooString::appendfv(char const*, __va_list_tag*) 7.22 0.66 0.14 7576688 0.00 0.00 FlateStream::readSome() 7.22 0.79 0.14 37198267 0.00 0.00 FlateStream::getChar() 5.35 0.89 0.10 35633758 0.00 0.00 JBIG2Segment::~JBIG2Segment() 5.35 0.99 0.10 6171541 0.00 0.00 Lexer::getObj(Object*, int) 4.28 1.07 0.08 5535785 0.00 0.00 Parser::getObj(Object*, bool, unsigned char*, CryptAlgorithm, int, int, int, int, bool) 3.74 1.14 0.07 29067578 0.00 0.00 Lexer::lookChar() 3.74 1.21 0.07 11505008 0.00 0.00 FlateStream::getHuffmanCodeWord(FlateHuffmanTab*) 3.48 1.28 0.07 12915 0.01 0.01 FlateStream::compHuffmanCodes(int*, int, FlateHuffmanTab*) 2.94 1.33 0.06 4415831 0.00 0.00 FlateStream::getCodeWord(int) 2.67 1.38 0.05 14501741 0.00 0.00 FileStream::getChar() 2.67 1.43 0.05 146 0.34 11.52 Gfx::go(bool) 2.14 1.47 0.04 12755583 0.00 0.00 Object::free() 2.14 1.51 0.04 6114655 0.00 0.00 Parser::shift(int) 1.60 1.54 0.03 1816652 0.00 0.00 Gfx::execOp(Object*, Object*, int) 1.60 1.57 0.03 218 0.14 0.14 Lexer::getChar(bool) pdftocairo -jpeg 2-DESCR_648853-IT-EN-DE_MANITOU_RIGENERATO.pdf xxx 32.43 0.36 0.36 writePageImage(GooString*) 9.01 0.46 0.10 3027722 0.00 0.00 Lexer::getObj(Object*, int) 8.11 0.55 0.09 5304419 0.00 0.00 FlateStream::getHuffmanCodeWord(FlateHuffmanTab*) 5.41 0.61 0.06 3436913 0.00 0.00 FlateStream::readSome() 5.41 0.67 0.06 2762912 0.00 0.00 Parser::getObj(Object*, bool, unsigned char*, CryptAlgorithm, int, int, int, int, bool) 2.70 0.70 0.03 18211014 0.00 0.00 FlateStream::getChar() 2.70 0.73 0.03 17502359 0.00 0.00 GlobalParams::getSecurityHandler(char*) 2.70 0.76 0.03 6246744 0.00 0.00 Object::free() 2.70 0.79 0.03 18126 0.00 0.00 GfxDeviceCMYKColorSpace::getRGBLine(unsigned char*, unsigned int*, int) 2.70 0.82 0.03 5172 0.01 0.01 FlateStream::compHuffmanCodes(int*, int, FlateHuffmanTab*) 1.80 0.84 0.02 14309886 0.00 0.00 Lexer::lookChar() pdftoppm -jpeg 2-DESCR_648853-IT-EN-DE_MANITOU_RIGENERATO.pdf xxx 9.35 0.29 0.29 12022517 0.00 0.00 Splash::pipeRunAARGB8(SplashPipe*) 9.35 0.58 0.29 275 1.05 1.61 Splash::scaleImageYuXuBilinear(bool (*)(void*, unsigned char*, unsigned char*), void*, SplashColorMode, int, bool, int, int, int, int, SplashBitmap*) 7.42 0.81 0.23 68 3.38 3.38 Splash::compositeBackground(unsigned char*) 6.77 1.02 0.21 260902 0.00 0.00 SplashXPathScanner::clipAALine(SplashBitmap*, int*, int*, int) 4.19 1.15 0.13 8111163 0.00 0.00 void std::make_heap<SplashXPathSeg*, cmpXPathSegsFunctor>(SplashXPathSeg*, SplashXPathSeg*, cmpXPathSegsFunctor) 4.19 1.28 0.13 209068 0.00 0.00 SplashXPathSeg* std::__unguarded_partition<SplashXPathSeg*, SplashXPathSeg, cmpXPathSegsFunctor>(SplashXPathSeg*, SplashXPathSeg*, SplashXPathSeg const&, cmpXPathSegsFunctor) 4.19 1.41 0.13 154405 0.00 0.01 Splash::fillWithPattern(SplashPath*, bool, SplashPattern*, double) 3.23 1.51 0.10 582666 0.00 0.00 SplashXPathScanner::renderAALine(SplashBitmap*, int*, int*, int, bool) 3.23 1.61 0.10 126376 0.00 0.00 SplashXPathScanner::computeIntersections() 2.90 1.70 0.09 14488586 0.00 0.00 Splash::pipeRunSimpleRGB8(SplashPipe*) 2.90 1.79 0.09 229187 0.00 0.00 SplashClip::clipAALine(SplashBitmap*, int*, int*, int, bool) 2.26 1.86 0.07 18211014 0.00 0.00 FlateStream::getChar() 2.26 1.93 0.07 6248717 0.00 0.00 Object::free() 2.26 2.00 0.07 20483 0.00 0.00 expandRow(unsigned char*, unsigned char*, int, int, int) 2.26 2.07 0.07 3028636 0.00 0.00 Lexer::getObj(Object*, int)
The PDF is at https://dl.dropboxusercontent.com/u/9451086/2-DESCR_648853%20IT-EN-DE_MANITOU_RIGENERATO.pdf
These three changes are unrelated and should be split into three separate commits. My only concern with the alloca change is ensuring the stack usage is not too large. > argsSize = 8; > +#if HAVE_ALLOCA_H > + args = (GooStringFormatArg *)alloca(argsSize * sizeof(GooStringFormatArg)); > +#else This is a fixed size allocation that I estimate to be up to 544 bytes so it should be fine. > argsSize *= 2; > +#if HAVE_ALLOCA_H > + { > + GooStringFormatArg *new_args = (GooStringFormatArg > *)alloca(argsSize * sizeof(GooStringFormatArg)); > + memcpy(new_args, args, argsLen); > + args = new_args; > + } > +#else It looks like the allocation is increased if there are more than 8 format args. I have not searched the code to see if there are more than 8 format args used. I'm guessing this is unlikely so maybe when there are more than 8 args we could fallback to using gmallocn/greallocn. If > 8 args is rare there will be little performance impact but it does put an upper bound on the stack usage. > +#if HAVE_ALLOCA_H > +#else > gfree(args); > +#endif #ifndef HAVE_ALLOCA_H gfree(args); #endif would be better. > +#if 0 > int FlateStream::getChar() { > if (pred) { > return pred->getChar(); > } > return doGetRawChar(); > } > +#endif I don't see any need for keeping this. It will always be in the git history. I had trouble understanding what the timings you provided are referring to. Could you provide separate times for master, master + alloca change, and master + alloc + inline change.
Thanks for looking at it. >These three changes are unrelated and should be split into three separate commits. Later today, I will make a patch with just the FoFiTrueType patch here and open separate reports for the alloca change and the Flate change. >#ifndef HAVE_ALLOCA_H I did that intentionally for symmetry with the allocation to make it clear that the other case was not forgotten. I do that in my own code, but I will make the new patch match the style in poppler. >I had trouble understanding what the timings The timings are after the change. The alloca change made appendfv() about 5% faster. The FlateStream change had no effect. Some of the other streams inline their getChar() also, but I suspect that all of the stream getChar() functions are called from places that gcc can not tell the Stream type at compile time. I ran several pdftoxxx utilities because the person who reported the performance issues did not say what output device he was using. On my laptop, pdftocairo is significantly faster than pdftoppm because cairo uses assembly code that takes advantage of vector instructions while splash is pure C++. The person who reported the issue had an ARM CPU, and the coding of the output method and the display hardware could make a big difference. For example, we have a RaspberryPi running Fedora, and moving windows on the console is so slow that you can see the pixels paint, like the first time that I ran X on a 20MHz 386 nearly 30 years ago.
I've pushed the off by one fix to FoFiTrueType, thanks for catching! Please open different bugs for the rest.
Thanks. The gmallocn() GooString::appendfv() patch is in https://bugs.freedesktop.org/show_bug.cgi?id=89096 The Stream inline patch is in https://bugs.freedesktop.org/show_bug.cgi?id=89097
This is great. I Understand when we face problems in playing Xbox live. here https://freexboxlive.xyz/ you can play Xbox and become a free member of Xbox gold.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.