poppler allocates and frees a lot of objects, especially GooString objects. In my profiling malloc() and free() are at the top of functions that take most of the time. I instrumented the code to record sizes of GooStrings at the time of destruction. It turns out that ~90% of them are 16 bytes or less. Currently allocating GooString() requires 2 malloc()s: * one from new to allocate the object itself * another one to allocate s pointer to a string A very simple optimization is to keep a small buffer within GooString itself and only allocate new space if the string gets bigger. This halfs the number of malloc() calls for string that entirely fit in the buffer. In case of poppler, a buffer of size 16 would eliminate 45% malloc()/free() calls that are caused by GooString(). Another good part is that it doesn't even use more memory. malloc() rounds allocation sizes anyway (by 16 bytes on Ubuntu, try printf("allocation rounding: %d\n", -(int)((char*)malloc(1) - (char*)malloc(1)));) and the OS needs book-keeping information (BOOK_KEEP_SIZE) which is implementation dependent but usually not less than 8 bytes. Assuming those parameters, memory used by current implementation of GooString is: * BOOK_KEEP_SIZE + round_16(sizeof(GooString)) + round_16($str_size) + BOOK_KEEP_SIZE = 2*BOOK_KEEP_SIZE + round_16(8) + round_16($str_size) = 2*8+16+round_16($str_size) = 32+round_16($str_size) In implementation with static buffer, assuming buffer size 16: * for $str_size <= 16: BOOK_KEEP_SIZE + round_16(sizeof(GooString) + round_16($str_size) = 8 + round_16(24) = 8 + 32 = 40 so it's actually less real memory used For $str_size > 16 a bit more is used. How to choose size of static space: currently I use 16 but it might be better to use 24 (so that sizeof(GooString) is 32 because it'll probably be anyway due to rounding to 16 bytes). Does it give any real speedup? My test consisted of just loading (no rendering) of PDFReference16.pdf file (PDF 1.6 spec available from Adobe website) which is about 8.72 in size. I used a release build and recorded user time averaged over 4 runs. I got a 10% speedup (from 1453.095 milliseconds to 1303.7425 milliseconds). On top of that, the implementation is dead simple. Attached patch has a #define FAST_GOO_STRING in GooString.h so that interested parties can easily compile both versions and compare the speeds. It also has #define DO_HIST that adds code to collect string sizes at the time of deletion. This shows that majority of strings in poppler is <=16 bytes. Howerver, I do not recommend keeping those #defines in final version. It's trivial to remove them.
Created attachment 6496 [details] [review] Patch for the faster string implementation
Created attachment 6540 [details] [review] GooString optimizations
Created attachment 6541 [details] [review] Lexer and UGooString optimization
I've created two new patches that, when combined, provide ~19% speed improvement when loading PDFReference16.pdf document (PDF reference from Adobe website). It cleans up previous patch and adds additional improvements. Brief overview of changes: * make GooString use internal buffer for short strings; re-factor GooString to remove code duplication * gfree() doesn't have to check for NULL pointer (C library does it anyway, it's in the C ISO standard). gfree() is called so often that removing that check improves the speed by 1% * make UGooString use internal buffer as well; refactor the code to make it more like GooString * Parser::getObj(): make 'key' to be UGooString to avoid creating temporary objects since dictAdd() uses UGooString as the argument * Lexer::lookChar() and Lexer::getChar() - getChar() is often called right after lookChar() (for about 30% of all getChar()s). Currently it has to re-do all the work that lookChar() did. A very simple optimization is to cache the last value of lookChar() and return it in getChar() if available. * PageLabelInfo.cc: #include <config.h> since it's needed for compilation on Windows Most of those changes reduce the number of malloc()/free() calls. There are still ways to go.
Created attachment 6798 [details] [review] Additional fix for previous Lexer caching patch This patch fixes a bug introduces in patch 6541 (Lexer and UGooString optimization) by taking into account the fact that we might have one value cached when creating a substream from current stream. They should be applied together.
Krzysztof, although patch aren't commented yet, I think they are just great, thanks for your work, it should certainly go into poppler.
commited to the head branch. Thanks for the work.
I found at least 3 files that now fail to parse correctly because of the lookChar() caching patch. http://www.apago.com/~dkelly/bugzilla_7808.zip
I can see poppler failing in the Gates_direct.pdf but not on the other 2, what's the problem there?
Created attachment 8562 [details] [review] Tentative patch to fix Gates_direct.pdf problems Does this patch fix the problems you are having?
fixed
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.