poppler allocates and frees a lot of objects, especially GooString objects. In
my profiling malloc() and free() are at the top of functions that take most of
I instrumented the code to record sizes of GooStrings at the time of
destruction. It turns out that ~90% of them are 16 bytes or less.
Currently allocating GooString() requires 2 malloc()s:
* one from new to allocate the object itself
* another one to allocate s pointer to a string
A very simple optimization is to keep a small buffer within GooString itself and
only allocate new space if the string gets bigger. This halfs the number of
malloc() calls for string that entirely fit in the buffer. In case of poppler, a
buffer of size 16 would eliminate 45% malloc()/free() calls that are caused by
Another good part is that it doesn't even use more memory. malloc() rounds
allocation sizes anyway (by 16 bytes on Ubuntu, try printf("allocation rounding:
%d\n", -(int)((char*)malloc(1) - (char*)malloc(1)));) and the OS needs
book-keeping information (BOOK_KEEP_SIZE) which is implementation dependent but
usually not less than 8 bytes. Assuming those parameters, memory used by current
implementation of GooString is:
* BOOK_KEEP_SIZE + round_16(sizeof(GooString)) + round_16($str_size) +
BOOK_KEEP_SIZE = 2*BOOK_KEEP_SIZE + round_16(8) + round_16($str_size) =
2*8+16+round_16($str_size) = 32+round_16($str_size)
In implementation with static buffer, assuming buffer size 16:
* for $str_size <= 16: BOOK_KEEP_SIZE + round_16(sizeof(GooString) +
round_16($str_size) = 8 + round_16(24) = 8 + 32 = 40 so it's actually less real
For $str_size > 16 a bit more is used.
How to choose size of static space: currently I use 16 but it might be better to
use 24 (so that sizeof(GooString) is 32 because it'll probably be anyway due to
rounding to 16 bytes).
Does it give any real speedup? My test consisted of just loading (no rendering)
of PDFReference16.pdf file (PDF 1.6 spec available from Adobe website) which is
about 8.72 in size.
I used a release build and recorded user time averaged over 4 runs. I got a 10%
speedup (from 1453.095 milliseconds to 1303.7425 milliseconds).
On top of that, the implementation is dead simple.
Attached patch has a #define FAST_GOO_STRING in GooString.h so that interested
parties can easily compile both versions and compare the speeds.
It also has #define DO_HIST that adds code to collect string sizes at the time
of deletion. This shows that majority of strings in poppler is <=16 bytes.
Howerver, I do not recommend keeping those #defines in final version. It's
trivial to remove them.
Created attachment 6496 [details] [review]
Patch for the faster string implementation
Created attachment 6540 [details] [review]
Created attachment 6541 [details] [review]
Lexer and UGooString optimization
I've created two new patches that, when combined, provide ~19% speed improvement
when loading PDFReference16.pdf document (PDF reference from Adobe website).
It cleans up previous patch and adds additional improvements.
Brief overview of changes:
* make GooString use internal buffer for short strings; re-factor GooString to
remove code duplication
* gfree() doesn't have to check for NULL pointer (C library does it anyway, it's
in the C ISO standard). gfree() is called so often that removing that check
improves the speed by 1%
* make UGooString use internal buffer as well; refactor the code to make it more
* Parser::getObj(): make 'key' to be UGooString to avoid creating temporary
objects since dictAdd() uses UGooString as the argument
* Lexer::lookChar() and Lexer::getChar() - getChar() is often called right after
lookChar() (for about 30% of all getChar()s). Currently it has to re-do all the
work that lookChar() did. A very simple optimization is to cache the last value
of lookChar() and return it in getChar() if available.
* PageLabelInfo.cc: #include <config.h> since it's needed for compilation on Windows
Most of those changes reduce the number of malloc()/free() calls. There are
still ways to go.
Created attachment 6798 [details] [review]
Additional fix for previous Lexer caching patch
This patch fixes a bug introduces in patch 6541 (Lexer and UGooString
optimization) by taking into account the fact that we might have one value
cached when creating a substream from current stream.
They should be applied together.
Krzysztof, although patch aren't commented yet, I think they are just great,
thanks for your work, it should certainly go into poppler.
commited to the head branch. Thanks for the work.
I found at least 3 files that now fail to parse correctly because of the lookChar() caching patch.
I can see poppler failing in the Gates_direct.pdf but not on the other 2, what's the problem there?
Created attachment 8562 [details] [review]
Tentative patch to fix Gates_direct.pdf problems
Does this patch fix the problems you are having?