The original bug is reported here for GNOME:
https://bugzilla.gnome.org/show_bug.cgi?id=738704

and here for SUSE:
https://bugzilla.opensuse.org/show_bug.cgi?id=898323

The crash is triggered by a PDF that appears to contain no text, just an image; see the GNOME bug for a link to the file.

I can confirm this bug, but as far as I can see it's not a Tracker bug. We call:

  text = poppler_page_get_text (page);

and that API call takes an age to return and runs us out of memory.
I meant to say, this is using version 0.26.5.
The PDF is drawing the dots in the chart with the Unicode character U+22C5 DOT OPERATOR. If you have enough memory and patience the file will be successfully processed. On my machine it takes 202 seconds and has a peak memory usage of 2.7 GB. The output file contains over 100,000 U+22C5 characters.

I recall a discussion a few years ago about improving the efficiency of the text extraction:

http://lists.freedesktop.org/archives/poppler/2010-November/006646.html

I'm not sure what happened to those patches.
(In reply to Adrian Johnson from comment #2)
> The PDF is drawing the dots in the chart with the unicode character U+22C5
> DOT OPERATOR. If you have enough memory and patience the file will be
> successfully processed. On my machine it takes 202 seconds and has peak
> memory usage of 2.7GB. The output file contains over 100,000 U+22C5
> characters.

Yeah, but for a 2 MB file that's still rather a lot of memory to draw 100k characters. The speed is also the reason Tracker will SIGABRT on this file: that's way too long to spend extracting some text from a PDF (arguably, there is none anyway :)

Is there another API we could use that is more efficient, or a way to detect whether there is any content to extract in the first place, so that we can avoid this problem?

> I recall a discussion a few years ago about improving the efficiency of the
> text extraction:
>
> http://lists.freedesktop.org/archives/poppler/2010-November/006646.html
>
> I'm not sure what happened to those patches.

This is clearly a problem that extends beyond Tracker if Evince is using poppler too.
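
Until extraction itself gets cheaper, one caller-side mitigation is to run it in a child process with a hard time and memory budget, so a pathological file is skipped instead of taking the extractor down. The following is only a minimal sketch of that idea, not something poppler or Tracker provides: `run_with_budget` and its parameters are hypothetical names, it relies on POSIX `setrlimit`, and it does not answer the question of whether a more efficient poppler API exists.

```python
import resource
import subprocess


def run_with_budget(cmd, timeout_s, max_bytes):
    """Run cmd in a child process under a wall-clock and memory cap.

    Hypothetical helper: returns the child's stdout as text, or None if
    the child timed out, hit the memory cap, or exited with an error.
    """

    def limit_memory():
        # Runs in the child just before exec: cap its virtual address space.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed once this expires
            preexec_fn=limit_memory,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    if done.returncode != 0:
        return None
    return done.stdout.decode("utf-8", errors="replace")
```

With poppler-utils installed, a call such as `run_with_budget(["pdftotext", "chart.pdf", "-"], timeout_s=10, max_bytes=1 << 30)` (where `chart.pdf` is a placeholder file name) would return the extracted text, or `None` for a file like the one in this report that needs minutes and gigabytes to process. The budget values are arbitrary and would need tuning per caller.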
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/72.