Summary: | Huge spike in CPU and memory usage by tracker extractor due to rogue file | ||
---|---|---|---|
Product: | poppler | Reporter: | Martyn Russell <mr> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | badshah400, dominique-freedesktop.org, rishi.is |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Description
Martyn Russell
2014-10-19 11:43:02 UTC
Meant to say, this is using version 0.26.5.

The PDF is drawing the dots in the chart with the Unicode character U+22C5 DOT OPERATOR. If you have enough memory and patience, the file will be successfully processed. On my machine it takes 202 seconds and has a peak memory usage of 2.7 GB. The output file contains over 100,000 U+22C5 characters.

I recall a discussion a few years ago about improving the efficiency of the text extraction:

http://lists.freedesktop.org/archives/poppler/2010-November/006646.html

I'm not sure what happened to those patches.

(In reply to Adrian Johnson from comment #2)
> The PDF is drawing the dots in the chart with the unicode character U+22C5
> DOT OPERATOR. If you have enough memory and patience the file will be
> successfully processed. On my machine it takes 202 seconds and has peak
> memory usage of 2.7GB. The output file contains over 100,000 U+22C5
> characters.

Yeah, still, for a 2 MB file that's rather a lot of memory to draw 100k characters. The time taken is also the reason Tracker will SIGABRT on this file; that's far too long to spend extracting text from a PDF, and arguably there is no text to extract anyway :)

Is there another, more efficient API we could use, or a way to detect whether there is any content to extract in the first place, so we can avoid this problem?

> I recall a discussion a few years ago about improving the efficiency of the
> text extraction:
>
> http://lists.freedesktop.org/archives/poppler/2010-November/006646.html
>
> I'm not sure what happened to those patches.

This is clearly a problem that extends beyond Tracker if Evince uses poppler too.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug at this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/72.