(Sorry if I did the patch file wrong or anything like that... this is my first (and hopefully not the last :-D) contribution to a FOSS project other than a typo correction, so... yeah. :-) ) Anyways, so I made some minor improvements to TextPage:findText in poppler/TextOutputDev.cc. Hopefully I didn't make any mis-optimizations out of misunderstanding bits of the codebase. :-) Changes: - If a block is past what we've already found, then break. - If a line is past the end of the searching position, break if searching forwards - If a line is past what we've already found, break if searching forwards, otherwise just continue - When searching within a line, instead of searching according to backward, search via searchBackwards which takes line rotation into account, thus allowing us to break once the first match in a line is found. I've also got an implementation of KMP ( http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm ) with caching of failure functions since most programs do find-as-you-type, but I need to make sure the (minor) extra complexity incurred doesn't outweigh the O() improvement. Probably will attach that in a couple days. I hope it's of some use. (Changes made against the 0.5.4 release.) Hmm... I take it the attach file thing is after commiting the report? (I'm not that familiar with Bugzilla yet.)
Created attachment 7577 [details] [review] patch for TextOutputDev.cc (from 0.5.4 release)
Oops. enhancement means "Request for enhancement" which isn't quite right. Sorry. :-) I'll just set them to the default settings I guess.
Thanks for the patch David. I'm going to ask you two things: * please repost the patch with diff -u, that makes patches more readable, at least imho. * Send a mail to poppler mailing list asking if someone can review it (give a link to here) as my knowledge of that part of the code is not as big as i would want.
Created attachment 7638 [details] [review] Better diff format Okay, redid with diff -u... will post to mailing list as soon as I figure out how that is done. ;-)
Go to http://poppler.freedesktop.org and take the poppler mailing list link
Does this still apply?
Oh, hey. This thing. I would imagine it doesn't, given that it's been 3-4 years. There might also been some problems with the patch, or perhaps it was the KMP I never posted? I don't remember. I'll take a look at it sometime and see what can be done, or if even it's worth touching. I don't think this younger me ever checked things in a profiler.
I'd also be interested in a KMP version. Maybe I'll give it a shot. David, could you dig up your KMP patch and post it (no matter how bad it is)?
Review of attachment 7638 [details] [review]: This patch is about making "backwards" mean backwards when the page is upside down (rotation=2). I'm uncertain if the semantics have been discussed enough. Since it is not about performance, it should be in another bug.
Created attachment 36255 [details] [review] Implementation of KMP (Knuth-Morris-Pratt search algorithm) I implemented the Knuth-Morris-Pratt algorithm. As far as I see it works correctly. I tested with the glib-demo tool. But I didn't test backward search, although I implemented it (simply by reversing array access indices). I was able to see about twice the speed in some corner cases, but usually it has no gain. However, in no case I saw a degradation. Please review and test! It should also be considered that we rebuild the search string (Unicode conversion) and the next-array for KMP on every call of TextPage::findText. Since it is normally called through all pages, one could save some calculation there. I also looked at Boyer-Moore, but it seems stupid to make a lookup table that covers the whole Unicode space, and implementing it over a list or hashmap pretty much takes away the benefit of BM.
(In reply to comment #10) > It should also be considered that we rebuild the search string (Unicode > conversion) and the next-array for KMP on every call of TextPage::findText. > Since it is normally called through all pages, one could save some calculation > there. I just realized that there is no easy fix for this since findText is a operation based on pages, not documents. All in all, the dealing with blocks and lines is quite clumsy, but I guess there is no other way.
Created attachment 36291 [details] [review] starting from HEAD, only reformats whitespaces whitespaces fixes first in one patch
Created attachment 36292 [details] [review] Implementation of KMP (Knuth-Morris-Pratt search algorithm) Patch without whitespace (apply other patch first)
You didn't seemed really convinced in the mailing list this should go in, should i spend my time reviewing it or not?
(In reply to comment #14) > You didn't seemed really convinced in the mailing list this should go in, > should i spend my time reviewing it or not? Please do.
Do I have to apply both patches? There seems to be already unrelated changes in the combination of both patches. Could you provide a single patch with only the kmp implementation, please?
You can either apply the one from 2010-06-13 23:45 PDT, or the two later ones. They are the same, but I was asked to split them since I cleaned up the whitespace mess (which is all that the first patch does). The patches from David are unrelated.
Hi Johannes, sorry for the delay. There seems to be still urelated changes in your second patch. Could you please send a single patch with only changes required to implement KMP search? That code is inherited from xpdf code, we avoid to fix indentation or whitespaces issues because it would make future xpdf merges harder for us.
Review of attachment 36292 [details] [review]: ::: poppler/TextOutputDev.cc @@ +3386,3 @@ + j = 0; + kmpNext[0] = -1; + for (int i = 0; i < m; i++) { var i was initialized to 2 before and you are redefining it here.
So what's the status on this?
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/223.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.