The new -bbox-layout option introduced in 911d9fc8d85b776418039b4eebb37200a0987554 adds extra bounding box information. However, it only outputs the content of the first page; all other pages come out empty. The -bbox option still works as intended. From browsing the code, my guess is that this comes from textOut->takeText() (line 528 of utils/pdftotext.cc), which only returns the TextPage content on its first invocation.
I can confirm the same problem here with version 0.40.
Created attachment 123792: Get rid of ActualText class, move its functionality to TextPage
The reason is the broken TextOutputDev::takeText(): it does not account for an extra reference to the page kept by the ActualText class. The easy fix would be not to use takeText(). The right one, in my opinion, is to remove the ActualText class altogether: its functionality is so tightly connected with TextPage that it makes no sense to keep them apart. The patch in the previous comment does this.
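To make the failure mode described above concrete, here is a minimal sketch of the stale-reference problem. It is not poppler code: Page, Writer, Device, takePage() and the rebindOnTake switch are hypothetical stand-ins for TextPage, ActualText, TextOutputDev and takeText(), and the real classes do far more. It only assumes that the helper keeps its own reference to the page it was created with.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TextPage: it just collects characters.
struct Page {
    std::string chars;
};

// Hypothetical stand-in for ActualText: holds its own reference to a page.
class Writer {
public:
    explicit Writer(std::shared_ptr<Page> page) : page_(std::move(page)) { assert(page_); }
    void addChar(char c) { page_->chars.push_back(c); }
    void rebind(std::shared_ptr<Page> page) { page_ = std::move(page); }

private:
    std::shared_ptr<Page> page_;
};

// Hypothetical stand-in for TextOutputDev.
class Device {
public:
    explicit Device(bool rebindOnTake)
        : page_(std::make_shared<Page>()), writer_(page_), rebindOnTake_(rebindOnTake) {}

    // All text goes through the helper, so it lands on whatever page the helper references.
    void drawText(const std::string &s) {
        for (char c : s) writer_.addChar(c);
    }

    // Hands the current page to the caller and starts a fresh one.
    // Without the rebind, the helper keeps writing to the page that was handed out.
    std::shared_ptr<Page> takePage() {
        std::shared_ptr<Page> taken = std::move(page_);
        page_ = std::make_shared<Page>();
        if (rebindOnTake_) writer_.rebind(page_);
        return taken;
    }

private:
    std::shared_ptr<Page> page_;
    Writer writer_;
    bool rebindOnTake_;
};

// Renders two "pages" and returns how many characters each taken page holds.
std::vector<std::size_t> extractedSizes(bool rebindOnTake) {
    Device dev(rebindOnTake);
    const std::vector<std::string> inputs{"page one", "page two"};
    std::vector<std::size_t> sizes;
    for (const std::string &text : inputs) {
        dev.drawText(text);
        sizes.push_back(dev.takePage()->chars.size());
    }
    return sizes;
}
```

With rebindOnTake set to false, the second taken page is empty because its text was appended to the first page, which matches the symptom reported here. Merging the helper into the page class, as the attached patch does for ActualText and TextPage, removes the second reference altogether instead of keeping it in sync.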
Is there any update on this? This is a rather frustrating bug, as we now have to run the program once for each page.
Vladimir, if you could provide a much less "intrusive" patch, it would be easier for me to integrate your fix. Your patch touches CairoOutputDev and I don't have much knowledge about it, so if possible I'd like your "simpler" option instead of your "in my opinion this is more correct" option.
For future Googlers who might find this bug and are looking for an up-to-date version of Vladimir's patch, I have a fork on GitHub with an updated version of that patch applied: https://github.com/LouisStAmour/poppler/commit/67de9fe25214c9d9134621502bb90e08db6c227a I won't say it's perfect or well-tested, but it gets the job done for me. :)
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/88.