I am using pdftohtml to reformat e-books in PDF format for easier reading on handheld devices. The handheld Adobe reader software supports a "reflow" mode in which the PDF text is adjusted to the screen size, just like normal HTML text. In this mode, paragraphs are preserved as paragraphs and the lines wrap at the width of the screen, so it is not necessary to scroll the document or have lines broken in their original spots in the document. Reading documents in reflow mode is a huge improvement, especially on small screens, and it would be a very useful addition to pdftohtml. I have been testing a patch someone else wrote and posted at http://lists.freedesktop.org/archives/poppler/2008-September/004126.html and it works pretty well, although it's not perfect. Perhaps it could be applied as a first step?
The problem is that I asked the patch author some questions [1] and he never answered. If you think this is a useful feature, you might want to track him down and get him to answer my questions. [1] http://lists.freedesktop.org/archives/poppler/2009-January/004346.html
Created attachment 26473 Removes the unused variable vertOverlap
The function coalesce in HtmlOutputDev.cc triggers a compiler warning because the variable vertOverlap is never used. This patch removes it. This patch is part of the separation and fixing of the patch that showed up in this email: http://lists.freedesktop.org/archives/poppler/2008-September/004126.html A second patch will follow with the actual work for reflow.
Created attachment 26474 Adds -reflow to pdftohtml.
This patch is a rework of the patch that showed up in this email: http://lists.freedesktop.org/archives/poppler/2008-September/004126.html That patch was answered later by this email: http://lists.freedesktop.org/archives/poppler/2009-January/004346.html
In that email some issues were brought up. This patch should address all of those issues, including:
1. The vertOverlap compiler warning is consequential to the reflow feature. That part of the patch was separated out and attached to this bug.
2. The noMerge assignment in pdftohtml.cc when -xml is used has been removed.
3. Crufty unnecessary changes like the bgcolor and the comments on file extensions have been removed from the patch.
4. The <p> markup now has corresponding </p> closure markup.
There are two extra changes in this patch that were not in the original poster's patch:
1. The file HtmlFonts.cc has a change to not emit &nbsp; for a space if reflow is on.
2. The original patch did not insert a space when merging a line. Some additional logic was added to allow for that. I am especially interested in whether this logic looks valid or whether it is faulty.
It is hoped that both the original bug reporter and Alberto get a chance to try out and review this patch. Thanks.
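For readers who want to see the merge behaviour described above in isolation, here is a minimal sketch. It is not the code from the attached patch and does not use any poppler types: the Line struct, its startsParagraph flag and the reflowToHtml function are all hypothetical names made up for illustration, and the paragraph boundaries are assumed to have been decided by some earlier pass. It only shows the two points from the list: every <p> gets a matching </p>, and a plain space is inserted where two lines are merged.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one line of extracted text (not a poppler type).
struct Line {
    std::string text;
    bool startsParagraph; // assumed to be set by an earlier layout pass
};

// Joins consecutive lines into <p>...</p> paragraphs.
// At each merge point a single plain space is inserted, unless the
// output already ends in a space.
std::string reflowToHtml(const std::vector<Line> &lines) {
    std::string out;
    bool open = false; // true while a <p> is waiting for its </p>
    for (const Line &line : lines) {
        if (line.text.empty()) {
            continue; // nothing to emit for an empty line
        }
        if (!open || line.startsParagraph) {
            if (open) {
                out += "</p>\n"; // close the previous paragraph
            }
            out += "<p>";
            open = true;
        } else if (out.back() != ' ') {
            out += ' '; // merging into the previous line: keep words apart
        }
        out += line.text;
    }
    if (open) {
        out += "</p>\n"; // close the last paragraph
    }
    return out;
}
```

Calling reflowToHtml with the lines "Reading in reflow" (paragraph start), "mode is easier." and "Next paragraph." (paragraph start) yields two paragraphs, with the first two lines joined by one space. A real implementation would also have to deal with words hyphenated across lines, which this sketch ignores.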
Albert, not Alberto ;-) Sorry for taking so long. I still don't like the patch: it introduces behavioural changes when not using the -reflow option, which is not what people expect. The noMerge = gTrue; removal is still there, and the vertOverlap change for sure has an impact when not using -reflow. What I want is a patch that adds the feature you want without touching the existing behaviour.
Adding Erik to the CC. Erik, I added a comment; please read it in Bugzilla.
I have no idea why I attached an 'o' to the end of your first name, a thousand apologies. Thanks for the update. I will make my best effort to get a better patch going.
In my opinion it would be much better to use the Tagged-PDF structure (see bug #64813) to identify which objects are paragraphs, lists, figures, etc. The logic used in the attached patch could still be used as a fallback for PDF files which do not contain the Tagged-PDF document structure.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/596.