Bug 20652 - PDF reflow
Summary: PDF reflow
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All, OS: All
Importance: medium enhancement
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-13 20:55 UTC by D W
Modified: 2018-08-21 11:16 UTC
CC List: 4 users

See Also:


Attachments
Removes the unused variable vertOverlap (674 bytes, patch)
2009-06-05 13:59 UTC, Erik Hovland
Adds -reflow to pdftohtml. (9.99 KB, patch)
2009-06-05 14:07 UTC, Erik Hovland

Description D W 2009-03-13 20:55:27 UTC
I am using pdftohtml to reformat e-books in PDF format for easier reading on handheld devices.

The handheld Adobe Reader software supports a "reflow" mode in which the PDF text is adjusted to the screen size, just like normal HTML text. In this mode, paragraphs are preserved as paragraphs and lines wrap at the width of the screen, so it is not necessary to scroll the document or to have lines broken at their original positions. Reading documents in reflow mode is a huge improvement, especially on small screens, and it would be a very useful addition to pdftohtml.

I have been testing a patch that someone else wrote and posted at
http://lists.freedesktop.org/archives/poppler/2008-September/004126.html
It works pretty well. Although it's not perfect, perhaps it could be applied as a first step?
Comment 1 Albert Astals Cid 2009-03-21 08:22:53 UTC
The problem is that I asked the patch author some questions [1] and he never answered. If you think this is a useful feature, you might want to track him down and get him to answer my questions.

[1] http://lists.freedesktop.org/archives/poppler/2009-January/004346.html
Comment 2 Erik Hovland 2009-06-05 13:59:10 UTC
Created attachment 26473 [details] [review]
Removes the unused variable vertOverlap

The function coalesce in HtmlOutputDev.cc triggers a compiler warning because the variable vertOverlap is never used. This patch removes the variable. It is part of splitting up and fixing the patch that showed up in this email:
http://lists.freedesktop.org/archives/poppler/2008-September/004126.html

A second patch with the actual reflow work will follow.
Comment 3 Erik Hovland 2009-06-05 14:07:05 UTC
Created attachment 26474 [details] [review]
Adds -reflow to pdftohtml.

This patch is a rework of the patch that showed up in this email:
http://lists.freedesktop.org/archives/poppler/2008-September/004126.html

That patch was answered later by this email:
http://lists.freedesktop.org/archives/poppler/2009-January/004346.html

In that email some issues were brought up. This patch should address all of those issues including:
1. The vertOverlap compiler warning is inconsequential to the reflow feature, so that part of the patch was separated out and attached to this bug.
2. The noMerge assignment in pdftohtml.cc when -xml is used has been removed.
3. Unnecessary, crufty changes such as the bgcolor change and the comments on file extensions have been removed from the patch.
4. The <p> markup now has corresponding </p> closure markup.
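Item 4 concerns the paragraph markup. As a rough illustration only, a reflow writer could emit each merged paragraph as its own `<p>` element, so that every opening tag has its closing tag and the viewer wraps the text at its own width. The functions `escapeHtml` and `writeParagraphs` below are hypothetical; they are not taken from the patch or from HtmlOutputDev.cc.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: escape the characters that are special in HTML text.
std::string escapeHtml(const std::string &text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default:  out += c;       break;
        }
    }
    return out;
}

// Hypothetical: wrap each reflowed paragraph in <p>...</p>. Every <p> is
// closed before the next one opens, and no fixed line breaks are emitted,
// so the reader software decides where lines wrap.
std::string writeParagraphs(const std::vector<std::string> &paragraphs) {
    std::string html;
    for (const std::string &p : paragraphs) {
        html += "<p>";
        html += escapeHtml(p);
        html += "</p>\n";
    }
    return html;
}
```

Writing the closing tag in the same step as the opening one keeps the pairing trivially correct, which was the complaint about the original patch.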

There are two extra changes in this patch that were not in the original poster's patch:
1. The file HtmlFonts.cc has a change so that spaces are not written as &nbsp; when reflow is on.
2. The original patch did not insert a space if merging a line. Some additional logic was added to allow for that. I am especially interested in whether this logic looks valid or whether it is faulty.
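Item 2 asks whether the space-insertion logic is valid. Here is a minimal sketch of one way such logic could work, assuming lines arrive already extracted and in reading order: put a single space between the paragraph so far and the next line unless one side already has whitespace at the join. It also illustrates item 1, choosing a plain space over `&nbsp;` in reflow mode. `appendLine` and `spaceMarkup` are hypothetical names, not functions from the patch or from poppler.

```cpp
#include <string>

// Hypothetical: append one extracted line to the paragraph being built.
// A separating space is added only when the paragraph is non-empty and
// neither side of the join already supplies whitespace.
void appendLine(std::string &paragraph, const std::string &line) {
    if (line.empty()) return;
    bool needSpace = !paragraph.empty()
                     && paragraph.back() != ' '
                     && line.front() != ' ';
    if (needSpace) paragraph += ' ';
    paragraph += line;
}

// Hypothetical: markup used for a space character. In the fixed layout,
// &nbsp; prevents runs of spaces from collapsing and stops line breaks;
// in reflow mode a plain space lets the reader wrap the line there.
std::string spaceMarkup(bool reflow) {
    return reflow ? " " : "&nbsp;";
}
```

One limitation of this sketch: a word hyphenated across the original line break (for example `imple-` followed by `mentation`) comes out as `imple- mentation`. Handling dehyphenation would need an extra rule, which is deliberately left out here.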

It is hoped that both the original bug reporter and Alberto get a chance to try out and review this patch. Thanks.
Comment 4 Albert Astals Cid 2009-08-17 14:55:52 UTC
Albert not Alberto ;-)

Sorry for taking so long.

I still don't like the patch: it introduces behavioural changes even when the -reflow option is not used, which is not what people expect.

The removal of noMerge = gTrue; is still there, and the vertOverlap change definitely has an impact when not using -reflow.

What I want is a patch that adds the feature you want without touching the existing behaviour.
Comment 5 Albert Astals Cid 2009-08-17 14:56:32 UTC
Adding Erik to the CC

Erik, I added a comment; please read it in Bugzilla.
Comment 6 Erik Hovland 2009-08-17 15:11:00 UTC
I have no idea why I attached an 'o' to the end of your first name; a thousand apologies. Thanks for the update. I will make my best effort to get a better patch going.
Comment 7 Adrian Perez de Castro 2013-05-21 07:45:00 UTC
In my opinion it would be much better to use the Tagged-PDF structure
(see bug #64813) to provide better support to know which objects are
paragraphs, lists, figures, etc. The logic used in the attached patch
could still be used for PDF files which do not contain the Tagged-PDF
document structure, as a fallback.
Comment 8 GitLab Migration User 2018-08-21 11:16:20 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/596.

