Bug 46011

Summary: pdftohtml to generate outlines in -xml mode
Product: poppler    Reporter: Igor Slepchin <igor.redhat>
Component: pdftohtml    Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Attachments: generate outlines in -xml mode
consistently check if outlines need to be generated

Description Igor Slepchin 2012-02-13 14:00:00 UTC
pdftohtml currently writes document outlines in HTML mode but not in XML mode. The following patch adds outline generation to -xml mode. If the document has outlines, the generated .xml ends with something like the following (indentation added for readability):

<...>
</page>
<outline>
  <item page="5">Contents</item>
  <item page="9">Figures</item>
  <item page="13">Tables</item>
  <item page="23">Preface</item>
  <item page="25">1 Introduction</item>
  <outline>
    <item page="25">1.1 About This Book</item>
    <item page="28">1.2 Introduction to PDF 1.7 Features</item>
    <outline>
      <item page="28">1.2.1 Presentation of 3D Artwork</item>
      <item page="29">1.2.2 Interactive Features</item>
      <outline>
        <item page="29">Interactive Features That Aid Technical Communication</item>
        <item page="29">Interactive Feature for Use in a Legal Setting</item>
      </outline>
      <item page="29">1.2.3 Accessibility Related Features</item>
      <item page="30">1.2.4 Document Navigation Feature</item>
      <item page="30">1.2.5 Security-Related Features</item>
      <item page="31">1.2.6 General Features</item>
      <item page="31">1.2.7 PDF Reference Changes</item>
<...>
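To consume this output, a script only has to walk the nesting. Below is a minimal Python sketch (not part of pdftohtml) that turns an <outline> element into nested tuples. It assumes, as the sample suggests, that each nested <outline> immediately follows the <item> whose sub-entries it holds. The name parse_outline and the (title, page, children) layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical layout: (title, page, children); page is None if the attribute is missing.
Node = Tuple[str, Optional[int], list]


def parse_outline(outline: ET.Element) -> List[Node]:
    """Convert an <outline> element into nested (title, page, children) tuples.

    Assumption inferred from the sample output: a nested <outline> directly
    follows the <item> whose sub-entries it contains.
    """
    nodes: List[Node] = []
    for child in outline:
        if child.tag == "item":
            page = child.get("page")
            nodes.append((child.text or "", int(page) if page is not None else None, []))
        elif child.tag == "outline" and nodes:
            # Attach the nested level to the most recently seen item.
            nodes[-1][2].extend(parse_outline(child))
    return nodes


if __name__ == "__main__":
    sample = ET.fromstring(
        "<outline>"
        '<item page="23">Preface</item>'
        '<item page="25">1 Introduction</item>'
        "<outline>"
        '<item page="25">1.1 About This Book</item>'
        "</outline>"
        "</outline>"
    )
    print(parse_outline(sample))
```

For a complete -xml file, the top-level outline could be located with ET.parse(path).getroot().find("outline"), which returns None when there is no outline. This assumes the outline is a direct child of the document root, as the sample indicates.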
Comment 1 Igor Slepchin 2012-02-13 14:01:35 UTC
Created attachment 56993
generate outlines in -xml mode
Comment 2 Igor Slepchin 2012-02-13 14:05:14 UTC
Created attachment 56994
consistently check if outlines need to be generated

This should probably be applied to the 0.18 branch as well; without it, we generate dangling outline references for PDFs whose catalog has an Outlines entry but no OutlineItems.
Comment 3 Igor Slepchin 2012-02-13 18:11:39 UTC
(In reply to comment #2)
> This should probably be applied to the 0.18 branch as well;

Just a clarification: this comment applies only to the second attached patch (which can be applied cleanly, independently of the first one).
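The dangling-reference problem described in comment 2 comes from deciding in two places, with two different tests, whether outlines exist. The Python sketch below models only that idea. Catalog, render_buggy, render_fixed and the HTML snippets are hypothetical and do not reflect poppler's actual code or pdftohtml's real HTML output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Catalog:
    """Hypothetical stand-in for a PDF catalog (not poppler's API)."""
    has_outlines_entry: bool                      # catalog has an Outlines key
    outline_items: List[str] = field(default_factory=list)


def should_generate_outlines(cat: Catalog) -> bool:
    # An Outlines entry alone is not enough: it may contain no items.
    return cat.has_outlines_entry and len(cat.outline_items) > 0


def render_buggy(cat: Catalog) -> str:
    out = []
    if cat.has_outlines_entry:                    # check 1: entry exists
        out.append('<a href="#outline">Outline</a>')
    if cat.outline_items:                         # check 2: items exist
        out.append('<a name="outline"></a>')
        out.extend(f"<li>{t}</li>" for t in cat.outline_items)
    return "\n".join(out)


def render_fixed(cat: Catalog) -> str:
    gen = should_generate_outlines(cat)           # one predicate for both places
    out = []
    if gen:
        out.append('<a href="#outline">Outline</a>')
        out.append('<a name="outline"></a>')
        out.extend(f"<li>{t}</li>" for t in cat.outline_items)
    return "\n".join(out)


if __name__ == "__main__":
    empty = Catalog(has_outlines_entry=True)      # Outlines entry, no items
    print(repr(render_buggy(empty)))              # link with no target
    print(repr(render_fixed(empty)))              # nothing emitted
```

With an Outlines entry but no items, the buggy version emits a link whose target is never written; routing both decisions through one predicate keeps the reference and the outline in sync.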
Comment 4 Albert Astals Cid 2012-02-19 14:49:39 UTC
Patch looks good, but I think <outline> should be reserved for the top-level one; inner ones should use something like <children> or some other name.
Comment 5 Igor Slepchin 2012-02-23 12:02:42 UTC
(In reply to comment #4)
> Patch looks good, but I think <outline> should be reserved for the top-level
> one; inner ones should use something like <children> or some other name.

I actually prefer using the same element name for the top level and the nested levels, since they have exactly the same structure. If you really prefer a separate top-level element, how about wrapping the XML outline output in another element, say "outlines" (plural) or "toc" (for "table of contents")?
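The argument for reusing one element name is that every level has the same shape. A recursive serializer makes this concrete: one function emits both the top level and every nested level. This is a hypothetical Python sketch that produces the format shown in the description, not pdftohtml's C++ implementation; the name outline_element and the (title, page, children) layout are assumptions. A wrapper such as "outlines" or "toc" would be a single extra element around the result.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical node layout: (title, page, children).
Node = Tuple[str, int, list]


def outline_element(nodes: List[Node]) -> ET.Element:
    """Build an <outline> element; nested levels reuse the same function."""
    outline = ET.Element("outline")
    for title, page, children in nodes:
        item = ET.SubElement(outline, "item", page=str(page))
        item.text = title
        if children:
            # A child level has the same shape as the top level, so the
            # same element name and the same code produce it, placed
            # right after the item it belongs to.
            outline.append(outline_element(children))
    return outline


if __name__ == "__main__":
    toc = [("1 Introduction", 25, [("1.1 About This Book", 25, [])])]
    print(ET.tostring(outline_element(toc), encoding="unicode"))
```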
Comment 6 Albert Astals Cid 2012-02-23 14:09:56 UTC
OK, no worries. I've committed the patch.
