pdftohtml currently dumps document outlines in html mode but not in xml mode. The following patch adds generation of outlines in -xml mode. The generated .xml will have something like the following at the end if outlines were present (indentation added for readability):

<...>
</page>
<outline>
  <item page="5">Contents</item>
  <item page="9">Figures</item>
  <item page="13">Tables</item>
  <item page="23">Preface</item>
  <item page="25">1 Introduction</item>
  <outline>
    <item page="25">1.1 About This Book</item>
    <item page="28">1.2 Introduction to PDF 1.7 Features</item>
    <outline>
      <item page="28">1.2.1 Presentation of 3D Artwork</item>
      <item page="29">1.2.2 Interactive Features</item>
      <outline>
        <item page="29">Interactive Features That Aid Technical Communication</item>
        <item page="29">Interactive Feature for Use in a Legal Setting</item>
      </outline>
      <item page="29">1.2.3 Accessibility Related Features</item>
      <item page="30">1.2.4 Document Navigation Feature</item>
      <item page="30">1.2.5 Security-Related Features</item>
      <item page="31">1.2.6 General Features</item>
      <item page="31">1.2.7 PDF Reference Changes</item>
<...>
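For anyone consuming this output, here is a minimal sketch of walking the nested structure in Python. It assumes the layout shown in the sample, where a nested <outline> is a sibling that follows the <item> it belongs to rather than a child of it. SAMPLE is a small, closed fragment assembled from the items above for illustration, not real pdftohtml output, and walk_outline is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical closed fragment built from the sample items above;
# real pdftohtml -xml output would have this inside a larger document.
SAMPLE = """\
<outline>
  <item page="23">Preface</item>
  <item page="25">1 Introduction</item>
  <outline>
    <item page="25">1.1 About This Book</item>
    <item page="28">1.2 Introduction to PDF 1.7 Features</item>
    <outline>
      <item page="28">1.2.1 Presentation of 3D Artwork</item>
    </outline>
  </outline>
</outline>
"""


def walk_outline(outline, depth=0):
    """Yield (depth, page, title) for each <item>, recursing into nested <outline>s."""
    for child in outline:
        if child.tag == "item":
            yield depth, int(child.get("page")), (child.text or "").strip()
        elif child.tag == "outline":
            # A nested <outline> holds the children of the preceding <item>.
            yield from walk_outline(child, depth + 1)


entries = list(walk_outline(ET.fromstring(SAMPLE)))
for depth, page, title in entries:
    print("  " * depth + f"{title} (p. {page})")
```

With a full .xml file, one would first locate the top-level <outline> element (for example with find("outline") on the parsed root, assuming it sits directly under the root as the sample suggests) and pass that to walk_outline.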
Created attachment 56993: generate outlines in -xml mode
Created attachment 56994: consistently check if outlines need to be generated

This should probably be applied to the 0.18 branch as well; without it we will generate dangling outline references for PDFs that have an Outlines entry with no OutlineItems in their catalog.
(In reply to comment #2)
> This should probably be applied to 0.18 branch as well;

Just a clarification: this comment applies to the second attached patch only (it can be cleanly applied independently of the first one).
Patch looks good, but I think <outline> should be reserved for the top-level one only; inner ones should use something like <children> or some other name.
(In reply to comment #4)
> Patch looks good but think <outline> should be reserved only for the toplevel
> one, inner ones should use something like <children> or some other name.

I actually tend to like having the same name for both the top level and the other levels, since they do indeed have the exact same structure. If you really prefer having a separate top-level element, how about enclosing the XML outline output in another element, say, "outlines" (plural) or "toc" (for "table of contents")?
Ok, no worries. I've committed the patch.