Bug 103823

Summary: Add option to omit DOCTYPE for XML output
Product: poppler
Reporter: Gerrit Imsieke <gerrit.imsieke>
Component: pdftohtml
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED
QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   

Description Gerrit Imsieke 2017-11-20 08:53:26 UTC
Currently (I’m using 0.57.0 on Cygwin), pdftohtml with the -xml option always writes a DOCTYPE declaration with no public identifier and the system identifier 'pdf2xml.dtd'.
The problem is that I can’t use an XML catalog to redirect pdf2xml.dtd to a (possibly empty) DTD. This is a known limitation of relative paths used as system identifiers, described at http://www.sagehill.net/docbookxsl/WriteCatalog.html#RelativeSysId
As a consequence, I need to remove the DOCTYPE line manually or with xmllint in order to be able to process the output with Java-based tools such as oXygen, Saxon, or XML Calabash. 
Please add either a public identifier (any non-empty string will do) that can be diverted by means of an XML catalog, or add a command-line option not to issue a DOCTYPE declaration at all.
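For anyone who needs the manual workaround mentioned above until one of these options exists, the following is a minimal sketch of stripping the declaration in Python. It is not part of poppler. The function name `strip_doctype` is hypothetical, and the sketch assumes the declaration has the simple form `<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">` with no internal subset; a DOCTYPE that contains `[` ... `]` is deliberately left untouched.

```python
import re

# Matches a DOCTYPE declaration that has no internal subset, plus any
# whitespace that follows it. Assumed input form (not taken from the report):
#   <!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s[^\[>]*>\s*')


def strip_doctype(xml_text: str) -> str:
    """Return xml_text with the first simple DOCTYPE declaration removed.

    Declarations with an internal subset ('[' ... ']') do not match the
    pattern, so such input is returned unchanged.
    """
    return DOCTYPE_RE.sub("", xml_text, count=1)
```

The result can then be passed to a parser that would otherwise try to fetch pdf2xml.dtd relative to the document location. A text-level substitution is used here rather than a parse-and-reserialize step so that the rest of the output is left byte-for-byte as pdftohtml wrote it.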
Comment 1 GitLab Migration User 2018-08-21 11:11:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate in the new bug on our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/566.
