Bug 103823 - Add option to omit DOCTYPE for XML output
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: pdftohtml
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2017-11-20 08:53 UTC by Gerrit Imsieke
Modified: 2018-08-21 11:11 UTC

Description Gerrit Imsieke 2017-11-20 08:53:26 UTC
Currently (I’m using 0.57.0 on Cygwin), the -xml option always emits a DOCTYPE declaration with no public identifier and the system identifier 'pdf2xml.dtd'.
Because the system identifier is a relative path, I can’t use an XML catalog to redirect pdf2xml.dtd to a (possibly empty) DTD. This limitation of relative paths as system identifiers is described at http://www.sagehill.net/docbookxsl/WriteCatalog.html#RelativeSysId
As a consequence, I have to remove the DOCTYPE line manually or with xmllint before I can process the output with Java-based tools such as oXygen, Saxon, or XML Calabash.
Please either add a public identifier (any non-empty string will do) that can be redirected by means of an XML catalog, or add a command-line option that suppresses the DOCTYPE declaration altogether.
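Until such an option exists, the manual removal step mentioned above could be scripted. The following is only a minimal sketch in Python, not part of poppler: the helper name strip_doctype and the sample document are hypothetical, and the sample assumes the declaration has the form <!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd"> with no internal subset.

```python
import re

# Matches one DOCTYPE declaration that has no internal subset, e.g.
#   <!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
# plus any whitespace that follows it. Declarations containing "[" or a
# literal ">" inside a quoted identifier are deliberately not handled.
_DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^>\[]*>\s*")


def strip_doctype(xml_text: str) -> str:
    """Return xml_text with its first simple DOCTYPE declaration removed."""
    return _DOCTYPE_RE.sub("", xml_text, count=1)


# Hypothetical sample; the exact shape of real pdftohtml output is an assumption.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">\n'
    "\n"
    '<pdf2xml producer="poppler" version="0.57.0">\n'
    "</pdf2xml>\n"
)
cleaned = strip_doctype(sample)
print(cleaned)
```

A regular expression is used here instead of an XML parser because a parser may try to resolve pdf2xml.dtd while reading the file, which is the very problem being worked around.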
Comment 1 GitLab Migration User 2018-08-21 11:11:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/566.

