83061 – pdftotext -htmlmeta should quote text content

Status:	RESOLVED MOVED

Alias:	None

Product:	poppler
Classification:	Unclassified
Component:	utils
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	poppler-bugs

Reported:	2014-08-25 12:35 UTC by Jean-Francois Dockes
Modified:	2018-08-21 11:03 UTC
CC List:	0 users

Attachments
Pdf document with a title property, and a text body, containing HTML special characters (24.59 KB, application/pdf) 2014-08-25 21:26 UTC, Jean-Francois Dockes

Description Jean-Francois Dockes 2014-08-25 12:35:04 UTC

Special HTML characters (<>&"') in the main text or in PDF metadata (e.g. the title) are not escaped in the HTML output, which can result in invalid HTML.

This is trivial to reproduce, but if you need a sample document, just ask.

Comment 1 Albert Astals Cid 2014-08-25 17:30:42 UTC

Having the file to reproduce will make it easier to fix, so yes, attach such a file.

Comment 2 Jean-Francois Dockes 2014-08-25 21:26:00 UTC

Created attachment 105255
Pdf document with a title property, and a text body, containing HTML special characters

The HTML special characters should be replaced with character entities in the HTML output (< should become &lt; etc.), but they are not. As a result, some pieces of text disappear when the page is displayed (e.g. <un tag>), or the malformed HTML leads to unpredictable behaviour.
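To illustrate the expected replacement, here is a minimal Python sketch. It is not poppler code (pdftotext is written in C++), and the name `escape_html` is hypothetical. It maps each of the five special characters to an entity in a single pass, so an `&` that is already part of the input is escaped once and never double-processed.

```python
# Illustration only: a hypothetical helper, not taken from poppler.
_HTML_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}
# str.translate works per character, so the entities it emits are not re-scanned.
_TABLE = str.maketrans(_HTML_ENTITIES)


def escape_html(text: str) -> str:
    """Return text with the five HTML special characters replaced by entities."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(escape_html("text with <un tag> & \"quotes\""))
```

With this kind of escaping, a string such as <un tag> stays visible in a browser instead of being parsed as an unknown element.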

Comment 3 Jean-Francois Dockes 2014-08-26 05:48:54 UTC

I forgot to mention: if someone is looking at this, the problem is also present in the "content" attribute of <meta> elements. Generally, no text should go from the PDF into the HTML document without being escaped.
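The attribute case can be sketched the same way. The Python snippet below is an illustration under assumptions: `meta_element` is a hypothetical helper, and the exact markup pdftotext emits may differ. It relies on the standard library's `html.escape` with `quote=True`, which also escapes double and single quotes so that a metadata value cannot terminate the attribute early.

```python
from html import escape


def meta_element(name: str, content: str) -> str:
    """Build a <meta> element with both attribute values escaped.

    Hypothetical helper for illustration; not pdftotext's actual output code.
    """
    return '<meta name="{}" content="{}">'.format(
        escape(name, quote=True),
        escape(content, quote=True),
    )


if __name__ == "__main__":
    # A title containing quotes and markup-like text, as in the attached sample.
    print(meta_element("title", 'A "quoted" <title> & more'))
```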

Comment 4 GitLab Migration User 2018-08-21 11:03:18 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/485.