Bug 107318 - Emit more font information when pdftohtml is run with -xml
Status: RESOLVED MOVED
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
Reported: 2018-07-21 03:49 UTC by ulatekh
Modified: 2018-08-21 11:17 UTC
0 users

Attachments
Patch to add functionality (7.68 KB, patch)
2018-07-21 03:49 UTC, ulatekh

Description ulatekh 2018-07-21 03:49:03 UTC
Created attachment 140750
Patch to add functionality

I'm about to use pdftohtml to extract information from PDFs and organize the results into a database, so I had a chance to dig through the code.

The patch only changes the output of pdftohtml -xml: it emits more information in the <fontspec> elements. The PDFs I'm trying to analyze appear to use fonts consistently enough that I can infer the text's meaning from the font. But I needed more information in the <fontspec> elements to do that, and this patch provides it.
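
For illustration, here is a minimal sketch of the kind of processing described above: pairing each <text> element with the <fontspec> it references, so the font can later be used to classify the text. The sample XML is hand-written in the shape of pdftohtml -xml output, not real output, and the helper name texts_with_fonts is hypothetical. It assumes only the attributes stock pdftohtml already emits on <fontspec> (id, size, family, color) and that ids are unique across the document. The extra attributes added by the patch are not listed in this report, so they are not shown, but any additional attributes would be carried along unchanged.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of `pdftohtml -xml` output (not real output).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="18" family="Times" color="#000000"/>
<fontspec id="1" size="11" family="Helvetica" color="#000000"/>
<text top="100" left="80" width="200" height="20" font="0"><b>Chapter 1</b></text>
<text top="130" left="80" width="400" height="12" font="1">Body text here.</text>
</page>
</pdf2xml>"""


def texts_with_fonts(xml_string):
    """Return a list of (text, fontspec attributes) pairs (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    # Index every <fontspec> by its id; extra attributes are carried along as-is.
    fonts = {spec.get("id"): dict(spec.attrib) for spec in root.iter("fontspec")}
    rows = []
    for text in root.iter("text"):
        # itertext() flattens inline markup such as <b> or <i> inside <text>.
        content = "".join(text.itertext())
        rows.append((content, fonts.get(text.get("font"), {})))
    return rows


for content, font in texts_with_fonts(SAMPLE):
    print(font.get("family"), font.get("size"), repr(content))
```

Each pair could then be written to a database row, with the font attributes serving as the key for guessing what role the text plays in the document.
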
Comment 1 GitLab Migration User 2018-08-21 11:17:46 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate in the new issue through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/605.
