Bug 49864 - Wrong font id used when first word of a line has certain style applied (xml)
Summary: Wrong font id used when first word of a line has certain style applied (xml)
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: pdftohtml
Version: unspecified
Hardware: All other
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2012-05-13 05:46 UTC by Luis Parravicini
Modified: 2018-08-20 21:49 UTC

Attachments
Test files to reproduce the bug (21.21 KB, application/zip)
2012-05-13 05:46 UTC, Luis Parravicini

Description Luis Parravicini 2012-05-13 05:46:10 UTC
Created attachment 61552
Test files to reproduce the bug

When generating an XML version of a PDF, the font id assigned to a line of text appears to be that of the first word on that line.

This causes the following bug: if the first word of a line is in italics, the font id output for the whole line is that of the italic word, not that of the rest of the line.

I've created a file in LibreOffice with four lines like the following text (italic words are marked here with <i> tags). I've come across this problem with PDFs created by other programs, so it's not a problem in the way LibreOffice generates the PDF.

------------
Line 1
line 2
<i>line</i> 3
line <i>4</i>
------------

All the text has the same font and size applied. The generated XML is:


<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
        <fontspec id="0" size="16" family="Times" color="#000000"/>
        <fontspec id="1" size="16" family="Times" color="#000000"/>
<text top="85" left="85" width="46" height="20" font="0">Line 1</text>
<text top="106" left="85" width="41" height="20" font="1"><i>line</i> 2</text>
<text top="126" left="85" width="40" height="20" font="0">line 3</text>
<text top="147" left="85" width="41" height="20" font="0">line <i>4</i></text>
</page>
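
To make the symptom easier to spot, here is a minimal sketch that parses the XML pasted above and reports, for each <text> element, its font id and whether the line begins with an <i> child. The names SAMPLE, starts_with_italic and summarize are made up for this illustration and are not part of poppler or pdftohtml. It only inspects the output; it does not attempt to show where the bug lives in the code.

```python
import xml.etree.ElementTree as ET

# XML copied from the report above (leading indentation removed).
SAMPLE = """<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<fontspec id="0" size="16" family="Times" color="#000000"/>
<fontspec id="1" size="16" family="Times" color="#000000"/>
<text top="85" left="85" width="46" height="20" font="0">Line 1</text>
<text top="106" left="85" width="41" height="20" font="1"><i>line</i> 2</text>
<text top="126" left="85" width="40" height="20" font="0">line 3</text>
<text top="147" left="85" width="41" height="20" font="0">line <i>4</i></text>
</page>"""


def starts_with_italic(text_elem):
    """Return True if the element begins with an <i> child and no leading plain text."""
    leading = (text_elem.text or "").strip()
    children = list(text_elem)
    return not leading and bool(children) and children[0].tag == "i"


def summarize(page_xml):
    """Return (line text, font id, starts_with_italic) for every <text> element."""
    page = ET.fromstring(page_xml)
    rows = []
    for elem in page.iter("text"):
        line = "".join(elem.itertext())  # plain text with the <i> markup dropped
        rows.append((line, elem.get("font"), starts_with_italic(elem)))
    return rows


if __name__ == "__main__":
    for line, font, italic_first in summarize(SAMPLE):
        print(f"font={font} italic_first={italic_first} text={line!r}")
```

On the output as pasted, the only line reported with font="1" is the one that begins with an <i> element; the line whose italic word comes second keeps font="0".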
Comment 1 Luis Parravicini 2012-05-13 05:51:50 UTC
This was tested with pdftohtml 0.20.0
Comment 2 GitLab Migration User 2018-08-20 21:49:13 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/91.

