Created attachment 61552
Test files to reproduce the bug

When generating an XML version of a PDF, the font id used for a given line of text seems to be that of the first word of that line. This causes the following bug: if the first word in a line is in italics, the font id output for the whole line is the font of the italic word, not that of the rest of the line.

I've created a file in LibreOffice with four lines like the following text (italic words are marked here with <i> tags). I've come across this problem with PDFs created by other programs too, so it's not a problem with the way LibreOffice generates the PDF.

------------
Line 1
line 2
<i>line</i> 3
line <i>4</i>
------------

All the text has the same font and size applied. The XML generated is:

<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
  <fontspec id="0" size="16" family="Times" color="#000000"/>
  <fontspec id="1" size="16" family="Times" color="#000000"/>
  <text top="85" left="85" width="46" height="20" font="0">Line 1</text>
  <text top="106" left="85" width="41" height="20" font="1"><i>line</i> 2</text>
  <text top="126" left="85" width="40" height="20" font="0">line 3</text>
  <text top="147" left="85" width="41" height="20" font="0">line <i>4</i></text>
</page>
This was tested with pdftohtml 0.20.0
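For anyone who wants to check the symptom in their own output, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It lists each `<text>` line's `font` attribute next to whether the line begins with an `<i>` element. The helper name `summarize_lines` is hypothetical and not part of poppler. The XML is copied from the report above.

```python
import xml.etree.ElementTree as ET

# XML copied verbatim from the report (pdftohtml 0.20.0 output).
PAGE_XML = """<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<fontspec id="0" size="16" family="Times" color="#000000"/>
<fontspec id="1" size="16" family="Times" color="#000000"/>
<text top="85" left="85" width="46" height="20" font="0">Line 1</text>
<text top="106" left="85" width="41" height="20" font="1"><i>line</i> 2</text>
<text top="126" left="85" width="40" height="20" font="0">line 3</text>
<text top="147" left="85" width="41" height="20" font="0">line <i>4</i></text>
</page>"""


def summarize_lines(page_xml):
    """Return (text, font id, starts_with_italic) for each <text> element.

    Hypothetical helper for illustration only.
    """
    page = ET.fromstring(page_xml)
    rows = []
    for text in page.iter("text"):
        # Full visible content of the line, with markup removed.
        content = "".join(text.itertext())
        # Plain text that appears before the first child element, if any.
        leading = (text.text or "").strip()
        children = list(text)
        # The line starts in italics when its first child is <i> and
        # no plain text precedes that child.
        starts_italic = bool(children) and children[0].tag == "i" and not leading
        rows.append((content, text.get("font"), starts_italic))
    return rows


for content, font, starts_italic in summarize_lines(PAGE_XML):
    print(f"{content!r}: font={font} starts_with_italic={starts_italic}")
```

In the XML from this report, the only line assigned font "1" is the one that begins with an `<i>` element, while the line with italics at the end keeps font "0".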
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/91.