Bug 22506

Summary: Selected CJK character displays incorrectly in Evince
Product: poppler
Reporter: Ryan Li <ryan>
Component: cairo backend
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED
Severity: minor
Priority: low
CC: mkasik
Version: unspecified
Hardware: All
OS: All
URL: https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/299725
Attachments:
  Sample PDF file
  proposed patch for selection
  another example file
  A pdf suffering regression

Description Ryan Li 2009-06-26 21:44:49 UTC
Created attachment 27184
Sample PDF file

Binary package hint: evince

Open a PDF file containing CJK characters in Evince and select some of those characters with the cursor: most of them disappear, and some are rendered as different characters or as ASCII characters.
Using Ubuntu 8.10 with Evince 2.24.1.

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 8.10
ExecutablePath: /usr/bin/evince
Package: evince 2.24.1-0ubuntu1
ProcEnviron:
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: evince
Uname: Linux 2.6.27-8-generic i686
Comment 1 Marek Kasik 2010-04-15 08:01:31 UTC
Created attachment 35064
proposed patch for selection

Hi,

there are two problems here:

The first is that poppler does not start a new word (TextWord in the selection code) when two consecutive characters use different fonts, even though it does so when their font sizes differ.
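
A minimal sketch of the first problem, using a simplified glyph model rather than poppler's actual TextWord code. The `Glyph` class, the `split_words` function, and the font names are hypothetical; the sketch only shows how breaking on a font change, in addition to a size change, moves the word boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    char: str
    font: str    # hypothetical font name
    size: float  # font size in points


def split_words(glyphs, break_on_font):
    """Group consecutive glyphs into words.

    A new word always starts when the font size changes. It also starts
    when the font changes, but only if break_on_font is True.
    """
    words = []
    for g in glyphs:
        if words:
            prev = words[-1][-1]
            same_size = prev.size == g.size
            same_font = prev.font == g.font
            if same_size and (same_font or not break_on_font):
                words[-1].append(g)
                continue
        words.append([g])
    return ["".join(g.char for g in w) for w in words]


# One visual word whose middle glyphs use a different font.
glyphs = [
    Glyph("H", "Serif", 12),
    Glyph("e", "Serif", 12),
    Glyph("l", "Sans", 12),
    Glyph("l", "Sans", 12),
    Glyph("o", "Serif", 12),
]

print(split_words(glyphs, break_on_font=False))
print(split_words(glyphs, break_on_font=True))
```

With `break_on_font=False` the mixed-font glyphs stay in one word; with `break_on_font=True` each font run becomes its own word.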

The second is that poppler checks the 'y' coordinate when looking for the beginning and end of the selection within a line (TextLine). It shouldn't do that: it is searching for a word within a line, so it should perform a one-dimensional search in 'x' only.
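
A rough sketch of the second problem, assuming axis-aligned bounding boxes and a rectangular selection. The `Box` class and both search functions are hypothetical and are not taken from poppler; they only contrast a search that also tests 'y' with one that tests 'x' alone once the line is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    text: str = ""


def words_hit_2d(words, sel):
    """Select words whose boxes overlap the selection in both x and y."""
    return [
        w.text
        for w in words
        if w.x_min < sel.x_max and sel.x_min < w.x_max
        and w.y_min < sel.y_max and sel.y_min < w.y_max
    ]


def words_hit_1d(words, sel):
    """Select words whose boxes overlap the selection in x only."""
    return [
        w.text
        for w in words
        if w.x_min < sel.x_max and sel.x_min < w.x_max
    ]


# Three words on one line; the middle one has a shorter bounding box.
line = [
    Box(0, 10, 0, 12, "A"),
    Box(10, 20, 4, 8, "b"),
    Box(20, 30, 0, 12, "C"),
]

# The pointer is dragged across the whole line, but near its top edge.
selection = Box(0, 50, 0, 3)

print(words_hit_2d(line, selection))
print(words_hit_1d(line, selection))
```

In this model the 2D test drops the middle word because the selection never reaches its shorter box, while the x-only test keeps every word the drag passed over.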

The attached patch fixes these two problems.

Regards

Marek

P.S.: the attached patch fixes bugs #6923, #9608, #9672, #13441 and #16305 for me
Comment 2 Marek Kasik 2010-04-15 08:08:23 UTC
Created attachment 35066
another example file

This is another example of the issue. The second "Hello" shows the problem that occurs when one word contains two different fonts.
The third "Hello" shows the problem of comparing 'y' coordinates. The horizontal part of the bounding box of the "ll" is smaller than the bounding boxes of the neighbouring characters, so the "ll" is not included in the string to be painted when the user doesn't "hit" it during selection.

Btw, I don't think that this is specific to the Cairo backend.
Comment 3 Albert Astals Cid 2010-04-17 10:20:16 UTC
Your patch breaks pdftotext for lots of PDFs that contain a word like iPhone where the "i" is written in italics: your patch will output "i Phone" instead of "iPhone" because the two parts use different fonts.
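
A small sketch of this regression, assuming for illustration only that the extractor emits a space between any two adjacent words. The `extract_text` function and the font names are hypothetical, and the real pdftotext spacing logic is more involved than this.

```python
def extract_text(runs, break_on_font):
    """Join adjacent glyph runs into words, then join words with spaces.

    runs is a list of (text, font) tuples for runs that touch each other
    with no visible gap. A font change starts a new word only when
    break_on_font is True.
    """
    words = []  # list of (text, font of the most recent run)
    for text, font in runs:
        if words and (not break_on_font or words[-1][1] == font):
            words[-1] = (words[-1][0] + text, font)
        else:
            words.append((text, font))
    return " ".join(t for t, _ in words)


runs = [("i", "Italic"), ("Phone", "Regular")]

print(extract_text(runs, break_on_font=False))
print(extract_text(runs, break_on_font=True))
```

Under that assumption, breaking on the font change turns one visual word into two output words separated by a space.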
Comment 4 Marek Kasik 2010-04-19 02:35:58 UTC
(In reply to comment #3)
> Your patch breaks lots of PDF in pdftotext where you have a word like iPhone
> where the "i" is written in italics, your patch will output "i Phone" instead
> of "iPhone" because they are using different fonts.

I thought so too but I didn't observe it. Could you point me to a pdf against which I can review my modifications of the patch?

Marek
Comment 5 Albert Astals Cid 2010-04-21 00:43:32 UTC
Created attachment 35196
A pdf suffering regression

See how the old pdftotext outputs
"This demo example uses a channel model described in [4]"
whereas the version patched with your code outputs
"This demo example uses a channel model described in [4 ]"
If you read the file visually, the first version is the correct one.
