Bug 50138 - [PATCH] Large Indian Rupee Sign not recognized as text
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2012-05-20 00:59 UTC by Carlos Garcia Campos
Modified: 2012-12-29 09:17 UTC


Attachments
Allow large chars in TextPage (806 bytes, patch)
2012-11-24 07:19 UTC, Jason Crain
Use page size for max value in TextPage::visitSelection (850 bytes, patch)
2012-11-24 07:39 UTC, Jason Crain

Description Carlos Garcia Campos 2012-05-20 00:59:28 UTC
Bug forwarded from Evince: https://bugzilla.gnome.org/show_bug.cgi?id=632365

"The attached PDF shows the new Indian Rupee Sign (₹) exported from Inkscape; 
with the smaller text at the bottom of the PDF it is possible to drag over and
select it for copy and paste.

However, neither "Select All" nor dragging over the symbol itself highlights
the large symbol for copy-and-paste.

The large symbol is textual in nature (this can be confirmed by re-opening the
PDF in Inkscape, or using pdftohtml).

Ideally it would be possible to highlight and select all text in a document,
regardless of size.  In this case the document had been created specifically to
encourage people to copy-and-paste the correct symbol into their own documents!"

The test case is attached to the original bug report. I can confirm that pdftotext doesn't include the first large symbol, while acroread allows selecting it and copying/pasting it.
Comment 1 Jason Crain 2012-11-24 07:19:23 UTC
Created attachment 70502
Allow large chars in TextPage

The large symbol is not selectable because TextPage::addChar rejects characters larger than the page size.  This patch removes that check, though I do not know why it was added in the first place.
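
For readers following along, the effect of that check can be sketched in isolation. This is a simplified model and not poppler's actual code: `CharBox`, `PageSize`, `overlapsPage`, `acceptWithSizeCheck` and `acceptWithoutSizeCheck` are hypothetical names, and the real TextPage::addChar works on transformed device coordinates with more state than is shown here.

```cpp
#include <cassert>

// Hypothetical stand-ins for a glyph's bounding box and the page dimensions.
struct CharBox { double x, y, w, h; };
struct PageSize { double width, height; };

// True if the glyph box touches the page area at all.
bool overlapsPage(const CharBox &c, const PageSize &p) {
    return c.x + c.w >= 0 && c.x <= p.width &&
           c.y + c.h >= 0 && c.y <= p.height;
}

// Assumed shape of the old behaviour: a glyph is also dropped when it is
// wider or taller than the page, even if it overlaps the page.
bool acceptWithSizeCheck(const CharBox &c, const PageSize &p) {
    assert(p.width > 0 && p.height > 0);
    return overlapsPage(c, p) && c.w <= p.width && c.h <= p.height;
}

// Assumed shape of the patched behaviour: only the overlap test remains.
bool acceptWithoutSizeCheck(const CharBox &c, const PageSize &p) {
    assert(p.width > 0 && p.height > 0);
    return overlapsPage(c, p);
}
```

With a 600 x 800 page, a 700 x 900 glyph that overlaps the page is dropped by the first function and kept by the second, which is the situation described for the large rupee sign.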
Comment 2 Jason Crain 2012-11-24 07:39:36 UTC
Created attachment 70504
Use page size for max value in TextPage::visitSelection

The previous patch will cause TextPage::visitSelection to skip the "Indian ₹upee Sign" text because its bottom edge falls outside the page size.  This also affects poppler_page_get_text, which indirectly calls visitSelection.

This patch fixes that by using the page size if the TextBlock's border is outside the page.
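
The idea of falling back to the page size can be illustrated with a small clamping helper. This is only a sketch under assumptions: `Rect` and `clampToPage` are hypothetical names, the page origin is assumed to be at (0, 0), and the actual change inside TextPage::visitSelection may be structured differently.

```cpp
#include <algorithm>

// Hypothetical rectangle type; poppler's TextBlock stores its own bounds.
struct Rect { double xMin, yMin, xMax, yMax; };

// Limit a block's border to the page area, assuming the page spans
// [0, pageWidth] x [0, pageHeight]. Edges already inside are unchanged.
Rect clampToPage(const Rect &block, double pageWidth, double pageHeight) {
    return Rect{
        std::max(block.xMin, 0.0),
        std::max(block.yMin, 0.0),
        std::min(block.xMax, pageWidth),
        std::min(block.yMax, pageHeight)
    };
}
```

For example, a block whose bottom edge is at 850 on an 800-unit-high page would be treated as ending at 800, so a selection covering the whole page still reaches it.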
Comment 3 Albert Astals Cid 2012-12-01 01:07:42 UTC
Jason, just to make sure: does only one of the patches need to be applied, or both?
Comment 4 Jason Crain 2012-12-01 15:40:43 UTC
(In reply to comment #3)
> Jason, just to make sure: does only one of the patches need to be applied, or both?

Both need to be applied.

The "Allow large chars" patch fixes the bug.  The "Use page size" patch fixes a side effect.
Comment 5 Albert Astals Cid 2012-12-01 19:03:36 UTC
I've committed the first patch. I'll leave the second to Carlos, as Evince is the only one that uses the visitSelection code.
Comment 6 Carlos Garcia Campos 2012-12-29 09:17:02 UTC
Pushed the second patch to git master. Thanks!