Bug 71877 - Word Count Wrong for ZWSP-delimited text in SEA languages (Thai, Lao, Khmer, and Burmese)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-21 14:11 UTC by Robert M Campbell
Modified: 2015-01-14 08:14 UTC

Attachments
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text (42.63 KB, application/vnd.oasis.opendocument.text)
2013-11-21 14:11 UTC, Robert M Campbell
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text (38.13 KB, application/vnd.oasis.opendocument.text)
2013-11-25 04:21 UTC, Robert M Campbell
Mittaphap (24.57 KB, application/x-font-ttf)
2013-11-25 04:42 UTC, Robert M Campbell
Mittaphap Book (24.63 KB, application/x-font-ttf)
2013-11-25 04:43 UTC, Robert M Campbell

Description Robert M Campbell 2013-11-21 14:11:17 UTC
Created attachment 89590
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text

When working with text that uses ZWSPs (zero width spaces) to delimit text, LibreOffice does not count each word. When the ZWSPs are removed, the word count acts fine.

But, word selection (double click) and line breaking work fine with or without ZWSPs.

Testing document attached.
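
For illustration only, the kind of mismatch described above can be sketched in Python. This is not LibreOffice's word-count code; the function names and the sample text are hypothetical. The sketch assumes a counter that splits on whitespace only. U+200B ZERO WIDTH SPACE is a format character rather than whitespace, so such a counter treats a whole ZWSP-delimited run as one word.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def count_words_whitespace_only(text: str) -> int:
    # str.split() with no arguments splits on Unicode whitespace.
    # U+200B is not whitespace, so it never separates words here.
    return len(text.split())

def count_words_zwsp_aware(text: str) -> int:
    # Treat ZWSP as a delimiter in addition to ordinary whitespace.
    tokens = re.split(r"[\s\u200b]+", text)
    return sum(1 for token in tokens if token)

# Hypothetical sample: four Thai words joined by ZWSPs, no visible spaces.
sample = ZWSP.join(["สวัสดี", "ครับ", "ทุก", "คน"])

print(count_words_whitespace_only(sample))  # 1
print(count_words_zwsp_aware(sample))       # 4
```

Text without ZWSPs needs dictionary-based segmentation instead, which this sketch does not attempt.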
Comment 1 Robinson Tryon (qubit) 2013-11-24 22:54:58 UTC
CONFIRMED in LO Version: 4.2.0.0.beta1 + Ubuntu 12.04.3

(In reply to comment #0)
> When working with text that uses ZWSPs (zero width spaces) to delimit text,
> LibreOffice does not count each word. When the ZWSPs are removed, the word
> count acts fine.

Per instructions in Test document:

REPRO STEPS:
- Open test document in LibreOffice
- Highlight first 4 paragraphs

As noted in the document, the bottom bar shows "202 words"

- Highlight the next set of 4 paragraphs

As noted in the document, the bottom bar shows "2 words"

> But, word selection (double click) and line breaking work fine with or
> without ZWSPs.

Well, at least there's that!

> 
> Testing document attached.

Thanks for the test document. Some of the fonts are not present on my system. Would it be possible to change the test document to use fonts included in LO that exercise the same bug? If not, could you point to where the fonts can be downloaded?

Status -> NEW
Comment 2 Robinson Tryon (qubit) 2013-11-24 22:57:05 UTC
Andras - Is this behavior a bug?
Comment 3 Robert M Campbell 2013-11-25 04:17:08 UTC
Paragraphs 1 & 5 (Thai) - no font bundled with LibreOffice, as far as I can tell
Droid Sans
https://www.google.com/fonts/specimen/Droid+Sans

Paragraphs 2 & 6 (Khmer) - no font bundled with LibreOffice, as far as I can tell
Khmer OS
http://sourceforge.net/projects/khmer/files/Fonts%20-%20KhmerOS/KhmerOS%20Fonts%204.0-%20LGPL%20License/

Paragraphs 3 & 7 (Lao) - no font bundled with LibreOffice, as far as I can tell
Mittaphap
http://hg.palaso.org/font-lao2/file/d0764b11848f

Burmese - Padauk (included in LibreOffice)

I'll adjust the document to the fonts listed. Mittaphap in particular is fairly new and only available as source, not ttf yet, but I have generated some fonts and can attach them here if that would be helpful?
Comment 4 Robert M Campbell 2013-11-25 04:21:26 UTC
Created attachment 89726
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text
Comment 5 Robinson Tryon (qubit) 2013-11-25 04:38:03 UTC
(In reply to comment #3)
> [...various font things ..] 
> I'll adjust the document to the fonts listed.

thanks

> Mittaphap in particular is
> fairly new and only available as source, not ttf yet, but I have generated
> some fonts and can attach them here if that would be helpful?

As long as the links are stable and the fonts are under a FOSS license so that we can test against them, it's generally fine to link to external font files.
Comment 6 Robert M Campbell 2013-11-25 04:42:58 UTC
Created attachment 89727
Mittaphap
Comment 7 Robert M Campbell 2013-11-25 04:43:29 UTC
Created attachment 89728
Mittaphap Book
Comment 8 Robert M Campbell 2013-11-25 04:52:49 UTC
Mittaphap is licensed under the OFL (SIL Open Font License).
Comment 9 Robert M Campbell 2014-01-22 03:16:36 UTC
Any news on this bug? Anything I can do to help?
Comment 10 Robinson Tryon (qubit) 2015-01-14 08:14:19 UTC
(In reply to Robert M Campbell from comment #9)
> Any news on this bug? Anything I can do to help?

Hi Robert,
Good question -- sorry for the late reply here! As you can see, we have a large number of open bug reports filed against LibreOffice, so it's often a matter of finding the right resource to help address a particular bug or set of bugs.

This bug appears to affect a number of different languages including Thai, so I'd suggest that you check with the Thai mailing list and see if others are experiencing the same problem:
https://wiki.documentfoundation.org/Local_Mailing_Lists#Thai

If the problem is affecting many people, then we can try to identify someone who'd be interested in working on a fix. This could be a great opportunity for a university CS student or someone else familiar with programming to learn more about LibreOffice.

