Bug 52020

Summary: ICU breakiterator not working with Khmer and Hunspell
Product: LibreOffice
Reporter: Nathan Wells <sungkhum>
Component: Libreoffice
Assignee: Caolán McNamara <caolanm>
Status: REOPENED
Severity: normal    
Priority: medium    
Version: 3.6.0.0.beta2   
Hardware: Other   
OS: All   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=59448
https://bugs.freedesktop.org/show_bug.cgi?id=59447
Whiteboard: BSA target:3.7.0 target:3.6.0.2
Attachments: Screenshot of "misspelled" Khmer words that should be treated as two words

Description Nathan Wells 2012-07-12 16:39:53 UTC
Created attachment 64144
Screenshot of "misspelled" Khmer words that should be treated as two words

Problem description: While ICU automatic line breaking now works for Khmer in LibreOffice 3.6, Hunspell does not seem to use the same word-breaking data, so it sees each unbroken run of Khmer text as a single word. (Like Thai, Khmer does not traditionally put spaces between words.)

Steps to reproduce:
1. Type ឲ្យគេ (should be automatically broken by ICU into ឲ្យ|គេ)
2. With the SBBIC spelling checker installed (http://extensions.libreoffice.org/extension-center/khmer-spelling-checker-sbbic-version) and CTL enabled, ឲ្យគេ is treated as one word rather than two and is therefore flagged as misspelled.
3. You may need a Khmer font to display the text correctly (one can be downloaded here: http://www.sbbic.org/2011/01/19/khmer-sbbic-unicode-system-font/).

Current behavior: No Khmer words are automatically broken for Hunspell, so users must still insert zero-width spaces between words by hand before spell checking, even though line breaking is now automatic.

Expected behavior: Khmer text should be automatically broken into words before Hunspell checks it.
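The manual zero-width-space workaround described under "Current behavior" can be sketched as follows. This is a minimal illustration only, assuming the spell checker splits text on whitespace and U+200B ZERO WIDTH SPACE (as the workaround relies on) and then looks each token up in a word list; `find_misspellings` and the two-word dictionary are hypothetical and are not LibreOffice or Hunspell APIs.

```python
import re

# Hypothetical toy word list holding the two Khmer words from the report.
DICTIONARY = {"ឲ្យ", "គេ"}

# Split on ordinary whitespace or U+200B ZERO WIDTH SPACE.
# Python's \s does not match U+200B, so it is listed explicitly.
TOKEN_SEPARATORS = re.compile(r"[\s\u200b]+")

def find_misspellings(text, dictionary):
    """Return the tokens of `text` that are not in `dictionary`, in order."""
    tokens = [t for t in TOKEN_SEPARATORS.split(text) if t]
    return [t for t in tokens if t not in dictionary]

# Without a word break, the whole run is one token, so it is flagged.
print(find_misspellings("ឲ្យគេ", DICTIONARY))
# With a manually inserted zero-width space, both words are found.
print(find_misspellings("ឲ្យ\u200bគេ", DICTIONARY))
```

In this sketch the dictionary never changes; only the segmentation decides whether ឲ្យគេ is flagged, which is the gap an automatic word breaker is meant to close.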

Browser: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11
Comment 1 Tia Seng 2012-07-13 03:01:56 UTC
It would be great to see this feature included in LibreOffice for Cambodians. Thanks
Comment 2 chomneau 2012-07-13 04:41:41 UTC
I don't like typing spaces between words in Khmer; it slows down my typing.
Comment 3 Not Assigned 2012-07-13 08:55:41 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8ad1d4443e67784a8c0d3c1a3a72f089cb0cd3ec

Resolves: fdo#52020 ICU breakiterator not used for Khmer
Comment 4 Not Assigned 2012-07-13 09:19:24 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5e3c37c8a3b567cf3d8c9a47b37155e3c2ffefb9&g=libreoffice-3-6

Resolves: fdo#52020 ICU breakiterator not used for Khmer


It will be available in LibreOffice 3.6.
Comment 5 Nathan Wells 2012-07-13 09:42:15 UTC
Wonderful news! Thank you for your time on this!
Comment 6 Nathan Wells 2013-07-19 03:06:19 UTC
After re-evaluating this solution, I would like to ask that this patch be reverted for the time being, until these two related bugs are resolved:

https://bugs.freedesktop.org/show_bug.cgi?id=59448
https://bugs.freedesktop.org/show_bug.cgi?id=59447

With this patch, users cannot be sure that all Khmer words are spelled correctly, because the ICU word breaker is not 100% accurate: if it splits a word incorrectly, the word may be shown as correctly spelled when in fact it is not.
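The false-negative risk described above can be illustrated with a toy segmenter. This is a hedged sketch only: it uses greedy longest-match (maximal matching) over a hypothetical dictionary as a stand-in word breaker, which is not ICU's actual Khmer algorithm, and it uses Latin placeholder words for readability. A mistyped word that happens to split into valid dictionary entries is never flagged.

```python
# Hypothetical dictionary; the intended word is "catalog".
DICTIONARY = {"cat", "log", "catalog"}

def segment(text, dictionary):
    """Greedy longest-match segmentation (a stand-in, not ICU's algorithm).

    At each position, take the longest dictionary word that matches there;
    if none matches, emit a single character so the loop always advances.
    """
    max_len = max(len(w) for w in dictionary)
    i, tokens = 0, []
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

def misspelled(tokens, dictionary):
    """Return the tokens that are not dictionary words."""
    return [t for t in tokens if t not in dictionary]

typo = "catlog"  # "catalog" with a letter missing, typed without spaces
print(segment(typo, DICTIONARY))                          # ['cat', 'log']
print(misspelled(segment(typo, DICTIONARY), DICTIONARY))  # [] (typo hidden)
print(misspelled([typo], DICTIONARY))                     # ['catlog'] (flagged)
```

The last line shows the trade-off raised in this comment: checking the unsegmented run flags the typo, while a segmenter that finds a plausible but wrong split silently accepts it.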

I originally thought this patch would be a good thing, but it is now clear that until the other two bugs/feature requests are resolved, the ICU breaker should not be used for Hunspell spell checking of Khmer.

Thanks, and sorry for causing this mess!
