Bug 49885

Summary: sync custom breakiterator rules with icu originals
Product: LibreOffice
Component: Libreoffice
Version: Master old -3.6
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Reporter: Caolán McNamara <caolanm>
Assignee: Not Assigned <libreoffice-bugs>
QA Contact:
CC: libreoffice
Whiteboard: EasyHack SkillCpp DifficultyInteresting TopicCleanup

Description Caolán McNamara 2012-05-13 14:54:17 UTC
http://cgit.freedesktop.org/libreoffice/core/tree/i18npool/source/breakiterator/data/README

We have a bunch of breakiterator rules that are used to find the right place to break a line or word etc.

They are all derived from originals bundled with ICU. The "master" versions can be fetched with:
svn checkout http://source.icu-project.org/repos/icu/icu/trunk/source/data/brkitr
(They no longer appear in the ICU tarballs, but are in ICU's svn.)

At various stages these copies have been customized and are now horribly out of sync. It's unclear which diffs from the base versions are deliberate and which are now accidental :-(
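
As a starting point for sorting deliberate changes from accidental ones, a small script can list what a local rule file adds or removes relative to its upstream counterpart. This is only a rough sketch using Python's standard difflib. The function names, the "icu/" and "libreoffice/" labels, and the sample rule lines are made up for illustration and are not taken from the real rule files.

```python
import difflib


def rule_diff(upstream_text, local_text, name="line.txt"):
    """Return unified-diff lines between an upstream rule file and a local copy.

    The file name and the "icu/" and "libreoffice/" labels are illustrative only.
    """
    return list(
        difflib.unified_diff(
            upstream_text.splitlines(),
            local_text.splitlines(),
            fromfile="icu/" + name,
            tofile="libreoffice/" + name,
            lineterm="",
        )
    )


def changed_lines(diff):
    """Keep only added/removed lines, skipping the two file-header lines."""
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical rule snippets, standing in for the real file contents.
upstream = "$A = [a];\n$B = [b];\n"
local = "$A = [a];\n$B = [b];\n$Custom = [x];\n"

for line in changed_lines(rule_diff(upstream, local)):
    print(line)
```

Each line printed this way is a candidate customization that would then need to be traced back to the commit (and issue) that introduced it.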

What's needed is a review of the various issues referenced in the commits to our breakiterator rules that introduced customizations, to see whether those are still relevant or have been overtaken by changes in later Unicode specifications. Ideally, regression tests should then be written for them (see i18npool/qa/cppunit/test_breakiterator.cxx). If any customizations are still relevant, apply those changes back on top of the latest versions from ICU; otherwise simply drop the rules entirely and fall back directly to the built-in ICU ones.
Comment 1 Björn Michaelsen 2013-10-04 18:47:15 UTC
Adding the LibreOffice developer list as CC to unresolved EasyHacks for better visibility.

See e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details.
