de-bloat internal ICU

Background: We re-use ICU internally; however, we use only a fraction of its functionality, yet we build and ship it all. These files are big: the icudata alone is 5.5Mb compressed and 13Mb on disk, and the redundant code chews a chunk of run-time memory.

If we are building the internal ICU, we should disable everything we do not need. Unfortunately ICU has no easy way of doing this, so we need to do some manual work on the build to hack out the pieces we do not need.

First some API auditing is needed: eg. we do not use any ucnv_, ures_, unorm_, utrans_ or u_shapeArabic prefixed code at all, so none of that should be compiled in; we need to study which ICU headers are included (g grep 'include.*unicode').

More than that, we need to kill some of the big data files, eg. ~4Mb of charset conversion tables that are (apparently) unused; we already have charset conversion code in sal/ (based on ICU). To do that we most likely need to tweak the makefiles in icu/unxlngi6.pro/misc/build/icu/source/, though this has to be done by updating the patch we apply in icu/ to the top-level pristine project.

There are some links you can read on how to shrink the ICU data library here: [4]

Skills: gnu make, simple C, diff/patch
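The header audit described above could be scripted roughly as follows. This is only a sketch of one way to tally which `unicode/` headers the tree pulls in; the function names, the regular expression and the list of source suffixes are assumptions made for illustration, not part of the build, and the plain `g grep` remains the quickest first look.

```python
import re
from collections import Counter
from pathlib import Path

# Matches lines such as: #include <unicode/ubidi.h> or #include "unicode/uchar.h"
ICU_INCLUDE = re.compile(
    r'^\s*#\s*include\s*[<"]unicode/([\w.]+)[>"]', re.MULTILINE
)

# Assumed set of suffixes worth scanning; adjust for the tree at hand.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".cxx", ".h", ".hpp", ".hxx"}


def icu_headers_in(text):
    """Return the ICU header names included by one source file's text."""
    return ICU_INCLUDE.findall(text)


def count_icu_headers(root):
    """Walk a source tree and count how often each ICU header is included."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            counts.update(icu_headers_in(path.read_text(errors="ignore")))
    return counts
```

Calling `count_icu_headers(".").most_common()` from the top of a checkout would give a ranked list of headers; any ICU header that never shows up is a candidate for the "not compiled in" list, subject to checking what ICU itself needs internally.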
ICU is compiled and unpacked by some dmake magic; hopefully the makefile.mk shows how patches can be applied in there. The code itself tends to be unpacked to eg. icu/unxlngi6.pro/misc/build/icu-* and as you re-run 'build ; deliver' in the top-level it can be re-unpacked over that so take care ;-)
You can read more about customising ICU's data library to remove un-needed pieces here: http://userguide.icu-project.org/icudata see eg. "Reducing the Size of ICU's Data: Locale Data" "Reducing the Size of ICU's Data: Conversion Tables" etc. Hopefully there are some easy wins there from just reading the manual and creating some new patches to add to icu/makefile.mk to configure that lot out.
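As a rough illustration of the "Conversion Tables" step, the sketch below picks the converter tables out of a data-item listing so they can be reviewed before removal. It assumes a plain-text listing with one item name per line (the kind of list ICU's `icupkg` tool can print) and assumes converter tables are the items ending in `.cnv`; the function name and the sample item names are hypothetical, and the user guide linked above is the authority on the real procedure.

```python
def items_to_remove(listing, keep=()):
    """Return the .cnv (converter table) items from an ICU data listing.

    listing: text with one data item name per line; blank lines and
             lines starting with '#' are ignored.
    keep:    item names that must stay in the package even though they
             are converter tables.
    """
    keep = set(keep)
    doomed = []
    for line in listing.splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        if name.endswith(".cnv") and name not in keep:
            doomed.append(name)
    return doomed
```

The resulting list would still need checking by hand, and the stripped package would need testing, before anything goes into a patch for icu/makefile.mk.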
Why have a bundled ICU at all?
Because it is not available by default on all platforms.
CC'ing Michael, who was the original mentor for this IIRC?
Deleted "Easyhack" from summary
@eike: is this still open? I vaguely remember you doing something in this area.
There was someone working on it early this year or so who had some luck stripping down the data libraries a bit, but never came up with a final patch (which would be just some makefile.mk hackery to pull a different tarball from ext_sources) nor a verification of whether the stripped-down data actually worked. Anyway, we'd have to redo things because in the meantime we upgraded to ICU 49, and data packages have to be assembled individually for each version.
17:04 <@Sweetshark> erAck, mmeeks: is this still an easyhack: https://bugs.freedesktop.org/show_bug.cgi?id=38836 -- or should we better remove the whitespace keyword? from the last comment it seems not directly actionable to me ....
17:06 <@erAck> Sweetshark: close that, we can't remove anything anymore from ICU as external libs now depend on it.