Upstreaming the report in https://bugzilla.gnome.org/show_bug.cgi?id=63633 about using compat decompositions for Unicode codepoints whose glyphs are missing in the current font. I've got some time to work on this now, and having poked around it looks simpler in the new HarfBuzz than it would've been in the old one. Looks to me like the core of the changes should be to add support for compat decompositions in hb-ot-shape-normalize.cc’s decompose() function. However, this requires adding support for compat decompositions to hb_unicode_decompose(). I don't know about the ICU implementation, but the GLib implementation could use g_unichar_fully_decompose() similarly to how hb_glib_unicode_decompose() currently uses g_unichar_decompose(). This would require a new vfunc to be added to hb-unicode.cc (which is why I haven't gone ahead and written a patch already, since I'm unsure of the best way to do this). I was wondering about basically mirroring the GLib API: size_t hb_unicode_compat_decompose (hb_unicode_funcs_t*, hb_codepoint_t in, hb_codepoint_t *out, size_t out_len) from what I can gather, this would fit with the ICU API as well. If that sounds reasonable I'll go ahead and concoct a patch.
(In reply to comment #0) > Upstreaming the report in https://bugzilla.gnome.org/show_bug.cgi?id=63633 > about using compat decompositions for Unicode codepoints whose glyphs are > missing in the current font. Thanks. > I've got some time to work on this now, and having poked around it looks > simpler in the new HarfBuzz than it would've been in the old one. Indeed. > Looks to me like the core of the changes should be to add support for compat > decompositions in hb-ot-shape-normalize.cc’s decompose() function. However, > this requires adding support for compat decompositions to > hb_unicode_decompose(). Yeah, almost. > I don't know about the ICU implementation, but the GLib implementation could > use g_unichar_fully_decompose() similarly to how hb_glib_unicode_decompose() > currently uses g_unichar_decompose(). > > This would require a new vfunc to be added to hb-unicode.cc (which is why I > haven't gone ahead and written a patch already, since I'm unsure of the best > way to do this). I was wondering about basically mirroring the GLib API: > > size_t hb_unicode_compat_decompose (hb_unicode_funcs_t*, hb_codepoint_t in, > hb_codepoint_t *out, size_t out_len) Indeed. That's exactly why I added g_unichar_fully_decompose() and deprecate g_unicode_decomposition()! :). > from what I can gather, this would fit with the ICU API as well. Yeah, I don't see any big problem there. > If that sounds reasonable I'll go ahead and concoct a patch. Go ahead. If you do the plumbing, I'll make the hb-ot-shape-normalize.cc changes.
Created attachment 51635 [details] [review] Add support for compatibility decompositions Sorry about the wait; I ended up without an internet connection this weekend. This adds a decompose_fully vfunc and nil, GLib and ICU implementations. It extends the normalisation tests in test-unicode.c with a small selection of compatibility decomposition tests, all of which pass for GLib and ICU for me.
(In reply to comment #2) > Created an attachment (id=51635) [details] > Add support for compatibility decompositions > > Sorry about the wait; I ended up without an internet connection this weekend. > > This adds a decompose_fully vfunc and nil, GLib and ICU implementations. It > extends the normalisation tests in test-unicode.c with a small selection of > compatibility decomposition tests, all of which pass for GLib and ICU for me. 1. Lets rename from decompose_fully to decompose_compatibility(). 2. Lets ditch the buffer-len argument and require that the array has at least 18 entries room. What do you think? 3. I like to see a fallback glib implementation too, but I can do that myself.
Created attachment 51686 [details] [review] Add support for compatibility decompositions (updated) > 1. Lets rename from decompose_fully to decompose_compatibility(). Done. > 2. Lets ditch the buffer-len argument and require that the array has at least > 18 entries room. What do you think? I can't think of any reason not to. It seems to work for GLib's g_unichar_to_utf8(). Done. > 3. I like to see a fallback glib implementation too, but I can do that myself. Done, using a round-trip to UTF-8 and g_utf8_normalize() in NFKD mode.
Ping?
Created attachment 54183 [details] [review] Add support for compatibility decompositions (updated again) Here's an updated patch with the minor documentation fix we talked about on IRC, plus the necessary modifications to decompose() to use the compatibility decomposition if the original or canonical glyphs don't exist in the font. Hopefully this is what you had in mind; it certainly seems to work for me (and the test suite passes, though I haven't added any new tests).
Just pushed out a commit based on your patch, but with modifications to the algorithm and callback semantics. Please test. Thanks!
I tested it with a few fonts. Works beautifully.
Thanks for sorting this out!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.