58714 – Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not provide same results as Windows8

Status:	RESOLVED FIXED

Alias:	None

Product:	HarfBuzz
Classification:	Unclassified
Component:	src
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Behdad Esfahbod

Reported:	2012-12-24 11:58 UTC by Pravin
Modified:	2013-10-19 21:36 UTC
CC List:	3 users

Attachments
image showing correct and actual rendering (109.93 KB, image/png) 2012-12-24 11:58 UTC, Pravin
Lohit font with contextual substitution rule for this combination. (191.93 KB, application/octet-stream) 2012-12-27 12:41 UTC, Pravin

Description Pravin 2012-12-24 11:58:19 UTC

Created attachment 72068
image showing correct and actual rendering

While working on https://bugzilla.redhat.com/show_bug.cgi?id=694724 I found that the Tunga font gives the expected results on Windows, but Tunga with HarfBuzz does not render this sequence correctly.

More information: the Unicode Standard, Chapter 9 -> Kannada -> "Consonant Clusters Involving RA" describes this case.

Tested with the master version.

Comment 1 Pravin 2012-12-27 12:41:26 UTC

Created attachment 72177
Lohit font with contextual substitution rule for this combination.

After adding a chaining contextual substitution rule to the abvs feature, it works correctly with HarfBuzz now.

This still needs to be checked with Uniscribe. Attached is Lohit with the rule for this combination:

$./hb-view /NotBackedUp/fedora-git/lohit-kannada-fonts/master/lohit-kannada-2.5.1/Lohit-Kannada.ttf  ರ‍್ಕಿ

Comment 2 Behdad Esfahbod 2013-01-09 00:30:58 UTC

Jonathan, can you take a look at this?  With Tunga, Uniscribe returns:

[glyph171=0+942|uni0CBE=0+1296|space=0+0|glyph181=0+454]

whereas we return:

[uni0CB0=0+1288|space=0+0|uni0CBE=0+1296|glyph181=0+454]

It looks to me like Uniscribe may be skipping over ZWJ when matching, which according to Unicode is the right thing to do. Maybe I'll try implementing that. For ZWNJ, I believe just letting it block ligatures is what we want.

Comment 3 Behdad Esfahbod 2013-01-09 00:34:33 UTC

It's tricky.  We don't want to unconditionally skip over ZWJ, in case someone wants to define ligatures using it.

Comment 4 Pravin 2013-01-09 08:04:09 UTC

Yeah, looks like Uniscribe is ignoring ZWJ. Lohit Kannada contextual chaining rules with ZWJ do not work on Windows.

Agreed, we might need to consider ZWJ for some tricky combinations in fonts.

Comment 5 Behdad Esfahbod 2013-01-10 07:28:34 UTC

(In reply to comment #4)
> Yeah, looks like Uniscribe is ignoring ZWJ. Lohit Kannada contextual
> chaining rules with ZWJ do not work on Windows.
> 
> Agreed, we might need to consider ZWJ for some tricky combinations in
> fonts.

Pravin,

Will you be in a position to test whether ZWJ is ignored in other Uniscribe shapers too?

Thanks,
b

Comment 6 Pravin 2013-01-10 09:01:02 UTC

Tested Malayalam and Devanagari on Windows 8. The Windows 8 shapers do take ZWJ into account for these scripts.

1. Malayalam chill combinations:  ര്‍ക with Lohit Malayalam uses ZWJ in gsub

2. Devanagari test font: http://pravins.fedorapeople.org/Lohit-Devanagari.ttf
This font has psts rule with ZWJ ligature for: क्‍क

Test file: http://pravins.fedorapeople.org/testing-zwj

Comment 7 Behdad Esfahbod 2013-01-10 18:28:11 UTC

Ok, it's getting tricky.  Then we need to test this in every Uniscribe engine (default, Arabic, Hebrew, various Indic ones) and match it.  Not hard, just someone has to do the research.

Comment 8 Behdad Esfahbod 2013-01-10 18:32:02 UTC

It is also possible to make it match ZWJ if that is in the rule, otherwise skip it.  Maybe that's the best way forward?
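A minimal sketch of the approach in comment 8, in C and for illustration only: the function `match_sequence` and its code-point-array interface are hypothetical and do not reflect HarfBuzz's internal matcher, which works on glyphs and lookup flags. A ZWJ in the text is matched literally when the rule asks for one and skipped otherwise, while ZWNJ gets no special treatment and so blocks the match, as suggested in comment 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZWJ  0x200Du
#define ZWNJ 0x200Cu

/* Hypothetical matcher (not HarfBuzz API).  Tries to match the code points
 * in `rule` against the start of `text`.  A ZWJ in the text is consumed
 * literally when the rule expects ZWJ at that point and skipped otherwise.
 * ZWNJ gets no special treatment, so an unexpected ZWNJ blocks the match.
 * Returns the number of text code points consumed, or 0 on failure. */
static size_t match_sequence(const uint32_t *text, size_t text_len,
                             const uint32_t *rule, size_t rule_len)
{
    assert(text != NULL && rule != NULL && rule_len > 0);
    size_t t = 0, r = 0;
    while (r < rule_len) {
        if (t == text_len)
            return 0;              /* text ended before the rule did */
        if (text[t] == rule[r]) {  /* literal match, including a ZWJ the rule asks for */
            t++;
            r++;
        } else if (text[t] == ZWJ) {
            t++;                   /* rule does not mention ZWJ here: skip it */
        } else {
            return 0;              /* mismatch; an unexpected ZWNJ ends up here */
        }
    }
    return t;
}
```

With the text RA, ZWJ, VIRAMA, KA and a rule listing only RA, VIRAMA, KA, the function consumes all four code points; a rule that spells out the ZWJ matches the same text; replacing the ZWJ in the text with ZWNJ makes it return 0.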

Comment 9 Pravin 2013-01-11 05:45:31 UTC

Yes, the idea looks fair enough.

If the font has rules defined with ZWJ, consider it; otherwise ignore it. I don't think it will cause any regressions either.

Comment 10 Behdad Esfahbod 2013-02-14 18:12:53 UTC

Fixed:

commit cfc507c5432e6327e8484b07b9e091212653bc92
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Thu Feb 14 10:40:12 2013 -0500

    [Indic-like] Disable automatic joiner handling for basic shaping features
    
    Not for Arabic, but for Indic-like scripts.  ZWJ/ZWNJ have special
    meanings in those scripts, so let font lookups take full control.
    
    This undoes the regression caused by automatic-joiners handling
    introduced two commits ago.
    
    We only disable automatic joiner handling for the "basic shaping
    features" of Indic, Myanmar, and SEAsian shapers.  The "presentation
    forms" and other features are still applied with automatic-joiner
    handling.
    
    This change also changes the test suite failure statistics, such that
    a few scripts show more "failures".  The most affected is Kannada.
    However, upon inspection, we believe that in most, if not all, of the
    new failures, we are producing results superior to Uniscribe.  Hard to
    count those!
    
    Here's an example of what is fixed by the recent joiner-handling
    changes:
    
      https://bugs.freedesktop.org/show_bug.cgi?id=58714
    
    New numbers, for future reference:
    
    BENGALI: 353892 out of 354188 tests passed. 296 failed (0.0835714%)
    DEVANAGARI: 707336 out of 707394 tests passed. 58 failed (0.00819911%)
    GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
    GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
    KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
    KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
    LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
    MALAYALAM: 1047983 out of 1048334 tests passed. 351 failed (0.0334817%)
    ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
    SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
    TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
    TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
    TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

commit 0b45479198d61d5135dad771e9c68408eb13f930
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Thu Feb 14 10:46:52 2013 -0500

    [OTLayout] Add fine-grained control over ZWJ matching
    
    Not used yet.  Next commit...

commit 607feb7cff0e50f8738d2e49ca463fc9d7d494de
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Thu Feb 14 07:43:13 2013 -0500

    [OTLayout] Ignore default-ignorables when matching GSUB/GPOS
    
    When matching lookups, be smart about default-ignorable characters.
    In particular:
    
    Do nothing specific about ZWNJ, but for the other default-ignorables:
    
    If the lookup in question uses the ignorable character in a sequence,
    then match it as we used to do.  However, if the sequence match will
    fail because the default-ignorable blocked it, try skipping the
    ignorable character and continue.
    
    The most immediate thing it means is that if Lam-Alef forms a ligature,
    then Lam-ZWJ-Alef will do too.  Finally!
    
    One exception: when matching for GPOS, or for backtrack/lookahead of
    GSUB, we ignore ZWNJ too.  That's the right thing to do.
    
    It certainly is possible to build fonts for which this feature results
    in undesirable glyphs, but it's hard to think of a real-world case
    where that would happen.
    
    This *does* break Indic shaping right now, since Indic Unicode has
    specific rules for what ZWJ/ZWNJ mean, and skipping ZWJ is breaking
    those rules.  That will be fixed in upcoming commits.
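A minimal sketch, in C, of the per-feature policy described in the commits above: for Indic-like shapers, the basic shaping features get no automatic joiner handling, while presentation forms, and every feature of other shapers (Arabic included), keep it. The `shaper_t` enum, the `feature_auto_joiners` function and the exact tag list are assumptions for illustration; the list is modeled on the "basic shaping forms" group of Microsoft's OpenType Indic specification, and the real Myanmar and SEAsian lists may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shaper identifiers for this sketch. */
typedef enum { SHAPER_DEFAULT, SHAPER_ARABIC, SHAPER_INDIC,
               SHAPER_MYANMAR, SHAPER_SEA } shaper_t;

/* Illustrative "basic shaping" feature tags (assumed; one list is used
 * for all Indic-like shapers here). */
static const char *const basic_features[] = {
    "nukt", "akhn", "rphf", "rkrf", "pref", "blwf",
    "abvf", "half", "pstf", "vatu", "cjct"
};

static bool is_indic_like(shaper_t s)
{
    return s == SHAPER_INDIC || s == SHAPER_MYANMAR || s == SHAPER_SEA;
}

/* Returns true if lookups for feature `tag` should handle joiners
 * automatically, false if the font's lookups get full control of ZWJ/ZWNJ. */
static bool feature_auto_joiners(shaper_t shaper, const char *tag)
{
    assert(tag != NULL);
    if (!is_indic_like(shaper))
        return true;   /* e.g. Arabic keeps automatic joiner handling */
    for (size_t i = 0; i < sizeof basic_features / sizeof basic_features[0]; i++)
        if (strcmp(tag, basic_features[i]) == 0)
            return false;  /* basic shaping feature: the font decides */
    return true;       /* presentation forms such as pres, abvs, blws, psts */
}
```

Keeping the decision per feature rather than per script is what lets the presentation forms of Indic-like scripts keep the new joiner behavior while the basic shaping features go back to letting the font interpret ZWJ/ZWNJ.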

Comment 11 Behdad Esfahbod 2013-03-19 09:38:22 UTC

Two things:

- I need to revert the ZWJ changes to the Indic shaper; too many fonts make assumptions about how joiners are handled, we can't skip them the way we do now.

- The root cause of the bug here is something else I believe:

The sequence:

U+0CB0,U+200D,U+0CCD,U+0C95,U+0CBF

needs to reorder the top-matra U+0CBF to "pre-sub" position.  Base is zero.  Correct reordering would be to position 0CBF *before* the 200D.  Right now we are positioning it after, and *that*'s the real bug.

Comment 12 Behdad Esfahbod 2013-04-05 03:51:06 UTC

FWIW, the revert is done.

Comment 13 Behdad Esfahbod 2013-04-05 03:54:14 UTC

To address this part:

"The sequence:

U+0CB0,U+200D,U+0CCD,U+0C95,U+0CBF

needs to reorder the top-matra U+0CBF to "pre-sub" position.  Base is zero.  Correct reordering would be to position 0CBF *before* the 200D.  Right now we are positioning it after, and *that*'s the real bug.
"

I'm thinking that perhaps we should attach post-base joiners and halants to what comes *after* them, not before. That would require thorough testing, though, so it will take some time before I'm confident enough to do that.
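The reordering discussed in comments 11 and 13 can be illustrated with a simplified C sketch. It only models where a pre-sub matra lands within one cluster; `presub_position`, `move_back` and the `attach_to_next` switch are hypothetical names, and HarfBuzz's actual Indic reordering assigns a position category to every character rather than scanning like this. With `attach_to_next` off, the ZWJ and virama after the base stay with the base and the matra ends up after them; with it on, they are treated as belonging to the following consonant and the matra lands right after the base, before the ZWJ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZWJ    0x200Du
#define ZWNJ   0x200Cu
#define VIRAMA 0x0CCDu  /* Kannada sign virama (halant) */

static int is_joiner_or_halant(uint32_t cp)
{
    return cp == ZWJ || cp == ZWNJ || cp == VIRAMA;
}

/* Index where a pre-sub matra should land in a cluster whose base is at `base`.
 * attach_to_next == 0: joiners/halants right after the base stay with the base,
 *   so the matra goes after them (the behavior reported as the bug).
 * attach_to_next != 0: they belong to the following consonant, so the matra
 *   goes directly after the base (the idea proposed in comment 13). */
static size_t presub_position(const uint32_t *cps, size_t len, size_t base,
                              int attach_to_next)
{
    assert(base < len);
    size_t pos = base + 1;
    if (!attach_to_next)
        while (pos < len && is_joiner_or_halant(cps[pos]))
            pos++;
    return pos;
}

/* Moves cps[from] to index `to` (to <= from), shifting the elements in
 * between one slot to the right. */
static void move_back(uint32_t *cps, size_t from, size_t to)
{
    assert(to <= from);
    uint32_t moved = cps[from];
    memmove(&cps[to + 1], &cps[to], (from - to) * sizeof cps[0]);
    cps[to] = moved;
}
```

For U+0CB0, U+200D, U+0CCD, U+0C95, U+0CBF with base 0, the first mode returns index 3 (after the ZWJ and virama) and the second returns index 1; moving the matra there gives U+0CB0, U+0CBF, U+200D, U+0CCD, U+0C95.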

Comment 14 Pravin 2013-05-31 10:51:40 UTC

Testing with master, I see the expected results. Did we fix this?

Comment 15 Behdad Esfahbod 2013-05-31 18:49:22 UTC

Humm.  Not that I know of!  I'll check.

Comment 16 Behdad Esfahbod 2013-10-19 21:36:16 UTC

Should be fixed.
