Bug 58922

Summary: Issue with mark advance zeroing in generic shaper
Product: HarfBuzz Reporter: Elie Roux <elie.roux>
Component: src Assignee: Behdad Esfahbod <freedesktop>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: chris.sherlock79, dr.khaled.hosny, elie.roux, freedesktop, jmadero.dev
Version: unspecified   
Hardware: Other   
OS: All   
URL: http://lists.freedesktop.org/archives/harfbuzz/2013-April/003101.html
Whiteboard:
Attachments: Font showing the bug
buggy result with LibreOffice
Good result (with Harfbuzz's hb-view)
docx document with the issue.

Description Elie Roux 2013-01-01 17:46:54 UTC
Created attachment 72363 [details]
Font showing the bug

In Writer, when I type the text གཚོའི་ཁིའུ་ using the attached font TestLig.ttf, I get a buggy result (attached in a later comment) with an enormous space between the two syllables. The correct result is the same but without the space; I'll attach it in a later comment as well.

This bug comes from a fairly rare use case in the layout engine: the font's ccmp feature (lookup 2) contains the ligature uni0F7C(ོ) uni0F60(འ) uni0F72(ི) -> uni0F7Cuni0F60uni0F72. The first component is a zero-width mark applying to a glyph (ཚ in my example) and the second is not (the third doesn't really matter here), so the result is a non-zero-width mark. Some OT layout engines also have difficulties with this case:
 - ConTeXt/LuaTeX had the same bug but I reported it and it's now fixed
 - on Debian stable, the layout engine behind gedit has it too
 - harfbuzz seems to handle it correctly.
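
To make the case above concrete, here is a minimal Python sketch that lists the code points of the reported text and locates the three ligature components in it. It uses only the standard `unicodedata` module; the helper names `describe` and `find_sequence` are made up for illustration and are not part of any shaping engine.

```python
import unicodedata

# Text from the report and the three input components of the ccmp ligature.
TEXT = "གཚོའི་ཁིའུ་"
LIGATURE = (0x0F7C, 0x0F60, 0x0F72)


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


def find_sequence(text, sequence):
    """Return the index where `sequence` starts in `text`, or -1 if absent."""
    codepoints = [ord(ch) for ch in text]
    n = len(sequence)
    for i in range(len(codepoints) - n + 1):
        if tuple(codepoints[i:i + n]) == tuple(sequence):
            return i
    return -1


if __name__ == "__main__":
    for row in describe(TEXT):
        print(*row)
    print("ligature input starts at index", find_sequence(TEXT, LIGATURE))
```

The general categories of the three components show the mix described above: a non-spacing mark, a letter, and another non-spacing mark.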

Please tell me if I can provide more information or if I should report it somewhere else (ICU?).
Comment 1 Elie Roux 2013-01-01 17:47:56 UTC
Created attachment 72364 [details]
buggy result with LibreOffice
Comment 2 Elie Roux 2013-01-01 17:49:39 UTC
Created attachment 72365 [details]
Good result (with Harfbuzz's hb-view)
Comment 3 Elie Roux 2013-04-07 12:15:05 UTC
The bug seems to have appeared in Harfbuzz (version from two days ago), and is still present with the latest LibreOffice version. There is another bug with the text ཨཱཿཀ , also present in Harfbuzz. I will report it to ICU too.
Comment 4 Behdad Esfahbod 2013-04-09 04:07:25 UTC
Oops.  Sorry for the noise.  I thought this was filed against HarfBuzz.  Reverting title back to what it was.
Comment 5 Joel Madero 2013-04-16 17:27:21 UTC
Can you give additional steps on how you are using this font? What operating system, what language is it? 

I might have to run this through Ibus but I want to make sure I know what language it is before I test it. Also, what key combination do I put in to get the same result that you show in your two images?

Marking as NEEDINFO, once you provide the information mark as UNCONFIRMED and we'll investigate.


Thanks!
Comment 6 Elie Roux 2013-04-18 07:14:06 UTC
Hello,

I'm using the font under Linux, and the language is Tibetan. You cannot type the Unicode characters I have pasted unless you have m17n-bo-ewts (under Linux) or TISE (under Windows); the best is to copy/paste them (གཚོའི་ཁིའུ་ཨཱཿཀ).

If you can input Tibetan Unicode with EWTS transliteration, you can write "mtsho'i khiu aH ka" (without the quotes).

I am not sure I understood your last question... if I want to reproduce it, I just copy/paste the Unicode text into LibreOffice and select the font... Otherwise with hb-view:

./hb-view --output-file=foo.png --output-format=png TestLig.ttf གཚོའི་ཁིའུ་ཨཱཿཀ

What else do you need to know?

Thank you,
--
Elie
Comment 7 Joel Madero 2013-04-18 13:59:44 UTC
Should be enough. In the future, please make sure to read everything carefully: once you provide the info requested by QA staff, the bug needs to go back to UNCONFIRMED status for us to know it needs to be triaged :)


Thanks so much for the clear explanation, going to attempt to reproduce after installing those packages
Comment 8 Elie Roux 2013-04-18 14:06:32 UTC
Oh, sorry for the workflow... if you are under Linux, m17n can be a little tricky to get working... Some clues here: http://www.digitaltibetan.org/index.php/Tibetan_Input_Method_for_Linux_%28Gnome%29 (I used uim). Also, their EWTS parser is quite buggy... you shouldn't encounter bugs with the string I've given you though...

Also, hb-view is only available when you compile harfbuzz, in the utils/ directory.

Thank you,
Comment 9 Chris Sherlock 2013-04-30 17:53:07 UTC
I'm installing TISE to see if I can reproduce this.
Comment 10 Chris Sherlock 2013-04-30 17:58:43 UTC
I can reproduce this after installing TISE on Windows 7, using LibreOffice 4.0.2.2.
Comment 11 Chris Sherlock 2013-04-30 18:03:38 UTC
OK, this is odd. When I open the attached document in Word 2010, I am getting exactly the same thing.
Comment 12 Chris Sherlock 2013-04-30 18:04:59 UTC
Created attachment 78660 [details]
docx document with the issue.
Comment 13 Elie Roux 2013-05-01 10:48:46 UTC
If you want some details: http://lists.freedesktop.org/archives/harfbuzz/2013-April/003101.html 

If I understand correctly there are two bugs: one with incorrect spacing, and one with whether the ligature at the beginning is formed or not...

Thank you,
-- 
Elie
Comment 14 Caolán McNamara 2013-05-14 19:22:39 UTC
caolanm->khaled: seeing as the original report suggests that "harfbuzz does it right", and we're using harfbuzz by default for 4.1, is this now fixed ?
Comment 15 Behdad Esfahbod 2013-05-14 19:25:58 UTC
(In reply to comment #14)
> caolanm->khaled: seeing as the original report suggests that "harfbuzz does
> it right", and we're using harfbuzz by default for 4.1, is this now fixed ?

Part of this issue still remains, and I have it on my radar.  I believe you can close this, or reassign it to HarfBuzz, where someone can find the correct discussion thread on the HarfBuzz list and add it here.
Comment 16 Behdad Esfahbod 2013-05-15 18:43:42 UTC
http://lists.freedesktop.org/archives/harfbuzz/2013-April/003101.html

At the Ngapi hackfest in February we changed the generic shaper to zero
advance of glyphs that are from characters that are non-spacing marks in
Unicode.  Before that, the decision to zero mark advance was based on the GDEF
class of the glyph.  Testing shows that Uniscribe uses one or the other
behavior in different shapers and now we try to do the same.

It was brought to my attention however, that if a mark glyph ligates with
following non-mark glyphs, HarfBuzz zeros the advance while Uniscribe doesn't.
This makes sense.  We already have logic in GSUB for synthetic GDEF, to
categorize the ligature as mark if all components are mark.  I think we want
the same logic to be used for the Unicode general category.  I.e., if a non-mark
is present, change the category of the result to non-mark.
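
The rule proposed above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not HarfBuzz's actual code: it treats "mark" as the Unicode general category Mn (non-spacing mark) looked up with the standard `unicodedata` module, and the function names are hypothetical.

```python
import unicodedata


def is_nonspacing_mark(codepoint):
    """True if the character's Unicode general category is Mn."""
    return unicodedata.category(chr(codepoint)) == "Mn"


def ligature_is_mark(components):
    """A ligature keeps the mark category only if every component is a mark."""
    return bool(components) and all(is_nonspacing_mark(cp) for cp in components)


def ligature_advance(components, font_advance):
    """Zero the advance only for all-mark ligatures (hypothetical helper)."""
    return 0 if ligature_is_mark(components) else font_advance


if __name__ == "__main__":
    # U+0F60 is a letter, so the ligature keeps the font's advance.
    print(ligature_advance([0x0F7C, 0x0F60, 0x0F72], 1782))
    # Two vowel signs only: the result is still a mark and is zeroed.
    print(ligature_advance([0x0F7C, 0x0F72], 1782))
```

With this rule, a single non-mark component is enough to stop the advance of the ligature from being zeroed.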

Here's a font for testing: https://bugs.freedesktop.org/attachment.cgi?id=72363

Here's the sequence: U+0F42,U+0F7C,U+0F60,U+0F72,U+0F42

Uniscribe output:
[uni0F42=0+356|uni0F7C0F600F72=2+1782|uni0F42=4+356]

HB:
[uni0F42=0+356|uni0F7C0F600F72=0+0|uni0F42=4+356]

Note that the ligature in question has GDEF class 3 (mark), so it's definitely
the Unicode-based logic that is involved here, not GDEF-based.
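
For comparing such outputs mechanically, here is a small Python sketch that parses the two lines quoted above and reports the glyphs that differ. It assumes the simple `glyphname=cluster+advance` form shown here (no position offsets); `parse_glyphs` and `diff_glyphs` are illustrative names, not an existing tool.

```python
import re

# One glyph entry: glyphname=cluster+advance, as in the outputs quoted above.
GLYPH_RE = re.compile(r"([^=|\[\]]+)=(\d+)\+(-?\d+)")

UNISCRIBE = "[uni0F42=0+356|uni0F7C0F600F72=2+1782|uni0F42=4+356]"
HARFBUZZ = "[uni0F42=0+356|uni0F7C0F600F72=0+0|uni0F42=4+356]"


def parse_glyphs(line):
    """Parse a shaper output line into (glyph name, cluster, advance) tuples."""
    return [
        (name, int(cluster), int(advance))
        for name, cluster, advance in GLYPH_RE.findall(line)
    ]


def diff_glyphs(expected, actual):
    """Return (index, expected glyph, actual glyph) for each mismatch.

    Only the common prefix is compared if the two lines differ in length.
    """
    pairs = zip(parse_glyphs(expected), parse_glyphs(actual))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    for index, want, got in diff_glyphs(UNISCRIBE, HARFBUZZ):
        print(f"glyph {index}: Uniscribe {want} vs HB {got}")
```

On the two lines above, only the ligature glyph differs, in both its cluster value and its advance.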

I'll go ahead and add the logic to do this, though it's getting a bit uglier
than I like.
Comment 17 Behdad Esfahbod 2013-05-27 18:53:32 UTC
commit 7e08f1258da229dfaf7e1c4b5c41e5bb83906cb0
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Mon May 27 14:48:34 2013 -0400

    Don't zero advance of mark-non-mark ligatures
    
    If there's a mark ligating forward with non-mark, they were
    inheriting the GC of the mark and later get advance-zeroed.
    Don't do that if there's any non-mark glyph in the ligature.
    
    Sample test: U+1780,U+17D2,U+179F with Kh-Metal-Chrieng.ttf
    
    Also:
    Bug 58922 - Issue with mark advance zeroing in generic shaper
