|Summary:||Equivalent Unicode sequences rendered differently with DejaVu|
|Product:||DejaVu||Reporter:||Roozbeh Pournader <roozbeh>|
|Component:||General||Assignee:||Deja Vu bugs <dejavu-bugs>|
|Status:||REOPENED ---||QA Contact:|
|Bug Depends on:|
patch to fix the U+0387 = U+00B7 part
Patch to handle the prime and double prime case
patch to make U+0343 and U+0313 equal
patch to remove U+2329/232A and use or add U+27E8/U+27E9
Description Roozbeh Pournader 2006-11-15 12:08:23 UTC
Some canonically equivalent Unicode sequences are rendered differently when using some (all?) DejaVu fonts. This makes the fonts nonconformant with the Unicode standard, as the standard requires all canonically equivalent sequences to be rendered the same way.

A short list of some of the cases I discovered follows. I didn't do a thorough analysis:

U+0374 = U+02B9, but the glyphs are different: ʹʹ
U+0387 = U+00B7, but the glyphs are different: ··

I think that for each such case, DejaVu should select the more appropriate of the two shapes (or one in the middle) and then create two CMAP entries mapping the two characters to the same glyph (or make one use glyph references to the other). If the characters really need to appear differently in Greek contexts, DejaVu should add the Greek alternates under names such as "periodcentered.greek" and use the appropriate OpenType features to handle this.

Of course, it may be said that the rendering engines should take care of Unicode equivalence, instead of passing the character directly to the font. Even if that is the desired behavior, a rendering engine that normalizes strings to, say, NFC before passing them to the font layout subsystem will never display the Greek forms of the glyphs (U+0374 and U+0387) and will always use the U+00B7 and U+02B9 forms.
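The equivalences listed above can be checked mechanically. What follows is a minimal sketch, assuming only Python's standard `unicodedata` module. The `cmap` dictionary and its glyph names are hypothetical stand-ins for a font's character map and are not taken from DejaVu; the helper names `canonical_target` and `mismatched_pairs` are likewise made up for illustration.

```python
import unicodedata

def canonical_target(cp):
    """Return the NFC form of a code point as a tuple of code points."""
    return tuple(ord(c) for c in unicodedata.normalize("NFC", chr(cp)))

def mismatched_pairs(cmap):
    """Find code points whose NFC form is a different single code point
    that the cmap maps to a different glyph (or does not map at all)."""
    problems = []
    for cp, glyph in sorted(cmap.items()):
        target = canonical_target(cp)
        if len(target) == 1 and target[0] != cp:
            other = cmap.get(target[0])
            if other != glyph:
                problems.append((cp, target[0], glyph, other))
    return problems

# Hypothetical cmap (assumption): the Greek number sign has its own glyph,
# while the ano teleia already shares a glyph with the middle dot.
cmap = {
    0x0374: "uni0374",
    0x02B9: "uni02B9",
    0x0387: "periodcentered",
    0x00B7: "periodcentered",
}

for cp, target, glyph, other in mismatched_pairs(cmap):
    print(f"U+{cp:04X} -> {glyph}, but U+{target:04X} -> {other}")
```

With this made-up cmap only the U+0374 / U+02B9 pair is reported, since the other pair already shares one glyph. Comparing glyph names only detects separate cmap targets; it cannot tell whether two differently named glyphs happen to have identical outlines.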
Comment 1 Denis Jacquerye 2006-11-15 17:21:35 UTC
You mean U+0384, instead of U+0387, being decomposable to U+00B7.
Comment 2 Denis Jacquerye 2006-11-15 17:28:27 UTC
(In reply to comment #1)
> You mean U+0384, instead of U+0387, being decomposable to U+00B7.

Sorry, got mixed up, you're correct.
Comment 3 Ben Laenen 2006-11-15 17:47:01 UTC
About dotcentered: in Sans and Mono the anoteleia are references to dotcentered and should therefore look the same. In Serif that's apparently not the case (should be corrected), but the glyph shapes are the same. So I don't know why you say they look different.

About the Greek number sign and prime: I don't want to have a number sign that looks like the current prime, so I won't touch that. I don't know if prime could be changed.

The option with local glyph variants you're suggesting seems to me a bit "ahead of technology", as I don't see locl support in Pango and Qt very soon. Therefore I'd prefer to leave it that way until we're sure that a user who types the Greek number sign gets the glyph he is expecting.
Comment 4 Roozbeh Pournader 2006-11-15 18:37:44 UTC
(In reply to comment #3)
> but the glyph shapes are the same. So I don't
> know why you say they look different.

I just checked, and it seems that they are really the same in the font. My confusion must have come from a hinting/anti-aliasing issue, it seems.

> About the Greek number sign and prime: I don't want to have a number sign that
> looks like the current prime, so I won't touch that. I don't know if prime
> could be changed.

Looking at the Unicode charts, it seems that prime should be changed. It is not upright at all in the charts.

Should I attach a patch?
Comment 5 Roozbeh Pournader 2006-11-15 18:44:32 UTC
(In reply to comment #3)
> a user that types the Greek number sign gets the glyph he is expecting.

BTW, it seems that the Unicode Technical Committee is planning to deprecate U+0374, U+0387, and a bunch of other characters (U+0344, U+2126, ...) in 5.1. So the only way to make a user get the proper rendering in the future is by making sure U+02B9, U+00B7 etc. work properly for Greek.
Comment 6 Ben Laenen 2006-11-16 03:51:22 UTC
(In reply to comment #4)
> I just checked, and it seems that they are really the same in the font. My
> confusing should have come from a hinting/anti-aliasing issue, it seems.

The Serif anoteleia isn't hinted, while Serif dotcentered is, so that could cause the different look. It could also be that you have the autohinter enabled and the autohinter is making different choices for the two glyphs (it wouldn't be the first time that references look different from the original glyph with the autohinter).

> Looking at the Unicode charts, it seems that prime should be changed. It is
> not upright at all in the charts.
>
> Should I attach a patch?

Be my guest :-) The easiest way would be to make prime a reference to the Greek number sign. Don't forget to adjust double prime as well if you make a patch.
Comment 7 Roozbeh Pournader 2006-11-19 09:05:18 UTC
Created attachment 7826
patch to fix the U+0387 = U+00B7 part
Comment 8 Roozbeh Pournader 2006-11-19 11:56:58 UTC
Created attachment 7832
Patch to handle the prime and double prime case
Comment 9 Roozbeh Pournader 2006-11-19 13:42:55 UTC
More equivalent characters that must look alike but don't:

U+0343 COMBINING GREEK KORONIS and U+0313 COMBINING COMMA ABOVE
* In Sans. U+0343 doesn't exist in Mono and Serif.

U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA
* This pair is pretty weird, as the two are also different in the Unicode charts. Investigating.

U+3008 LEFT ANGLE BRACKET and U+2329 LEFT-POINTING ANGLE BRACKET
U+3009 RIGHT ANGLE BRACKET and U+232A RIGHT-POINTING ANGLE BRACKET
* U+3008 and U+3009 don't exist in the fonts.
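The pairs in this comment can be confirmed against the Unicode Character Database. This is a small sketch using Python's standard `unicodedata` module; the `PAIRS` list and the helper `is_singleton_of` are illustrative names, not part of any DejaVu tooling. A canonical singleton decomposition shows up as an untagged, single code point in the decomposition field.

```python
import unicodedata

# (character, canonical equivalent) pairs named in the comment above.
PAIRS = [
    (0x0343, 0x0313),  # COMBINING GREEK KORONIS -> COMBINING COMMA ABOVE
    (0x1FBE, 0x03B9),  # GREEK PROSGEGRAMMENI -> GREEK SMALL LETTER IOTA
    (0x2329, 0x3008),  # LEFT-POINTING ANGLE BRACKET -> LEFT ANGLE BRACKET
    (0x232A, 0x3009),  # RIGHT-POINTING ANGLE BRACKET -> RIGHT ANGLE BRACKET
]

def is_singleton_of(cp, target):
    """True if cp decomposes canonically (no <tag>) to exactly target."""
    return unicodedata.decomposition(chr(cp)) == f"{target:04X}"

results = {cp: is_singleton_of(cp, target) for cp, target in PAIRS}

for cp, target in PAIRS:
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {name} -> U+{target:04X}: {results[cp]}")
```

Every pair should report `True`, which is why NFC and NFD both replace the first character of each pair with the second.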
Comment 10 Ben Laenen 2006-11-19 14:05:54 UTC
(In reply to comment #9)
> U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA
> * This pair are pretty weird as they are also different in Unicode charts.
> Investigating.

That, and the fact that I chose to have an iota ypogegrammeni (written below the capital letter instead of after it). I wrote an extensive list of arguments about that to the mailing list when I did that; see http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html for the best info available on the net about the iota ypo-/prosgegrammeni. Let's just say Unicode messed it up a little :-)

But really, no one should type U+1FBE on its own; they should use the proper capital vowel with it, so I wouldn't mind changing that to a normal lowercase iota.
Comment 11 Roozbeh Pournader 2006-11-19 17:44:53 UTC
Created attachment 7834
patch to make U+0343 and U+0313 equal

This is according to discussions with Ben on IRC. The glyph for U+0343 is added to Serif and Mono, while for Sans the two are made references. The previous status of Sans with these was weird: Regular had it refer to another glyph, Bold and Oblique had outlines, but BoldOblique was fine!
Comment 12 Roozbeh Pournader 2006-11-19 18:20:36 UTC
Created attachment 7835
patch to remove U+2329/232A and use or add U+27E8/U+27E9

This patch removes U+2329 and U+232A because of their CJK properties and their being deprecated for math use, and moves the outlines to U+27E8 and U+27E9 instead, which are the recommended characters for math. In the Sans fonts, the characters U+27EA and U+27EB had references to the CJK chars, which were changed to the math chars.
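One way to see the "CJK properties" mentioned here is to look at the East Asian Width property and the NFC form of each bracket. This is a sketch with Python's standard `unicodedata` module; reading "CJK properties" as these two properties is an assumption on my part, not something stated in the patch.

```python
import unicodedata

# Old technical brackets versus the mathematical angle brackets.
BRACKETS = (0x2329, 0x232A, 0x27E8, 0x27E9)

report = {}
for cp in BRACKETS:
    ch = chr(cp)
    width = unicodedata.east_asian_width(ch)        # e.g. "W" means wide
    nfc = ord(unicodedata.normalize("NFC", ch))     # code point after NFC
    report[cp] = (width, nfc)
    print(f"U+{cp:04X} width={width} NFC=U+{nfc:04X}")
```

U+2329 and U+232A are wide and normalize to the CJK brackets U+3008 and U+3009, while U+27E8 and U+27E9 are left unchanged by normalization.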
Comment 13 Roozbeh Pournader 2006-11-21 06:25:22 UTC
All four patches are committed. The only remaining case is U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA, for which I am waiting for answers from UTC.
Comment 14 Roozbeh Pournader 2006-12-31 03:37:40 UTC
(In reply to comment #13)
> All the four patches committed. The only remaining case is U+1FBE GREEK
> PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA, for which I am waiting for
> answers from UTC.

I have not received any answer from UTC, and it seems that I won't unless we make a formal proposal explaining the whole situation, which I am unwilling to do.

Still, from a conformance clause in Unicode 5.0:

"C6 A process shall not assume that the interpretations of two canonical-equivalent character sequences are distinct." (page 71)

I am taking that (and the comments that come after it) to mean that we MUST treat these two the same. As Ben is fine with changing the U+1FBE glyph, I'll go and do that anyway.
Comment 15 Benjamin Close 2008-01-11 02:36:35 UTC
Bugzilla Upgrade Mass Bug Change

NEEDSINFO state was removed in Bugzilla 3.x, reopening any bugs previously listed as NEEDSINFO.

- benjsc
fd.o Wrangler