Bug 9038

Summary:	Equivalent Unicode sequences rendered differently with DejaVu
Product:	DejaVu	Reporter:	Roozbeh Pournader <roozbeh>
Component:	General	Assignee:	Deja Vu bugs <dejavu-bugs>
Status:	REOPENED ---	QA Contact:
Severity:	normal
Priority:	high	CC:	samjnaa
Version:	unspecified
Hardware:	x86 (IA32)
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	21142
Attachments:	patch to fix the U+0387 = U+00B7 part; Patch to handle the prime and double prime case; patch to make U+0343 and U+0313 equal; patch to remove U+2329/232A and use or add U+27E8/U+27E9

Description Roozbeh Pournader 2006-11-15 12:08:23 UTC

Some canonically equivalent Unicode sequences are rendered differently when
using some (all?) DejaVu fonts. This makes the fonts nonconformant with the
Unicode standard, which requires all canonically equivalent sequences
to be rendered the same way.

A short list of some of the cases I discovered follows. I didn't do a thorough
analysis:

U+0374 = U+02B9, but the glyphs are different: ʹʹ
U+0387 = U+00B7, but the glyphs are different: ··

I think that for each such case, DejaVu should select the most appropriate
of the two shapes (or one in the middle) and then create two CMAP entries mapping
the two characters to the same glyph (or make one use glyph references to the
other). If the characters really need to appear differently in Greek
contexts, DejaVu should add the Greek alternates under names such as
"periodcentered.greek" and use the appropriate OpenType features to handle this.

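A minimal sketch of how the "two CMAP entries, one glyph" condition could be checked. It assumes the character map is available as a plain dict from code point to glyph name (for instance, what fontTools' `TTFont.getBestCmap()` returns); the function name `mismatched_pairs` and the sample glyph names are hypothetical. Comparing glyph names is only a first pass: a glyph that is a pure reference to the other has a different name but the same outline.

```python
# Pairs taken from this report: (Greek-specific character, canonical equivalent).
EQUIVALENT_PAIRS = [(0x0374, 0x02B9), (0x0387, 0x00B7)]


def mismatched_pairs(cmap, pairs=EQUIVALENT_PAIRS):
    """Return the pairs whose two code points do not share one glyph name.

    `cmap` maps code point -> glyph name. A pair is skipped when either
    character is missing from the font, since nothing can be compared.
    """
    bad = []
    for a, b in pairs:
        if a in cmap and b in cmap and cmap[a] != cmap[b]:
            bad.append((a, b, cmap[a], cmap[b]))
    return bad


# Hypothetical cmap excerpt; the glyph names are made up for illustration.
sample_cmap = {
    0x00B7: "periodcentered",
    0x0387: "periodcentered",  # shares the glyph: not reported
    0x02B9: "uni02B9",
    0x0374: "uni0374",         # separate glyph: reported
}

for a, b, glyph_a, glyph_b in mismatched_pairs(sample_cmap):
    print(f"U+{a:04X} -> {glyph_a}, U+{b:04X} -> {glyph_b}")
```
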
Of course, it may be said that the rendering engines should take care of Unicode
equivalence, instead of passing the character directly to the font. Even if that
is the desired behavior, a rendering engine that normalizes strings to, say, NFC
before passing them to the font layout subsystem will never display the Greek
forms of the glyphs (U+0374 and U+0387) and will always use the U+00B7 and U+02B9 forms.
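
The effect of normalization on these two characters can be reproduced with Python's standard `unicodedata` module. This only illustrates what an NFC-normalizing engine would hand to the font; it is not a model of any particular rendering engine.

```python
import unicodedata

# Greek-specific characters named in this report.
greek_forms = ["\u0374", "\u0387"]

for ch in greek_forms:
    nfc = unicodedata.normalize("NFC", ch)
    # Both have singleton canonical decompositions, so NFC replaces them
    # with the generic character and never composes them back.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
        f"U+{ord(nfc):04X} {unicodedata.name(nfc)}"
    )
```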

Comment 1 Denis Jacquerye 2006-11-15 17:21:35 UTC

You mean U+0384, instead of U+0387, being decomposable to U+00B7.

Comment 2 Denis Jacquerye 2006-11-15 17:28:27 UTC

(In reply to comment #1)
> You mean U+0384, instead of U+0387, being decomposable to U+00B7.
> 
Sorry, got mixed up, you're correct.

Comment 3 Ben Laenen 2006-11-15 17:47:01 UTC

About dotcentered: in Sans and Mono the anoteleia are references to 
dotcentered and should therefore look the same. In Serif that's apparently not 
the case (should be corrected), but the glyph shapes are the same. So I don't 
know why you say they look different.

About the Greek number sign and prime: I don't want to have a number sign that 
looks like the current prime, so I won't touch that. I don't know if prime 
could be changed. The option with local glyph variants you're suggesting seems 
to me a bit "ahead of technology", as I don't see locl support in Pango and Qt 
very soon. Therefore I'd prefer to leave it that way until we're sure that a 
user that types the Greek number sign gets the glyph he is expecting.

Comment 4 Roozbeh Pournader 2006-11-15 18:37:44 UTC

(In reply to comment #3)
> but the glyph shapes are the same. So I don't 
> know why you say they look different.

I just checked, and it seems that they are really the same in the font. My
confusion must have come from a hinting/anti-aliasing issue, it seems.

> About the Greek number sign and prime: I don't want to have a number sign that 
> looks like the current prime, so I won't touch that. I don't know if prime 
> could be changed.

Looking at the Unicode charts, it seems that prime should be changed. It is not
upright at all in the charts.

Should I attach a patch?

Comment 5 Roozbeh Pournader 2006-11-15 18:44:32 UTC

(In reply to comment #3)
> a user that types the Greek number sign gets the glyph he is expecting.

BTW, it seems that the Unicode Technical Committee is planning to deprecate
U+0374, U+0387, and a bunch of other characters (U+0344, U+2126, ...) in 5.1. So
the only way to make a user get the proper rendering in the future is by making
sure U+02B9, U+00B7 etc. work properly for Greek.

Comment 6 Ben Laenen 2006-11-16 03:51:22 UTC

(In reply to comment #4)
> I just checked, and it seems that they are really the same in the font. My
> confusion must have come from a hinting/anti-aliasing issue, it seems.

The Serif anoteleia isn't hinted, while Serif dotcentered is, so that could 
cause the different look. It could also be that you have the autohinter 
enabled and the autohinter is making different choices for the two glyphs 
(wouldn't be the first time that references look different from the original 
glyph with the autohinter).

> Looking at the Unicode charts, it seems that prime should be changed. It is
> not upright at all in the charts.
> 
> Should I attach a patch?

Be my guest :-) Easiest way would be to make prime a reference to the Greek 
number sign. Don't forget to adjust double prime as well if you make a patch.

Comment 7 Roozbeh Pournader 2006-11-19 09:05:18 UTC

Created attachment 7826
patch to fix the U+0387 = U+00B7 part

Comment 8 Roozbeh Pournader 2006-11-19 11:56:58 UTC

Created attachment 7832
Patch to handle the prime and double prime case

Comment 9 Roozbeh Pournader 2006-11-19 13:42:55 UTC

More equivalent pairs that must look alike but don't:

U+0343 COMBINING GREEK KORONIS and U+0313 COMBINING COMMA ABOVE
* In Sans. U+0343 doesn't exist in Mono and Serif.

U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA
* This pair is pretty weird, as the two are also different in the Unicode charts.
Investigating.

U+3008 LEFT ANGLE BRACKET and U+2329 LEFT-POINTING ANGLE BRACKET
U+3009 RIGHT ANGLE BRACKET and U+232A RIGHT-POINTING ANGLE BRACKET
* U+3008 and U+3009 don't exist in the fonts.
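
Since the pairs above were found by hand, a scan of the Unicode character data may help find the rest. The sketch below uses Python's `unicodedata` (whose data version depends on the Python build) to list characters whose canonical decomposition is a single other character. The function name `canonical_singletons` and the default range are made up for this example.

```python
import unicodedata


def canonical_singletons(start=0x0000, stop=0x3100):
    """Map each code point in [start, stop) that has a singleton canonical
    decomposition to the code point it decomposes to."""
    found = {}
    for cp in range(start, stop):
        fields = unicodedata.decomposition(chr(cp)).split()
        # A compatibility mapping carries a leading tag such as "<compat>",
        # so it always has at least two fields. A single field is therefore
        # an untagged (canonical) mapping to exactly one character.
        if len(fields) == 1:
            found[cp] = int(fields[0], 16)
    return found


singletons = canonical_singletons()
for cp in (0x0343, 0x1FBE, 0x2329, 0x232A):
    print(f"U+{cp:04X} = U+{singletons[cp]:04X}")
```

Each pair reported this way could then be compared in the font, either by glyph name or by outline.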

Comment 10 Ben Laenen 2006-11-19 14:05:54 UTC

(In reply to comment #9)
> U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA
> * This pair is pretty weird, as the two are also different in the Unicode charts.
> Investigating.

That, and the fact that I chose to have an iota ypogegrammeni (written below 
the capital letter instead of after it). I wrote an extensive list of arguments 
about that to the mailing list when I did that; see 
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html for the best info 
available on the net about the iota ypo-/prosgegrammeni. Let's just say 
Unicode messed it up a little :-). But really, no one should type U+1FBE, but 
rather use the proper capital vowel with it, so I wouldn't mind changing that to a 
normal lowercase iota.

Comment 11 Roozbeh Pournader 2006-11-19 17:44:53 UTC

Created attachment 7834
patch to make U+0343 and U+0313 equal

This is according to discussions with Ben on IRC. The glyph for U+0343 is added
to Serif and Mono, while for Sans they are made references. The previous state
of Sans for these was weird: Regular had it refer to another glyph, Bold and
Oblique had outlines, but BoldOblique was fine!

Comment 12 Roozbeh Pournader 2006-11-19 18:20:36 UTC

Created attachment 7835
patch to remove U+2329/232A and use or add U+27E8/U+27E9

This patch removes U+2329 and U+232A because of their CJK properties and their
being deprecated for math use, and moves the outlines to U+27E8 and U+27E9
instead, which are the recommended characters to be used for math. In Sans fonts,
the characters U+27EA and U+27EB had references to the CJK chars, which were
changed to the math chars.

Comment 13 Roozbeh Pournader 2006-11-21 06:25:22 UTC

All four patches are committed. The only remaining case is U+1FBE GREEK
PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA, for which I am waiting for
answers from UTC.

Comment 14 Roozbeh Pournader 2006-12-31 03:37:40 UTC

(In reply to comment #13)
> All four patches are committed. The only remaining case is U+1FBE GREEK
> PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA, for which I am waiting for
> answers from UTC.

I have not received any answer from UTC, and it seems that I won't unless we submit a
formal proposal explaining the whole situation, which I am unwilling to do.

Still, from a conformance clause from Unicode 5.0:
"C6  A process shall not assume that the interpretations of two
canonical-equivalent character sequences are distinct." (page 71)

I am taking that (and the comments that come after it) to mean that we MUST
treat these two the same. As Ben is fine with changing the U+1FBE glyph, I'll go and
do that anyway.

Comment 15 Benjamin Close 2008-01-11 02:36:35 UTC

Bugzilla Upgrade Mass Bug Change

NEEDSINFO state was removed in Bugzilla 3.x, reopening any bugs previously listed as NEEDSINFO.

  - benjsc
    fd.o Wrangler
