Some canonically equivalent Unicode sequences are rendered differently when
using some (all?) DejaVu fonts. This makes the fonts non-conforming to the
Unicode standard, as the standard requires all canonically equivalent sequences
to be rendered the same way.
A short list of some of the cases I discovered follows. I didn't do a thorough check.
U+0374 = U+02B9, but the glyphs are different: ʹʹ
U+0387 = U+00B7, but the glyphs are different: ··
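The two equivalences listed above can be confirmed with Python's standard `unicodedata` module. This is only a minimal sketch; the helper name `canonically_equivalent` is made up for illustration and is not part of any library.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent when their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two pairs reported above: Greek numeral sign / modifier letter prime,
# and Greek ano teleia / middle dot.
pairs = [("\u0374", "\u02b9"), ("\u0387", "\u00b7")]
results = [canonically_equivalent(a, b) for a, b in pairs]
```

Both entries of `results` come out true, which is what makes the differing glyphs a conformance problem rather than a design choice.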
I think that for each such case, DejaVu should select the most appropriate
of the two shapes (or one in between) and then map both characters to the same
glyph in the cmap table (or make one glyph a reference to the other). If the
characters really need to appear differently in Greek contexts, DejaVu should
add the Greek alternates under names such as "periodcentered.greek" and use the
appropriate OpenType features to handle this.
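A check for the "same glyph in the cmap" suggestion could look roughly like the sketch below. It works on a plain code point to glyph name dictionary; the glyph names in `example_cmap` are invented for illustration and are not taken from the DejaVu sources. With fontTools, a real mapping of this shape could be obtained from `TTFont(path).getBestCmap()`.

```python
from typing import Dict, Iterable, List, Tuple


def mismatched_pairs(cmap: Dict[int, str],
                     pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the pairs whose two code points map to different glyph names.

    Pairs where either code point is missing from the cmap are skipped.
    """
    bad = []
    for a, b in pairs:
        if a in cmap and b in cmap and cmap[a] != cmap[b]:
            bad.append((a, b))
    return bad


# Hypothetical cmap excerpt (glyph names are assumptions, not DejaVu's).
example_cmap = {
    0x00B7: "periodcentered",
    0x0387: "periodcentered",
    0x02B9: "uni02B9",
    0x0374: "uni0374",
}
bad = mismatched_pairs(example_cmap, [(0x0374, 0x02B9), (0x0387, 0x00B7)])
```

Note that this only compares glyph names. If one glyph is a reference to the other, as in the alternative suggested above, the names differ even though the outlines are shared, so such pairs would need a separate check.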
Of course, it may be said that rendering engines should take care of Unicode
equivalence instead of passing the characters directly to the font. Even if that
is the desired behavior, a rendering engine that normalizes strings to, say, NFC
before passing them to the font layout subsystem will never display the Greek
forms of the glyphs (U+0374 and U+0387) and will always use the U+02B9 and U+00B7 forms.
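The effect of normalizing before glyph lookup can be illustrated with a toy model of such an engine. This is a sketch under stated assumptions: `glyphs_after_nfc` is a hypothetical helper, the cmap is a plain dictionary, and the glyph names are invented; no real layout engine is being described.

```python
import unicodedata


def glyphs_after_nfc(text, cmap):
    """Normalize to NFC first, then look each character up in the cmap."""
    normalized = unicodedata.normalize("NFC", text)
    return [cmap.get(ord(ch), ".notdef") for ch in normalized]


# Hypothetical font with distinct glyphs for the Greek code points.
cmap = {
    0x02B9: "uni02B9",
    0x0374: "uni0374",
    0x00B7: "periodcentered",
    0x0387: "anoteleia",
}

# The input uses the Greek code points only.
glyphs = glyphs_after_nfc("\u0374\u0387", cmap)
```

Even though the input contains only U+0374 and U+0387, the lookup selects the glyphs mapped from U+02B9 and U+00B7, because NFC replaces both Greek code points before the font is consulted.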
You mean U+0384, instead of U+0387, being decomposable to U+00B7.
(In reply to comment #1)
> You mean U+0384, instead of U+0387, being decomposable to U+00B7.
Sorry, got mixed up, you're correct.
About dotcentered: in Sans and Mono the anoteleia is a reference to
dotcentered and should therefore look the same. In Serif that's apparently not
the case (should be corrected), but the glyph shapes are the same. So I don't
know why you say they look different.
About the Greek number sign and prime: I don't want to have a number sign that
looks like the current prime, so I won't touch that. I don't know if prime
could be changed. The option with local glyph variants you're suggesting seems
to me a bit "ahead of technology", as I don't see locl support in Pango and Qt
very soon. Therefore I'd prefer to leave it that way until we're sure that a
user that types the Greek number sign gets the glyph he is expecting.
(In reply to comment #3)
> but the glyph shapes are the same. So I don't
> know why you say they look different.
I just checked, and it seems that they are really the same in the font. My
confusion must have come from a hinting/anti-aliasing issue, it seems.
> About the Greek number sign and prime: I don't want to have a number sign that
> looks like the current prime, so I won't touch that. I don't know if prime
> could be changed.
Looking at the Unicode charts, it seems that prime should be changed. It is not
upright at all in the charts.
Should I attach a patch?
(In reply to comment #3)
> a user that types the Greek number sign gets the glyph he is expecting.
BTW, it seems that the Unicode Technical Committee is planning to deprecate
U+0374, U+0387, and a bunch of other characters (U+0344, U+2126, ...) in 5.1. So
the only way to make a user get the proper rendering in the future is by making
sure U+02B9, U+00B7 etc. work properly for Greek.
(In reply to comment #4)
> I just checked, and it seems that they are really the same in the font. My
> confusion must have come from a hinting/anti-aliasing issue, it seems.
The Serif anoteleia isn't hinted, while Serif dotcentered is, so that could
cause the different look. It could also be that you have the autohinter
enabled and the autohinter is making different choices for the two glyphs
(wouldn't be the first time that references look different from the original
glyph with the autohinter).
> Looking at the Unicode charts, it seems that prime should be changed. It is
> not upright at all in the charts.
> Should I attach a patch?
Be my guest :-) Easiest way would be to make prime a reference to the Greek
number sign. Don't forget to adjust double prime as well if you make a patch.
Created attachment 7826
patch to fix the U+0387 = U+00B7 part
Created attachment 7832
Patch to handle the prime and double prime case
More equivalent pairs that must look alike but don't:
U+0343 COMBINING GREEK KORONIS and U+0313 COMBINING COMMA ABOVE
* In Sans. U+0343 doesn't exist in Mono and Serif.
U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA
* This pair is pretty weird, as the two also differ in the Unicode charts.
U+3008 LEFT ANGLE BRACKET and U+2329 LEFT-POINTING ANGLE BRACKET
U+3009 RIGHT ANGLE BRACKET and U+232A RIGHT-POINTING ANGLE BRACKET
* U+3008 and U+3009 don't exist in the fonts.
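For a more thorough list, characters like these can be enumerated mechanically: each one has a canonical decomposition consisting of a single other code point. The sketch below uses the standard `unicodedata.decomposition` function; the helper name `singleton_equivalents` and the scanned range are arbitrary choices made for illustration.

```python
import unicodedata


def singleton_equivalents(start=0, stop=0x3100):
    """List (code point, equivalent) pairs with a one-character canonical
    decomposition in the given range."""
    found = []
    for cp in range(start, stop):
        fields = unicodedata.decomposition(chr(cp)).split()
        # Compatibility mappings begin with a tag such as "<compat>";
        # only untagged, single-field entries are canonical singletons.
        if len(fields) == 1 and not fields[0].startswith("<"):
            found.append((cp, int(fields[0], 16)))
    return found


singles = dict(singleton_equivalents())
for cp in (0x0343, 0x0374, 0x0387, 0x1FBE, 0x2329, 0x232A):
    print(f"U+{cp:04X} = U+{singles[cp]:04X}")
```

Each pair found this way could then be compared in the font to see whether both code points render identically.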
(In reply to comment #9)
> U+1FBE GREEK PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA
> * This pair is pretty weird, as the two also differ in the Unicode charts.
That, and the fact that I chose to have an iota ypogegrammeni (written below
the capital letter instead of after it). I wrote an extensive list of arguments
about that to the mailing list when I did that; see
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html for the best info
available on the net about the iota ypo-/prosgegrammeni. Let's just say
Unicode messed it up a little :-). But really, no one should type U+1FBE, but
use the proper capital vowel with it, so I wouldn't mind changing that to a
normal lowercase iota.
Created attachment 7834
patch to make U+0343 and U+0313 equal
This is according to discussions with Ben on IRC. The glyph for U+0343 is added
to Serif and Mono, while for Sans they are made references. The previous state
of Sans for these was weird: Regular had it refer to another glyph, Bold and
Oblique had outlines, but BoldOblique was fine!
Created attachment 7835
patch to remove U+2329/232A and use or add U+27E8/U+27E9
This patch removes U+2329 and U+232A because of their CJK properties and their
deprecation for math use, and moves the outlines to U+27E8 and U+27E9 instead,
which are the recommended characters for math. In the Sans fonts, the characters
U+27EA and U+27EB had references to the CJK chars; these were changed to
reference the math chars.
All the four patches committed. The only remaining case is U+1FBE GREEK
PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA, for which I am waiting for
answers from UTC.
(In reply to comment #13)
> All the four patches committed. The only remaining case is U+1FBE GREEK
> PROSGEGRAMMENI and U+03B9 GREEK SMALL LETTER IOTA, for which I am waiting for
> answers from UTC.
I have not received any answer from UTC and it seems that I won't unless we
submit a formal proposal explaining the whole situation, which I am unwilling
to do. Still, a conformance clause in Unicode 5.0 says:
"C6 A process shall not assume that the interpretations of two
canonical-equivalent character sequences are distinct." (page 71)
I am taking that (and the comments that come after it) to mean that we MUST
treat these two the same. As Ben is fine with changing the U+1FBE glyph, I'll go and
do that anyway.
Bugzilla Upgrade Mass Bug Change
NEEDSINFO state was removed in Bugzilla 3.x, reopening any bugs previously listed as NEEDSINFO.