Currently the compose file section for Greek Polytonic reuses "dead_tilde" ( ̃, U+0303) for the Perispomeni mark ( ͂, U+0342). The two are not equivalent according to Unicode.
This causes problems in an input method update currently taking place in GTK+ (update of compose sequences found in GTK+) and potentially elsewhere.
I'll update the Compose files for en_US.UTF-8 and el_GR.UTF-8 once the patch gets included.
Created attachment 13650
Add dead_perispomeni to keysymdef.h
Could you give some more details about the problems with GTK+ that would be solved by this addition?
I think adding a new keysym for dead perispomeni can cause a lot of headaches until this change, and the changes depending on it, propagate to all users. The addition of dead psili and dasia keysyms was needed because the previously used keysyms were obviously incorrect, leading people to try and fix things, in turn making many users unable to enter aspirations when typing Polytonic Greek because of incompatibilities. This is evidenced by various bug reports and discussion in mailing lists.
Perispomeni, in contrast, never caused any problems, and the use of dead_tilde to enter it doesn't seem to me incorrect, even though combining perispomeni, tilde, inverted breve and circumflex are all different characters in Unicode. Dead keys don't have to map one-to-one to combining characters.
Unless a compelling technical reason can be found (and I can't think of any, since perispomeni works fine with the current implementation), I'd say that a new keysym is unneeded.
There is some work going on to update the GTK+ compose sequences table (which is replicated from Xorg).
There is an optimisation in this table that can reduce its size, by omitting compose sequences that correspond to the Unicode decompositions. For example, we do not need to include in GTK+ the compose sequences
GDK_dead_acute, GDK_dead_diaeresis, GDK_Greek_iota, 0, 0, 0x0390, /* GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS */
GDK_dead_diaeresis, GDK_dead_acute, GDK_Greek_iota, 0, 0, 0x0390, /* GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS */
because we can deduce them. We build a (decomposed) character sequence based on the input and check whether normalisation produces a single Unicode character.
In the compose table in GTK+ we then put only the compose sequences that are left over.
Tilde and Perispomeni are both used in a large number of compose sequences, and it defeats the optimisation if we have to choose (in GTK+ and elsewhere) to favour one group of sequences over the other.
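To make the point above concrete, here is a minimal sketch of the "deduce it from the decomposition" check. It is not the GTK+ code: the names `toy_pairs`, `toy_compose_pair` and `toy_compose_sequence` are made up for illustration, and the four-entry table is only a stand-in for a real Unicode normalisation library. It also ignores reordering of marks, which a real implementation would have to deal with.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Unicode canonical composition (hypothetical, four
 * entries only). A real implementation would ask a normalisation library. */
struct toy_pair { uint32_t first, second, composed; };

static const struct toy_pair toy_pairs[] = {
    { 0x03B9, 0x0308, 0x03CA }, /* iota + diaeresis -> iota with dialytika      */
    { 0x03CA, 0x0301, 0x0390 }, /* + acute -> iota with dialytika and tonos     */
    { 0x03B1, 0x0342, 0x1FB6 }, /* alpha + perispomeni -> alpha with perispomeni */
    { 0x0061, 0x0303, 0x00E3 }, /* a + tilde -> a with tilde                    */
};

/* Returns the composed code point, or 0 if the toy table has no entry. */
static uint32_t toy_compose_pair(uint32_t first, uint32_t second)
{
    size_t i;
    for (i = 0; i < sizeof toy_pairs / sizeof toy_pairs[0]; i++)
        if (toy_pairs[i].first == first && toy_pairs[i].second == second)
            return toy_pairs[i].composed;
    return 0;
}

/* Folds the combining marks onto the base character, in the given order.
 * Returns the single resulting code point, or 0 if the sequence cannot be
 * deduced and would still need an explicit compose table row. */
uint32_t toy_compose_sequence(uint32_t base, const uint32_t *marks, size_t n)
{
    uint32_t current = base;
    size_t i;
    for (i = 0; i < n; i++) {
        current = toy_compose_pair(current, marks[i]);
        if (current == 0)
            return 0;
    }
    return current;
}
```

With this table, iota followed by diaeresis and acute composes to 0x0390 with no explicit row, and alpha plus U+0342 composes to 0x1FB6. Alpha plus U+0303 returns 0, which is the case at issue: a dead key that stands for tilde does not lead to the Greek precomposed character.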
keysymdef.h recently got several new definitions for dead keys:
#define XK_dead_belowring 0xfe67
#define XK_dead_belowmacron 0xfe68
#define XK_dead_belowcircumflex 0xfe69
#define XK_dead_belowtilde 0xfe6a
#define XK_dead_belowbreve 0xfe6b
#define XK_dead_belowdiaeresis 0xfe6c
This makes things much clearer for the Latin Unicode block.
Do you still have objections to dead_perispomeni?
(I don’t remember seeing this before today....)
The dead keys were never meant to be one-to-one and onto with UCS characters.
There is no overlap in the UCS between TILDE and PERISPOMENI.
Their only difference in Unicode’s UCD is (from the xml version):
- cp="0303" na="COMBINING TILDE" na1="NON-SPACING TILDE"
+ cp="0342" na="COMBINING GREEK PERISPOMENI" na1=""
ie, codepoints and names.
Had Unicode’s and WG2’s current policies been in effect back then (they were both added in Unicode 1.1), I suspect they would have been unified.
De-unifying the dead key does not seem right.
That said, when I added XK_dead_psili and XK_dead_dasia, I made them aliases of XK_dead_abovecomma and XK_dead_abovereversedcomma; I would have no problem adding the symbol XK_dead_perispomeni as an alias for XK_dead_tilde (0xfe53), if doing so would be useful.
I did push the addition of dead_perispomeni as an alias for dead_tilde
(akin to how dead_psili is an alias for dead_abovecomma and dead_dasia
for dead_abovereversedcomma) with commit 0846d7adfe790897e879c5ed53d4f81db459a20d.
This change will be part of xproto-7.0.14. I’ll wait a couple of days
before making the release to see whether anything else comes up.
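For readers following along, an alias of this kind would look roughly like the sketch below. The value 0xfe53 is the one quoted earlier in this bug; the layout and comments are illustrative and not copied from the commit, and the helper `keysym_is_dead_tilde` is hypothetical.

```c
/* Sketch of an alias definition; both names expand to the same value. */
#define XK_dead_tilde        0xfe53
#define XK_dead_perispomeni  0xfe53  /* alias for dead_tilde */

/* Hypothetical helper: because the two names share one value, code that
 * receives the keysym cannot tell which name a keyboard layout used. */
int keysym_is_dead_tilde(unsigned long keysym)
{
    return keysym == XK_dead_tilde; /* also true for XK_dead_perispomeni */
}
```

The consequence is that layouts and Compose files can spell the key as dead_perispomeni, but anything that works on the keysym value alone still sees a single dead key.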
Having individual keysyms for each diacritic makes it really simple to eliminate many of the dead key sequences, thus saving space.
For the case of GNOME, the glib library knows how pre-composed characters can be decomposed and then composed again. In this way, we can actually omit around 2000 compose sequences from the compose table (2000 * (5+1) * 4 bytes = 48KB).
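The arithmetic above can be spelled out as follows. This is only a sketch: the function name is made up, and the row layout (five keysym slots plus one result value, four bytes each) is taken from the figures in the comment rather than from the GTK+ source.

```c
#include <stddef.h>

/* Size in bytes of a flat compose table: each row holds `key_slots`
 * keysyms plus one resulting character, all `bytes_per_cell` wide. */
size_t compose_rows_bytes(size_t rows, size_t key_slots, size_t bytes_per_cell)
{
    return rows * (key_slots + 1) * bytes_per_cell;
}
```

Calling `compose_rows_bytes(2000, 5, 4)` gives 48000 bytes, which is the roughly 48KB quoted above.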
As a sidenote, if Xorg were to be shipped with glib (or a similar library that has Unicode functions such as decomposition/composition), I would be happy to code this part for Xorg.
I should be able to work around the issue of not having unique keysyms for dead keys, though the code would not look as nice as it is at this stage.
> and check_algorithmically().
That looks like it conflates dead keys with combining characters. One
of the points of the earlier discussions (cf the dead_psili/dead_dasia
bug¹) was that they are completely different beasts. That the dead keys
are meant to represent glyph components rather than characters.
I tried/try to follow that lead.
I’m all for saving space when storing the Compose table — my compose
cache file for en_US.UTF-8 is 304 KB, the text file is 620 KB and the
Compose.pre in the repo is 616 KB. I would not at all mind using less
VM for compose sequences.
Perhaps there is some way to get the space savings while keeping the
distinction between the two concepts. Especially since there is no
overlap between the scripts. Scripts like Greek which use perispomeni
do not use tilde, and vice versa. Just like commas above and reversed
commas above are not used by scripts which use dasia and psili.²
(At least for those encoded as single characters in the UCS.)
It seems, then, that it should be easy to unify them in the tables, if
not in the character names.
[Starting to fall asleep; if I missed anything I’ll have to add it later.]
1] bug #11930 if I’m not mistaken.
2] that sentence would look better if I knew ελληνικόσ pluralization rules….
A patch has been submitted to GTK+ (part of GNOME 2.24) that has a workaround so that perispomeni works.
When the user types a compose sequence that involves 'perispomeni', the system emits the dead_tilde keysym. Once the sequence is completed, GTK+ checks if the base character is Greek, and if so, it switches 'tilde' to the real 'perispomeni'.
Thus it works.
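A rough sketch of that substitution step, for illustration only. This is not the submitted patch: the function names are hypothetical, and the test for "Greek" is an assumption that simply checks the Greek and Coptic block (U+0370..U+03FF) and the Greek Extended block (U+1F00..U+1FFF).

```c
#include <stdint.h>

/* Assumed definition of "Greek": the two Greek blocks of the UCS. */
static int base_is_greek(uint32_t cp)
{
    return (cp >= 0x0370 && cp <= 0x03FF) || (cp >= 0x1F00 && cp <= 0x1FFF);
}

/* Given the base character and the combining mark derived from the dead
 * key, return the mark that should actually be combined with the base:
 * combining tilde (U+0303) becomes perispomeni (U+0342) on Greek bases. */
uint32_t mark_for_base(uint32_t base, uint32_t mark)
{
    if (mark == 0x0303 && base_is_greek(base))
        return 0x0342;
    return mark;
}
```

For example, `mark_for_base(0x03B1, 0x0303)` yields 0x0342 for Greek alpha, while `mark_for_base(0x0061, 0x0303)` leaves the tilde unchanged for Latin a.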