Bug 8195

Summary: Ligatures problem in Arabic keyboard layout
Product: xorg Reporter: Youssef Chahibi <chahibi>
Component: Server/General Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED MOVED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: high CC: bugs+behnam, dr.khaled.hosny, freedesktop, hedayaty, jg, mahmoud.kassem, moceap, msameer, simos.bugzilla, wearabnet
Version: git   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 4575    
Bug Blocks: 4101, 13894    

Description Youssef Chahibi 2006-09-08 11:17:34 UTC
In the Arabic keyboard layout, it is possible to type لا (la) in two ways: 
either by typing ل+ا (G+H) [ara,xkb: Arabic_lam + Arabic_alef] or with a single 
key (B) [ara,xkb: 0x100fefb]. The problem is that (G+H) and (B) are not 
encoded the same way. The first comes from the normal Arabic table, and the second 
from the Arabic presentation forms (used by font designers to design complex 
ligatures). This creates many problems, since the two (la)s don't have the same 
value. For example, Google or grep won't give the same results.
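
For illustration, the mismatch can be reproduced outside X. This is a minimal Python sketch that assumes only the standard `unicodedata` module; the sample word and variable names are illustrative and not taken from the report.

```python
import unicodedata

# What the B key currently emits: one presentation-form code point.
ligature = "\uFEFB"        # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
# What typing G then H emits: two ordinary Arabic letters.
sequence = "\u0644\u0627"  # ARABIC LETTER LAM + ARABIC LETTER ALEF

# Illustrative sample: kaf + (lam-alef typed with the B key) + meem.
text_typed_with_b = "\u0643" + ligature + "\u0645"
query_typed_with_gh = sequence

# A plain substring search, which is roughly what grep does, misses it.
found_raw = query_typed_with_gh in text_typed_with_b

# Compatibility normalization (NFKC) folds the presentation form back
# into the ordinary letters, after which the same search succeeds.
normalized = unicodedata.normalize("NFKC", text_typed_with_b)
found_normalized = query_typed_with_gh in normalized

print(found_raw, found_normalized)
```

Normalizing at search time only hides the symptom; the report asks for the key itself to produce the two ordinary letters.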

0x100fefb (ﻻ) = Arabic_lam (ل) + Arabic_alef (ا)

Three other keys have the same problem:

0x100fef9 (ﻹ) = Arabic_lam (ل) + Arabic_hamzaunderalef (إ)
0x100fef7 (ﻷ) = Arabic_lam (ل) + Arabic_hamzaonalef (أ)
0x100fef5 (ﻵ) = Arabic_lam (ل) + Arabic_maddaonalef (آ)

The character codes and names are taken from /usr/share/X11/xkb/symbols/ara

So the desired behavior is that pressing 0x100fefb, 0x100fef9, 0x100fef7 and 
0x100fef5 gives the same result as Arabic_lam (ل) + Arabic_alef (ا), Arabic_lam 
(ل) + Arabic_hamzaunderalef (إ), Arabic_lam (ل) + Arabic_hamzaonalef (أ), and 
Arabic_lam (ل) + Arabic_maddaonalef (آ) respectively. For instance, pressing B 
should produce two characters and not a single presentation form, 
which is what happens today.

Windows XP does not have this problem.
Comment 1 Behdad Esfahbod 2006-09-08 12:23:26 UTC
The analysis is completely correct.  The presentation forms should die.  A
single keystroke may generate the entire sequence for the ligature.  I don't
remember how this is done with Xkb though.
Comment 2 Daniel Stone 2006-09-09 00:48:11 UTC
it's not currently possible to generate multiple keysyms with one key.
Comment 3 Mohammed Sameer 2006-09-09 03:11:24 UTC
What does it take to enable multiple keysyms with 1 key ?
Comment 4 Daniel Stone 2006-09-09 03:23:06 UTC
see bug #4575.  it would require a change to xkbcomp, the keymap compiler, to
deal with multiple keysyms; a protocol change for when you send the map over the
wire; removing xkm; dealing with multiple keysyms in the server.

it's non-trivial.
Comment 5 Jim Gettys 2006-09-12 12:31:28 UTC
(In reply to comment #4)
> see bug #4575.  it would require a change to xkbcomp, the keymap compiler, to
> deal with multiple keysyms; a protocol change for when you send the map over the
> wire; removing xkm; dealing with multiple keysyms in the server.
> 
> it's non-trivial.

Do you have a guess as to the size of the project?  man-days, weeks, months?
                                  Thanks,
                                           - Jim
Comment 6 Daniel Stone 2006-09-12 13:25:44 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > see bug #4575.  it would require a change to xkbcomp, the keymap compiler, to
> > deal with multiple keysyms; a protocol change for when you send the map over the
> > wire; removing xkm; dealing with multiple keysyms in the server.
> > 
> > it's non-trivial.
> 
> Do you have a guess as to the size of the project?  man-days, weeks, months?

working full time, I'd guess a month or two? who knows.
Comment 7 Behdad Esfahbod 2006-09-12 14:23:06 UTC
What if we just remove those four from the keyboard?  They are not really needed
after all and are just leftovers from the typewriter era.  Just remove them and
use the precious keyboard real estate for more important keys.  What about
forming a working group and devising a better keyboard layout with several
per-territory layouts all merged in?   We did that for Persian and we are
really happy about the outcome.
Comment 8 Youssef Chahibi 2006-09-12 14:39:45 UTC
(In reply to comment #7)
> What if we just remove those four from the keyboard?  They are not really needed
> after all and are just leftovers from the typewriter era.  Just remove them and
> use the precious keyboard real estate for more important keys.  What about
> forming a working group and devising a better keyboard layout with several
> per-territory layouts all merged in?   We did that for Persian and we are
> really happy about the outcome.

I agree. I really like the Persian keyboard layout for the broad range of Unicode 
Bidi control characters and the various keys it contains. The Arabic keyboard 
layout should be improved again (new bug report?) based on the Persian layout. 
I don't think those keys are even used; maybe they were in the past, when ligatures were 
hard to input on a typewriter. At the least, we can make a new variant 
named "typewriter" for Arabic alongside the other variants (Azerty, Qwerty, Indic 
digits, Arabic digits).
Comment 9 Behnam Esfahbod [:zwnj] 2006-09-13 01:36:51 UTC
Youssef, I can help with the Arabic keyboard layout.  Let's just start it on the
wiki and our group mail; when it is finished, you can open a bug and ask for it to be committed.
Comment 10 Jim Gettys 2006-09-13 19:31:52 UTC
Very interesting suggestion to solve this by keyboard layout: we have the luxury
of building our own keyboards; we are in the middle of layout right now.

This won't help other existing keyboards of course.

Behnam, IIRC we plan to see you here at OLPC when the Thai delegation visits us
very soon. In the meantime, I expect we can get interested people to propose
keyboard layouts for review.
Comment 11 Behdad Esfahbod 2006-09-14 09:14:06 UTC
(In reply to comment #10)
> Very interesting suggestion to solve this by keyboard layout: we have the luxury
> of building our own keyboards; we are in the middle of layout right now.
> 
> This won't help other existing keyboards of course.

Right.  I didn't have that in mind.  For Persian, most people don't use a
labeled keyboard anyway, so the layout is not hardcoded by the hardware.


> Behnam, IIRC we plan to see you here at OLPC when the Thai delegation visits us
> very soon. In the meanwhile, I expect we can get interested people proposed
> keyboard layouts for review.

Err...  I'm "Behdad", going to be in Boston Oct 7th..22nd.  "Behnam" is my
brother in Iran :-).

Behnam, Youssef, can you two send out a message to the Arabeyes list to gather a
list of characters, then review current layouts and come up with a proposed one.
 CC me in all communications and I'll try to help time permitting.  Thanks.
Comment 12 Abdullah Ibn Hamad Al-Marri 2007-02-19 03:02:39 UTC
(In reply to comment #11)

Hello,

Is this the same bug? 
https://bugs.freedesktop.org/show_bug.cgi?id=9100
Comment 13 Youssef Chahibi 2007-02-19 03:16:56 UTC
(In reply to comment #11)
> Behnam, Youssef, can you two send out a message to the Arabeyes list to gather a
> list of characters, then review current layouts and come up with a proposed one.
>  CC me in all communications and I'll try to help time permitting.  Thanks.

Sorry for the delay, Behdad. I don't think suggesting a new layout is a solution. Many typists are used to the current one, and those ligatures are printed on all available keyboards. Apparently, the ligatures on the keyboard are inherited from old typewriters that could not shape ل + ا into a ligature.
I think the ability to type a combination of characters should be enabled, since it is needed by many languages other than Arabic. The most serious case is the Kurdish Arabic-script layout: to type the phoneme /e/, users need to type Ha' (ه) + ZWNJ, which is not a workable solution for such a frequently typed letter. See http://lists.arabeyes.org/archives/general/2006/August/msg00010.html .
Comment 14 Behdad Esfahbod 2007-02-19 11:16:26 UTC
(In reply to comment #13)

> Sorry, Behdad for the delay. I don't think suggesting a new layout is a
> solution. Many typers are used to the current one and those ligatures are
> printed in all available keyboards. Apparently, ligatures on the keyboard
> inherit from old typewriters that don't support shaping ل + ا to a ligature.
> I think the ability to type a combination of characters should be enabled
> since it is needed by many languages other than Arabic. The most serious case
> is the Kurdish Arabic script layout: To type the phoneme /e/ they need to type
> Ha' (ه) + ZWNJ which is not a solution for a so frequently typed letter. See
> http://lists.arabeyes.org/archives/general/2006/August/msg00010.html .

I agree that the feature is generally useful for some non-major languages (Kurdish and Uyghur come to mind).  What I don't agree with is that we should be bound by the limitations of old typewriters these days.  You've got to change *some* day.  We've done that in Persian and are really happy so far.  Apple has adopted our new layout.  Linux has had it for a long time.  And there are multiple drivers for Windows.  No hardware has it printed yet, but that will take some time.  The positive side is that the new layout is a lot better and everyone who has tried it agrees.

And last but not least, the olpc machine doesn't have to use the legacy layout.
Comment 15 Mahmoud Kassem 2007-02-23 01:03:48 UTC
(In reply to comment #14)
This is about adding support for a layout, not about limiting new layouts. Adding this feature does not affect the Persian layout. It's all about removing limitations here.
Comment 16 Behdad Esfahbod 2007-02-23 08:16:39 UTC
Fine.  Still, for the OLPC, I recommend a new, simplified, layout.
Comment 17 Mahmoud Kassem 2007-02-23 09:02:28 UTC
Sure, I agree with that. As far as I know, it is already taken care of in the new OLPC Arabic keyboard: http://dev.laptop.org/ticket/420

(In reply to comment #16)
> Fine.  Still, for the OLPC, I recommend a new, simplified, layout.
> 
Comment 18 Daniel Stone 2007-02-27 01:33:29 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 19 Amir Hedayaty 2007-04-09 06:19:27 UTC
There are many languages which could make use of having multiple characters on one key. Most people have adapted their keyboard layout to the current limitation; if this feature is added, many people may use it. 

By the way, asking for this feature is not overly optimistic. The lack of this feature and the complexity of the X keyboard section really are a bug. I think there is a need to clean up this part of the X server and get rid of all its bugs. 
Comment 20 Khaled Hosny 2008-06-27 17:56:29 UTC
We managed to get around this limitation by using Composite keys that map the presentation forms lam-alef ligatures to their "decomposed" codes. See #17228.
Comment 21 Simos Xenitellis 2008-09-08 12:40:36 UTC
(In reply to comment #20)
> We managed to get around this limitation by using Composite keys that map the
> presentation forms lam-alef ligatures to their "decomposed" codes. See #17228.
> 

I believe you refer to the compose sequences at the end of the file,
http://cgit.freedesktop.org/xorg/lib/libX11/tree/nls/en_US.UTF-8/Compose.pre
(warning, big HTML file), which are

XCOMM
XCOMM Arabic Lam-Alef ligatures
XCOMM

<UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7> : "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9> : "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5> : "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE
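
To make the shape of these rules concrete, here is a minimal Python sketch that reads single-keysym rules of this form into a table. The regular expression and the `parse_compose` helper are assumptions made for illustration; they cover only the `<UXXXX> : "string"` lines shown above, not the full Compose file syntax (multi-key sequences, named keysyms, includes).

```python
import re

# Matches lines such as:  <UFEFB> : "..." # ARABIC LIGATURE LAM WITH ALEF
COMPOSE_LINE = re.compile(r'^\s*<(U[0-9A-Fa-f]{4,6})>\s*:\s*"([^"]*)"')

def parse_compose(lines):
    """Return {keysym name: output string} for single-keysym rules."""
    table = {}
    for line in lines:
        match = COMPOSE_LINE.match(line)
        if match:  # XCOMM comment lines and blank lines do not match
            table[match.group(1).upper()] = match.group(2)
    return table

rules = parse_compose([
    'XCOMM Arabic Lam-Alef ligatures',
    '',
    '<UFEFB> : "\u0644\u0627" # ARABIC LIGATURE LAM WITH ALEF',
    '<UFEF7> : "\u0644\u0623" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE',
    '<UFEF9> : "\u0644\u0625" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW',
    '<UFEF5> : "\u0644\u0622" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE',
])

for name, output in sorted(rules.items()):
    print(name, [f"U+{ord(ch):04X}" for ch in output])
```

Each rule maps one presentation-form keysym to a two-character string, which is the behavior requested in the description.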

I am interested in adding these sequences to GTK+ IM.
Could you please tell me how I can test that these work?

Which Arabic layout/variant shall I activate, 
what keys I should press and 
what I am expected to get?
Comment 22 James Cloos 2008-09-08 15:55:56 UTC
> <UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF
> <UFEF7> : "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
> <UFEF9> : "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
> <UFEF5> : "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE

First, in x11proto/keysymdef.h (typically installed as
/usr/include/X11/keysymdef.h) you will find the formula for converting
between a UXXXX keysym and its integer value:  0x01000000 + the hex value
from the sym.  

As such, UFEFB is 0x0100FEFB, etc.
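
The conversion can be sketched in a few lines of Python. The function names are made up for this example; the only fact relied on is the rule quoted above (keysym = 0x01000000 + code point), which, as far as I recall, keysymdef.h reserves for code points from U+0100 upward.

```python
UNICODE_KEYSYM_BASE = 0x01000000

def keysym_from_name(name):
    """Convert a 'UXXXX' keysym name to its integer keysym value."""
    if not name.upper().startswith("U"):
        raise ValueError(f"not a Unicode keysym name: {name!r}")
    return UNICODE_KEYSYM_BASE + int(name[1:], 16)

def codepoint_from_keysym(keysym):
    """Recover the Unicode code point from a Unicode keysym value."""
    return keysym - UNICODE_KEYSYM_BASE

for name in ("UFEFB", "UFEF9", "UFEF7", "UFEF5"):
    print(name, hex(keysym_from_name(name)))
```

The values produced are the ones used in the old symbols/ara entries, for example 0x100fefb for UFEFB.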

> Which Arabic layout/variant shall I activate, 

A grep through the xkeyboard-config repo shows that nothing there uses them.

However, a grep through the output of »git log -p -M -C symbols/ara«
shows that prior to commit b772edc289a844539ee096b2bb2a37bc74e1ef06
they were at:

 key <AD05> {  [      Arabic_feh,       0x100fef9      ]     };
 key <AC05> {  [      Arabic_lam,       0x100fef7      ]     };
 key <AB05> {  [       0x100fefb,      0x100fef5       ]     };

> what keys I should press and 

if your keyboard has US labels and you select the ara layout of an
old-enough version of xkeyboard-config, then:

 UFEF9 would be on SHIFT T
 UFEF7 would be on SHIFT G
 UFEFB would be on B
 UFEF5 would be on SHIFT B

If you do not have US labels (printed on the physical keyboard),
you can see from the key names above that each is the fifth alphabetic
key from the left in the fourth through second rows.  (The spacebar row
is of course the first row.)
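
As a rough illustration of the key-name scheme, the following Python sketch maps the names above to US labels. The `US_ROWS` table and `us_label` helper are hypothetical, written only for this example, and cover just the three alphabetic rows.

```python
# Alphabetic rows of a US-labelled keyboard, keyed by the row letter used in
# XKB key names: <AB..> is the row just above the spacebar row <AA..>.
US_ROWS = {
    "B": "zxcvbnm,./",
    "C": "asdfghjkl;'",
    "D": "qwertyuiop[]",
}

def us_label(key_name):
    """Map a key name such as 'AD05' to the US label at that position."""
    row = key_name[1]            # second letter selects the row
    column = int(key_name[2:])   # two-digit position, counted from the left
    return US_ROWS[row][column - 1]

for name in ("AD05", "AC05", "AB05"):
    print(name, us_label(name))
```

This gives t, g and b for the three keys, matching the SHIFT T, SHIFT G, B and SHIFT B positions listed above.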

> what I am expected to get?

When pressing a key which is mapped to eg UFEFB you should get a string of the
two characters »ل« U+0644 ARABIC LETTER LAM and »ا« U+0627 ARABIC LETTER ALEF.

The other three also give two-character strings, but with U+0623, U+0625
or U+0622 in place of U+0627.
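
A minimal Python sketch of the expected result, assuming every keysym in the input is a Unicode keysym (0x01000000 + code point). The `LAM_ALEF_EXPANSIONS` table and `text_for_keysyms` function are illustrative stand-ins for what the compose rules do; they are not how libX11 implements it.

```python
UNICODE_KEYSYM_BASE = 0x01000000

# Presentation-form keysym -> the two ordinary letters it should produce.
LAM_ALEF_EXPANSIONS = {
    0x0100FEFB: "\u0644\u0627",  # lam + alef
    0x0100FEF7: "\u0644\u0623",  # lam + alef with hamza above
    0x0100FEF9: "\u0644\u0625",  # lam + alef with hamza below
    0x0100FEF5: "\u0644\u0622",  # lam + alef with madda above
}

def text_for_keysyms(keysyms):
    """Turn a list of Unicode keysyms into the text an application should see."""
    pieces = []
    for keysym in keysyms:
        if keysym in LAM_ALEF_EXPANSIONS:
            pieces.append(LAM_ALEF_EXPANSIONS[keysym])
        else:
            # Simplification: treat everything else as a plain Unicode keysym.
            pieces.append(chr(keysym - UNICODE_KEYSYM_BASE))
    return "".join(pieces)

# kaf, then the B key (UFEFB), then meem
typed = text_for_keysyms([0x01000643, 0x0100FEFB, 0x01000645])
print([f"U+{ord(ch):04X}" for ch in typed])
```

Three key presses yield four characters here, with no presentation form in the output.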
Comment 23 Simos Xenitellis 2008-09-08 19:53:21 UTC
Thanks James.

I put together an initial patch, available at
http://bugzilla.gnome.org/show_bug.cgi?id=537457
Comment 24 Khaled Hosny 2008-09-10 15:17:25 UTC
(In reply to comment #21)
> Which Arabic layout/variant shall I activate, 
> what keys I should press and 
> what I am expected to get?
> 

You'll need to apply the second patch in #13894 to restore the removed keys. Pressing the keys corresponding to b, shift+b, shift+g and shift+t with the Arabic layout should then give you لا and لآ and لأ and لإ respectively, as two characters and not a single character (presentation form). 
Comment 25 Simos Xenitellis 2008-09-10 15:35:59 UTC
(In reply to comment #24)
> (In reply to comment #21)
> > Which Arabic layout/variant shall I activate, 
> > what keys I should press and 
> > what I am expected to get?
> > 
> 
> You'll need to apply the second patch in #13894 to restore the removed keys,
> pressing the keys corresponding to b, shift+b, shift+g and shift+t with the
> Arabic layout should give you لا and لآ and لأ and لإ respectively as
> two characters not a single character (presentation form). 
> 

Thanks, I managed to figure out my way with James's explanation.

The current status is this:

1. GTK+ applications (OpenOffice, Firefox, all of GNOME, GIMP, Inkscape, etc.) need their own copy of the compose sequences for this to work. Otherwise, a default installation will not handle these compose sequences. 

2. I produced a patch for this at
http://bugzilla.gnome.org/show_bug.cgi?id=537457
which covers Khmer and Arabic, the only two scripts with such compose sequences.
Of course, I tried it out and it works fine.

3. If you want to go for it and use these compose sequences, the situation is like this: the next GTK+ stable release comes in a week or so, so it's tough to get this patch included now. It looks feasible to get the patch in within six months' time, when a new stable release comes out.

4. I am not sure what input method you use for the OLPC. If you use GTK+ IM, I am happy to work on a suitable patch so that it works for you and you can use it now.
Comment 26 Mosaab Alzoubi 2013-08-01 02:38:32 UTC
2006 ------------> 2013 

Any News ????
Comment 27 GitLab Migration User 2018-12-13 22:17:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/346.
