22145 – capslock+Greek_finalsmallsigma does not produce Greek_SIGMA

Summary: capslock+Greek_finalsmallsigma does not produce Greek_SIGMA

Status:	RESOLVED MOVED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Lib/Xlib
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Xorg Project Team
QA Contact:	Xorg Project Team

URL:
Whiteboard:
Keywords:	NEEDINFO

Depends on:
Blocks:

Reported:	2009-06-07 23:15 UTC by Jennie Petoumenou
Modified:	2018-08-10 20:09 UTC
CC List:	4 users

See Also:
i915 platform:
i915 features:

Attachments
Greek layout, fixes behaviour of capslock+finalsmallsigma (668 bytes, patch) 2009-06-07 23:19 UTC, Jennie Petoumenou	no flags
modified xkbcomp output (54.78 KB, application/octet-stream) 2009-06-11 21:08 UTC, Jennie Petoumenou	no flags

Description Jennie Petoumenou 2009-06-07 23:15:49 UTC

When using the Greek keyboard layout, pressing capslock+Greek_finalsmallsigma (ς) does not produce the expected Greek_SIGMA (Σ). Instead, it produces Greek_finalsmallsigma (ς). However, according to Greek spelling rules, Greek_finalsmallsigma (ς) is capitalized as Greek_SIGMA (Σ). (Shift+Greek_finalsmallsigma correctly produces Greek_SIGMA, so it is only the capslock behaviour that is problematic.)

Comment 1 Jennie Petoumenou 2009-06-07 23:19:55 UTC

Created attachment 26525
Greek layout, fixes behaviour of capslock+finalsmallsigma

Comment 2 Sergey V. Udaltsov 2009-06-09 03:54:58 UTC

Could you please try the caps:shiftlock option?

Also, what about other keys (epsilon, rho, etc.): should they be fixed as well?

Comment 3 Simos Xenitellis 2009-06-09 06:36:53 UTC

(In reply to comment #2)
> Could you please try caps:shiftlock option?

The caps:shiftlock option is not affected by this bug. When caps:shiftlock is enabled, Σ (capital sigma) appears as expected both before and after applying the patch.

> Also, what about other keys (epsilon, rho etc) - should they be fixed as well?
> 

The rest of the alphabetic keys for Greek work as expected.


There is a minor issue with ;: and dead_acute/dead_diaeresis, which are affected by CapsLock though they probably should not be. Is there a way to specify that they should not be affected by CapsLock?

Jennie, should ;: and dead_acute/dead_diaeresis be affected by CapsLock?

Comment 4 Jennie Petoumenou 2009-06-09 12:09:02 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > Could you please try caps:shiftlock option?
> 
> The caps:shiftlock option is not affected by this bug. When caps:shiftlock is
> enabled, Σ (capital sigma) appears as expected in both before and after cases.
> 
> > Also, what about other keys (epsilon, rho etc) - should they be fixed as well?
> > 
> 
> The rest of the alphabetic keys for Greek work as expected.
> 
> 
> There is a minor issue with ;: and dead_acute/dead_diaeresis that are affected
> by CapsLock though they should probably not. Is there a way to specify they
> should not be affected by CapsLock?
> 
> Jennie, should ;: and dead_acute/dead_diaeresis be affected by CapsLock?
> 

No, they shouldn't be affected by CapsLock, but I don't think they are. On my system, ;: works exactly the same way as on the EN_US keyboard. As for dead_acute/dead_diaeresis, when capslock is on, dead_acute (tonos) + I produces Ί and shift + dead_acute + I produces Ϊ.

Comment 5 Simos Xenitellis 2009-06-09 13:11:23 UTC

(In reply to comment #4)
> (In reply to comment #3)
...
> > 
> > Jennie, should ;: and dead_acute/dead_diaeresis be affected by CapsLock?
> > 
> 
> No they shouldn't be affected by CapsLock, but I don't think they are. In my
> system ;: work in exactly the same way as in the EN_US keyboard. As for
> dead_acute/dead_diaeresis, when capslock is pressed, dead_acute (tonos) + I
> produces Ί and shift + dead_acute + I produces Ϊ. 
> 

I just checked again, and indeed ;: and dead_acute/dead_diaeresis are not affected by CapsLock.

+1 for the patch being added to 'gr'.

Comment 6 Sergey V. Udaltsov 2009-06-11 16:08:23 UTC

Interesting update. If I do "setxkbmap -layout gr -print | xkbcomp - -xkb out.xkb", I see the following:

    key <AD01> {
        type= "FOUR_LEVEL",
        symbols[Group1]= [       semicolon,           colon,  periodcentered,        NoSymbol ]
    };
    key <AD02> {
        type= "FOUR_LEVEL",
        symbols[Group1]= [ Greek_finalsmallsigma,     Greek_SIGMA,           U03DB,           U03DA ]
    };
    key <AD03> {
        type= "FOUR_LEVEL_SEMIALPHABETIC",
        symbols[Group1]= [   Greek_epsilon,   Greek_EPSILON,        EuroSign,        NoSymbol ]
    };

Would it work correctly if you use FOUR_LEVEL_SEMIALPHABETIC for AD01 and AD02?

To me, it looks like a bug in xkbcomp: the default types for AD01 and AD02 are wrong.

Comment 7 Jennie Petoumenou 2009-06-11 21:08:54 UTC

Created attachment 26699
modified xkbcomp output

Comment 8 Sergey V. Udaltsov 2009-06-12 01:47:10 UTC

So, with that change, is the behaviour correct?

Comment 9 Jennie Petoumenou 2009-06-12 06:03:41 UTC

(In reply to comment #8)
> So, that way - is the behaviour correct?
> 
Sorry about this; I had written a comment to go along with the attachment, but I obviously managed to lose it somewhere along the way.
Basically, I ran "setxkbmap -layout gr -print | xkbcomp - -xkb out.xkb" and then corrected the key types for the characters that do not work correctly with capslock (as Simos said, shift behaviour is correct).
So, AD01 should remain unchanged (FOUR_LEVEL).
AD02 should become FOUR_LEVEL_ALPHABETIC.
Also, AD08 and AB01 should become FOUR_LEVEL_ALPHABETIC.
I haven't figured out which file I would have to modify to test the changes on my system, but I am assuming that FOUR_LEVEL_ALPHABETIC covers cases where you have a small letter on the first level, its capital on the second, another small letter on the third and its capital on the fourth.

Comment 10 Sergey V. Udaltsov 2009-06-12 06:07:58 UTC

> So, AD01 should remain unchanged (FOUR_LEVEL).
> AD02 should become FOUR_LEVEL_ALPHABETIC.
> Also, AD08 and AB01 should become FOUR_LEVEL_ALPHABETIC.
> I haven't figured out which file I would have to modify to test the changes on

The thing is that I cannot figure it out myself! I searched xkeyboard-config (xk-c) high and low and cannot understand why AD02 is different from AD03. I will try to get to the bottom of that.

Comment 11 Sergey V. Udaltsov 2009-06-12 06:31:10 UTC

As I expected, the problem is in xkbcomp. Look at this in out.xkb:

    key <AD02> {
        type= "FOUR_LEVEL",
        symbols[Group1]= [ Greek_finalsmallsigma,     Greek_SIGMA,           U03DB,           U03DA ]
    };

In xkbcomp/symbols.c, function FindAutomaticType:

    } else if (width <= 4) {
        if (syms && KSIsLower(syms[0]) && KSIsUpper(syms[1])) {
            /* Levels 1 and 2 form a lower/upper pair; check levels 3 and 4. */
            if (KSIsLower(syms[2]) && KSIsUpper(syms[3]))
                *typeNameRtrn = XkbInternAtom(NULL,
                                              "FOUR_LEVEL_ALPHABETIC", False);
            else
                *typeNameRtrn = XkbInternAtom(NULL,
                                              "FOUR_LEVEL_SEMIALPHABETIC", False);
        } else if (syms && (XkbKSIsKeypad(syms[0]) || XkbKSIsKeypad(syms[1]))) {
            *typeNameRtrn = XkbInternAtom(NULL, "FOUR_LEVEL_KEYPAD", False);
        } else {
            /* Reached for AD02: its first two levels are not seen as a lower/upper pair. */
            *typeNameRtrn = XkbInternAtom(NULL, "FOUR_LEVEL", False);
        }
    }

Clearly, the functions KSIsLower/KSIsUpper are buggy; actually, they use XConvertCase, which is what should be fixed. So it is not an xkeyboard-config (xk-c) bug, it is an xkbcomp bug. Redirecting...
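
For illustration, the quoted branch can be reproduced in a self-contained C sketch that works on Unicode code points instead of keysyms. All names here (convert_case, is_lower, is_upper, four_level_type, fix_final_sigma) are hypothetical; convert_case is a tiny stand-in for XConvertCase that covers only basic Greek and stigma, and it assumes, as this report indicates, that final sigma has no upper-case mapping unless the switch is turned on. The keypad branch is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NO_SYMBOL 0u            /* plays the role of NoSymbol */

/* Hypothetical switch: when 0, final sigma has no upper-case mapping. */
static int fix_final_sigma = 0;

/* Tiny stand-in for XConvertCase on Unicode code points (basic Greek and
 * stigma only). Anything it does not know maps to itself. */
static void convert_case(uint32_t cp, uint32_t *lower, uint32_t *upper)
{
    *lower = *upper = cp;
    if (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) {    /* Α..Ω */
        *lower = cp + 0x20;
    } else if (cp >= 0x03B1 && cp <= 0x03C9) {             /* α..ω */
        if (cp != 0x03C2)
            *upper = cp - 0x20;
        else if (fix_final_sigma)
            *upper = 0x03A3;                               /* ς -> Σ */
    } else if (cp == 0x03DA || cp == 0x03DB) {             /* Ϛ, ϛ */
        *lower = 0x03DB;
        *upper = 0x03DA;
    }
}

/* Convert-and-compare predicates in the style of KSIsLower/KSIsUpper. */
static int is_lower(uint32_t cp)
{
    uint32_t lo, up;
    convert_case(cp, &lo, &up);
    return lo != up && cp == lo;
}

static int is_upper(uint32_t cp)
{
    uint32_t lo, up;
    convert_case(cp, &lo, &up);
    return lo != up && cp == up;
}

/* The width <= 4 branch of FindAutomaticType, without the keypad case. */
static const char *four_level_type(const uint32_t syms[4])
{
    assert(syms != NULL);
    if (is_lower(syms[0]) && is_upper(syms[1])) {
        if (is_lower(syms[2]) && is_upper(syms[3]))
            return "FOUR_LEVEL_ALPHABETIC";
        return "FOUR_LEVEL_SEMIALPHABETIC";
    }
    return "FOUR_LEVEL";
}

/* Levels 1-4 of the three keys shown in out.xkb. */
static const uint32_t ad01[4] = { 0x003B, 0x003A, 0x00B7, NO_SYMBOL }; /* ; : · */
static const uint32_t ad02[4] = { 0x03C2, 0x03A3, 0x03DB, 0x03DA };    /* ς Σ ϛ Ϛ */
static const uint32_t ad03[4] = { 0x03B5, 0x0395, 0x20AC, NO_SYMBOL }; /* ε Ε € */
```

With fix_final_sigma left at 0, AD01 and AD02 both come out as FOUR_LEVEL and AD03 as FOUR_LEVEL_SEMIALPHABETIC, matching the out.xkb excerpt; setting it to 1 turns AD02 into FOUR_LEVEL_ALPHABETIC while the other two keys are unchanged.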

Comment 12 Simos Xenitellis 2009-06-12 08:15:00 UTC

(In reply to comment #11)
...
> 
> Clearly, functions XKIsLower/KSIsUpper are buggy - actually, they are using
> XConvertCase which should be fixed. So it is not xk-c bug, it is xkbcomp bug.
> Redirecting....
> 

X.Org does not know about Unicode (it is not linked with glib, for example),
so it will probably not be able to perform a proper XConvertCase in the general case.

The proper way to fix XConvertCase() would probably be to extract the case mapping information from the sources described at
http://unicode.org/faq/casemap_charprop.html#6
Is that something feasible to add to X.Org?
Wouldn't it be better for X.Org to use an existing Unicode library for case mapping?

I think the Greek-layout part of this report can be addressed with the workaround described above, and that a new report should be opened for XConvertCase().

Comment 13 Sergey V. Udaltsov 2009-06-12 08:22:37 UTC

> X.Org does not know about Unicode (it is not linked with glib, for example),
> so it will probably not be able to perform a proper XConvertCase in the general
> case.
But how does it work for other Greek keysyms?

> Is that something feasible to add to X.Org? 
> Wouldn't it better for X.Org to use an existing Unicode library for mapping
> conversions?
That would be the right solution.

> I think the part of this report for the Greek layout can be attended with the
> workaround that is being described, and that a new report should be opened for
> XConvertCase().
I guess it depends on how critical that bug is. If Greek users absolutely cannot live without Caps Lock handling sigma properly, I'd put that workaround in place. If not, I would prefer to wait for the proper fix. I usually do not like workarounds unless they resolve something really essential (i.e., people cannot wait).

Comment 14 Simos Xenitellis 2009-06-12 08:56:31 UTC

(In reply to comment #13)
> > X.Org does not know about Unicode (it is not linked with glib, for example),
> > so it will probably not be able to perform a proper XConvertCase in the general
> > case.
> But how does it work for other Greek keysyms?

Apparently X.Org does indeed know some of the one-to-one case mappings; see
http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/KeyBind.c#n294

The problem with FINAL SIGMA is that its case mapping is not purely one-to-one;
for example, ΑΣΣΟΣ converted to lower case becomes ασσος,
which means that XConvertCase would need to know whether the Σ is the last character of the word in order to convert it correctly to FINAL SIGMA.
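
The context dependence can be seen in a small C sketch that lowercases a UTF-32 array of basic Greek capitals. greek_to_lower and is_greek_letter are hypothetical helpers, and the word-boundary test is a simplification of Unicode's Final_Sigma condition: it only looks at the immediate neighbours and only treats basic Greek letters as letters (the full rule also skips case-ignorable characters).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic Greek letters only: U+0391..U+03A9 (capitals, U+03A2 unassigned)
 * and U+03B1..U+03C9 (small letters, including final sigma U+03C2). */
static int is_greek_letter(uint32_t cp)
{
    return (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) ||
           (cp >= 0x03B1 && cp <= 0x03C9);
}

/* Lowercase basic Greek capitals in place. Capital sigma becomes final
 * sigma when it follows a letter and is not followed by one. */
static void greek_to_lower(uint32_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = s[i];
        if (cp == 0x03A3) {                                   /* Σ */
            int after_letter = i > 0 && is_greek_letter(s[i - 1]);
            int before_letter = i + 1 < n && is_greek_letter(s[i + 1]);
            s[i] = (after_letter && !before_letter) ? 0x03C2  /* ς */
                                                    : 0x03C3; /* σ */
        } else if (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) {
            s[i] = cp + 0x20;
        }
    }
}
```

For ΑΣΣΟΣ this yields ασσος: only the last Σ sees a letter before it and none after. The decision needs both neighbours, which a per-keysym function such as XConvertCase never sees.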

Other characters, such as LUNATE SIGMA, are not supported because the current X.Org UCSConvertCase follows Unicode Data version 4.0.0; LUNATE SIGMA was added in Unicode 5.1.

> > Is that something feasible to add to X.Org? 
> > Wouldn't it better for X.Org to use an existing Unicode library for mapping
> > conversions?
> That would be the right solution.

Looking at
http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/KeyBind.c#n294
there is a possibility of updating the tables in the immediate future.
If the Xlib maintainer is happy to add a Unicode library to X.Org, so much the better.
My concern is that I did not notice any reference to the source of the conversion scripts.

> > I think the part of this report for the Greek layout can be attended with the
> > workaround that is being described, and that a new report should be opened for
> > XConvertCase().
> I guess, it depends on the criticality of that bug. If Greek people absolutely
> cannot live without Caps properly handling sigma - I'd put that workaround in
> place. If not - I would prefer to wait till the proper fix. I usually do not
> like workarounds, unless they resolve something really essential (=people
> cannot wait).
> 

The situation with final sigma is that it requires full Unicode support in X.Org, and the rules depend on context (is sigma the final letter of the word?). I do not see this being resolved satisfactorily in Xlib (we cannot add a hack to a generic UCSConvertCase),
so I would indeed opt for the workaround for the single case of FINAL SIGMA.

LUNATE SIGMA and the other archaic Greek characters should be resolvable by updating the UCSConvertCase() tables to Unicode 5.1. That requires the conversion script, and I do not know where it is.

Comment 15 Sergey V. Udaltsov 2009-06-12 09:05:56 UTC

> The situation with final sigma is that it requires full Unicode support in
> XOrg, and the rules depend on the context (is sigma the final letter of the
> word?). I do not see this resolved satisfactorily in XLib (we cannot add a hack
> to a generic UCSConvertCase), 
> so I would opt to have indeed the workaround for the single case of FINAL
> SIGMA.
Maybe we should think at a higher level and change the rule that identifies "alphabetic" keysyms? Maybe checking for upper/lower case using XConvertCase is the wrong idea? What do you think? Strictly speaking, being uppercase and being lowercase are two one-bit properties of each Unicode character, right? You should not have to "convert and compare" to find that out, I guess.
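
A minimal sketch of that idea in C: the two properties are stored as independent flags and looked up directly, with no case conversion involved. The table is a hand-written excerpt (basic Greek plus stigma) and all names are hypothetical; a real table would be generated from the Unicode data files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two independent one-bit properties per code point. */
enum { CASE_LOWER = 1u << 0, CASE_UPPER = 1u << 1 };

struct case_range {
    uint32_t first, last;   /* inclusive code point range */
    unsigned props;         /* CASE_LOWER and/or CASE_UPPER */
};

/* Hand-written excerpt, for illustration only. */
static const struct case_range greek_props[] = {
    { 0x0391, 0x03A1, CASE_UPPER },   /* Α..Ρ */
    { 0x03A3, 0x03A9, CASE_UPPER },   /* Σ..Ω (U+03A2 is unassigned) */
    { 0x03B1, 0x03C9, CASE_LOWER },   /* α..ω, including final sigma ς */
    { 0x03DA, 0x03DA, CASE_UPPER },   /* Ϛ */
    { 0x03DB, 0x03DB, CASE_LOWER },   /* ϛ */
};

static unsigned case_props(uint32_t cp)
{
    for (size_t i = 0; i < sizeof greek_props / sizeof greek_props[0]; i++)
        if (cp >= greek_props[i].first && cp <= greek_props[i].last)
            return greek_props[i].props;
    return 0;   /* not in the table: neither lower nor upper case */
}

static int is_lower(uint32_t cp) { return (case_props(cp) & CASE_LOWER) != 0; }
static int is_upper(uint32_t cp) { return (case_props(cp) & CASE_UPPER) != 0; }
```

With predicates like these, ς counts as lowercase regardless of whether (or how) it converts to Σ, so the ς/Σ pair on AD02 would be recognised like the other letter keys.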

Comment 16 Simos Xenitellis 2009-06-12 09:37:57 UTC

(In reply to comment #15)
> > The situation with final sigma is that it requires full Unicode support in
> > XOrg, and the rules depend on the context (is sigma the final letter of the
> > word?). I do not see this resolved satisfactorily in XLib (we cannot add a hack
> > to a generic UCSConvertCase), 
> > so I would opt to have indeed the workaround for the single case of FINAL
> > SIGMA.
> May be, we should think at the higher level - to change the rule which finds
> "alphabetic" keysyms? May be checking for upper/lower case using XConvertCase
> is the wrong idea? What do you think? Strictly speaking, being upper or lower
> case is two one-bit properties of each unicode symbol, right? You do not have
> to "convert and compare" in order to find that, I guess.
> 

A table marking which code points in the BMP (in Unicode) are alphabetic (that is, either small or capital letters) should take 8 KB (65,536 code points divided by 8 bits per byte) if no run-length encoding takes place.
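
The uncompressed variant can be sketched in C11 as one bit per BMP code point. The names alpha_bits, mark_alphabetic and is_alphabetic are hypothetical, and in practice the bits would be filled from the generated table rather than marked by hand.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per BMP code point: 65536 bits / 8 bits per byte = 8192 bytes. */
#define BMP_SIZE 0x10000u
static uint8_t alpha_bits[BMP_SIZE / 8];

_Static_assert(sizeof alpha_bits == 8192, "BMP bitmap should be 8 KB");

/* Mark the inclusive range first..last as alphabetic. */
static void mark_alphabetic(uint32_t first, uint32_t last)
{
    assert(first <= last && last < BMP_SIZE);
    for (uint32_t cp = first; cp <= last; cp++)
        alpha_bits[cp >> 3] |= (uint8_t)(1u << (cp & 7));
}

static int is_alphabetic(uint32_t cp)
{
    if (cp >= BMP_SIZE)
        return 0;   /* outside the BMP: not covered by this table */
    return (alpha_bits[cp >> 3] >> (cp & 7)) & 1;
}
```

A lookup is a shift, a mask and a single byte read, so it is constant-time regardless of how letters are distributed.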

This 8 KB table can get smaller if we take into account the stretches of Unicode without alphabetic characters. That is, the table would be made of pairs of integers: the starting code point and the number of subsequent code points that are also alphabetic.
The script should read ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt and produce a table suitable for use in C.
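
A sketch of the compressed form in C, with a binary search over sorted runs. The struct and function names are hypothetical, the table is a hand-written excerpt, and here count includes the starting code point (storing the number of subsequent code points instead would only shift it by one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive alphabetic code points. */
struct alpha_run {
    uint32_t start;
    uint32_t count;   /* length of the run, including start */
};

/* Hand-written excerpt; a generated table must be sorted by start and
 * its runs must not overlap. */
static const struct alpha_run alpha_runs[] = {
    { 0x0041, 26 },   /* A..Z */
    { 0x0061, 26 },   /* a..z */
    { 0x0391, 17 },   /* Α..Ρ */
    { 0x03A3, 7 },    /* Σ..Ω */
    { 0x03B1, 25 },   /* α..ω, including ς */
};

static int is_alphabetic(uint32_t cp)
{
    size_t lo = 0, hi = sizeof alpha_runs / sizeof alpha_runs[0];
    /* Find how many runs start at or before cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (alpha_runs[mid].start <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;   /* cp precedes every run */
    const struct alpha_run *r = &alpha_runs[lo - 1];
    return cp - r->start < r->count;
}
```

With 8 bytes per entry, this layout beats the 8 KB BMP bitmap as long as there are fewer than 1024 runs; the actual number depends on which characters end up counted as alphabetic.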

Comment 17 Sergey V. Udaltsov 2009-06-12 16:07:50 UTC

Parsing that file and preparing a table is a piece of cake for Daniel or Peter, I am sure (actually, I can do it myself if you tell me the preferred language)...

Run-length encoding may not be needed; 8 KB is nothing for the utility. At least the initial implementation could do without it. So now it is just a matter of someone taking responsibility... ;)

Comment 18 Simos Xenitellis 2009-06-13 10:55:58 UTC

I am trying to get it (the script that produces the table) implemented by our local Ubuntu team members;
there is currently a discussion on which characters count as 'alphabetic', based on the information provided by UnicodeData.txt ;-)
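
A sketch, in C, of the core of such a generator: read one UnicodeData.txt line, decide whether it is "alphabetic", and merge consecutive code points into runs. Field 0 of each line is the code point in hex and field 2 the General_Category; treating Lu, Ll and Lt as alphabetic is an assumption standing in for whatever the discussion settles on. parse_line and add_code_point are hypothetical names, and the sketch does not expand the <..., First>/<..., Last> range entries, which only matters if categories such as Lo are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct alpha_run { uint32_t start, count; };

/* Parse "03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;...". Returns 1 on success
 * and sets *cased when the category is Lu, Ll or Lt (an assumption). */
static int parse_line(const char *line, uint32_t *cp, int *cased)
{
    char *end;
    unsigned long v = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return 0;
    const char *gc = strchr(end + 1, ';');   /* end of field 1 (name) */
    if (gc == NULL)
        return 0;
    gc++;                                    /* start of field 2 */
    *cased = gc[0] == 'L' &&
             (gc[1] == 'u' || gc[1] == 'l' || gc[1] == 't') &&
             gc[2] == ';';
    *cp = (uint32_t)v;
    return 1;
}

/* Add cp to the run list, extending the last run when cp is adjacent.
 * Code points must arrive in increasing order, as they do in the file. */
static size_t add_code_point(struct alpha_run *runs, size_t n, size_t cap,
                             uint32_t cp)
{
    if (n > 0 && runs[n - 1].start + runs[n - 1].count == cp) {
        runs[n - 1].count++;   /* cp extends the current run */
        return n;
    }
    assert(n < cap);           /* caller must provide enough room */
    runs[n].start = cp;
    runs[n].count = 1;
    return n + 1;
}
```

Printing the resulting runs as a C initializer is then a single loop; the interesting part is really the category test, which is exactly what is being discussed.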

If someone wants to pick it up and implement it straight away, of course they can.

Regarding regression issues, a way to preempt regression reports would be to parse existing layouts and figure out where the result changes when we switch to the new 'is_alphabetic()'-style function.
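
One way such a check could look, sketched in C: compute each key's automatic type with the old convert-and-compare predicates and with new property-based ones, and report keys whose type changes. Both predicate pairs are hypothetical stand-ins limited to basic Greek and stigma (the old case table deliberately lacks a mapping for ς, as described in this report), and in practice the key list would come from compiling every layout, for example with the setxkbmap | xkbcomp pipeline from comment 6, rather than being hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int (*case_pred)(uint32_t cp);

/* "Before": convert-and-compare with a stand-in table lacking ς -> Σ. */
static void old_convert(uint32_t cp, uint32_t *lo, uint32_t *up)
{
    *lo = *up = cp;
    if (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) {
        *lo = cp + 0x20;
    } else if (cp >= 0x03B1 && cp <= 0x03C9 && cp != 0x03C2) {
        *up = cp - 0x20;
    } else if (cp == 0x03DA || cp == 0x03DB) {
        *lo = 0x03DB;
        *up = 0x03DA;
    }
}

static int old_lower(uint32_t cp)
{
    uint32_t lo, up;
    old_convert(cp, &lo, &up);
    return lo != up && cp == lo;
}

static int old_upper(uint32_t cp)
{
    uint32_t lo, up;
    old_convert(cp, &lo, &up);
    return lo != up && cp == up;
}

/* "After": direct property checks (basic Greek and stigma only). */
static int new_lower(uint32_t cp)
{
    return (cp >= 0x03B1 && cp <= 0x03C9) || cp == 0x03DB;
}

static int new_upper(uint32_t cp)
{
    return (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) || cp == 0x03DA;
}

/* Same decision as the width <= 4 branch of FindAutomaticType (no keypad). */
static const char *pick_type(const uint32_t s[4], case_pred lower, case_pred upper)
{
    if (lower(s[0]) && upper(s[1]))
        return (lower(s[2]) && upper(s[3])) ? "FOUR_LEVEL_ALPHABETIC"
                                            : "FOUR_LEVEL_SEMIALPHABETIC";
    return "FOUR_LEVEL";
}

struct key {
    const char *name;
    uint32_t syms[4];   /* levels 1-4; 0 stands for NoSymbol */
};

/* Print each key whose automatic type changes; return how many changed. */
static size_t report_changes(const struct key *keys, size_t n)
{
    size_t changed = 0;
    assert(keys != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *before = pick_type(keys[i].syms, old_lower, old_upper);
        const char *after = pick_type(keys[i].syms, new_lower, new_upper);
        if (strcmp(before, after) != 0) {
            printf("%s: %s -> %s\n", keys[i].name, before, after);
            changed++;
        }
    }
    return changed;
}
```

For the three keys in the out.xkb excerpt (AD01, AD02, AD03), only AD02 would be reported, as "AD02: FOUR_LEVEL -> FOUR_LEVEL_ALPHABETIC".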

Comment 19 GitLab Migration User 2018-08-10 20:09:18 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libx11/issues/5.
