Bug 72

Summary: a font with wider coverage than ko.orth is not recognized as supporting Korean
Product: fontconfig
Reporter: Jungshik Shin <jshin>
Component: library
Assignee: Keith Packard <keithp>
Status: RESOLVED INVALID
QA Contact:
Severity: normal    
Priority: high    
Version: 2.2   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
Attachments: a simple freetype char check program

Description Jungshik Shin 2003-04-22 03:45:13 UTC
After adding about 1000 'characters' to PUA codepoints, UnBatang
font available at http://www.i18nl10n.com/fonts/UnBatang.ttf
does not get recognized as supporting Korean although it has 
the full set of characters in KS X 1001 as well as the full
set of precomposed Hangul syllables. In addition, it has ~1400
extra characters not in ko.orth. (I wrote a simple program
with Freetype2 APIs to get the list of Unicode characters
in the Unicode cmap of a font and compared the output with 
the list in ko.orth) 

I've just downloaded fontconfig 2.2 only to get the same result.
I tried to get the CVS snapshot, but the cvs access doesn't
seem to work. 

Setting FC_DEBUG=3,5,10 and running 'fc-cache -f -v' didn't let me
glimpse into what's going on. 

BTW, bugzilla doesn't let me choose version 2.2 for fontconfig,
so I'm filing this under 2.1.
Comment 1 Keith Packard 2003-04-22 09:36:22 UTC
I used:

     $ FC_DEBUG=384 fc-cache -f -v

to discover that UnBatang is missing a glyph for character U+4E00 which is
included in the KSC 5601-1992 encoding.  Is this glyph not a part of KS X 1001?
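
For readers unfamiliar with FC_DEBUG: the value is a bitmask, so 384 is a sum of two flags. The sketch below decodes it, assuming the flag values listed in the fontconfig user documentation (MATCH=1 through MEMORY=512); the table and the helper name `decode_fc_debug` are illustrative and not part of fontconfig itself.

```python
# Flag values as listed in the fontconfig user documentation; treat them
# as an assumption for any particular fontconfig release.
FC_DEBUG_FLAGS = {
    1: "MATCH",
    2: "MATCHV",
    4: "EDIT",
    8: "FONTSET",
    16: "CACHE",
    32: "CACHEV",
    64: "PARSE",
    128: "SCAN",
    256: "SCANV",
    512: "MEMORY",
}


def decode_fc_debug(value):
    """Return the names of the debug flags set in an FC_DEBUG value."""
    return [name for bit, name in sorted(FC_DEBUG_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    print(decode_fc_debug(384))  # 384 = 128 + 256
```

Under that assumption, 384 selects the font-scanning output (SCAN) and its verbose variant (SCANV), which is where a missing-character report would be expected to appear.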
Comment 2 Jungshik Shin 2003-04-22 16:57:07 UTC
Created attachment 42 [details]
a simple freetype char check program
Comment 3 Jungshik Shin 2003-04-22 17:17:33 UTC
Thank you for testing.

U+4E00 is CJK Ideograph Number One and it's a part of KS X 1001:1998 repertoire
(KS C 5601-1992 repertoire +  EURO Sign + Registered Sign = KS X 1001:1998
repertoire).
The font has it with GID = 0x4356 according to the program I've just attached.
When I filed this bug, I used another program that lists all the characters in
a given font across all the cmaps present in the font. That's how I compared
the list of characters in UnBatang with the characters in ko.orth.

To my surprise, both fc-cache and pfaedit think UnBatang doesn't have 
U+4E00.  

The following is part of the output I obtained from my first test program
(not attached):

        code=U+0033DA  gidx=0x0005a2
        code=U+0033DB  gidx=0x0005a3
        code=U+0033DC  gidx=0x0005a4
        code=U+0033DD  gidx=0x0005a5
        code=U+004E00  gidx=0x004356
        code=U+004E01  gidx=0x0005a7
        code=U+004E03  gidx=0x0005a8
        code=U+004E07  gidx=0x0005a9
        code=U+004E08  gidx=0x0005aa

As you can see, there's a jump in GID at U+4E00, but does it matter?
Anyway, it seems like there's something wrong with the font, so I'll examine
the Unicode cmap directly.
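
The jump noted above can be spotted mechanically. The following is a minimal sketch that parses a listing in the `code=U+... gidx=0x...` format shown in this comment and flags code points whose glyph index is far from both neighbours. The function names and the `max_step` threshold are assumptions made for illustration; a jump is only a hint worth checking, since a font may legitimately assign glyph indices out of order.

```python
import re

# The listing excerpt quoted in this comment.
LISTING = """\
code=U+0033DA  gidx=0x0005a2
code=U+0033DB  gidx=0x0005a3
code=U+0033DC  gidx=0x0005a4
code=U+0033DD  gidx=0x0005a5
code=U+004E00  gidx=0x004356
code=U+004E01  gidx=0x0005a7
code=U+004E03  gidx=0x0005a8
code=U+004E07  gidx=0x0005a9
code=U+004E08  gidx=0x0005aa
"""

PATTERN = re.compile(r"code=U\+([0-9A-Fa-f]+)\s+gidx=0x([0-9A-Fa-f]+)")


def parse_listing(text):
    """Return (code point, glyph index) pairs in listing order."""
    return [(int(code, 16), int(gid, 16)) for code, gid in PATTERN.findall(text)]


def gid_outliers(pairs, max_step=16):
    """Return code points whose glyph index is far from every neighbour."""
    outliers = []
    for i, (code, gid) in enumerate(pairs):
        neighbours = []
        if i > 0:
            neighbours.append(pairs[i - 1][1])
        if i + 1 < len(pairs):
            neighbours.append(pairs[i + 1][1])
        if neighbours and all(abs(gid - n) > max_step for n in neighbours):
            outliers.append(code)
    return outliers


if __name__ == "__main__":
    for code in gid_outliers(parse_listing(LISTING)):
        print(f"suspicious mapping at U+{code:04X}")
```

On the excerpt above this flags only U+4E00; note also that gidx 0x5a6 is the one index skipped in the otherwise consecutive run.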
Comment 4 Jungshik Shin 2003-04-29 10:44:10 UTC
I'm sorry, this bug is invalid.
It turned out that U+4E00 in the font had been accidentally
assigned a blank glyph (gid=0x4356, which is supposed to be assigned to U+115F), and
fontconfig rejected it as invalid because U+4E00 is not supposed to be blank.
Fixing the cmap of the font (assigning gid=0x5a6 to U+4E00) solved the problem.
 
BTW, the tolerance limit of fontconfig appears to be zero when determining
whether a font is suitable for a language. For 'orthographies' with a large
repertoire, this might be too severe. Well, if it had been more generous, I
wouldn't have discovered the glitch in the font. So, I must be grateful to it :-)
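
To make the zero-tolerance behaviour concrete, here is a rough sketch of such a coverage check. It is not fontconfig's actual implementation: the function names, the `tolerance` parameter, the three-character orthography, and the PUA entry's glyph index are all hypothetical. The glyph indices for U+4E00, U+4E01, and U+4E03 are the ones quoted in this report. A code point counts as missing if it is absent from the cmap or mapped to a blank glyph.

```python
def missing_chars(orth, cmap, blank_gids):
    """Return orthography code points the font does not usably cover."""
    missing = set()
    for cp in orth:
        gid = cmap.get(cp)
        # Absent from the cmap, or present but mapped to a blank glyph.
        if gid is None or gid in blank_gids:
            missing.add(cp)
    return missing


def supports_language(orth, cmap, blank_gids, tolerance=0):
    """True if at most `tolerance` orthography characters are missing."""
    return len(missing_chars(orth, cmap, blank_gids)) <= tolerance


# Tiny stand-in for ko.orth (hypothetical; the real file is far larger).
ORTH = {0x4E00, 0x4E01, 0x4E03}
BLANK_GIDS = {0x4356}

# cmap before the fix: U+4E00 points at the blank glyph.
# The PUA entry (U+E000 -> 0x5000) is a made-up extra character.
broken = {0x4E00: 0x4356, 0x4E01: 0x5A7, 0x4E03: 0x5A8, 0xE000: 0x5000}

# cmap after the fix described above: U+4E00 -> gid 0x5a6.
fixed = dict(broken)
fixed[0x4E00] = 0x5A6

if __name__ == "__main__":
    print(supports_language(ORTH, broken, BLANK_GIDS))               # zero tolerance
    print(supports_language(ORTH, broken, BLANK_GIDS, tolerance=1))  # more generous
    print(supports_language(ORTH, fixed, BLANK_GIDS))                # after the fix
```

With zero tolerance the single bad mapping is enough to fail the check, and extra characters outside the orthography (the PUA entry) have no effect either way, which matches what was observed with UnBatang.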
