Summary: | Hardcode blanks in the library | ||
---|---|---|---|
Product: | fontconfig | Reporter: | Behdad Esfahbod <freedesktop> |
Component: | library | Assignee: | Akira TAGOH <akira> |
Status: | RESOLVED FIXED | QA Contact: | Behdad Esfahbod <freedesktop> |
Severity: | normal | ||
Priority: | medium | CC: | akira, fontconfig-bugs, freedesktop |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Description
Behdad Esfahbod
2014-06-12 21:44:49 UTC
In syncing the existing blanks to GC=Zs+Default_Ignorable I found the following exception that needs to be added in:

<int>0x2800</int> <!-- BRAILLE PATTERN BLANK -->

I just wrote the code to generate the table from PropList.txt. However, we have some code points in fonts.conf right now which don't match the condition you mentioned, i.e. they are not in (Gen_Cat=Zs plus Default_Ignorable=True). Should we simply drop them? Or was there any historical reason we added them? Here are the missing code points:

<int>0x00AD</int> <!-- SOFT HYPHEN -->
<int>0x034F</int> <!-- COMBINING GRAPHEME JOINER -->
<int>0x061C</int> <!-- ARABIC LETTER MARK -->
<int>0x115F</int> <!-- HANGUL CHOSEONG FILLER -->
<int>0x1160</int> <!-- HANGUL JUNGSEONG FILLER -->
<int>0x17B4</int> <!-- KHMER VOWEL INHERENT AQ -->
<int>0x17B5</int> <!-- KHMER VOWEL INHERENT AA -->
<int>0x180B</int> <!-- MONGOLIAN FREE VARIATION SELECTOR ONE -->
<int>0x180C</int> <!-- MONGOLIAN FREE VARIATION SELECTOR TWO -->
<int>0x180D</int> <!-- MONGOLIAN FREE VARIATION SELECTOR THREE -->
<int>0x180E</int> <!-- MONGOLIAN VOWEL SEPARATOR -->
<int>0x200B</int> <!-- ZERO WIDTH SPACE -->
<int>0x200C</int> <!-- ZERO WIDTH NON-JOINER -->
<int>0x200D</int> <!-- ZERO WIDTH JOINER -->
<int>0x200E</int> <!-- LEFT-TO-RIGHT MARK -->
<int>0x200F</int> <!-- RIGHT-TO-LEFT MARK -->
<int>0x202A</int> <!-- LEFT-TO-RIGHT EMBEDDING -->
<int>0x202B</int> <!-- RIGHT-TO-LEFT EMBEDDING -->
<int>0x202C</int> <!-- POP DIRECTIONAL FORMATTING -->
<int>0x202D</int> <!-- LEFT-TO-RIGHT OVERRIDE -->
<int>0x202E</int> <!-- RIGHT-TO-LEFT OVERRIDE -->
<int>0x2060</int> <!-- WORD JOINER -->
<int>0x2061</int> <!-- FUNCTION APPLICATION -->
<int>0x2062</int> <!-- INVISIBLE TIMES -->
<int>0x2063</int> <!-- INVISIBLE SEPARATOR -->
<int>0x2064</int> <!-- INVISIBLE PLUS -->
<int>0x2066</int> <!-- LEFT-TO-RIGHT ISOLATE -->
<int>0x2067</int> <!-- RIGHT-TO-LEFT ISOLATE -->
<int>0x2068</int> <!-- FIRST STRONG ISOLATE -->
<int>0x2069</int> <!-- POP DIRECTIONAL ISOLATE -->
<int>0x206A</int> <!-- INHIBIT SYMMETRIC SWAPPING -->
<int>0x206B</int> <!-- ACTIVATE SYMMETRIC SWAPPING -->
<int>0x206C</int> <!-- INHIBIT ARABIC FORM SHAPING -->
<int>0x206D</int> <!-- ACTIVATE ARABIC FORM SHAPING -->
<int>0x206E</int> <!-- NATIONAL DIGIT SHAPES -->
<int>0x206F</int> <!-- NOMINAL DIGIT SHAPES -->
<int>0x2800</int> <!-- BRAILLE PATTERN BLANK -->
<int>0x3164</int> <!-- HANGUL FILLER -->
<int>0xFEFF</int> <!-- ZERO WIDTH NO-BREAK SPACE -->
<int>0xFFA0</int> <!-- HALFWIDTH HANGUL FILLER -->
<int>0x1BCA0</int> <!-- SHORTHAND FORMAT LETTER OVERLAP -->
<int>0x1BCA1</int> <!-- SHORTHAND FORMAT CONTINUING OVERLAP -->
<int>0x1BCA2</int> <!-- SHORTHAND FORMAT DOWN STEP -->
<int>0x1BCA3</int> <!-- SHORTHAND FORMAT UP STEP -->

Are you sure? Most of what you list are space or default_ignorable.

(In reply to Behdad Esfahbod from comment #4)
> Are you sure? Most of what you list are space or default_ignorable.

Unless I misread the format or your intention... or I should refer to other data to determine that. In fact, ICU seems to use http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3ADI%3A as the reference for determining the default ignorables, though I can't find the source of the data that explains why they are. It is somewhat hard to parse HTML in C, though; maybe consider using some scripting language to generate it at bootstrap.
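For readers following the comparison discussed above, here is a minimal sketch of how the `<int>` entries of a fonts.conf `<blank>` section could be diffed against a generated set. It is not the script from this bug. The XML excerpt, the `generated` set, and the helper names `parse_blank_ints` and `compare` are all hypothetical and cover only a handful of code points.

```python
import re

# Hypothetical excerpt of a <blank> section in fonts.conf (not the full list).
FONTS_CONF_BLANKS = """
<blank>
  <int>0x0020</int> <!-- SPACE -->
  <int>0x00AD</int> <!-- SOFT HYPHEN -->
  <int>0x200B</int> <!-- ZERO WIDTH SPACE -->
  <int>0x2800</int> <!-- BRAILLE PATTERN BLANK -->
</blank>
"""

# Matches <int>0x....</int> and captures the hexadecimal literal.
INT_RE = re.compile(r"<int>\s*(0x[0-9A-Fa-f]+)\s*</int>")


def parse_blank_ints(xml_text):
    """Return the set of code points listed as <int> elements."""
    return {int(m.group(1), 16) for m in INT_RE.finditer(xml_text)}


def compare(existing, generated):
    """Return (only_in_existing, only_in_generated) as sorted lists."""
    return sorted(existing - generated), sorted(generated - existing)


# Hypothetical generated set (Zs plus Default_Ignorable); an excerpt only.
generated = {0x0020, 0x00A0, 0x00AD, 0x200B}

existing = parse_blank_ints(FONTS_CONF_BLANKS)
only_existing, only_generated = compare(existing, generated)

for cp in only_existing:
    print("only in fonts.conf: U+%04X" % cp)
for cp in only_generated:
    print("only in generated set: U+%04X" % cp)
```

With these sample inputs, U+2800 shows up only on the fonts.conf side, which is the kind of entry that would need an explicit exception.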
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g=

0020 ;SPACE
00A0 ;NO-BREAK SPACE
00AD ;SOFT HYPHEN
034F ;COMBINING GRAPHEME JOINER
061C ;ARABIC LETTER MARK
115F ;HANGUL CHOSEONG FILLER
1160 ;HANGUL JUNGSEONG FILLER
1680 ;OGHAM SPACE MARK
17B4 ;KHMER VOWEL INHERENT AQ
17B5 ;KHMER VOWEL INHERENT AA
180B..180E ;MONGOLIAN VOWEL SEPARATOR
2000..200F ;RIGHT-TO-LEFT MARK
202A..202F ;NARROW NO-BREAK SPACE
205F..206F ;NOMINAL DIGIT SHAPES
3000 ;IDEOGRAPHIC SPACE
3164 ;HANGUL FILLER
FE00..FE0F ;VARIATION SELECTOR-16
FEFF ;ZERO WIDTH NO-BREAK SPACE
FFA0 ;HALFWIDTH HANGUL FILLER
FFF0..FFF8 ;<unassigned-FFF8>
1BCA0..1BCA3 ;SHORTHAND FORMAT UP STEP
1D173..1D17A ;MUSICAL SYMBOL END PHRASE
E0000..E0FFF ;<unassigned-E0FFF>

As mentioned, the only exception we want to add to this is U+2800 BRAILLE PATTERN BLANK. Thanks.

Worked out in http://cgit.freedesktop.org/~tagoh/fontconfig/log/?h=bz79956

Interesting! I was thinking of manually coding the list...

Why do you special-case U+0020? I see it here: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g

Oh, in the regexp, it has "\ " for it.

Did you forget to add the BRAILLE exception?

(In reply to Behdad Esfahbod from comment #8)
> Interesting! I was thinking of manually coding the list...

I just wanted to reduce the unnecessary effort of keeping it updated and checking the changes. I'm a lazy person :)

> Why do you special-case U+0020? I see it here:
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g
>
> Oh, in the regexp, it has "\ " for it.

Ah! I missed that. I have to modify the script to recognize it...

> Did you forget to add the BRAILLE exception?

It's there. I simply forgot to add a comment for that. I'll fix them.

Updated. It works now without adding 0x20.

Thanks. lgtm.

Merged into git master. Thanks!
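As an illustration of the approach settled on in this thread (take the Zs plus Default_Ignorable listing, then add U+2800), the following is a rough sketch only. It assumes the plain `START[..END] ;NAME` text form quoted above rather than the HTML page the real script had to deal with, and the names `parse_listing`, `to_ranges`, and `EXTRA_BLANKS` are made up for this example.

```python
import re

# A few lines copied from the listing quoted in this thread.
LISTING = """\
0020 ;SPACE
00A0 ;NO-BREAK SPACE
180B..180E ;MONGOLIAN VOWEL SEPARATOR
FEFF ;ZERO WIDTH NO-BREAK SPACE
1BCA0..1BCA3 ;SHORTHAND FORMAT UP STEP
"""

# START, optionally followed by ..END, then a semicolon.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;")

# The one exception agreed on in the thread: BRAILLE PATTERN BLANK.
EXTRA_BLANKS = {0x2800}


def parse_listing(text):
    """Expand every 'START[..END] ;NAME' line into individual code points."""
    cps = set()
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a listing line
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        cps.update(range(start, end + 1))
    return cps


def to_ranges(cps):
    """Collapse code points into sorted, inclusive (start, end) pairs."""
    ranges = []
    for cp in sorted(cps):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


blanks = parse_listing(LISTING) | EXTRA_BLANKS
for start, end in to_ranges(blanks):
    if start == end:
        print("0x%04X" % start)
    else:
        print("0x%04X..0x%04X" % (start, end))
```

Keeping the exception in a separate set makes it easy to see which entries come from the Unicode data and which were added by hand.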