There is no point in having them in the config; just move them to the library. The set should be the code points with Gen_Cat=Zs plus those with Default_Ignorable=True.
In syncing the existing blanks to GC=Zs+Default_Ignorable, I found the following exception that needs to be added: <int>0x2800</int> <!-- BRAILLE PATTERN BLANK -->
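A minimal sketch of the rule described so far, reading "plus" as a set union (General_Category Zs, or Default_Ignorable_Code_Point, or the U+2800 exception). The helper names `parse_property` and `is_blank` are hypothetical, and the embedded sample is only a tiny hand-written excerpt in the UCD property-file layout, not real input from the Unicode data files:

```python
import unicodedata

# Hand-written excerpt in the UCD property-file layout
# ("code point or range ; Property # comment"). Assumption: the real
# input would be the corresponding Unicode Character Database file.
SAMPLE = """\
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

BRAILLE_PATTERN_BLANK = 0x2800  # the one exception named in this thread


def parse_property(text, prop):
    """Return the set of code points that carry property `prop`."""
    cps = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        rng, name = (field.strip() for field in line.split(";", 1))
        if name != prop:
            continue
        first, _, last = rng.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        cps.update(range(lo, hi + 1))
    return cps


def is_blank(cp, default_ignorable):
    """Zs, or default ignorable, or the Braille exception."""
    return (
        unicodedata.category(chr(cp)) == "Zs"
        or cp in default_ignorable
        or cp == BRAILLE_PATTERN_BLANK
    )


default_ignorable = parse_property(SAMPLE, "Default_Ignorable_Code_Point")
```

With this sample, `is_blank(0x0020, default_ignorable)` holds through the Zs branch, `is_blank(0x200B, default_ignorable)` through the parsed property, and `is_blank(0x2800, default_ignorable)` only through the explicit exception.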
I just wrote the code to generate the table from PropList.txt. However, fonts.conf currently contains some code points that don't match the condition you mentioned, i.e. they are not (Gen_Cat=Zs plus Default_Ignorable=True). Should we simply drop them, or was there a historical reason we added them?
Here are the missing code points:
<int>0x00AD</int> <!-- SOFT HYPHEN -->
<int>0x034F</int> <!-- COMBINING GRAPHEME JOINER -->
<int>0x061C</int> <!-- ARABIC LETTER MARK -->
<int>0x115F</int> <!-- HANGUL CHOSEONG FILLER -->
<int>0x1160</int> <!-- HANGUL JUNGSEONG FILLER -->
<int>0x17B4</int> <!-- KHMER VOWEL INHERENT AQ -->
<int>0x17B5</int> <!-- KHMER VOWEL INHERENT AA -->
<int>0x180B</int> <!-- MONGOLIAN FREE VARIATION SELECTOR ONE -->
<int>0x180C</int> <!-- MONGOLIAN FREE VARIATION SELECTOR TWO -->
<int>0x180D</int> <!-- MONGOLIAN FREE VARIATION SELECTOR THREE -->
<int>0x180E</int> <!-- MONGOLIAN VOWEL SEPARATOR -->
<int>0x200B</int> <!-- ZERO WIDTH SPACE -->
<int>0x200C</int> <!-- ZERO WIDTH NON-JOINER -->
<int>0x200D</int> <!-- ZERO WIDTH JOINER -->
<int>0x200E</int> <!-- LEFT-TO-RIGHT MARK -->
<int>0x200F</int> <!-- RIGHT-TO-LEFT MARK -->
<int>0x202A</int> <!-- LEFT-TO-RIGHT EMBEDDING -->
<int>0x202B</int> <!-- RIGHT-TO-LEFT EMBEDDING -->
<int>0x202C</int> <!-- POP DIRECTIONAL FORMATTING -->
<int>0x202D</int> <!-- LEFT-TO-RIGHT OVERRIDE -->
<int>0x202E</int> <!-- RIGHT-TO-LEFT OVERRIDE -->
<int>0x2060</int> <!-- WORD JOINER -->
<int>0x2061</int> <!-- FUNCTION APPLICATION -->
<int>0x2062</int> <!-- INVISIBLE TIMES -->
<int>0x2063</int> <!-- INVISIBLE SEPARATOR -->
<int>0x2064</int> <!-- INVISIBLE PLUS -->
<int>0x2066</int> <!-- LEFT-TO-RIGHT ISOLATE -->
<int>0x2067</int> <!-- RIGHT-TO-LEFT ISOLATE -->
<int>0x2068</int> <!-- FIRST STRONG ISOLATE -->
<int>0x2069</int> <!-- POP DIRECTIONAL ISOLATE -->
<int>0x206A</int> <!-- INHIBIT SYMMETRIC SWAPPING -->
<int>0x206B</int> <!-- ACTIVATE SYMMETRIC SWAPPING -->
<int>0x206C</int> <!-- INHIBIT ARABIC FORM SHAPING -->
<int>0x206D</int> <!-- ACTIVATE ARABIC FORM SHAPING -->
<int>0x206E</int> <!-- NATIONAL DIGIT SHAPES -->
<int>0x206F</int> <!-- NOMINAL DIGIT SHAPES -->
<int>0x2800</int> <!-- BRAILLE PATTERN BLANK -->
<int>0x3164</int> <!-- HANGUL FILLER -->
<int>0xFEFF</int> <!-- ZERO WIDTH NO-BREAK SPACE -->
<int>0xFFA0</int> <!-- HALFWIDTH HANGUL FILLER -->
<int>0x1BCA0</int> <!-- SHORTHAND FORMAT LETTER OVERLAP -->
<int>0x1BCA1</int> <!-- SHORTHAND FORMAT CONTINUING OVERLAP -->
<int>0x1BCA2</int> <!-- SHORTHAND FORMAT DOWN STEP -->
<int>0x1BCA3</int> <!-- SHORTHAND FORMAT UP STEP -->
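Entries in the `<int>` form used above could be produced from bare code points roughly as follows. This is only an illustrative sketch: `format_blank` is a hypothetical helper, and the character names come from Python's standard `unicodedata` module rather than from whatever tool produced the list in this comment.

```python
import unicodedata


def format_blank(cp):
    """Render one code point as '<int>0xXXXX</int> <!-- NAME -->'."""
    # Fall back to a placeholder for code points without a character name.
    name = unicodedata.name(chr(cp), "<unnamed>")
    return f"<int>0x{cp:04X}</int> <!-- {name} -->"


for cp in (0x00AD, 0x2800, 0xFEFF):
    print(format_blank(cp))
```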
Are you sure? Most of what you list are space or default_ignorable.
(In reply to Behdad Esfahbod from comment #4)
> Are you sure? Most of what you list are space or default_ignorable.

Unless I misread the format or your intention, or I should be referring to some other data to determine this.

In fact, ICU seems to use http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3ADI%3A as the reference for determining the default ignorables, though I can't find the source data that explains why they are. Parsing HTML in C is somewhat hard, though, so maybe consider using a scripting language to generate the table at bootstrap.
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g=

0020 ;SPACE
00A0 ;NO-BREAK SPACE
00AD ;SOFT HYPHEN
034F ;COMBINING GRAPHEME JOINER
061C ;ARABIC LETTER MARK
115F ;HANGUL CHOSEONG FILLER
1160 ;HANGUL JUNGSEONG FILLER
1680 ;OGHAM SPACE MARK
17B4 ;KHMER VOWEL INHERENT AQ
17B5 ;KHMER VOWEL INHERENT AA
180B..180E ;MONGOLIAN VOWEL SEPARATOR
2000..200F ;RIGHT-TO-LEFT MARK
202A..202F ;NARROW NO-BREAK SPACE
205F..206F ;NOMINAL DIGIT SHAPES
3000 ;IDEOGRAPHIC SPACE
3164 ;HANGUL FILLER
FE00..FE0F ;VARIATION SELECTOR-16
FEFF ;ZERO WIDTH NO-BREAK SPACE
FFA0 ;HALFWIDTH HANGUL FILLER
FFF0..FFF8 ;<unassigned-FFF8>
1BCA0..1BCA3 ;SHORTHAND FORMAT UP STEP
1D173..1D17A ;MUSICAL SYMBOL END PHRASE
E0000..E0FFF ;<unassigned-E0FFF>

As mentioned, the only exception we want to add to this is U+2800 BRAILLE PATTERN BLANK.
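The listing compresses runs into `first..last` ranges, each labelled with the name of its last code point. A rough sketch of expanding such a listing into individual code points and then adding the U+2800 exception might look like this. `expand_listing` is a hypothetical name, and only a few lines of the listing are embedded here as sample input:

```python
BRAILLE_PATTERN_BLANK = 0x2800  # the exception named in this thread

# A few lines copied from the listing above, used as sample input only.
LISTING = """\
0020 ;SPACE
00A0 ;NO-BREAK SPACE
180B..180E ;MONGOLIAN VOWEL SEPARATOR
3000 ;IDEOGRAPHIC SPACE
"""


def expand_listing(text):
    """Expand 'XXXX ;NAME' and 'XXXX..YYYY ;NAME' lines into a set of ints."""
    cps = set()
    for line in text.splitlines():
        rng = line.split(";", 1)[0].strip()  # keep only the code point field
        if not rng:
            continue
        first, _, last = rng.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        cps.update(range(lo, hi + 1))
    return cps


blanks = expand_listing(LISTING) | {BRAILLE_PATTERN_BLANK}
print(" ".join(f"U+{cp:04X}" for cp in sorted(blanks)))
```

Keeping the exception as a separate named constant, rather than editing it into the parsed data, makes it easy to see that it is deliberate when the table is regenerated.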
Thanks. worked out in http://cgit.freedesktop.org/~tagoh/fontconfig/log/?h=bz79956
Interesting! I was thinking of manually coding the list...

Why do you special-case U+0020? I see it here: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g

Oh, in the regexp, it has "\ " for it.

Did you forget to add the BRAILLE exception?
(In reply to Behdad Esfahbod from comment #8)
> Interesting! I was thinking of manually coding the list...

I just wanted to reduce the unnecessary effort of keeping it updated and checking the changes. I'm a lazy person :)

> Why do you special-case U+0020? I see it here:
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g
>
> Oh, in the regexp, it has "\ " for it.

Ah! I missed that. I have to modify the script to recognize it...

> Did you forget to add the BRAILLE exception?

It's there. I simply forgot to add a comment for it. I'll fix both.
Updated. works now without adding 0x20.
Thanks. lgtm.
merged into git master. thanks!