Bug 79956 - Hardcode blanks in the library
Status: RESOLVED FIXED
Alias: None
Product: fontconfig
Classification: Unclassified
Component: library
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Akira TAGOH
QA Contact: Behdad Esfahbod
Reported: 2014-06-12 21:44 UTC by Behdad Esfahbod
Modified: 2015-02-27 06:57 UTC

Description Behdad Esfahbod 2014-06-12 21:44:49 UTC
No point in having them in the config.  Just move it to the library.  It should be the codepoints with Gen_Cat=Zs plus Default_Ignorable=True.
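
A minimal Python sketch of the rule described above, for illustration only (it is not the fontconfig implementation). It assumes that General_Category comes from Python's bundled `unicodedata` module (so the result depends on the Unicode version Python ships) and that Default_Ignorable_Code_Point is read from text in the format of the UCD file DerivedCoreProperties.txt. The `SAMPLE` excerpt and the helper names `parse_property` and `zs_code_points` are hypothetical.

```python
import unicodedata

# A few lines in the format of DerivedCoreProperties.txt (illustrative
# excerpt only; a real run would read the whole file).
SAMPLE = """\
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""


def parse_property(text, prop):
    """Return the set of code points that carry property `prop`."""
    cps = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        rng, name = (field.strip() for field in line.split(";")[:2])
        if name != prop:
            continue
        first, _, last = rng.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        cps.update(range(lo, hi + 1))
    return cps


def zs_code_points():
    """All code points whose General_Category is Zs (space separator)."""
    return {cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Zs"}


# Gen_Cat=Zs plus Default_Ignorable, plus the U+2800 exception from this bug.
blanks = (zs_code_points()
          | parse_property(SAMPLE, "Default_Ignorable_Code_Point")
          | {0x2800})
```

To my knowledge, the derived Default_Ignorable_Code_Point property is listed in DerivedCoreProperties.txt, while PropList.txt only carries Other_Default_Ignorable_Code_Point, so a generator that reads PropList.txt alone may see a smaller set.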
Comment 1 Behdad Esfahbod 2014-06-12 21:55:50 UTC
In syncing the existing blanks to Gen_Cat=Zs plus Default_Ignorable, I found the following exception that needs to be added:

  <int>0x2800</int>	<!-- BRAILLE PATTERN BLANK -->
Comment 2 Akira TAGOH 2015-02-09 12:17:26 UTC
I just wrote the code to generate the table from PropList.txt. However, fonts.conf currently contains some code points that don't match the condition you mentioned, i.e. they are not in (Gen_Cat=Zs plus Default_Ignorable=True).
Should we simply drop them, or was there a historical reason for adding them?
Comment 3 Akira TAGOH 2015-02-09 13:19:28 UTC
Here are the missing code points:

			<int>0x00AD</int>	<!-- SOFT HYPHEN -->
			<int>0x034F</int>	<!-- COMBINING GRAPHEME JOINER -->
			<int>0x061C</int>	<!-- ARABIC LETTER MARK -->
			<int>0x115F</int>	<!-- HANGUL CHOSEONG FILLER -->
			<int>0x1160</int>	<!-- HANGUL JUNGSEONG FILLER -->
			<int>0x17B4</int>	<!-- KHMER VOWEL INHERENT AQ -->
			<int>0x17B5</int>	<!-- KHMER VOWEL INHERENT AA -->
			<int>0x180B</int>	<!-- MONGOLIAN FREE VARIATION SELECTOR ONE -->
			<int>0x180C</int>	<!-- MONGOLIAN FREE VARIATION SELECTOR TWO -->
			<int>0x180D</int>	<!-- MONGOLIAN FREE VARIATION SELECTOR THREE -->
			<int>0x180E</int>	<!-- MONGOLIAN VOWEL SEPARATOR -->
			<int>0x200B</int>	<!-- ZERO WIDTH SPACE -->
			<int>0x200C</int>	<!-- ZERO WIDTH NON-JOINER -->
			<int>0x200D</int>	<!-- ZERO WIDTH JOINER -->
			<int>0x200E</int>	<!-- LEFT-TO-RIGHT MARK -->
			<int>0x200F</int>	<!-- RIGHT-TO-LEFT MARK -->
			<int>0x202A</int>	<!-- LEFT-TO-RIGHT EMBEDDING -->
			<int>0x202B</int>	<!-- RIGHT-TO-LEFT EMBEDDING -->
			<int>0x202C</int>	<!-- POP DIRECTIONAL FORMATTING -->
			<int>0x202D</int>	<!-- LEFT-TO-RIGHT OVERRIDE -->
			<int>0x202E</int>	<!-- RIGHT-TO-LEFT OVERRIDE -->
			<int>0x2060</int>	<!-- WORD JOINER -->
			<int>0x2061</int>	<!-- FUNCTION APPLICATION -->
			<int>0x2062</int>	<!-- INVISIBLE TIMES -->
			<int>0x2063</int>	<!-- INVISIBLE SEPARATOR -->
			<int>0x2064</int>	<!-- INVISIBLE PLUS -->
			<int>0x2066</int>	<!-- LEFT-TO-RIGHT ISOLATE -->
			<int>0x2067</int>	<!-- RIGHT-TO-LEFT ISOLATE -->
			<int>0x2068</int>	<!-- FIRST STRONG ISOLATE -->
			<int>0x2069</int>	<!-- POP DIRECTIONAL ISOLATE -->
			<int>0x206A</int>	<!-- INHIBIT SYMMETRIC SWAPPING -->
			<int>0x206B</int>	<!-- ACTIVATE SYMMETRIC SWAPPING -->
			<int>0x206C</int>	<!-- INHIBIT ARABIC FORM SHAPING -->
			<int>0x206D</int>	<!-- ACTIVATE ARABIC FORM SHAPING -->
			<int>0x206E</int>	<!-- NATIONAL DIGIT SHAPES -->
			<int>0x206F</int>	<!-- NOMINAL DIGIT SHAPES -->
			<int>0x2800</int>	<!-- BRAILLE PATTERN BLANK -->
			<int>0x3164</int>	<!-- HANGUL FILLER -->
			<int>0xFEFF</int>	<!-- ZERO WIDTH NO-BREAK SPACE -->
			<int>0xFFA0</int>	<!-- HALFWIDTH HANGUL FILLER -->
			<int>0x1BCA0</int>	<!-- SHORTHAND FORMAT LETTER OVERLAP -->
			<int>0x1BCA1</int>	<!-- SHORTHAND FORMAT CONTINUING OVERLAP -->
			<int>0x1BCA2</int>	<!-- SHORTHAND FORMAT DOWN STEP -->
			<int>0x1BCA3</int>	<!-- SHORTHAND FORMAT UP STEP -->
Comment 4 Behdad Esfahbod 2015-02-26 01:00:07 UTC
Are you sure?  Most of what you list are space or default_ignorable.
Comment 5 Akira TAGOH 2015-02-26 07:24:20 UTC
(In reply to Behdad Esfahbod from comment #4)
> Are you sure?  Most of what you list are space or default_ignorable.

Unless I misread the format or your intention, or I should be referring to other data to determine this. In fact, ICU seems to use http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3ADI%3A as the reference for the default ignorables, but I can't find the source data behind it.

Parsing HTML in C is somewhat hard, though, so maybe consider using a scripting language to generate the table at bootstrap.
Comment 6 Behdad Esfahbod 2015-02-26 21:47:19 UTC
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g=

0020 ;SPACE
00A0 ;NO-BREAK SPACE
00AD ;SOFT HYPHEN
034F ;COMBINING GRAPHEME JOINER
061C ;ARABIC LETTER MARK
115F ;HANGUL CHOSEONG FILLER
1160 ;HANGUL JUNGSEONG FILLER
1680 ;OGHAM SPACE MARK
17B4 ;KHMER VOWEL INHERENT AQ
17B5 ;KHMER VOWEL INHERENT AA
180B..180E ;MONGOLIAN VOWEL SEPARATOR
2000..200F ;RIGHT-TO-LEFT MARK
202A..202F ;NARROW NO-BREAK SPACE
205F..206F ;NOMINAL DIGIT SHAPES
3000 ;IDEOGRAPHIC SPACE
3164 ;HANGUL FILLER
FE00..FE0F ;VARIATION SELECTOR-16
FEFF ;ZERO WIDTH NO-BREAK SPACE
FFA0 ;HALFWIDTH HANGUL FILLER
FFF0..FFF8 ;<unassigned-FFF8>
1BCA0..1BCA3 ;SHORTHAND FORMAT UP STEP
1D173..1D17A ;MUSICAL SYMBOL END PHRASE
E0000..E0FFF ;<unassigned-E0FFF>

As mentioned, the only exception we want to add to this is U+2800 BRAILLE PATTERN BLANK.
Comment 7 Akira TAGOH 2015-02-27 05:20:58 UTC
Thanks. worked out in http://cgit.freedesktop.org/~tagoh/fontconfig/log/?h=bz79956
Comment 8 Behdad Esfahbod 2015-02-27 05:29:02 UTC
Interesting!  I was thinking of manually coding the list...

Why do you special-case U+0020?  I see it here:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g

Oh, in the regexp, it has "\ " for it.

Did you forget to add the BRAILLE exception?
Comment 9 Akira TAGOH 2015-02-27 06:13:48 UTC
(In reply to Behdad Esfahbod from comment #8)
> Interesting!  I was thinking of manually coding the list...

I just wanted to reduce the effort needed to keep it updated and to check for changes. I'm a lazy person :)

> Why do you special-case U+0020?  I see it here:
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGC%3DZs%3A][%3ADI%3A]&abb=on&ucd=on&esc=on&g
> 
> Oh, in the regexp, it has "\ " for it.

Ah! I missed that. have to modify the script to recognize it...

> Did you forget to add the BRAILLE exception?

it's there.. I simply forgot to add a comment for that.
I'll fix them.
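
A rough Python sketch of the escaped-space issue discussed above. In a UnicodeSet-style pattern an unescaped space is insignificant, so U+0020 has to be written as `\ `, and a parser that only looks for `\uXXXX` escapes will miss it. The function name `parse_set` and the `EXAMPLE` pattern are hypothetical; the pattern is hand-written for illustration and is not the actual output of the page, and only `\uXXXX`, `\UXXXXXXXX`, backslash-escaped literals and `-` ranges are handled.

```python
import re

_TOKEN = re.compile(
    r"\\u([0-9A-Fa-f]{4})"    # \uXXXX
    r"|\\U([0-9A-Fa-f]{8})"   # \UXXXXXXXX
    r"|\\(.)"                 # escaped literal, e.g. "\ " for U+0020
    r"|(-)"                   # range operator
    r"|(.)",                  # any other literal character
    re.DOTALL,
)

# Hypothetical pattern; note the escaped space at the start and the
# unescaped (ignored) space before the last range.
EXAMPLE = r"[\ \u00A0\u00AD\u180B-\u180E \U0001BCA0-\U0001BCA3]"


def parse_set(pattern):
    """Expand a simple bracketed set pattern into a set of code points."""
    body = pattern.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed set pattern")
    cps = set()
    prev = None            # last code point seen, used as a range start
    pending_range = False  # True right after a "-" token
    for m in _TOKEN.finditer(body[1:-1]):
        u4, u8, esc, dash, lit = m.groups()
        if dash:
            if prev is None:
                raise ValueError("range without a start")
            pending_range = True
            continue
        if lit is not None and lit.isspace():
            continue  # unescaped whitespace is not part of the set
        if u4 or u8:
            cp = int(u4 or u8, 16)
        else:
            cp = ord(esc or lit)
        if pending_range:
            cps.update(range(prev, cp + 1))
            pending_range = False
        else:
            cps.add(cp)
        prev = cp
    return cps
```

With this handling, `parse_set(EXAMPLE)` contains U+0020 without any special case being added by hand.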
Comment 10 Akira TAGOH 2015-02-27 06:50:24 UTC
Updated. It now works without special-casing 0x20.
Comment 11 Behdad Esfahbod 2015-02-27 06:52:48 UTC
Thanks.  lgtm.
Comment 12 Akira TAGOH 2015-02-27 06:57:34 UTC
merged into git master. thanks!

