Bug 31969 - Can't modify charset in target="scan"
Can't modify charset in target="scan"
Status: RESOLVED FIXED
Product: fontconfig
Classification: Unclassified
Component: library
2.8
Other All
: medium normal
Assigned To: Keith Packard
Behdad Esfahbod
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-11-29 02:03 UTC by Akira TAGOH
Modified: 2010-12-28 06:34 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Akira TAGOH 2010-11-29 02:03:03 UTC
As similar to Bug#23758, one can't modify the charset now.

This feature would be useful if the font has any buggy glyphs for applications say.

The proposed changes for this feature is available at:
http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=charset-modification&id=28675a85dca498785aaa89286b5aecbdd1849195
Comment 2 Akira TAGOH 2010-11-30 02:05:32 UTC
Doh. another change for fixing bug:
http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=charset-modification&id=40fbd16fe9c0bc5d8c75d60e50d38ffd9e971180

To drop the charset for the entire ASCII part from the font:

<match target="scan">
    <test name="family">
        <string>blahblahblah font</string>
    </test>
    <edit name="charset" mode="remove">
        <range object="charset">
            <int>20</int>
            <int>127</int>
        </range>
    </edit>
</match>
Comment 3 Akira TAGOH 2010-11-30 03:00:31 UTC
For more convenience:

http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=charset-modification&id=e16e3a397c850a675f7964c46c5bd608a2c58e89

<match target="scan">
    <test name="family">
        <string>blahblahblah font</string>
    </test>
    <edit name="charset" mode="remove">
        <range object="charset">
            <string>U+0020</string>
            <string>U+007F</string>
        </range>
    </edit>
</match>
Comment 4 Behdad Esfahbod 2010-11-30 08:16:09 UTC
(In reply to comment #3)
> For more convenience:
> 
> http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=charset-modification&id=e16e3a397c850a675f7964c46c5bd608a2c58e89
> 
> <match target="scan">
>     <test name="family">
>         <string>blahblahblah font</string>
>     </test>
>     <edit name="charset" mode="remove">
>         <range object="charset">
>             <string>U+0020</string>
>             <string>U+007F</string>
>         </range>
>     </edit>
> </match>

The syntax doesn't make sense.  Something like this is more like what I expect, given how the matrix operations currently work:

  <edit name="charset" mode="assign">
    <minus>
      <name>charset</name>
      <charset>
          <int>0x06CC</int>  <!-- ARABIC LETTER FARSI YEH -->
          <int>0x06D2</int>  <!-- ARABIC LETTER YEH BARREE -->
          <int>0x06D3</int>  <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE -->
      <charset>
    <minus>
  </edit>


I deliberately want to avoid ranges for now.  But if you have compelling usecases for ranges, then something like this would do:

  <edit name="charset" mode="assign">
    <minus>
      <name>charset</name>
      <charset>
          <int>0x06CC</int>  <!-- ARABIC LETTER FARSI YEH -->
          <range>
            <int>0x06D2</int>  <!-- ARABIC LETTER YEH BARREE -->
            <int>0x06D3</int>  <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE -->
          </range>
      <charset>
    <minus>
  </edit>
Comment 5 Akira TAGOH 2010-11-30 17:59:59 UTC
(In reply to comment #4)
> The syntax doesn't make sense.  Something like this is more like what I expect,
> given how the matrix operations currently work:
> 
>   <edit name="charset" mode="assign">
>     <minus>
>       <name>charset</name>
>       <charset>
>           <int>0x06CC</int>  <!-- ARABIC LETTER FARSI YEH -->
>           <int>0x06D2</int>  <!-- ARABIC LETTER YEH BARREE -->
>           <int>0x06D3</int>  <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE -->
>       <charset>
>     <minus>
>   </edit>

Got it. it seems that I too did shortcut to process the operations. let me re-work with the above. thank you for the comment.

> I deliberately want to avoid ranges for now.  But if you have compelling
> usecases for ranges, then something like this would do:
> 
>   <edit name="charset" mode="assign">
>     <minus>
>       <name>charset</name>
>       <charset>
>           <int>0x06CC</int>  <!-- ARABIC LETTER FARSI YEH -->
>           <range>
>             <int>0x06D2</int>  <!-- ARABIC LETTER YEH BARREE -->
>             <int>0x06D3</int>  <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
> -->
>           </range>
>       <charset>
>     <minus>
>   </edit>

Hmm, well, the example in my previous comment to mark ASCII characters into the blacklist was one of what I expect in this feature. though it's still possible to have listing of them in the file but just thought it would be good for convenience, but anyway.
Comment 6 Behdad Esfahbod 2010-11-30 18:46:06 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > The syntax doesn't make sense.  Something like this is more like what I expect,
> > given how the matrix operations currently work:
> > 
> >   <edit name="charset" mode="assign">
> >     <minus>
> >       <name>charset</name>
> >       <charset>
> >           <int>0x06CC</int>  <!-- ARABIC LETTER FARSI YEH -->
> >           <int>0x06D2</int>  <!-- ARABIC LETTER YEH BARREE -->
> >           <int>0x06D3</int>  <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE -->
> >       <charset>
> >     <minus>
> >   </edit>
> 
> Got it. it seems that I too did shortcut to process the operations. let me
> re-work with the above. thank you for the comment.

Thanks.


> > I deliberately want to avoid ranges for now.  But if you have compelling
> > usecases for ranges, then something like this would do:
> > 
> >   <edit name="charset" mode="assign">
> >     <minus>
> >       <name>charset</name>
> >       <charset>
> >           <int>0x06CC</int>  <!-- ARABIC LETTER FARSI YEH -->
> >           <range>
> >             <int>0x06D2</int>  <!-- ARABIC LETTER YEH BARREE -->
> >             <int>0x06D3</int>  <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
> > -->
> >           </range>
> >       <charset>
> >     <minus>
> >   </edit>
> 
> Hmm, well, the example in my previous comment to mark ASCII characters into the
> blacklist was one of what I expect in this feature. though it's still possible
> to have listing of them in the file but just thought it would be good for
> convenience, but anyway.

My vision for usecases of this feature is to excluded faulty characters from a font.  For anything beyond that, I expect editing based on lang to be adequate.  However, editing based on script is really what we need in reality, so maybe you're right that using charset instead of lang is preferred.

We can also add charsets representing scripts in Unicode....
Comment 7 Akira TAGOH 2010-11-30 21:18:43 UTC
(In reply to comment #6)
> My vision for usecases of this feature is to excluded faulty characters from a
> font.  For anything beyond that, I expect editing based on lang to be adequate.
>  However, editing based on script is really what we need in reality, so maybe
> you're right that using charset instead of lang is preferred.

That's also true. and it may be case-by-case. given that there are any fonts doesn't have some lang in the cache as a result due to missing a few glyphs, editing based on script doesn't help in this case. editing lang would makes more sense. so that would be nice to have both features of editing IMHO.
Comment 8 Behdad Esfahbod 2010-11-30 21:20:31 UTC
Yes, lets have both lang and charset editing.  Lang by itself is not enough.  If a font has, say, a faulty 'a' glyph, you don't want to be bothered with removing all the langs that use the character 'a'.
Comment 10 Akira TAGOH 2010-11-30 21:39:52 UTC
I modified the charset in FcConfigEvaluate() in the patch as matrix does. now I have the following elements in my .fonts.conf:

        <match target="scan">
                <test name="family">
                        <string>VL Gothic</string>
                </test>
                <edit name="charset" mode="assign">
                        <minus>
                                <name>charset</name>
                                <charset>
                                        <int>0x0040</int>
                                </charset>
                        </minus>
                </edit>
        </match>

and getting a result like:


FcConfigSubstitute test scan any family Equal "VL Gothic"
Substitute match
        scan any family Equal "VL Gothic"
edit
        Edit charset Assign charset Minus charset
;

Append list before 
Append list after  
        0000: 00000000 ffffffff fffffffe 7fffffff 00000000 ffffffff ffffffff fff
fffff

I see a bit of 0x0040 are dropped there. but I'm not sure if no output for "before" are expected behaviour. is it same for matrix?
Comment 11 Akira TAGOH 2010-11-30 21:54:54 UTC
Hm, nevermind. it looks like same with matrix:


FcConfigSubstitute test pattern any family Equal "AngsanaUPC" Comma "Angsana New
"
Substitute match
        pattern any family Equal "AngsanaUPC" Comma "Angsana New"
edit
        Edit family Assign "Kinnari";
        Edit matrix Assign matrix Times [0.67 0 0 0.67];
...
Append list before 
Append list after  (0.670000 0.000000; 0.000000 0.670000)(w)
Comment 12 Behdad Esfahbod 2010-12-01 13:07:13 UTC
(In reply to comment #11)
> Hm, nevermind. it looks like same with matrix:

Then that's a separate bug to be fixed :).
Comment 13 Akira TAGOH 2010-12-01 19:06:43 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Hm, nevermind. it looks like same with matrix:
> 
> Then that's a separate bug to be fixed :).

I'm not sure what exactly the bug is, but when running under FcMatchScan, "assign" mode always works as "assign_replace" and no "before" output on "assign_replace" mode because the original pattern is removed prior to invoke FcConfigAdd(). "after" output looks good because it was evaluated prior to invoke FcConfigDel().
Comment 15 Behdad Esfahbod 2010-12-02 05:37:35 UTC
Thanks Akira.  Can you also update the DTD?

For the <range>, we don't need a new allocated type.  Just embed the range struct in FcVStack directly, like we do with 'double'.  And no iterator needed.  No public API change.

The rest looks good.  Thanks!
Comment 16 Akira TAGOH 2010-12-02 09:13:18 UTC
(In reply to comment #15)
> Thanks Akira.  Can you also update the DTD?
> 
> For the <range>, we don't need a new allocated type.  Just embed the range
> struct in FcVStack directly, like we do with 'double'.  And no iterator needed.
>  No public API change.
> 
> The rest looks good.  Thanks!

Sure. updated:
http://cgit.freedesktop.org/~tagoh/fontconfig/log/?h=charset-modification-take2
Comment 17 Behdad Esfahbod 2010-12-02 09:37:34 UTC
Thanks.  Looks great!

Do you think we should also support CDATA for charset, in the format printed out by FcPatternPrint()?  We can always add that later, so I'm happy to go ahead and commit what you've done so far.
Comment 18 Akira TAGOH 2010-12-02 10:07:25 UTC
(In reply to comment #17)
> Thanks.  Looks great!
> 
> Do you think we should also support CDATA for charset, in the format printed
> out by FcPatternPrint()?  We can always add that later, so I'm happy to go
> ahead and commit what you've done so far.

I'm not sure. but yeah, if there are any strong demand for that, we could go back to work on that later. I'm comfortable with current implementation so far.
Comment 19 Behdad Esfahbod 2010-12-02 13:51:18 UTC
Can you also add the support for <range> to the <blank> element please?
Comment 20 Akira TAGOH 2010-12-02 16:40:58 UTC
(In reply to comment #19)
> Can you also add the support for <range> to the <blank> element please?

Sure. I'll work on that this weekend or Monday because I'm away from now
Comment 21 Akira TAGOH 2010-12-05 19:25:07 UTC
hmm, I seemed mess up the tree with git-rebase. so merged changes into one on new branch:
http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=charset-modification-take3&id=41d3a5dc0f0aa6da80504684dc9d505651620106

Also adding the range support in blank was done:
http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=range-support-for-blank&id=0461ef0c867c43eaa6d6524707df00a2cb617e5d


Hope I'm not missing anything there.
Comment 24 Behdad Esfahbod 2010-12-28 01:02:07 UTC
Pushed to master.  Please test.  Thanks!
Comment 25 Akira TAGOH 2010-12-28 06:34:38 UTC
That works for me. thanks!