Summary: | Regular expression support in comparison for <test> | ||
---|---|---|---|
Product: | fontconfig | Reporter: | Akira TAGOH <akira> |
Component: | library | Assignee: | fontconfig-bugs |
Status: | RESOLVED MOVED | QA Contact: | Behdad Esfahbod <freedesktop> |
Severity: | enhancement | ||
Priority: | medium | CC: | freedesktop |
Version: | 2.8 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Description
Akira TAGOH
2011-09-06 00:41:39 UTC
Initial patch to support regexps: http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=bz40648&id=c7f6c83593bfde5d1d18478bc23b75a69a939c22, tested with:

```xml
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <match>
    <test name="lang" compare="regex">
      <string>ja.*</string>
    </test>
    <edit name="family" mode="prepend">
      <string>DejaVu Sans</string>
    </edit>
  </match>
</fontconfig>
```

```
$ FC_DEBUG=4 fc-match :lang=ja
...
Add Subst match
pattern any lang Regex "ja.*"
edit
Edit family Prepend "DejaVu Sans";
...
FcConfigSubstitute test pattern any lang Regex "ja.*"
Substitute match
pattern any lang Regex "ja.*"
edit
Edit family Prepend "DejaVu Sans";
Prepend list before "sans-serif"(w)
Prepend list after "DejaVu Sans"(w) "sans-serif"(w)
FcConfigSubstitute editPattern has 2 elts (size 16)
	family: "DejaVu Sans"(w) "sans-serif"(w)
	lang: ja(s)
...
$ FC_DEBUG=4 fc-match :lang=ja-jp
...
FcConfigSubstitute test pattern any lang Regex "ja.*"
Substitute match
pattern any lang Regex "ja.*"
edit
Edit family Prepend "DejaVu Sans";
Prepend list before "sans-serif"(w)
Prepend list after "DejaVu Sans"(w) "sans-serif"(w)
FcConfigSubstitute editPattern has 2 elts (size 16)
	family: "DejaVu Sans"(w) "sans-serif"(w)
	lang: ja-jp(s)
```

No, this is not the right approach. What's not working right now? Lang testing is not string testing.

(In reply to comment #3)
> No, this is not the right approach. What's not working right now? Lang
> testing is not string testing.

Right. Strictly speaking, though, the above patch doesn't test the string: it adds the lang names that match the regexp and checks whether the pattern matches that set of strings. IMHO that behavior is more intuitive. That may not be enough of an explanation... This feature would give us an easy way to test multiple lang names.
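The behavior described above (expand the regexp into a set of lang names, then check the pattern's lang against that set) can be sketched in C with POSIX `<regex.h>`. This is not the code from the linked commit: `lang_regex_test`, `lang_contains` and the tiny `known_langs` table are hypothetical, and treating "ja" as covering "ja-jp" is only a rough stand-in for fontconfig's territory-aware FcLangSet comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical, tiny list of known lang names; fontconfig's real list
 * comes from its built-in orthography data, which this sketch ignores. */
static const char *const known_langs[] = {
    "en", "ja", "ko", "zh-cn", "zh-tw",
};

/* Returns 1 if `lang` (e.g. "ja" or "ja-jp") equals `candidate`, or if
 * `candidate` has no territory part and equals the primary subtag of
 * `lang`. A simplification, not FcLangSet semantics. */
static int lang_contains(const char *candidate, const char *lang)
{
    size_t primary = strcspn(lang, "-_");

    if (strcasecmp(candidate, lang) == 0)
        return 1;
    return strlen(candidate) == primary &&
           strncasecmp(candidate, lang, primary) == 0;
}

/* Collect every known lang whose name matches `regex`, then report
 * whether `pattern_lang` falls into that set.
 * Returns 1 on match, 0 on no match, -1 if the regexp does not compile. */
int lang_regex_test(const char *regex, const char *pattern_lang)
{
    regex_t re;
    int found = 0;

    if (regcomp(&re, regex, REG_EXTENDED | REG_NOSUB | REG_ICASE) != 0)
        return -1;
    for (size_t i = 0; i < sizeof known_langs / sizeof known_langs[0]; i++) {
        if (regexec(&re, known_langs[i], 0, NULL, 0) == 0 &&
            lang_contains(known_langs[i], pattern_lang)) {
            found = 1;
            break;
        }
    }
    regfree(&re);
    return found;
}
```

With this sketch, `lang_regex_test("ja.*", "ja-jp")` returns 1, mirroring the `lang: ja-jp` case in the debug log above, while `lang_regex_test("ja.*", "ko")` returns 0.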
This feature doesn't provide the detailed comparison between FcLangSets (which is modified a bit for special cases), though it would provide similar functionality when an FcLangSet is created from the string. I'm not sure this is a really useful example, but as a possible use: given a requirement to apply something to CJK only, we could do, for example:

```xml
<match>
  <test name="lang" compare="regex">
    <string>zh|ja|ko</string>
  </test>
  ....
</match>
```

According to Bug#33644, there is no smart way to do that in fontconfig so far, other than having 3 different `<match/>` rules, which isn't really smart.

FWIW, the previous comment contains something of a false alarm: FcLangSet can contain invalid lang names, so it would be the same as comparing strings, as you said. I'm not quite sure what the "extra" field in FcLangSet is used for. That could be improved if one could create a strict FcLangSet when building the pattern.

Another idea for a regexp use case is:

```xml
<match>
  <test name="psname" mode="regex">
    <string>(.*)\-(UniJIS\-UTF8\-H)$</string>
  </test>
  <edit name="family" mode="regex_replace">
    <string>\1</string>
  </edit>
  <edit name="pscmap" mode="regex_replace">
    <string>\2</string>
  </edit>
  <edit name="lang" mode="assign">
    <langset><string>ja</string></langset>
  </edit>
</match>
```

We could have a bunch of rules matching against CMaps to determine the family name and the lang from the psname in the pattern.

(In reply to comment #6)

```
> another idea for regexp use case is:
>
> <match>
> <test name="psname" mode="regex">
> <string>(.*)\-(UniJIS\-UTF8\-H)$</string>
> </test>
> <edit name="family" mode="regex_replace">
> <string>\1</string>
> </edit>
> <edit name="pscmap" mode="regex_replace">
> <string>\2</string>
> </edit>
> <edit name="lang" mode="assign">
> <langset><string>ja</string></langset>
> </edit>
> </match>
>
> We could have the bunch of rules against CMaps to determine the family name
> and the lang according to the psname in the pattern.
```
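The comment #6 proposal relies on capture groups: `\1` would become the family and `\2` the CMap name. A minimal POSIX sketch of that extraction follows; `split_psname` and the sample psname are illustrative assumptions, not fontconfig API. It writes `-` rather than `\-`, because a backslash before an ordinary character such as `-` is undefined in POSIX extended regexps.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy capture group `m` of `src` into `dst`, NUL-terminated and
 * truncated to fit `dstsz` bytes. */
static void copy_group(const char *src, regmatch_t m, char *dst, size_t dstsz)
{
    size_t len;

    if (dstsz == 0)
        return;
    len = (size_t)(m.rm_eo - m.rm_so);
    if (len >= dstsz)
        len = dstsz - 1;
    memcpy(dst, src + m.rm_so, len);
    dst[len] = '\0';
}

/* Hypothetical helper: split a PostScript name ending in the
 * UniJIS-UTF8-H CMap into the family part (\1) and the CMap part (\2).
 * Returns 1 on match, 0 on no match, -1 if the regexp does not compile. */
int split_psname(const char *psname, char *family, size_t familysz,
                 char *cmap, size_t cmapsz)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] = whole match, m[1] = \1, m[2] = \2 */
    int ok;

    if (regcomp(&re, "^(.*)-(UniJIS-UTF8-H)$", REG_EXTENDED) != 0)
        return -1;
    ok = regexec(&re, psname, 3, m, 0) == 0;
    if (ok) {
        copy_group(psname, m[1], family, familysz);
        copy_group(psname, m[2], cmap, cmapsz);
    }
    regfree(&re);
    return ok;
}
```

For example, "Ryumin-Light-UniJIS-UTF8-H" splits into "Ryumin-Light" and "UniJIS-UTF8-H", while a psname ending in a different CMap does not match and leaves the buffers untouched.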
This is much harder to implement in the current codebase.

One thing I don't like is matching for things like "pa.*", as that would also match things like "par". But I guess that can be fixed by a more involved regexp. I'm not opposing this per se, just pointing out details that need to be taken into consideration.

(In reply to comment #7)
> One thing I don't like is matching for things like "pa.*" as that would also
> match things like "par". But I guess that can be fixed by a more involved
> regexp.

Indeed. It sounds to me like we were trying a bit too hard with the lang comparison as well. Then I feel we shouldn't allow `<test name="lang"><string>xx</string></test>`. Instead, using `<langset><string>xx</string></langset>` makes it easier to see that this isn't something that can be done with a string operation. Then we can make this feature a string-specific operation.

FWIW, my main interest in this feature is now comment #6, not the original one. It is more important for implementing PostScript-related features in bz.

Doh, s/in bz/in fontconfig/

How bad would it be to limit this feature to strings? We could perhaps give it special behavior for lang or charset, but that looks inconsistent, and I'm a bit concerned that people would get confused by it. Another idea: since some comparison modes fall back to another mode depending on the value type, it could fall back to eq (or similar) with a warning for lang, charset, and other types that aren't supposed to be strings.
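The "pa.*" concern is easy to reproduce: POSIX `regexec` reports a match anywhere in the string unless the pattern is anchored, and ".*" accepts any continuation, so "pa.*" accepts "par". One possible "more involved regexp" is to anchor the subtag boundary, as in the illustrative pattern `^pa(-|$)`; the helper `regex_matches` below is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the extended regexp `pattern` matches somewhere in `s`,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int regex_matches(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Here `regex_matches("pa.*", "par")` is 1, whereas `^pa(-|$)` accepts "pa" and "pa-in" but rejects "par", because after "pa" it requires either a "-" or the end of the string.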
Another use case that I'm keen to see is:

```xml
<match>
  <edit name="family_copy" mode="assign">
    <name>family</name>
  </edit>
</match>
<match>
  <test name="family_copy" mode="regex">
    <string>[[:space:]]</string>
  </test>
  <edit name="family_copy" mode="regex_replace_all">
    <string>-</string>
  </edit>
</match>
<match>
  <test name="psname" mode="eq" qual="all">
    <string></string>
  </test>
  <edit name="psname" mode="assign">
    <name>family_copy</name>
  </edit>
</match>
```

With the above rules, I expect all whitespace to be replaced with '-' and the result to be set as 'psname' in the pattern if it isn't already available.

How about defining the regexps Perl-style, i.e.:

```xml
<string>s/( *)/-/g</string>
```

?

(In reply to comment #11)

```
> How about defining the regexps Perl-style, ie:
>
> <string>s/( *)/-/g</string>
>
> ?
```

Hmm, yeah, it would be an easier way to implement this feature, but it doesn't quite make sense syntax-wise, because the test and the edit would be done in one line. Or shall we add `<regex>` and allow it only in this block, instead of `<test compare="regex">`?

One way or another, it's much easier than trying to match "\1" to whatever a `<match>` block matched. Or maybe you have better ideas?

Well, that said, the advantage of this approach would be that it can be applied to different objects at the same time in `<edit>` blocks, as I wrote in comment #6. IIUC, with your suggestion it can only be applied to the objects specified in `<test>`, and we would need similar `<string/>` lines for the other objects, right? It may be somewhat hard to understand how it works without examples, but IMHO regexps are like that anyway, so the situation can be improved by having more examples for each use case.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/7.
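The `regex_replace_all` edit mode in the family_copy example is only a proposal, not an existing fontconfig mode. Below is a minimal C sketch of the semantics that example expects, assuming the replacement is a literal string with no back-references; the helper name `regex_replace_all` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace every non-overlapping match of the
 * extended regexp `pattern` in `src` with the literal string `repl`
 * and write the result to `dst`. Returns 0 on success, or -1 if the
 * pattern does not compile or `dst` is too small. */
int regex_replace_all(const char *pattern, const char *src, const char *repl,
                      char *dst, size_t dstsz)
{
    regex_t re;
    regmatch_t m;
    const char *p = src;
    size_t out = 0, rest;
    size_t repl_len = strlen(repl);
    int eflags = 0;

    if (dstsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        size_t pre = (size_t)m.rm_so;                /* text before the match */
        size_t mlen = (size_t)(m.rm_eo - m.rm_so);  /* length of the match */

        /* Room for prefix + replacement + one stepped-over char + NUL. */
        if (out + pre + repl_len + 1 >= dstsz) {
            regfree(&re);
            return -1;
        }
        memcpy(dst + out, p, pre);
        out += pre;
        memcpy(dst + out, repl, repl_len);
        out += repl_len;
        p += pre + mlen;

        if (mlen == 0) {
            /* An empty match would loop forever: copy one char and move on. */
            if (*p == '\0')
                break;
            dst[out++] = *p++;
        }
        /* Later searches start mid-string, so '^' must not match there. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);

    rest = strlen(p);
    if (out + rest >= dstsz)
        return -1;
    memcpy(dst + out, p, rest + 1);   /* includes the terminating NUL */
    return 0;
}
```

For example, replacing `[[:space:]]` with "-" in "DejaVu Sans Mono" gives "DejaVu-Sans-Mono". A pattern such as `( *)` from comment #11 can also match the empty string, so a replace-all built this way inserts "-" between characters too; ` +` or `[[:space:]]+` avoids that.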