Bug 97251 - Drawing strings with fontset in missing charsets
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/Xlib
Version: 7.7 (2012.06)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
Reported: 2016-08-09 01:09 UTC by will
Modified: 2018-08-10 20:11 UTC (History)

libX11 1.6.3 patch showing proposed solution (4.10 KB, patch)
2016-08-19 18:52 UTC, will

Description will 2016-08-09 01:09:51 UTC
I have a program using Xlib where I draw text using XmbDrawString(). I've found that certain characters I try to draw do not show as I expect.

Some information about my environment and setup:

  * Locale: en_CA.UTF-8, with setlocale(LC_ALL, "") called in the program.
  * I load a fontset with XCreateFontSet() with a base font name of only "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60"
  * XCreateFontSet() reports 4 missing charsets: JISX0208.1983-0, KSC5601.1987-0, GB2312.1980-0, JISX0201.1976-0
  * My X locale database XLC_FONTSET lists ISO8859-1:GL first and ISO10646-1 last, with KSC5601.1987-0 in between.
  * I pass in UTF-8 text to XmbDrawString().

Most characters, ASCII and non-ASCII, render correctly, but some do not. One that does not work is U+2122, the trademark symbol (™). It shows as '"b'.

I've traced through what is happening and this is my best understanding: The conversion code translates it to KSC5601.1987-0 encoding, which my fontset lacks, and then tries to display it with the ISO8859-1 font.
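The '"b' output is consistent with a two-byte KSC5601.1987-0 code being read one byte at a time by a single-byte font. The following is a minimal sketch of that effect, not libX11 code. The byte values 0x22 0x62 are an assumption inferred from the reported '"b' output rather than taken from the libX11 conversion tables, and render_as_single_byte() is a hypothetical helper.

```c
#include <stddef.h>

/*
 * Assumption: KSC5601.1987-0 encodes U+2122 as the two-byte GL code
 * 0x22 0x62. This is inferred from the '"b' output in the report.
 */
const unsigned char ksc5601_trademark[2] = { 0x22, 0x62 };

/*
 * Hypothetical helper: interpret a multi-byte charset code one byte at a
 * time, the way a single-byte ISO8859-1 font would see it.
 * out must have room for len + 1 bytes.
 */
void render_as_single_byte(const unsigned char *code, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++)
        out[i] = (char)code[i];   /* 0x22 is '"', 0x62 is 'b' */
    out[len] = '\0';
}
```

Calling render_as_single_byte(ksc5601_trademark, 2, out) fills out with the two ASCII characters seen on screen.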

Here is some information at the code level:

In modules/om/generic/omText.c we convert the input (UTF-8 text) to a charset listed in the X locale database. There are several charsets we try to convert to, in order. In src/xlibi18n/lcUTF8.c we load an ordered list of preferred encodings, matching the order in the X locale database. The U+2122 character gets converted to the KSC5601.1987-0 charset, since it is apparently valid there and this charset comes before ISO10646-1, where it is also valid. KSC5601.1987-0 is a charset my fontset does not have. We end up trying to draw it using the ISO8859-1 font, which appears to be the default because it is in position 0. This leads to the '"b'.

I've confirmed that if I drop KSC5601.1987-0 from my X locale database, or skip over it during the conversion, the trademark symbol is converted to ISO10646-1. Converting to ISO10646-1 is what I expected.
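The first-match behaviour described above can be modelled in a few lines. This is only a sketch under stated assumptions: struct charset, the toy coverage functions, and pick_charset() are hypothetical names, and the coverage tests are deliberately simplified (the KSC5601 entry knows only the one character from this report). It is not how lcUTF8.c is actually structured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the preferred-encoding list. */
struct charset {
    const char *name;
    int (*can_encode)(unsigned long ucs);   /* nonzero if the charset has ucs */
};

/* Toy coverage functions, simplified for illustration only. */
int latin1_gl_has(unsigned long ucs) { return ucs >= 0x20 && ucs <= 0x7F; }
int ksc5601_has(unsigned long ucs)   { return ucs == 0x2122; }
int ucs2_has(unsigned long ucs)      { return ucs <= 0xFFFF; }

/* Same relative order as the locale database described in the report. */
const struct charset preferred[] = {
    { "ISO8859-1",      latin1_gl_has },
    { "KSC5601.1987-0", ksc5601_has   },
    { "ISO10646-1",     ucs2_has      },
};

/*
 * Return the name of the first charset, in list order, that can encode
 * ucs. An entry whose name equals skip is ignored (skip may be NULL).
 * Returns NULL if no charset matches.
 */
const char *pick_charset(const struct charset *list, size_t n,
                         unsigned long ucs, const char *skip)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (skip != NULL && strcmp(list[i].name, skip) == 0)
            continue;
        if (list[i].can_encode(ucs))
            return list[i].name;
    }
    return NULL;
}
```

With this model, pick_charset(preferred, 3, 0x2122, NULL) selects KSC5601.1987-0, while passing "KSC5601.1987-0" as skip selects ISO10646-1, matching the two outcomes reported above.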

The problem is more extreme if we try to load a fontset with a font with a charset specified, such as with a base font name "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1". If I do so then ASCII characters translate to ISO8859-1, but since there is no font in the fontset with that charset, they can't be drawn. But they could if we translated them to ISO10646-1.

For a solution I am thinking that during the conversions (in lcUTF8.c, such as in charset_wctocs()) we could favour trying those charsets that are available in the font set. That is, skip those that are missing, and at worst try them last. This would mean in both of the problem cases I describe, the characters would translate to ISO10646-1 and display.
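One way to picture "skip those that are missing, and at worst try them last" is a stable reorder of the preference list. The sketch below assumes the list is a plain array of charset names and that the missing names are available in the form XCreateFontSet() reports them. The functions is_missing() and prioritize_available() and the MAX_CHARSETS limit are hypothetical, not part of libX11.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHARSETS 16   /* arbitrary limit for this sketch */

/* Nonzero if name appears in the fontset's list of missing charsets. */
int is_missing(const char *name, const char *const *missing, size_t n_missing)
{
    size_t i;

    for (i = 0; i < n_missing; i++)
        if (strcmp(name, missing[i]) == 0)
            return 1;
    return 0;
}

/*
 * Stable in-place reorder: charsets the fontset has keep their relative
 * order and move to the front; missing charsets keep their relative
 * order and move to the back.
 */
void prioritize_available(const char **list, size_t n,
                          const char *const *missing, size_t n_missing)
{
    const char *tail[MAX_CHARSETS];
    size_t head = 0, n_tail = 0, i;

    assert(n <= MAX_CHARSETS);
    for (i = 0; i < n; i++) {
        if (is_missing(list[i], missing, n_missing))
            tail[n_tail++] = list[i];
        else
            list[head++] = list[i];   /* head <= i, so nothing unread is overwritten */
    }
    for (i = 0; i < n_tail; i++)
        list[head + i] = tail[i];
}
```

Applied to the order ISO8859-1, KSC5601.1987-0, ISO10646-1 with the four missing charsets from the report, the result is ISO8859-1, ISO10646-1, KSC5601.1987-0, so U+2122 would reach ISO10646-1 before KSC5601.1987-0.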

From looking at the code, I'm not sure of the best way to make this happen, though. It may be acceptable design-wise, as some of the lcUTF8.c code is already fontset aware.

I've already converted my program to use Xft for drawing text. I realize that is probably the recommended way to go these days. I wanted to try to figure out why the Xlib core font system was behaving like this though.

Please let me know if I can provide any more information or if you have any ideas about this.
Comment 1 will 2016-08-19 18:52:23 UTC
Created attachment 125912
libX11 1.6.3 patch showing proposed solution

With this patch I update the list of preferred encodings to include only those present in a font set. This means that we first try encodings that our font set is aware of. Previously, the font set was not taken into account, which could lead to trying to draw text encoded in a charset that the font set does not contain. We do still try the other encodings as before, but only afterwards, since there is logic elsewhere to try all encodings if the preferred encodings do not match.
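The two-stage lookup this comment describes can be sketched as follows. This is not the attached patch. struct enc, its toy code point ranges, and convert_two_pass() are hypothetical and only model the behaviour: encodings present in the font set are tried first, then every encoding as a fallback.

```c
#include <stddef.h>

/* Hypothetical model of one encoding known to the converter. */
struct enc {
    const char *name;
    unsigned long lo, hi;   /* toy coverage: inclusive code point range */
    int in_fontset;         /* nonzero if the font set has a font for it */
};

/*
 * Pass 0 considers only encodings present in the font set (the filtered
 * preferred list). Pass 1 models the existing fallback that tries every
 * encoding. Returns NULL if no encoding covers ucs.
 */
const char *convert_two_pass(const struct enc *all, size_t n, unsigned long ucs)
{
    int pass;
    size_t i;

    for (pass = 0; pass < 2; pass++) {
        for (i = 0; i < n; i++) {
            if (pass == 0 && !all[i].in_fontset)
                continue;
            if (ucs >= all[i].lo && ucs <= all[i].hi)
                return all[i].name;
        }
    }
    return NULL;
}
```

In this model both problem cases from the description resolve to ISO10646-1: U+2122 when KSC5601.1987-0 is missing from the font set, and ASCII characters when the font set contains only an ISO10646-1 font.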

I don't expect the patch is acceptable as is but I wanted to try a proof of concept of a solution. Probably there is a better way to do this.

My patch is against libX11 1.6.3 as I was not able to get the git version working in my environment.
Comment 2 GitLab Migration User 2018-08-10 20:11:17 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libx11/issues/51.
