At the above URL, I've put &#8203; (U+200B ZERO WIDTH SPACE) into the
displayed URL addresses so that they would wrap more reasonably (the
equivalent of inserting \- in TeX).
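For readers who generate such pages, here is a minimal sketch of the technique. It assumes the URL text is produced by a script. Python is used only for illustration, and `url_with_break_hints` is a hypothetical helper, not part of any library. It HTML-escapes the URL and emits `&#8203;` after every `/` and `.`, which gives the browser zero-width break opportunities at those points.

```python
import html

# U+200B ZERO WIDTH SPACE written as an HTML numeric character reference.
ZWSP_ENTITY = "&#8203;"


def url_with_break_hints(url):
    """Hypothetical helper: escape a URL for HTML display text and add a
    ZWSP after each '/' and '.' so the browser may wrap the URL there."""
    escaped = html.escape(url, quote=False)  # escapes &, <, > only
    out = []
    for ch in escaped:
        out.append(ch)
        if ch in "/.":
            out.append(ZWSP_ENTITY)
    return "".join(out)


print(url_with_break_hints("http://example.org/a.html"))
```

This also puts a break opportunity between the two slashes of `http://`. Skipping the scheme would be a straightforward refinement.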
When I look at the page using Win32 Mozilla, it displays fine. (Not so
for Opera, but that's another matter.) When I look at it using Mozilla
1.4 under X11, &#8203; is shown as a space. I'm not 100% sure this is a
fontconfig issue; when I get home, I'll try to write a test case to
reproduce it. I'm opening the bug now so I don't forget it.
Browser: Mozilla/5.0 (X11; U; NetBSD i386; en-US; rv:1.4) Gecko/20030713.
fontconfig-config --version: 1.0.1
XFree86 version: 18.104.22.168
OS: NetBSD 1.6T (a moderately recent CVS snapshot.)
On further experimentation, calling XftDrawString32 with U+200B as one of
the characters yields varying results. With Arial Unicode MS, it's displayed
zero-width. With Tahoma, it's displayed as an I-don't-have-this-character box.
With Lucidux Mono, it's displayed as a normal space. With two Adobe fonts,
Helvetica and Minion, it's displayed as a small, but not zero-width, space.
With U+FEFF, ZERO WIDTH NO BREAK SPACE, Arial Unicode MS gives a box, Adobe
Helvetica and Minion both give spaces, and Luxi Sans gives a very thin box (as
it does with U+200B).
I submit that, except perhaps in the case of monospace or character-cell fonts,
Xft should universally display both of these characters as zero width. I quote
from the Unicode book on Zero Width No-Break Space, chapter 13, section 2:
"As ZERO WIDTH NO BREAK SPACE, U+FEFF behaves like U+00A0 NO BREAK SPACE in
that it indicates the absence of word boundaries; however, the former has no
width. For example, this character can be inserted after the fourth character
in the text 'base+delta' to indicate that there should be no line-break
between the 'e' and the '+'. The ZERO WIDTH NO BREAK SPACE can be used to
prevent line breaking with other characters that do not have non-breaking
variants, such as U+2009 THIN SPACE or U+2015 HORIZONTAL BAR, by bracketing the [...]"
Zero Width Space: "The U+200B ZERO WIDTH SPACE indicates a word boundary,
except that it has no width. Zero-width space characters are intended to be
used in languages that have no visible word spacing to represent word breaks,
such as in Thai or Japanese. When text is justified, ZWSP has no effect on
letter spacing--for example, in English or Japanese usage.
There may be circumstances with other scripts, such as Thai, where extra space
is applied around ZWSP as a result of justification. [Refers to a figure.]
This approach is unlike the use of fixed-width space characters, such as U+2002
EN SPACE, that have specified width and should not be automatically expanded [...]"
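The quoted rules can be made concrete with a minimal sketch. This is not Xft or Mozilla code, and `wrap` and `width` are hypothetical names. It is a greedy line breaker with the following behavior:

- It treats U+0020 and U+200B as the only break opportunities.
- It gives U+200B and U+FEFF zero width.
- It never breaks at U+00A0 or U+FEFF.

It assumes every other character is one cell wide. It ignores justification and the many other rules of the Unicode line-breaking algorithm (UAX #14).

```python
ZWSP = "\u200b"    # ZERO WIDTH SPACE: break opportunity, no width
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE: no break, no width
ZERO_WIDTH = {ZWSP, ZWNBSP}


def width(s):
    """Display width, assuming every character except ZWSP/ZWNBSP is one cell wide."""
    return sum(0 if ch in ZERO_WIDTH else 1 for ch in s)


def wrap(text, max_width):
    """Greedy wrapping that may break only after U+0020 SPACE or U+200B ZWSP.

    U+00A0 NO-BREAK SPACE and U+FEFF ZWNBSP are never break points, so a run
    joined by them overflows max_width rather than being split.
    """
    # Cut the text into segments, each ending just after a break opportunity.
    segments, current = [], ""
    for ch in text:
        current += ch
        if ch in (" ", ZWSP):
            segments.append(current)
            current = ""
    if current:
        segments.append(current)

    lines, line = [], ""
    for seg in segments:
        # Trailing spaces are dropped at a line end, so don't count them when fitting.
        if line and width(line + seg.rstrip(" ")) > max_width:
            lines.append(line.rstrip(" "))
            line = ""
        line += seg
    if line:
        lines.append(line.rstrip(" "))
    return lines
```

With this sketch, `wrap("www.\u200bexample.\u200borg", 8)` splits at each ZWSP into `["www.\u200b", "example.\u200b", "org"]`. By contrast, `wrap("see base\ufeff+delta", 6)` returns `["see", "base\ufeff+delta"]`: the ZWNBSP keeps "base+delta" together even though it overflows.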
(And, of course, \- is the TeX equivalent of Soft Hyphen, not ZWSP. Excuse
Xft promises only to display the glyphs in the Unicode encoding vector for the
selected font. It does no interpretation of the Unicode values at all. If you
want to ensure that ZWSP is correctly displayed by your application, you will
need to parse the Unicode at a higher level or use non-broken fonts. Correct
typesetting of Unicode text is far outside the realm of Xft and belongs to
higher-level libraries like Pango or STSF.
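The suggestion to handle this above Xft could take the form of a pre-pass, sketched minimally below. Python is used only for illustration, and `prepare_for_glyph_renderer` is a hypothetical name. The pre-pass removes U+200B and U+FEFF from the code-point sequence before it reaches a renderer that draws one glyph per code point. XftDrawString32 takes such a sequence as a `const FcChar32 *` array plus a length. The pre-pass also remembers where a ZWSP allowed a line break, so a layout step can still use that information.

```python
ZWSP = 0x200B    # ZERO WIDTH SPACE: nothing to draw, but a line may break here
ZWNBSP = 0xFEFF  # ZERO WIDTH NO-BREAK SPACE: nothing to draw, no break here


def prepare_for_glyph_renderer(text):
    """Split text into code points to draw plus ZWSP break positions.

    Returns (codepoints, breaks). `codepoints` omits U+200B and U+FEFF, so a
    renderer that simply draws the font's glyph for each code point (a box,
    a space, ...) never receives them. `breaks` holds indices into
    `codepoints` before which a ZWSP allowed a line break.
    """
    codepoints, breaks = [], []
    for ch in text:
        cp = ord(ch)
        if cp == ZWSP:
            breaks.append(len(codepoints))
        elif cp == ZWNBSP:
            continue  # dropped without recording a break
        else:
            codepoints.append(cp)
    return codepoints, breaks
```

For example, `prepare_for_glyph_renderer("ab\u200bc\ufeffd")` returns `([97, 98, 99, 100], [2])`. This sketch tracks only ZWSP breaks, not ordinary spaces. A real layout library such as Pango would apply the full Unicode line-breaking rules.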