Bug 101

Summary: U+200B ZERO WIDTH SPACE isn't being shown as zero width.
Product: xorg Reporter: Aidan Kehoe <kehoea>
Component: Lib/Xft Assignee: Keith Packard <keithp>
Status: CLOSED INVALID QA Contact:
Severity: normal    
Priority: high    
Version: unspecified   
Hardware: All   
OS: All   
URL: http://netsoc.tcd.ie/~hcksplat/plan/archive/?e=1054555171

Description Aidan Kehoe 2003-07-15 09:18:39 UTC
At the above URL, I've put &#8203; (U+200B ZERO WIDTH SPACE) into the 
displayed URL addresses so that they would wrap more reasonably--the 
equivalent of inserting \- in TeX. 

When I look at the page using Win32 Mozilla, it displays fine. (Not so
for Opera, but that's another matter.) When I look at it using Mozilla
1.4, &#8203; gets shown as a space. I'm not 100% sure this is a fontconfig
issue; when I go home, I'll try and write a test case to reproduce it. 
I'm opening the bug now so I don't forget it. 

Browser;  Mozilla/5.0 (X11; U; NetBSD i386; en-US; rv:1.4) Gecko/20030713.
fontconfig-config --version; 1.0.1
XFree86 version:
OS; NetBSD 1.6T (a moderately recent CVS snapshot.)
Comment 1 Aidan Kehoe 2003-07-15 23:53:30 UTC
On further experimentation; calling XftDrawString32 with U+200B as one of
the characters yields varying results. With Arial Unicode MS, it's displayed
zero-width. With Tahoma, it's displayed as an I-don't-have-this-character box. 
With Lucidux Mono, it's displayed as a normal space. With two Adobe fonts,
Helvetica and Minion, it's a small, but not zero-width, space.

With U+FEFF, ZERO WIDTH NO BREAK SPACE, Arial Unicode MS gives a box, Adobe
Helvetica and Minion both give spaces, Luxi Sans gives a very thin box (as
it does with U+200B). 

I submit that, except perhaps in the case of monospace or character cell fonts,
Xft should universally display both these characters as zero width. I quote 
from the Unicode book on Zero Width No-Break Space, chapter 13, section 2; 

that it indicates the absence of word boundaries; however, the former has no
width. For example, this character can be inserted after the fourth character
in the text "base+delta" to indicate that there should be no line-break
between the "e" and the "+". The ZERO WIDTH NO BREAK SPACE can be used to
prevent line breaking with other characters that do not have non-breaking
variants, such as U+2009 THIN SPACE or U+2015 HORIZONTAL BAR, by bracketing the

Zero Width Space; The U+200B ZERO WIDTH SPACE indicates a word boundary,
except that it has no width. Zero-width space characters are intended to be
used in languages that have no visible word spacing to represent word breaks,
such as in Thai or Japanese. When text is justified, ZWSP has no effect on
letter spacing--for example, in English or Japanese usage. 

There may be circumstances with other scripts, such as Thai, where extra space
is applied around ZWSP as a result of justification. [refers to a figure.]
This approach is unlike the use of fixed-width space characters, such as U+2002
EN SPACE, that have specified width and should not be automatically expanded
during justification."
Comment 2 Aidan Kehoe 2003-07-15 23:54:26 UTC
(And, of course, \- is the TeX equivalent of Soft Hyphen, not ZWSP. Excuse 
the confusion.)
Comment 3 Keith Packard 2003-07-16 10:24:37 UTC
Xft promises only to display the glyphs in the Unicode encoding vector for the
selected font.  It does no interpretation of the Unicode values at all. If you
want to ensure that zwsp is correctly displayed by your application, you will
need to parse the Unicode at a higher level or use non-broken fonts.  Correct
typesetting of Unicode text is far outside the realm of Xft and belongs to
higher level libraries like Pango or STSF.
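
A minimal sketch of what "parse the Unicode at a higher level" could look like
in an application that calls Xft directly. It only drops U+200B and U+FEFF
from a UTF-32 buffer before drawing, so a font without a proper zero-width
glyph is never asked to render them. The helper names is_zero_width and
strip_zero_width are hypothetical and not part of Xft; the sketch also assumes
that uint32_t and FcChar32 are the same underlying type on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero for the two code points discussed in this bug. */
static int is_zero_width(uint32_t cp)
{
    return cp == 0x200B   /* ZERO WIDTH SPACE */
        || cp == 0xFEFF;  /* ZERO WIDTH NO-BREAK SPACE */
}

/*
 * Hypothetical helper (not an Xft function): compacts `text` in place,
 * dropping zero-width code points, and returns the new length.
 */
static size_t strip_zero_width(uint32_t *text, size_t len)
{
    size_t out = 0;

    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!is_zero_width(text[i]))
            text[out++] = text[i];
    }
    return out;
}
```

The compacted buffer and the returned length could then be handed to
XftDrawString32. Note that stripping discards the line-break information these
characters carry, so an application that wraps text itself would need to
record the break positions before removing them.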
