|Summary:||U+200B ZERO WIDTH SPACE isn't being shown as zero width.|
|Product:||xorg||Reporter:||Aidan Kehoe <kehoea>|
|Component:||Lib/Xft||Assignee:||Keith Packard <keithp>|
|Status:||CLOSED INVALID||QA Contact:|
Description Aidan Kehoe 2003-07-15 09:18:39 UTC
At the above URL, I've put U+200B ZERO WIDTH SPACE characters into the displayed URL addresses so that they would wrap more reasonably--the equivalent of inserting \- in TeX. When I look at the page using Win32 Mozilla, it displays fine. (Not so for Opera, but that's another matter.) When I look at it using Mozilla 1.4, the U+200B gets shown as a space. I'm not 100% sure this is a fontconfig issue; when I go home, I'll try to write a test case to reproduce it. I'm opening the bug now so I don't forget it.
Browser: Mozilla/5.0 (X11; U; NetBSD i386; en-US; rv:1.4) Gecko/20030713
fontconfig-config --version: 1.0.1
XFree86 version: 18.104.22.168
OS: NetBSD 1.6T (a moderately recent CVS snapshot)
Comment 1 Aidan Kehoe 2003-07-15 23:53:30 UTC
On further experimentation; calling XftDrawString32 with U+200B as one of the characters yields varying results. With Arial Unicode MS, it's displayed zero-width. With Tahoma, it's displayed as an I-don't-have-this-character box. With Lucidux Mono, it's displayed as a normal space. With two Adobe fonts, Helvetica and Minion, it's a small, but not zero-width, space.
With U+FEFF ZERO WIDTH NO-BREAK SPACE, Arial Unicode MS gives a box, Adobe Helvetica and Minion both give spaces, and Luxi Sans gives a very thin box (as it does with U+200B).
I submit that, except perhaps in the case of monospace or character cell fonts, Xft should universally display both these characters as zero width. I quote from the Unicode book on Zero Width No-Break Space, chapter 13, section 2;
"As ZERO WIDTH NO-BREAK SPACE, U+FEFF behaves like U+00A0 NO-BREAK SPACE in that it indicates the absence of word boundaries; however, the former has no width. For example, this character can be inserted after the fourth character in the text "base+delta" to indicate that there should be no line-break between the "e" and the "+". The ZERO WIDTH NO-BREAK SPACE can be used to prevent line breaking with other characters that do not have non-breaking variants, such as U+2009 THIN SPACE or U+2015 HORIZONTAL BAR, by bracketing the character.
Zero Width Space; The U+200B ZERO WIDTH SPACE indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as in Thai or Japanese. When text is justified, ZWSP has no effect on letter spacing--for example, in English or Japanese usage. There may be circumstances with other scripts, such as Thai, where extra space is applied around ZWSP as a result of justification. [refers to a figure.]
This approach is unlike the use of fixed-width space characters, such as U+2002 EN SPACE, that have specified width and should not be automatically expanded during justification."
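As a rough illustration of the behaviour the quoted passage describes, here is a minimal sketch of a line-break finder in which U+200B is a zero-width break opportunity and U+FEFF is zero-width with no break allowed. The function name find_break and the "one cell per character" width model are assumptions made for the example; nothing here is Xft or Mozilla code, and ordinary spaces are deliberately ignored as break points since the URLs in question contain none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index at which a line of at most
 * max_cells cells should end, or len if the whole text fits.
 * Assumption: every code point is one cell wide except U+200B and
 * U+FEFF, which take no width at all. */
size_t find_break(const uint32_t *text, size_t len, size_t max_cells)
{
    size_t width = 0;
    size_t last_break = 0;            /* 0 means "no opportunity seen yet" */

    assert(text != NULL);
    for (size_t i = 0; i < len; i++) {
        if (text[i] == 0x200B) {      /* ZWSP: a break is allowed after it */
            last_break = i + 1;
            continue;
        }
        if (text[i] == 0xFEFF)        /* ZWNBSP: no width, no break */
            continue;
        if (width + 1 > max_cells)    /* this character would overflow */
            return last_break ? last_break : i;   /* else forced break */
        width++;
    }
    return len;
}
```

With "ab", U+200B, "cd" and a limit of three cells, the break lands after the ZWSP. With U+FEFF in its place there is no break opportunity, so the sketch falls back to a forced break at the character that overflows.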
Comment 2 Aidan Kehoe 2003-07-15 23:54:26 UTC
(And, of course, \- is the TeX equivalent of Soft Hyphen, not ZWSP. Excuse the confusion.)
Comment 3 Keith Packard 2003-07-16 10:24:37 UTC
Xft promises only to display the glyphs in the Unicode encoding vector for the selected font. It does no interpretation of the Unicode values at all. If you want to ensure that zwsp is correctly displayed by your application, you will need to parse the Unicode at a higher level or use non-broken fonts. Correct typesetting of Unicode text is far outside the realm of Xft and belongs to higher level libraries like Pango or STSF.
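A minimal sketch of what "parse the Unicode at a higher level" could look like for an application that calls Xft directly: drop the zero-width code points from the buffer before drawing, so that the font's glyph (or lack of one) for them never matters. The names is_zero_width and strip_zero_width are hypothetical, and uint32_t stands in for FcChar32 so the example compiles without the Xft headers. A real application would more likely leave this to a layout library such as Pango.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Xft: nonzero for the two
 * zero-width code points discussed in this bug. */
int is_zero_width(uint32_t cp)
{
    return cp == 0x200B     /* ZERO WIDTH SPACE */
        || cp == 0xFEFF;    /* ZERO WIDTH NO-BREAK SPACE */
}

/* Copy src to dst, skipping zero-width code points.
 * dst must have room for len elements.
 * Returns the number of code points written to dst. */
size_t strip_zero_width(const uint32_t *src, size_t len, uint32_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (!is_zero_width(src[i]))
            dst[n++] = src[i];
    }
    return n;
}
```

The filtered buffer and the returned count are what would then be handed to XftDrawString32 as its string and len arguments. Any line-breaking decision based on U+200B has to be made before this step, since the information is gone afterwards.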