Bug 82689 - VIEWING: U+3000 IDEOGRAPHIC SPACE (CJK full width space) and other spaces not rendered as non-printing characters in Writer
Status: NEEDINFO
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.0.4 release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CJK-METABUG
 
Reported: 2014-08-16 05:32 UTC by Matthew Francis
Modified: 2014-08-25 14:51 UTC

See Also:


Attachments
Sample document with spaces (25.60 KB, application/vnd.oasis.opendocument.text)
2014-08-16 05:34 UTC, Matthew Francis
Document rendered without non-printing characters enabled (94.71 KB, image/png)
2014-08-16 05:35 UTC, Matthew Francis
Document rendered with non-printing characters enabled (106.12 KB, image/png)
2014-08-16 05:35 UTC, Matthew Francis

Description Matthew Francis 2014-08-16 05:32:36 UTC
U+3000 IDEOGRAPHIC SPACE, a wide space used in CJK text, is not drawn with any visible marker when View -> Non-printing Characters is enabled in Writer.

Please see the attached document, which contains various sorts of space (ensure that View -> Non-printing Characters is enabled).

Currently, U+0020 SPACE and U+00A0 NO-BREAK SPACE are rendered correctly, but various other Unicode space characters are not. While U+3000 IDEOGRAPHIC SPACE is almost certainly the most used of these, perhaps consideration should be given to making all of the following space characters visible:

Non-zero-width spaces

U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

Zero-width spaces

U+200B ZERO WIDTH SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE

(Interestingly, U+200B ZERO WIDTH SPACE shows as a sort of visible space whether or not View -> Non-printing Characters is enabled. Perhaps the handling of this should be unified with other non-printing characters?)
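
As a rough cross-check of the two groups above, the following Python sketch looks up each listed code point with the standard unicodedata module. The constant and function names are illustrative only and are not part of LibreOffice; the reported categories depend on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

# Code points copied from the two lists in this report.
NON_ZERO_WIDTH_SPACES = [
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2007, 0x2008, 0x2009, 0x200A, 0x202F, 0x205F, 0x3000,
]
ZERO_WIDTH_SPACES = [0x200B, 0xFEFF]


def describe(code_point):
    """Return (U+XXXX label, Unicode name, general category) for a code point."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for cp in NON_ZERO_WIDTH_SPACES + ZERO_WIDTH_SPACES:
        label, name, category = describe(cp)
        print(f"{label} {name} [{category}]")
```

With a recent Python 3, every entry in the first group reports general category Zs (space separator), while both zero-width entries report Cf (format). That difference may be one reason the zero-width characters are handled separately.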
Comment 1 Matthew Francis 2014-08-16 05:34:30 UTC
Created attachment 104700 [details]
Sample document with spaces
Comment 2 Matthew Francis 2014-08-16 05:35:35 UTC
Created attachment 104701 [details]
Document rendered without non-printing characters enabled
Comment 3 Matthew Francis 2014-08-16 05:35:57 UTC
Created attachment 104702 [details]
Document rendered with non-printing characters enabled
Comment 4 Owen Genat 2014-08-23 15:32:38 UTC
(In reply to comment #0)
> U+3000 IDEOGRAPHIC SPACE, which is a wide space used in CJK text, does not
> show visibly as a non-printing character when View -> Non-printing
> Characters is enabled in Writer.

There is certainly no Interpunct character displayed over the Ideographic Space (U+3000) when Non-printing characters are displayed. There are possibly cultural reasons for this, given that the Middle Dot (U+00B7), which is used for Space (U+0020) and No-break Space (U+00A0), is in the Latin-1 Supplement block and some Asian scripts use a centralised dot for a full stop.

According to http://en.wikipedia.org/wiki/Interpunct these are the main Asian language preferences:

Chinese: "In Taiwan the Unicode code point U+2027, Hyphenation Point, is recommended by government as a fullwidth punctuation to separate the given name and the family name of non-Chinese." and "In Chinese, the middle dot is also fullwidth in printed matter, but the regular middle dot (·) is used in computer input, which is then rendered as fullwidth in Chinese-language fonts."

Japanese: "Interpuncts are often used to separate transcribed foreign words written in katakana. [...] the Japanese writing system usually does not use space or punctuation to separate words." and "U+30FB ・ katakana middle dot" and "U+FF65 ・ halfwidth katakana middle dot."

Korean: "Interpuncts are used in written Korean to denote a list of two or more words, more or less in the same way a slash (/) is used to juxtapose words in many other languages." and "The use of interpuncts has declined in years of digital typography and especially in place of slashes, but, in the strictest sense, a slash cannot replace a middle dot in Korean typography." and "U+318D ㆍ hangul letter araea (아래아) is used more than a middle dot when a interpunct is to be used in Korean typography."

In accordance with this I am setting the status to NEEDINFO as Asian language (l10n) experts are required to comment further on what would be considered acceptable practice.

> U+FEFF ZERO WIDTH NO-BREAK SPACE

Please note that use of U+FEFF as ZWNBSP has been deprecated since 2002 (Unicode 3.2); the Word Joiner (U+2060) is recommended in its place.
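
As an illustration of that recommendation, here is a minimal Python sketch that swaps U+FEFF for U+2060 everywhere except at the very start of the text, where U+FEFF is assumed to be acting as a byte order mark. The function name is hypothetical, and this is not a description of how LibreOffice handles the character.

```python
ZWNBSP = "\ufeff"       # ZERO WIDTH NO-BREAK SPACE (also the byte order mark)
WORD_JOINER = "\u2060"  # WORD JOINER, the recommended replacement


def replace_deprecated_zwnbsp(text):
    """Replace U+FEFF with U+2060, keeping a single leading U+FEFF untouched.

    Assumption: a U+FEFF in the first position is a byte order mark and
    should be left alone; any later occurrence is treated as ZWNBSP.
    """
    if text.startswith(ZWNBSP):
        return ZWNBSP + text[1:].replace(ZWNBSP, WORD_JOINER)
    return text.replace(ZWNBSP, WORD_JOINER)
```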
Comment 5 Matthew Francis 2014-08-24 05:05:49 UTC
Thanks for the above comment.
Note that one mitigating factor for the other uses of the middle dot in CJK text is that, as of current master (4.4), the non-printing characters are displayed in blue rather than black, so there is some contrast by default.

For comparison, Word for Mac 2011 appears to use a rectangle the width of the ideographic space for this case. This might be a reasonable model to follow.
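
To make the two marker styles discussed in this report concrete, here is a minimal Python sketch that substitutes a visible glyph for each space. It is not LibreOffice code and says nothing about how Writer or Word actually draw their markers; the MARKERS table, the show_non_printing name, and the choice of U+25A1 WHITE SQUARE as a stand-in for a full-width rectangle are all assumptions made for illustration.

```python
# Hypothetical marker table: each space maps to one visible stand-in glyph.
MARKERS = {
    "\u0020": "\u00b7",  # SPACE -> MIDDLE DOT, the marker described in this report
    "\u3000": "\u25a1",  # IDEOGRAPHIC SPACE -> WHITE SQUARE (assumed stand-in for a rectangle)
}

# Translation table built once and reused for every call.
_TABLE = str.maketrans(MARKERS)


def show_non_printing(text):
    """Return text with the spaces listed in MARKERS replaced by visible glyphs."""
    return text.translate(_TABLE)
```

A one-to-one substitution like this keeps the character count unchanged, which is roughly what an overlay marker needs: the marker occupies the cell of the space it represents.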

