These letters are necessary to render initial vowels in Kurdish (Sorani) written with the Arabic script. They may also be needed for other languages, such as Uzbek. Examples:
http://www.decodeunicode.org/en/u+fbea
http://www.decodeunicode.org/en/u+fbee
http://www.decodeunicode.org/en/u+fbf2
http://www.decodeunicode.org/en/u+fbf9
http://www.decodeunicode.org/en/u+fc04
A discussion of these characters is at http://wiki.ferheng.org/doku.php/initial_hamza_in_sorani
Created attachment 19347: Kurdish ligatures
Comment on attachment 19347: Kurdish ligatures

If I understand correctly, there are some misconceptions about the proposed letters. The code points you cited are from the Arabic Presentation Forms-A block; these are not meant to be used directly. Instead, they are provided for compatibility with legacy implementations. I've tried out the letter combinations in OpenOffice.org and they seem to work fine (see attachment).
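To illustrate the compatibility point, here is a minimal Python sketch (not part of the attachment) that uses the standard `unicodedata` module to show how each of the cited presentation forms normalizes to a sequence of two ordinary Arabic letters. The helper name `to_sequence` and the list `CITED` are made up for this example.

```python
import unicodedata

# The five presentation-form code points cited in the report.
CITED = [0xFBEA, 0xFBEE, 0xFBF2, 0xFBF9, 0xFC04]

def to_sequence(ch: str) -> str:
    """Return the NFKC (compatibility-composed) form of a string."""
    return unicodedata.normalize("NFKC", ch)

for cp in CITED:
    seq = to_sequence(chr(cp))
    codes = " ".join(f"U+{ord(c):04X}" for c in seq)
    # Each presentation form maps to hamza-on-yeh (U+0626) plus a second letter.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> {codes}")
```

The output depends on the Unicode data shipped with the Python version in use, but the two-letter sequences are what a shaping engine is expected to render as the ligature.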
Yes, the ligatures work. But if you write them as the combination of two characters, there are still two letters. You cannot address them directly with one keystroke. Or if you delete one, the other is still there. For Arabic that's fine, but in Kurdish they make sense only together. That's why they should be usable directly and not only as a ligature.
(In reply to comment #3)
> Yes, the ligatures work. But if you write them as the combination of two
> characters, there are still two letters. You cannot address them directly with
> one keystroke. Or if you delete one, the other is still there.

It should be possible to map one keystroke to multiple code points; this feature is needed for accented Latin letters without precomposed forms, for instance.

> For Arabic that's fine, but in Kurdish they make sense only together. That's
> why they should be usable directly and not only as a ligature.

Still, this workaround creates more problems than it solves. For one, as contextual forms they do not shape according to the surrounding letters, so you would have to map multiple keys to what is a single character.

I think this is analogous to Dutch "ij" or Spanish "ll". They are or were seen as one letter rather than two, but the preferred representations are still i+j and l+l. Otherwise searching and collating could yield unexpected results.

As the Unicode Standard stands (and as implemented e.g. on the Kurdish Wikipedia), you are supposed to use a string of two code points (cf. chapter 2 of the Standard). However, implementations specially tailored for Kurdish could still treat these as a single letter. As far as I know, this is how it's done for various Indic scripts and the accented letters I mentioned above.
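As a rough sketch of what such a tailored implementation might look like, the Python below maps one key to two code points, deletes the pair as a unit on backspace, and shows why a presentation form is missed by a plain search. All names (`KEYMAP`, `UNITS`, `type_key`, `backspace`) and the key assignment are hypothetical; the set of units is simply derived from the five code points cited in the report, not from any Kurdish keyboard layout.

```python
import unicodedata

# Hypothetical keymap: a single keystroke emits two code points.
KEYMAP = {"A": "\u0626\u0627"}  # hamza-on-yeh + alef

# Sequences treated as one letter, derived from the cited presentation forms.
UNITS = {
    unicodedata.normalize("NFKC", chr(cp))
    for cp in (0xFBEA, 0xFBEE, 0xFBF2, 0xFBF9, 0xFC04)
}

def type_key(text: str, key: str) -> str:
    """Append whatever the key is mapped to (or the key itself)."""
    return text + KEYMAP.get(key, key)

def backspace(text: str) -> str:
    """Delete a whole unit if the text ends with one, else one code point."""
    for unit in UNITS:
        if text.endswith(unit):
            return text[: -len(unit)]
    return text[:-1]

# Searching: text stored as a presentation form does not match the
# two-code-point query unless it is normalized first.
query = "\u0626\u0627"
found_raw = query in "\uFBEA"
found_normalized = query in unicodedata.normalize("NFKC", "\uFBEA")
```

With this arrangement the stored text stays in the recommended two-code-point form, while typing and deletion behave as if the pair were one letter.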
Thanks for the clarification.