Summary: | Pasting text copied from Adobe Acrobat Reader messes up non-ascii characters | ||
---|---|---|---|
Product: | LibreOffice | Reporter: | matteo sisti sette <matteosistisette> |
Component: | Libreoffice | Assignee: | Not Assigned <libreoffice-bugs> |
Status: | NEW --- | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | bugs, stfhell, thb |
Version: | 3.3.0 release | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Description
matteo sisti sette
2010-11-11 10:46:30 UTC
I have reproduced the problem. The text is pasted correctly into Evolution or Pidgin => OOo really could do a better job here. Cedric, any chance to look at it?

I copy some information from Bug 52229 here, where this issue was also discussed.

In short: The bug shows up on Linux and MacOS X. Non-7-bit-ASCII characters are badly converted due to the way the data are passed through the clipboard. LibreOffice reads the text/rtf format from the clipboard, while many other applications read the UTF8_STRING format. I believe that Adobe Reader encodes bad RTF but would like this to be checked by someone familiar with the RTF specification.

If the bug is in Adobe Reader (which I believe), LO cannot correct it, of course. It is very annoying, however, although you can paste text from Adobe Reader as UTF-8 (via Ctrl+Shift+V). How much sense would it make to correct Adobe Reader's bugs from inside LO? If there is an easy way to detect badly encoded RTF, LO could possibly prefer UTF-8 as default paste format in case of Adobe Reader.

Here are the quotes from Bug 52229:

Comment_6 (Linux): (1) Special characters (éÉÄÖÜß) are misread and converted into 2-character sequences

Comment_7 (MacOS): I can reproduce something similar to issue (1), but with a little difference: for me, all non-ASCII characters are replaced by simple dots (.). I can confirm that pasting the text as unformatted text avoids this: all special characters are treated correctly.

Comment_8 (Linux): Concerning issue (1), the bad conversion of non-ASCII-7-characters: I copied the string "Köpfen" from a PDF and this is what Adobe Reader put in the clipboard in format "text/rtf":

{\rtf1\ansi\uc1 {\fonttbl\f0\froman TVTPJB+CaslonBookBE-Regular;}\pard\plain\ql\f0\fs20 {\fs22 K\'C3\'B6pfen}}

Microsoft's RTF standard 1.9.1 says: "Text characters can be handled using the 16-bit Unicode character-encoding scheme defined in this section.
Expressing this text in RTF required a new mechanism, because until Word 97, RTF handled only 7-bit characters directly and 8-bit characters encoded as hexadecimal using \'xx."

So, Adobe Reader writes the "ö" UTF-8-encoded in the normal RTF notation for ANSI characters. UTF-8 is not a defined encoding scheme in RTF. As far as I can see, the "ö" should be encoded in a suitable ANSI character set (code page) or as a Unicode character using the "\u" command. ("\uc1" tells the RTF reader that each "\u" keyword, which carries the decimal code point, is followed by one ANSI fallback character; for "ö", U+00F6 = UTF-8 C3-B6, that gives "\u246\'f6" in RTF.)

I can confirm this behaviour (see bug 52229, all important information from there collected here in comment #2 by stfhell -- thank you for that!). Improving some fields:
* I can reproduce this behaviour since LibO 3.3.0 → adapting Version field.
* Lowering Importance a little bit -- this is a major problem, yes, but a) it is caused by a bug in Adobe Reader (see comment #2), b) there is no crash, loss of (native) LibO files, etc., therefore not “critical”.
* Platform changed to “All”, because reproducible on Linux and Mac.

Please read this message in its entirety before responding.

Your bug was confirmed at least 1 year ago and has not had any activity on it for over a year. Your bug is still set to NEW which means that it is open and confirmed. It would be nice to have the bug confirmed on a newer version than the version reported in the original report to know that the bug is still present -- sometimes a bug is inadvertently fixed over time and just never closed.

If you have time please do the following:
1) Test to see if the bug is still present on a currently supported version of LibreOffice (preferably 4.2 or newer).
2) If it is present please leave a comment telling us what version of LibreOffice and your operating system.
3) If it is NOT present please set the bug to RESOLVED-WORKSFORME and leave a short comment telling us your version and Operating System

Please DO NOT
1) Update the version field
2) Reply via email (please reply directly on the bug tracker)
3) Set the bug to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

LibreOffice is powered by a team of volunteers, every bug is confirmed (triaged) by human beings who mostly give their time for free. We invite you to join our triaging by checking out this link: https://wiki.documentfoundation.org/QA/BugTriage

There are also other ways to get involved including with marketing, UX, documentation, and of course developing - http://www.libreoffice.org/get-help/mailing-lists/.

Lastly, good bug reports help tremendously in making the process go smoother, please always provide reproducible steps (even if it seems easy) and attach any and all relevant material
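
For anyone who wants to check the RTF analysis from Comment_8 quoted above, here is a minimal Python sketch. It is not LibreOffice code and not a real RTF parser: it only looks at \'xx escapes in a text run, and it assumes the reader's ANSI code page is Windows-1252. All function names are hypothetical. It shows why the "ö" in K\'C3\'B6pfen turns into a 2-character sequence, what the "\u246\'f6" form looks like when generated, and one possible heuristic for the "detect badly encoded RTF" idea (which can give false positives on text that legitimately contains such byte pairs).

```python
import re

# Text run from the clipboard sample in the report: "ö" as two UTF-8 bytes.
ADOBE_RTF_TEXT = r"K\'C3\'B6pfen"
# Form suggested in the report: Unicode escape plus one ANSI fallback byte.
EXPECTED_RTF_TEXT = r"K\u246\'f6pfen"

HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def hex_escapes_to_bytes(rtf_text: str) -> bytes:
    """Replace each \\'xx escape by its byte; copy everything else as ASCII."""
    out = bytearray()
    pos = 0
    for match in HEX_ESCAPE.finditer(rtf_text):
        out += rtf_text[pos:match.start()].encode("ascii")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += rtf_text[pos:].encode("ascii")
    return bytes(out)


def looks_like_utf8_in_hex_escapes(rtf_text: str) -> bool:
    """Hypothetical heuristic: non-ASCII escaped bytes that form valid UTF-8."""
    raw = hex_escapes_to_bytes(rtf_text)
    if all(byte < 0x80 for byte in raw):
        return False  # nothing to misinterpret
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # ordinary ANSI bytes such as a lone 0xF6
    return True


def rtf_escape_char(ch: str) -> str:
    """Encode one BMP character as \\uN plus a cp1252 fallback (for \\uc1)."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("sketch handles BMP characters only")
    if code > 32767:
        code -= 65536  # \uN takes a signed 16-bit decimal value
    fallback = ch.encode("cp1252", errors="replace")[0]
    return "\\u%d\\'%02x" % (code, fallback)


if __name__ == "__main__":
    raw = hex_escapes_to_bytes(ADOBE_RTF_TEXT)
    print(raw.decode("cp1252"))  # what an ANSI reader sees: KÃ¶pfen
    print(raw.decode("utf-8"))   # what Adobe Reader meant: Köpfen
    print(looks_like_utf8_in_hex_escapes(ADOBE_RTF_TEXT))
    print(looks_like_utf8_in_hex_escapes(EXPECTED_RTF_TEXT))
    print(rtf_escape_char("ö"))
```

The heuristic returns True for the Adobe Reader sample and False for the "\u246\'f6" form, because a lone 0xF6 byte is not valid UTF-8. Whether LO should act on such a check (for example by preferring the UTF8_STRING clipboard format) is a separate design question that this sketch does not answer.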