Bug 44211 - RTF FILEOPEN: LibreOffice doesn't select a correct default code page from regional settings (provide option to manually specify charset)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.0 Beta2 (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Miklos Vajna
URL:
Whiteboard: target:3.6.0
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-28 03:12 UTC by Aurimas Fišeras
Modified: 2012-04-26 11:17 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
RTF document without code page (207 bytes, application/rtf)
2011-12-28 03:12 UTC, Aurimas Fišeras
How it looks in LibreOffice, WordPad when "Language for non-Unicode programs"=Lithuanian (13.95 KB, image/png)
2011-12-28 03:13 UTC, Aurimas Fišeras
ASCII filter options (5.95 KB, image/png)
2011-12-28 03:18 UTC, Aurimas Fišeras
Fix Lithuanian default text encoding (903 bytes, patch)
2012-04-24 21:44 UTC, Aurimas Fišeras

Description Aurimas Fišeras 2011-12-28 03:12:11 UTC
Created attachment 54885 [details]
RTF document without code page

Some RTF generators produce documents that do not declare an ANSI code page.

WordPad and Word Viewer open such documents using the default code page configured in Windows' regional settings (Language for non-Unicode programs).

LibreOffice opens them using a different code page, so the non-ASCII characters are displayed incorrectly.
Comment 1 Aurimas Fišeras 2011-12-28 03:13:42 UTC
Created attachment 54886 [details]
How it looks in LibreOffice, WordPad when "Language for non-Unicode programs"=Lithuanian
Comment 2 Aurimas Fišeras 2011-12-28 03:18:21 UTC
Created attachment 54887 [details]
ASCII filter options

One possible "fix" that would work on both Windows and Linux would be to offer an "RTF Filter Options" dialog for selecting the correct character set, similar to the "ASCII Filter Options" dialog.
Comment 3 Christian Lohmaier 2011-12-29 12:49:33 UTC
Just using the regional settings won't help on Linux, where the charset is nowadays UTF-8, so a filter dialog is probably the better choice.
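To illustrate the point about Linux, here is a minimal Python sketch, assuming a POSIX-style locale name such as `lt_LT.UTF-8` (the helper name `split_locale` is hypothetical and not part of LibreOffice). The charset part only says UTF-8, so it gives no hint about which legacy 8-bit code page an RTF file was written in; only the language part could feed a guess.

```python
def split_locale(name):
    """Split a POSIX locale name like 'lt_LT.UTF-8' into (language, territory, charset)."""
    name = name.partition("@")[0]            # drop a modifier such as '@euro'
    base, _, charset = name.partition(".")   # 'lt_LT', 'UTF-8'
    language, _, territory = base.partition("_")
    return language, territory, charset.lower()

language, territory, charset = split_locale("lt_LT.UTF-8")
# The charset is 'utf-8' for practically every modern Linux locale,
# so it cannot distinguish Lithuanian from Turkish or Russian documents.
print(language, territory, charset)
```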
Comment 4 Jean-Baptiste Faure 2012-01-12 22:53:52 UTC
Hi Miklos,

Are you aware of this problem?

Please feel free to reassign, if you cannot handle this bug.

Best regards. JBF
Comment 5 Florjan 2012-02-23 00:32:10 UTC
(In reply to comment #0)
> Created attachment 54885 [details]
> LibreOffice opens them in some other code page.


I can confirm this bug. I'm using Slovenian regional settings in Windows. There is no problem with OpenOffice 3.3, and I think everything was OK in older versions of LibreOffice. In LibreOffice 3.5 some RTF files are opened in an Asian or some other code page.
Comment 6 Jean-Baptiste Faure 2012-03-20 14:03:47 UTC
@Aurimas Fišeras: If I open the RTF file with WordPad under Wine on Ubuntu 11.10, I get the same result as you (and I) get with LibreOffice 3.5.

Best regards. JBF
Comment 7 Urmas 2012-03-30 02:59:47 UTC
*** Bug 48023 has been marked as a duplicate of this bug. ***
Comment 8 omeringen 2012-04-05 13:38:41 UTC
The problem also exists for Turkish characters in RTF files. OpenOffice and MS Office are fine.
Comment 9 Mike Kaganski 2012-04-08 20:12:24 UTC
(In reply to comment #0)
> Some RTF generators produce documents that do not declare an ANSI code page.

It looks like setting the ANSI code page alone cannot fix the file (see Bug 48446). LO ignores this information anyway; only \fcharsetN is honored.

Also, Microsoft's own products seem to fail to follow their own specification:
if you define \ansicpgN, MS Word 2010, MS Office Word Viewer v.11, and MS WordPad in Windows 7 all ignore it and use the Language for non-Unicode programs setting. If you set \cpgN, both MS Word 2010 and MS Office Word Viewer v.11 ignore it, while WordPad takes it into account.
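For readers unfamiliar with the code-page control words mentioned above (written `\ansicpgN` and `\fcharsetN` in RTF syntax), here is a rough Python sketch of how an importer could look for them in a header. The function name `declared_encodings` and the sample string are hypothetical, the table covers only a few common `\fcharset` values, and a regular expression is a simplification of real RTF tokenizing. It is not how LibreOffice's filter is implemented.

```python
import re

# A few common \fcharsetN values and the Windows code pages they correspond to.
FCHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI (Western)
    161: 1253,  # Greek
    162: 1254,  # Turkish
    186: 1257,  # Baltic
    204: 1251,  # Russian (Cyrillic)
    238: 1250,  # Eastern European
}

def declared_encodings(rtf_text):
    """Return (ansicpg or None, list of code pages declared via \\fcharsetN)."""
    match = re.search(r"\\ansicpg(\d+)", rtf_text)
    ansicpg = int(match.group(1)) if match else None
    charsets = [int(n) for n in re.findall(r"\\fcharset(\d+)", rtf_text)]
    return ansicpg, [FCHARSET_TO_CODEPAGE.get(n) for n in charsets]

# Hypothetical header: no \ansicpg, but the font table declares the Baltic charset.
sample = r"{\rtf1\ansi{\fonttbl{\f0\fcharset186 Arial;}}\f0 \'e0\'e8\'e6}"
print(declared_encodings(sample))  # (None, [1257])
```

When both values come back empty, as in the attachment described in this bug, the importer has nothing in the file to go on and must fall back to some default.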
Comment 10 Mike Kaganski 2012-04-08 20:16:34 UTC
(In reply to comment #6)
> @Aurimas Fišeras: If I open the RTF file with WordPad under Wine on Ubuntu 11.10,
> I get the same result as you (and I) get with LibreOffice 3.5.

What is the "Language for non-Unicode programs" setting under your Wine? ;)
You will get the same result under any Windows installation where this setting is not set to the language the file was created for.
Comment 11 Vitaliy Lotorev 2012-04-09 21:41:45 UTC
This comment is just a copy from bug 48023:

In Writer, under Options -> Language Settings -> Languages, I have 'Locale setting' set to Russian and 'Western' set to Russian. Once I replace Russian with any other language and reopen the RTF file, it shows incorrect characters (checked in LibreOffice 3.4.6).

So:
* The LO 3.4.x RTF import filter honored the locale set in the Writer options and used it when the encoding wasn't specified in the RTF file.
* The LO 3.5.x RTF import filter doesn't take the locale settings into account at all.
Comment 12 Aurimas Fišeras 2012-04-24 21:44:10 UTC
Created attachment 60552 [details]
Fix Lithuanian default text encoding

This patch fixes the default text encoding problem only for the Lithuanian language. See bug 48023 for details.

A full fix for this bug would be to add default text encodings for all other languages.
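The idea of a per-language default text encoding can be sketched as a simple lookup table. This is a Python illustration only: the names `DEFAULT_ENCODING_BY_LANGUAGE` and `default_text_encoding` are hypothetical, the table lists just the languages mentioned in this thread, and the attached patch itself is not reproduced here.

```python
# Hypothetical fallback table: language code -> Windows code page (Python codec name).
DEFAULT_ENCODING_BY_LANGUAGE = {
    "lt": "cp1257",  # Lithuanian (Baltic)
    "ru": "cp1251",  # Russian (Cyrillic)
    "uk": "cp1251",  # Ukrainian (Cyrillic)
    "tr": "cp1254",  # Turkish
    "sl": "cp1250",  # Slovenian (Central European)
}

def default_text_encoding(language, fallback="cp1252"):
    """Guess the legacy code page for a language; fall back to Western European."""
    return DEFAULT_ENCODING_BY_LANGUAGE.get(language.lower(), fallback)

print(default_text_encoding("lt"))  # cp1257
print(default_text_encoding("en"))  # cp1252
```

Any language missing from such a table silently gets the fallback, which is why each language has to be added explicitly.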
Comment 13 Miklos Vajna 2012-04-25 03:54:38 UTC
Hi Aurimas,

Thanks for your patch. I tested it and here is what I see:

Before your patch, when I start LO with LC_ALL=lt_LT, the first three characters of the document are "àèæ" (which looks correct).

After applying your patch, I get: "ąčę", which looks incorrect. Are you sure your patch improves the situation? :-)

Thanks,

Miklos
Comment 14 Aurimas Fišeras 2012-04-25 04:00:12 UTC
(In reply to comment #13)
> Hi Aurimas,
> 
> Thanks for your patch. I tested it and here is what I see:
> 
> Before your patch, when I start LO with LC_ALL=lt_LT, the first three
> characters of the document are "àèæ" (which looks correct).
> 
> After applying your patch, I get: "ąčę", which looks incorrect. Are you sure
> your patch improves the situation? :-)
> 
Yes, I'm sure. See https://en.wikipedia.org/wiki/Lithuanian_alphabet
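The two renderings quoted above are consistent with the same three bytes being decoded under two different code pages. A short Python check follows, assuming the bytes in question are 0xE0 0xE8 0xE6 (inferred from the characters reported, since the attachment is not reproduced here):

```python
raw = bytes([0xE0, 0xE8, 0xE6])  # assumed byte values, see the note above

print(raw.decode("cp1252"))  # àèæ  (Western European code page)
print(raw.decode("cp1257"))  # ąčę  (Baltic code page, used for Lithuanian)
```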
Comment 15 Miklos Vajna 2012-04-25 04:03:24 UTC
Oh, okay, thanks for confirming. (Needless to say I know ~nothing about the Lithuanian language.) I'll push your patch to master in a bit, then.
Comment 16 Not Assigned 2012-04-25 06:07:44 UTC
Aurimas Fišeras committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a8c05ae840f2673803d9784600be9a7b734076fc

fdo#44211 (RTF) return default text encoding for Lithuanian
Comment 17 Miklos Vajna 2012-04-25 06:09:24 UTC
Patch is in master, marking as resolved.
Comment 18 omeringen 2012-04-26 06:26:04 UTC
Why is this patch for only one language? Is a general patch available? We're having the same issue with Turkish, and there might be other problematic languages:

(In reply to comment #8)
> The problem also exists for Turkish characters in RTF files. OpenOffice and
> MS Office are fine.
Comment 19 Miklos Vajna 2012-04-26 11:17:21 UTC
Hi omeringen,

> Why is this patch for only one language? Is a general patch
> available?

To the best of my knowledge there is no general solution, since we try to guess an encoding based on locale info, which is never perfect. (LO 3.4 and earlier didn't have such a general solution, either.)

> We're having the same issue with Turkish, and there
> might be other problematic languages:

Did you actually test the daily builds? Turkish was already fixed with bug 48023, but that original fix didn't contain anything for Lithuanian, which is what the fix for this bug added.

Please,

- reopen this bug if you have a Lithuanian sample that is imported incorrectly (and was imported correctly in LO 3.4)

- reopen bug 48023 if you have a ru/uk/tr sample that is imported incorrectly (and was imported correctly in LO 3.4)

Thanks,

Miklos