Bug 84266

Summary: FILEOPEN: xls from mac, bad encoding
Product: LibreOffice
Reporter: Puggan SE <from_libreoffice>
Component: Spreadsheet
Assignee: Not Assigned <libreoffice-bugs>
Status: NEEDINFO ---
QA Contact:
Severity: minor
Priority: medium
CC: serval2412
Version: 4.2.6.2 release
Hardware: Other
OS: Linux (All)
Whiteboard:
Attachments: One column of the file

Description Puggan SE 2014-09-24 07:24:23 UTC
When opening .xls files sent from a Mac user, non-ASCII characters such as åäö aren't displayed as åäö, and the character that follows each of them is swallowed.

Example: cells that should say "Göteborg" show up as "G쉞eborg".

I have tried selecting both the Excel 95 xls and the Excel 97 xls option when opening the file, with the same result.
Comment 1 Puggan SE 2014-09-24 09:21:03 UTC
When running the "file" command on the files, I get:
Composite Document File V2 Document, Little Endian, Os: MacOS, Version 10.3, Code page: 10000, Last Saved By: _____, Name of Creating Application: Microsoft Macintosh Excel, Create Time/Date: Tue Aug 12 16:09:53 2014, Last Saved Time/Date: Thu Aug 14 15:14:20 2014, Security: 0
Comment 2 Puggan SE 2014-09-24 11:24:41 UTC
I asked my contact to send an xls file with just 6 words.
That file works fine, and the "file" command says it's the same type.
So maybe small files (6 rows) work, but bigger files (5000+ rows) don't.

I can't figure out what conversion is applied to the text.

The text "Göteborg"
in UTF-8: 47 * c3 b6 * 74 65 62 6f 72 67
in Latin1: 47 * f6 * 74 65 62 6f 72 67
in mac/CP10000: 47 * 9a * 74 65 62 6f 72 67

The text shown in LibreOffice where I expect "Göteborg":
"G쉞eborg": 47 * ec 89 9e * 65 62 6f 72 67 0a

Where did the "t" (0x74) go, and how does "ö" end up as "EC 89 9E"?
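
One way to check the byte listings above and test a possible explanation: 0x9A followed by 0x74 is a valid two-byte sequence in the Korean code page CP949, which would account for both the Hangul syllable and the missing "t". The sketch below uses only Python's standard codecs. That LibreOffice actually picks CP949 here is an assumption, not something confirmed in this report.

```python
# Hypothesis check (an assumption, not confirmed LibreOffice behaviour):
# the Mac Roman bytes of the cell text are decoded as CP949 (Korean).
word = "Göteborg"


def hexdump(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in data)


utf8_hex = hexdump(word.encode("utf-8"))
latin1_hex = hexdump(word.encode("latin-1"))
mac_hex = hexdump(word.encode("mac_roman"))

# In CP949, 0x9a is a lead byte, so it pairs up with the following
# byte 0x74 ("t") and both are consumed as a single character.
misread = word.encode("mac_roman").decode("cp949")
misread_hex = hexdump(misread.encode("utf-8"))

print("UTF-8:    ", utf8_hex)
print("Latin-1:  ", latin1_hex)
print("Mac Roman:", mac_hex)
print("Misread:  ", misread, "->", misread_hex)
```

If the last line matches the bytes quoted from LibreOffice, the "t" was not dropped as such: it was absorbed as the trail byte of a double-byte character.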
Comment 3 Julien Nabet 2014-09-24 20:42:32 UTC
Would it be possible for you to attach a file that triggers the problem?
(Keep in mind that attachments are automatically made public, so the file shouldn't contain anything private or confidential.)
Comment 4 Puggan SE 2014-09-24 21:44:45 UTC
The files I have contain customer contact information, so I'm not allowed to make those lists public.

I tried to get my source to generate a smaller list that could be made public, but that file didn't have the problem. I don't think I can get more help from that source.

I'll try to dig up some other source, but most of my friends don't use a Mac and don't use Excel.
Comment 5 Puggan SE 2014-09-24 21:50:33 UTC
Created attachment 106817 [details]
One column of the file

After opening the file, I removed all but one column and saved it as csv.
I then piped it through "sort -u" to remove duplicate rows.

Most of the postal towns, but not all, look OK after piping the file through: "iconv -f utf8 -t CP949 | iconv -f macintosh -t utf8"
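
For anyone who wants to try the same round trip without iconv, here is a minimal Python sketch of that pipeline. The function name `repair` is made up for illustration. It assumes the text was mangled by reading Mac Roman bytes as CP949, which fits the iconv result above but is not confirmed.

```python
def repair(text: str) -> str:
    """Undo a suspected Mac Roman -> CP949 mis-decoding.

    Re-encodes the text as CP949 to recover the original bytes, then
    decodes those bytes as Mac Roman. Text that cannot be encoded as
    CP949 is returned unchanged.
    """
    try:
        return text.encode("cp949").decode("mac_roman")
    except UnicodeEncodeError:
        return text


for sample in ["G쉞eborg", "Stockholm"]:
    print(sample, "->", repair(sample))
```

Plain ASCII passes through untouched because both code pages agree on that range. Rows where the original byte pair was not a valid CP949 sequence cannot be recovered this way, which may be why some rows still look wrong.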
Comment 6 Julien Nabet 2014-09-29 20:10:23 UTC
On a Debian x86-64 PC with master sources updated 3 days ago, I could reproduce this.
But when opening the file with Vi (a Linux editor), I also get some "Asian" characters.

Would it be possible to ask your Mac OS user for an xls (not csv) file containing just "Göteborg"?
Also, what's the UI language of this user?
Comment 7 Puggan SE 2014-09-29 21:31:17 UTC
I asked for an xls file with 6 postal towns, and that file opened fine. Only the big files, with 5000+ rows, show these odd characters.

The csv above is what LibreOffice saved after opening one of the "evil" files.

I guess his UI is Swedish.
