Summary: | FILEOPEN: xls from mac, bad encoding | ||
---|---|---|---|
Product: | LibreOffice | Reporter: | Puggan SE <from_libreoffice> |
Component: | Spreadsheet | Assignee: | Not Assigned <libreoffice-bugs> |
Status: | NEEDINFO --- | QA Contact: | |
Severity: | minor | ||
Priority: | medium | CC: | serval2412 |
Version: | 4.2.6.2 release | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: | One column of the file |
Description
Puggan SE
2014-09-24 07:24:23 UTC
When I run the "file" command on the files, I get:
Composite Document File V2 Document, Little Endian, Os: MacOS, Version 10.3, Code page: 10000, Last Saved By: _____, Name of Creating Application: Microsoft Macintosh Excel, Create Time/Date: Tue Aug 12 16:09:53 2014, Last Saved Time/Date: Thu Aug 14 15:14:20 2014, Security: 0

I asked my contact to send an xls file with just 6 words. That file works fine, and the "file" command says it is the same type. So maybe small files (6 rows) work, but bigger files (5000+ rows) don't.

I can't figure out what conversion is being applied to the text.

The text "Göteborg"
in UTF-8: 47 * c3 b6 * 74 65 62 6f 72 67
in Latin1: 47 * f6 * 74 65 62 6f 72 67
in mac/CP10000: 47 * 9a * 74 65 62 6f 72 67

The text shown in LibreOffice where I expect "Göteborg":
"G쉞eborg": 47 * ec 89 9e * 65 62 6f 72 67 0a

Where did the "t" (0x74) go, and how does "ö" end up as "EC 89 9E"?

Would it be possible for you to attach a file which triggers the problem? (Keep in mind that attachments are automatically made public, so the file shouldn't contain anything private or confidential.)

The files I have are customer contact information, so I'm not allowed to make those lists public. I tried to get my source to generate a smaller list that could be public, but that file didn't have the problem. I don't think I can get more help from that source. I'll try to dig up some other source, but most of my friends don't use Mac and don't use Excel.

Created attachment 106817 [details]
One column of the file
After opening the file, I removed all but one column and saved it as csv.
I then piped it through "sort -u" to remove duplicate rows.
Most of the postal cities, but not all, look OK after piping through: "iconv -f utf8 -t CP949 | iconv -f macintosh -t utf8"
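
The byte values quoted in the description, and the effect of the iconv pipeline, can be checked with a short Python sketch. It assumes that the text was stored as MacRoman (code page 10000) and decoded as CP949 (Korean); that is only a hypothesis suggested by the iconv workaround, not a confirmed diagnosis of what LibreOffice does. The helper names hexdump, misdecode and repair are made up for this illustration. Under that assumption, 0x9a ("ö" in MacRoman) is a lead byte in CP949, so it swallows the following 0x74 ("t") as a trail byte and the pair becomes one Hangul syllable, which would explain why the "t" disappears.

```python
# Sketch only: assumes the text is MacRoman and gets misread as CP949.
# hexdump, misdecode and repair are hypothetical helper names.

def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in data)


def misdecode(text: str, real: str = "mac_roman", assumed: str = "cp949") -> str:
    """Encode with the file's real code page, then decode with the wrong one."""
    return text.encode(real).decode(assumed, errors="replace")


def repair(garbled: str, real: str = "mac_roman", assumed: str = "cp949") -> str:
    """Undo misdecode(); same idea as
    iconv -f utf8 -t CP949 | iconv -f macintosh -t utf8."""
    return garbled.encode(assumed, errors="replace").decode(real, errors="replace")


if __name__ == "__main__":
    word = "Göteborg"
    # The three encodings listed in the description.
    for codec in ("utf-8", "latin-1", "mac_roman"):
        print(f"{codec:10} {hexdump(word.encode(codec))}")

    garbled = misdecode(word)
    # Compare this with the bytes reported for the text shown in LibreOffice.
    print(garbled, hexdump(garbled.encode("utf-8")))
    print(repair(garbled))
```

Characters that cannot be mapped back are replaced rather than raising an error, which is one possible reason why most, but not all, rows look right after the iconv round trip.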
On PC Debian x86-64 with master sources updated 3 days ago, I could reproduce this. But when opening the file with Vi (a Linux editor), I also got some "Asiatic" characters.
Could you ask your Mac OS user for an xls (not csv) file containing just "Göteborg"? Also, what is this user's UI language?

I asked for an xls file with 6 postal cities, and that file opened fine; it is only the big files, containing 5000+ rows, that show these odd characters. The csv above is what LibreOffice saved after opening one of the "evil" files. I guess his UI is Swedish.