Bug 51345

Summary: FILEOPEN: x:num attribute is not handled while importing HTML files created by Excel 2003
Product: LibreOffice
Reporter: Marek Ozana <Marek.Ozana>
Component: Spreadsheet
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW
Severity: normal
Priority: medium
CC: erack, libreoffice, markus.mohrhard, miniopl, serval2412
Version: 3.4.4 release
Hardware: Other
OS: All
Whiteboard: BSA
Attachments: Excel file with list of companies and their respective financial data
screenshot

Description Marek Ozana 2012-06-22 11:34:02 UTC
Created attachment 63358
Excel file with list of companies and their respective financial data

Problem description: 
When I open the Excel file "REON-Table.xls", the sheet is empty and no error message is shown. The same file shows numbers and text when opened in MS Excel.
The file is attached.

Steps to reproduce:
1. Start LibreOffice Calc
2. File > Open
3. Select REON-Table.xls

Current behavior:
The progress bar appears while the file is opening, then no data is displayed.

Expected behavior:
The data contained in the file is displayed.

Platform (if different from the browser): Ubuntu 10.10
              
Browser: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Comment 1 Urmas 2012-06-23 08:18:22 UTC
In 3.5, cells with numbers are still not imported.
Comment 2 Julien Nabet 2012-06-29 23:55:43 UTC
Created attachment 63628
screenshot

On a Debian x86-64 PC, with master sources (future 3.7) updated today.

LO showed the Import Options dialog (language selection: Automatic or Custom; I chose Automatic) with a Detect Special Numbers option (I tried it both unchecked and checked, with the same result).
I then got the result shown in the screenshot.

Did you get the same result?
Comment 3 Marek Ozana 2012-07-02 10:09:54 UTC
I get just an empty sheet in Ubuntu 11.10 with LibreOffice 3.4.4.
Even the screenshot (attachment 63628) shows an incomplete import, since in Excel the table is filled with numbers in all columns and rows.
Comment 4 Mirosław Zalewski 2013-03-09 16:57:52 UTC
In fact, Calc behaves correctly.

This XLS file is really HTML. It contains one huge table with empty <td> tags (table cells) where the numbers ought to be. Some moron at Microsoft decided that, instead of exporting numbers as <td> content (so that any HTML-compliant application could read them), they would write them into an x:num attribute.

Of course, x:num is NOT a valid HTML attribute, and Calc, as every well-behaved user agent should, ignores it.
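The claim above can be checked with a small script. The following is a minimal Python sketch using the standard-library `html.parser`; the table fragment is made up for illustration (it is not taken from the attached file) and assumes the cell markup looks like `<td x:num="1234.5"></td>`. The class name `CellInspector` is hypothetical. It records each cell's text content, which is what a plain HTML consumer sees, next to its `x:num` attribute.

```python
from html.parser import HTMLParser


class CellInspector(HTMLParser):
    """Record, for every <td>, its text content and its x:num attribute.

    A plain HTML consumer only sees the text content; x:num is an
    Excel-specific extension that it does not interpret.
    """

    def __init__(self):
        super().__init__()
        self.cells = []      # list of (text, x_num) tuples
        self._in_td = False
        self._text = []
        self._x_num = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._text = []
            # html.parser lower-cases attribute names; "x:num" keeps its colon.
            self._x_num = dict(attrs).get("x:num")

    def handle_data(self, data):
        if self._in_td:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append(("".join(self._text).strip(), self._x_num))
            self._in_td = False


# Hypothetical fragment shaped like the rows described above.
sample = '<table><tr><td>ACME Corp</td><td x:num="1234.5"></td></tr></table>'
inspector = CellInspector()
inspector.feed(sample)
inspector.close()
print(inspector.cells)  # [('ACME Corp', None), ('', '1234.5')]
```

The second cell has empty text content; its only data is the attribute value, which is why both a browser and Calc show that cell as empty.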

You can try downloading the file attached by Marek Ozana, renaming it to "REON-Tables.html" and opening it in a web browser. The table will be mostly empty, just as in Calc.

So, while Calc's behavior is correct and expected, it can lead to interoperability problems. This particular file declares that it was created by MS Office Excel 2003 (which is rather old), but:
a) who knows how many "XLS" files like this are out there in the wild
b) who knows whether newer Excel versions are saner

I am changing the title of this bug so that it describes the problem more accurately.
Comment 5 Akhilesh 2013-04-27 08:22:05 UTC
We are having quite a few issues because our bank refuses to upgrade their Office or use any format other than xls (we suggested csv)...

Can you suggest a workaround? Can I write a plugin that taps into the file when it is being opened, to extract the x:num attribute as a value?
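Short of patching Calc, one possible workaround is to preprocess such files outside LibreOffice. The sketch below (Python 3, standard library only) is an illustration, not a LibreOffice plugin: it converts the HTML table to CSV and, assuming a non-empty `x:num` attribute holds the cell's value, prefers that value over the visible text. The script name `xlshtml2csv.py`, the class `ExcelHtmlTable` and the function `convert` are hypothetical, and the default `utf-8` encoding is an assumption (check the file's `<meta>` charset). Merged cells (`colspan`/`rowspan`) and nested tables are not handled.

```python
import csv
import sys
from html.parser import HTMLParser


class ExcelHtmlTable(HTMLParser):
    """Collect table rows from an Excel 2003-style HTML ("fake XLS") file.

    Assumption: when a cell's x:num attribute has a value, that value is
    the cell's number (the visible text may be empty), so it wins over
    the text. colspan/rowspan are ignored, so merged cells shift columns.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the current <tr>
        self._cell = None  # text chunks of the current <td>/<th>
        self._num = None   # x:num value of the current cell, if any

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell).split())
            self._row.append(self._num if self._num else text)
        self._cell = None
        self._num = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing </td>
            self._cell = []
            # html.parser lower-cases attribute names and keeps the colon.
            self._num = dict(attrs).get("x:num")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr" and self._row is not None:
            self._finish_cell()  # tolerate a missing </td>
            self.rows.append(self._row)
            self._row = None


def convert(html_text, out):
    """Parse html_text and write its rows as CSV to the file object out."""
    parser = ExcelHtmlTable()
    parser.feed(html_text)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return parser.rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python xlshtml2csv.py REON-Table.xls [encoding] > REON-Table.csv
    encoding = sys.argv[2] if len(sys.argv) > 2 else "utf-8"
    with open(sys.argv[1], encoding=encoding, errors="replace") as f:
        convert(f.read(), sys.stdout)
```

The resulting CSV file can then be opened in Calc as usual. The script keeps only raw values and discards all cell styling.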
Comment 6 Julien Nabet 2013-05-01 17:41:16 UTC
Kohei/Markus/Eike: would one of you have some time to give an opinion on this?
Is it a bug or an enhancement?
Comment 7 Urmas 2013-05-02 19:14:59 UTC
There is also an "x:fmla" attribute, which isn't imported either.
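If an importer (or a preprocessing script) also wanted to honor `x:fmla`, it would need a rule for which attribute wins. Below is a hypothetical per-cell decision function in Python, assuming `x:fmla` holds the formula text as Excel wrote it (e.g. `=SUM(B2:B4)`) and `x:num` holds a plain decimal value; `classify_cell` and its return convention are illustrative only. `attrs` uses the `(name, value)` pair format that Python's `html.parser` passes to `handle_starttag`.

```python
from typing import Optional, Sequence, Tuple, Union

Attrs = Sequence[Tuple[str, Optional[str]]]


def classify_cell(attrs: Attrs, text: str) -> Tuple[str, Union[str, float]]:
    """Decide what a <td> from an Excel HTML export should become.

    Assumed precedence: formula (x:fmla), then number (x:num), then text.
    """
    a = dict(attrs)
    formula = a.get("x:fmla")
    if formula:
        # Keep the formula text; a real importer may still have to
        # translate Excel references/function names into Calc syntax.
        return ("formula", formula)
    number = a.get("x:num")
    if number:
        try:
            return ("number", float(number))
        except ValueError:
            pass  # not a plain decimal; fall back to the visible text
    return ("text", text.strip())


print(classify_cell([("x:fmla", "=SUM(B2:B4)"), ("x:num", "6")], ""))
print(classify_cell([("x:num", "1234.5")], ""))
print(classify_cell([], " ACME "))
```

Preferring the formula keeps the sheet recalculable after import; a value-only import would use `x:num` instead.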
Comment 8 Markus Mohrhard 2013-05-02 20:28:15 UTC
(In reply to comment #5)
> We are having quite a few issues because our bank refuses to upgrade their
> Office or use any format other than xls (we suggested csv)...
> 
> Can you suggest a workaround? Can I write a plugin that taps into the
> file when it is being opened, to extract the x:num attribute as a value?

You can fix it in the LibreOffice source code. If you are interested, I'll add some code pointers. But be warned that our HTML parser is one of the worst in the world, and it might be easier to write a clean new one based on the orcus interfaces. This has actually been planned for some time now, but we just don't have enough time for all this work, so it would be awesome if someone here could help.
