Bug 41369 - FILEOPEN: Cell borders not recognized in XLS import
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 3.4.3 release
Hardware: Other All
Importance: medium normal
Assignee: Kohei Yoshida
URL:
Whiteboard: target:3.5
Keywords: regression
Depends on:
Blocks:
 
Reported: 2011-09-30 12:49 UTC by Valek Filippov
Modified: 2011-12-23 15:46 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Original XLS file (28.00 KB, application/msexcel), 2011-09-30 12:49 UTC, Valek Filippov
screenshot of the file opened in xl2k7 (13.49 KB, image/png), 2011-09-30 12:52 UTC, Valek Filippov
How it looks like in LibO 3.4.3 (32.45 KB, image/png), 2011-09-30 12:54 UTC, Valek Filippov
How it looks like in LibO 3.3.3 (130.26 KB, image/png), 2011-09-30 13:01 UTC, Valek Filippov
Reduced file with borders made in 1C way (25.00 KB, application/msexcel), 2011-09-30 18:43 UTC, Valek Filippov
Same as in comment #5 but made by XL2k7 (25.00 KB, application/msexcel), 2011-09-30 18:55 UTC, Valek Filippov
For "1 vs 1a", to check which border "wins", file 1 (25.00 KB, application/msexcel), 2011-10-01 05:38 UTC, Valek Filippov
For "1 vs 1a", to check which border "wins", file 2 (25.00 KB, application/msexcel), 2011-10-01 05:39 UTC, Valek Filippov
For "1 vs 1a", to check which border "wins", file 3 (25.00 KB, application/msexcel), 2011-10-01 05:52 UTC, Valek Filippov
For "1 vs 1a", to check which border "wins", file 4 (25.00 KB, application/msexcel), 2011-10-01 07:10 UTC, Valek Filippov


Description Valek Filippov 2011-09-30 12:49:25 UTC
Created attachment 51807
Original XLS file
Comment 1 Valek Filippov 2011-09-30 12:52:12 UTC
Created attachment 51808
screenshot of the file opened in xl2k7

Screenshot made in xl2k7 (i.e. "how it should look").
Rows 4 to 10 (out of the screenshot) contain some merged cells with borders, which are ok in 3.3.3.
Comment 2 Valek Filippov 2011-09-30 12:54:00 UTC
Created attachment 51809
How it looks like in LibO 3.4.3

Font attributes and size are lost, and there are no borders.
Comment 3 Valek Filippov 2011-09-30 13:01:22 UTC
Created attachment 51810
How it looks like in LibO 3.3.3

It was much better in LibO 3.3.3.
The remaining 3.3.3 problems are listed here:

1. These cell borders should be thick.
1a. gnumeric shows all borders except this one properly, so there must be something else wrong with the file.
2. Row height is taken from the file, but it should be different (see https://bugzilla.gnome.org/show_bug.cgi?id=614399 comment #5 for analysis).
3. Font size is "8", but LibO 3.3.3 imported it as "10" (plus issue #2).
4. Most likely an issue related to incorrect column width.
(LibO 3.4.3 also seems to lose the 'multiline' attribute for that cell in row #20.)

Z. (Enhancement) It would be nice not to overprint column IDs in the case of narrow columns.
Comment 4 Valek Filippov 2011-09-30 13:11:02 UTC
The original file was made in the "1C" program, which is well known for producing not-100%-correct XLS files (see fdo#33100 for example).
But it is still very important for users from the Russian Federation ("1C" seems to be the dominant accounting program there), and based on the LibO 3.3.3 vs LibO 3.4.3 results it seems to be a regression in the LibO 3.4.3 code.

I'm going to analyse "1 vs 1a" case from comment #3 for gnumeric, will update with my findings later.
Comment 5 Valek Filippov 2011-09-30 18:43:10 UTC
Created attachment 51830
Reduced file with borders made in 1C way
Comment 6 Valek Filippov 2011-09-30 18:55:24 UTC
Created attachment 51832
Same as in comment #5 but made by XL2k7

Cell D7 in this file and in the file from comment #5 should have a thin border on the top, left and right sides and a thick border on the bottom side.
XL shows both files properly.
LibO and gnumeric both fail to make the bottom border thick in the "1C" file; LibO also doesn't get it right for the "XL" file.

In both files bottom border of cell D7 is actually made as top border of cell D8.
Suitable XF records are 0x3e and 0x3f.

XL saves 0x3e as "dgLeft = 1, dgRight = 1, dgTop = 1, dgBottom = 0" and 0x3f as "dgTop = 2".
1C saves 0x3e as "dgLeft = 1, dgRight = 1, dgTop = 1, dgBottom = 1" and 0x3f as "dgTop = 2".

It looks like XL overrides the bottom border of D7 with the top border of D8.
LibO should probably do the same for imported XLS files, for compatibility.
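
To make the XF comparison above concrete, here is a minimal Python sketch that decodes the four border line styles from an XF record body. It assumes the BIFF8 layout, in which the 16-bit word at byte offset 10 of the record body packs dgLeft, dgRight, dgTop and dgBottom from the lowest nibble upward. The helper names and the synthetic record bytes are hypothetical and are not taken from the attached files.

```python
import struct


def decode_xf_borders(xf_body: bytes) -> dict:
    """Return the four border line styles stored in an XF record body."""
    # Assumption: BIFF8 layout, border styles in the 16-bit word at offset 10.
    (word,) = struct.unpack_from("<H", xf_body, 10)
    return {
        "dgLeft": word & 0xF,
        "dgRight": (word >> 4) & 0xF,
        "dgTop": (word >> 8) & 0xF,
        "dgBottom": (word >> 12) & 0xF,
    }


def make_xf(left: int, right: int, top: int, bottom: int) -> bytes:
    """Build a synthetic 20-byte XF body with only the border styles set."""
    word = left | (right << 4) | (top << 8) | (bottom << 12)
    return bytes(10) + struct.pack("<H", word) + bytes(8)


# XF 0x3e as described for the XL-made and the 1C-made file.
xl_3e = decode_xf_borders(make_xf(1, 1, 1, 0))
onec_3e = decode_xf_borders(make_xf(1, 1, 1, 1))
# XF 0x3f (used by D8) is the same in both files: only dgTop = 2.
xf_3f = decode_xf_borders(make_xf(0, 0, 2, 0))

print(xl_3e)
print(onec_3e)
print(xf_3f)
```

The only difference between the two 0x3e records is dgBottom (0 in the XL file, 1 in the 1C file), which is the value that conflicts with dgTop = 2 of the cell below.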
Comment 7 Rainer Bielefeld Retired 2011-10-01 00:19:42 UTC
A cell border problem is [Reproducible] with the reporter's sample document and "LibreOffice 3.4.3 RC2 - WIN7 Home Premium (64bit) German UI [OOO340m1 (Build:302)]". Cell areas A20:BL26 and others are shown without black borders.

Border information is completely lost when the document is saved with LibO 3.4.3: afterwards OOo and LibO 3.3.3 will no longer show the black cell borders.

Opening the document with "LibreOffice Portable 3.3.3  - WIN7  Home Premium (64bit) German UI [OOO330m19 (Build:301  Tag 3.3.3.1)]" shows borders (but there are other problems concerning text adjustment, row height, ..., for which separate reports should be submitted).

In MS EXCEL Viewer everything looks perfect.

A sample.xls saved from LibO 3.3.3 shows borders when opened with 3.4.3., strange!

@Reporter:
We can't handle such "Previously everything was better" reports, which contain several issues that might have completely different roots and should be fixed by different developers.

@Kohei:
Please feel free to reassign (or reset Assignee to default) if it’s not your area or if provided information is not sufficient. Please set Status to ASSIGNED if you accept this Bug.
Comment 8 Valek Filippov 2011-10-01 05:09:50 UTC
@Rainer
<moggi> frob: then open a bug report please and add kohei and me in cc
<frob> moggi: ok, thank you
<frob> moggi: I believe it's more than one 'atomic' bug, would it be ok for you if multiple problems are reported in one bug?
<moggi> frob: yes

@Rainer
>A sample.xls saved from LibO 3.3.3 shows borders when opened with 3.4.3.,
>strange!
Most likely 3.3.3 fixes issues in the file, so 3.4.3 can open it properly, see comment #4.
You can check it with re-lab's "OLE toy" program (https://gitorious.org/re-lab/tools/).
With fdo41369_1cborders.xls as an example, you need to find 'Blank' records in the first 'BOF (Sheet)'. (Enter 0:1:29 in the bottom-left entry line and press 'enter', oletoy will scroll tree to the first 'Blank' leaf).
From 'Blank' record you need to read last 2 bytes, that would be ID of the XF record used to set cell attributes. In that case it's '0x3e 0x00' => "0x3e".
Scroll back to 'BOF (Book)' and find 'XF 3e' record. (As always in GtkTree you can put focus in the tree and start typing 'XF 3e', it will scroll to the right entry. Or use entry line again with 0:0:112 as a path.)
Select this 'XF 3e' entry. In the top right area oletoy will show parsed content. You need 'dgLeft', 'dgRight', 'dgTop' and 'dgBottom'.
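
The manual lookup above (read the last two bytes of the 'Blank' record to get the XF index) can also be sketched in a few lines of Python. This assumes the BIFF8 BLANK record body is three little-endian 16-bit values (row index, column index, XF index). The sample bytes are synthetic and only mimic cell D7 pointing at XF 0x3e; they are not dumped from the attachment.

```python
import struct


def parse_blank(body: bytes) -> tuple:
    """Split a BLANK record body into (row, column, XF index)."""
    # Assumption: 6-byte body, three little-endian unsigned 16-bit fields.
    row, col, ixfe = struct.unpack("<HHH", body)
    return row, col, ixfe


# Synthetic body for cell D7 (zero-based row 6, column 3) using XF 0x3e.
sample_body = bytes([0x06, 0x00, 0x03, 0x00, 0x3E, 0x00])
row, col, ixfe = parse_blank(sample_body)

print(f"row={row} col={col} XF={ixfe:#04x}")
```

The last two bytes '0x3e 0x00' are read as a little-endian value, which is why they map to XF index 0x3e.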

@moggi,kohei:
The assumption in comment #6 that XL overrides D7 with D8 is wrong.
Based on the second test, the rule currently appears to be "thickest border wins".
I'm going to check single/double line borders etc. to list how XL deals with invalid files.
Comment 9 Markus Mohrhard 2011-10-01 05:28:25 UTC
Hello Rainer,

this bug report is perfectly fine. This bug report is not directly about a bug in calc's xls handling but that we may now have problems with some not 100% correct documents we didn't have in 3.3.

And Valek is known to us for his great work analyzing binary formats, so that what he describes here in one bug report is more detailed and helpful than splitting these (related) problems into several bug reports that may miss the overall context.
Comment 10 Valek Filippov 2011-10-01 05:38:38 UTC
Created attachment 51839
For "1 vs 1a", to check which border "wins", file 1
Comment 11 Valek Filippov 2011-10-01 05:39:02 UTC
Created attachment 51840
For "1 vs 1a", to check which border "wins", file 2
Comment 12 Valek Filippov 2011-10-01 05:52:04 UTC
Created attachment 51841
For "1 vs 1a", to check which border "wins", file 3

Results of tests

d7              d8              result
--------------------------------------------------
1 (thin)        2 (medium)      2 (medium)
2 (medium)      1 (thin)        2 (medium)
2 (medium)      5 (thick)       5 (thick)
2 (medium)      3 (dashed)      2 (medium)
3 (dashed)      2 (medium)      2 (medium)
6 (double)      2 (medium)      6 (double)
2 (medium)      8 (med-dash)    2 (medium)
8 (med-dash)    2 (medium)      2 (medium)
6 (double)      5 (thick)       6 (double)
2 (medium)      7 (hair)        2 (medium)
7 (hair)        2 (medium)      2 (medium)
5 (thick)       6 (double)      6 (double)
--------------------------------------------------

double > thick > medium > thin > hair
solid > dashed
Comment 13 Valek Filippov 2011-10-01 07:10:45 UTC
Created attachment 51844
For "1 vs 1a", to check which border "wins", file 4

One important case was missed.
1 (thin) 8 (med-dash) -> 8 (med-dash).
Comment 14 Valek Filippov 2011-10-07 12:53:23 UTC
Full list of preferences is
double > thick > medium > medium-dash > medium-dash-dot > slanted dash-dot >
medium dash-dot-dot > thin > dashed > dotted > dash-dot > dash-dot-dot > hair
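
A minimal Python sketch of how an importer might apply this preference list when two adjacent cells disagree about their shared border. The numeric codes 1, 2, 3, 5, 6, 7 and 8 are the ones used in the test results of comment #12; the codes for the remaining styles (4, 9, 10, 11, 12, 13) follow the BIFF8 line-style numbering as I recall it and should be treated as an assumption. Treating 0 (no line) as losing to everything is also an assumption, and the function name is hypothetical.

```python
# Line-style codes ordered from strongest to weakest, per the list above.
PRECEDENCE = [
    6,   # double
    5,   # thick
    2,   # medium
    8,   # medium-dash
    10,  # medium-dash-dot (assumed code)
    13,  # slanted dash-dot (assumed code)
    12,  # medium dash-dot-dot (assumed code)
    1,   # thin
    3,   # dashed
    4,   # dotted (assumed code)
    9,   # dash-dot (assumed code)
    11,  # dash-dot-dot (assumed code)
    7,   # hair
]
RANK = {code: position for position, code in enumerate(PRECEDENCE)}


def winning_border(upper_bottom: int, lower_top: int) -> int:
    """Pick the line style shown on the edge shared by two cells."""
    # Assumption: 0 means "no line" and never wins against a real line.
    if upper_bottom == 0:
        return lower_top
    if lower_top == 0:
        return upper_bottom
    if RANK[upper_bottom] <= RANK[lower_top]:
        return upper_bottom
    return lower_top


# D7 bottom = 1 (thin) and D8 top = 2 (medium), as in the 1C file.
print(winning_border(1, 2))
```

The ordering is symmetric by construction, so it does not matter which of the two cells carries the stronger style.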

There is additional information (e.g. a test file and a screenshot from XL2k7) in the similar gnumeric bug: https://bugzilla.gnome.org/show_bug.cgi?id=660605
Comment 15 Kohei Yoshida 2011-10-25 08:05:47 UTC
Hmm it appears I forgot to set target for this bug...

Let's get this taken care of now.
Comment 16 Kohei Yoshida 2011-10-25 11:43:16 UTC
So, at least I know why we ignore cell formats from these documents generated by 1C.

Each cell format needs to be associated with a style, and most hard cell formats are associated with the "Normal" style (Excel terminology), which is equivalent to the "Default" style (Calc terminology).  But in these 1C-style xls documents, the formats are not associated with any style.  And we intentionally skip importing any cell formats that are not associated with styles (see XclImpXF::ApplyPatternToAttrList for what happens there).

As to why the cell formats are not associated with any styles in these 1C xls docs, it's because these docs don't include the STYLES records (opcode = 0x0293), which specify the presence of cell styles.
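
One way to check a file for this condition is to walk the record stream and look for opcode 0x0293. The Python sketch below assumes the Workbook stream has already been extracted from the OLE container and that every record is a 4-byte header (little-endian opcode and body length) followed by the body. The function names and the synthetic streams are hypothetical; the opcodes 0x0809 (BOF), 0x00E0 (XF) and 0x000A (EOF) are used only to build test data.

```python
import struct

STYLE_OPCODE = 0x0293  # opcode quoted in the comment above


def iter_records(stream: bytes):
    """Yield (opcode, body) pairs from a raw BIFF record stream."""
    pos = 0
    while pos + 4 <= len(stream):
        opcode, length = struct.unpack_from("<HH", stream, pos)
        yield opcode, stream[pos + 4 : pos + 4 + length]
        pos += 4 + length


def has_style_record(stream: bytes) -> bool:
    """Return True if at least one record with opcode 0x0293 is present."""
    return any(opcode == STYLE_OPCODE for opcode, _ in iter_records(stream))


def record(opcode: int, body: bytes = b"") -> bytes:
    """Build one synthetic record (header plus body) for testing."""
    return struct.pack("<HH", opcode, len(body)) + body


# Synthetic streams: one resembling an Excel-made file, one a 1C-made file.
with_style = (
    record(0x0809, bytes(16)) + record(0x00E0, bytes(20))
    + record(STYLE_OPCODE, bytes(4)) + record(0x000A)
)
without_style = (
    record(0x0809, bytes(16)) + record(0x00E0, bytes(20)) + record(0x000A)
)

print(has_style_record(with_style), has_style_record(without_style))
```

A real checker would need to stop at the end of the globals substream and handle CONTINUE records, which this sketch ignores.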