Bug 63181 - FILESAVE: Calc Saves Bloated & Corrupt .xlsx File
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.0.2.2 release (earliest affected)
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords: regression
Depends on:
Blocks:
 
Reported: 2013-04-05 16:23 UTC by C A J
Modified: 2013-05-30 08:36 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Zipped copy of bloated/corrupted .xlsx file (99.29 KB, application/zip)
2013-04-06 13:18 UTC, C A J

Partially-corrected .xlsx file (269.41 KB, application/zip)
2013-05-07 16:22 UTC, C A J

Corrected .xlsx file (10.29 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-05-07 16:27 UTC, C A J

Another corrupted .xlsx file (67.10 KB, application/zip)
2013-05-07 16:32 UTC, C A J

Good copy of second example .xlsx file (6.88 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-05-07 16:37 UTC, C A J

Description C A J 2013-04-05 16:23:33 UTC
Problem description: Recently, Calc began corrupting small .xlsx files. I currently have LO 4.0.2-1 installed on Arch Linux on a new i7 computer with 16 GB of memory and lots of free disk space. I ran pacman -Syu yesterday morning.

Steps to reproduce: I wish I could describe the steps; it just happens.

Current behavior: The only symptom at save time seems to be that Calc takes 10 to 15 seconds to save a file that it previously saved in under 1 second. The most recent instance is a newly created spreadsheet with two tabs. Each tab has two columns (one date and one numeric) and fewer than 60 rows of data, with no formulas. When correctly saved, the file is less than 20 KB. After the problem occurs, the saved file is 3.2 MB. When I reopen this large file, Calc stops responding and I have to kill the process; any other spreadsheets that are open at the time are also rendered non-functional.

This problem cropped up "fairly recently" (though I can't tell whether it predates the LO 4.0 installation) and has corrupted perhaps three or four files, some brand new and some that I have used every day for many months.

I have been able to recover all but one of the files with Excel 2010 on my wife's Windows 7 computer. When opening the 3.2 MB file that hangs Calc, Excel opens a dialog that reads:

    Excel found unreadable content in 'Weight.xlsx'.
    Do you want to recover the content of this workbook?
    If you trust the source of this workbook, click 'Yes'.

Excel works for several seconds after I click "Yes" and then opens another dialog that reads:

    Excel was able to open the file by repairing or removing the unreadable content.
    Removed records: Cell information from /xl/worksheets/sheet2.xml part

The log file that Excel offers contains:

      <?xml version="1.0" encoding="UTF-8" standalone="true"?>
    - <recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
         <logFileName>error050480_01.xml</logFileName>
         <summary>Errors were detected in file 'E:\Weight.xlsx'</summary>
       - <removedRecords summary="Following is a list of removed records:">
            <removedRecord>Removed Records: Cell information from /xl/worksheets/sheet2.xml part</removedRecord>
         </removedRecords>
      </recoveryLog>

The spreadsheet is now workable in Excel, but Excel still saves it very large (~2.7 MB). After I delete all the columns to the right of the last populated cell and all the rows below it, Excel will save the file at a proper size: 11 KB. Calc will now also open it successfully.

In one instance, I had to copy the data from the corrupt file and paste it into a new spreadsheet to resolve the problem.

Expected behavior: Well, Calc should save files without mangling them. Sorry, but this seems obvious.

I will upload an offending 3.2 MB file later; my bug report was rejected twice when I tried to include it originally.

Happy to provide any more information I can to help resolve this problem. Thanks.

Operating System: Linux (Other)
Version: 4.0.2.2 release
Last worked in: 3.6.5.2 release
Comment 1 C A J 2013-04-05 16:31:43 UTC
Unable to upload example file. Bugzilla message:

    The file you are trying to attach is 3149 kilobytes (KB) in size.
    Attachments cannot be more than 3000 KB.

I have no means to "store your attachment elsewhere", but am happy to email it to someone for examination. Contact me at caj-bugz@cjconsult.com.

I may try to gzip the file and repost it later.

Thanks.
Comment 2 C A J 2013-04-06 13:18:22 UTC
Created attachment 77522 [details]
Zipped copy of bloated/corrupted .xlsx file

I compressed the bloated/corrupted .xlsx file with pkzip and am uploading it. Since the .zip file is 101 KB and the original .xlsx file is 3.2 MB, it's clear that there is a LOT of air space in the latter.
Comment 3 Michael Meeks 2013-05-07 10:28:00 UTC
Seems pretty unpleasant: 199 MB of sheet2.xml. Amazingly, the .zip container has done a pretty bad job of compressing all that duplicate cruft (zipping it again makes some difference).

Skimming the top of the file is rather hard; I had to use:

$ sed "s/>/>\n/g" xl/worksheets/sheet2.xml | less

<row collapsed="false" customFormat="false" customHeight="true" hidden="false" ht="12.1" outlineLevel="0" r="2559">
<c r="A55" s="6" t="n">
<v>
40231</v>
</c>
<c r="B55" s="7" t="n">
<v>
188</v>
</c>
</row>

Many thousands of identical records like this, the only difference being an incrementing row attribute r="XXXXX".
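The pattern described above can be checked mechanically. The following is a minimal sketch, not a LibreOffice tool: `row_histogram` is a hypothetical shell function that reads one worksheet part on stdin, removes each row's `r` attribute, and counts how often each remaining row record repeats. It assumes GNU sed (which accepts `\n` in the replacement text) and Info-ZIP `unzip`. A single record with a huge count would match the bloat seen in sheet2.xml.

```shell
# Sketch only: row_histogram is a hypothetical helper, not part of LibreOffice.
# Reads a worksheet part (e.g. xl/worksheets/sheet2.xml) on stdin and prints
# the five most frequent <row> records, each with its r="N" row index removed.
# Assumes GNU sed, which accepts "\n" in the replacement text.
row_histogram() {
    tr -d '\n' |                              # join the whole part into one line
    sed 's/<row /\n<row /g' |                 # start each <row> on its own line
    grep '^<row ' |                           # drop the header before the first row
    sed 's/<\/row>.*/<\/row>/' |              # drop trailing </sheetData> and the rest
    sed 's/\(<row[^>]*\) r="[0-9]*"/\1/' |    # remove only the row's own r attribute
    sort | uniq -c | sort -rn | head -n 5     # count identical records, largest first
}

# Usage, assuming Info-ZIP unzip (-p extracts a member to stdout):
#   unzip -p Weight.xlsx xl/worksheets/sheet2.xml | row_histogram
```

Only the row-level `r` is stripped; the cell references (r="A55", r="B55") are kept, so rows that merely look alike but sit in different cells are still counted separately.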

Interesting; any ideas Muthu ? :-)
Comment 4 C A J 2013-05-07 15:18:13 UTC
Thanks for taking a look at the bloat file, Michael. I have to say that I have suffered no similar problems since the early-April flurry. I have used the several files that went crazy (after correcting them in Excel) and worked with a few newly-created ones. Current LO version is 4.0.2.2 and I last ran pacman -Syu for everything on 5/2/2013.

I suppose you can close this out as not reproducible unless you and the crowd find it intriguing.

Thanks for your time.
Comment 5 Michael Meeks 2013-05-07 15:35:16 UTC
> Thanks for taking a look at the bloat file, Michael. I have to say that I 
> have suffered no similar problems since the early-April flurry.

You filed against 4.0.2, though, which is reasonably recent; if this had been fixed, I guess someone would know. Potentially it's just a bug in LibreOffice whereby we pick the wrong default for row heights (or something) and so have to export a vast number of records for the rest of the whole column (or something).

> I have used the several files that went crazy (after correcting them
> in Excel) and worked with a few newly-created ones. 

Glad to know it's not a huge issue, -but- would really like to get the original files and a way of reproducing this ! :-)

Thanks !
Comment 6 C A J 2013-05-07 16:22:16 UTC
Created attachment 78986 [details]
Partially-corrected .xlsx file

This .zip file contains a 2.8 MB version of Weight.xlsx after its initial correction in Excel 2010. It's still huge, but at least it opens in Calc. (This is the first time I can recall lauding a Microsoft product over a competitor since somewhere around Lotus 1-2-3 and SuperCalc.)
Comment 7 C A J 2013-05-07 16:27:26 UTC
Created attachment 78987 [details]
Corrected .xlsx file

Attached file is a "fully" corrected copy of Weight.xlsx. From the earlier 2.8 MB intermediate copy I deleted "all" the columns to the right of the real values and "all" the rows below them to get back to a reasonable working file size. I don't recall now whether I did that in Calc or Excel, though I guess it was the latter.

(BTW, we have both lost weight since these entries. <g>)
Comment 8 C A J 2013-05-07 16:32:22 UTC
Created attachment 78988 [details]
Another corrupted .xlsx file

Here's another corrupted file (ToolRestMatl.xlsx). Interestingly, its original size is very similar to that of the corrupted copy of Weight.xlsx, but its .zip file is noticeably smaller (67 KB versus 99 KB).
Comment 9 C A J 2013-05-07 16:37:59 UTC
Created attachment 78989 [details]
Good copy of second example .xlsx file

Here's a copy of ToolRestMatl.xlsx after correction. I didn't save anything from the time of its corruption incident. This copy is from about two weeks later and probably has some (good/real) content changes; it is similar in structure, though.

Hope these help.

Off-topic, but while I have your ear: any action on Bug #59823? Thanks.
Comment 10 Markus Mohrhard 2013-05-29 22:36:35 UTC
Do we have a way to reproduce this behavior?

If you can provide detailed instructions that reproduce it 100% of the time, I might have a look at it.
Comment 11 C A J 2013-05-30 02:56:51 UTC
Well, there's good news and there's bad news.

The bad news is that I have no means of reproducing the behavior: my several encounters with the problem all occurred during the first half of April. The GOOD news is that the problem has not recurred since.

As I suggested on 5/7/2013, it seems reasonable to close this ticket. Thanks for pursuing it.
Comment 12 Michael Meeks 2013-05-30 08:36:48 UTC
C A J - thanks for your vigilance and your report :-) please do re-open if you reproduce this !