Bug 87618

Summary: Invalid sheet dimension written in XLSX format
Product: LibreOffice
Reporter: jmorrison
Component: filters and storage
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW ---
QA Contact:
Severity: normal    
Priority: medium    
Version: 4.3.5.2 release   
Hardware: Other   
OS: Windows (All)   
Whiteboard:
Attachments: Original Excel 2007 file
xlsx file saved by LibreOffice
Excel 2007 file input test
Libreoffice xlsx with problems

Description jmorrison 2014-12-22 22:58:51 UTC
1. The original xlsx file from Excel is about ten times smaller than the same file after saving it in LibreOffice:

-rwx------ 1 jmorrison None  150748 Dec 22 14:21 test-original.xlsx
-rwx------ 1 jmorrison None 1439545 Dec 22 14:36 test-saved.xlsx*



2. The original xlsx file is readable by the Python script xlsx2csv:
git clone https://github.com/dilshod/xlsx2csv
pip install xlsx2csv


xlsx2csv test-original.xlsx 

stuff from spreadsheet,,,,,,,,,,,,,,,,

xlsx2csv test-saved.xlsx

Traceback (most recent call last):
  File "/usr/bin/xlsx2csv", line 847, in <module>
    xlsx2csv.convert(outfile, sheetid)
  File "/usr/bin/xlsx2csv", line 178, in convert
    self._convert(sheetid, outfile)
  File "/usr/bin/xlsx2csv", line 247, in _convert
    sheet.to_csv(writer)
  File "/usr/bin/xlsx2csv", line 558, in to_csv
    self.parser.ParseFile(self.filehandle)
  File "/usr/bin/xlsx2csv", line 660, in handleStartElement
    startCol = start.group(1)
AttributeError: 'NoneType' object has no attribute 'group'


Researching the Python error, it seems that Unicode is being returned where UTF-8 is expected.

https://stackoverflow.com/questions/15232832/python-regex-attributeerror-nonetype-object-has-no-attribute-groups
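
For reference, this kind of AttributeError is what Python raises when a regex match call finds no match, returns None, and the caller then uses .group() on the result. A minimal sketch of that failure mode follows. The pattern and the start_column helper are hypothetical and only illustrate the mechanism; the actual expression used in xlsx2csv is not shown in the traceback and may differ.

```python
import re

# Hypothetical pattern for illustration only: column letters followed by a row number.
# The real expression inside xlsx2csv is an assumption here and may differ.
CELL_REF = re.compile(r"^([A-Z]+)(\d+)$")


def start_column(ref):
    """Return the column letters of a cell reference, or None if it does not match."""
    match = CELL_REF.match(ref)
    if match is None:
        # Calling match.group(1) here without this check would raise
        # AttributeError: 'NoneType' object has no attribute 'group'
        return None
    return match.group(1)


print(start_column("A1"))  # a reference with column letters matches
print(start_column("1"))   # a bare row number does not match
```

Guarding against None before calling .group() turns the crash into a case the caller can handle.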
Comment 1 Urmas 2014-12-23 06:50:47 UTC
Please attach both files.
Comment 2 jmorrison 2014-12-23 20:53:34 UTC
Created attachment 111238 [details]
Original Excel 2007 file

smaller test input file
Comment 3 jmorrison 2014-12-23 20:55:30 UTC
Created attachment 111239 [details]
xlsx file saved by LibreOffice

Had to redact the original file. The LibreOffice version still has the output problem with xlsx2csv.
Comment 4 jmorrison 2014-12-23 21:00:51 UTC
Created attachment 111240 [details]
Excel 2007 file input test

removed hidden sheets
Comment 5 jmorrison 2014-12-23 21:04:57 UTC
Created attachment 111241 [details]
Libreoffice xlsx with problems

This LibreOffice xlsx file cannot be parsed with xlsx2csv, while the original can be.

I added more lines to the Excel file, and the saved LibreOffice file is 5x larger.

In a large spreadsheet with hundreds of lines, the file size difference is noticeable: the Excel xlsx file was 140k, the LibreOffice one 1.4 MB.
Comment 6 Urmas 2014-12-24 05:41:13 UTC
The problem is caused by this element on sheet 1:

<dimension ref="1:15"/>

As for the file size, there just has to be a duplicate bug somewhere.
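
To check other files for the same dimension problem, something like the following could be used. It is only a sketch: the function names are made up for illustration, it assumes worksheets are stored under xl/worksheets/ in the archive (the usual layout, but not guaranteed), and it assumes a reader such as xlsx2csv expects the ref to be an A1-style cell or range.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Assumption: a usable dimension is a single cell ("A1") or a cell range ("A1:Q15").
A1_RANGE = re.compile(r"^[A-Z]+[0-9]+(:[A-Z]+[0-9]+)?$")


def is_a1_style(ref):
    """Return True if ref looks like an A1-style cell or cell range."""
    return bool(A1_RANGE.match(ref))


def sheet_dimensions(path):
    """Map each worksheet part in the archive to its <dimension ref> value (or None)."""
    result = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Assumption: worksheet parts live under xl/worksheets/ and end in .xml.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                dim = root.find(NS + "dimension")
                result[name] = None if dim is None else dim.get("ref")
    return result


if __name__ == "__main__":
    import sys

    # Usage: python check_dimension.py some-file.xlsx
    for part, ref in sheet_dimensions(sys.argv[1]).items():
        status = "missing" if ref is None else ("ok" if is_a1_style(ref) else "not A1-style")
        print(part, ref, status)
```

With this check, a ref such as "1:15" is reported as not A1-style because it carries row numbers but no column letters.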
