Bug 84246 - Calc crashes on loading / saving large (~500MB) CSV files
Summary: Calc crashes on loading / saving large (~500MB) CSV files
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Spreadsheet
Version: 4.3.1.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-23 14:15 UTC by Marcelo
Modified: 2014-10-11 05:13 UTC
CC List: 2 users

See Also:


Attachments
simple bash script to generate a large CSV (212 bytes, application/x-shellscript)
2014-10-05 10:20 UTC, Owen Genat

Description Marcelo 2014-09-23 14:15:28 UTC
When I open a big CSV file (about 500MB) and try to save it as Excel 2007 (XLSX), Calc's RAM usage grows rapidly, the CPU sits at almost 100%, the progress bar of the save task doesn't advance, and the whole system slows down until it freezes completely. It seems Calc is trying to convert and compress the whole file in memory, and only then write it out. The process should instead be done in steps, on smaller chunks of the data.
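
One way to approximate the chunked approach from the user side today (a sketch only, not something the report proposes as implemented; it assumes GNU coreutils and a headless LibreOffice, and the file names are hypothetical):

  # Workaround sketch, not LibreOffice code: split the big CSV into
  # hypothetical 50000-record pieces that Calc can load one at a time,
  # then convert each piece to its own XLSX.
  split -l 50000 -d --additional-suffix=.csv big.csv part_
  for f in part_*.csv; do
      soffice --headless --convert-to xlsx "$f"
  done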
Comment 1 tommy27 2014-09-26 17:46:54 UTC
We need a test file to reproduce this.
Comment 2 Owen Genat 2014-10-05 10:20:36 UTC
Created attachment 107355 [details]
simple bash script to generate a large CSV

This script writes out records like:

> 1,2014-10-05 19:15:28.174189280+11:00,7jStH8bW5iMk...

i.e., a sequence number, date-time stamp, and 4096 bytes of random base64 data.

The script takes a single number (iterations/records) as a parameter: 10000 generates a file of ~52MB, while 100000 generates a file of ~525MB. It is no doubt horribly inefficient, but effective.
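
For reference, a minimal reconstruction of what such a generator might look like (the attached 212-byte script is not reproduced here, so the details below are assumptions; note that 4096 random bytes expand to roughly 5.5KB once base64-encoded, which matches the quoted file sizes):

  #!/bin/bash
  # Hypothetical reconstruction of the attached generator, not the
  # original script: emits "sequence,timestamp,base64-data" records.
  n=${1:?usage: $0 RECORD_COUNT}
  for ((i = 1; i <= n; i++)); do
      printf '%d,%s,%s\n' "$i" "$(date '+%F %T.%N%:z')" \
          "$(head -c 4096 /dev/urandom | base64 -w0)"
  done > large.csv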

Testing with files created by this script makes it clear that there are limits to Calc's ability to handle large CSV files. v4.3.2.2 crashes trying to load a CSV with 100000 records. The same version eventually loads a CSV with 90000 records (~470MB) but then crashes trying to save the loaded data as XLSX. Around 2.5GB of RAM is used by LO during this process.
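
The test procedure can be scripted roughly as follows (a sketch under my assumptions: gen_csv.sh stands for the attached script, and soffice.bin is the Calc process name on GNU/Linux):

  # Reproduction sketch; file names are hypothetical.
  ./gen_csv.sh 100000                      # ~525MB CSV, crashes on load
  soffice --calc large.csv &               # load the CSV in Calc
  top -b -d 5 -p "$(pidof soffice.bin)"    # log VIRT/RES while it runs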
Comment 3 Owen Genat 2014-10-05 10:24:36 UTC
I am not sure whether there is anything the developers can do about handling CSV files of this magnitude, but for now it is confirmed that there is an issue. Status set to NEW. Component set to Spreadsheet. Summary amended for clarity.
Comment 4 Jean-Baptiste Faure 2014-10-05 10:52:27 UTC
@Marcelo:

How much RAM do you have on your machine?

Best regards. JBF
Comment 5 Marcelo 2014-10-06 11:56:27 UTC
I have 4GB of RAM and an Intel Core i5-3230M @ 2.60GHz, running elementary OS 0.2.1 64-bit.
Comment 6 Owen Genat 2014-10-11 05:13:21 UTC
(In reply to Owen Genat from comment #2)
> Around 2.5GB of RAM is used by LO during this process.

This was a generalisation.

(In reply to Marcelo from comment #5)
> I have 4GB RAM, 

I think the machine is likely running out of RAM. Further test results using the provided script under GNU/Linux with v4.2.6.3:

On a system with 3708MB RAM, no swap.

CSV records/size(MB)  XLSX size(MB)  Peak VIRT(MB)/RES(GB)[1]
--------------------  -------------  -------------------------
30000/~157            ~119           2046/1.1
40000/~210            ~159           2378/1.6
50000/~262            ~200           2871/2.0
60000/~315            ~238           3268/2.3
70000/~367            ~278           3563/2.6
75000/~394            ~298           3750/2.8
80000/~420            N/A            3943/2.9[2]

On a system with 7941MB RAM, no swap.

CSV records/size(MB)  XLSX size(MB)  Peak VIRT(MB)/RES(GB)[1]
--------------------  -------------  -------------------------
100000/~525           ~397           4816/3.7

[1] Values of virtual (MB) and resident (GB) usage displayed by the top command.
[2] At this point Calc crashes.
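
For what it's worth (my own back-of-envelope arithmetic, not part of the measurements above), resident usage in the first table grows roughly linearly with record count:

  # About 0.36GB resident per 10000 records, so ~80000 records needs
  # ~2.9GB plus OS overhead, consistent with the crash on the 3708MB
  # machine.
  echo 'scale=2; (2.9 - 1.1) * 10000 / (80000 - 30000)' | bc    # -> .36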

