Bug 82605 - CSV files take a long time to load when using text delimiter different from last-used delimiter
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.1.6.2 release (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: possibleRegression
Depends on:
Blocks: CSV-Import
 
Reported: 2014-08-14 08:52 UTC by dotancohen
Modified: 2019-06-29 07:06 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Problematic file (2.55 MB, text/csv)
2014-08-14 20:18 UTC, Dotan Cohen
Another problematic file (2.31 MB, text/csv)
2014-12-01 03:34 UTC, Ian Beardslee

Description dotancohen 2014-08-14 08:52:25 UTC
The Text Delimiter setting of a CSV file is remembered when opening different CSV files.

When opening a large CSV file using a different Text Delimiter than that which is currently stored, LibreOffice tries to load the entire CSV file into a single cell for display in the Text Import dialogue. This operation can take a very long time on large CSV files.

Steps to reproduce:
1) Open a CSV file that uses the following Text Delimiter: '
2) Close LibreOffice
3) Open a large CSV file that uses the following Text Delimiter: "

What happens:
LibreOffice takes a very long time to open, and when the Text Import dialogue is finally displayed it has the entire CSV file in cell A1.
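One plausible mechanism for the "everything in cell A1" symptom, offered only as an illustration and not confirmed anywhere in this report: if the remembered Text Delimiter is ' and the file contains a field that begins with a stray apostrophe, a parser may treat it as the start of a quoted field that never closes, so the rest of the file is swallowed into one cell. The sketch below uses Python's csv module, not LibreOffice's importer, and the sample data and the helper name parse_with_quotechar are hypothetical.

```python
import csv
import io

# Hypothetical sample: the file really uses " as its text delimiter, but the
# first field happens to begin with an apostrophe.
SAMPLE = "'tis the season,1\n\"plain\",2\n\"other\",3\n"


def parse_with_quotechar(text, quotechar):
    """Parse CSV text with the given text delimiter and return all rows."""
    reader = csv.reader(io.StringIO(text, newline=""), quotechar=quotechar)
    return list(reader)


if __name__ == "__main__":
    for qc in ('"', "'"):
        rows = parse_with_quotechar(SAMPLE, qc)
        widest = max(len(row) for row in rows)
        print(f"quotechar={qc!r}: {len(rows)} row(s), widest row has {widest} field(s)")
```

With the matching delimiter the sample splits into separate rows and fields; with the stale one, the unterminated quoted field absorbs the commas and line breaks that follow it. Whether LibreOffice's slow path has the same cause is an assumption.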
Comment 1 ign_christian 2014-08-14 14:57:22 UTC
Hi. Please attach sample files for testing, and specify the OS and LO version in which the problem occurred.

Then change status back to UNCONFIRMED
Comment 2 Dotan Cohen 2014-08-14 20:18:04 UTC
Created attachment 104636 [details]
Problematic file

This file is contrived to demonstrate the issue as I cannot upload the real company file with a similar issue.

Open a previous CSV file with the Field Separator as a comma and the Text Delimiter as a single quote. Then try to open this file. LibreOffice gets stuck and must be killed. Tested on LibreOffice 4.2 on Kubuntu Linux.
Comment 3 Owen Genat (retired) 2014-08-24 11:26:31 UTC
(In reply to comment #2)
> Open a previous CSV file with the Field Separator as a comma and the Text
> Delimiter as a single quote. Then try to open this file. LibreOffice gets
> stuck and must be killed. Tested on LibreOffice 4.2 on Kubuntu Linux.

Thanks for providing the test file. I simply opened the provided attachment under GNU/Linux with Calc and, on the Text Import dialog, changed the Text delimiter from " to ' (default delimiter selections were Tab, Comma, and Semicolon). It then took several minutes to return, with soffice.bin running at ~100% CPU, and when it eventually did, the preview showed only a single row of data. Same problem under:

- v4.1.6.2 Build ID: 40ff705089295be5be0aae9b15123f687c05b0a
- v4.2.6.2 Build ID: 185f2ce4dcc34af9bd97dec29e6d42c39557298f
- v4.3.0.4 Build ID: 62ad5818884a2fc2e5780dd45466868d41009ec0
- v4.4.0.0.alpha0+ Build ID: e379401618268ed7f7f5885a36b90e1f4f6cd4af TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-18_05:51:03

Under v3.5.7.2 Build ID: 3215f89-f603614-ab984f2-7348103-1225a5b the change from " to ' is instant and the preview appears as expected. 

Confirmed. Status set to NEW. Version set to 4.1.6.2 (but may date from earlier). "PossibleRegression" tag added to Whiteboard.
Comment 4 Dotan Cohen 2014-08-24 13:17:25 UTC
Thanks for testing Owen!
Comment 5 Ian Beardslee 2014-12-01 03:29:31 UTC
Although I'm not getting unexpected delays with the file supplied by Dotan Cohen, I've included a tweaked example of my problematic file that is doing the same sort of thing.

When I open my file, htop shows one CPU core hitting 102%. When it loads and I tick the Comma separator option, it separates nicely. When I untick it, it locks up again until it displays everything in a single row (actually it's two rows, the header row and then a 'data row').

My files are being imported as UTF8, and I have tried with both CR/LF and LF line endings.

Version: 4.3.4.1
Build ID: 430m0(Build:1)

Installed from the LibreOffice PPA on Ubuntu 14.04.1 (also getting a similar effect with the default Trusty version, 4.2.7).
Comment 6 Ian Beardslee 2014-12-01 03:34:29 UTC
Created attachment 110278 [details]
Another problematic file

Attached file mentioned in my previous post.  Originally a 25k line file, but it was too big to attach.  This 10k line file has the same issues.
Comment 7 Robinson Tryon (qubit) 2015-12-09 18:27:13 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2017-01-03 19:40:08 UTC Comment hidden (obsolete)
Comment 9 Buovjaga 2019-06-29 07:06:56 UTC
For both attached files, the import dialog opens instantly and the file is imported instantly. For the first file, I did the "different from last delimiter" steps. Closing.

Arch Linux 64-bit
Version: 6.4.0.0.alpha0+
Build ID: c2cb467a1e5194c56bb65706b7965fb2c9241b8f
CPU threads: 8; OS: Linux 5.1; UI render: default; VCL: gtk3; 
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 29 June 2019