Bug 67071 - VLOOKUP between two files is slow if both files are open
Summary: VLOOKUP between two files is slow if both files are open
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
3.4.0 release
Hardware: All All
: high critical
Assignee: Markus Mohrhard
URL:
Whiteboard: target:5.2.0
Keywords: perf, preBibisect, regression
Depends on:
Blocks:
 
Reported: 2013-07-19 09:15 UTC by Menno Scholten
Modified: 2021-03-03 19:36 UTC (History)
10 users (show)

See Also:
Crash report or crash signature:


Attachments
This is the file to enrich with vlookup (554.29 KB, application/vnd.oasis.opendocument.spreadsheet)
2013-07-19 09:15 UTC, Menno Scholten
Details
This is the file where the information is found to enrich the other document (1.76 MB, application/vnd.oasis.opendocument.spreadsheet)
2013-07-19 09:16 UTC, Menno Scholten
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Menno Scholten 2013-07-19 09:15:05 UTC
Created attachment 82665 [details]
This is the file to enrich with vlookup

Till a few months we worked with OpenOffice 3.01 in Ubuntu 9.04 and Windows.
Now migrated to linux Mint 13 (65 workplaces) with Libreoffice 3.5.7.2.
 I also tested this bug with te latest Libreoffice and OpenOffice on Mint and Windows.

Attached you will find two example ods files. We use vlookup often to enrich one file with data from the other file. These are very big files, but the same happens when using vlookup with less lines.

What we want to do in this example is looking for the articlenumber in column F in afnamebestand.ods and get the content of column H out of matchbestand-menno.ods into afnamebestand.ods.

In OpenOffice 3.01 is double clicking on the fill handle and filling 22.000 lines with the content of column H in matchbestand-menno.ods taking 8 seconds.

In Libreoffice 3.5.7.2 and higher is only looking for one cell taking taking several minutes. The CPU load at that moment is 100 %. When a double click the fill handle the PC is freezing an Libreoffice is crashing after several minutes.
I can attach only one file. I will try to add the second file after submitting the bug.

We are working with Linux workplaces since 2006. It's always a struggle to explain to the users and managers that opensource software is a fine alternative. Now i get a lot of complaints about loss of productivity because of this bug. So it's important to me, to get this problem fixed soon. Thanks in advance.
Comment 1 Menno Scholten 2013-07-19 09:16:36 UTC
Created attachment 82666 [details]
This is the file where the information is found to enrich the other document
Comment 2 bfoman (inactive) 2013-07-19 21:23:58 UTC
(In reply to comment #0)
>  I also tested this bug with te latest Libreoffice and OpenOffice on Mint
> and Windows.

Which versions you tested exactly? 3.5.x branch is EOL, 3.6.x will be EOL soon (https://wiki.documentfoundation.org/ReleasePlan), so those versions won't receive any bugfixes.
Comment 3 m_a_riosv 2013-07-19 21:37:49 UTC
Hi Menno, thanks for reporting,

I can reproduce with:
Win7x64Ultimate
LibreOffice Version 4.0.4.2 (Build ID: 9e9821abd0ffdbc09cd8c52eaa574fa09eb08f2)

But disable the option in both files:
Menu/Tools/Options/LibreOffice calc/Calculate - Automatically find rows and column labels
make possible to do it in couple of minutes.

Works fine with:
LibreOffice 3.3.4 OOO330m19 (Build:401) tag libreoffice-3.3.4.1
Comment 4 m_a_riosv 2013-07-19 21:56:07 UTC
Hi bfoman,
the issue is there with 
Version 3.6.7.2 (Build ID: e183d5b).

Maybe I am wrong, but as I can remember, there was a change in the way to link on how it was in older versions.
Comment 5 Menno Scholten 2013-07-20 10:06:57 UTC
I tested with the following versions:
LibreOffice Linux - deb (x86_64), version 4.0.4, English (US)
OpenOffice 3.4.1 for linux debian 64 bit english
Comment 6 Petr Mladek 2013-07-22 10:45:16 UTC
I add some Calc experts into CC.

Also I am going to lower the severity. I understand that it is very critical (blocker) for you. On the other hand, it seems to be quite old bug and nobody reported it yet. So, it affects only very limited group of users from the global point of view. We will do our best to fix it but there are many other important bugs that affect even more users and might have higher priority. In each case, it can't suddenly start blocking the release.

If you want to get this fixed quickly, you might consider to pay someone for this. There is a list of certified developers, https://www.documentfoundation.org/certification/developers/
Comment 7 Menno Scholten 2013-07-22 12:35:11 UTC
Hello Mario,

i tested the following on my Linux Mint 13 workspac 3.5.7.2.

"But disable the option in both files:
Menu/Tools/Options/LibreOffice calc/Calculate - Automatically find rows and column labels make possible to do it in couple of minutes."

For a 17 lines did it work indeed. 24 seconds. For alle the lines it didn't.
Calc didn't crash but froze. I had to kill it.
Comment 8 Menno Scholten 2013-07-22 14:18:10 UTC
Hello Petr,

thanks for your response.

It suprised me also that nobody did report this problem earlier. Maybe because in the most company's MS Office is used? 
If we want to promote libreOffice as a good alternative for business use, these kind of things wil have to work well.

We did choose the lts version of Linux Mint 13 because we don't want to install a new version on 65 workspaces every year. In a company you want to stay in the same release as long as possible,as lang as there are bugfixes and safety updates. Only if there are major improvements i would consider to upgrade specific software parts. 
So i need a fix for the "old version" in Linux Mint 13 too. Therefore I contacted Lanedo.
Comment 9 m_a_riosv 2013-07-22 22:18:46 UTC
Hi Menno, maybe it is not usual in a so large quantity, I think is more proper of a database. If data came form a database a view could be a good solution.
But anyway the issue is there and is a severe problem IMO.

Hi Petr, but maybe a lot of people use it in a more short quantity, I have seen more than once questions in Ask and forums about VLOOKUP() on external files. People having e.g. half a minute delay on recalculate, can not see there is a bug, but half after half can be a lot of their minutes.

Not a blocker, perhaps, but IMO is critical and a MAB, affects to all versions in production.
Comment 10 Menno Scholten 2013-07-23 06:45:59 UTC
Hello Mario,

i agree that the example files i posted are very large, most of the time the file to enrich is smaller (100 lines), but the file where the information is coming from is often big.
You are right that a database solution would be better, but vlookup is used by normal users, who have a basic knowledge of Excel/Calc. The information is often exported from our ERP DB en then manipulated in Calc, because we don't want to program 100 different lists for 100 different purposes every week. Sometimes we receive a spreadsheet from a customer to enrich.

Hi Petr,

i just tested with both sheets in one spreadsheet instead of two spreadsheets. Then it workes fine. It look like the connections between two spreadsheets is the problem.
Comment 11 Cor Nouws 2013-07-31 09:38:51 UTC
I took freedom to make some different test files.
  Matchlijst-menno_sml     with only 1082 rows 
  afnamebestand_sml.ods    with only 366 rows 
(sort of random choice).

Prepared the formula in H2 and then tested with copying that formula from H2 to all other cells in the range H3:H366


Looks as if we hit one of the artifacts of the famous 3.4 release ;)

In the oldest version of 3.4.0 that I have at hand 
  OOO340m1 (Build:12
it takes 4-5 seconds there (as in each other later release)

In the latest of the 3.3-series
  3.3.4.1
it takes appr. 1.5 seconds.


(Note that when opening the file with the formula (prepared in 4.1.0.4) in a 3.3 version, initially H2 shows error 511. Opening formula editor, clicking one of the argument fields and then OK, makes the formula just work fine.)
Comment 12 Pierre-Eric Pelloux-Prayer 2013-10-15 10:46:00 UTC
Modifications were made to LibreOffice to improve the situation.

See:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=e5d9477e87837fb771cf6dcb3bde872873bc50a8
http://cgit.freedesktop.org/libreoffice/core/commit/?id=f243d07019ca205ef536b0d7b0e1fe4b84469f75
http://cgit.freedesktop.org/libreoffice/core/commit/?id=7334f8db6f6004d48e2dbf014f27878a7ae21eb1
http://cgit.freedesktop.org/libreoffice/core/commit/?id=3fed166279377f7ad702b8911899243b8adff3bf

(I forgot to reference the bug id in the commit message, sorry).

With these patches, vlookup is working better - but note that different usage will exercise different code path:
  1) if only the file containing the vlookup (afnamebestand.ods) is open: vlookup operations are fast
  2) if the file containing the datas (matchbestand-menno.ods) is open at the same time: vlookup are slow (but 4x faster than before the patches)

(the main culprit I believe is the caching logic used in externalrefmgr.cxx).
Comment 13 Alex Thurgood 2015-01-03 17:40:22 UTC
Adding self to CC if not already on
Comment 14 Matthew Francis 2015-01-26 04:18:45 UTC
Retested with 4.4.0.2

Filling column H of afnamebestand.ods with
=VLOOKUP(F1,'file:///......./Matchlijst-menno.ods'#$Blad1.$A$2:$B$35689, 2)

takes a few seconds if the other file isn't open at the same time, but an impossibly long time if it is

Adjusted metadata to reflect the remaining issue

-> Set Component: Spreadsheet
-> Added Whiteboard: preBibisect
-> Updated Summary
Comment 15 Emersson Augusto Suarez Ortiz 2015-10-21 01:34:30 UTC
Hello to everybody, I am agree that this is a huge bug for CALC, I'm using LibreOffice 5.0.1.2 and earlier form the 5 series, and Vlookup in a big file, for example 200 hundred thousand rows to enrich a one thousand rows file, in Ubuntu 14.04, it block the PC, get 100% CPU usage, 100% of my 4GB RAM Memory and 100% of my 4GB Swap Memory.

I also agree that this is a disadvantage of Calc vs MS Excel, cause now Excel can easily handling a 1 Million rows file and enrich the same 1 thousand rows file, and maybe the reason is that legal or not, the most used program is Excel, and I turned very upset for this bug. Just now I have to change to windows for handling my problem·
Comment 16 Markus Mohrhard 2015-11-14 06:33:02 UTC
The fix for this will depend on https://gitlab.com/mdds/mdds/issues/5
Comment 17 Robinson Tryon (qubit) 2015-12-14 05:40:00 UTC Comment hidden (obsolete)
Comment 18 Markus Mohrhard 2016-02-16 07:28:40 UTC
I have two patches that bring the performance back to somewhat acceptable levels. I'm down to about a second per cell in my own build.
Comment 19 Commit Notification 2016-02-16 09:11:19 UTC
Markus Mohrhard committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f18cb2c196cc5b643b9832dfca1d940b82c832b0

single element access is really slow in mdds, tdf#67071

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Markus Mohrhard 2016-02-17 19:05:47 UTC
Actually with a third patch I'm able to go to asymptotically O(1). So it is after the first cell which takes a second nearly instant.
Comment 21 Commit Notification 2016-02-18 07:51:44 UTC
Markus Mohrhard committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c8ad72703b74b7338c5f8dd1fe0275822b1e45f0

don't fill the matrix cell by cell, tdf#67071

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Markus Mohrhard 2016-02-20 07:17:47 UTC
(In reply to Markus Mohrhard from comment #20)
> Actually with a third patch I'm able to go to asymptotically O(1). So it is
> after the first cell which takes a second nearly instant.

That patch breaks the cache. This will take quite some time and maybe some redesign in the external reference cache.
Comment 23 Cor Nouws 2016-03-07 15:21:38 UTC
interesting prolly to read last paragraph of this commits message
https://cgit.freedesktop.org/libreoffice/core/commit/?id=9e60bbdb3aa5f80ca80c9c4fdf7accd12c4a5d1c