Bug 39179

Summary: FILEOPEN: Writer takes very long to open a docx file
Product: LibreOffice Reporter: Astrix <marais.hannes>
Component: WriterAssignee: Not Assigned <libreoffice-bugs>
Status: NEW --- QA Contact:
Severity: major    
Priority: medium CC: barta, bugs, jeffdchang, michael.meeks, sasha.libreoffice, serval2412, vmiklos, yfjiang
Version: Inherited From OOo   
Hardware: All   
OS: All   
Whiteboard: Confirmed:4.2.0.4:OSX
i915 platform: i915 features:
Attachments: shorter 50 page docx document
119 page docx file
1.5 mb docx file which opens quickly
long 119 page docx file
Large 1.5mb docx file
some bts at random

Description Astrix 2011-07-12 14:26:54 UTC
Created attachment 49025 [details]
shorter 50 page docx document

When opening a 50 page docx file it takes approximately 2 minutes to open and a 119 page docx file takes approximately 6 minutes to open. Also, the cpu is functioning at 100% for the full duration of the file opening process.

Steps to reproduce: Just open the attached 50 page file.

After the time lapse the file does open without any problems.

I first encountered the bug in LO 3.3.2 Build:202) on Pinguy OS. I have tested this bug in LO 3.3.3; 3.4.0 and 3.4.1 in Windows XP as well as in Ubuntu. Furthermore, I have also tested this bug on 2 other computers using LO 3.3.3 and 3.4.1.

Lastly, this bug does not seem to be dependent on the size of the document since a docx document with lots of pictures in it and with a size of 1.5 mb opened up very quickly. It therefore seems that this problem relates to the length of the document.
Comment 1 Astrix 2011-07-12 14:31:22 UTC
Created attachment 49026 [details]
119 page docx file

This file should take around 6 minutes to open
Comment 2 Astrix 2011-07-12 14:35:08 UTC
Created attachment 49027 [details]
1.5 mb docx file which opens quickly
Comment 3 Jeffrey 2011-07-12 19:17:23 UTC
Reproduced on LibreOffice 3.4 340m1(Build:103) for OpenSuse Linux.

Took me 2 minutes to open the first attached document. Other two documents are saved as text/plain files. I cannot seem to open them.
Comment 4 Astrix 2011-07-13 02:01:30 UTC
Created attachment 49032 [details]
long 119 page docx file
Comment 5 Astrix 2011-07-13 02:05:14 UTC
Created attachment 49033 [details]
Large 1.5mb docx file
Comment 6 Astrix 2011-07-13 02:08:02 UTC
(In reply to comment #3)
> Reproduced on LibreOffice 3.4 340m1(Build:103) for OpenSuse Linux.
> 
> Took me 2 minutes to open the first attached document. Other two documents are
> saved as text/plain files. I cannot seem to open them.

Thank you Jeffrey. I re-attached the long and large files. Sorry, first ever bug I reported.
Comment 7 Astrix 2011-09-21 01:40:17 UTC
Writer only takes so long to open a docx file, if the file has footnotes in it.
Comment 8 Roman Eisele 2011-09-21 05:33:11 UTC
Problem tested and confirmed with LibreOffice 3.4.3 (OOO340m1 (Build:302)) running on MacOS X 10.6.8, German, on a MacBook Pro with 2.53 GHz Intel Core i5, 4 GB RAM and Solid State Hard Drive. I mention the hardware data because the times LibreOffice needs to open your documents are a bit shorter for me, but they are still far too long. In detail:

1) The "large 1.5 mb docx file which opens quickly" (AJ Marais Research Proposal.docx) opens in less than 5 seconds for me -- fine! (But note that when closing the document again, just so, without any changes etc., LibreOffice crashes -- this is Bug 39510, I think.)

2) The "shorter 50 page docx document" (Chapter 3 draft 1.docx) needs about 60 seconds to open, while LibreOffice uses > 95% of the CPU all the time (measured with the top command in the shell). Far too long.

3) The "long 119 page docx file" (Volledige dokument.docx) needs about 3 minutes and 5 seconds to open, while LibreOffice uses > 97% of the CPU all the time (again measured with the top command in the shell). Far too long again.
Comment 9 Astrix 2011-09-21 13:46:55 UTC
(In reply to comment #8)
> Problem tested and confirmed with LibreOffice 3.4.3 (OOO340m1 (Build:302))
> running on MacOS X 10.6.8, German, on a MacBook Pro with 2.53 GHz Intel Core
> i5, 4 GB RAM and Solid State Hard Drive. I mention the hardware data because
> the times LibreOffice needs to open your documents are a bit shorter for me,
> but they are still far too long. In detail:
> 
> 1) The "large 1.5 mb docx file which opens quickly" (AJ Marais Research
> Proposal.docx) opens in less than 5 seconds for me -- fine! (But note that when
> closing the document again, just so, without any changes etc., LibreOffice
> crashes -- this is Bug 39510, I think.)
> 
> 2) The "shorter 50 page docx document" (Chapter 3 draft 1.docx) needs about 60
> seconds to open, while LibreOffice uses > 95% of the CPU all the time (measured
> with the top command in the shell). Far too long.
> 
> 3) The "long 119 page docx file" (Volledige dokument.docx) needs about 3
> minutes and 5 seconds to open, while LibreOffice uses > 97% of the CPU all the
> time (again measured with the top command in the shell). Far too long again.

Thank you, I have commented at bug 39510 that these two bugs could be related to that one another.
Comment 10 Björn Michaelsen 2011-12-23 12:29:24 UTC
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1

more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Comment 11 sasha.libreoffice 2012-01-06 04:24:02 UTC
reproduced on LibO 3.5.0 beta 1 on Fedora 64 bit and on LibO 3.4.3 on Windows XP 32 bit
Comment 12 Roman Eisele 2012-01-16 00:05:59 UTC
Acknowledgement: Problem is still present in LibreOffice 3.4.4 and 3.5beta2 running on MacOS X 10.6.8 German.
Comment 13 Julien Nabet 2012-10-30 20:37:53 UTC
On pc Debian x86-64 with master sources updated today, I reproduced the problem.

There are a lot of logs like these:
warn:writerfilter:22852:1:/home/julien/compile-libreoffice/libo/writerfilter/source/dmapper/DomainMapper.cxx:1451: DomainMapper::attribute() - Id: 92505 / 0x16959 value: 299 / 0x12b
warn:writerfilter:22852:1:/home/julien/compile-libreoffice/libo/writerfilter/source/dmapper/DomainMapper.cxx:1451: DomainMapper::attribute() - Id: 92505 / 0x16959 value: 300 / 0x12c
warn:writerfilter:22852:1:/home/julien/compile-libreoffice/libo/writerfilter/source/dmapper/DomainMapper.cxx:1451: DomainMapper::attribute() - Id: 92505 / 0x16959 value: 300 / 0x12c
etc.

Miklos: one for you?
Comment 14 Miklos Vajna 2012-10-31 06:34:34 UTC
Julien,

It makes little sense to reproduce a performance problem with a debug-enabled build, which has optimization disabled. :-)

Anyway, those warnings are just about some ignored properties, once those are fixed, the speed will be a bit more slow. But sure, I'll try to take a look as time permits.

Miklos
Comment 15 Julien Nabet 2012-10-31 06:53:44 UTC
(In reply to comment #14)
> It makes little sense to reproduce a performance problem with a
> debug-enabled build, which has optimization disabled. :-)

If the goal is to compare time, I agree with you of course :-)
Now it's quite a small doc and after a minute, it wasn't opened (I'm on i5 with 6Gb and had only Iceweasel and Icedove opened in addition with console and LO)
My goal was to retrieve maximum info.
I'll try to retrieve bts at random tonight after my day time job
Comment 16 Michael Meeks 2012-10-31 17:59:50 UTC
Nasty; is it a regression ? [ if so it'd be nice to have the keyword ;-] can we get a callgrind trace with debug symbols installed:

export OOO_EXIT_POST_STARTUP=1
export OOO_DISABLE_RECOVERY=1
valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 /path/to/file

and leave it overnight ;-)

gzip / up-load the callgrind.12345.txt file and (hopefully) the answer will just drop out of that.

Thanks !
Comment 17 Julien Nabet 2012-11-01 00:05:26 UTC
Michael: tried to follow your comment but after the 2 exports you quoted I runned this:
valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 ~/compile-libreoffice/bugs/39179/callgrind.txt

then got this:
10%
20%
25%
30%
35%
40%
50%
55%
60%
75%
80%
100%
end
+ a popup saying calgrind.txt doesn't exist
then:
==28753== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==28753== Collected : 4357516741 1343547732 904497078 53990330 10823051 1763992 305982 1830857 823690
==28753== 
==28753== I   refs:      4,357,516,741
==28753== I1  misses:       53,990,330
==28753== LLi misses:          305,982
==28753== I1  miss rate:          1.23%
==28753== LLi miss rate:           0.0%

and other numbers.

I made the test again this time, after having creating an empty "callgrind.txt" file but I got the same.

What did I miss?

(Valgrind Version: 1:3.7.0-6 on pc Debian testing x86-64 updated today)
Comment 18 Julien Nabet 2012-11-01 05:12:56 UTC
Created attachment 69376 [details]
some bts at random

I attached some bts taken at random
Comment 19 Michael Meeks 2012-11-01 10:03:27 UTC
Hi Julien,

valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 ~/compile-libreoffice/bugs/39179/callgrind.txt

So - the command-line you're running after ./soffice.bin is just the same as if you're launching the app itself; so you need to pass the path to the test file you want to load to provoke the slowness :-)

so /path/to/docx/file is what I should have written.

Any chance of another try ? :-)
Comment 20 Julien Nabet 2012-11-01 10:24:54 UTC
(In reply to comment #19)
> Hi Julien,
> 
> valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes
> ./soffice.bin --splash-pipe=0 ~/compile-libreoffice/bugs/39179/callgrind.txt
> 
> So - the command-line you're running after ./soffice.bin is just the same as
> if you're launching the app itself; so you need to pass the path to the test
> file you want to load to provoke the slowness :-)
> 
> so /path/to/docx/file is what I should have written.
> 
> Any chance of another try ? :-)
Sorry for this, I thought it was like memcheck process and it was the output file.
Will give a try as soon as building of the day be finished
Comment 21 Julien Nabet 2012-11-01 16:45:09 UTC
I put the file available here:
http://serval2412.free.fr/callgrind.out.fdo39179.bz2

(I tried with http://wiki.documentfoundation.org/QA/Bugzilla-Attachments but it doesn't seem to accept more than 5MB, so gave up)
Comment 22 Michael Meeks 2012-11-05 11:37:12 UTC
Thanks - firing it up in kcachegrind doesn't show any staggeringly obvious single bottleneck. I suspect that the 2+ million calls to:

writerfilter::ooxml::OOXMLFactory::createFastChildContextFromFactory

and friends are near the core of the issue - we're spending far too much time per attribute by a -biiig- margin as we parse (I think) - and (I guess) there are a ton of attributes here.

OTOH - that will require a fair bit of work digging out ways to improve there I suspect, and there is the XSLT monster to kill there too :-)
Comment 23 tommy27 2013-08-31 19:27:45 UTC
still slow opening of 50 page test file.
almost 2 minute with LibO 4.1.1.2 under Win7 64bit
Comment 24 Julien Nabet 2013-09-01 20:47:57 UTC
I updated my local master repo and had the same.
I noticed lots of logs like this:
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 394 / 0x18a
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 395 / 0x18b
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 396 / 0x18c
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 397 / 0x18d
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 398 / 0x18e
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 399 / 0x18f
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 400 / 0x190
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 401 / 0x191
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 402 / 0x192
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 402 / 0x192
Comment 25 Miklos Vajna 2013-09-02 07:14:21 UTC
That is about unhandled tokens, not related to performance. If we want to improve performance, we should first profile.
Comment 26 the_letter_j 2013-10-07 01:24:28 UTC
Regression sometime before the most recent Ubuntu LTS in 2012 -- the slow documents are still slow as ever, but the 'fast' document is *also* now very slow to open.  Sigh.  LibreOffice 3.5.7 on Ubuntu 12.04.3 LTS

 25sec	Large 1.5mb docx     [was   5sec for previous commenter but crashed]  
 50sec	shorter 50 page docx [was  60sec for previous commenter] 
118sec	long 119 page docx   [was 185sec for previous commenter] 

Anyone like to test with ApacheOpenOffice 4.x please?  
And also on msftOffice, please specify Word version and Windows version? 

See also this bug, which is a much slower docx (~400 seconds) and has plenty of footnotes to go around.  https://bugs.freedesktop.org/show_bug.cgi?id=62656
Comment 27 tommy27 2013-10-07 05:40:30 UTC
just tested the shorter 50 page .docx on Win7 64bit.

same bad performance (almost 2 minutes to load) in LibO 3.3.3, LibO 4.1.2 and in OOo 3.3.0 and AOO 4.0.0.

so it seems the bug is inherited from OOo. changing
Comment 28 the_letter_j 2013-10-07 06:47:59 UTC
> just tested the shorter 50 page .docx on Win7 64bit ... 
> so it seems the bug is inherited from OOo. changing

@Tommy, can you please test the 1.5mb "fast" document?  Allegedly it worked fine in 2011, but on my Lio357 from 2012 is is now five times slower than before.  Might be a temporarily regression in the 3.5.x series, that is now fine in the 4.x series?
Comment 29 bugquestcontri 2014-02-06 04:28:53 UTC
I opened in LibO 4.1.4.2 on XP/SP3:
- 50 page file (123kB): in 46.9 sec; bug report: 2 min
- 119 page file (312kB): in 147 sec, bug report: 6 min
- 17 page file with 1.5 MB: in 4 sec; the bug reports gives no indication on opening time

See also the AskLibO Q&A: http://ask.libreoffice.org/en/question/29282/openbenchmarking-results-of-libreoffice/?answer=29290#post-id-29290
Comment 30 Michael Meeks 2014-02-06 07:50:13 UTC
Julien your callgrind trace is 404 now (urgh) =) in general running this stuff in callgrind (with symbols) and then looking at the results carefully, adding printf's to the slow pieces and then looking at what is churned on the console is the quickest and best way to identify what silly stuff is going on =)

That's really something that users can help with ... annoying that we can't attach those traces to bugzilla though ...
Comment 31 foss 2014-02-06 11:01:03 UTC
Confirmed:4.2.0.4:OSX

CPU spikes and 1.42min to open 50 page docx. Something is very off here.
Comment 32 Michael Stahl 2014-02-07 21:30:30 UTC
a quick check on loading the attachment implicated in comment #28 gives:

1167072 112520 pts/15 Sl+  22:20   0:01 OOo_3.2.1/opt/openoffice.org3/program/soffice.bin
1202460 120096 pts/6  Sl+  22:20   0:01 OOo_3.3/opt/openoffice.org3/program/soffice.bin
1209924 122832 pts/14 Sl+  22:20   0:01 OOo_3.4_beta/opt/ooo-dev3/program/soffice.bin
953528 131192 pts/9   Sl+  22:20   0:01 LO_3.4.6/opt/libreoffice3.4/program/soffice.bin
1075360 121616 pts/10 Sl+  22:21   0:01 LO_3.5.7/opt/libreoffice3.5/program/soffice.bin
996916 124556 pts/11  Sl+  22:21   0:01 LO_3.6.7/opt/libreoffice3.6/program/soffice.bin
1594144 143480 pts/12 Sl+  22:21   0:02 LO_4.0.6/opt/libreoffice4.0/program/soffice.bin
1581200 135336 pts/13 Sl+  22:22   0:02 LO_4.1.4/opt/libreoffice4.1/program/soffice.bin --splash-pipe=5

a look at the CPU time column shows only 0:01/0:02 values across all releases
=> removing "regression"

if you do find a document for which loading performance has regressed,
_please_ open a separate bug, do not add "regression" here!
Comment 33 Michael Stahl 2014-02-07 21:31:47 UTC
Comment on attachment 49033 [details]
Large 1.5mb docx file

since this document opens quickly it is sort of off-topic for this bug
Comment 34 tommy27 2014-12-13 09:34:03 UTC
(In reply to Astrix from comment #0)
> Created attachment 49025 [details]
> shorter 50 page docx document

still takes 1 minute and 40 seconds to load this file with LibO 4.3.4.1 and recent 4.5.0.0 alpha

tested under Win8.1 x64
AMD A8-6410 APU 2GHz 8GB RAM SSD
Comment 35 Michael Meeks 2014-12-13 14:27:21 UTC
Might be worth building a new profile of a build with debug symbols on vs. master - a ton of stuff changed in the XFastParser recently - and that has a very helpful effect wrt. splitting the core implementation slowness in writer from the XML parsing - and making it more visible. Any takers ?

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.