Created attachment 83303
C++ source reproducing the problem

I am using Debian and upgraded to LibreOffice from wheezy-backports (version 1:4.0.3-2~bpo70+1 in Debian notation). I can successfully connect to LibreOffice through the API, but when I try to get the type of any document from an InputStream using XTypeDetection::queryTypeByDescriptor, the call appears to hang. For very short inputs (about 1000 bytes) the function returns the type (generic_Text or encoded in my tests) correctly and quite fast, but if I load a file of even 50000 bytes, it takes more than a minute to get the type. For that whole time I can see one thread of my app consuming 50% of the CPU and one thread of the LibreOffice process consuming 50% of the CPU. I tried attaching gdb to the process, but the only thing I can tell is that execution is inside cppu_threadpool::JobQueue::enter and does not leave it until the end. I used OpenOffice from Debian squeeze before and my code worked fine there, so I suppose this is a bug. I attach a simple test application to reproduce the problem.
LibreOffice's document type detection code is notoriously slow: it tries lots of filters one by one until it finds a good match, seeking and reading the same data over and over again from the given input stream. So, if that stream is only made available across URP, this can easily cause a lot of delay, especially if the stream's data is in a "bogus" format for which no matching filter can be found (so the search needs to go through all of them). That said, if you say things were considerably faster with an old OpenOffice.org version (which one exactly was that?), it might be a performance regression in the rewritten binary URP bridge, as hinted at in <https://issues.apache.org/ooo/show_bug.cgi?id=116038#c14> "rewrite binary URP bridge."
I have done some additional research: the main problem is that data is read from the input stream byte by byte. If I pass the file by URL, the performance is great. The call to InputStream::readBytes is performed with the size parameter num=1, and then this single byte is transmitted across processes, which is very slow. I added debug prints to my stream implementation and the result is:

Successfully connected to LibreOffice
Changed location to: 0
Readed: 30
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 1024
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 4096
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 4096
Changed location to: 0
Changed location to: 0
Readed: 26
Changed location to: 0
Changed location to: 0
Readed: 7
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 512
Changed location to: 0
Changed location to: 0
Readed: 1
Changed location to: 0
Changed location to: 0
Readed: 4
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 4096
Changed location to: 0
Changed location to: 1
Readed: 1
Readed: 1
Readed: 1
Changed location to: 0
Readed: 1
Changed location to: 0
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
...... byte-by-byte read

So it seems that there is no buffering in the communication. This problem makes InputStreams useless: it is not just two times slower, as in <https://issues.apache.org/ooo/show_bug.cgi?id=116038#c14> "rewrite binary URP bridge."
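To make the cost of the byte-by-byte pattern concrete, here is a minimal, self-contained C++ sketch. It does not use the UNO API and is not LibreOffice's code: `RemoteStream`, `BufferedReader`, `tripsUnbuffered` and `tripsBuffered` are hypothetical names invented for this illustration, and the only assumption is that each read on the remote stream costs one cross-process round trip. It compares the number of round trips with and without a read-ahead buffer on the reading side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a stream living in another process: every
// read() call is counted as one cross-process round trip.
class RemoteStream {
public:
    explicit RemoteStream(std::string data) : data_(std::move(data)) {}

    // Copies up to n bytes starting at offset pos into out; returns the count.
    std::size_t read(std::size_t pos, char* out, std::size_t n) {
        ++roundTrips_;
        if (pos >= data_.size()) return 0;
        std::size_t count = std::min(n, data_.size() - pos);
        std::copy_n(data_.data() + pos, count, out);
        return count;
    }

    std::size_t roundTrips() const { return roundTrips_; }

private:
    std::string data_;
    std::size_t roundTrips_ = 0;
};

// Read-ahead wrapper: fetches up to blockSize bytes per round trip and
// serves small reads and seeks inside the cached block locally.
class BufferedReader {
public:
    BufferedReader(RemoteStream& remote, std::size_t blockSize)
        : remote_(remote), buffer_(blockSize, '\0') {
        assert(blockSize > 0);
    }

    void seek(std::size_t pos) { pos_ = pos; }

    // Reads one byte at the current position; returns false at end of stream.
    bool readByte(char& out) {
        bool cached = pos_ >= bufferStart_ && pos_ < bufferStart_ + bufferLen_;
        if (!cached) {
            bufferStart_ = pos_;
            bufferLen_ = remote_.read(pos_, &buffer_[0], buffer_.size());
            if (bufferLen_ == 0) return false;
        }
        out = buffer_[pos_ - bufferStart_];
        ++pos_;
        return true;
    }

private:
    RemoteStream& remote_;
    std::string buffer_;
    std::size_t bufferStart_ = 0;
    std::size_t bufferLen_ = 0;
    std::size_t pos_ = 0;
};

// Reads the whole stream one byte at a time without any buffering.
std::size_t tripsUnbuffered(const std::string& data) {
    RemoteStream remote(data);
    char c;
    std::size_t pos = 0;
    while (remote.read(pos, &c, 1) == 1) ++pos;
    return remote.roundTrips();
}

// Reads the whole stream one byte at a time through the read-ahead wrapper.
std::size_t tripsBuffered(const std::string& data, std::size_t blockSize) {
    RemoteStream remote(data);
    BufferedReader reader(remote, blockSize);
    char c;
    while (reader.readByte(c)) {}
    return remote.roundTrips();
}
```

In this model, calling `tripsUnbuffered` on a 50000-byte string costs one round trip per byte plus a final one that detects the end of the stream, while `tripsBuffered` with a 4096-byte block needs only a handful. Repeated seeks back to offset 0, as in the log above, also stay local as long as the target offset is still inside the cached block.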
(In reply to Grigory from comment #2)
> I have some additional research: the main problem is reading data from the
> inputstream byte-by-byte. If I pass the file by URL the performance is great.

Hi Grigory,
How's the performance of our latest builds?

Status -> NEEDINFO
(In reply to Robinson Tryon (qubit) from comment #3)
> (In reply to Grigory from comment #2)
> > I have some additional research: the main problem is reading data from the
> > inputstream byte-by-byte. If I pass the file by URL the performance is great.
>
> Hi Grigory,
> How's the performance of our latest builds?
>
> Status -> NEEDINFO

Sorry, I don't have a computer with the necessary environment to check the error. But the code that I attached to the ticket should reproduce the problem if it still exists.
(In reply to Grigory from comment #4)
> Sorry, I don't have a computer with necessary environment to check the
> error. But the code that I attached to the ticket should reproduce the
> problem if it still exists

Stephan: Is testing the code straightforward, or is this something a dev will need to handle?