Bug 26984

Summary: Poor performance of Qt applications
Product: xorg Reporter: Martin Stolpe <martinstolpe>
Component: Driver/Radeon Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: 7.5 (2009.10)   
Hardware: Other   
OS: All   
Whiteboard:
Attachments:
Description | Flags
oprofile log while clicking on the KDE application launcher button | none
Change scratch bo from uncached to cached | none
dolphin using the opengl graphicssystem | none
oprofile when scrolling a page in Arora | none
another oprofile when clicking on the application launcher button | none
zooming in okular | none

Description Martin Stolpe 2010-03-09 16:20:32 UTC
Hello,
I am running KDE 4.4.1 with Qt 4.6.1 on my computer and the applications are not really snappy. Sorry, I can't describe it any better. I've tried the Gnome desktop and there everything seems to work much better (when I click on a window's title bar the application immediately comes to the front, and overall responsiveness seems to be better). It doesn't seem to be solely a kwin problem, because Amarok, for example, still seems sluggish when running on the Gnome desktop with Metacity.

This is using a R600 AGP card using the latest git snapshots from mesa and xf86-video-ati. On my notebook with a X1400 it seems to run well.

Please let me know if I can do any performance testing.
Comment 1 Alex Deucher 2010-03-09 17:13:19 UTC
If you are using kms, this is likely a dupe of bug 26641
Comment 2 Martin Stolpe 2010-03-10 00:29:00 UTC
I'm not quite sure if it's a duplicate.

I've patched the kernel as recommended in post #3: http://bugs.freedesktop.org/show_bug.cgi?id=26641#c3 and it didn't change anything.

Gnome seems to be running a lot smoother than KDE. To me it looks like a problem with the combination of driver stack + KWin + Qt application.
Comment 3 Michael de Lang 2010-03-10 01:06:46 UTC
Are you running xserver from git, per chance? I noticed the same problem using that, reverting to 1.7.5 seemed to help.
Comment 4 Michel Dänzer 2010-03-10 01:09:10 UTC
(In reply to comment #3)
> Are you running xserver from git, per chance? I noticed the same problem using
> that, reverting to 1.7.5 seemed to help.

Can you try and bisect which xserver change caused the regression for you?
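
For reference, bisecting is a binary search over the commit history between a known-good and a known-bad revision, which `git bisect` automates. A minimal Python sketch of the idea follows. The commit names and the `is_bad` predicate are hypothetical placeholders, not real xserver commits.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is also bad (the same
    assumption git bisect makes).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


# Hypothetical history: "c4" introduced the slowdown.
history = ["c1", "c2", "c3", "c4", "c5", "c6"]
culprit = first_bad_commit(history, lambda c: c >= "c4")
```

With git itself, the equivalent is `git bisect start`, followed by marking each tested revision with `git bisect good` or `git bisect bad`.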
Comment 5 Michael de Lang 2010-03-10 15:03:40 UTC
I tried bisecting it, but an error came up:
glxdri2.c:221: error: ‘__DRI2flushExtension’ has no member named ‘flushInvalidate’

Would using an older commit than 11252ed82e1f361b99e86521ac9314f868bd1a3a solve the problem, or do I need some weird patch which has been committed recently?
Comment 6 Martin Stolpe 2010-03-10 15:09:00 UTC
The only stuff compiled from git are mesa and the xf86-video-ati driver. Xorg server is 1.7.5.
Comment 7 Michel Dänzer 2010-03-11 03:34:42 UTC
(In reply to comment #6)
> The only stuff compiled from git are mesa and the xf86-video-ati driver. Xorg
> server is 1.7.5.

So please file your own bug Michael, as you're probably not suffering from the same issue as Martin.

Martin, if you have compositing enabled in kwin, which backend is it using?
Comment 8 Martin Stolpe 2010-03-11 14:17:32 UTC
I'm using kwin with the OpenGL backend. I've tried XRender and it was slower (as expected) than OpenGL. The different OpenGL options didn't change a lot performance wise (I was playing a flash video in the browser. I guess a benchmark with hard numbers would be a lot better).

Playing flash videos using kwin with compositing is smoother than without compositing (see my post at phoronix: http://www.phoronix.com/forums/showpost.php?p=115065&postcount=34)

The one place where this sluggish behaviour is reproducible is when I click on the application launcher. It takes ~2s to open the menu.
Comment 9 Michel Dänzer 2010-03-12 00:46:34 UTC
(In reply to comment #8)
> The one thing where this sluggish behaviour is reproducible is when I click on
> the application launcher. It takes ~2s to open the menu.

If the CPU is (at least almost) pegged during those 2s, a profile from sysprof or oprofile might be interesting.
Comment 10 Martin Stolpe 2010-03-12 01:24:27 UTC
Created attachment 33984 [details]
oprofile log while clicking on the KDE application launcher button
Comment 11 Martin Stolpe 2010-03-12 02:05:40 UTC
I've created an oprofile log while clicking on the KDE application launcher button. I hope this helps in understanding why this takes so long.

I have compiled glibc with debug symbols (nm shows me the following:
N .debug_abbrev 
N .debug_aranges 
N .debug_frame 
N .debug_info 
N .debug_line 
N .debug_loc 
N .debug_pubnames 
N .debug_ranges 
N .debug_str )
but those symbols won't show up in kcachegrind.

If I can do anything else please let me know
Comment 12 Jerome Glisse 2010-03-12 07:41:16 UTC
It's easier to use sysprof (it has a GUI). Also, when you profile something like this, you need an operation that is easy to reproduce, and you should perform it repeatedly while profiling is on, so that the profile has enough data to show which path is actually taking time.
Comment 13 Michel Dänzer 2010-03-12 07:45:46 UTC
Doesn't look like my versions of oprofile/kcachegrind can get a lot of information out of that data. sysprof 1.1.x would really be preferable, but if you can't get that to work, please attach the output of opreport -l and/or -c.
Comment 14 Michel Dänzer 2010-03-12 07:49:24 UTC
(In reply to comment #12)
> Also, when you profile something like this, you need an operation that is easy
> to reproduce, and you should perform it repeatedly while profiling is on, so
> that the profile has enough data to show which path is actually taking time.

It might be possible to get useful data from a single event of 2s, provided the profiling is started / stopped as closely to the start / stop of the event as possible.
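
One way to keep the profiling window tight around the event is to script it. The sketch below assumes the legacy `opcontrol` front end of oprofile is installed and run with sufficient privileges. The `profile_window` helper and the injected `run` callable are hypothetical names used for illustration only.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def _run(cmd: List[str]) -> None:
    # Default runner: execute the command and fail loudly on error.
    subprocess.run(cmd, check=True)


@contextmanager
def profile_window(run: Runner = _run) -> Iterator[None]:
    """Collect oprofile samples only while the with-block is executing."""
    run(["opcontrol", "--reset"])   # discard samples from earlier sessions
    run(["opcontrol", "--start"])   # begin sampling
    try:
        yield
    finally:
        run(["opcontrol", "--stop"])  # stop sampling
        run(["opcontrol", "--dump"])  # flush samples so opreport can read them


# Dry run with a recording runner, so nothing is executed here.
issued: List[List[str]] = []
with profile_window(run=issued.append):
    pass  # trigger the slow operation here, e.g. open the launcher menu
```

Calling `profile_window()` without arguments would run the commands for real. The dry run above only records what would be issued.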
Comment 15 Martin Stolpe 2010-03-12 14:51:17 UTC
Here is another oprofile output. I've clicked on the application launcher button for about 2 minutes, so hopefully this generated enough useful data.

This is the output from oprofile: http://www.uni-ulm.de/~s_mstopl/oprofile.tar.gz
and this is the report using opreport, gprof2dot and dot: http://www.uni-ulm.de/~s_mstopl/application_launcher.tar.gz
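
The report mentioned above was produced with opreport, gprof2dot and dot. A sketch of such a chain follows, assuming the three tools are on the PATH and that gprof2dot is given its `oprofile` input format. The exact options used for the original report are not stated in this thread, so the ones below are an assumption, and the output file name is hypothetical.

```python
import subprocess
from typing import List

# Assumed stages: call-graph report -> dot source -> PNG image.
PIPELINE: List[List[str]] = [
    ["opreport", "-cgf"],                    # callgraph, debug info, long filenames
    ["gprof2dot", "-f", "oprofile"],         # convert the report to a dot graph
    ["dot", "-Tpng", "-o", "launcher.png"],  # render; output name is hypothetical
]


def as_shell(stages: List[List[str]]) -> str:
    """Render the stages as the equivalent shell command line."""
    return " | ".join(" ".join(cmd) for cmd in stages)


def run_pipeline(stages: List[List[str]]) -> int:
    """Run a non-empty list of stages like a shell pipe; return the last exit status."""
    procs = []
    prev_stdout = None
    for i, cmd in enumerate(stages):
        last = i == len(stages) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=prev_stdout,
            stdout=None if last else subprocess.PIPE,
        )
        if prev_stdout is not None:
            prev_stdout.close()  # parent no longer needs its copy of the pipe
        prev_stdout = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs][-1]


shell_line = as_shell(PIPELINE)
```

`run_pipeline(PIPELINE)` is deliberately not called here, since it needs oprofile sample data to be present.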
Comment 16 Michel Dänzer 2010-03-13 02:13:13 UTC
Thanks, this looks useful. Basically, the bottleneck is XGetImage(). While we may be able to make that suck less, in general clients shouldn't rely on it in their fast paths. Are you using the Qt GTK+ engine by any chance?

BTW, does disabling AGP by passing radeon.agpmode=-1 help any?
Comment 17 Martin Stolpe 2010-03-13 17:08:46 UTC
I've added radeon.agpmode=-1 to my kernel command line. dmesg shows me the following line: "[drm] Forcing AGP to PCIE mode", but it didn't change anything.
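
A quick way to double-check that the parameter took effect is to look for both the command-line option and the kernel log message quoted above. The sketch below works on sample strings. On a live system they would come from /proc/cmdline and the kernel log. The helper names and the sample root device are hypothetical.

```python
from typing import Optional

FORCED_PCIE_MESSAGE = "[drm] Forcing AGP to PCIE mode"  # line quoted above


def agpmode_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the radeon.agpmode value from a kernel command line, if set."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "radeon.agpmode" and sep:
            return int(value)
    return None


def agp_disabled(cmdline: str, kernel_log: str) -> bool:
    """True when agpmode=-1 was passed and the driver logged the PCIE fallback."""
    return agpmode_from_cmdline(cmdline) == -1 and FORCED_PCIE_MESSAGE in kernel_log


# Sample inputs for illustration only.
sample_cmdline = "root=/dev/sda1 ro radeon.agpmode=-1"
sample_log = "[drm] Forcing AGP to PCIE mode\n"
```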

I had the following packages installed: gtk-kde4, gtk-kde4-oxygen-theme, gtk-qt-engine (I'm running Archlinux). I've uninstalled these packages and it also didn't change anything.

The output from oprofile looks the same as before:
http://www.uni-ulm.de/~s_mstopl/oprofile_2.tar.gz
http://www.uni-ulm.de/~s_mstopl/application_launcher_2.tar.gz
Comment 18 Martin Stolpe 2010-03-14 05:23:51 UTC
Removing the gtk stuff didn't speed up the application launcher but I'm under the impression that some applications don't lag that much anymore (Opera seems to work better). But I have no idea if this is just wishful thinking.

Are there any benchmarks which could be useful (for example running the kernel with the agpmode parameter and without)?
Comment 19 Michel Dänzer 2010-03-15 03:22:01 UTC
E.g.

x11perf -getimage{10,100,500}
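
The braces are shell brace expansion, so the command runs three GetImage tests. A small sketch that spells out the expansion and runs the benchmark only if x11perf is installed follows. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List

SIZES = (10, 100, 500)  # square sizes in pixels, as in the command above


def getimage_command(sizes=SIZES) -> List[str]:
    """Build the x11perf argument list that the brace expression expands to."""
    return ["x11perf"] + [f"-getimage{size}" for size in sizes]


def run_benchmark() -> str:
    """Run the GetImage tests and return x11perf's output.

    Returns an empty string when x11perf is not installed.
    """
    if shutil.which("x11perf") is None:
        return ""
    result = subprocess.run(getimage_command(), capture_output=True, text=True)
    return result.stdout + result.stderr


command = getimage_command()
```

Running the same command with and without radeon.agpmode=-1 would give numbers that can be compared directly.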

Also, someone on IRC pointed out that due to limitations in the Qt X11 backend, starting plasma (and maybe Qt apps in general) with --graphicssystem=raster may work better. (You can also try --graphicssystem=opengl, but the Mesa driver most likely isn't quite up to that yet)
Comment 20 Pauli 2010-03-15 03:51:58 UTC
Created attachment 34057 [details] [review]
Change scratch bo from uncached to cached

This patch should improve memcpy performance in XGetImage.

Can you test and report if it helps?
Comment 21 Pauli 2010-03-15 08:46:02 UTC
> --- Comment #20 from Pauli <suokkos@gmail.com>  2010-03-15 03:51:58 PST ---
> Created an attachment (id=34057)
>  --> (http://bugs.freedesktop.org/attachment.cgi?id=34057)
> Change scratch bo from uncached to cached
>
> This patch should improve memcpy performance in XGetImage.
>
> Can you test and report if it helps?
>
>


Sorry. This doesn't work without kernel changes :/
Comment 22 Martin Stolpe 2010-03-15 08:59:38 UTC
(In reply to comment #21)
> > --- Comment #20 from Pauli <suokkos@gmail.com>  2010-03-15 03:51:58 PST ---
> > Created an attachment (id=34057) [details]
> >  --> (http://bugs.freedesktop.org/attachment.cgi?id=34057)
> > Change scratch bo from uncached to cached
> >
> > This patch should improve memcpy performance in XGetImage.
> >
> > Can you test and report if it helps?
> >
> >
> 
> 
> Sorry. This doesn't work without kernel changes :/
> 

Too bad. I just wanted to upload a new profile which shows that this patch unfortunately didn't help.
Comment 23 Martin Stolpe 2010-03-15 09:06:46 UTC
Created attachment 34075 [details]
dolphin using the opengl graphicssystem

I've uploaded a screenshot of dolphin using the opengl graphicssystem. What's interesting to me is that some letters are missing. When I hover with the mouse over a folder, the name of the folder shows up (the mouse cursor was on the Desktop folder when I took the screenshot). But I guess this would be another bug report...
Comment 24 Alex Deucher 2010-03-15 09:35:10 UTC
grab this commit:
http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=488c9fd8300505cc6c0c2f8f0f00849f27cc5d63
or just pull the latest xf86-video-ati from git.
Comment 25 Nikos Chantziaras 2010-03-15 09:41:34 UTC
(In reply to comment #23)
> Created an attachment (id=34075) [details]
> dolphin using the opengl graphicssystem
> 
> I've uploaded a screenshot with dolphin when using the opengl graphicssystem.
> What's interesting to me is that some letters are missing. When I hover with
> the mouse over a folder the name of the folder is showing (the mouse cursor was
> on the Desktop folder when I took the screenshot). But I guess this would be
> another bug report...

Qt's OpenGL renderer is generally not working.  Not in Linux, not in Windows, and not with any driver.  No need to actually even try it.
Comment 26 Martin Stolpe 2010-03-15 16:23:18 UTC
(In reply to comment #24)
> grab this commit:
> http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=488c9fd8300505cc6c0c2f8f0f00849f27cc5d63
> or just pull the latest xf86-video-ati from git.
> 

Great work! Application launcher is indeed faster and the scaling in okular is also a lot faster. Thanks a lot.

I made two more profiles which I hope could be interesting.
Comment 27 Martin Stolpe 2010-03-15 16:24:56 UTC
Created attachment 34093 [details]
oprofile when scrolling a page in Arora

If I should bother the Qt guys with this profile please let me know.
Comment 28 Martin Stolpe 2010-03-15 16:31:48 UTC
Created attachment 34094 [details]
another oprofile when clicking on the application launcher button

exaGetImage doesn't consume as much time as before.
Comment 29 Martin Stolpe 2010-03-15 17:19:23 UTC
Created attachment 34095 [details]
zooming in okular

And another attachment. No more attachments after this one (unless requested), promised! ;-)

All of the three profiles above have in common that a lot of time is spent in sysenter_do_call.

Now to the good stuff: The patch above solved quite a few performance problems on my system:
 -kmail is a lot faster
 -switching between tabs/windows is faster
 -zooming in okular is faster (but still slow)

Thanks again!
Comment 30 Alex Deucher 2010-03-15 17:40:58 UTC
Can this bug be considered closed?  I think optimally the apps would use less GetImage.
Comment 31 Martin Stolpe 2010-03-16 01:41:32 UTC
(In reply to comment #30)
> Can this bug be considered closed?  I think optimally the apps would use less
> GetImage.
> 

Should I open new bug reports for the other problems?

Looking at the output from oprofile, the other bottlenecks seem to be:
 -sysenter_do_call (okular)
 -still exaGetImage (okular)
 -convert_ARGB_PM_to_ARGB (is this a Qt problem or a driver problem?) (arora)
Comment 32 Martin Stolpe 2010-03-16 13:35:49 UTC
Ok, I consider this bug as fixed. But it would be really nice if someone could answer my questions regarding the arora profile: sysenter_do_call and convert_ARGB_PM_to_ARGB functions seem to consume a lot of cpu time. Is this normal? Should I report the convert_ARGB_PM_to_ARGB function to the Qt developers?
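
For context, the function name suggests a conversion from premultiplied ARGB (each color channel already scaled by alpha) to plain ARGB, which needs a division per channel for every pixel. Below is a minimal sketch of that arithmetic. It is not Qt's actual implementation, and the function names and rounding are assumptions.

```python
def unpremultiply(pixel: int) -> int:
    """Convert one 32-bit premultiplied ARGB pixel to non-premultiplied ARGB."""
    a = (pixel >> 24) & 0xFF
    if a == 0:
        return 0  # fully transparent: color information is gone
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # Undo the premultiplication: channel = premultiplied * 255 / alpha.
    r = min(255, (r * 255) // a)
    g = min(255, (g * 255) // a)
    b = min(255, (b * 255) // a)
    return (a << 24) | (r << 16) | (g << 8) | b


def convert_image(pixels):
    """Apply the per-pixel conversion to a whole image (a flat list of pixels)."""
    return [unpremultiply(p) for p in pixels]


# 50% alpha (0x80) with red premultiplied to 0x80 becomes full red 0xFF.
half_red = unpremultiply(0x80800000)
```

Doing this on the CPU for every pixel of every repaint adds up, which would be consistent with such a function showing up prominently in a profile.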
Comment 33 Michel Dänzer 2010-03-17 03:51:02 UTC
(In reply to comment #32)
> Should I report the convert_ARGB_PM_to_ARGB function to the Qt developers?

Yes.
