Bug 17129 - Slow to move translucent windows above a certain size with EXA (r300)
Summary: Slow to move translucent windows above a certain size with EXA (r300)
Status: RESOLVED WONTFIX
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2008-08-13 20:49 UTC by Joel Feiner
Modified: 2009-07-08 23:42 UTC (History)
0 users



Attachments
Current xorg.conf (4.38 KB, text/plain)
2008-08-13 20:49 UTC, Joel Feiner
no flags Details
Kernel configuration (50.16 KB, text/plain)
2008-08-14 05:54 UTC, Joel Feiner
no flags Details
Latest Xorg.0.log (30.24 KB, patch)
2008-08-14 05:56 UTC, Joel Feiner
no flags Details | Splinter Review
Log file from X server (241.88 KB, text/plain)
2008-08-29 05:46 UTC, Joel Feiner
no flags Details
Log from opreport (91.14 KB, text/plain)
2008-08-29 18:11 UTC, Joel Feiner
no flags Details
Latest opreport log showing different results (70.26 KB, text/plain)
2008-08-31 08:44 UTC, Joel Feiner
no flags Details
Basic oprofile report (61.97 KB, text/plain)
2008-09-01 10:40 UTC, Joel Feiner
no flags Details
oprofile report with callgraph information (may or may not be that useful) (897.90 KB, text/plain)
2008-09-01 10:41 UTC, Joel Feiner
no flags Details

Description Joel Feiner 2008-08-13 20:49:20 UTC
Created attachment 18274 [details]
Current xorg.conf

I have an ATI Mobility X300, running the entire X.org stack from git on kernel 2.6.26.  When running xcompmgr -c and setting windows to be translucent (with transset), windows above a certain size, around 600x600, are laggy when dragged.  Smaller translucent windows, like the default 80x24 Konsole window, move just fine with translucency.  Windows that don't use transset but have their own translucency, like urxvt or gnome-terminal (with translucent background), seem to fare a little better, but not by much.

Oprofile report while dragging window:
CPU: PIII, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
39258    36.5429  radeon                   /radeon
29882    27.8153  vmlinux                  dma_alloc_coherent
23813    22.1661  vmlinux                  vsscanf
2191      2.0395  Xorg                     dixLookupPrivate
1554      1.4465  libc-2.8.so              /lib/libc-2.8.so
802       0.7465  vmlinux                  cpuidle_register_governor
692       0.6441  libpixman-1.so.0.11.9    pixman_op
416       0.3872  Xorg                     miComputeClips
301       0.2802  Xorg                     __i686.get_pc_thunk.bx
282       0.2625  libpixman-1.so.0.11.9    pixman_region_intersect
216       0.2011  Xorg                     miHandleValidateExposures
202       0.1880  Xorg                     miMarkOverlappedWindows
175       0.1629  Xorg                     miRegionValidate
174       0.1620  Xorg                     SetWinSize
173       0.1610  libdbe.so                miDbePositionWindow
160       0.1489  Xorg                     damageDamageRegion
148       0.1378  libpixman-1.so.0.11.9    pixman_region_subtractO
144       0.1340  Xorg                     xf86XVWindowExposures
138       0.1285  Xorg                     compPositionWindow

Xorg.conf is attached.
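
To make the split between kernel-side and userspace time in a report like the one above easier to read, the percentage column can be summed per image. The following is a minimal sketch, not part of the original report: it assumes the whitespace-separated "samples / % / image name / symbol name" layout pasted above, and the helper name percent_by_image is made up for illustration. Only the first five rows of the pasted report are used as sample input.

```python
from collections import defaultdict

# First rows of the opreport listing quoted in the description.
SAMPLE_REPORT = """\
samples  %        image name               symbol name
39258    36.5429  radeon                   /radeon
29882    27.8153  vmlinux                  dma_alloc_coherent
23813    22.1661  vmlinux                  vsscanf
2191      2.0395  Xorg                     dixLookupPrivate
1554      1.4465  libc-2.8.so              /lib/libc-2.8.so
"""


def percent_by_image(report):
    """Sum the '%' column of an opreport symbol listing per image name.

    Hypothetical helper: assumes column 0 is the sample count, column 1
    the percentage and column 2 the image name.
    """
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with an integer sample count; skip header lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        totals[fields[2]] += float(fields[1])
    return dict(totals)


totals = percent_by_image(SAMPLE_REPORT)
# 'radeon' (the DRM module) and 'vmlinux' are both kernel-side images.
kernel_share = totals["radeon"] + totals["vmlinux"]
print(f"kernel-side share: {kernel_share:.1f}%")
```

On these rows the kernel-side images account for roughly 86.5% of the samples, which matches the later observation that CPU time is spent almost entirely in the kernel.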
Comment 1 Michel Dänzer 2008-08-14 00:25:25 UTC
First of all, please also attach the full Xorg.0.log.

I haven't played with xcompmgr recently, but I have noticed that with KDE4, compositing is much slower with the XRender backend than with GLX. That's not the case with the intel driver on an i945G though, so for now let's assume this is a driver issue.

As for the profile, it would be interesting to get more useful symbol resolution for the 'radeon' image - I presume that refers to the radeon DRM kernel module. Also, are you sure the vmlinux symbols are accurate? I'm not sure why vsscanf and dma_alloc_coherent would be such hot spots for moving around windows. You did reset the oprofile samples before profiling the operation, didn't you?
Comment 2 Joel Feiner 2008-08-14 05:54:42 UTC
Created attachment 18280 [details]
Kernel configuration

Kernel configuration
Comment 3 Joel Feiner 2008-08-14 05:56:46 UTC
Created attachment 18281 [details] [review]
Latest Xorg.0.log
Comment 4 Joel Feiner 2008-08-14 05:57:22 UTC
I also noticed the behavior with KDE4 and I was going to open a bug about it, but then I got sick of KDE4 and uninstalled it ;)

Xorg.0.log is attached.

Yes, I did do an opcontrol --reset before profiling.  As for the kernel symbols, I've been wondering myself why I only see /radeon for the DRM module and those other weird symbols.  I've attached my kernel .config just in case that may also be of use.

Comment 5 Michel Dänzer 2008-08-14 08:58:32 UTC
I tracked down the KDE4 slowdown to the shadows ending up using pictures with non-power-of-two dimensions, which at least pre-R500 hardware doesn't support as a repeated source. You can avoid this by making sure the kwin shadow plugin size and offset (or something like that) parameters add up to a power of two.
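
The power-of-two condition can be checked with a few lines of code. This is only an illustrative sketch: the function names and the example shadow values (size 12, offset 4 or 8) are hypothetical and are not taken from kwin or the radeon driver.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives and other values."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Smallest power of two that is >= n (1 for n < 1)."""
    return 1 if n < 1 else 1 << (n - 1).bit_length()


# Hypothetical shadow parameters; the real kwin option names may differ.
for size, offset in [(12, 4), (12, 8)]:
    total = size + offset
    if is_power_of_two(total):
        print(f"{size} + {offset} = {total}: power of two")
    else:
        print(f"{size} + {offset} = {total}: not a power of two, "
              f"next one is {next_power_of_two(total)}")
```

With these example values, 12 + 4 = 16 satisfies the condition, while 12 + 8 = 20 does not and would have to grow to 32.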

Now, I don't think xcompmgr uses the same technique for shadows, and I can't seem to reproduce the problem with it even with shadows enabled. But FWIW, does not enabling shadows avoid the problem for you?
Comment 6 Joel Feiner 2008-08-14 09:52:18 UTC
(In reply to comment #5)
> I tracked down the KDE4 slowdown to the shadows ending up using pictures with
> non-power-of-two dimensions, which at least pre-R500 hardware doesn't support
> as a repeated source. You can avoid this by making sure the kwin shadow plugin
> size and offset (or something like that) parameters add up to a power of two.
> 
> Now, I don't think xcompmgr uses the same technique for shadows, and I can't
> seem to reproduce the problem with it even with shadows enabled. But FWIW, does
> not enabling shadows avoid the problem for you?
> 

The only way I know to disable shadows is to set the radius to 0 (e.g., xcompmgr -c -R 0).  I get the same results there, although it's slightly less laggy.  Also setting FBTexPercent to 0 seems to improve things in general, but the problem remains.
Comment 7 Joel Feiner 2008-08-25 16:16:34 UTC
I found an interesting phenomenon today.  If I alt-drag the translucent window if it's underneath another window, say, Konsole, then the slowness disappears (and almost no CPU time is used).  Bring it back above Konsole and the movement is slow and laggy again.  But moving it around near the side of the screen, where maybe 2/3 to 3/4's of the translucent window is off the edge of the screen (or less) causes the lagginess to subside.

Opreport looks like this while laggy:
samples  %        image name               app name                 symbol name
27593    23.7846  radeon                   Xorg                     /radeon
17364    14.9674  vmlinux                  Xorg                     vt8237_force_enable_hpet
5660      4.8788  Xorg                     Xorg                     dixLookupPrivate
5567      4.7986  vmlinux                  Xorg                     vsscanf
4566      3.9358  libc-2.8.so              Xorg                     /lib/libc-2.8.so
1935      1.6679  vmlinux                  Xorg                     sys_vm86old
1550      1.3361  libpixman-1.so.0.11.9    Xorg                     pixman_op
1083      0.9335  libqt-mt.so.3.3.8        kicker                   /usr/qt/3/lib/libqt-mt.so.3.3.8
1053      0.9077  Xorg                     Xorg                     miComputeClips

When moving it under the other window:
samples  %        image name               app name                 symbol name
4213      8.1681  Xorg                     Xorg                     dixLookupPrivate
3099      6.0083  libc-2.8.so              Xorg                     /lib/libc-2.8.so
1361      2.6387  radeon                   Xorg                     /radeon
1269      2.4603  vmlinux                  Xorg                     sys_vm86old
934       1.8108  libpixman-1.so.0.11.9    Xorg                     pixman_op
876       1.6984  libqt-mt.so.3.3.8        kicker                   /usr/qt/3/lib/libqt-mt.so.3.3.8
841       1.6305  vmlinux                  vmlinux                  acpi_processor_get_throttling_info
750       1.4541  vmlinux                  Xorg                     vt8237_force_enable_hpet
721       1.3979  Xorg                     Xorg                     miComputeClips
683       1.3242  vmlinux                  vmlinux                  uvesafb_vbe_state_save
642       1.2447  Xorg                     Xorg                     miValidateTree


Comment 8 Michel Dänzer 2008-08-29 01:04:16 UTC
(In reply to comment #7)
> I found an interesting phenomenon today.  If I alt-drag the translucent window
> if it's underneath another window, say, Konsole, then the slowness disappears
> (and almost no CPU time is used).  Bring it back above Konsole and the movement
> is slow and laggy again.  But moving it around near the side of the screen,
> where maybe 2/3 to 3/4's of the translucent window is off the edge of the
> screen (or less) causes the lagginess to subside.

Sounds like the translucent window is composited by the CPU instead of the GPU. It would be interesting to track down why that is happening, e.g. by rebuilding the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa.c, or if that doesn't show anything interesting by rebuilding xserver with DEBUG_TRACE_FALL defined to 1 in exa/exa_priv.h.
Comment 9 Joel Feiner 2008-08-29 05:20:51 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I found an interesting phenomenon today.  If I alt-drag the translucent window
> > if it's underneath another window, say, Konsole, then the slowness disappears
> > (and almost no CPU time is used).  Bring it back above Konsole and the movement
> > is slow and laggy again.  But moving it around near the side of the screen,
> > where maybe 2/3 to 3/4's of the translucent window is off the edge of the
> > screen (or less) causes the lagginess to subside.
> 
> Sounds like the translucent window is composited by the CPU instead of the GPU.
> It would be interesting to track down why that is happening, e.g. by rebuilding
> the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa.c, or if that
> doesn't show anything interesting by rebuilding xserver with DEBUG_TRACE_FALL
> defined to 1 in exa/exa_priv.h.
> 

Yep, that's where the issue is.  The log file shows a bunch of these:

R300CheckComposite: Component alpha not supported with source alpha and source value blending.

And a bunch of these:

R300CheckCompositeTexture: Unsupported picture format 0x1011000

They come in chunks of 10-20 lines.  It seems when I tried to move the window faster, I would get more of the former message.  Hopefully, this will all be of use to you.
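
Since these messages arrive in bursts, tallying them per message type makes it easier to compare a run with and without window movement. The sketch below is an assumption-laden illustration: the regular expression is written only against the two message shapes quoted above, the function name count_fallbacks is made up, and the "(II)" line in the sample is a placeholder for unrelated log output.

```python
import re
from collections import Counter

# Matches "<Rnnn>CheckComposite...: <message>" as seen in the quoted lines.
FALLBACK_RE = re.compile(r"(R\d+CheckComposite\w*): (.+)")

# Two message types copied from this comment, plus one placeholder line.
SAMPLE_LOG = """\
R300CheckComposite: Component alpha not supported with source alpha and source value blending.
R300CheckCompositeTexture: Unsupported picture format 0x1011000
R300CheckCompositeTexture: Unsupported picture format 0x1011000
(II) placeholder for an unrelated log line
"""


def count_fallbacks(log_text):
    """Count fallback trace messages, keyed by (function, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FALLBACK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


counts = count_fallbacks(SAMPLE_LOG)
for (func, message), n in counts.most_common():
    print(f"{n:5d}  {func}: {message}")
```

Running the same tally over a log captured while idle and one captured while dragging the window would show whether either message actually correlates with the slowdown.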
Comment 10 Joel Feiner 2008-08-29 05:24:38 UTC
I want to go ahead and add that I keep getting those messages without moving the window (or even having it open, or even doing anything except watching tail -f on konsole).  I neglected to do a proper control test before posting the last message.  Hopefully, the messages are still useful.

I also want to say that when moving the translucent window, CPU usage does go to 100% but it's almost entirely in the kernel.
Comment 11 Joel Feiner 2008-08-29 05:46:07 UTC
Created attachment 18568 [details]
Log file from X server

Fallback trace turned on in both the server and the driver.
Comment 12 Joel Feiner 2008-08-29 05:46:56 UTC
I tried with fallback trace in the server.  The log is a bit large and I
couldn't think of a good way to indicate where in the log corresponds to when I
was moving the window around.  So I just made sure to immediately kill the X
server as soon as I was done moving the window so that all of the stuff in the
log up to the end should be related to moving the window.
Comment 13 Roland Scheidegger 2008-08-29 06:18:03 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > I found an interesting phenomenon today.  If I alt-drag the translucent window
> > > if it's underneath another window, say, Konsole, then the slowness disappears
> > > (and almost no CPU time is used).  Bring it back above Konsole and the movement
> > > is slow and laggy again.  But moving it around near the side of the screen,
> > > where maybe 2/3 to 3/4's of the translucent window is off the edge of the
> > > screen (or less) causes the lagginess to subside.
> > 
> > Sounds like the translucent window is composited by the CPU instead of the GPU.
> > It would be interesting to track down why that is happening, e.g. by rebuilding
> > the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa.c, or if that
> > doesn't show anything interesting by rebuilding xserver with DEBUG_TRACE_FALL
> > defined to 1 in exa/exa_priv.h.
> > 
> 
> Yep, that's where the issue is.  The log file shows a bunch of these:
> 
> R300CheckComposite: Component alpha not supported with source alpha and source
> value blending.
> 
> And a bunch of these:
> 
> R300CheckCompositeTexture: Unsupported picture format 0x1011000
> 
> They come in chunks of 10-20 lines.  It seems when I tried to move the window
> faster, I would get more of the former message.  Hopefully, this will all be of
> use to you.

The unsupported picture format is 1 bit alpha (PICT_a1) here, though I've no idea if this fallback is what causes the slowdowns. As a side note, I think this fallback could be avoided on r5xx cards since those should support such a format - not that it would help in your case...
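
For reference, the value 0x1011000 can be unpacked by hand. The sketch below assumes the classic Render PICT_FORMAT packing (bpp << 24 | type << 16 | a << 12 | r << 8 | g << 4 | b) and the classic PICT_TYPE_* numbering; the function name decode_pict_format is made up for illustration and is not an X server API.

```python
# Assumed PICT_TYPE_* values from the classic Render headers.
PICT_TYPE_NAMES = {0: "OTHER", 1: "A", 2: "ARGB", 3: "ABGR", 4: "COLOR", 5: "GRAY"}


def decode_pict_format(fmt):
    """Split a packed picture format code into its fields.

    Assumed layout: bpp << 24 | type << 16 | a << 12 | r << 8 | g << 4 | b
    """
    type_code = (fmt >> 16) & 0xFF
    return {
        "bpp": (fmt >> 24) & 0xFF,
        "type": PICT_TYPE_NAMES.get(type_code, f"unknown({type_code})"),
        "a": (fmt >> 12) & 0xF,
        "r": (fmt >> 8) & 0xF,
        "g": (fmt >> 4) & 0xF,
        "b": fmt & 0xF,
    }


a1 = decode_pict_format(0x1011000)
print(a1)  # {'bpp': 1, 'type': 'A', 'a': 1, 'r': 0, 'g': 0, 'b': 0}
```

One bit per pixel, alpha-only type, and a single alpha bit with no colour channels is what PICT_a1 describes.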
Comment 14 Michel Dänzer 2008-08-29 07:20:41 UTC
(In reply to comment #13)
> 
> The unsupported picture format is 1 bit alpha (PICT_a1) here, though I've no
> idea if this fallback is what causes the slowdowns. 

I don't think so, looks like uploads of A1 glyphs to the glyph cache pixmap. The component alpha output is probably about sub-pixel anti-aliased text rendering and doesn't even result in a software fallback (no following ExaCheckComposite output), so these are both red herrings for the problem.

> As a side note, I think this fallback could be avoided on r5xx cards since
> those should support such a format - not that it would help in your case...

The problem is that the EXA core currently doesn't bother migrating < 8 bpp pixmaps offscreen, so they never actually get accelerated.

(In reply to comment #10)
> 
> I also want to say that when moving the translucent window, CPU usage does go
> to 100% but it's almost entirely in the kernel.

So we're back to square one and could probably use better profiling data...
Comment 15 Joel Feiner 2008-08-29 10:44:23 UTC
I don't know how to get the debugging symbols for the radeon and drm modules
to show up in oprofile.  I've built everything with -ggdb.  I selected the
options in the kernel to have full debugging symbols.  I'm not sure what I'm
doing wrong.

For what it's worth, opreport thinks the kernel module's path is /radeon,
which is obviously incorrect.  I don't know, however, how to fix that.  Any
ideas?

On Fri, Aug 29, 2008 at 10:20 AM, <bugzilla-daemon@freedesktop.org> wrote:

>
> (In reply to comment #10)
> >
> > I also want to say that when moving the translucent window, CPU usage does go
> > to 100% but it's almost entirely in the kernel.
>
> So we're back to square one and could probably use better profiling data...
>
>
Comment 16 Michel Dänzer 2008-08-29 15:17:36 UTC
(In reply to comment #15)
> For what it's worth, opreport thinks the kernel module's path is /radeon,
> which is obviously incorrect.  I don't know, however, how to fix that.  Any
> ideas?

Unfortunately not, try some oprofile documentation / forum / people / ... maybe? Or might sysprof work better?
Comment 17 Joel Feiner 2008-08-29 18:10:07 UTC
It turns out that if you read the man page, you can find the answers to things ;).

So, here's the oprofile with the symbols from the kernel modules:

samples  %        linenr info                 image name               symbol name
87794    38.9522  radeon_cp.c:1506            radeon.ko                radeon_freelist_get
49926    22.1510  quirks.c:290                vmlinux                  vt8237_force_enable_hpet
16635     7.3806  vsprintf.c:950              vmlinux                  vsscanf
6703      2.9740  privates.c:130              Xorg                     dixLookupPrivate
4635      2.0564  pixman-region.c:1552        libpixman-1.so.0.11.9    pixman_region_subtractO
4376      1.9415  (no location information)   libc-2.8.so              /lib/libc-2.8.so
3918      1.7383  pixman-region.c:633         libpixman-1.so.0.11.9    pixman_op
3060      1.3577  radeon_cp.c:238             radeon.ko                radeon_do_wait_for_idle
2373      1.0528  pixman-edge.c:324           libpixman-1.so.0.11.9    pixman_rasterize_edges
2326      1.0320  exa_offscreen.c:172         libexa.so                exaOffscreenAlloc
2170      0.9628  vm86_32.c:200               vmlinux                  sys_vm86old
1364      0.6052  exa_offscreen.c:417         libexa.so                exaOffscreenFree
893       0.3962  damage.c:174                Xorg                     damageDamageRegion
889       0.3944  radeon_exa_render.c:1908    radeon_drv.so            RadeonCompositeTileCP
713       0.3163  posix-timers.c:784          vmlinux                  sys_timer_settime
707       0.3137  xkbKillSrv.c:0              Xorg                     __i686.get_pc_thunk.bx
681       0.3021  resource.c:851              Xorg                     dixLookupResource
575       0.2551  radeon_state.c:2443         radeon.ko                radeon_cp_indirect
574       0.2547  exa_glyphs.c:417            libexa.so                exaGlyphCacheBufferGlyph
546       0.2422  signal_32.c:267             vmlinux                  setup_sigcontext
Comment 18 Joel Feiner 2008-08-29 18:11:22 UTC
Created attachment 18586 [details]
Log from opreport

Adding this so that you can actually read the content (since my cut and paste job epic-failed).
Comment 19 Michel Dänzer 2008-08-31 03:00:33 UTC
radeon_freelist_get at the top of the profile indicates the GPU is the bottleneck, which is odd... Does moving the window involve any kind of not directly related animation or other constant screen updates?

I'm also not sure why vt8237_force_enable_hpet is up there; do you have a VIA 8235/7 chipset?
Comment 20 Joel Feiner 2008-08-31 08:43:15 UTC
(In reply to comment #19)
> radeon_freelist_get at the top of the profile indicates the GPU is the
> bottleneck, which is odd... Does moving the window involve any kind of not
> directly related animation or other constant screen updates?
> 
Using plain xcompmgr so there are no animations or anything happening.

> I'm also not sure why vt8237_force_enable_hpet is up there; do you have a VIA
> 8235/7 chipset?
> 
I do not have a VIA chipset.  I did disable HPET to see if that would go away, but then something else just popped up high in the kernel list.  Once again, I suspect my opreport stuff is incorrect.

What's more disconcerting to me is that I ran this test again last night and the performance was different.  It was jerky instead of laggy and opreport showed completely different results.  Now this morning it is back to laggy.  I haven't upgraded anything related to X in over a week, certainly not between last night and, say, the night before (or this morning) when it was still showing the behavior triggering this bug in the first place.

For what it's worth, I've attached this morning opreport report.

Do you think it would be useful to try to do this over a less asynchronous communications channel like IRC or something like that?  Either that or just drop it since I can't get consistent results and don't feel entirely sure that I'm even measuring things correctly.  I don't want to waste people's time.
Comment 21 Joel Feiner 2008-08-31 08:44:14 UTC
Created attachment 18602 [details]
Latest opreport log showing different results
Comment 22 Michel Dänzer 2008-09-01 00:50:00 UTC
(In reply to comment #20)
> Do you think it would be useful to try to do this over a less asynchronous
> communications channel like IRC or something like that?

I'm not sure if/how IRC would help at this point; it seems most important to get accurate profiling data.
Comment 23 Joel Feiner 2008-09-01 10:40:01 UTC
I disabled HPET in the kernel.  I've turned on all the proper debugging stuff.  The results seem okay (when doing an operation like, say, scrolling firefox, the functions in opreport look appropriate -- bunch of stuff in pixman about modifying damage regions, compositing, etc.).  If this isn't useful, then I guess we'll need to think of something different.
Comment 24 Joel Feiner 2008-09-01 10:40:28 UTC
Created attachment 18615 [details]
Basic oprofile report
Comment 25 Joel Feiner 2008-09-01 10:41:12 UTC
Created attachment 18616 [details]
oprofile report with callgraph information (may or may not be that useful)
Comment 26 Michel Dänzer 2008-09-02 00:28:25 UTC
(In reply to comment #23)
> I've turned on all the proper debugging stuff. The results seem okay (when
> doing an operation like, say, scrolling firefox, the functions in opreport look
> appropriate -- bunch of stuff in pixman about modifying damage regions,
> compositing, etc.).

Userspace symbols tend to be unproblematic, it's the kernel space symbols that are dubious here. Also, I have a hard time making sense of oprofile callgraphs, so even if vsscanf / dma_alloc_coherent are indeed the hotspots, I'm not sure where they're called from. It might make sense to clarify this with kernel / oprofile people and come back here if the results still point to the graphics drivers.
Comment 27 Joel Feiner 2009-03-09 16:54:22 UTC
I still haven't been able to get oprofile to behave, but I do have some new information to report.  The problem is definitely with shadows.  Disabling shadows in KDE 3's kompmgr results in the problem completely going away.  The less said about performance in KDE 4, the better, and I haven't done proper testing there either, although turning off shadows in KDE 4 helps considerably.  I don't know if this is of use to you guys.
Comment 28 Joel Feiner 2009-07-08 20:44:31 UTC
This may be the last update needed.  I tried moving large translucent windows around on Windows XP and I get exactly the same behavior, including large amounts of time spent in kernel mode.  So either both the ATI drivers for Windows and the OSS drivers on Linux have the same bug, or it's an issue in the graphics hardware itself.  If it's the latter, it seems unlikely to be fixable.
Comment 29 Michel Dänzer 2009-07-08 23:42:11 UTC
It seems clear that the problem is due to software rendering fallbacks triggered by the compositing managers, so it would ultimately need to be fixed / worked around there.

