Created attachment 18274 Current xorg.conf

I have an ATI Mobility X300 and am running the entire X.org stack from git, with kernel 2.6.26. When I run xcompmgr -c and make windows translucent (with xtransset), windows above a certain size (roughly 600x600) lag when dragged. Smaller translucent windows, like the default 80x24 Konsole window, move just fine. Windows that don't use transset but provide their own translucency, like urxvt or gnome-terminal (with a translucent background), seem to fare a little better, but not by much.

Oprofile report while dragging a window:

CPU: PIII, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name             symbol name
39258    36.5429  radeon                 /radeon
29882    27.8153  vmlinux                dma_alloc_coherent
23813    22.1661  vmlinux                vsscanf
2191      2.0395  Xorg                   dixLookupPrivate
1554      1.4465  libc-2.8.so            /lib/libc-2.8.so
802       0.7465  vmlinux                cpuidle_register_governor
692       0.6441  libpixman-1.so.0.11.9  pixman_op
416       0.3872  Xorg                   miComputeClips
301       0.2802  Xorg                   __i686.get_pc_thunk.bx
282       0.2625  libpixman-1.so.0.11.9  pixman_region_intersect
216       0.2011  Xorg                   miHandleValidateExposures
202       0.1880  Xorg                   miMarkOverlappedWindows
175       0.1629  Xorg                   miRegionValidate
174       0.1620  Xorg                   SetWinSize
173       0.1610  libdbe.so              miDbePositionWindow
160       0.1489  Xorg                   damageDamageRegion
148       0.1378  libpixman-1.so.0.11.9  pixman_region_subtractO
144       0.1340  Xorg                   xf86XVWindowExposures
138       0.1285  Xorg                   compPositionWindow

xorg.conf is attached.
First of all, please also attach the full Xorg.0.log. I haven't played with xcompmgr recently, but I have noticed that with KDE4, compositing is much slower with the XRender backend than with GLX. That's not the case with the intel driver on an i945G though, so for now let's assume this is a driver issue. As for the profile, it would be interesting to get more useful symbol resolution for the 'radeon' image; I presume that refers to the radeon DRM kernel module. Also, are you sure the vmlinux symbols are accurate? I'm not sure why vsscanf and dma_alloc_coherent would be such hot spots when moving windows around. You did reset the oprofile samples before profiling the operation, didn't you?
Created attachment 18280 Kernel configuration
Created attachment 18281 Latest Xorg.0.log
I also noticed the behavior with KDE4 and was going to open a bug about it, but then I got sick of KDE4 and uninstalled it ;) Xorg.0.log is attached. Yes, I did run opcontrol --reset before profiling. As for the kernel symbols, I've been wondering myself why I only see /radeon for the DRM module, along with those other weird symbols. I've attached my kernel .config in case it is also of use.
I tracked down the KDE4 slowdown to the shadows ending up using pictures with non-power-of-two dimensions, which at least pre-R500 hardware doesn't support as a repeated source. You can avoid this by making sure the kwin shadow plugin's size and offset parameters (or something like that) add up to a power of two.

Now, I don't think xcompmgr uses the same technique for shadows, and I can't seem to reproduce the problem with it even with shadows enabled. But FWIW, does disabling shadows avoid the problem for you?
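The power-of-two workaround above comes down to simple arithmetic. A minimal C sketch, assuming (hypothetically) that the shadow picture's edge length is just the sum of a shadow `size` and `offset`; the helper names are illustrative, and the real kwin parameters and formula may differ:

```c
#include <stdbool.h>

/* True if n is a non-zero power of two (exactly one bit set). */
static bool is_pow2(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two >= n; assumes 1 <= n <= 2^31 so the shift
 * cannot overflow. */
static unsigned next_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical check: could a shadow picture whose edge is size + offset
 * (illustrative names, not actual kwin settings) be used as a repeated
 * source on hardware without non-power-of-two repeat support? */
static bool shadow_dim_repeat_ok(unsigned size, unsigned offset)
{
    return is_pow2(size + offset);
}
```

For example, size 10 with offset 5 gives 15, which fails the check (the nearest usable edge would be `next_pow2(15)`, i.e. 16), while size 10 with offset 6 passes.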
(In reply to comment #5)
> Now, I don't think xcompmgr uses the same technique for shadows, and I can't
> seem to reproduce the problem with it even with shadows enabled. But FWIW, does
> not enabling shadows avoid the problem for you?

The only way I know to disable shadows is to set the radius to 0 (e.g., xcompmgr -c -R 0). I get the same results there, although it's slightly less laggy. Setting FBTexPercent to 0 also seems to improve things in general, but the problem remains.
I found an interesting phenomenon today. If I alt-drag the translucent window while it's underneath another window, say, Konsole, the slowness disappears (and almost no CPU time is used). Bring it back above Konsole and the movement is slow and laggy again. Moving it around near the side of the screen, where maybe 2/3 to 3/4 of the translucent window is off the edge of the screen (or less), also makes the lagginess subside.

Opreport looks like this while laggy:

samples  %        image name             app name  symbol name
27593    23.7846  radeon                 Xorg      /radeon
17364    14.9674  vmlinux                Xorg      vt8237_force_enable_hpet
5660      4.8788  Xorg                   Xorg      dixLookupPrivate
5567      4.7986  vmlinux                Xorg      vsscanf
4566      3.9358  libc-2.8.so            Xorg      /lib/libc-2.8.so
1935      1.6679  vmlinux                Xorg      sys_vm86old
1550      1.3361  libpixman-1.so.0.11.9  Xorg      pixman_op
1083      0.9335  libqt-mt.so.3.3.8      kicker    /usr/qt/3/lib/libqt-mt.so.3.3.8
1053      0.9077  Xorg                   Xorg      miComputeClips

When moving it under the other window:

samples  %        image name             app name  symbol name
4213      8.1681  Xorg                   Xorg      dixLookupPrivate
3099      6.0083  libc-2.8.so            Xorg      /lib/libc-2.8.so
1361      2.6387  radeon                 Xorg      /radeon
1269      2.4603  vmlinux                Xorg      sys_vm86old
934       1.8108  libpixman-1.so.0.11.9  Xorg      pixman_op
876       1.6984  libqt-mt.so.3.3.8      kicker    /usr/qt/3/lib/libqt-mt.so.3.3.8
841       1.6305  vmlinux                vmlinux   acpi_processor_get_throttling_info
750       1.4541  vmlinux                Xorg      vt8237_force_enable_hpet
721       1.3979  Xorg                   Xorg      miComputeClips
683       1.3242  vmlinux                vmlinux   uvesafb_vbe_state_save
642       1.2447  Xorg                   Xorg      miValidateTree
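One way to read the off-screen observation, assuming (this is an assumption, not something measured in this report) that the per-frame compositing cost is roughly proportional to the number of translucent pixels actually drawn, is that clipping the window to the screen shrinks the work. A minimal C sketch of that clipping arithmetic; the 1280x800 screen used in the example is hypothetical:

```c
/* Axis-aligned rectangle: origin (x, y), size w x h, in pixels. */
struct rect {
    int x, y, w, h;
};

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Area (in pixels) of the part of `win` that lies inside `screen`.
 * Returns 0 when the window is entirely off-screen. */
static long visible_area(struct rect win, struct rect screen)
{
    int x0 = max_int(win.x, screen.x);
    int y0 = max_int(win.y, screen.y);
    int x1 = min_int(win.x + win.w, screen.x + screen.w);
    int y1 = min_int(win.y + win.h, screen.y + screen.h);

    if (x1 <= x0 || y1 <= y0)
        return 0;
    return (long)(x1 - x0) * (y1 - y0);
}
```

Under this model, a 600x600 window fully on a 1280x800 screen covers 360000 pixels, but with its left edge at x = 1080 (two thirds past the right edge) only 120000 remain. The same reasoning would apply to parts hidden under an opaque window, though this sketch only clips against the screen.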
(In reply to comment #7)
> I found an interesting phenonemon today. If I alt-drag the translucent window
> if it's underneath another window, say, Konsole, then the slowness disappears
> (and almost no CPU time is used). Bring it back above Konsole and the movement
> is slow and laggy again. But moving it around near the side of the screen,
> where maybe 2/3 to 3/4's of the translucent window is off the edge of the
> screen (or less) causes the lagginess to subside.

Sounds like the translucent window is composited by the CPU instead of the GPU. It would be interesting to track down why that is happening, e.g. by rebuilding the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa.c or, if that doesn't show anything interesting, by rebuilding the X server with DEBUG_TRACE_FALL defined to 1 in exa/exa_priv.h.
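The fallback tracing suggested above is a compile-time switch. As a minimal sketch of the general pattern only, assuming a hypothetical `TRACE_FALL` macro and `check_composite()` hook (this is not copied from src/radeon_exa.c, whose real macros may be defined differently), a check function logs its reason before declining, and declining is what sends the operation to the CPU:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical switch: set to 1 to log why a hook declines an operation. */
#ifndef TRACE_FALL
#define TRACE_FALL 1
#endif

#if TRACE_FALL
/* Print the calling function and a printf-style reason, then decline. */
#define FALLBACK(...)                                   \
    do {                                                \
        fprintf(stderr, "%s: ", __func__);              \
        fprintf(stderr, __VA_ARGS__);                   \
        return false;                                   \
    } while (0)
#else
/* Tracing disabled: decline silently. */
#define FALLBACK(...) return false
#endif

/* Hypothetical check hook: returning false means "let the CPU do it". */
static bool check_composite(unsigned format, bool format_supported)
{
    if (!format_supported)
        FALLBACK("Unsupported picture format 0x%x\n", format);
    return true;
}
```

With tracing on, `check_composite(0x1011000, false)` writes a line of the same shape as the driver messages quoted in the next comment to stderr and returns false.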
(In reply to comment #8)
> Sounds like the translucent window is composited by the CPU instead of the GPU.
> It would be interesting to track down why that is happening, e.g. by rebuilding
> the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa.c, or if that
> doesn't show anything interesting by rebuilding xserver with DEBUG_TRACE_FALL
> defined to 1 in exa/exa_priv.h.

Yep, that's where the issue is. The log file shows a bunch of these:

R300CheckComposite: Component alpha not supported with source alpha and source value blending.

And a bunch of these:

R300CheckCompositeTexture: Unsupported picture format 0x1011000

They come in chunks of 10-20 lines. It seems that when I move the window faster, I get more of the former message. Hopefully, this will all be of use to you.
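The first message reflects a blending limitation. As a hedged, simplified sketch of the reasoning behind such a check (the table, names, and factors below are hypothetical and not the driver's code): for result = src * src_factor + dst * dst_factor with a component-alpha mask, the source term needs src*mask per channel while a dst factor based on source alpha needs srcA*mask per channel. A single shader output can carry only one of those, so an operator that uses source alpha and also a non-zero source factor cannot be done in one pass.

```c
#include <stdbool.h>

/* Simplified blend factors; real hardware offers more. */
enum blend_factor { BF_ZERO, BF_ONE, BF_SRC_ALPHA, BF_ONE_MINUS_SRC_ALPHA };

/* result = src * src_factor + dst * dst_factor (Porter-Duff style). */
struct blend_op {
    const char *name;
    enum blend_factor src_factor;
    enum blend_factor dst_factor;
};

static const struct blend_op OP_OVER = { "Over", BF_ONE, BF_ONE_MINUS_SRC_ALPHA };
static const struct blend_op OP_OUT_REVERSE = { "OutReverse", BF_ZERO, BF_ONE_MINUS_SRC_ALPHA };
static const struct blend_op OP_ADD = { "Add", BF_ONE, BF_ONE };

/* Does the destination factor depend on the source alpha? */
static bool uses_src_alpha(const struct blend_op *op)
{
    return op->dst_factor == BF_SRC_ALPHA ||
           op->dst_factor == BF_ONE_MINUS_SRC_ALPHA;
}

/* With a component-alpha mask, both src*mask and srcA*mask would be
 * needed as separate per-channel values if the source term is non-zero
 * and the dst factor uses source alpha. */
static bool ca_needs_fallback(const struct blend_op *op, bool component_alpha)
{
    return component_alpha && uses_src_alpha(op) && op->src_factor != BF_ZERO;
}
```

Under this model, Over with a component-alpha mask is declined, while OutReverse and Add each pass; splitting an Over into an OutReverse pass followed by an Add pass is one way around the conflict.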
I should add that I keep getting those messages even without moving the window (or even having it open, or doing anything except watching tail -f in Konsole). I neglected to do a proper control test before posting the last message. Hopefully, the messages are still useful. I also want to say that when moving the translucent window, CPU usage does go to 100%, but it's almost entirely in the kernel.
Created attachment 18568 Log file from X server

Fallback trace turned on in both the server and the driver.
I tried with fallback tracing in the server. The log is a bit large, and I couldn't think of a good way to indicate which part of the log corresponds to when I was moving the window around. So I killed the X server immediately after I was done moving the window, so that the entries at the end of the log should all relate to moving the window.
(In reply to comment #9)
> R300CheckCompositeTexture: Unsupported picture format 0x1011000

The unsupported picture format is 1-bit alpha (PICT_a1) here, though I've no idea whether this fallback is what causes the slowdowns. As a side note, I think this fallback could be avoided on r5xx cards, since those should support such a format; not that it would help in your case...
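The identification of 0x1011000 as PICT_a1 follows from how Render/pixman pack a picture format into 32 bits: bits per pixel in the top byte, then the format type, then 4-bit alpha, red, green and blue channel widths. A small C sketch that decodes those fields; the struct and function names are illustrative, and the alpha-only type code of 1 corresponds to pixman's PIXMAN_TYPE_A:

```c
#include <stdint.h>

/* Field layout of a Render/pixman picture format code:
 * bpp << 24 | type << 16 | a << 12 | r << 8 | g << 4 | b. */
struct pict_format_fields {
    unsigned bpp, type, a, r, g, b;
};

/* Type code for alpha-only formats (PIXMAN_TYPE_A). */
#define FMT_TYPE_A 1u

static struct pict_format_fields decode_pict_format(uint32_t fmt)
{
    struct pict_format_fields f;
    f.bpp  = (fmt >> 24) & 0xff;
    f.type = (fmt >> 16) & 0xff;
    f.a    = (fmt >> 12) & 0x0f;
    f.r    = (fmt >> 8) & 0x0f;
    f.g    = (fmt >> 4) & 0x0f;
    f.b    = fmt & 0x0f;
    return f;
}
```

Decoding 0x1011000 gives 1 bpp, the alpha-only type, one alpha bit and no colour channels, i.e. a1. For comparison, 0x20028888 decodes to 32 bpp, type 2 (ARGB), 8 bits per channel, i.e. a8r8g8b8.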
(In reply to comment #13)
> The unsupported picture format is 1 bit alpha (PICT_a1) here, though I've no
> idea if this fallback is what causes the slowdowns.

I don't think so; it looks like uploads of A1 glyphs to the glyph cache pixmap. The component alpha output is probably about sub-pixel anti-aliased text rendering and doesn't even result in a software fallback (no following ExaCheckComposite output), so these are both red herrings for the problem.

> As a side note, I think this fallback could be avoided on r5xx cards since
> those should support such a format - not that it would help in your case...

The problem is that the EXA core currently doesn't bother migrating < 8 bpp pixmaps offscreen, so they never actually get accelerated.

(In reply to comment #10)
> I also want to say that when moving the translucent window, CPU usage does go
> to 100% but it's almost entirely in the kernel.

So we're back to square one and could probably use better profiling data...
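To make the migration point concrete, here is a hedged C sketch of the decision described in this comment: a composite touching a pixmap is accelerated only if the driver accepts the format and the EXA core has actually moved the pixmap offscreen. The function names and the simple `bpp >= 8` rule are illustrative, taken from the description above rather than from the EXA sources:

```c
#include <stdbool.h>

/* Per the description above: the EXA core does not migrate pixmaps with
 * fewer than 8 bits per pixel to offscreen (video) memory.
 * Illustrative rule, not the actual EXA code. */
static bool exa_would_migrate(unsigned bpp)
{
    return bpp >= 8;
}

/* The GPU can only be used if the driver supports the format AND the
 * pixmap actually lives offscreen; otherwise the CPU does the work. */
static bool composite_accelerated(unsigned bpp, bool driver_supports_format)
{
    return driver_supports_format && exa_would_migrate(bpp);
}
```

Under this rule, even an r5xx driver that accepted PICT_a1 (1 bpp) would not get to accelerate it, which is why fixing the driver check alone would not help.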
I don't know how to get the debugging symbols for the radeon and drm modules to show up in oprofile. I've built everything with -ggdb, and I selected the kernel options for full debugging symbols. I'm not sure what I'm doing wrong. For what it's worth, opreport thinks the kernel module's path is /radeon, which is obviously incorrect, but I don't know how to fix that. Any ideas?

On Fri, Aug 29, 2008 at 10:20 AM, <bugzilla-daemon@freedesktop.org> wrote:
> So we're back to square one and could probably use better profiling data...
(In reply to comment #15)
> For what it's worth, opreport thinks the kernel module's path is /radeon,
> which is obviously incorrect. I don't know, however, how to fix that. Any
> ideas?

Unfortunately not. Maybe try the oprofile documentation, forums, or developers? Or might sysprof work better?
It turns out that if you read the man page, you can find the answers to things ;). So, here's the oprofile output with the symbols from the kernel modules:

samples  %        linenr info                image name             symbol name
87794    38.9522  radeon_cp.c:1506           radeon.ko              radeon_freelist_get
49926    22.1510  quirks.c:290               vmlinux                vt8237_force_enable_hpet
16635     7.3806  vsprintf.c:950             vmlinux                vsscanf
6703      2.9740  privates.c:130             Xorg                   dixLookupPrivate
4635      2.0564  pixman-region.c:1552       libpixman-1.so.0.11.9  pixman_region_subtractO
4376      1.9415  (no location information)  libc-2.8.so            /lib/libc-2.8.so
3918      1.7383  pixman-region.c:633        libpixman-1.so.0.11.9  pixman_op
3060      1.3577  radeon_cp.c:238            radeon.ko              radeon_do_wait_for_idle
2373      1.0528  pixman-edge.c:324          libpixman-1.so.0.11.9  pixman_rasterize_edges
2326      1.0320  exa_offscreen.c:172        libexa.so              exaOffscreenAlloc
2170      0.9628  vm86_32.c:200              vmlinux                sys_vm86old
1364      0.6052  exa_offscreen.c:417        libexa.so              exaOffscreenFree
893       0.3962  damage.c:174               Xorg                   damageDamageRegion
889       0.3944  radeon_exa_render.c:1908   radeon_drv.so          RadeonCompositeTileCP
713       0.3163  posix-timers.c:784         vmlinux                sys_timer_settime
707       0.3137  xkbKillSrv.c:0             Xorg                   __i686.get_pc_thunk.bx
681       0.3021  resource.c:851             Xorg                   dixLookupResource
575       0.2551  radeon_state.c:2443        radeon.ko              radeon_cp_indirect
574       0.2547  exa_glyphs.c:417           libexa.so              exaGlyphCacheBufferGlyph
546       0.2422  signal_32.c:267            vmlinux                setup_sigcontext
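To put a number on the "almost entirely in the kernel" impression, here is a small C sketch that sums the rows of the profile above by image, treating `vmlinux` and `*.ko` modules as kernel space. The rows are copied from that profile (top entries only, so totals cover only these rows); the classification rule is an assumption:

```c
#include <stdbool.h>
#include <string.h>

struct prof_row {
    long samples;
    double percent;
    const char *image;
};

/* Rows from the opreport output above (counts and image name only). */
static const struct prof_row rows[] = {
    {87794, 38.9522, "radeon.ko"},
    {49926, 22.1510, "vmlinux"},
    {16635, 7.3806, "vmlinux"},
    {6703, 2.9740, "Xorg"},
    {4635, 2.0564, "libpixman-1.so.0.11.9"},
    {4376, 1.9415, "libc-2.8.so"},
    {3918, 1.7383, "libpixman-1.so.0.11.9"},
    {3060, 1.3577, "radeon.ko"},
    {2373, 1.0528, "libpixman-1.so.0.11.9"},
    {2326, 1.0320, "libexa.so"},
    {2170, 0.9628, "vmlinux"},
    {1364, 0.6052, "libexa.so"},
    {893, 0.3962, "Xorg"},
    {889, 0.3944, "radeon_drv.so"},
    {713, 0.3163, "vmlinux"},
    {707, 0.3137, "Xorg"},
    {681, 0.3021, "Xorg"},
    {575, 0.2551, "radeon.ko"},
    {574, 0.2547, "libexa.so"},
    {546, 0.2422, "vmlinux"},
};

/* Assumption: the kernel image and kernel modules (*.ko) are kernel space. */
static bool is_kernel_image(const char *image)
{
    size_t n = strlen(image);
    return strcmp(image, "vmlinux") == 0 ||
           (n > 3 && strcmp(image + n - 3, ".ko") == 0);
}

/* Sum samples and percentages over the kernel-space rows. */
static void kernel_share(long *samples, double *percent)
{
    *samples = 0;
    *percent = 0.0;
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        if (is_kernel_image(rows[i].image)) {
            *samples += rows[i].samples;
            *percent += rows[i].percent;
        }
    }
}
```

The kernel rows add up to 161419 samples, about 71.6% of all samples, consistent with the reported kernel-heavy CPU usage; on its own this says nothing about why radeon_freelist_get is so hot.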
Created attachment 18586 Log from opreport

Adding this so that you can actually read the content (since my cut and paste job epic-failed).
radeon_freelist_get at the top of the profile indicates the GPU is the bottleneck, which is odd... Does moving the window involve any kind of animation or other constant screen updates not directly related to the move? I'm also not sure why vt8237_force_enable_hpet is up there; do you have a VIA 8235/7 chipset?
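The reasoning behind "GPU bottleneck" is that a DMA buffer can only be handed out again once the GPU has finished the commands that used it, so when the GPU falls behind, the CPU spends its time in the kernel polling for a free buffer. A simplified, hypothetical C simulation of that busy-wait; the names, the "age" bookkeeping, and the poll budget are illustrative and not the radeon DRM code:

```c
#include <stdbool.h>
#include <stddef.h>

/* A DMA buffer tagged with the "age" (sequence number) of the last
 * command stream that used it. */
struct dma_buf {
    unsigned age;
    bool in_use;
};

/* Simulated GPU: completes one more command stream every
 * `polls_per_completion` polls (must be > 0). */
struct fake_gpu {
    unsigned done_age;
    unsigned polls_per_completion;
    unsigned polls;
};

static unsigned gpu_read_done_age(struct fake_gpu *gpu)
{
    if (++gpu->polls % gpu->polls_per_completion == 0)
        gpu->done_age++;
    return gpu->done_age;
}

/* Busy-wait until some buffer is idle (its age <= the completed age) or
 * the poll budget runs out. Every poll burns CPU time; in a kernel driver
 * that time is accounted as kernel time. Returns the buffer index, or -1
 * on timeout; *spins reports how many polls were made. */
static int freelist_get(struct dma_buf *bufs, size_t nbufs,
                        struct fake_gpu *gpu, unsigned max_polls,
                        unsigned *spins)
{
    *spins = 0;
    while (*spins < max_polls) {
        unsigned done = gpu_read_done_age(gpu);
        (*spins)++;
        for (size_t i = 0; i < nbufs; i++) {
            if (!bufs[i].in_use || bufs[i].age <= done)
                return (int)i;
        }
    }
    return -1;
}
```

With all buffers busy, a GPU that finishes a command stream every 100 polls makes the caller spin 100 times before getting a buffer, while a fast GPU frees one on the first poll: the slower the GPU, the more CPU time shows up in this function.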
(In reply to comment #19)
> radeon_freelist_get at the top of the profile indicates the GPU is the
> bottleneck, which is odd... Does moving the window involve any kind of not
> directly related animation or other constant screen updates?

I'm using plain xcompmgr, so there are no animations or anything like that happening.

> I'm also not sure why vt8237_force_enable_hpet is up there; do you have a VIA
> 8235/7 chipset?

I do not have a VIA chipset. I did disable HPET to see if that would make it go away, but then something else just popped up high in the kernel list. Once again, I suspect my opreport setup is incorrect.

What's more disconcerting to me is that I ran this test again last night and the performance was different: it was jerky instead of laggy, and opreport showed completely different results. This morning it is back to laggy. I haven't upgraded anything related to X in over a week, and certainly nothing between last night and the night before (or this morning), when it was still showing the behavior that triggered this bug in the first place. For what it's worth, I've attached this morning's opreport output.

Do you think it would be useful to try to do this over a less asynchronous communication channel like IRC or something like that? Either that or just drop it, since I can't get consistent results and don't feel entirely sure that I'm even measuring things correctly. I don't want to waste people's time.
Created attachment 18602 Latest opreport log showing different results
(In reply to comment #20)
> Do you think it would be useful to try to do this over a less asynchronous
> communications channel like IRC or something like that?

I'm not sure if/how IRC would help at this point; it seems most important to get accurate profiling data.
I disabled HPET in the kernel and turned on all the proper debugging options. The results seem okay (when doing an operation like, say, scrolling in Firefox, the functions in opreport look appropriate: a bunch of stuff in pixman about modifying damage regions, compositing, etc.). If this isn't useful, then I guess we'll need to think of something different.
Created attachment 18615 Basic oprofile report
Created attachment 18616 oprofile report with callgraph information (may or may not be that useful)
(In reply to comment #23)
> I've turned on all the proper debugging stuff. The results seem okay (when
> doing an operating like, say, scrolling firefox, the functions in opreport look
> appropriate -- bunch of stuff in pixman about modifying damage regions,
> compositing, etc.).

Userspace symbols tend to be unproblematic; it's the kernel-space symbols that are dubious here. Also, I have a hard time making sense of oprofile callgraphs, so even if vsscanf / dma_alloc_coherent are indeed the hotspots, I'm not sure where they're called from. It might make sense to clarify this with kernel / oprofile people and come back here if the results still point to the graphics drivers.
I still haven't been able to get oprofile to behave, but I do have some new information to report. The problem is definitely with shadows: disabling shadows in KDE 3's kompmgr makes the problem go away completely. The less said about performance in KDE 4, the better, and I haven't done proper testing there either, although turning off shadows in KDE 4 helps considerably. I don't know if this is of use to you guys.
This may be the last update needed. I tried moving large translucent windows around on Windows XP and got exactly the same behavior, including large amounts of time spent in kernel mode. So either the ATI drivers for Windows and the OSS drivers on Linux both have the same bug, or it's an issue in the graphics hardware itself. If it's the latter, it seems unlikely to be fixable.
It seems clear that the problem is due to software rendering fallbacks triggered by the compositing managers, so it would ultimately need to be fixed / worked around there.