Summary: | Very poor performance compared to 6.8.2 | ||
---|---|---|---|
Product: | xorg | Reporter: | David Andruczyk <djandruczyk> |
Component: | Driver/Radeon | Assignee: | Xorg Project Team <xorg-team> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | critical | ||
Priority: | high | CC: | erik.andren, felipe.contreras, magnade |
Version: | 7.0.0 | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Description
David Andruczyk
2006-03-08 14:21:21 UTC
Could you perform a remote backtrace of the problem and post it here?

Is this still an issue using a current version of xorg and the ati driver?

(In reply to comment #2)
> Is this still an issue using a current version of xorg and the ati driver?

I'm not exactly sure how to test this more precisely, but Xorg is always the second most CPU-intensive application on my system. This is both with the official FC5 ati radeon driver and the latest one in git (yesterday), and also both with XAA and EXA, and with or without drm and dri. For example, in 5 hours of use I get from top that firefox has a CPU time of 35:48 and Xorg has 29:08. I don't know if this is meaningful at all, but my system has felt way slower for a while, and I strongly think it's because of the ati driver.

Nothing can be done about this without at least a profile showing where the CPU time is spent.

Running OProfile with MPlayer reproducing a DVD I got the following:

/opt/ati/lib/xorg/modules/drivers/radeon_drv.so:
Profiling through timer interrupt

samples | % | symbol name |
---|---|---|
4929 | 89.6997 | RADEONPutImage |
467 | 8.4986 | RADEONWaitForFifoFunction |
84 | 1.5287 | RADEONWaitForIdleMMIO |
3 | 0.0546 | RADEONBlockHandler |
2 | 0.0364 | RADEONEngineFlush |
2 | 0.0364 | RADEONINPLL |
1 | 0.0182 | .plt |
1 | 0.0182 | RADEONAllocateMemory |
1 | 0.0182 | RADEONDisplayVideo |
1 | 0.0182 | RADEONLoadCursorARGB |
1 | 0.0182 | RADEONOUTPLL |
1 | 0.0182 | RADEONPllErrataAfterIndex |
1 | 0.0182 | RADEONSetupForScreenToScreenCopyMMIO |
1 | 0.0182 | RenderCallback |

(In reply to comment #5)
> Running OProfile with MPlayer reproducing a DVD I got the following:

Note that this bug report is about 2D rendering performance, not XVideo.

> samples % symbol name
> 4929 89.6997 RADEONPutImage

That said, try enabling the DRI, or make sure write combining is enabled for the framebuffer.

Thanks, that helped a lot.
Now I have been gathering profile data with normal X usage and this is what I get.

These are the most used Xorg binaries:

samples | % | image name |
---|---|---|
137794 | 5.4858 | /opt/xorg/lib/xorg/modules/libfb.so |
11953 | 0.4759 | /opt/xorg/bin/Xorg |
11774 | 0.4687 | /opt/xorg/lib/xorg/modules/libexa.so |
8098 | 0.3224 | /opt/ati/lib/xorg/modules/drivers/radeon_drv.so |

And these are the most used functions of libfb.so:

samples | % | symbol name |
---|---|---|
93800 | 67.4675 | fbRasterizeEdges |
23351 | 16.7957 | fbFetch_x8r8g8b8 |
11220 | 8.0702 | fbFetch_a8 |
2447 | 1.7601 | fbCompositeSolidMask_nx8x8888mmx |
2200 | 1.5824 | fbCompositeSrc_8888RevNPx8888mmx |
1663 | 1.1961 | mmxCombineOverU |
1178 | 0.8473 | fbBlt |
630 | 0.4531 | mmxCombineMaskU |
460 | 0.3309 | fbCopyAreammx |
431 | 0.3100 | fbCompositeSolidMask_nx8888x8888Cmmx |
277 | 0.1992 | fbSolidFillmmx |
201 | 0.1446 | fbFetch |
189 | 0.1359 | fbCompositeGeneral |
114 | 0.0820 | fbStore_x8r8g8b8 |

I have EXA and composite enabled and I compiled my Xserver, I can try with the old ones, but I remember libfb.so:fbRasterizeEdges was still by far the most used function.

Also I'm wondering that maybe one library is doing a lot of memcpy's and so the results appear in libc-2.4.so and not in an Xorg binary. I don't know how to check that.

If I can provide you with more valuable information don't hesitate to ask for it, I'll be glad to help.

(In reply to comment #7)
> I have EXA and composite enabled and I compiled my Xserver, I can try with the
> old ones, but I remember libfb.so:fbRasterizeEdges was still by far the most
> used function.

It accounts for less than 4% overall though, so it's unlikely to be the problem.

> Also I'm wondering that maybe one library is doing a lot of memcpy's and so the
> results appear in libc-2.4.so and not in an Xorg binary.

That's indeed quite likely.

> I don't know how to check that.

You can try opreport -c, but it requires the oprofile kernel module to have the capability of recording call graphs, and I don't know if the kernel's copy of it has that.
If you can reproduce a situation where the X server uses up (almost) all CPU cycles, the profile should be clear even for libc. Other than that, make sure oprofile can find the libc symbols. Unfortunately, I don't know how to achieve that with Gentoo.

I saw this bug while just skimming the reports and played with extace on my athlon64. Now, I can't say if I had any regressions from 6.8, but I do see with 7.1 that perhaps libfb also needs an SSE copy function. After running for a while I saw the below at the top of usage. The video card I have in use is an r300 9600.

samples | % | image name | app name | symbol name |
---|---|---|---|---|
1390789 | 40.0518 | libc-2.3.6.so | Xorg | memcpy |
257246 | 7.4081 | libfb.so | Xorg | fbCopyAreammx |

And here is one from an athlon-xp 2400+ laptop with an igp320m doing similar work with extace:

samples | % | image name | app name | symbol name |
---|---|---|---|---|
1048989 | 50.1806 | libc-2.3.6.so | Xorg | (no symbols) |
616208 | 29.4776 | libfb.so | Xorg | fbCopyAreammx |
35236 | 1.6856 | libfb.so | Xorg | fbSolidFillmmx |

Is that with EXA or XAA? With EXA, you may want to try current xf86-video-ati git and Option "AccelDFS".

Yes, with EXA, and now with AccelDFS on the amd64. It still looks like some SSE would help. I won't bother with the laptop since it's on xorg 7; I'll test it when it gets upgraded. Also, the wireframe flickers for me on both computers. Would this be the app's fault or the driver's?
It's very annoying and makes it hard to watch (ok, enjoy it ;)

samples | % | image name | app name | symbol name |
---|---|---|---|---|
2071728 | 75.3174 | libc-2.3.6.so | Xorg | memcpy |
165089 | 6.0018 | libfb.so | Xorg | fbCopyAreammx |
51964 | 1.8891 | oprofiled | oprofiled | for_one_sfile |
37472 | 1.3623 | libfb.so | Xorg | fbSolidFillmmx |

Been messing with this more and I found it seems to hate being resized (at least under fluxbox). After resizing to the size you want, hide most of it off screen for a few seconds and CPU usage drops. Bring it back on screen and the CPU will spike a bit, then level off at a sane range. Well, saner than what it was giving :)

Forgot to say that when the app is first loaded it also needs to be hidden for a while, then CPU usage drops. Then of course you have to hide it every time it's resized. Also, the larger the window, the longer it needs off screen.

Not sure SSE vs. MMX would make a big difference. If a significant part of the rendering has to be done in software, you've pretty much lost... or if you mean memcpy could use SSE, you may be right, but that would be a libc issue.

memcpy is using SSE, from what little googling I did while trying to get a basic idea of how hard trying out an SSE version of fbSolidFillmmx would be, and whether it would help things or not. I'm not sure how much it would help either, but I don't think it would hurt, and if nothing else that little extra CPU time saved on a laptop could be the difference between changing frequencies and consuming more power or not. But that aside, I don't think it's the issue here, as per my comment #13 and #14.

Marking broken (status null/blank) bugs in xorg with no activity in a long time as fixed. Please reopen if you think it's necessary, but first do a search if a similar bug report is already filed and in a NEW/ASSIGNED state. These bugs do not currently show in most search results as they do not have any status.

Sorry for this janitorial spam, you know where to send hate mails to when your inbox gets full of bugs you're subscribed to.
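
The "less than 4% overall" estimate given in the reply to comment #7 can be checked from the two profile tables posted in comment #7: a symbol's share of all samples is its share within its image multiplied by the image's share of the total. The following is a minimal Python sketch of that arithmetic, using only figures quoted in this report. It assumes the per-symbol percentages are relative to libfb.so alone (their sum of roughly 100% suggests so), and the helper name `overall_share` is made up for illustration; it is not part of OProfile.

```python
# Percentages copied from the opreport output quoted in this report.
LIBFB_SHARE_OF_TOTAL = 5.4858  # libfb.so, % of all samples

# Per-symbol shares within libfb.so, in % (assumed relative to libfb.so only).
LIBFB_SYMBOLS = {
    "fbRasterizeEdges": 67.4675,
    "fbFetch_x8r8g8b8": 16.7957,
    "fbFetch_a8": 8.0702,
}


def overall_share(image_pct: float, symbol_pct: float) -> float:
    """Return a symbol's share of all samples, in percent.

    Hypothetical helper: scales the symbol's share within its image
    by the image's share of the whole profile.
    """
    return image_pct * symbol_pct / 100.0


if __name__ == "__main__":
    for name, pct in LIBFB_SYMBOLS.items():
        share = overall_share(LIBFB_SHARE_OF_TOTAL, pct)
        print(f"{name}: {share:.2f}% of all samples")
```

For fbRasterizeEdges this works out to roughly 3.7% of all samples, which is consistent with the "less than 4%" figure and with the conclusion that libfb itself is unlikely to be the main cost.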