There still exist many situations in Firefox where scrolling is positively painful under EXA on the 965GM. This is the case in both composited (compiz) and non-composited (metacity) environments. The first case that comes to mind is the comments section of a slashdot story. I'll mention other cases as I come across them (I know they exist, I just can't remember any more at the moment).
Created attachment 20380 [details]
Profile from scrolling on a slashdot story

Here's what I think is a pretty unhelpful profile from scrolling up and down a slashdot story, particularly in the comments section. Strangely, scrolling is quite smooth above the comments section. Presumably we're hitting a pretty bad fallback while rendering the comment blocks.
Hi Ben,

Thanks for the report. I'd love to help get this resolved for you.

So far, I'm entirely unable to replicate the problem with my 965GM here. Some details of my system:

Linux: 2.6.28-rc3 (from anholt/drm-intel-next for GEM patches)
xf86-video-intel: 2.4.97
Firefox: 3.0.1

And here, slashdot comments seem to scroll just fine. Maybe that's good news as perhaps you'll find good performance if you upgrade one or more components.

Do let me know more details about your system, or if performance changes as you upgrade anything.

Thanks,

-Carl
I'm on Fedora Rawhide with the latest kernel bits from anholt's for-airlied branch (rawhide's kernel-2.6.27.5-113 package, just rebased last night). Moreover, I'm running xf86-video-intel from git (pulled today) and fedora's xorg-x11-server-Xorg-1.5.2-12.fc10 (airlied says that this includes the glyph cache but I'll try upgrading to the latest package next).

Anyways, I'm getting a pretty respectable 200k glyphs/second in x11perf -aa10text (even with compiz running), so text doesn't look like it's the issue. When scrolling down the comments section of a slashdot story, xserver cpu usage jumps to >60%. Any ideas?
By the way, this is using firefox-3.0.2-1.fc10 as available in rawhide.
Note that this is in Firefox with smooth scrolling enabled, using the wheel to scroll. It seems like the poor performance begins exactly when the floating comments bar (with the number of full/abbreviated/hidden comments) on the left margin starts floating above the page. If I continue to scroll quickly up the page past the beginning of the comments section (where, when scrolling down, the bar usually starts floating), the bar will continue to float until scrolling stops. The entire time the bar floats, performance is degraded significantly (it takes several seconds to stop scrolling after I stop turning the wheel). When the scrolling stops and the bar returns to its usual fixed position on the right margin at top of the comments section, scrolling performance improves remarkably.

Is there any easy way to determine if any fallbacks are being significantly hit?
Here is another site with extremely poor scrolling performance: http://plato.stanford.edu/entries/ecology/
(In reply to comment #3)
> I'm on Fedora Rawhide with the latest kernel bits from anholt's for-airlied
> branch (rawhide's kernel-2.6.27.5-113 package, just rebased last night).
> Moreover, I'm running xf86-video-intel from git (pulled today) and fedora's
> xorg-x11-server-Xorg-1.5.2-12.fc10 (airlied says that this includes the glyph
> cache but I'll try upgrading to the latest package next)

Ah, so it's possible that I'm actually out of date and that you hit a new bug. ;-) (I was travelling all last week so I haven't updated in a little while).

(In reply to comment #5)
> Note that this is in Firefox with smooth scrolling enabled using the wheel to
> scroll.

Thanks for mentioning that. I was dragging the scroll bar, and I don't think I've ever enabled smooth scrolling before. So those are a couple of things I can try as well. (And I appreciate your attempt to describe in detail exactly what you're doing.)

> Is there any easy way to determine if any fallbacks are being significantly
> hit?

And this is the part I forgot to mention in my earlier comment.

Yes! There is an easy way to examine fallbacks. What you do (with recent xf86-video-intel from git) is to add an option to the device section of your xorg.conf file as so:

    Option "FallbackDebug" "true"

then look into your Xorg.#.log file to look for fallback messages.

I'll look forward to what you can learn.

-Carl
(In reply to comment #7)
> Yes! There is an easy way to examine fallbacks. What you do (with recent
> xf86-video-intel from git) is to add an option to the device section of your
> xorg.conf file as so:
>
> Option "FallbackDebug" "true"
>
> then look into your Xorg.#.log file to look for fallback messages.
>
> I'll look forward to what you can learn.
>
> -Carl
>

After looking at the log, it became immediately evident that the fallback being hit is

(II) intel(0): EXA fallback: Component alpha not supported with source alpha and source value blending.

Moreover, it seems that this fallback is being hit extremely frequently. In fact, even running 'tail -f /var/log/Xorg.0.log' in gnome-terminal maintained a steady stream of tens of these messages per second. Running this same command in an xterm stopped these messages. In fact, it looks like all text rendering is causing this fallback.
One final note: in my short (~5 minute) X session running with FallbackDebug, the server produced over 70,000 lines of "Component alpha not supported with source alpha and source value blending" fallback warnings. Meanwhile,

$ grep fallback /var/log/Xorg.0.log.old | uniq
(II) intel(0): EXA fallback: Component alpha not supported with source alpha and source value blending.
$

So apparently this is the only fallback ever being hit.
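The grep/uniq check above works here because every matching line is identical; plain `uniq` only collapses adjacent duplicates, so a log with interleaved fallback types would print the same message more than once. A minimal Python sketch of an order-independent tally follows. The function name `count_fallback_reasons` is hypothetical, and the regular expression assumes the driver message format quoted in this thread.

```python
import re
from collections import Counter

# Assumed format, taken from the messages quoted in this thread:
#   (II) intel(0): EXA fallback: <reason>
FALLBACK_RE = re.compile(r"EXA fallback: (?P<reason>.+?)\s*$")


def count_fallback_reasons(lines):
    """Return a Counter mapping each driver fallback reason to its count.

    Hypothetical helper for illustration; not part of any X.org tool.
    """
    counts = Counter()
    for line in lines:
        match = FALLBACK_RE.search(line)
        if match:
            counts[match.group("reason")] += 1
    return counts


def report(path):
    """Print fallback reasons from the log at `path`, most frequent first."""
    with open(path, errors="replace") as log:
        for reason, n in count_fallback_reasons(log).most_common():
            print(f"{n:8d}  {reason}")
```

Calling `report("/var/log/Xorg.0.log.old")` would list each distinct reason once with its count, regardless of how the lines are ordered in the log.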
I have the same issue on G945 hardware. With fallback debugging enabled, I see thousands of

(II) intel(0): EXA fallback: Component alpha not supported with source alpha and source value blending.

when scrolling in any page, switching terminal windows, etc.

My versions: xorg git master, intel git master
I have this problem too with an Intel 3100 and fedora 10 preview with the latest updates. The server is version 1.5.3-5.fc10 and the intel driver is version 2.5.0-3.fc10. Scrolling in firefox without compositing is quite slow on most larger pages. Scrolling with compiz or xcompmgr is very slow. Smooth scrolling is enabled in firefox.
BTW, I also tried to use XAA instead of EXA but that made X freeze on start and I had to kill it through SSH.
I've replicated the "component alpha" fallback. I'm surprised to find it since I thought we had all common fallbacks eliminated from the current i965 driver. I'll talk the details over with Eric Anholt as soon as he gets back from Taiwan next week.

The easy way to trigger the fallback is with "x11perf -rgb10text" (which is what I should have been using all along instead of "x11perf -aa10text"---but the naming scheme of those tests led me astray).

Anyway, as a quick (but maybe not so useful) test, I tried removing the fallback for this case. When I did this I only got a 14% improvement to the score of "x11perf -rgb10text". So maybe there's more to the performance problem at the root of this bug than just this fallback. (And note that my quick hack to remove the fallback is inherently not interesting---it results in the text not appearing at all).

-Carl
The component alpha fallback is a red herring. As Carl found out, it's related to sub-pixel AA text rendering, and the EXA core is still able to accelerate that in two passes. You may be able to get more ideas by enabling fallback debugging in the EXA core.
I just tried Carl's patch that he posted on the mailing list and while it is a bit of an improvement, I still can't say the problem is fixed. The patch brings aa10text performance up to 230-240k glyphs/second although scrolling (again, using http://plato.stanford.edu/entries/ecology/ as the standard) is pretty poor. Now that I have built an xserver, I'll also be able to look at the exa core fallbacks.
I just enabled fallback debugging in the exa core and the problem fallbacks were immediately apparent:

EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1b79410 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ecd56390 (m)
EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x1b77f00 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ee0ebc60 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ae7c30 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ecd44720 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ae7c30 (m)
EXA fallback at ExaCheckPutImage: to 0x7f71ecd44720 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ecd56390 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ccc1c0 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ccc1c0 (m)
EXA fallback at ExaCheckPutImage: to 0x7f71ecd44720 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ede6dd50 (m)
EXA fallback at ExaCheckPutImage: to 0x7f71ecd44720 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x1b7a040 (m)
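With output this repetitive, it can help to tally the EXA core messages by function and by destination pixmap rather than reading them line by line. The following is a rough Python sketch under the assumption that the lines look exactly like the excerpt above; `tally_core_fallbacks` is a hypothetical name, and the trailing `(m)`/`(s)` marker is captured as-is without interpreting it.

```python
import re
from collections import Counter

# Assumed format, copied from the excerpt in this comment:
#   EXA fallback at <function>: to <hex address> (<m or s>)
CORE_FALLBACK_RE = re.compile(
    r"EXA fallback at (?P<func>\w+): to (?P<dest>0x[0-9a-fA-F]+) \((?P<loc>[ms])\)"
)


def tally_core_fallbacks(lines):
    """Count EXA core fallbacks per (function, marker) and per destination.

    Hypothetical helper for illustration. Lines in other formats (for example
    the exaCopyNtoN "from ... to ..." form) are skipped, not parsed.
    """
    by_func = Counter()
    by_dest = Counter()
    for line in lines:
        match = CORE_FALLBACK_RE.search(line)
        if match:
            by_func[(match.group("func"), match.group("loc"))] += 1
            by_dest[match.group("dest")] += 1
    return by_func, by_dest
```

Applied to the 23 lines quoted above, this would report 18 ExaCheckPolyFillRect entries marked (m) and 5 ExaCheckPutImage entries marked (s); the per-destination counter shows whether a handful of pixmaps account for most of the fallbacks.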
Created attachment 20775 [details]
Full xorg.log with EXA core fallback debugging enabled.

Here is the full xorg.log. It seems there are also a few other types of fallbacks. These include:

EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys
EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)

etc.
I have a GM945 GPU.

1) Hardy Heron, kernel 2.6.28rc4, xorg/mesa/intel current master branches
   xorg.conf: exa, tiling=on: glxgears ~1100 FPS
   tiling=off: glxgears ~800
   changing exa to uxa: glxgears = 400 in all cases
   scrolling when compositing is damn slow (unusable)

2) Fedora 10 live
   glxgears = 800 fps
   scrolling when compositing is very smooth (not as smooth as pure 2D, but nearly comparable)

The question is: why is scrolling when compositing damn slow on Ubuntu Hardy with the latest xorg/mesa/intel/kernel, while Fedora is damn fast? :)

I'll try to compile Fedora's xorg/mesa/intel/kernel on Hardy.
Perhaps unsurprisingly, this is no better with UXA.
I've compiled Fedora's kernel and intel driver, but scrolling with compositing is still slow.
(In reply to comment #16)
> EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)

This is most likely some kind of stippled fill. As there don't seem to be extended regions of stippled fills on Slashdot, the impact of these should be limited.

> EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)

PutImage can't be accelerated if the driver doesn't provide an UploadToScreen hook. Shouldn't really be a problem though.

(In reply to comment #17)
> EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys
> EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)

Hmm, these could be due to a ShmPutImage call which can't be directly handled via PutImage. There used to be a special exaShmPutImage which might have handled this a little better, though as with PutImage itself, this may not be a problem.

Note that the impact assessments above are assuming that Option "EXAOptimizeMigration" is enabled; it's enabled by default as of xserver 1.6, but I'm not sure about the X server you're using. May want to try enabling it explicitly just in case.

Also, in another bug report about EXA performance issues, it was observed that while EXA is as fast or faster than XAA in most x11perf tests, it's slower in some relatively small operations such as 10x10 or less. Can you confirm that? If so, it might be interesting to try and track down the bottleneck for the small operations.
Note that of late I have been using UXA primarily with xserver, xf86-video-intel, mesa, and libdrm from git and kernel from drm-intel-next.

(In reply to comment #21)
> (In reply to comment #16)
> > EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
>
> This is most likely some kind of stippled fill. As there don't seem to be
> extended regions of stippled fills on Slashdot, the impact of these should be
> limited.

I don't believe this is the case. With my primary test case (http://plato.stanford.edu/entries/ecology/) there are no stippled fills as far as I can see.

> > EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)
>
> PutImage can't be accelerated if the driver doesn't provide an UploadToScreen
> hook. Shouldn't really be a problem though.

Why is this operation not accelerated? Is there a technical reason or is it just a lack of developer time?

> (In reply to comment #17)
> > EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys
> > EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)
>
> Hmm, these could be due to a ShmPutImage call which can't be directly handled
> via PutImage. There used to be a special exaShmPutImage which might have
> handled this a little better, though as with PutImage itself, this may not be a
> problem.

Again, is there a reason this can't be accelerated?

> Note that the impact assessments above are assuming that Option
> "EXAOptimizeMigration" is enabled; it's enabled by default as of xserver 1.6,
> but I'm not sure about the X server you're using. May want to try enabling it
> explicitly just in case.

I don't believe this has an effect under UXA. If I understand, the migration logic has been removed, right?
I recompiled the master branches today and scrolling on my GM945 was still slow. I then upgraded to Ubuntu 8.10 (from 8.04), recompiled the whole X stack, and scrolling is now as fast as it should be. Interesting... (the kernel is the same on 8.04 and 8.10)
Created attachment 22243 [details]
Full Xorg.log of 855GM/2.6.22 case

Just as a datapoint, I have this too when scrolling wikipedia texts, and also get lots of 'Component alpha' fallbacks -- using the system

855GM card
kernel 2.6.22
Mesa-7.3
xf86-video-intel-2.6.1
xorg-server-1.5.3 with "EXAOptimizeMigration" "true"

I also get the occasional

EXA fallback: Unsupported dest format 0x8018000

The log is attached.
Nobody mentioned Gmail, quite popular web app. Yet its interface performance is the most annoying X-related issue for me at the moment. Scrolling in Gmail becomes really slow and eats all CPU resources when I read a thread consisting of several messages that don't fit browser window area. In this case a small box -- [v Next Author] -- appears above the scrolled page contents in the right lower corner of the window. As long as the box is there the scrolling is unacceptably slow. When the box disappears (at the top and bottom of the page), the scrolling immediately returns to normal speed.

I'm using up-to-date Debian Lenny with the following packages:
xserver-xorg Version: 1:7.3+18
xserver-xorg-video-intel Version: 2:2.3.2-2+lenny6
iceweasel Version: 3.0.5-1

No fancy desktop environments; just
icewm Version: 1.2.35-1
rox-filer Version: 2.7.1-1

lspci reports 82865G Integrated Graphics Controller (rev 02)
(In reply to comment #25)
> Nobody mentioned Gmail, quite popular web app. Yet its interface performance is
> the most annoying X-related issue for me at the moment.

BTW, adding an Option "AccelMethod" "XAA" line to the "Device" section of xorg.conf almost solves the issue on my desktop PC. The scrolling is still not perfect, but at least it's usable now.
Created attachment 23473 [details]
Cairo trace of fast case

Well, I finally got around to taking a cairo-trace log of the fast and slow behavior. What I am about to attach is a set of two logs. The test was conducted on a Slashdot story in Firefox, with the fast case being scrolling within the story body and the slow case being scrolling within the comments section. As can be seen, the behavior changes dramatically between the two cases: the slow log is 13 MB uncompressed while the fast log is merely 3 MB despite similar run times. Judging by this, it's conceivable that it's not the driver's fault at all but rather the browser's. What do you all think?
Created attachment 23474 [details] Cairo trace of slow case
Created attachment 24688 [details]
sysprof output from scrolling slowness

Sysprof output from slow scrolling in the comment section of a slashdot.org article. This is with git master of xf86-video-intel from today and server 1.6.0.
(In reply to comment #29)
> Created an attachment (id=24688) [details]
> sysprof output from scrolling slowness

Looks like most cycles are spent rasterizing trapezoids...
(In reply to comment #30)
> (In reply to comment #29)
> > Created an attachment (id=24688) [details]
> > sysprof output from scrolling slowness
>
> Looks like most cycles are spent rasterizing trapezoids...

Thanks for the analysis here. Since the bug has been isolated to trapezoid rasterization, this should be fixed as of the below commit in the driver.

I tried to verify this as best I could by scrolling slashdot comments, and nothing seemed slow to me. (Though even in the past when I tried to look for this I didn't see any slowness. Jesse tells me the effect is subtle so I may have been overlooking it.)

I also tried running the traces through cairo-perf-trace, but it doesn't seem to like them (it just reports times of 0). So maybe that's due to some change in the cairo-trace output between the time these were captured and now.

Anyway, I'm going to mark this as resolved, and if someone could confirm or deny that (reopening if necessary), that would be great.

-Carl

commit accdbd23676d812d2345f86d8e3ee62f108841ff
Author: Carl Worth <cworth@cworth.org>
Date:   Fri May 29 15:34:20 2009 -0700

    UXA: Rasterize trapezoids to system memory, not a pixmap

    Since we're only doing software rasterization right now, anyway,
    it makes more sense to just rasterize to system memory and then
    upload to a pixmap once complete. This avoids expensive
    read-modify-write cycles.

    This results in a 2.4x speedup for a real-world test case that's
    heavy on trapezoids, which is swfdec running on the following file:

    http://michalevy.com/wp-content/uploads/Giant%20Steps%202007.swf

    Many thanks to Chris Wilson for his cairo-traces repository and
    cairo-perf-trace tool which makes it so easy to measure things
    like this.
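The idea in the commit message (rasterize in system memory, then upload once, instead of read-modify-write on the pixmap) can be illustrated with a toy model. This is only a sketch of the access pattern, not the driver's code: `CountingPixmap`, `add_span`, and both `rasterize_*` functions are hypothetical, horizontal spans stand in for trapezoids, and each pixmap access is deliberately modeled as a whole-buffer transfer. It says nothing about the 2.4x figure.

```python
class CountingPixmap:
    """Toy stand-in for a driver pixmap that counts whole-buffer transfers.

    Hypothetical: not an X server or xf86-video-intel data structure.
    """

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.data = [0] * (width * height)
        self.downloads = 0
        self.uploads = 0

    def download(self):
        self.downloads += 1
        return list(self.data)

    def upload(self, buf):
        self.uploads += 1
        self.data = list(buf)


def add_span(buf, width, y, x0, x1, coverage=128):
    """Accumulate coverage for pixels x0..x1-1 on row y, clamped to 255."""
    for x in range(x0, x1):
        i = y * width + x
        buf[i] = min(255, buf[i] + coverage)


def rasterize_via_pixmap(pixmap, spans):
    """Read-modify-write: every span downloads and re-uploads the pixmap."""
    for y, x0, x1 in spans:
        buf = pixmap.download()
        add_span(buf, pixmap.width, y, x0, x1)
        pixmap.upload(buf)


def rasterize_in_system_memory(pixmap, spans):
    """Accumulate all spans in a local buffer, then upload exactly once.

    Assumes the destination starts out empty, as a fresh mask would.
    """
    buf = [0] * (pixmap.width * pixmap.height)
    for y, x0, x1 in spans:
        add_span(buf, pixmap.width, y, x0, x1)
    pixmap.upload(buf)
```

In this model both paths produce the same pixels, but the first performs one download and one upload per span while the second performs a single upload in total, which is the kind of saving the commit message describes.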