Bug 18572 - Scrolling in Firefox still slow
Summary: Scrolling in Firefox still slow
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Carl Worth
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-17 10:06 UTC by Ben Gamari
Modified: 2009-07-02 11:05 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
Profile from scrolling on a slashdot story (141.93 KB, text/plain)
2008-11-17 10:11 UTC, Ben Gamari
Full xorg.log with EXA core fallback debugging enabled. (58.46 KB, application/x-Bzip)
2008-12-03 17:18 UTC, Ben Gamari
Full Xorg.log of 855GM/2.6.22 case (9.43 KB, application/x-bzip2)
2009-01-26 02:59 UTC, ralf
Cairo trace of fast case (397.47 KB, application/x-Bzip)
2009-03-03 07:42 UTC, Ben Gamari
Cairo trace of slow case (586.92 KB, application/x-Bzip)
2009-03-03 07:43 UTC, Ben Gamari
sysprof output from scrolling slowness (756.36 KB, text/plain)
2009-04-09 16:48 UTC, Jesse Barnes

Description Ben Gamari 2008-11-17 10:06:28 UTC
There are still many situations in Firefox where scrolling is positively painful under EXA on the 965GM. This is the case in both composited (compiz) and non-composited (metacity) environments. The first case that comes to mind is the comments section of a slashdot story. I'll mention other cases as I come across them (I know they exist; I just can't remember any more at the moment).
Comment 1 Ben Gamari 2008-11-17 10:11:26 UTC
Created attachment 20380 [details]
Profile from scrolling on a slashdot story

Here's what I think is a pretty unhelpful profile from scrolling up and down a slashdot story, particularly in the comments section. Strangely, scrolling is quite smooth above the comments section. Presumably we're hitting a pretty bad fallback while rendering the comment blocks.
Comment 2 Carl Worth 2008-11-17 14:06:22 UTC
Hi Ben,

Thanks for the report. I'd love to help get this resolved for you. So far, I'm entirely unable to replicate the problem with my 965GM here. Some details of my system:

Linux: 2.6.28-rc3 (from anholt/drm-intel-next for GEM patches)
xf86-video-intel: 2.4.97
Firefox: 3.0.1

And here, slashdot comments seem to scroll just fine. Maybe that's good news as perhaps you'll find good performance if you upgrade one or more components.

Do let me know more details about your system, or if performance changes as you upgrade anything.

Thanks,

-Carl
Comment 3 Ben Gamari 2008-11-17 14:29:59 UTC
I'm on Fedora Rawhide with the latest kernel bits from anholt's for-airlied branch (rawhide's kernel-2.6.27.5-113 package, just rebased last night). Moreover, I'm running xf86-video-intel from git (pulled today) and Fedora's xorg-x11-server-Xorg-1.5.2-12.fc10 (airlied says that this includes the glyph cache, but I'll try upgrading to the latest package next).

Anyway, I'm getting a pretty respectable 200k glyphs/second in x11perf -aa10text (even with compiz running), so text doesn't look like the issue. When scrolling down the comments section of a slashdot story, X server CPU usage jumps to over 60%. Any ideas?
Comment 4 Ben Gamari 2008-11-17 14:53:20 UTC
By the way, this is using firefox-3.0.2-1.fc10 as available in rawhide.
Comment 5 Ben Gamari 2008-11-17 15:07:01 UTC
Note that this is in Firefox with smooth scrolling enabled using the wheel to scroll.

It seems like the poor performance begins exactly when the floating comments bar (with the number of full/abbreviated/hidden comments) on the left margin starts floating above the page. If I continue to scroll quickly up the page past the beginning of the comments section (the point where the bar usually starts floating when scrolling down), the bar continues to float until scrolling stops. The entire time the bar floats, performance is significantly degraded (it takes several seconds for scrolling to stop after I stop turning the wheel). When scrolling stops and the bar returns to its usual fixed position on the right margin at the top of the comments section, scrolling performance improves remarkably.

Is there any easy way to determine if any fallbacks are being significantly hit?
Comment 6 Ben Gamari 2008-11-17 15:08:29 UTC
Here is another site with extremely poor scrolling performance: http://plato.stanford.edu/entries/ecology/
Comment 7 Carl Worth 2008-11-17 16:46:13 UTC
(In reply to comment #3)
> I'm on Fedora Rawhide with the latest kernel bits from anholt's for-airlied
> branch (rawhide's kernel-2.6.27.5-113 package, just rebased last night).
> Moreover, I'm running xf86-video-intel from git (pulled today) and fedora's
> xorg-x11-server-Xorg-1.5.2-12.fc10 (airlied says that this includes the glyph
> cache but I'll try upgrading to the latest package next)

Ah, so it's possible that I'm actually out of date and that you hit a new bug. ;-) (I was travelling all last week so I haven't updated in a little while).

(In reply to comment #5)
> Note that this is in Firefox with smooth scrolling enabled using the wheel to
> scroll.

Thanks for mentioning that. I was dragging the scroll bar, and I don't think I've ever enabled smooth scrolling before. So those are a couple of things I can try as well. (And I appreciate your attempt to describe in detail exactly what you're doing.)

> Is there any easy way to determine if any fallbacks are being significantly
> hit?

And this is the part I forgot to mention in my earlier comment.

Yes! There is an easy way to examine fallbacks. What you do (with recent xf86-video-intel from git) is to add an option to the device section of your xorg.conf file like so:

    Option "FallbackDebug" "true"

then look in your Xorg.#.log file for fallback messages.

I'll look forward to what you can learn.

-Carl
Comment 8 Ben Gamari 2008-11-17 17:36:37 UTC
(In reply to comment #7)
> Yes! There is an easy way to examine fallbacks. What you do (with recent
> xf86-video-intel from git) is to add an option to the device section of your
> xorg.conf file like so:
> 
>     Option "FallbackDebug" "true"
> 
> then look in your Xorg.#.log file for fallback messages.
> 
> I'll look forward to what you can learn.
> 
> -Carl
> 

After looking at the log, it became immediately evident that the fallback being hit is:

(II) intel(0): EXA fallback: Component alpha not supported with source alpha and source value blending.

Moreover, it seems that this fallback is being hit extremely frequently. In fact, even running 'tail -f /var/log/Xorg.0.log' in gnome-terminal produced a steady stream of tens of these messages per second. Running the same command in an xterm stopped these messages. In fact, it looks like all text rendering is causing this fallback.
Comment 9 Ben Gamari 2008-11-17 17:43:25 UTC
One final note: in my short (~5 minute) X session running with FallbackDebug, the server produced over 70,000 lines of "Component alpha not supported with source alpha and source value blending" fallback warnings. Meanwhile, 

$ grep fallback /var/log/Xorg.0.log.old | uniq
(II) intel(0): EXA fallback: Component alpha not supported with source alpha and source value blending.
$ 

So apparently this is the only fallback ever being hit.
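A note on the command above: uniq only collapses adjacent duplicate lines, so grep piped into uniq printing a single line is conclusive here only because every matching line was identical. With several fallback types interleaved it would print repeats; sort | uniq -c is the usual way to get one line per message with a count. The sketch below does the same tally in Python. It assumes the driver-level line format shown above ("(II) intel(0): EXA fallback: <message>"); count_fallbacks and format_report are hypothetical helper names, not part of any X tool.

```python
import re
from collections import Counter

# Assumed format, taken from the log lines quoted in this bug:
#   "(II) intel(0): EXA fallback: Component alpha not supported ..."
# EXA core lines ("EXA fallback at <func>: ...") deliberately do not match.
FALLBACK_RE = re.compile(r"EXA fallback: (?P<msg>.*)$")


def count_fallbacks(lines):
    """Count each distinct driver fallback message in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FALLBACK_RE.search(line.rstrip("\n"))
        if match:
            counts[match.group("msg")] += 1
    return counts


def format_report(counts):
    """Render counts as 'count  message' lines, most frequent first."""
    return "\n".join(f"{n:8d}  {msg}" for msg, n in counts.most_common())
```

For example, passing an open /var/log/Xorg.0.log to count_fallbacks and printing format_report of the result lists the heaviest fallbacks first; the log file name depends on the display number.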
Comment 10 Martin Vit @ festr 2008-11-18 13:43:10 UTC
I have the same issue on G945 hardware. With fallback debugging enabled, I've seen thousands of

(II) intel(0): EXA fallback: Component alpha not supported with source alpha and source value blending.

when scrolling any page, switching terminal windows, etc.

My versions: xorg git master, intel git master
Comment 11 Simon Strandman 2008-11-19 02:00:14 UTC
I have this problem too with an Intel 3100 and Fedora 10 Preview with the latest updates. The server is version 1.5.3-5.fc10 and the intel driver is version 2.5.0-3.fc10.

Scrolling in firefox without compositing is quite slow on most larger pages. Scrolling with compiz or xcompmgr is very slow. Smooth scrolling is enabled in firefox.
Comment 12 Simon Strandman 2008-11-19 04:12:18 UTC
BTW, I also tried to use XAA instead of EXA, but that made X freeze on start and I had to kill it through SSH.
Comment 13 Carl Worth 2008-11-21 13:48:53 UTC
I've replicated the "component alpha" fallback. I'm surprised to find it since
I thought we had all common fallbacks eliminated from the current i965 driver.
I'll talk the details over with Eric Anholt as soon as he gets back from
Taiwan next week.

The easy way to trigger the fallback is with "x11perf -rgb10text" (which is
what I should have been using all along instead of "x11perf -aa10text"---but the
naming scheme of those tests led me astray).

Anyway, as a quick (but maybe not so useful) test, I tried removing the
fallback for this case. When I did this I only got a 14% improvement to
the score of "x11perf -rgb10text". So maybe there's more to the performance
problem at the root of this bug than just this fallback. (And note that my
quick hack to remove the fallback is inherently not interesting---it results
in the text not appearing at all).

-Carl
Comment 14 Michel Dänzer 2008-11-22 08:56:30 UTC
The component alpha fallback is a red herring. As Carl found out, it's related to sub-pixel AA text rendering, and the EXA core is still able to accelerate that in two passes.

You may be able to get more ideas by enabling fallback debugging in the EXA core.
Comment 15 Ben Gamari 2008-12-03 16:28:54 UTC
I just tried the patch Carl posted on the mailing list, and while it is a bit of an improvement, I still can't say the problem is fixed. The patch brings aa10text performance up to 230-240k glyphs/second, but scrolling (again, using http://plato.stanford.edu/entries/ecology/ as the standard) is still pretty poor. Now that I have built an X server, I'll also be able to look at the EXA core fallbacks.
Comment 16 Ben Gamari 2008-12-03 17:14:12 UTC
I just enabled fallback debugging in the EXA core, and the problem fallbacks were immediately apparent:

EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1b79410 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ecd56390 (m)
EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x1b77f00 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ee0ebc60 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ae7c30 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ecd44720 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ae7c30 (m)
EXA fallback at ExaCheckPutImage: to 0x7f71ecd44720 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ecd56390 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ccc1c0 (m)
EXA fallback at ExaCheckPolyFillRect: to 0x1ccc1c0 (m)
EXA fallback at ExaCheckPutImage: to 0x7f71ecd44720 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x7f71ede6dd50 (m)
EXA fallback at ExaCheckPutImage: to 0x7f71ecd44720 (s)
EXA fallback at ExaCheckPolyFillRect: to 0x1b7a040 (m)
Comment 17 Ben Gamari 2008-12-03 17:18:17 UTC
Created attachment 20775 [details]
Full xorg.log with EXA core fallback debugging enabled.

Here is the full xorg.log. It seems there are also a few other types of fallbacks. These include:

EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys
EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)
etc.
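The EXA core debug lines have a different shape ("EXA fallback at <function>: ..."), so tallying by function name makes it easier to see which operations dominate a log like the one attached. This is a minimal Python sketch that assumes only the line formats quoted in this bug; the trailing flags such as (m), (s) or (m,m) are collected verbatim and not interpreted, and tally_core_fallbacks is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed formats, copied from the excerpts above:
#   "EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)"
#   "EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)"
#   "EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys"
CORE_RE = re.compile(r"EXA fallback at (?P<func>\w+):")
# Trailing flag group like "(m)" or "(m,m)" at the very end of the line.
# "(1536x64)" is not matched because it is not at the end of its line.
LOC_RE = re.compile(r"\((?P<loc>[a-z](?:,[a-z])*)\)\s*$")


def tally_core_fallbacks(lines):
    """Return (count per function, count per (function, flags)) for EXA core lines."""
    by_func = Counter()
    by_func_loc = Counter()
    for line in lines:
        match = CORE_RE.search(line)
        if not match:
            continue  # not an EXA core fallback line
        func = match.group("func")
        by_func[func] += 1
        loc = LOC_RE.search(line)
        by_func_loc[(func, loc.group("loc") if loc else None)] += 1
    return by_func, by_func_loc
```

Sorting the first counter with most_common() gives a quick ranking; in the excerpt from comment 16, for instance, ExaCheckPolyFillRect lines far outnumber ExaCheckPutImage ones.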
Comment 18 Martin Vit @ festr 2008-12-06 07:26:15 UTC

I have a GM945 GPU.

1)

Hardy Heron
kernel 2.6.28-rc4
xorg/mesa/intel current master branches
xorg.conf: exa, tiling=on
glxgears ~1100 FPS

tiling=off
glxgears ~800 FPS

changing exa to uxa
glxgears = 400 FPS

In all cases, scrolling with compositing is damn slow (unusable).

2)

Fedora 10 live

glxgears = 800 FPS
scrolling with compositing is very smooth (not as smooth as pure 2D, but nearly comparable)

The question is why scrolling with compositing is damn slow on Ubuntu Hardy with the latest xorg/mesa/intel/kernel while on Fedora it is damn fast :) I'll try to compile Fedora's xorg/mesa/intel/kernel on Hardy.
Comment 19 Ben Gamari 2008-12-10 18:31:50 UTC
Perhaps unsurprisingly, this is no better with UXA.
Comment 20 Martin Vit @ festr 2008-12-11 00:44:57 UTC
I've compiled Fedora's kernel and intel driver, but scrolling with compositing is still slow.
Comment 21 Michel Dänzer 2008-12-11 06:26:07 UTC
(In reply to comment #16)
> EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)

This is most likely some kind of stippled fill. As there don't seem to be extended regions of stippled fills on Slashdot, the impact of these should be limited.

> EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)

PutImage can't be accelerated if the driver doesn't provide an UploadToScreen hook. Shouldn't really be a problem though.

(In reply to comment #17)
> EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys
> EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)

Hmm, these could be due to a ShmPutImage call which can't be directly handled via PutImage. There used to be a special exaShmPutImage which might have handled this a little better, though as with PutImage itself, this may not be a problem.


Note that the impact assessments above are assuming that Option "EXAOptimizeMigration" is enabled; it's enabled by default as of xserver 1.6, but I'm not sure about the X server you're using. May want to try enabling it explicitly just in case.


Also, in another bug report about EXA performance issues, it was observed that while EXA is as fast or faster than XAA in most x11perf tests, it's slower in some relatively small operations such as 10x10 or less. Can you confirm that? If so, it might be interesting to try and track down the bottleneck for the small operations.
Comment 22 Ben Gamari 2008-12-25 10:50:45 UTC
Note that of late I have been using UXA primarily with xserver, xf86-video-intel, mesa, and libdrm from git and kernel from drm-intel-next.

(In reply to comment #21)
> (In reply to comment #16)
> > EXA fallback at ExaCheckPolyFillRect: to 0x7f71ed873100 (m)
> 
> This is most likely some kind of stippled fill. As there don't seem to be
> extended regions of stippled fills on Slashdot, the impact of these should be
> limited.

I don't believe this is the case. With my primary test case (http://plato.stanford.edu/entries/ecology/) there are no stippled fills as far as I can see.

> > EXA fallback at ExaCheckPutImage: to 0x1ae7c30 (s)
> 
> PutImage can't be accelerated if the driver doesn't provide an UploadToScreen
> hook. Shouldn't really be a problem though.

Why is this operation not accelerated? Is there a technical reason or is it just a lack of developer time?

> (In reply to comment #17)
> > EXA fallback at exaDoMigration: Pixmap 0x1752df0 (1536x64) pinned in sys
> > EXA fallback at exaCopyNtoN: from 0x1752df0 to 0x7f71ee045d30 (m,m)
> 
> Hmm, these could be due to a ShmPutImage call which can't be directly handled
> via PutImage. There used to be a special exaShmPutImage which might have
> handled this a little better, though as with PutImage itself, this may not be a
> problem.

Again, is there a reason this can't be accelerated?

> Note that the impact assessments above are assuming that Option
> "EXAOptimizeMigration" is enabled; it's enabled by default as of xserver 1.6,
> but I'm not sure about the X server you're using. May want to try enabling it
> explicitly just in case.

I don't believe this has an effect under UXA. If I understand correctly, the migration logic has been removed, right?
Comment 23 Martin Vit @ festr 2009-01-02 15:55:27 UTC
I recompiled today's master branches, and scrolling on my GM945 is still slow. I then upgraded to Ubuntu 8.10 (from 8.04) and recompiled the whole X stack, and scrolling is as fast as it should be. Interesting... (the kernel is the same on 8.04 and 8.10)
Comment 24 ralf 2009-01-26 02:59:11 UTC
Created attachment 22243 [details]
Full Xorg.log of 855GM/2.6.22 case

Just as a data point, I see this too when scrolling Wikipedia pages, and I also get lots of 'Component alpha' fallbacks, using this system:

855GM card
kernel 2.6.22
Mesa-7.3
xf86-video-intel-2.6.1
xorg-server-1.5.3 with "EXAOptimizeMigration" "true"

I also get the occasional
EXA fallback: Unsupported dest format 0x8018000

The log is attached.
Comment 25 Dmitry Semyonov 2009-01-29 15:42:02 UTC
Nobody has mentioned Gmail, a quite popular web app. Yet its interface performance is the most annoying X-related issue for me at the moment.

Scrolling in Gmail becomes really slow and eats all CPU resources when I read a thread consisting of several messages that don't fit in the browser window. In this case a small box, [v Next Author], appears above the scrolled page contents in the lower right corner of the window. As long as the box is there, scrolling is unacceptably slow. When the box disappears (at the top and bottom of the page), scrolling immediately returns to normal speed.

I'm using up-to-date Debian Lenny with the following packages:

xserver-xorg              Version: 1:7.3+18
xserver-xorg-video-intel  Version: 2:2.3.2-2+lenny6
iceweasel                 Version: 3.0.5-1

No fancy desktop environments; just
icewm                     Version: 1.2.35-1
rox-filer                 Version: 2.7.1-1

lspci reports
82865G Integrated Graphics Controller (rev 02)
Comment 26 Dmitry Semyonov 2009-01-29 16:01:47 UTC
(In reply to comment #25)
> Nobody has mentioned Gmail, a quite popular web app. Yet its interface
> performance is the most annoying X-related issue for me at the moment.

BTW, adding the line
    Option          "AccelMethod" "XAA"
to the "Device" section of xorg.conf almost solves the issue on my desktop PC. Scrolling is still not perfect, but at least it's usable now.
Comment 27 Ben Gamari 2009-03-03 07:42:38 UTC
Created attachment 23473 [details]
Cairo trace of fast case

Well, I finally got around to taking a cairo-trace log of the fast and slow behavior. What I am about to attach is a set of two logs. The test was conducted on a story in Firefox, with the fast case being scrolling within the story body and the slow case being scrolling within the comments section.

As can be seen, the behavior changes dramatically between the two cases: the slow log is 13 MB uncompressed while the fast log is merely 3 MB, despite similar run times. Judging by this, it's conceivable that it's not the driver's fault at all but rather the browser's. What do you all think?
Comment 28 Ben Gamari 2009-03-03 07:43:14 UTC
Created attachment 23474 [details]
Cairo trace of slow case
Comment 29 Jesse Barnes 2009-04-09 16:48:39 UTC
Created attachment 24688 [details]
sysprof output from scrolling slowness

Sysprof output from slow scrolling in the comment section of a slashdot.org article. This is with git master of xf86-video-intel from today and server 1.6.0.
Comment 30 Michel Dänzer 2009-04-14 08:23:43 UTC
(In reply to comment #29)
> Created an attachment (id=24688) [details]
> sysprof output from scrolling slowness

Looks like most cycles are spent rasterizing trapezoids...
Comment 31 Carl Worth 2009-07-02 11:05:50 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > Created an attachment (id=24688) [details] [details]
> > sysprof output from scrolling slowness
> 
> Looks like most cycles are spent rasterizing trapezoids...

Thanks for the analysis here. Since the bug has been isolated to trapezoid rasterization, this should be fixed as of the driver commit below.

I tried to verify this as best I could by scrolling slashdot comments, and nothing seemed slow to me. (Though even in the past when I tried to look for this I didn't see any slowness. Jesse tells me the effect is subtle so I may have been overlooking it.)

I also tried running the traces through cairo-perf-trace, but it doesn't seem to like them (it just reports times of 0). Maybe that's due to some change in the cairo-trace output between the time these were captured and now.

Anyway, I'm going to mark this as resolved, and if someone could confirm or deny that (reopening if necessary), that would be great.

-Carl

commit accdbd23676d812d2345f86d8e3ee62f108841ff
Author: Carl Worth <cworth@cworth.org>
Date:   Fri May 29 15:34:20 2009 -0700

    UXA: Rasterize trapezoids to system memory, not a pixmap
    
    Since we're only doing software rasterization right now, anyway, it
    makes more sense to just rasterize to system memory and then upload
    to a pixmap once complete. This avoids expensive read-modify-write
    cycles.
    
    This results in a 2.4x speedup for a real-world test case that's
    heavy on trapezoids, which is swfdec running on the following file:
    
    http://michalevy.com/wp-content/uploads/Giant%20Steps%202007.swf
    
    Many thanks to Chris Wilson for his cairo-traces repository and
    cairo-perf-trace tool which makes it so easy to measure things
    like this.
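To illustrate the idea in the commit message, here is a toy Python model, not the UXA code. CountingPixmap is a hypothetical stand-in for a GPU-backed pixmap that counts every access, and each trapezoid is simplified to horizontal spans of (y, x0, x1, coverage) instead of real edge math. Accumulating coverage directly in the pixmap costs a read plus a write per covered pixel; rasterizing into a plain buffer and uploading once costs only writes, one per pixel of the mask. CPU reads from mapped graphics memory are typically much slower than writes, which is the read-modify-write cost the commit avoids. The model only counts accesses and says nothing about the 2.4x figure.

```python
class CountingPixmap:
    """Hypothetical stand-in for a GPU-backed pixmap; counts every access."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._data = [0] * (width * height)
        self.reads = 0
        self.writes = 0

    def read(self, x, y):
        self.reads += 1
        return self._data[y * self.width + x]

    def write(self, x, y, value):
        self.writes += 1
        self._data[y * self.width + x] = value

    def pixels(self):
        """Copy of the contents, not counted as an access (for checking only)."""
        return list(self._data)


def rasterize_in_place(pixmap, spans):
    """Accumulate coverage directly in the pixmap (read-modify-write per pixel)."""
    for y, x0, x1, coverage in spans:
        for x in range(x0, x1):
            old = pixmap.read(x, y)
            pixmap.write(x, y, min(255, old + coverage))


def rasterize_then_upload(pixmap, spans):
    """Accumulate coverage in a plain list (standing in for system memory),
    then copy the finished mask into the pixmap in one pass of writes."""
    w, h = pixmap.width, pixmap.height
    buf = [0] * (w * h)
    for y, x0, x1, coverage in spans:
        for x in range(x0, x1):
            i = y * w + x
            buf[i] = min(255, buf[i] + coverage)
    for y in range(h):
        for x in range(w):
            pixmap.write(x, y, buf[y * w + x])
```

Both functions produce the same mask; the difference is only where the intermediate accumulation happens. Note that the upload writes every pixel of the mask, so in this toy the benefit comes entirely from eliminating reads of the pixmap.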

