Bugzilla – Bug 13389
EXA is much slower than XAA
Last modified: 2008-09-25 08:35:21 UTC
With the latest intel git driver, where EXA is the default, rendering is far too slow.
With Option "AccelMethod" "XAA"
Everything works like a charm.
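For anyone who wants to try the same switch, the option goes in the Device section of xorg.conf. The helper below is only a minimal sketch that prints such a section. The identifier "Intel Graphics" is a hypothetical placeholder, and it assumes the driver is loaded under the name "intel".

```python
def device_section(identifier, driver, options):
    """Return the text of an xorg.conf Device section.

    identifier and options are whatever the caller supplies; nothing
    here checks that the driver actually understands the options.
    """
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver     "{driver}"',
    ]
    for name, value in options.items():
        lines.append(f'    Option     "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines)


if __name__ == "__main__":
    # "Intel Graphics" is a made-up identifier; use the one from your own config.
    print(device_section("Intel Graphics", "intel", {"AccelMethod": "XAA"}))
```

Passing `{"AccelMethod": "EXA"}` instead gives the matching section for the EXA side of a comparison.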
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
Marking HIGH priority, because this is now the default option.
I'm not seeing significant slowness with EXA in my normal working environment. Could you elaborate on which operations get slow? Are you using a compositing window manager?
I use an external display, so the virtual desktop is bigger than 2048 and so acceleration is disabled.
Direct rendering: No
What about with Option "ExaNoComposite" "true"?
Just tested with this parameter. It is faster, but the old rendering architecture is at least 5x faster. I still can't scroll a web page without seeing waves on the sea; I don't know how to express it better.
I have a 945 intel video card and I use 2.2.0 driver.
EXA is default and it is indeed slow...
I'm using compiz as my window manager. I can't use firefox comfortably, as scrolling is terribly slow (I turn my mouse wheel and then wait several seconds for it to stop scrolling). Cube rendering also seems slower than with XAA. I'm testing it with two monitors, if that matters. I've tested previous drivers with EXA enabled, with the same result. Probably this bug only exists if you use a compiz-like window manager.
I have this problem using classical KDE 3.5.8...
I can confirm everything that's been said in comment 5, except that I don't have a dual-monitor setup.
Yeah, this has been a known issue for a while, and the fix is in progress. You can check the "intel-batchbuffer" branch of Dave's work, and see Carl's recent post to the xorg list on this i965 perf tuning work. I can't say when the fix will be ready, but I think it's very close to done.
I've tried to build that branch, and it failed (ubuntu hardy). The last commit was over a month ago, so sadly there is not much happening..
It should build OK. I've run the intel-batchbuffer branch on my 945GM laptop without problems so far. i965 has some render issues with a composite manager, but I haven't tried Carl's latest work yet.
The build problem was probably due to a merge with 2.2 branch that I did locally. Here's how it fails:
gcc -DHAVE_CONFIG_H -I. -I.. -I../../src -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -I/usr/include/X11/dri -DI830_XV -DI830_USE_XAA -DI830_USE_EXA -Wall -g -O2 -MT i810_driver.lo -MD -MP -MF .deps/i810_driver.Tpo -c ../../src/i810_driver.c -fPIC -DPIC -o .libs/i810_driver.o
In file included from ../../src/i810_driver.c:88:
../../src/i830.h:631: error: expected specifier-qualifier-list before 'ddx_bufmgr'
../../src/i830.h:648: error: expected specifier-qualifier-list before 'ddx_bo'
The reason was probably that libdrm on my system is too old. Does the batchbuffer branch require TTM?
Are you able to build the batchbuffer branch now?
No, it fails with:
gcc -DHAVE_CONFIG_H -I. -I.. -I../../src -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -I/usr/include/X11/dri -DI830_XV -DI830_USE_XAA -DI830_USE_EXA -Wall -g -O2 -MT i830_exa.lo -MD -MP -MF .deps/i830_exa.Tpo -c ../../src/i830_exa.c -fPIC -DPIC -o .libs/i830_exa.o
../../src/i830_exa.c: In function 'i830_pixmap_tiled':
../../src/i830_exa.c:118: warning: implicit declaration of function 'exaGetPixmapDriverPrivate'
../../src/i830_exa.c:118: warning: nested extern declaration of 'exaGetPixmapDriverPrivate'
../../src/i830_exa.c:118: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXADestroyPixmap':
../../src/i830_exa.c:410: warning: unused variable 'pI830'
../../src/i830_exa.c: In function 'I830EXAPixmapIsOffscreen':
../../src/i830_exa.c:424: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXAPrepareAccess':
../../src/i830_exa.c:441: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXAModifyPixmapHeader':
../../src/i830_exa.c:476: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXAInit':
../../src/i830_exa.c:537: error: 'EXA_HANDLES_PIXMAPS' undeclared (first use in this function)
../../src/i830_exa.c:537: error: (Each undeclared identifier is reported only once
../../src/i830_exa.c:537: error: for each function it appears in.)
(In reply to comment #15)
It requires xserver master.
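A quick way to tell whether the installed X server headers are new enough is to look for the missing names before starting a build. This is only a rough sketch: it assumes the EXA header is installed as /usr/include/xorg/exa.h (guessed from the -I/usr/include/xorg flag in the log above), and it only searches the text for the names rather than parsing C.

```python
import re
from pathlib import Path

# Names the failing build complained about (taken from the log above).
NEEDED = ("EXA_HANDLES_PIXMAPS", "exaGetPixmapDriverPrivate")


def header_declares(header_text, symbol):
    """Return True if symbol appears as a whole identifier in header_text."""
    return re.search(rf"\b{re.escape(symbol)}\b", header_text) is not None


def check_exa_header(path="/usr/include/xorg/exa.h", symbols=NEEDED):
    """Map each symbol to True/False, or return None if the header is absent.

    The default path is an assumption; adjust it to your installation.
    """
    header = Path(path)
    if not header.is_file():
        return None
    text = header.read_text(errors="replace")
    return {name: header_declares(text, name) for name in symbols}


if __name__ == "__main__":
    print(check_exa_header())
```

If either name comes back False, the installed xserver is most likely older than what the branch expects.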
When I compiled libdrm, mesa, xserver master and batchbuffer against them, loading intel_dri.so fails with unknown symbols (intelddx_bufmgr_ttm_init). So I'll just wait until things have settled down a bit.
I also experience that slowness on my laptop, which has a 945GM. EXA feels a lot slower than XAA even without a composition manager.
Resizing windows, scrolling in Firefox and other 2D operations have become noticeably slower, although even XAA does spend a lot of time in software rendering loops.
Can I expect the intel-batchbuffer branch to be faster on the 945GM, or is this only targeted for 965+?
(In reply to comment #18)
> EXA feels a lot slower than XAA even without an composition manager.
Note that EXA is designed for use with a compositing manager and generally works better than XAA with one.
> Note that EXA is designed for use with a compositing manager and generally
> works better than XAA with one.
Yes, sure - but this should not mean that very common applications run slower without a composition manager (furthermore, a composition manager adds overhead with EXA too, so if it's slow without one, it's even a bit slower with a composition manager running, right?).
Will the 945GM also profit from the batchbuffer enhancements, or are these targeted at 965 only? Are there any plans to speed up the GMA950 driver when used with EXA?
Thanks, lg Clemens
(In reply to comment #20)
> > Note that EXA is designed for use with a compositing manager and generally
> > works better than XAA with one.
> Yes, sure - but this should not impose that very common applications should run
> slower without a composition manager
There are tradeoffs, and XAA was extremely optimized for this.
> (furthermore a composition manager adds overhead also with EXA - so if its slow
> without one its even a bit slower with a composition manager running, right?).
Not necessarily, depends on the specific operations.
> Will the 945GM also profit from the batchbuffer enhancements, or are these
> targeted at 965 only?
I think it will be for i830 and newer.
> There are tradeoffs, and XAA was extremely optimized for this.
Well, but even when running XAA I see tons of fallbacks to libfb for everything more complex than solid fills and lines. I hope this will get better with EXA.
> I think it will be for i830 and newer.
That's really great news :)
Thanks a lot, lg Clemens
(In reply to comment #22)
> > There are tradeoffs, and XAA was extremely optimized for this.
> Well, but even when running XAA I see tons of fallbacks to libfb for everything
> more complex than solid fills and lines.
XAA tends to take less of a hit for software fallbacks because it doesn't migrate pixmaps to video memory as aggressively as EXA does for acceleration.
(In reply to comment #17)
> When I compiled libdrm, mesa, xserver master and batchbuffer against them,
> loading intel_dri.so fails with unknown symbols (intelddx_bufmgr_ttm_init). So
> I'll just wait until things have settled down a bit.
zhenyu, any hint for timo?
Clemens, are you able to test -intel-batchbuffer branch?
Created attachment 13874 [details]
Screenshot of image distortion using batchbuffer branch
Using the batchbuffer-branch I do not see this "EXA-slowness" anymore. But the problem is: I see nearly nothing anymore. ;) See attached screenshot. This holds only using a composite window manager (compiz, kwin4). Using kwin3 even with xcompmgr everything is ok.
*** Bug 13312 has been marked as a duplicate of this bug. ***
Created attachment 15032 [details]
I ran a few benchmarks using expedite (following Carl Worth). I got the following results on my intel 945GM.
All filenames follow the pattern
The suffix XAANOP stands for use of the option XAANoOffscreenPixmaps, bb stands for intel-batchbuffer branch from git (with or without use of DRI2).
Until I fell back to XAA, rdesktop was painfully slow for me on my 865G. Here is a simple test demonstrating the problem with EXA:
start rdesktop to connect to a Windows server on a LAN
open a Command Prompt
do a long directory listing
The scroll speed is very slow and could not be interrupted with ^C. When I switch to XAA, the scroll speed is lightning fast.
I wonder which driver is used by default in Fedora9?
Although EXA is painfully slow with Fedora8 on my 945GM, it is actually quite acceptable with Fedora9. It is still slower than XAA, but at least I can now scroll through GMail conversations with Firefox3 ;)
Text-performance went up from 50.000 chars/s to 150.000/s for -aatext10, which is almost the same I get from XAA :)
Does this driver (intel-master) already contain the batchbuffer-work?
Could these improvements be because of TTM, or because of the enhanced fallback implementation (no readback, just a GART change)?
Using the drm-gem branch of current git with TTM-enabled Mesa I get values around 280.000 char/s using x11perf -aa10text. So it seems you do not even have the best one in Fedora 9. :)
Which hardware do you have, 915-class or 965?
280.000 looks quite good, good enough to no longer be a bottleneck ... but still far away from the 2.000.000 chars I get from my Geforce6600 ;)
I wonder what has been included in Fedora9, and when all these improvements will be integrated in the official intel driver (which is almost useless for now when run with EXA enabled).
Thanks for sharing your numbers, Clemens
I have got a 945GM. The value mentioned above was achieved using KDE4 without compositing. If I use compositing it reduces to about 104.000.
For the sake of completeness I also tested the current master branch with TTM. Using KDE4 with compositing turned ON I get
195.000 chars/sec, nearly doubling the GEM value.
*** Bug 15289 has been marked as a duplicate of this bug. ***
reassign to carl..
Now that I "own" this bug, I want to share a few thoughts on where things stand:
1. The intel-batchbuffer branches are no longer interesting. Many things that were there have been merged into master, and the things that were not merged yet are now taking place in the drm-gem branches.
2. If people want to follow along at home, here are the necessary pieces:
kernel: git://people.freedesktop.org/~anholt/linux-2.6 (drm-gem branch)
mesa: git://git.freedesktop.org/git/mesa/mesa (drm-gem branch)
drm: git://git.freedesktop.org/git/mesa/drm (drm-gem branch)
Do "make; make install" at both the top-level and in the OS-specific
sub-directory, (such as linux-core, for example).
xserver: git://git.freedesktop.org/git/xorg/xserver (master branch)
3. I'm working hard on 2D performance on Intel as my full-time job, working
for Intel now, (previously I had been doing this as a part of my job working
for Red Hat).
4. I'm beginning by focusing on 965 because that's where things are worst,
but work will also happen for 915/930 and perhaps earlier devices as well.
5. I'll be making regular, (weekly), blog posts describing improvements
made. My most recent post is here:
And here's a feed that will always show my latest EXA-related posts:
6. From my experience, text rendering is one of the hardest things for the
driver to do well on, (and also very common). So if we get this working
well, then everything else is also working well. So currently, my benchmark
is "x11perf -aa10text" and my goal is to get that faster with EXA than with
XAA. As mentioned in the blog post above, we recently achieved this for
965 on the master branch of the driver, (but not yet on the drm-gem branch).
If others have other benchmarks that accurately capture performance problems
they are experiencing, please let me know.
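For anyone collecting numbers to compare, here is a small sketch for pulling the rate out of an x11perf result line. It assumes result lines carry the rate in the form "(NNN.N/sec)", which is how x11perf output usually looks but is not something stated in this report. The sample line in the code is made up for illustration and is not a measurement.

```python
import re

# Matches the "(123456.0/sec)" part of a result line; commas are tolerated.
RATE_RE = re.compile(r"\(\s*([\d,]+(?:\.\d+)?)/sec\)")


def parse_rate(line):
    """Return the operations-per-second figure from one result line, or None."""
    match = RATE_RE.search(line)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


if __name__ == "__main__":
    # Hypothetical line, only here to show the assumed shape of the output.
    sample = "  500000 reps @   0.0100 msec (100000.0/sec): example test label"
    print(parse_rate(sample))
```

Running the same test once per AccelMethod setting and feeding each result line through `parse_rate` gives figures that can be compared directly.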
The easiest and most important test for me is scrolling in firefox with a long text web page. This makes me crazy with EXA.
(In reply to comment #37)
> xf86-video-intel: git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
You might have forgotten to add the drm-gem branch here. :)
Here's a bug entry in the Novell bug tracker that may have some useful data:
And here's another item, (xfce4 window-manager doing translucent window movement):
As is hopefully obvious to anyone interested in this bug, "EXA is slower than XAA" is necessarily an overly broad description of an entire family of bugs. I'm happy to keep it here and open, but it will only be useful insofar as it's a pointer to more specific issues that can be dealt with individually.
So please feel free to open bug reports for specific issues and link to them by making this bug entry depend on those.
As an update for EXA performance issues for xf86-video-intel:
We expect that many of the performance problems are caused by migration. Keith recently simplified EXA by removing the migration code, (the result is called UXA for now[*]). We're hoping that UXA will solve several performance problems. But if it doesn't it will at least *change* performance profiles.
So we're waiting on performance bugs until UXA lands and we can reexamine things.
But stay tuned! We definitely want to get xf86-video-intel performing well as soon as possible.
[*] Let's not get hung up on the name---the actual acceleration functions are all the same as in EXA. This isn't a new acceleration architecture from scratch or anything.
Now that Fedora Rawhide has support for GEM in, I figured this might be a good time to offer some of my performance experiences on 965. Unfortunately, things don't look good:
- Scrolling in Firefox is still a painful experience
- Certain animations in compiz still stutter, especially with a Firefox window mapped (e.g. switcher plugin)
- Although on the whole, compiz does "feel" a bit more responsive than it did a month ago
- x11perf -aa10text has apparently taken a turn for the worse:
- With Compiz: ~20,000 glyphs/sec
- With Metacity: ~26,000 glyphs/sec
- Haven't had a chance to reliably test
These numbers were on
- xf86-video-intel-2.4.2 (with the pixmap/text fix patch from bug #)
- kernel 2.6.27-rc5 (rawhide package 2.6.27-0.305.rc5.git6.fc10.x86_64)
So, as can be seen, things might have backslid a bit. Given the -aa10text numbers, I would say I must be hitting a bug or configuration oversight of some sort. Let me know if there are any obvious things that come to mind or if I should open a bug for this issue.
(In reply to comment #42)
> These numbers were on
> - xf86-video-intel-2.4.2 (with the pixmap/text fix patch from bug #)
Could you try xf86-video-intel-2.5-branch to see if things get better?
I just tested again on master and it seems to be a little better, producing 34,000 glyphs/sec on my i965 using EXA. That still seems low, though it is probably important to note that I'm using xserver-1.5 (i.e. no glyph cache).
(In reply to comment #43)
> (In reply to comment #42)
> > These numbers were on
> > - xf86-video-intel-2.4.2 (with the pixmap/text fix patch from bug #)
> Could you try xf86-video-intel-2.5-branch to see if things get better?
After finally getting working 2.4.2 packages, xserver 1.5, and a new kernel, it seems that text performance is now magnitudes better. I'm now getting anywhere from 150k-175k glyphs/sec, and things seem much more responsive.
Out of curiosity, are we approaching being hardware-bound at this point or is there still substantially more potential for improvement?
I'm going to mark this bug fixed, as I think it has served its usefulness.
Here are some datapoints to where things stand:
As seen above in comment #28, with latest stuff from git, EXA can be much faster than XAA. Also, I recently reported the following numbers at Linux Plumbers Conference:
From render_bench (time in seconds)
Operation         XAA       EXA      Speedup
━━━━━━━━━━━━━   ━━━━━━━   ━━━━━━   ━━━━━━━
Blend            2.61 s    0.28 s      9x
.5 scale         1.36 s    0.20 s      7x
2x scale        39.00 s    0.63 s     62x
General scale   78.51 s    1.06 s     74x
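The speedup column is just the XAA time divided by the EXA time, rounded to a whole number. A minimal sketch that recomputes it from the timings in the table (the variable and function names are only illustrative):

```python
# (XAA seconds, EXA seconds) per render_bench operation, copied from the table.
RENDER_BENCH = {
    "Blend": (2.61, 0.28),
    ".5 scale": (1.36, 0.20),
    "2x scale": (39.00, 0.63),
    "General scale": (78.51, 1.06),
}


def speedup(xaa_seconds, exa_seconds):
    """How many times faster EXA finished than XAA."""
    return xaa_seconds / exa_seconds


if __name__ == "__main__":
    for operation, (xaa, exa) in RENDER_BENCH.items():
        print(f"{operation:<14} {speedup(xaa, exa):6.1f}x")
```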
From x11perf -aa10text (glyphs/second)
Server option          Glyphs/second
NoAccel true           138k
AccelMethod XAA        113k
ExaNoComposite true     25k
AccelMethod EXA        174k
So we can see that EXA is as fast or faster than XAA in several situations.
However, this isn't to say that everything is done. For example, my numbers here were all generated with a GEM kernel, (2.6.27-rc4 + GEM patches from ~anholt/linux-2.6), a git-master X server, and a git-master xf86-video-intel driver (basically the same as xf86-video-intel-2.5-branch right now), on a GM965.
It's certainly possible, even likely, that people are still having performance problems, and I still want to hear about them.
But let's please do that in separate bug reports so that we don't mix up various different issues in a single report. If you open a new bug, please specify the following:
Hardware (such as 945GM or GM965, etc.)
Kernel (such as 2.6.25)
X server version (such as 1.5)
xf86-video-intel version (such as 2.4)
And what exposes the bug.
For this last point, please provide details on whatever is necessary. For example, if the bug only shows up with compiz, say so, and give the version. If it's a mozilla issue, tell us that along with the mozilla version, (and who built it[*]). The best reports will describe the bug with a program, (such as x11perf, render_bench, or expedite), that puts a number on the problem. This makes it very clear for us to see if we have reproduced the problem and to debug it.
[*] I mention "who built it" for Mozilla, (meaning is it a mozilla.com package, a Fedora package, a custom build, etc.), because I recently learned about a potentially severe issue with mozilla.com builds of firefox that have a built-in cairo that Mozilla has configured with a workaround for ancient X servers that may be killing performance on new X servers. But let's talk more about that in the new, specific bug reports.
I do appreciate the help of all testers. And I'm excited for when everything with Intel graphics is blisteringly fast. Let's all get there together!