Bug 13389 (intel-exa-speed) - EXA is much slower than XAA
Summary: EXA is much slower than XAA
Status: RESOLVED FIXED
Alias: intel-exa-speed
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: high normal
Assignee: Carl Worth
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 13312 15289
Depends on:
Blocks: 13330 intel-2.5
 
Reported: 2007-11-25 12:21 UTC by CIJOML CIJOMLovic CIJOMLov
Modified: 2008-09-25 08:35 UTC (History)
CC List: 24 users

See Also:
i915 platform:
i915 features:


Attachments
Screenshot of image distortion using batchbuffer branch (149.41 KB, image/jpeg)
2008-01-23 00:00 UTC, Johannes Engel
Expedite logs (2.40 KB, application/octet-stream)
2008-03-11 07:26 UTC, Johannes Engel

Description CIJOML CIJOMLovic CIJOMLov 2007-11-25 12:21:20 UTC
With the latest intel git driver, where EXA is the default, rendering is far too slow.
With Option          "AccelMethod" "XAA"
everything works like a charm.

Card:

00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

Marking HIGH priority, because this is now default option.
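
For readers unsure where the option above goes, here is a minimal sketch that renders an xorg.conf Device section, assuming the option belongs in that section. The helper name device_section and the identifier "Intel Graphics" are hypothetical; only the "intel" driver and the AccelMethod option come from this report.

```python
def device_section(identifier, driver, options):
    """Render an xorg.conf "Device" section as plain text."""
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver     "{driver}"',
    ]
    # One Option line per entry, in the order given.
    for name, value in options.items():
        lines.append(f'    Option     "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines)


# "Intel Graphics" is a hypothetical identifier; reuse the one already
# present in your own xorg.conf.
print(device_section("Intel Graphics", "intel", {"AccelMethod": "XAA"}))
```

Changing the value to "EXA" in the same call gives the section for the other acceleration method, which makes back-to-back comparisons easier.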
Comment 1 Gordon Jin 2007-11-26 00:18:47 UTC
I'm not seeing significant slowness with EXA in my normal working environment. Could you elaborate on which operations get slow? Are you using a compositing window manager?
Comment 2 CIJOML CIJOMLovic CIJOMLov 2007-11-26 00:31:54 UTC
Hi, 

I use an external display, so the virtual desktop is bigger than 2048 and acceleration is disabled.

Direct rendering: No
Comment 3 Jesse Barnes 2007-11-27 17:41:29 UTC
What about with Option "ExaNoComposite" "true"?
Comment 4 CIJOML CIJOMLovic CIJOMLov 2007-11-30 11:28:28 UTC
Just tested with this parameter. It is faster, but the old rendering architecture is still at least 5x faster... I still can't scroll a web page without seeing waves on the sea - I don't know how to express it better.
Comment 5 Aaron 2007-12-02 05:53:51 UTC
I have a 945 intel video card and I use 2.2.0 driver.

EXA is default and it is indeed slow...

I'm using compiz as my window manager. I can't use Firefox comfortably, as scrolling is terribly slow (I turn my mouse wheel and then wait several seconds for it to stop scrolling). Cube rendering also seems slower than with XAA. I'm testing with two monitors, if that matters. I've tested previous drivers with EXA enabled, with the same result. This bug probably only exists if you use a compiz-like window manager.
Comment 6 CIJOML CIJOMLovic CIJOMLov 2007-12-02 06:22:33 UTC
I have this problem using classical KDE 3.5.8...
Comment 7 Lukas Petrovicky 2007-12-20 04:12:54 UTC
I can confirm everything that's been said in comment 5 - except that I don't have a dual-monitor setup.
Comment 8 Wang Zhenyu 2007-12-20 16:57:12 UTC
Yeah, this has been a known issue for a while, and the fix is under way. You can check the "intel-batchbuffer" branch of Dave's work, and see Carl's recent post to the xorg list on this i965 perf tuning work. I can't say when the fix will be ready, but I think it's very close to being done.
Comment 9 Timo Aaltonen 2008-01-08 15:11:38 UTC
I've tried to build that branch, and it failed (ubuntu hardy). The last commit was over a month ago, so sadly there is not much happening..
Comment 10 Wang Zhenyu 2008-01-08 19:51:19 UTC
It should build OK. I've run the intel-batchbuffer branch on my 945GM laptop without problems so far. i965 has some render issues with a composite manager, but I haven't tried Carl's latest work yet.
Comment 11 Timo Aaltonen 2008-01-09 15:40:31 UTC
The build problem was probably due to a merge with 2.2 branch that I did locally. Here's how it fails:

 gcc -DHAVE_CONFIG_H -I. -I.. -I../../src -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -I/usr/include/X11/dri -DI830_XV -DI830_USE_XAA -DI830_USE_EXA -Wall -g -O2 -MT i810_driver.lo -MD -MP -MF .deps/i810_driver.Tpo -c ../../src/i810_driver.c  -fPIC -DPIC -o .libs/i810_driver.o
In file included from ../../src/i810_driver.c:88:
../../src/i830.h:631: error: expected specifier-qualifier-list before 'ddx_bufmgr'
../../src/i830.h:648: error: expected specifier-qualifier-list before 'ddx_bo'
Comment 12 Timo Aaltonen 2008-01-09 15:46:38 UTC
The reason was probably that libdrm on my system is too old. Does the batchbuffer branch require TTM?
Comment 13 Wang Zhenyu 2008-01-09 17:14:56 UTC
yes.
Comment 14 Michael Fu 2008-01-15 18:25:24 UTC
Timo, 

are you able to build the batchbuffer branch now?
Comment 15 Timo Aaltonen 2008-01-17 03:05:51 UTC
No, it fails with:

 gcc -DHAVE_CONFIG_H -I. -I.. -I../../src -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -I/usr/include/X11/dri -DI830_XV -DI830_USE_XAA -DI830_USE_EXA -Wall -g -O2 -MT i830_exa.lo -MD -MP -MF .deps/i830_exa.Tpo -c ../../src/i830_exa.c  -fPIC -DPIC -o .libs/i830_exa.o
../../src/i830_exa.c: In function 'i830_pixmap_tiled':
../../src/i830_exa.c:118: warning: implicit declaration of function 'exaGetPixmapDriverPrivate'
../../src/i830_exa.c:118: warning: nested extern declaration of 'exaGetPixmapDriverPrivate'
../../src/i830_exa.c:118: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXADestroyPixmap':
../../src/i830_exa.c:410: warning: unused variable 'pI830'
../../src/i830_exa.c: In function 'I830EXAPixmapIsOffscreen':
../../src/i830_exa.c:424: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXAPrepareAccess':
../../src/i830_exa.c:441: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXAModifyPixmapHeader':
../../src/i830_exa.c:476: warning: assignment makes pointer from integer without a cast
../../src/i830_exa.c: In function 'I830EXAInit':
../../src/i830_exa.c:537: error: 'EXA_HANDLES_PIXMAPS' undeclared (first use in this function)
../../src/i830_exa.c:537: error: (Each undeclared identifier is reported only once
../../src/i830_exa.c:537: error: for each function it appears in.)

Comment 16 Michel Dänzer 2008-01-17 03:15:33 UTC
(In reply to comment #15)

It requires xserver master.
Comment 17 Timo Aaltonen 2008-01-18 00:52:23 UTC
When I compiled libdrm, mesa, xserver master and batchbuffer against them, loading intel_dri.so fails with unknown symbols (intelddx_bufmgr_ttm_init). So I'll just wait until things have settled down a bit.
Comment 18 Clemens Eisserer 2008-01-18 07:14:01 UTC
I also experience that slowness on my laptop, which has a 945GM. EXA feels a lot slower than XAA even without a composition manager.
Resizing windows, scrolling in Firefox, and other 2D operations have become noticeably slower, although even XAA spends a lot of time in software rendering loops.

Can I expect the intel-batchbuffer branch to be faster on the 945GM, or is this only targeted for 965+? 
Comment 19 Michel Dänzer 2008-01-18 07:26:45 UTC
(In reply to comment #18)
> EXA feels a lot slower than XAA even without an composition manager.

Note that EXA is designed for use with a compositing manager and generally works better than XAA with one.
Comment 20 Clemens Eisserer 2008-01-18 13:45:34 UTC
> Note that EXA is designed for use with a compositing manager and generally
> works better than XAA with one.

Yes, sure - but this should not imply that very common applications should run slower without a composition manager (furthermore, a composition manager adds overhead with EXA too - so if it's slow without one, it's even a bit slower with a composition manager running, right?).

Will the 945GM also profit from the batchbuffer enhancements, or are these targeted at 965 only? Are there any plans to speed up the GMA950 driver when used with EXA?

Thanks, lg Clemens
Comment 21 Michel Dänzer 2008-01-19 01:27:51 UTC
(In reply to comment #20)
> > Note that EXA is designed for use with a compositing manager and generally
> > works better than XAA with one.
> 
> Yes, sure - but this should not impose that very common applications should run
> slower without a composition manager 

There are tradeoffs, and XAA was extremely optimized for this.

> (furthermore a composition manager adds overhead also with EXA - so if its slow
> without one its even a bit slower with a composition manager running, right?).

Not necessarily, depends on the specific operations.

> Will the 945GM also profit from the batchbuffer enhancements, or are these
> targeted at 965 only?

I think it will be for i830 and newer.
Comment 22 Clemens Eisserer 2008-01-19 02:31:57 UTC
> There are tradeoffs, and XAA was extremely optimized for this.
Well, but even when running XAA I see tons of fallbacks to libfb for everything more complex than solid fills and lines. I hope this will get better with EXA.

> I think it will be for i830 and newer.
That's really great news :)

Thanks a lot, lg Clemens
Comment 23 Michel Dänzer 2008-01-19 04:09:09 UTC
(In reply to comment #22)
> > There are tradeoffs, and XAA was extremely optimized for this.
> Well, but even when running XAA I see tons of fallbacks to libfb for everything
> more complex than solid fills and lines.

XAA tends to take less of a hit for software fallbacks because it doesn't migrate pixmaps to video memory as aggressively as EXA does for acceleration.
Comment 24 Michael Fu 2008-01-19 17:49:52 UTC
(In reply to comment #17)
> When I compiled libdrm, mesa, xserver master and batchbuffer against them,
> loading intel_dri.so fails with unknown symbols (intelddx_bufmgr_ttm_init). So
> I'll just wait until things have settled down a bit.
> 

Zhenyu, any hint for Timo?
Comment 25 Michael Fu 2008-01-19 17:51:38 UTC
Clemens, are you able to test the intel-batchbuffer branch?

Comment 26 Johannes Engel 2008-01-23 00:00:17 UTC
Created attachment 13874 [details]
Screenshot of image distortion using batchbuffer branch

Using the batchbuffer-branch I do not see this "EXA-slowness" anymore. But the problem is: I see nearly nothing anymore. ;) See attached screenshot. This holds only using a composite window manager (compiz, kwin4). Using kwin3 even with xcompmgr everything is ok.
Comment 27 Oswald Buddenhagen 2008-01-26 02:53:32 UTC
*** Bug 13312 has been marked as a duplicate of this bug. ***
Comment 28 Johannes Engel 2008-03-11 07:26:09 UTC
Created attachment 15032 [details]
Expedite logs

I ran a few benchmarks using expedite (following Carl Worth). I got the following results on my intel 945GM.
All filenames follow the pattern
drm_Mesa_composite_inteldriver_AccelMethod.log
The suffix XAANOP stands for use of the option XAANoOffscreenPixmaps, bb stands for intel-batchbuffer branch from git (with or without use of DRI2).

2.3_7.0.3_compiz_2.2.1_exa.log
2.3_7.0.3_compiz_2.2.1_xaa.log
2.3_7.0.3_compizop_2.2.1_xaa.log
2.3_7.0.3_no_2.2.1_exa.log
2.3_7.0.3__no_2.2.1_xaa.log
2.3_7.0.3_noXAANOP_2.2.1_xaa.log
git_7.0.3_compiz_2.2.1_exa.log
git_7.0.3_no_2.2.1_exa.log
git_git_no_bbdri2_exa.log
git_git_no_bb_exa.log
git_git_no_git_exa.log
git_git_xcomp_git_exa.log
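
The naming pattern above can be split mechanically when comparing the logs. This is a minimal sketch, not part of the attachment: the names ExpediteLog and parse_log_name are hypothetical, and dropping empty fields assumes that the doubled underscore in one of the listed names is a typo rather than a meaningful empty field.

```python
from typing import NamedTuple


class ExpediteLog(NamedTuple):
    """Fields of drm_Mesa_composite_inteldriver_AccelMethod.log."""
    drm: str
    mesa: str
    composite: str
    intel_driver: str
    accel_method: str


def parse_log_name(filename):
    """Split a log file name into its five underscore-separated fields."""
    stem = filename[:-len(".log")] if filename.endswith(".log") else filename
    # Assumption: an empty field (from "__") is a typo, so it is skipped.
    fields = [field for field in stem.split("_") if field]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {filename!r}")
    return ExpediteLog(*fields)


log = parse_log_name("git_git_no_bbdri2_exa.log")
print(log.intel_driver, log.accel_method)
```

Grouping the parsed tuples by accel_method or composite then gives the pairs of runs that differ in only one setting.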
Comment 29 Ben Armstrong 2008-03-11 08:30:06 UTC
Until I fell back to XAA, rdesktop was painfully slow for me on my 865G.  Here is a simple test demonstrating the problem with EXA:

start rdesktop to connect to a Windows server on a LAN
open a Command Prompt
do a long directory listing

The scroll speed is very slow and could not be interrupted with ^C.  When I switch to XAA, the scroll speed is lightning fast.
Comment 30 Clemens Eisserer 2008-06-05 02:14:36 UTC
I wonder which driver is used by default in Fedora9?

Although EXA is painfully slow with Fedora 8 on my 945GM, it is actually quite acceptable with Fedora 9. It is still slower than XAA, but at least I can now scroll through Gmail conversations with Firefox 3 ;)
Text performance went up from 50.000 chars/s to 150.000 chars/s for -aa10text, which is almost the same as I get from XAA :)

Does this driver (intel-master) already contain the batchbuffer work?
Could these improvements be because of TTM, or because of the enhanced fallback implementation (no readback, just a GART change)?
Comment 31 Johannes Engel 2008-06-05 02:32:52 UTC
Using the drm-gem branch of current git with TTM-enabled Mesa I get values around 280.000 char/s using x11perf -aa10text. So it seems you do not even have the best one in Fedora 9. :)
Comment 32 Clemens Eisserer 2008-06-05 03:18:22 UTC
Which hardware do you have, 915-class or 965?
280.000 looks quite good, good enough not to be a bottleneck anymore ... but still far away from the 2.000.000 chars I get from my GeForce 6600 ;)

I wonder what has been included in Fedora9, and when all these improvements will be integrated in the official intel driver (which is almost useless for now when run with EXA enabled).


Thanks for sharing your numbers, Clemens
Comment 33 Johannes Engel 2008-06-05 05:52:08 UTC
I have got a 945GM. The value mentioned above was achieved using KDE4 without compositing. If I use compositing it reduces to about 104.000.
Comment 34 Johannes Engel 2008-06-05 06:51:52 UTC
For the sake of completeness I also tested the current master branch with TTM. Using KDE4 with compositing turned ON I get
195.000 chars/sec, nearly doubling the GEM value.
Comment 35 Wang Zhenyu 2008-06-18 00:46:55 UTC
*** Bug 15289 has been marked as a duplicate of this bug. ***
Comment 36 Michael Fu 2008-07-03 20:20:09 UTC
Reassigning to Carl.
Comment 37 Carl Worth 2008-07-22 10:45:29 UTC
Now that I "own" this bug, I want to share a few thoughts on where things stand:

1. The intel-batchbuffer branches are no longer interesting. Many things that were there have been merged into master, and the things that were not merged yet are now taking place in the drm-gem branches.

2. If people want to follow along at home, here are the necessary pieces:

kernel: git://people.freedesktop.org/~anholt/linux-2.6 (drm-gem branch)

mesa: git://git.freedesktop.org/git/mesa/mesa (drm-gem branch)

drm: git://git.freedesktop.org/git/mesa/drm (drm-gem branch)
Do "make; make install" at both the top-level and in the OS-specific
sub-directory, (such as linux-core, for example).

xserver: git://git.freedesktop.org/git/xorg/xserver (master branch)

xf86-video-intel: git://git.freedesktop.org/git/xorg/driver/xf86-video-intel

3. I'm working hard on 2D performance on Intel as my full-time job, working
for Intel now, (previously I had been doing this as a part of my job working
for Red Hat).

4. I'm beginning by focusing on 965 because that's where things are worst,
but work will also happen for 915/930 and perhaps earlier devices as well.

5. I'll be making regular, (weekly), blog posts describing improvements
made. My most recent post is here:

http://cworth.org/exa/i965/new_job_old_tricks/

And here's a feed that will always show my latest EXA-related posts:

http://cworth.org/tag/exa/

6. From my experience, text rendering is one of the hardest things for the
driver to do well on, (and also very common). So if we get this working
well, then everything else is also working well. So currently, my benchmark
is "x11perf -aa10text" and my goal is to get that faster with EXA than with
XAA. As mentioned in the blog post above, we recently achieved this for
965 on the master branch of the driver, (but not yet on the drm-gem branch).

If others have other benchmarks that accurately capture performance problems
they are experiencing, please let me know.

Thanks,

-Carl
carl.d.worth@intel.com
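
For anyone following along, the component list in item 2 above can be turned into clone commands. This is a minimal sketch: the URLs and branches are copied from that list, the names COMPONENTS and clone_command are hypothetical, and no branch is passed for xf86-video-intel because the list names none.

```python
# (component, repository URL, branch) as listed in item 2 of this comment.
COMPONENTS = [
    ("kernel", "git://people.freedesktop.org/~anholt/linux-2.6", "drm-gem"),
    ("mesa", "git://git.freedesktop.org/git/mesa/mesa", "drm-gem"),
    ("drm", "git://git.freedesktop.org/git/mesa/drm", "drm-gem"),
    ("xserver", "git://git.freedesktop.org/git/xorg/xserver", "master"),
    ("xf86-video-intel",
     "git://git.freedesktop.org/git/xorg/driver/xf86-video-intel", None),
]


def clone_command(url, branch=None):
    """Build the argument list for `git clone`, optionally with a branch."""
    command = ["git", "clone"]
    if branch:
        command += ["--branch", branch]
    command.append(url)
    return command


# Print the commands instead of running them, so they can be reviewed first.
for name, url, branch in COMPONENTS:
    print(" ".join(clone_command(url, branch)))
```

The commands are only printed here; the build and install steps for each component still have to be done by hand as described above.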
Comment 38 CIJOML CIJOMLovic CIJOMLov 2008-07-22 10:59:56 UTC
Hi there,

The easiest and most important test for me is scrolling in Firefox on a long text web page. This drives me crazy with EXA.
Comment 39 Johannes Engel 2008-07-23 03:59:26 UTC
(In reply to comment #37)
> xf86-video-intel: git://git.freedesktop.org/git/xorg/driver/xf86-video-intel

You might have forgotten to add the drm-gem branch here. :)
Comment 40 Carl Worth 2008-08-04 14:23:04 UTC
Here's a bug entry in the Novell bug tracker that may have some useful data:

https://bugzilla.novell.com/show_bug.cgi?id=411183

And here's another item, (xfce4 window manager doing translucent window movement):

http://lists.freedesktop.org/archives/xorg/2008-August/037746.html

As is hopefully obvious to anyone interested in this bug, "EXA is slower than XAA" is necessarily an overly broad description of an entire family of bugs. I'm happy to keep it here and open, but it will only be useful insofar as it's a pointer to more specific issues that can be dealt with individually.

So please feel free to open bug reports for specific issues and link to them by making this bug entry depend on those.

Thanks,

-Carl
Comment 41 Carl Worth 2008-08-07 17:25:31 UTC
As an update for EXA performance issues for xf86-video-intel:

We expect that many of the performance problems are caused by migration. Keith recently simplified EXA by removing the migration code, (the result is called UXA for now[*]). We're hoping that UXA will solve several performance problems. But if it doesn't it will at least *change* performance profiles.

So we're waiting on performance bugs until UXA lands and we can reexamine things.

But stay tuned! We definitely want to get xf86-video-intel performing well as soon as possible.

Thanks,

-Carl

[*] Let's not get hung up on the name---the actual acceleration functions are all the same as in EXA. This isn't a new acceleration architecture from scratch or anything.
Comment 42 Ben Gamari 2008-09-08 08:44:19 UTC
Now that Fedora Rawhide has support for GEM in, I figured this might be a good time to offer some of my performance experiences on 965. Unfortunately, things don't look good:

With EXA:
- Scrolling in Firefox is still a painful experience
- Certain animations in compiz still stutter, especially with a Firefox window mapped (e.g. switcher plugin)
- Although on the whole, compiz does "feel" a bit more responsive than it did a month ago
- x11perf -aa10text has apparently taken a turn for the worse:
  - With Compiz: ~20,000 glyphs/sec
  - With Metacity: ~26,000 glyphs/sec

With UXA:
- Haven't had a chance to reliably test

These numbers were on
- xf86-video-intel-2.4.2 (with the pixmap/text fix patch from bug #)
- mesa-7.1
- xserver-1.5.0
- drm-2.4.0
- kernel 2.6.27-rc5 (rawhide package 2.6.27-0.305.rc5.git6.fc10.x86_64)

So, as can be seen, things might have backslid a bit. Given the -aa10text numbers, I would say I must be hitting a bug or configuration oversight of some sort. Let me know if there are any obvious things that come to mind or if I should open a bug for this issue.
Comment 43 Gordon Jin 2008-09-08 19:24:54 UTC
(In reply to comment #42)
> These numbers were on
> - xf86-video-intel-2.4.2 (with the pixmap/text fix patch from bug #)

Could you try xf86-video-intel-2.5-branch to see if things get better?
Comment 44 Ben Gamari 2008-09-09 05:11:05 UTC
I just tested again on master and it seems to be a little better, producing 34,000 glyphs/sec on my i965 using EXA. That still seems low, though it is probably important to note that I'm using xserver-1.5 (i.e. no glyph cache).

(In reply to comment #43)
> (In reply to comment #42)
> > These numbers were on
> > - xf86-video-intel-2.4.2 (with the pixmap/text fix patch from bug #)
> 
> Could you try xf86-video-intel-2.5-branch to see if things get better?
> 

Comment 45 Ben Gamari 2008-09-16 18:12:19 UTC
After finally getting working 2.4.2 packages, xserver 1.5, and a new kernel, it seems that text performance is now magnitudes better. I'm now getting anywhere from 150k-175k glyphs/sec and things seem much more responsive.

Out of curiosity, are we approaching being hardware-bound at this point or is there still substantially more potential for improvement?
Comment 46 Carl Worth 2008-09-25 08:35:21 UTC
I'm going to mark this bug fixed, as I think it has served its usefulness.

Here are some datapoints to where things stand:

As seen above in comment #28, with latest stuff from git, EXA can be much faster than XAA. Also, I recently reported the following numbers at Linux Plumbers Conference:

From render_bench (time in seconds)
===================================

Operation       XAA       EXA    Speedup
━━━━━━━━━     ━━━━━━━   ━━━━━━   ━━━━━━━
Blend          2.61 s   0.28 s        9x
.5 scale       1.36 s   0.20 s        7x
2x scale      39.00 s   0.63 s       62x
General scale 78.51 s   1.06 s       74x

From x11perf -aa10text (glyphs/second)
======================================

Server option       Glyphs/second
━━━━━━━━━━━━━       ━━━━━━━━━━━━━
NoAccel true                 138k
AccelMethod XAA              113k
ExaNoComposite true           25k
AccelMethod EXA              174k

So we can see that EXA is as fast or faster than XAA in several situations.
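
The speedup column in the render_bench table above is the XAA time divided by the EXA time, rounded to a whole number. A minimal sketch that recomputes it from the figures in the two tables (the dictionary and function names are hypothetical):

```python
# (XAA seconds, EXA seconds) per operation, copied from the render_bench table.
RENDER_BENCH = {
    "Blend": (2.61, 0.28),
    ".5 scale": (1.36, 0.20),
    "2x scale": (39.00, 0.63),
    "General scale": (78.51, 1.06),
}

# Glyphs/second in thousands, copied from the x11perf -aa10text table.
AA10TEXT_KGLYPHS = {"XAA": 113, "EXA": 174}


def speedup(xaa_seconds, exa_seconds):
    """How many times faster EXA is than XAA for one operation."""
    return xaa_seconds / exa_seconds


for operation, (xaa, exa) in RENDER_BENCH.items():
    print(f"{operation:<14}{round(speedup(xaa, exa)):>4}x")

# For throughput, higher is better, so the ratio is EXA over XAA.
text_ratio = AA10TEXT_KGLYPHS["EXA"] / AA10TEXT_KGLYPHS["XAA"]
print(f"aa10text: EXA is {text_ratio:.2f}x XAA")
```

Note the direction of each ratio: render_bench reports elapsed time, where lower is better, while x11perf reports throughput, where higher is better.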

However, this isn't to say that everything is done. For example, my numbers here were all generated with a GEM kernel, (2.6.27-rc4 + GEM patches from ~anholt/linux-2.6), a git-master X server, and a git-master xf86-video-intel driver (basically the same as xf86-video-intel-2.5-branch right now), on a GM965.

It's certainly possible, even likely, that people are still having performance problems, and I still want to hear about them.

But let's please do that in separate bug reports so that we don't mix up various different issues in a single report. If you open a new bug, please specify the following:

Hardware (such as 945GM or GM965, etc.)
Kernel (such as 2.6.25)
X server version (such as 1.5)
xf86-video-intel version (such as 2.4)
And what exposes the bug.

For this last point, please provide details on whatever is necessary. For example, if the bug only shows up with compiz, say so, and give the version. If it's a mozilla issue, tell us that along with the mozilla version, (and who built it[*]). The best reports will describe the bug with a program, (such as x11perf, render_bench, or expedite), that puts a number on the problem. This makes it very clear for us to see if we have reproduced the problem and to debug it.

[*] I mention "who built it" for Mozilla, (meaning is it a mozilla.com package, a Fedora package, a custom build, etc.), because I recently learned about a potentially severe issue with mozilla.com builds of firefox that have a built-in cairo that Mozilla has configured with a workaround for ancient X servers that may be killing performance on new X servers. But let's talk more about that in the new, specific bug reports.

I do appreciate the help of all testers. And I'm excited for when everything with Intel graphics is blisteringly fast. Let's all get there together!

-Carl

