Bug 25068 - [i915 bisected] x11perf has a regression
Summary: [i915 bisected] x11perf has a regression
Status: VERIFIED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: All Linux (All)
Importance: low normal
Assignee: Chris Wilson
QA Contact: Xorg Project Team
Reported: 2009-11-12 22:33 UTC by zhao jian
Modified: 2010-06-10 19:53 UTC

Attachments
xorg.0.log (22.63 KB, text/plain)
2009-11-12 22:33 UTC, zhao jian

Description zhao jian 2009-11-12 22:33:40 UTC
Created attachment 31160
xorg.0.log

System Environment:
--------------------------
Arch    i386
Platform        945GM 945GME
Xf86_video_intel        (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315

Bug detailed description:
-------------------------
x11perf/aa10text and x11perf/rgb10text have regressed. Bisection points to the following commit in Xf86_video_intel:
commit e581ceb7381e29ecc1a172597d258824f6a1d2d3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 10 11:14:23 2009 +0000

	i915: Use the color channels to pass along solid sources and masks.

	Instead of allocating and utilising the texture samplers for 1x1R
	solid sources and masks we can simply use the default diffuse and
	specular colour channels and adjust the fragment shader appropriately.
	The big advantage is the reduction in size of batches which should give
	a good boost to glyph performance, irrespective of the additional boost
	from using simpler shaders.

	However, the motivating factor behind the switch is that our use of 1x1
	textures turns out to be buggy...

	Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Reproduce steps:
----------------
1. xinit &
2. x11perf -aa10text
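
To exercise both affected tests in one run, the steps above could be scripted. This is only a sketch: the helper names `x11perf_command` and `run_x11perf`, the display `:0`, and the repeat count of 3 are assumptions for illustration and do not come from the original report. It relies on the standard x11perf options `-display`, `-repeat`, `-aa10text` and `-rgb10text`.

```python
import shutil
import subprocess


def x11perf_command(tests=("aa10text", "rgb10text"), display=":0", repeat=3):
    """Build the x11perf argument list for the given tests (hypothetical helper)."""
    cmd = ["x11perf", "-display", display, "-repeat", str(repeat)]
    # Each x11perf test is selected by passing its name as an option.
    cmd += [f"-{name}" for name in tests]
    return cmd


def run_x11perf(**kwargs):
    """Run x11perf and return its standard output; needs a running X server."""
    if shutil.which("x11perf") is None:
        raise RuntimeError("x11perf is not installed")
    result = subprocess.run(
        x11perf_command(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call run_x11perf() once X is up (step 1).
    print(" ".join(x11perf_command()))
```

Running the same command before and after the suspect commit gives directly comparable numbers for the aa and rgb text paths.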
Comment 1 Chris Wilson 2009-11-13 06:30:06 UTC
Interesting, I was expecting a boost since the shader is simpler and we're transferring fewer bytes.

Can you report the scale of the regression?
Comment 2 Chris Wilson 2009-11-14 01:50:05 UTC
Since centre-point sampling in combination with 1x1R textures is buggy, I can't simply revert this change (and centre-point sampling is required to prevent off-by-one rendering errors, i.e. the occasional black rectangle around images).

On my i945, prior to this commit I get 378k/s, and afterwards 370k/s.

Alternatives that I have tried so far:
  using per-vertex colors: 361k/s
  using shader constants instead of defaults: 359k/s

Comment 3 zhao jian 2009-11-15 19:07:52 UTC
On 945GM ia32, rgb10text drops 10% from 602000.0 to 536000.0, and aa10text drops 11% from 812000.0 to 716000.0.
On 945GME ia32, rgb10text drops 19% from 330000.0 to 266000.0, but aa10text drops only slightly, from 347000.0 to 338000.0.
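
For reference, the scale of the regression can be recomputed from the throughput figures quoted above. This is a minimal sketch; the function name `percent_drop` and the dictionary layout are assumptions made for the example.

```python
def percent_drop(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Throughput figures (per second) as quoted in the comment above.
results = {
    ("945GM", "rgb10text"): (602000.0, 536000.0),
    ("945GM", "aa10text"): (812000.0, 716000.0),
    ("945GME", "rgb10text"): (330000.0, 266000.0),
    ("945GME", "aa10text"): (347000.0, 338000.0),
}

for (platform, test), (before, after) in results.items():
    print(f"{platform:7s} {test:10s} {percent_drop(before, after):5.1f}%")
```

The percentages in the comment appear to be truncated to whole numbers, so the one-decimal values printed here come out slightly higher in some cases. The aa10text drop on 945GME works out to about 2.6%.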
Comment 4 Chris Wilson 2009-12-09 02:13:09 UTC
Dropping priority, as we seem to be hitting a GPU bottleneck on a path that I believe is required for correct rendering elsewhere (with a similar setup).
Comment 5 Chris Wilson 2010-03-26 07:04:15 UTC
Ok, I think I've found the cause of the damage here.

i915:
Baseline:
3600000 trep @   0.0071 msec (142000.0/sec)
3200000 trep @   0.0082 msec (122000.0/sec)

Adjusting libXft to use SolidFills:
4000000 trep @   0.0066 msec (150000.0/sec)
3600000 trep @   0.0076 msec (132000.0/sec)

Improving the driver to avoid reading back (from system memory) a pixel to determine color for SolidFills:
4000000 trep @   0.0067 msec (149000.0/sec)
3600000 trep @   0.0068 msec (147000.0/sec)

And for good measure, reverting the change to libXft, i.e. going back to using a solid pixmap (this should be close to the original code and performance):
8000000 trep @   0.0062 msec (162000.0/sec)
4000000 trep @   0.0069 msec (145000.0/sec)

PineView:
Baseline:
8000000 trep @   0.0040 msec (248000.0/sec)
8000000 trep @   0.0043 msec (232000.0/sec)
Updating libXft:
8000000 trep @   0.0040 msec (249000.0/sec)
8000000 trep @   0.0043 msec (231000.0/sec)
Improved driver:
8000000 trep @   0.0038 msec (262000.0/sec)
8000000 trep @   0.0040 msec (251000.0/sec)
And reverting the change to libXft...:
8000000 trep @   0.0039 msec (259000.0/sec)
8000000 trep @   0.0041 msec (244000.0/sec)


So, the cause would appear to be the readback of the single pixel. The remaining question is whether to use a pixmap or a diffuse color.
Comment 6 Chris Wilson 2010-05-10 02:51:10 UTC
(There is still a mysterious 3x performance hit, but the improvement is consistent and the code should be pretty close to the original, so...)

commit 21c1c3c7f6eb2b5070d2153b15a8fb1afe938bbb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 10 10:19:28 2010 +0100

    i915: Use 1x1R pixmap for solid drawables
    
      x11perf has a regression
      https://bugs.freedesktop.org/show_bug.cgi?id=25068
    
    caused by
    
      commit e581ceb7381e29ecc1a172597d258824f6a1d2d3
      i915: Use the color channels to pass along solid sources and masks.
    
    Do not convert 1x1R pixmaps into a solid color as the readback from the
    bo negates all the performance advantages of using a smaller vertex
    buffer and fewer samplers.
    
    Before (PineView):
      aa=66800 glyphs/s, rgb=28800 glyphs/s
    
    Now:
      aa=96800 glyphs/s, rgb=48500 glyphs/s
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 7 Chris Wilson 2010-05-10 03:46:13 UTC
Rechecked on my i945GME:

Before:
12000000 trep @   0.0025 msec (404000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @   0.0026 msec (380000.0/sec): Char in 80-char rgb line (Charter 10)

After:
12000000 trep @   0.0024 msec (417000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @   0.0025 msec (399000.0/sec): Char in 80-char rgb line (Charter 10)

which seems consistent with the original regression.
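
The before/after comparison can also be done mechanically by parsing the x11perf summary lines. The sketch below assumes the `trep` line format shown above; the names `LINE_RE`, `parse_x11perf` and `relative_gain` are hypothetical and are not part of x11perf or the driver.

```python
import re

# Matches summary lines such as:
# "12000000 trep @   0.0025 msec (404000.0/sec): Char in 80-char aa line (Charter 10)"
LINE_RE = re.compile(
    r"^\s*(?P<reps>\d+)\s+trep\s+@\s+(?P<msec>[\d.]+)\s+msec\s+"
    r"\(\s*(?P<rate>[\d.]+)/sec\)(?::\s*(?P<label>.*))?$"
)


def parse_x11perf(text):
    """Return {test label: rate per second} for every 'trep' line in text."""
    rates = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rates[match.group("label") or ""] = float(match.group("rate"))
    return rates


def relative_gain(before, after):
    """Percentage change per test label present in both result sets."""
    return {
        label: (after[label] - before[label]) / before[label] * 100.0
        for label in before
        if label in after
    }


BEFORE = """\
12000000 trep @   0.0025 msec (404000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @   0.0026 msec (380000.0/sec): Char in 80-char rgb line (Charter 10)
"""
AFTER = """\
12000000 trep @   0.0024 msec (417000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @   0.0025 msec (399000.0/sec): Char in 80-char rgb line (Charter 10)
"""

gains = relative_gain(parse_x11perf(BEFORE), parse_x11perf(AFTER))
for label, gain in gains.items():
    print(f"{gain:+5.1f}%  {label}")
```

For the i945GME figures above, this gives a gain of about 3.2% for the aa test and 5.0% for the rgb test.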
Comment 8 zhao jian 2010-06-10 19:53:32 UTC
x11perf has improved a lot on PineView (i915) recently, so marking this as verified.

