Created attachment 31160: xorg.0.log

System Environment:
--------------------------
Arch: i386
Platform: 945GM, 945GME
Xf86_video_intel (master) 50e2a6734de43a135aa91cd6e6fb5147e15ce315

Bug detailed description:
-------------------------
x11perf/aa10text and x11perf/rgb10text have a regression caused by a commit in Xf86_video_intel:

commit e581ceb7381e29ecc1a172597d258824f6a1d2d3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 10 11:14:23 2009 +0000

    i915: Use the color channels to pass along solid sources and masks.

    Instead of allocating and utilising the texture samplers for 1x1R solid sources and masks we can simply use the default diffuse and specular colour channels and adjust the fragment shader appropriately. The big advantage is the reduction in size of batches which should give a good boost to glyph performance, irrespective of the additional boost from using simpler shaders. However, the motivating factor behind the switch is that our use of 1x1 textures turns out to be buggy...

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reproduce steps:
----------------
1. xinit &
2. x11perf -aa10text
Interesting: I was expecting a boost, since the shader is simpler and we're transferring fewer bytes. Can you report the size of the regression?
Since centre-point sampling in combination with 1x1R textures is buggy, I can't simply revert this change (centre-point sampling is required to prevent off-by-one rendering errors, i.e. the occasional black rectangle around images). On my i945 I get 378k/s before this commit and 370k/s after it. Alternatives I have tried so far:
- using per-vertex colors: 361k/s
- using shader constants instead of defaults: 359k/s
On 945GM (ia32), rgb10text drops 10% from 602000.0 to 536000.0, and aa10text drops 11% from 812000.0 to 716000.0. On 945GME (ia32), rgb10text drops 19% from 330000.0 to 266000.0, but aa10text drops only slightly, from 347000.0 to 338000.0.
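To make the percentages easy to recheck, here is a minimal Python sketch that computes the relative throughput loss from a pair of x11perf rates. The helper name `percent_drop` is hypothetical; the rates are the ones reported for 945GM and 945GME.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative throughput loss, in percent, from before to after."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before * 100.0


# Rates (operations per second) from the 945GM/945GME report.
results = {
    "945GM rgb10text": (602000.0, 536000.0),
    "945GM aa10text": (812000.0, 716000.0),
    "945GME rgb10text": (330000.0, 266000.0),
    "945GME aa10text": (347000.0, 338000.0),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_drop(before, after):.1f}% slower")
```

Computed this way the drops come out at roughly 11.0%, 11.8%, 19.4% and 2.6%, so the 10% and 11% figures for 945GM are truncated rather than rounded.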
Dropping priority: we seem to be hitting a GPU bottleneck on a path that I believe is required for correct rendering elsewhere (with a similar setup).
OK, I think I've found the cause of the regression here.

i915:
Baseline:
  3600000 trep @ 0.0071 msec (142000.0/sec)
  3200000 trep @ 0.0082 msec (122000.0/sec)
Adjusting libXft to use SolidFills:
  4000000 trep @ 0.0066 msec (150000.0/sec)
  3600000 trep @ 0.0076 msec (132000.0/sec)
Improving the driver to avoid reading back (from system memory) a pixel to determine the color for SolidFills:
  4000000 trep @ 0.0067 msec (149000.0/sec)
  3600000 trep @ 0.0068 msec (147000.0/sec)
And, for good measure, reverting the change to libXft, i.e. going back to using a solid pixmap (this should be close to the original code and performance):
  8000000 trep @ 0.0062 msec (162000.0/sec)
  4000000 trep @ 0.0069 msec (145000.0/sec)

PineView:
Baseline:
  8000000 trep @ 0.0040 msec (248000.0/sec)
  8000000 trep @ 0.0043 msec (232000.0/sec)
Updating libXft:
  8000000 trep @ 0.0040 msec (249000.0/sec)
  8000000 trep @ 0.0043 msec (231000.0/sec)
Improved driver:
  8000000 trep @ 0.0038 msec (262000.0/sec)
  8000000 trep @ 0.0040 msec (251000.0/sec)
And reverting the change to libXft:
  8000000 trep @ 0.0039 msec (259000.0/sec)
  8000000 trep @ 0.0041 msec (244000.0/sec)

So the cause appears to be the readback of the single pixel. The remaining question is whether to use a pixmap or a diffuse color.
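Comparing runs like these by hand is error-prone, so below is a small Python sketch that pulls the per-second rate out of x11perf result lines and reports pairwise speedups. It assumes the line format shown above (`N trep @ T msec (R/sec)`, optionally followed by a `: description` suffix); the function names are hypothetical.

```python
import re

# Matches x11perf result lines such as "3600000 trep @ 0.0071 msec (142000.0/sec)".
# Any ": description" suffix after the rate is simply not captured.
X11PERF_LINE = re.compile(r"(\d+) trep @\s*([\d.]+) msec \(([\d.]+)/sec\)")


def rates(text: str) -> list:
    """Extract every per-second rate, in order, from a block of x11perf output."""
    return [float(m.group(3)) for m in X11PERF_LINE.finditer(text)]


def speedup(baseline: str, candidate: str) -> list:
    """Return candidate/baseline rate ratios for two runs of the same tests."""
    base, cand = rates(baseline), rates(candidate)
    if len(base) != len(cand):
        raise ValueError("runs contain a different number of results")
    return [c / b for b, c in zip(base, cand)]


# i915 numbers from the comment: baseline vs. the improved driver.
baseline = """3600000 trep @ 0.0071 msec (142000.0/sec)
3200000 trep @ 0.0082 msec (122000.0/sec)"""
improved = """4000000 trep @ 0.0067 msec (149000.0/sec)
3600000 trep @ 0.0068 msec (147000.0/sec)"""

for ratio in speedup(baseline, improved):
    print(f"{ratio:.3f}x")
```

Comparing by the reported rate, rather than by the rounded msec column, avoids losing precision: 0.0071 msec alone would suggest about 140800/sec, not the 142000/sec x11perf prints.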
(The 3x performance hit is still a mystery, but the improvement is consistent and the code should be pretty close to the original, so...)

commit 21c1c3c7f6eb2b5070d2153b15a8fb1afe938bbb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 10 10:19:28 2010 +0100

    i915: Use 1x1R pixmap for solid drawables

    x11perf has a regression
    https://bugs.freedesktop.org/show_bug.cgi?id=25068
    caused by commit e581ceb7381e29ecc1a172597d258824f6a1d2d3
    i915: Use the color channels to pass along solid sources and masks.

    Do not convert 1x1R pixmaps into a solid color as the readback from the bo negates all the performance advantages of using a smaller vertex buffer and fewer samplers.

    Before (PineView): aa=66800 glyphs/s, rgb=28800 glyphs/s
    Now:               aa=96800 glyphs/s, rgb=48500 glyphs/s

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
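The decision in this commit can be summarized as: a SolidFill source already carries its color on the CPU, so it can go through the diffuse channel for free, while a 1x1R pixmap's color lives in a bo and would have to be read back, so it stays on the texture path. The Python sketch below models only that choice. The `Source` type, its fields and the returned labels are all hypothetical and do not correspond to real xf86-video-intel structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Source:
    """Hypothetical stand-in for a Render source picture."""
    kind: str                    # "solid_fill" or "pixmap"
    width: int = 0
    height: int = 0
    repeat: bool = False
    color: Optional[int] = None  # ARGB; known on the CPU only for solid fills


def choose_path(src: Source) -> Tuple[str, Optional[int]]:
    """Pick how a source reaches the fragment shader (illustrative only)."""
    if src.kind == "solid_fill":
        # Color is already in CPU memory: pass it via the diffuse channel,
        # with no sampler and no readback.
        return ("diffuse", src.color)
    if src.width == 1 and src.height == 1 and src.repeat:
        # 1x1R pixmap: reading its pixel back from the bo costs more than the
        # smaller vertex buffer saves, so sample it as a repeating texture.
        return ("solid_pixmap", None)
    # Any other pixmap is an ordinary texture source.
    return ("texture", None)
```

The point of the split is that the diffuse-color shortcut only pays off when obtaining the color is free; once it requires touching GPU memory, the extra sampler is the cheaper option.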
Rechecked on my i945GME:
Before:
  12000000 trep @ 0.0025 msec (404000.0/sec): Char in 80-char aa line (Charter 10)
  12000000 trep @ 0.0026 msec (380000.0/sec): Char in 80-char rgb line (Charter 10)
After:
  12000000 trep @ 0.0024 msec (417000.0/sec): Char in 80-char aa line (Charter 10)
  12000000 trep @ 0.0025 msec (399000.0/sec): Char in 80-char rgb line (Charter 10)
which seems consistent with the original regression.
x11perf has improved a lot on PineView (i915) recently, so this is verified.