Bug 67157

Summary: [Piketon bisected] x11perf –aa10text performance reduced by ~37% enabling SNA
Product: xorg Reporter: meng <mengmeng.meng>
Component: Driver/intelAssignee: Chris Wilson <chris>
Status: VERIFIED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description meng 2013-07-22 08:46:28 UTC
System Environment:       
----------------------------------------------
Platform:  Ironlake
Libdrm:	   (master)libdrm-2.4.46-2-gfea5408098c3c3057958e85ea9d7146f0b08749e
Mesa:	   (master)cd90ebefd4f63d00eb65af6069fb078e454c43f1
Xf86_video_intel:(master)2.21.12-36-g2737aaab77cc6233826077b90d001fe45efc376c
Kernel:	(drm-intel-nightly)79e7b6f2cd7a75ff57207f763abf46cfe709e8de

Bug detailed description:
----------------------------------------------
x11perf –aa10text performance reduced by ~37% enabling SNA on Ironlake. 
It’s xf86_video_intel regression, bisecting show the first bad commit is that:

commit 16fb786bd0791986fcac8b11a7e182090c5b6249
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jun 23 09:00:17 2013 +0100
    sna/gen5: Force a write flush when changing blend modes

Performance(x11perf –aa10text):
------------------------------------------
               SNA        disable SNA
git-9eabe28    9650000    1520000
git-16fb786    15500000   1520000

Reproduce steps:
---------------------------------------------
1. xinit&
2. vblank_mode=0 ./x11perf_x86_64 -aa10text
Comment 1 Chris Wilson 2013-07-22 09:51:44 UTC
There isn't much I can do since that flush is required to workaround a hardware issue.

On my mobile systems (i3-330), the degradation is closer to 10%,
7.9Mglyphs/s to 7.3Mglyphs/s, which is not that worrying.

Can you please describe how you avoid the IPS penalty on Ironlake for benchmarking?
Comment 2 Chris Wilson 2013-07-22 10:11:05 UTC
I reworked the flush and was happy that I couldn't see the corruption return and that x11perf improved here.

commit 7b1a5024df96070bab70744ad7e7d78a41fb0f72
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 22 10:51:05 2013 +0100

    sna/gen5: Rework the flush after blend state changes
    
    QA complained that the full flush to memory was too much of a
    performance hit, so let's try a lighter weight non-stalling pipe flush
    without.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=67157
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Can you please remeasure and see what the current perf hit is?
Comment 3 meng 2013-07-23 02:52:18 UTC
(In reply to comment #2)

The above performance data is on Piketon. I test on Calpella(mobile), the degradation is about ~16%.
I retest the 7b1a5024df96070b and make a summary:
                     Piketon         Calpella
git-9eabe28:        15500000          7760000   
git-16fb786:      9650000(37%↓)      6470000(16.6%↓)
git-7b1a502:      13700000(11.6%↓)   6820000(12%↓)

Calpella:  Core(TM) i5 CPU M 520 @ 2.40GHz
Piketon :  Core(TM) i5 CPU  650  @ 3.20GHz 

hi Chris, what does "IPS" mean?
Comment 4 Chris Wilson 2013-07-23 05:37:42 UTC
IPS is "Intelligent Power Sharing". It adjusts the performance of the GPU by ~2x in order to split power between the CPU/GPU based on demand, and takes a good couple of minutes to respond. That is it takes several runs of a benchmark on Ironlake in close succession for the results to stablise.

Hmm, it's strange that after 7b1a502 the degradation on my i3-330 was <1% (~7.9 to ~7.8 MGlyphs/s). But a 10% perf hit for a flushing correctness fix here is not too bad - though I don't see why we would need to inject a flush between glyph runs, so I will take another look.
Comment 5 Chris Wilson 2013-07-23 08:33:01 UTC
This should eliminate more of the flushes:

commit 74dd10d252df2d826b6db5da7e53bd7c5d76dec0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 23 09:37:15 2013 +0100

    sna/gen5: The cached value of the pipelined pointers key requires 32-bits
    
    Storing only the low 16-bits of the key for the pipelined state meant
    that we forced an update with every new drawing op - with the side
    effect of flushing the render cache.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67157
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Can you please remeasure?
Comment 6 meng 2013-07-23 08:50:08 UTC
(In reply to comment #5)
With the commit, x11perf –aa10text on Piketon is 15200000. The problem is fixed.
Comment 7 Chris Wilson 2013-07-23 09:04:20 UTC
Thanks for testing. Hopefully, the render error in facebook.com (and friends) doesn't reappear...
Comment 8 meng 2013-07-26 02:26:06 UTC
Verified it.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.