Bug 87821

Summary: [BYT Bisected] OglBatch1 performance regression ~20%
Product: xorg Reporter: zhipeng.Zheng <zhipengx.zheng>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED NOTABUG QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: christophe.prigent, eero.t.tamminen
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xor.0.log attached none

Description zhipeng.Zheng 2014-12-29 07:45:05 UTC
Created attachment 111457 [details]
Xor.0.log attached

Environment:
-----------------------------------
Platform:BYT
Libdrm:		(master)libdrm-2.4.58-19-gf99522e678dbbaffeca9462a8edcbe900574dc12
Mesa:		(master)aa6415b4852557ed91b4f31065a79b2a6c987c53
Xserver:		(master)xorg-server-1.16.99.901-102-g826e7c2b36f192fbbe7ddff37eb559f4d6301146
Xf86_video_intel:		(master)2.99.916-188-g01ce7efe73538047abd38bbbb95fc4012ebeb9b4
Cairo:		(master)8e11a42e3e9b679dce97ac45cd8b47322536a253
Libva:		(master)8986ec692b19d8dd6bd2aa118b5dffbd05a8f909
Libva_intel_driver:		(master)b5d6d9d425a6d539b27d22992bda05f79d1a0622
Kernel:   (drm-intel-nightly)4fa23142a15526f4a4b5df61f26eacdd558a849a

Bug detailed description:
---------------------------------------------
OglBatch1 performance regresses by ~20%.

It's an Xf86_video_intel regression; bisecting is in progress.
The bad commit found so far is: baec802b21387d04aebb10ac29e719a1800c5aa0

The good commit is: 692c14d405bb352697b67f36a034d4963e272b66


Reproduce steps:
---------------------------------------------
1.   xinit&
2.   ./synmark2 OglBatch1
Comment 1 zhipeng.Zheng 2014-12-29 08:37:18 UTC
the first bad commit is:

b6eeb7a1f7efa591504070b606be655e27e6e9c2

Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 5 13:03:41 2014 +0000

    Disable DRI3 by default

    The external libraries, both in git, and especially shipping already
    enabled in distributions, are buggy and lead to server crashes and
    lockups. Caveat emptor.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 2 Eero Tamminen 2014-12-29 09:09:48 UTC
DRI2 has more overhead (a copy per frame update) than DRI3, so a performance regression from disabling DRI3 is expected, especially in memory-bandwidth-bound, high-FPS cases.
Comment 3 Chris Wilson 2014-12-29 10:26:44 UTC
Only byt? On ivb I see:

DRI2 - FPS: 361.101 (MIN: 360.858, MAX: 394.804, STDEV [%]: 1.30636, S: 36)

DRI3 - FPS: 348.131 (MIN: 328.46, MAX: 370.083, STDEV [%]: 3.63791, S: 33)
Comment 4 Eero Tamminen 2014-12-29 12:15:56 UTC
Tests are run in FullHD.  BYT has (a lot) less memory bandwidth available than Core machines, relative to the memory bandwidth taken by FullHD frames.  On IVB/HSW, the Batch0 test isn't memory bandwidth bound (or at least not as much as on BYT).
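
To put rough numbers on the bandwidth argument, here is a minimal sketch of the arithmetic. It assumes 32-bit (4-byte) pixels and that a per-frame copy reads and writes every pixel once; both are illustrative assumptions, as are the round 100 FPS rate and the helper name. It does not model caching, tiling, or the rendering traffic itself.

```python
def copy_traffic_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Bytes per second moved by one full-frame copy per frame.

    Hypothetical helper: counts one read plus one write of every pixel.
    """
    frame_bytes = width * height * bytes_per_pixel
    return 2 * frame_bytes * fps


# FullHD frame with an assumed 4 bytes per pixel.
frame_bytes = 1920 * 1080 * 4
print(frame_bytes)  # 8294400

# Extra traffic from one copy per frame at an illustrative 100 FPS.
print(copy_traffic_bytes_per_s(1920, 1080, 4, 100))  # 1658880000
```

Under these assumptions the copy alone costs about 1.66 GB/s at 100 FPS, and the cost scales linearly with frame rate, which is why it matters more on a platform with less memory bandwidth.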
Comment 5 zhipeng.Zheng 2014-12-30 05:31:43 UTC
Tested with the following cases:


SynMark2_v6.0.0_OglBatch2
SynMark2_v6.0.0_OglBatch3
SynMark2_v6.0.0_OglBatch4
SynMark2_v6.0.0_OglGeomPoint
SynMark2_v6.0.0_OglGeomTriList
SynMark2_v6.0.0_OglGeomTriStrip
Comment 6 Chris Wilson 2014-12-30 18:45:43 UTC
On byt N2820, 1080p:

dri2: FPS: 99.9493 (MIN: 89.2223, MAX: 123.483, CV: 7.00455%, S: 10)
dri3: FPS: 58.1767 (MIN: 42.9242, MAX: 66.0809, CV: 8.41218%, S: 10)

I am not sure if dri3 is being locked to screen refresh or not (max of 66fps suggests otherwise, but it is suspiciously close to refresh).

Tweaking DRI2 to use async flipping [http://cgit.freedesktop.org/~ickle/xserver/commit/?h=async-20141230&id=13372bcf995599b4fe1f427923635d02d2f4c71f]:

FPS: 138.5 (MIN: 67.0551, MAX: 185.931, CV: 19.661%, S: 10)

which does support that it is the swap elision that may be the difference, but fwiw I cannot reproduce the reported DRI3 improvements.
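
For comparing result lines like the ones above, a small sketch that pulls the average FPS out of a result line and computes the relative change may help. The regular expression and the function names are my own assumptions about the line format as pasted in this report, not part of SynMark itself.

```python
import re

# Matches "<avg> (MIN: <min>, MAX: <max>" as pasted in this report,
# with or without a leading "FPS:" label.
RESULT_RE = re.compile(r"([\d.]+)\s*\(MIN:\s*([\d.]+),\s*MAX:\s*([\d.]+)")


def parse_fps(line):
    """Return the average FPS from one pasted result line."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError("no FPS result found in: %r" % line)
    return float(match.group(1))


def relative_change(baseline, other):
    """Change of `other` relative to `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


dri2 = parse_fps("dri2: FPS: 99.9493 (MIN: 89.2223, MAX: 123.483, CV: 7.00455%, S: 10)")
dri3 = parse_fps("dri3: FPS: 58.1767 (MIN: 42.9242, MAX: 66.0809, CV: 8.41218%, S: 10)")
async_dri2 = parse_fps("FPS: 138.5 (MIN: 67.0551, MAX: 185.931, CV: 19.661%, S: 10)")

print(round(relative_change(dri2, dri3), 1))        # DRI3 vs. DRI2
print(round(relative_change(dri2, async_dri2), 1))  # async DRI2 vs. DRI2
```

With the byt numbers above, DRI3 comes out roughly 42% below DRI2, and DRI2 with async flipping roughly 39% above plain DRI2.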
Comment 7 Eero Tamminen 2014-12-31 13:26:27 UTC
(In reply to Chris Wilson from comment #6)
> On byt N2820, 1080p:
> 
> dri2: FPS: 99.9493 (MIN: 89.2223, MAX: 123.483, CV: 7.00455%, S: 10)
> dri3: FPS: 58.1767 (MIN: 42.9242, MAX: 66.0809, CV: 8.41218%, S: 10)
> 
> I am not sure if dri3 is being locked to screen refresh or not (max of 66fps
> suggests otherwise, but it is suspiciously close to refresh).

Based on the debugging in bug 79715, swaps may be locked to refresh only part of the time.  Maybe it can also get locked to a multiple of 60 FPS.


> Tweaking DRI2 to use async flipping
> [http://cgit.freedesktop.org/~ickle/xserver/commit/?h=async-20141230&id=13372bcf995599b4fe1f427923635d02d2f4c71f]:
> 
> FPS: 138.5 (MIN: 67.0551, MAX: 185.931, CV: 19.661%, S: 10)
> 
> which does support that it is the swap elision that may be the difference,
> but fwiw I cannot reproduce the reported DRI3 improvements.

Improvements are visible only in memory-bound tests, and only when they don't get wrongly locked to refresh. Based on bug 79715, which tests DRI3 locking affects, and when, is pretty random. :-)
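
As a rough way to flag results that sit "suspiciously close to refresh", one could check whether an average FPS falls near the refresh rate or a small multiple or fraction of it. This is only a heuristic sketch: the 60 Hz refresh rate, the 5% tolerance, and the function names are all assumptions, and it cannot detect locking that happens only part of the time.

```python
def nearest_refresh_rate(fps, refresh_hz=60.0, max_factor=4):
    """Closest of refresh*k and refresh/k (k up to max_factor) to `fps`."""
    candidates = [refresh_hz * k for k in range(1, max_factor + 1)]
    candidates += [refresh_hz / k for k in range(2, max_factor + 1)]
    return min(candidates, key=lambda rate: abs(fps - rate))


def looks_refresh_locked(fps, refresh_hz=60.0, tolerance=0.05):
    """True if `fps` is within `tolerance` (relative) of a candidate rate."""
    nearest = nearest_refresh_rate(fps, refresh_hz)
    return abs(fps - nearest) / nearest <= tolerance


# Averages quoted from comment 6 (byt N2820, 1080p).
print(looks_refresh_locked(58.1767))  # dri3
print(looks_refresh_locked(99.9493))  # dri2
```

With these assumptions only the dri3 average is flagged, being about 3% below 60; a positive result is a hint to check swap behaviour, not proof of locking.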
Comment 8 Chris Wilson 2015-01-21 09:56:00 UTC
To recap, on my byt with OglBatch1, latest mesa and the async patches:

dri3: 147.969 (MIN: 123.535, MAX: 164.291, CV: 5.35204%, S: 14)
dri2: 153.901 (MIN: 153.454, MAX: 165.578, CV: 3.32846%, S: 14)
Comment 9 Chris Wilson 2016-10-24 13:50:49 UTC
This is now invalid since the boost came from DRI3 swap elision which is no longer used.
