Bug 33394

Summary: [SNB] performance regression: screen stuttered when running the demo of 3D games with compiz enabled without GPU semaphores
Product: DRI
Reporter: meng <mengmeng.meng>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact:
Severity: major
Priority: high
CC: jbarnes, xunx.fang
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  The resulting i915_error_state (flags: none)

Description meng 2011-01-23 22:43:59 UTC
System Environment:
--------------------------
Platform:          Huronriver
Libdrm:            (master)2.4.23-6-g550fe2ca3b29ad2191eab4fdfbed9ed21e25492d
Mesa:              (master)e8c7d7598fb48237508f566204c71ba8f74d544f
Xserver:           (master)xorg-server-1.9.99.901-118-gc6aa4755ec355101a62bef86dbb090262fe806f6
Xf86_video_intel:  (master)2.14.0-10-g4c4ad555564a80311df1a4b762eb1e119c6d95fb
Cairo:             (master)72a9d49a530456e7002675235333885c70580abb
Kernel:            (drm-intel-next)fe4402931e43e81a4129eba41d05cf8907603af5

Bug detailed description:
-------------------------
The screen stutters when running the demos (urbanterror, openarena) on Huronriver. It stops and starts every few seconds, but only on the GNOME desktop with compiz enabled. It's a kernel regression that shows up specifically with compiz; in GNOME without compiz, it works fine.


Reproduce steps:
----------------
1. gnome-session
2. enable compiz
3. run demo
Comment 1 meng 2011-01-23 23:37:58 UTC
There is some information in bug 32752.
Comment 2 Chris Wilson 2011-01-25 12:15:41 UTC
Is it an FBC issue (going via compiz might trigger a blit?)? Does i915.powersave=0 make any difference?

On the other hand it might be a pageflipping problem...

Or something entirely different.
Comment 3 meng 2011-01-26 01:02:40 UTC
With "i915.powersave=0" added to the kernel command line in grub.conf, the screen still stutters on (drm-intel-next)5d6135012e9a7aa8a9128145ed9315eb916feea2.
Comment 4 Chris Wilson 2011-01-26 10:29:10 UTC
I've definitely seen this on my desktop, but it doesn't seem to be 100% reproducible yet...
Comment 5 Chris Wilson 2011-01-26 13:01:41 UTC
And I've now spent a few hours trying to reproduce that earlier failure.
Comment 6 Chris Wilson 2011-01-27 02:01:12 UTC
*sigh* False alarm. I had a kernel without "reverse-engineer safe snb wm0 values" without which I get random hangs on that machine.

Has anybody followed any additional info on the value that needs to be programmed into wm0?
Comment 7 Chris Wilson 2011-01-27 11:27:17 UTC
When it stutters and you have some i915_hangcheck_elapsed errors:

$ echo 1 | sudo tee /sys/kernel/debug/dri/0/i915_wedged

and attach the resulting /sys/kernel/debug/dri/0/i915_error_state.
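The two steps above can be scripted so the error state is saved straight to a file for attaching. This is a minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug, the GPU is DRM minor 0 (as in the command above) and the script runs as root; the function name `capture_error_state` and the output filename are illustrative, not part of any i915 tooling.

```python
from pathlib import Path

# Directory from the comment above. Assumes debugfs is mounted at
# /sys/kernel/debug and the Intel GPU is DRM minor 0; needs root.
DEBUGFS_DRI = Path("/sys/kernel/debug/dri/0")


def capture_error_state(dri_dir: Path, out_file: Path) -> int:
    """Mark the GPU as wedged, then save i915_error_state to out_file.

    Mirrors `echo 1 | sudo tee <dri_dir>/i915_wedged` followed by copying
    <dri_dir>/i915_error_state. Returns the number of bytes saved.
    """
    # "1\n" is exactly what `echo 1` writes into the file.
    (dri_dir / "i915_wedged").write_text("1\n")
    # Read the dump as raw bytes so it is copied unchanged.
    state = (dri_dir / "i915_error_state").read_bytes()
    out_file.write_bytes(state)
    return len(state)
```

On the test machine this would be called as `capture_error_state(DEBUGFS_DRI, Path("i915_error_state.txt"))` right after the stutter and the i915_hangcheck_elapsed messages appear. Taking the directory as a parameter keeps the sketch testable against a scratch directory.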
Comment 8 meng 2011-02-08 23:43:11 UTC
Created attachment 43150
The resulting i915_error_state
Comment 9 Gordon Jin 2011-02-28 23:19:55 UTC
Chris, any idea? We could try bisect if no better method.
Comment 10 Chris Wilson 2011-03-02 02:11:43 UTC
I've had a chance now to try and reproduce this on a HuronRiver rev09 to no avail. Is this still reproducible on your test machines?
Comment 11 zhao jian 2011-03-02 03:26:46 UTC
(In reply to comment #10)
> I've had a chance now to try and reproduce this on a HuronRiver rev09 to no
> avail. Is this still reproducible on your test machines?

Yes. It still occurs on our Huronriver rev09, but only with compiz enabled. Without compiz, there is no such problem.
Comment 12 Chris Wilson 2011-03-02 03:49:40 UTC
Ok, a couple of approaches: bisection and tracing.

Grab trace-cmd from git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git and see if you can capture the stutter whilst running trace-cmd record -e i915 (you will need to have enabled tracers/ftrace under kernel debugging) and upload the trace.dat. (It will be too big for bugzilla so just place it on the ftp and give me a link.)
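Once trace.dat exists, `trace-cmd report` turns it into text, and a stutter should show up as an unusually long gap between consecutive per-frame events. A rough Python sketch of that check follows. It assumes the report line layout shown in the comment below (the layout differs between trace-cmd versions) and assumes `i915_flip_complete` as the per-frame event, which may not fire if compiz is not page-flipping on a given setup. The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of a `trace-cmd report` line (varies between versions):
#   "  compiz-1234  [001]  5123.456789: i915_flip_complete: plane=0, ..."
LINE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def event_times(report_lines: Iterable[str], event: str) -> List[float]:
    """Timestamps (in seconds) of every occurrence of `event` in the report."""
    times = []
    for line in report_lines:
        m = LINE_RE.search(line)
        if m and m.group(2) == event:
            times.append(float(m.group(1)))
    return times


def find_stalls(times: List[float], threshold: float) -> List[Tuple[float, float]]:
    """(start, gap) for each gap between consecutive events longer than threshold."""
    return [(a, b - a) for a, b in zip(times, times[1:]) if b - a > threshold]
```

For example, `find_stalls(event_times(open("report.txt"), "i915_flip_complete"), 0.5)` would list every gap longer than half a second, where report.txt is the saved output of `trace-cmd report`.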
Comment 13 meng 2011-03-04 01:39:17 UTC
1. Tested commit e8b2c3c47a53348aebbbeb5322e32937df958793 with compiz enabled: the stutter does not occur on Huronriver (0216 rev08, 0116 rev09).
Performance has also improved. Comparing commit 6927faf30920b with commit e8b2c3: rgb10text goes from 923k to 1270k, and the openarena demo from 41.8 to 89.9 fps.
2. With commit e8b2c3c, the system hangs on Huronriver (0216 rev08) within 3-5 runs of the urbanterror demo under plain xinit & (bug 32752). It works fine on Huronriver (0116 rev09) with commit e8b2c3c (the demo was run 15 times).
Comment 14 Gordon Jin 2011-03-08 00:45:35 UTC
So the original bug is caused by disabling semaphores on SNB mobile (the workaround for bug#32572). Now that the workaround has been extended to desktop (a1656b9090f7008d2941c314f5a64724bea2ae37), this also affects SNB desktop.
Comment 15 Chris Wilson 2011-03-08 02:05:49 UTC
.38 behaviour should in theory be the same as .37, so no regression there. .38 includes the i915.semaphores=1 workaround, and drm-intel-next turns them on by default. I think we fully understand the issue and there is no near-term fix better than what we already have in place.

I have a patch, "drm/i915: Enable use of GPU semaphores to sync page-flips on SNB", which improves matters further, and I plan to improve the DDX as well.

Downgrading priority, as I think all that really remains is thorough testing on .38 with i915.semaphores=1 to see if the root cause preventing enabling by default has been fixed (the FIFO overflow is my guess).
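When testing with i915.semaphores=1 (or i915.powersave=0, as in comment 3), it is worth confirming that the option actually reached the kernel. A minimal Python sketch, assuming the option is passed on the kernel command line rather than through modprobe configuration. It splits /proc/cmdline on whitespace, so quoted values containing spaces are not handled. The function names are illustrative.

```python
from pathlib import Path
from typing import Dict, Optional


def kernel_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=..." keeps its value intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def i915_option(name: str, cmdline: Optional[str] = None) -> Optional[str]:
    """Value of i915.<name> from the given command line (default: /proc/cmdline)."""
    if cmdline is None:
        cmdline = Path("/proc/cmdline").read_text()
    return kernel_params(cmdline).get(f"i915.{name}")
```

`i915_option("semaphores")` returning `"1"` means the boot parameter was applied; `None` means it was not on the command line at all.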
Comment 16 Gordon Jin 2011-04-10 02:30:44 UTC
I see semaphores have been enabled in -backport, so shall we close this one?
Comment 17 Chris Wilson 2011-04-17 00:37:11 UTC
Yes, I've come to the same conclusion that it is time to close this bug.
Comment 18 meng 2011-04-17 17:43:36 UTC
Verified with commit c94249d2a6911daf74f329e05c42e076af2cd024; it works fine.
Comment 19 Gordon Jin 2011-05-20 00:21:47 UTC
Reopening. GPU semaphores were switched back to disabled in 2.6.39.
Comment 20 Gordon Jin 2011-05-22 18:41:16 UTC
*** Bug 37090 has been marked as a duplicate of this bug. ***
Comment 21 Chris Wilson 2011-06-06 04:09:01 UTC
So one of the reasons for SNA is to avoid the need for semaphores for SwapBuffers...
Comment 22 Jesse Barnes 2011-06-16 11:32:43 UTC
Another one for http://lists.freedesktop.org/archives/intel-gfx/2011-June/010979.html
Comment 23 Chris Wilson 2011-06-16 12:08:18 UTC
Nah, we never mentioned the BLT hang-check issue on this one ;-)
Comment 24 Jesse Barnes 2011-06-16 12:24:38 UTC
If enabling semaphores fixes this issue then the real blocker is getting Andrew's machine fixed with semaphores enabled.  And Daniel's patch may do that; we need him to test.  Also missing blit ring interrupts will probably cause stutter even if it doesn't trigger the hangcheck timer?
Comment 25 Chris Wilson 2011-06-16 12:34:04 UTC
One can envision where there is a train of BLT interrupts and we happen to be waiting on the early one whose write goes astray. Not sure that matches the usage here though.

We need semaphores for performance as mesa also uses intelClearWithBlit on SNB...

And Andrew's system hang is more than likely one of the GNOME GPU hangs where the GPU reset killed the box. (That has happened often enough that it turns out I have i915.reset=0 set on both my SNB boxes.)
Comment 26 meng 2011-06-17 21:41:28 UTC
Tested commit 6a574b5b with the above patch: with semaphores disabled, the problems ("hang check" and "stuttered") have been fixed.
3D performance improved a lot, but 2D did not.
                6a574b5b               6a574b5b
                (enable semaphores)    (disable semaphores + above patch)
rgb10text (2D)  2380k                  1100k
aa10text (2D)   2667k                  1640k
openarena       86.2 fps               98.8 fps
urbanterror     71.4 fps               68.5 fps
padman          100.7 fps              92 fps
nexuiz          20 fps                 19.5 fps
Comment 27 Chris Wilson 2011-06-18 01:44:28 UTC
(In reply to comment #26)
>         6a574b5b(enable semaphores)    6a574b5b(disable semaphores +above
> openarena           86.2fps                                 98.8fps

I didn't notice this before. Can we follow this up in a new bug, please?
Comment 28 Chris Wilson 2011-06-18 01:50:49 UTC
(In reply to comment #27)
> (In reply to comment #26)
> >         6a574b5b(enable semaphores)    6a574b5b(disable semaphores +above
> > openarena           86.2fps                                 98.8fps
> 
> I didn't notice this before. Can we follow this up in a new bug, please?

Unless of course, this is the case where openarena+enable_semaphores+patch is >= 98.8fps.
Comment 29 meng 2011-06-18 02:38:04 UTC
Performance on Sugarbay with commit 6a574b5b9.
There are 4 columns (semaphores disabled/enabled, each with and without the patch).
2D/3D              disable   enable   disable+patch   enable+patch
2D-aa10text        1790k     2650k    1640k           2550k
2D-rgb10text       1380k     2380k    1100k           2320k
openarena (fps)    11        86.2     98.8            103.9
urbanterror (fps)  10.5      71.4     68.5            70.9
padman (fps)       12.1      100.7    92              100.3
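To compare the four columns at a glance, each result can be normalised against the "enable" column (semaphores on, no patch). The short Python sketch below only re-expresses the figures from the Sugarbay table as percentages; the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# Columns: disable, enable, disable+patch, enable+patch (Sugarbay, 6a574b5b9).
# 2D rows are in thousands ("k"), 3D rows in fps; units cancel in the ratio.
Row = Tuple[float, float, float, float]

RESULTS: Dict[str, Row] = {
    "2D-aa10text": (1790, 2650, 1640, 2550),
    "2D-rgb10text": (1380, 2380, 1100, 2320),
    "openarena": (11, 86.2, 98.8, 103.9),
    "urbanterror": (10.5, 71.4, 68.5, 70.9),
    "padman": (12.1, 100.7, 92, 100.3),
}


def relative_to_enabled(results: Dict[str, Row]) -> Dict[str, Row]:
    """Express every column as a percentage of the 'enable' column, to 0.1%."""
    out: Dict[str, Row] = {}
    for test, row in results.items():
        enabled = row[1]  # semaphores on, no patch
        out[test] = tuple(round(100 * v / enabled, 1) for v in row)
    return out
```

Read this way, with semaphores disabled and no patch the three 3D demos run at roughly 12-15% of their semaphores-enabled frame rate, and the patch recovers most of that (openarena even exceeds it), while the 2D tests with semaphores disabled stay well below the enabled baseline (46-68%).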
Comment 30 Florian Mickler 2011-06-30 03:31:16 UTC
A patch referencing this bug report has been merged in Linux v3.0-rc4:

commit 498e720b96379d8ee9c294950a01534a73defcf3
Author: Daniel J Blueman <daniel.blueman@gmail.com>
Date:   Fri Jun 17 11:32:19 2011 -0700

    drm/i915: Fix gen6 (SNB) missed BLT ring interrupts.
Comment 31 Florian Mickler 2011-06-30 03:45:44 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.0-rc5:

commit ec6a890dfed7dd245beba5e5bcdfcffbd934c284
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jun 21 18:37:59 2011 +0100

    drm/i915: Apply HWSTAM workaround for BSD ring on SandyBridge
Comment 32 meng 2011-06-30 21:22:51 UTC
Tested on (drm-intel-fixes)f01c22fd59aa10a3738ede20fd4b9b6fd1e2eac3: the SNB problems ("hang check", "screen stuttered", and the performance regression when running 3D game demos) are fixed, so I am closing this bug.
However, without semaphores there is still a 2D performance regression on SNB; please see bug 38861.
Comment 33 meng 2011-06-30 21:50:04 UTC
Tested (drm-intel-fixes)f01c22fd59aa10a3738ede20fd4b9b6fd1e2eac3 on IVB: the "hang check" and "screen stuttered" problems still exist. Please see bug 38862.
Comment 34 Jari Tahvanainen 2016-12-07 16:19:57 UTC
Closing old verified+fixed.
