Bug 53322

Summary: [IVB Regression]GPU hung when run demos of smokin-guns and doom3
Product: DRI
Reporter: ye.tian <yex.tian>
Component: DRM/Intel
Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED
QA Contact:
Severity: major
Priority: high
CC: ben, chris, daniel, florian, jbarnes, mengmeng.meng, yex.tian
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (Description: Flags):
i915_error_state: none
Xorg.0.log: none
Apply post-sync write for TLB invalidate: none

Description ye.tian 2012-08-10 06:50:46 UTC
System Environment:       
--------------------------
Libdrm:   (master)libdrm-2.4.37-26-g93fef04b1e3a83e2f884880ed1c3395f67b038ab
Mesa:     (master)34665381713249c29b7da5028396222dfea477c2
Xserver:	  (master)xorg-server-1.12.99.904
Kernel:   (drm-intel-next-queued) 65bccb5c708bd9f00d24f041f4f7c45130359448

Bug detailed description:
-----------------------------
The GPU hangs after running the smokin-guns and doom3 demos 3-5 times on IVB.
This is a kernel regression.
The last known good kernel commit: (drm-intel-next-queued) ab3951eb74e7c33a.
Please see the attached i915_error_state and Xorg.0.log.

dmesg:
[ 245.379586] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Reproduce steps:
----------------------------
1. xinit&
2. gnome-session
3. ./smokinguns.x86_64  +timedemo 1 +set demodone "quit" +set demoloop1 "demo pts; set nextdemo vstr demodone" +vstr demoloop1 +set r_customwidth 1920 +set r_customheight 1080 (run 3-5 times)
Comment 1 ye.tian 2012-08-10 06:52:00 UTC
Created attachment 65369 [details]
i915_error_state
Comment 2 ye.tian 2012-08-10 06:52:35 UTC
Created attachment 65370 [details]
Xorg.0.log
Comment 3 Chris Wilson 2012-08-10 07:16:50 UTC
Post-sync write for TLB invalidate, is this the reason you are required?
Comment 4 ye.tian 2012-08-10 07:36:04 UTC
(In reply to comment #3)
> Post-sync write for TLB invalidate, is this the reason you are required?

I don't understand your meaning. Can you explain it?
Comment 5 Chris Wilson 2012-08-10 07:55:59 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Post-sync write for TLB invalidate, is this the reason you are required?
> 
> I don't understand your meaning. Can you explain it?

Sorry, it was a note to self, Daniel, Ben et al.

There's a requirement mentioned in the bspec and in the simulators that we should be doing a post-sync write when performing TLB invalidates. The hang occurs during the invalidate pipe-control, after a flush pipe-control and dword writes have completed, so the missing post-sync write seems a very real possibility.
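
For readers unfamiliar with the idea, here is a minimal sketch of the rule described above: whenever a pipe-control requests a TLB invalidate, it also requests a post-sync write to a scratch address. This is not the attached patch or the i915 driver code. The struct, the function, the SKETCH_* names and the bit positions are all hypothetical and chosen only for illustration; the real encodings are defined by the bspec and the driver headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. They do NOT match the
 * real PIPE_CONTROL encodings used by the hardware or the driver. */
#define SKETCH_PC_FLUSH           (1u << 0)
#define SKETCH_PC_TLB_INVALIDATE  (1u << 1)
#define SKETCH_PC_POST_SYNC_WRITE (1u << 2)

/* Hypothetical description of one pipe-control command. */
struct sketch_pipe_control {
    uint32_t flags;      /* combination of SKETCH_PC_* bits */
    uint32_t write_addr; /* post-sync write target, 0 when unused */
};

/* Build the command description. The point of the sketch: a TLB
 * invalidate is never emitted without an accompanying post-sync write. */
static struct sketch_pipe_control
sketch_build_pipe_control(bool invalidate, bool flush, uint32_t scratch_addr)
{
    struct sketch_pipe_control pc = { 0u, 0u };

    if (flush)
        pc.flags |= SKETCH_PC_FLUSH;

    if (invalidate) {
        /* The post-sync write needs a valid scratch address. */
        assert(scratch_addr != 0u);
        pc.flags |= SKETCH_PC_TLB_INVALIDATE;
        pc.flags |= SKETCH_PC_POST_SYNC_WRITE;
        pc.write_addr = scratch_addr;
    }

    return pc;
}
```

In this sketch a flush-only command carries no post-sync write, while any command that invalidates the TLB always does.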
Comment 6 Chris Wilson 2012-08-10 08:34:16 UTC
Created attachment 65373 [details] [review]
Apply post-sync write for TLB invalidate
Comment 7 ye.tian 2012-08-13 01:27:37 UTC
(In reply to comment #6)
> Created attachment 65373 [details] [review]
> Apply post-sync write for TLB invalidate

Tested commit 65bccb5c70 with the above patch; it works well.
Comment 8 Ben Widawsky 2012-08-13 01:41:15 UTC
+1 for the simulator then?
Comment 9 Daniel Vetter 2012-08-14 07:48:09 UTC
Patch merged to -fixes:

commit 7d54a904285b6e780291b91a518267bec5591913
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 10 10:18:10 2012 +0100

    drm/i915: Apply post-sync write for pipe control invalidates
Comment 10 ye.tian 2012-08-14 08:06:34 UTC
(In reply to comment #9)
> Patch merged to -fixes:
> 
> commit 7d54a904285b6e780291b91a518267bec5591913
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Aug 10 10:18:10 2012 +0100
> 
>     drm/i915: Apply post-sync write for pipe control invalidates

The issue also exists on drm-intel-next-queued; please merge the patch to that branch as well.
Comment 11 ye.tian 2012-08-17 08:53:42 UTC
Verified with the commit 7d54a904285b6e780 on drm-intel-next-queued.
Comment 12 Florian Mickler 2012-10-15 20:48:59 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.7-rc1:

commit ac82ea2e97a32f9c49d0746874b4cd1d8904d10f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Oct 1 14:27:04 2012 +0100

    drm/i915: Actually invalidate the TLB for the SandyBridge HW contexts w/a
