Bug 51218

Summary: [IVB] " *ERROR* render ring initialization failed" while running I-G-T/ZZ_hangman
Product: DRI
Reporter: Guang Yang <guang.a.yang>
Component: DRM/Intel
Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: ben, chris, daniel, jbarnes
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                     Flags
running hangman debug info                      none
running hangman with reset-fail's debug info    none

Description Guang Yang 2012-06-18 22:30:54 UTC
Created attachment 63205
running hangman debug info

System Environment:
--------------------------
Platform:        Ivybridge
Kernel: (drm-intel-next-queued) 33ee6d190ce8e4c33a7caf7d75618feb97936517

Bug detailed description:
-------------------------
Running the i-g-t test ZZ_hangman causes " *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000".
Comment 1 Daniel Vetter 2012-06-19 06:41:29 UTC
I can reproduce this on my ivb, but it takes a few loops of ZZ_hangman. How reliably can you hit this?
Comment 2 Guang Yang 2012-06-19 20:37:02 UTC
(In reply to comment #1)
> I can reproduce this on my ivb, but it takes a few loops of ZZ_hangman. How
> reliably can you hit this?
Every time I run hangman, this error occurs.
Comment 3 Daniel Vetter 2012-06-20 00:29:49 UTC
I've noticed that disabling rc6 with i915.i915_enable_rc6=0 makes ZZ_hangman completely stable for me on both ivb and snb, even when I run it in a loop. Note that you need a sleep 10 in that loop; otherwise the kernel complains about the gpu hanging too fast and stops accepting batchbuffer commands. I usually run:

while tests/ZZ_hangman; do sleep 10; done

That will stop as soon as the gpu dies. Can you confirm that disabling rc6 makes gpu reset stable for you, too?
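The loop above can be extended to count how many runs succeed before the gpu dies, which gives a rough answer to "how reliably can you hit this?". This is a minimal POSIX shell sketch, assuming the test binary lives at tests/ZZ_hangman as in the one-liner; hang_loop is a hypothetical helper name, not part of intel-gpu-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of intel-gpu-tools): run a hang test
# repeatedly, like "while tests/ZZ_hangman; do sleep 10; done",
# but report how many runs succeeded before the first failure.
#   $1 = test command (default: tests/ZZ_hangman)
#   $2 = pause in seconds between runs (default: 10)
#   $3 = optional maximum number of runs; 0 means run until failure
hang_loop() {
    test_cmd=${1:-tests/ZZ_hangman}
    pause=${2:-10}
    max_runs=${3:-0}
    runs=0
    while "$test_cmd"; do
        runs=$((runs + 1))
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            echo "passed $runs runs, stopping"
            return 0
        fi
        # Pause between runs: hanging the gpu too quickly makes the
        # kernel stop accepting batchbuffer commands.
        sleep "$pause"
    done
    echo "failed after $runs successful runs"
    return 1
}
```

For example, hang_loop tests/ZZ_hangman 10 behaves like the one-liner but prints the run count when the gpu dies, while hang_loop tests/ZZ_hangman 10 50 stops after 50 clean runs.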
Comment 4 Guang Yang 2012-06-20 01:23:42 UTC
(In reply to comment #3)
> I've noticed that disabling rc6 with i915.i915_enable_rc6=0 makes ZZ_hangman
> completely stable for me on both ivb&snb, even when I run it in a loop. Note
> that you need to have a sleep 10 in that loop, otherwise the kernel complains
> about the gpu hanging too fast and stops accepting batchbuffer commands. I
> usually do
> 
> while tests/ZZ_hangman; do sleep 10; done
> 
> that will stop as soon as the gpu died. Can you confirm that disabling rc6
> makes gpu reset stable for you, too?
Yes, with rc6 disabled the gpu reset becomes stable. The dmesg shows:
[  302.188495] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  302.188551] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  302.191248] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
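To confirm from the kernel log that rc6 is really off, the "Enabling RC6 states" message shown above can be checked. This is a minimal sketch, assuming the message wording matches the dmesg excerpt in this report (other kernel versions may print it differently); rc6_all_off is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and inspects the
# most recent "Enabling RC6 states" message. The expected wording is
# copied from the dmesg excerpt in this report and is an assumption.
# Exit status: 0 = RC6, RC6p and RC6pp all off
#              1 = at least one state is not reported as off
#              2 = no such message found
rc6_all_off() {
    line=$(grep 'Enabling RC6 states:' | tail -n 1)
    if [ -z "$line" ]; then
        return 2
    fi
    case $line in
        *"RC6 off, RC6p off, RC6pp off"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical use would be dmesg | rc6_all_off && echo "rc6 is disabled".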
Comment 5 Daniel Vetter 2012-06-22 04:06:20 UTC
I've created some patches to make the gpu reset more stable. Can you please test my reset-fail git branch from my personal git repo?

http://cgit.freedesktop.org/~danvet/drm/log/?h=reset-fail
Comment 6 Guang Yang 2012-06-25 02:01:37 UTC
Created attachment 63426
running hangman with reset-fail's debug info

(In reply to comment #5)
> I've created some patches to make the gpu reset more stable. Can you please
> test my reset-fail git branch from my personal git repo?
> 
> http://cgit.freedesktop.org/~danvet/drm/log/?h=reset-fail
I tried reset-fail at its latest commit:
Kernel: (reset-fail) aefaf55d4cbb279d5029fdaf428edd22a83f575f
and attached the dmesg.
Comment 7 Daniel Vetter 2012-06-25 02:04:34 UTC
Looks like it works now with -fixes merged in. Can you confirm that the gpu works after running the hangman test (i.e. running gl apps doesn't crash it)?
Comment 8 Guang Yang 2012-06-25 02:17:45 UTC
(In reply to comment #7)
> Looks like it works now with -fixes merged in. Can you confirm that the gpu
> works after running the hangman test (i.e. running gl apps doesn't crash it)?
I ran glxgears after the hangman test and it works well.
Comment 9 Daniel Vetter 2012-07-05 01:07:38 UTC
Ok, patches are now all merged to -queued.
Comment 10 Elizabeth 2017-10-06 14:49:37 UTC
Closing old verified.
