Bug 79248

Summary: [BDW] kernel v3.15-rc7: GPU HANG: ecode 0:0x0fdf00ac, in Xorg [1060], reason: Ring hung, action: reset
Product: DRI Reporter: EvaWang <evawang>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: medium CC: alban.crequy, ben, intel-gfx-bugs
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg none
/sys/class/drm/card0/error none
dmesg with kernel 3.15-rc8 none
dmesg none

Description EvaWang 2014-05-26 07:40:50 UTC
Created attachment 99837 [details]
dmesg

[BDW] kernel v3.15-rc7: GPU HANG: ecode 0:0x0fdf00ac, in Xorg [1060], reason: Ring hung, action: reset
This happens only on the Broadwell platform.
The /sys/class/drm/card0/error dump and the dmesg are attached.
Please help check it. If you need any more information, please let me know. Thanks!
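For anyone triaging several logs, the hang message in the summary can be pulled apart mechanically. The following is only a minimal sketch: the regular expression is modelled on the single line quoted in this report, other kernel versions may word the message differently, and the names `HANG_RE` and `parse_gpu_hang` are hypothetical helpers, not part of any kernel or i915 tool.

```python
import re
from typing import Dict, Optional

# Pattern modelled on the one message quoted in this report (an assumption;
# other kernel versions may format the line differently).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a GPU HANG line, or None if the line does not match."""
    match = HANG_RE.search(line)
    return match.groupdict() if match else None


# The message exactly as it appears in the bug summary.
sample = ("GPU HANG: ecode 0:0x0fdf00ac, in Xorg [1060], "
          "reason: Ring hung, action: reset")

if __name__ == "__main__":
    print(parse_gpu_hang(sample))
```

Running each line of a saved dmesg through `parse_gpu_hang` and keeping the non-None results gives a quick list of hangs with the process name, PID and reset action.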
Comment 1 EvaWang 2014-05-26 07:41:29 UTC
Created attachment 99838 [details]
/sys/class/drm/card0/error
Comment 2 Ben Widawsky 2014-05-28 18:49:52 UTC
Is this a regression? If so, please bisect.
Comment 3 Ben Widawsky 2014-06-03 23:06:59 UTC
Eva, I'm still waiting for some information from you.

1. Is this reproducible?
   1a. If yes, please submit dmesg with drm.debug=0x6
   1b. If no, please lower the priority
2. Is this a regression?
   2a. If yes, please bisect.
3. What were you doing at the time of the hang?
4. Did the system recover after the hang?

As for the bug itself, the PDPs seem to have lost their minds on the render ring. 

5. Please try to reproduce with rc6 disabled.
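As background on the drm.debug value requested above (0x6) and the one used later in comment 7 (0xe): the parameter is a bitmask of debug categories. The sketch below decodes such a mask. The bit names are assumed from the DRM_UT_* definitions in kernels of this era (later kernels define more bits), and `decode_drm_debug` and `drm_debug_from_cmdline` are hypothetical helper names used only for illustration.

```python
from typing import List, Optional

# Assumed mapping, following the DRM_UT_* debug categories of 3.15-era kernels.
DRM_DEBUG_BITS = {
    0x1: "core",
    0x2: "driver",
    0x4: "kms",
    0x8: "prime",
}


def decode_drm_debug(mask: int) -> List[str]:
    """List the debug categories enabled by a drm.debug bitmask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


def drm_debug_from_cmdline(cmdline: str) -> Optional[int]:
    """Extract the drm.debug value from a kernel command line string, if present."""
    for token in cmdline.split():
        if token.startswith("drm.debug="):
            # Base 0 accepts both hexadecimal ("0x6") and decimal ("6") spellings.
            return int(token.split("=", 1)[1], 0)
    return None


if __name__ == "__main__":
    print(decode_drm_debug(0x6))
    print(decode_drm_debug(0xE))
```

Under that assumed mapping, 0x6 enables driver and KMS messages, and 0xe additionally enables PRIME messages. Passing the contents of /proc/cmdline to `drm_debug_from_cmdline` is one way to confirm that the option was actually applied at boot.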
Comment 4 EvaWang 2014-06-04 03:29:07 UTC
> 1. Is this reproducible?
>    1a. If yes, please submit dmesg with drm.debug=0x6
        Eva: Yes, please refer to the attachment dmesg_315rc8.log.
>    1b. If no, please lower the priority
> 2. Is this a regression?
>    2a. If yes, please bisect.
        Eva: 3.15 appears to have the issue, while 3.14 does not. We will need a few hours to verify whether it is a regression. Which branch do you suggest we bisect?
> 3. What were you doing at the time of the hang?
        Eva: I switched to tty4 to capture the dmesg and the error state. If I do not switch to tty4 first, the system hangs and I can no longer reach a tty, even when I try to switch to tty4.
> 4. Did the system recover after the hang?
       Eva: No, it cannot recover.

> 5. Please try to reproduce with rc6 disabled.
       Eva: It still happens with rc6 disabled.
Comment 5 EvaWang 2014-06-04 03:29:49 UTC
Created attachment 100380 [details]
dmesg with kernel 3.15-rc8
Comment 6 EvaWang 2014-06-11 02:55:05 UTC
The final 3.15 release kernel also has the issue.
Comment 7 EvaWang 2014-07-17 02:54:34 UTC
Created attachment 102956 [details]
dmesg

We just checked kernel 3.16-rc5 on the Broadwell platform. It still cannot reach the desktop, but there are no GPU HANG messages.
Please refer to the attached dmesg, captured with drm.debug=0xe.

3.16-rc5 is OK on Baytrail-M and Haswell.
Comment 8 Jani Nikula 2014-09-08 10:14:35 UTC
Please try 3.17-rc4.
Comment 9 EvaWang 2014-09-09 03:28:20 UTC
(In reply to comment #8)
> Please try 3.17-rc4.

With the 3.17-rc4 kernel the system reaches the desktop and works correctly.

BTW, both the 3.15 and 3.16 kernels have the issue, but the 3.17 kernel fixes it.

Thanks!
