Bug 90183

Summary: [HSW] GPU HANG: ecode 7:0:0x86dfbff9, in compiz [1938], reason: Ring hung, action: reset
Product: DRI
Reporter: Jens <jens-bugs.freedesktop.org>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs, k.vrban
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: HSW
i915 features: GPU hang, power/suspend-resume
Attachments:
Description                                                 Flags
dmesg boot log including error                              none
card error log                                              none
dmesg and card0 error of kernel-4.0.0-drm-nightly-20150411  none

Description Jens 2015-04-26 12:07:58 UTC
Created attachment 115341 [details]
dmesg boot log including error

This happened after resuming from S4 hibernate. Afterwards the system was unstable (worked for a couple minutes then panicked with blinking keyboard LEDs).

Attaching logs as per syslog instructions.

Kernel: 4.0.0-rc7+ (freedesktop git 631c2f8cb).

I also tried 4.0.0+ (freedesktop git 92bb36c80) but it wouldn't even boot on my machine (blank screen).

3.19.0+ (freedesktop git 89271faca1) works fine including S4.

Hardware: MSI-7817 Haswell chipset, i5-4570 CPU. Ubuntu 14.04 with custom kernel.

[    0.000000] Linux version 4.0.0-rc7+ (root@linuxkiste) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #7 SMP Sat Apr 11 19:43:31 CEST 2015
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.0.0-rc7+ root=/dev/mapper/linuxkiste--vg-root ro splash quiet vt.handoff=7
Comment 1 Jens 2015-04-26 12:08:27 UTC
Created attachment 115342 [details]
card error log
Comment 2 Chris Wilson 2015-04-26 12:19:14 UTC
The GPU hang looks suspiciously like it resulted from the resume. The final panic is just an explosion in drm core when releasing fb on a fd. Both would be good to track down.

Can you try bisecting, and just skip every time you don't get a working boot? We can hope that we have enough non-skips to identify the bug.
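
The bisect-with-skip idea can be modelled roughly as follows. This is a simplified sketch, not git's actual algorithm: `bisect_with_skips` and the `test` callback are hypothetical names, and the commit list is assumed to be linear, with the first entry known good and the last known bad. Skipped commits cannot be judged, so the result may be a short range of candidates rather than a single commit.

```python
def bisect_with_skips(commits, test):
    """Narrow down the first bad commit in a linear history.

    commits: ordered oldest to newest; commits[0] is assumed good and
             commits[-1] is assumed bad.
    test:    hypothetical callback returning "good", "bad" or "skip"
             (skip = the build does not boot, so it cannot be judged).
    Returns the list of commits that may contain the first bad one.
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    skipped = set()
    while True:
        # Untested commits strictly between the known-good and known-bad ends.
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break
        mid = (lo + hi) // 2
        # Test the untested commit closest to the midpoint.
        i = min(candidates, key=lambda k: abs(k - mid))
        verdict = test(commits[i])
        if verdict == "good":
            lo = i
        elif verdict == "bad":
            hi = i
        else:
            skipped.add(i)
    # Everything after the last good commit, up to and including the first
    # known bad one, is still a suspect.
    return commits[lo + 1 : hi + 1]
```

With enough non-skips the returned range collapses to a single commit; otherwise it contains the skipped commits next to the first known bad one.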
Comment 3 Jens 2015-05-01 20:22:58 UTC
Will do.
Comment 4 Jens 2015-05-01 22:04:02 UTC
To be quick (and to avoid any errors during compilation that I might introduce) I tested the mainline kernels available from Ubuntu at kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly. Here are my results so far.

2015-04-10-vivid: Boots fine and does S4 suspend/resume fine three times in a row (so far), including restoration of the X11 desktop. No load testing yet.

2015-04-11-vivid: Boots fine, S4 suspend works as well, after resume I get the "GPU HANG" message with a backtrace, and X11 seems to restart (after ~10 seconds I get kicked back to the login screen).

2015-04-14-vivid: Boots fine, S4 suspend works as well, after resume I get the "GPU HANG" message with a backtrace, and X11 seems to restart (after ~10 seconds I get kicked back to the login screen).

2015-04-15-vivid: Boots fine, S4 suspend works as well, after resume I get the "GPU HANG" message with a backtrace, and X11 seems to restart (after ~10 seconds I get kicked back to the login screen).

2015-04-21-vivid: Does not boot. Last message on console: "Ignoring BGRT: Invalid status 0 (expected 1)". I think all the other kernels show this too, but it is replaced too quickly by the desktop.

2015-04-29-vivid: Does not boot cleanly - desktop does not appear, but I can get to the text console. In syslog I get the "GPU HANG: ecode 7:2:0x00d7ffe9, in Xorg" message, even without an S4 suspend. I saved the full dmesg output in case it is required. After an S4 suspend, the symptom is the same as in the 2015-04-21 kernel (does not boot).

2015-05-01-vivid: Boots fine, but when I do S4 suspend via 'pm-hibernate', the machine seems to shut off but does not power down (all fans are quiet but the power LED stays on). After a forced power cycle (4 seconds on power button) there is a normal reboot. According to syslog the machine did not even try to resume.


Does this help already? Or do I need to start bisecting between 2015-04-10 and -11? Or test something else altogether?

Thank you for your support!
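
The nightly results above already bound the regression. A minimal sketch of how that window can be read off, assuming the second entry in the list is the 2015-04-11 build (matching the attached 20150411 logs) and treating every build that could not be tested for the resume hang as "skip" (that classification is an assumption of this sketch, as is the function name `regression_window`):

```python
# Results for the drm-intel-nightly builds, keyed by build date.
# "good" = S4 resume works, "bad" = GPU hang after resume,
# "skip" = resume hang could not be tested (no clean boot, or hibernate failed).
RESULTS = {
    "2015-04-10": "good",
    "2015-04-11": "bad",
    "2015-04-14": "bad",
    "2015-04-15": "bad",
    "2015-04-21": "skip",  # does not boot
    "2015-04-29": "skip",  # hangs in Xorg before any suspend (assumed untestable)
    "2015-05-01": "skip",  # hibernate does not complete
}


def regression_window(results):
    """Return (last_good, first_bad) build dates, or None if undetermined."""
    last_good = None
    for date in sorted(results):  # ISO dates sort chronologically as strings
        verdict = results[date]
        if verdict == "good":
            last_good = date
        elif verdict == "bad" and last_good is not None:
            return last_good, date
    return None


print(regression_window(RESULTS))
```

Under these assumptions the window is the single day between the 2015-04-10 and 2015-04-11 nightlies, which is the range a commit-level bisect would need to cover.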
Comment 5 Jens 2015-05-01 22:06:53 UTC
Created attachment 115507 [details]
dmesg and card0 error of kernel-4.0.0-drm-nightly-20150411

Here are the logs and card0 error output of the 20150411 kernel.
Comment 6 Ander Conselvan de Oliveira 2015-05-12 10:55:50 UTC
(In reply to Jens from comment #4)
> Does this help already? Or do I need to start bisecting between 2015-04-10
> and -11? Or test something else altogether?

Doing that bisect would be helpful.
Comment 7 Ricardo 2017-02-22 15:47:53 UTC
Does this problem still occur? If so, please update the bug. If there is no response within 30 days, the bug will be closed.
Comment 8 Jens 2017-02-23 13:07:51 UTC
Cannot reproduce any more with 4.10.0-999-generic (2017.01.05-21:00) built by Ubuntu (mainline daily kernel).
Comment 9 yann 2017-02-23 13:57:16 UTC
(In reply to Jens from comment #8)
> Cannot reproduce any more with 4.10.0-999-generic (2017.01.05-21:00) built
> by Ubuntu (mainline daily kernel).

Thanks Jens
