Bug 90183 - [HSW] GPU HANG: ecode 7:0:0x86dfbff9, in compiz [1938], reason: Ring hung, action: reset
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2015-04-26 12:07 UTC by Jens
Modified: 2017-02-23 13:58 UTC

See Also:
i915 platform: HSW
i915 features: GPU hang, power/suspend-resume


Attachments
dmesg boot log including error (107.96 KB, text/plain)
2015-04-26 12:07 UTC, Jens
card error log (296.11 KB, text/plain)
2015-04-26 12:08 UTC, Jens
dmesg and card0 error of kernel-4.0.0-drm-nightly-20150411 (65.14 KB, text/plain)
2015-05-01 22:06 UTC, Jens

Description Jens 2015-04-26 12:07:58 UTC
Created attachment 115341
dmesg boot log including error

This happened after resuming from S4 hibernate. Afterwards the system was unstable: it worked for a couple of minutes, then panicked with blinking keyboard LEDs.

Attaching logs as per syslog instructions.

Kernel: 4.0.0-rc7+ (freedesktop git 631c2f8cb).

I also tried 4.0.0+ (freedesktop git 92bb36c80) but it wouldn't even boot on my machine (blank screen).

3.19.0+ (freedesktop git 89271faca1) works fine including S4.

Hardware: MSI-7817 Haswell chipset, i5-4570 CPU. Ubuntu 14.04 with custom kernel.

[    0.000000] Linux version 4.0.0-rc7+ (root@linuxkiste) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #7 SMP Sat Apr 11 19:43:31 CEST 2015
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.0.0-rc7+ root=/dev/mapper/linuxkiste--vg-root ro splash quiet vt.handoff=7
Comment 1 Jens 2015-04-26 12:08:27 UTC
Created attachment 115342
card error log
Comment 2 Chris Wilson 2015-04-26 12:19:14 UTC
The GPU hang looks suspiciously like it resulted from the resume. The final panic is just an explosion in drm core when releasing fb on a fd. Both would be good to track down.

Can you try bisecting, and just skip every time you don't get a working boot? We can hope that we have enough non-skips to identify the bug.
Comment 3 Jens 2015-05-01 20:22:58 UTC
Will do.
Comment 4 Jens 2015-05-01 22:04:02 UTC
To be quick (and to avoid any errors during compilation that I might introduce) I tested the mainline kernels available from Ubuntu at kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly. Here are my results so far.

2015-04-10-vivid: Boots fine and does S4 suspend/resume fine three times in a row (so far), including restoration of the X11 desktop. No load testing yet.

2015-04-11-vivid: Boots fine, S4 suspend works as well, after resume I get the "GPU HANG" message with a backtrace, and X11 seems to restart (after ~10 seconds I get kicked back to the login screen).

2015-04-14-vivid: Boots fine, S4 suspend works as well, after resume I get the "GPU HANG" message with a backtrace, and X11 seems to restart (after ~10 seconds I get kicked back to the login screen).

2015-04-15-vivid: Boots fine, S4 suspend works as well, after resume I get the "GPU HANG" message with a backtrace, and X11 seems to restart (after ~10 seconds I get kicked back to the login screen).

2015-04-21-vivid: Does not boot. Last message on console: "Ignoring BGRT: Invalid status 0 (expected 1)". I think all the other kernels show this too, but there it is replaced too quickly by the desktop.

2015-04-29-vivid: Does not boot cleanly: the desktop does not appear, but I can get to the text console. In syslog I get the "GPU HANG: ecode 7:2:0x00d7ffe9, in Xorg" message, even without an S4 suspend. I saved the full dmesg output in case it is required. After an S4 suspend, the symptom is the same as in the 2015-04-21 kernel (does not boot).

2015-05-01-vivid: Boots fine, but when I do S4 suspend via 'pm-hibernate', the machine seems to shut off but does not power down (all fans are quiet but the power LED stays on). After a forced power cycle (4 seconds on power button) there is a normal reboot. According to syslog the machine did not even try to resume.


Does this help already? Or do I need to start bisecting between 2015-04-10 and -11? Or test something else altogether?

Thank you for your support!
Comment 5 Jens 2015-05-01 22:06:53 UTC
Created attachment 115507
dmesg and card0 error of kernel-4.0.0-drm-nightly-20150411

Here are the logs and card0 error output of the 20150411 kernel.
Comment 6 Ander Conselvan de Oliveira 2015-05-12 10:55:50 UTC
(In reply to Jens from comment #4)
> Does this help already? Or do I need to start bisecting between 2015-04-10
> and -11? Or test something else altogether?

Doing that bisect would be helpful.
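
For reference, the window that such a bisect would start from can be derived from the nightly results in comment 4. The following is only a minimal sketch, not part of any kernel or git tooling: the names NIGHTLY_RESULTS and regression_window are made up for illustration, the good/bad/skip verdicts are my reading of comment 4 (builds that did not boot cleanly or did not complete the hibernate are treated as skips), and the second nightly is taken to be 2015-04-11, matching attachment 115507.

```python
# Results transcribed from comment 4 (assumed verdicts, see lead-in).
# "skip" marks builds where the S4 resume could not be evaluated at all,
# either because the kernel did not boot cleanly or did not hibernate.
NIGHTLY_RESULTS = [
    ("2015-04-10", "good"),
    ("2015-04-11", "bad"),   # assumed date, matches attachment 115507
    ("2015-04-14", "bad"),
    ("2015-04-15", "bad"),
    ("2015-04-21", "skip"),
    ("2015-04-29", "skip"),
    ("2015-05-01", "skip"),
]


def regression_window(results):
    """Return (last_good, first_bad) from a list of (date, verdict) pairs.

    Dates are ISO strings, so sorting them as text orders them by time.
    Skipped builds carry no information and are ignored, which mirrors
    the idea of skipping untestable steps during a bisect.
    """
    last_good = None
    first_bad = None
    for build, verdict in sorted(results):
        if first_bad is not None:
            break  # everything after the first bad build is irrelevant
        if verdict == "good":
            last_good = build
        elif verdict == "bad":
            first_bad = build
    return last_good, first_bad


if __name__ == "__main__":
    good, bad = regression_window(NIGHTLY_RESULTS)
    print(f"last good: {good}, first bad: {bad}")
```

With the verdicts above this reports 2015-04-10 as the last good nightly and 2015-04-11 as the first bad one, which is the range a commit-level bisect would then need to cover.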
Comment 7 Ricardo 2017-02-22 15:47:53 UTC
Is this problem still occurring? If so, please update the bug. If there is no response within 30 days, the bug will be closed.
Comment 8 Jens 2017-02-23 13:07:51 UTC
Cannot reproduce any more with 4.10.0-999-generic (2017.01.05-21:00) built by Ubuntu (mainline daily kernel).
Comment 9 yann 2017-02-23 13:57:16 UTC
(In reply to Jens from comment #8)
> Cannot reproduce any more with 4.10.0-999-generic (2017.01.05-21:00) built
> by Ubuntu (mainline daily kernel).

Thanks Jens

