Bug 91976 - GPU HANG: ecode 4:1:0x01000000, reason: Ring hung, action: reset
Summary: GPU HANG: ecode 4:1:0x01000000, reason: Ring hung, action: reset
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-11 11:50 UTC by Jiri Kosina
Modified: 2016-11-18 12:53 UTC
2 users

See Also:
i915 platform: G45
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error contents after resume (87.12 KB, text/plain)
2015-09-11 11:50 UTC, Jiri Kosina
dmesg log of a similar GPU hang (99.96 KB, text/plain)
2015-11-27 18:35 UTC, Manuel Krause
_sys_class_drm_card0_error.20151127.MK.tar.bz2 (107.87 KB, application/x-bzip)
2015-11-27 18:45 UTC, Manuel Krause

Description Jiri Kosina 2015-09-11 11:50:15 UTC
Created attachment 118215
/sys/class/drm/card0/error contents after resume

The current mainline kernel (HEAD == b0a1ea51) gives me this on resume from hibernation:

[   78.816182] [drm] stuck on bsd ring
[   78.824503] [drm] GPU HANG: ecode 4:1:0x01000000, reason: Ring hung, action: reset
[   78.824779] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   78.824784] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   78.824789] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   78.824793] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   78.824798] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   78.826876] drm/i915: Resetting chip after gpu hang


Attaching /sys/class/drm/card0/error contents as well.
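The log above asks for the crash dump to be attached to every report. A minimal Python sketch for saving it in compressed form, assuming the dump lives at /sys/class/drm/card0/error as in the log and that the script runs with enough privileges to read it (typically root). `save_crash_dump` and its file naming scheme are hypothetical, loosely modeled on the `.20151127.MK` name used in comment 2; unlike that attachment, it writes a plain .bz2 file rather than a tarball.

```python
import bz2
import shutil
from datetime import date
from pathlib import Path

# Path taken from the dmesg output above; other cards would use card1, etc.
DEFAULT_ERROR_NODE = Path("/sys/class/drm/card0/error")


def save_crash_dump(src: Path = DEFAULT_ERROR_NODE,
                    dest_dir: Path = Path("."),
                    tag: str = "") -> Path:
    """Copy the i915 error state into a dated, bzip2-compressed file.

    Hypothetical helper: the naming scheme is illustrative only.
    """
    stamp = date.today().strftime("%Y%m%d")
    suffix = f".{tag}" if tag else ""
    dest = Path(dest_dir) / f"card0_error.{stamp}{suffix}.bz2"
    # Stream in chunks: the dump can be large (the attachments here are ~100 KB).
    with open(src, "rb") as fin, bz2.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    print(save_crash_dump())
```

Running it right after the "GPU crash dump saved" message, before rebooting, keeps the dump from being lost.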
Comment 1 Manuel Krause 2015-11-27 18:35:21 UTC
Created attachment 120181
dmesg log of a similar GPU hang

It has taken me a long time to find this bug. For months I've been searching for the reason why my resumes from hibernation (suspend-to-disk) sometimes fail and sometimes succeed, and for a solution.

I'm currently using kernel 4.3.0 (vanilla from openSUSE) plus the current VRQ patch from Alfred Chen, the TuxOnIce patch, and the BFQ disk scheduler patches v7r8. I have already bothered others about how unreliably this machine resumes (sometimes 25+ attempts are needed to get the written image working again, while at other times the first attempt succeeds, within one series of hibernations without a reboot), so yesterday I decided to build i915 into the kernel instead of loading it as a module. One problem with the failing TuxOnIce resume attempts was that I never got any error messages: either the resume worked, or the machine locked up at "Doing atomic copy/restore." In the working case (with i915 as a module), a mode switch seems to happen at that point.

With i915 compiled in, I at least get some messages about what might be going on. And I was somewhat lucky to have done this, as the GPU resets that now occur seem to make resuming more reliable here.

This is an HP Compaq 6730b laptop running openSUSE 13.1, but with updated drivers. For Xorg, Mesa, and other graphics-related RPMs I use pontostroy's repository.

Here I attach the dmesg log of a series of resumes from hibernation.

I'll add the /sys/class/drm/card0/error crash dump in the next attachment.
Comment 2 Manuel Krause 2015-11-27 18:45:53 UTC
Created attachment 120183
_sys_class_drm_card0_error.20151127.MK.tar.bz2

This is the latest gathered crash dump.

I've read somewhere that if the IPEHR value is identical (here 0x01000000), it is likely to be the same bug, so I hope I've attached this to the right report.

Please let me know if any further information is needed.

Thank you in advance for your time and work,
best regards,

Manuel Krause
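Comment 2 matches reports by comparing the value at the end of the ecode. A minimal Python sketch for pulling the ecode out of a dmesg line so two hangs can be compared, assuming the `gen:engine:code` layout that i915 used around this time (4 = gen 4 / G45, 1 = the bsd ring, consistent with "stuck on bsd ring" above). In i915 of this era the last field was, as far as I know, computed from the hung ring's IPEHR XORed with INSTDONE rather than from IPEHR alone; treat that and the field names as assumptions to check against the kernel source. `HangCode`, `parse_ecode`, and `same_signature` are hypothetical helpers, not part of any i915 tooling.

```python
import re
from typing import NamedTuple, Optional


class HangCode(NamedTuple):
    """Fields of an i915 'ecode' string (layout assumed, see lead-in)."""
    gen: int     # assumed: GPU generation, e.g. 4 for G45
    engine: int  # assumed: index of the hung ring, e.g. 1 for bsd
    code: int    # assumed: hang signature (IPEHR ^ INSTDONE in i915 of this era)


# Matches e.g. "ecode 4:1:0x01000000" anywhere in a dmesg line.
_ECODE_RE = re.compile(r"ecode (\d+):(\d+):0x([0-9a-fA-F]+)")


def parse_ecode(line: str) -> Optional[HangCode]:
    """Return the parsed ecode from a dmesg line, or None if absent."""
    m = _ECODE_RE.search(line)
    if m is None:
        return None
    return HangCode(int(m.group(1)), int(m.group(2)), int(m.group(3), 16))


def same_signature(a: Optional[HangCode], b: Optional[HangCode]) -> bool:
    """Heuristic only: identical ecodes hint at, but do not prove, the same bug."""
    return a is not None and a == b


if __name__ == "__main__":
    hang = parse_ecode(
        "[   78.824503] [drm] GPU HANG: ecode 4:1:0x01000000, "
        "reason: Ring hung, action: reset"
    )
    print(hang)  # HangCode(gen=4, engine=1, code=16777216)
```

Comparing whole ecodes rather than the last field alone also checks that the hang happened on the same GPU generation and ring.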
Comment 3 yann 2016-09-27 14:16:28 UTC
We seem to have neglected the bug a bit, apologies.

It sounds like the hang happens in the bsd ring. Improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see if this issue still occurs.
Comment 4 yann 2016-11-18 12:53:42 UTC
(In reply to yann from comment #3)
> We seem to have neglected the bug a bit, apologies.
> 
> It sounds that hung happen in bsd ring. There were improvements pushed in
> kernel and Mesa that will benefit to your system, so please re-test with
> latest kernel & Mesa to see if this issue is still occurring.

Timeout. Assuming that it is fixed by now. If this is not the case, please re-test with the latest kernel and Mesa (12-13) to see if this issue still occurs, since improvements have been pushed to the kernel and Mesa that should benefit your system, and file a new bug.

