Bug 91976

Summary:

GPU HANG: ecode 4:1:0x01000000, reason: Ring hung, action: reset

Product:

DRI

Reporter:

Jiri Kosina <jikos>

Component:

DRM/Intel

Assignee:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Status:

CLOSED FIXED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

medium

CC:

intel-gfx-bugs, manuelkrause

Version:

unspecified

Hardware:

Other

OS:

All

Whiteboard:

i915 platform:

G45

i915 features:

GPU hang

Attachments:

Description	Flags
/sys/class/drm/card0/error contents after resume	none
dmesg log of a similar GPU hang	none
_sys_class_drm_card0_error.20151127.MK.tar.bz2	none

Description Jiri Kosina 2015-09-11 11:50:15 UTC

Created attachment 118215
/sys/class/drm/card0/error contents after resume

Current mainline kernel (HEAD == b0a1ea51) gives me this on resume from hibernation:

[   78.816182] [drm] stuck on bsd ring
[   78.824503] [drm] GPU HANG: ecode 4:1:0x01000000, reason: Ring hung, action: reset
[   78.824779] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   78.824784] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   78.824789] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   78.824793] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   78.824798] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   78.826876] drm/i915: Resetting chip after gpu hang


Attaching /sys/class/drm/card0/error contents as well.
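For anyone who wants to compare hang reports mechanically, here is a minimal Python sketch that pulls the fields out of a "GPU HANG" line like the one in the log above. The function name and the dictionary keys are made up for illustration. The three colon-separated ecode parts are returned as plain numbers, because this report does not say what each one means.

```python
import re

# Matches the "GPU HANG" line format shown in the dmesg excerpt above.
HANG_RE = re.compile(
    r"GPU HANG: ecode (\d+):(\d+):(0x[0-9a-fA-F]+), "
    r"reason: (.+?), action: (\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if no match.

    The key names are illustrative only; the meaning of the individual
    ecode parts is not stated in this report.
    """
    match = HANG_RE.search(line)
    if match is None:
        return None
    first, second, code, reason, action = match.groups()
    return {
        "ecode_first": int(first),
        "ecode_second": int(second),
        "ecode_value": int(code, 16),
        "reason": reason,
        "action": action,
    }


# The line is copied from the log in this report.
sample = ("[   78.824503] [drm] GPU HANG: ecode 4:1:0x01000000, "
          "reason: Ring hung, action: reset")
print(parse_gpu_hang(sample))
```

Feeding it each line of a saved dmesg log and keeping the non-None results gives a quick list of hangs to compare across boots.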

Comment 1 Manuel Krause 2015-11-27 18:35:21 UTC

Created attachment 120181
dmesg log of a similar GPU hang

It took me a long time to find this bug. For months I've been searching for the reason why my resumes from hibernation (suspend-to-disk) sometimes fail and sometimes succeed, and for a fix.

Currently I'm using kernel 4.3.0 (vanilla from openSUSE) plus the current vrq patch from Alfred Chen, the TuxOnIce patch and the BFQ disk scheduler patches v7r8. Having already bothered others with the unreliable resumes (sometimes it takes 25+ attempts to get the written image back working, sometimes the first attempt comes up fine, within a single series of hibernations without a reboot), I decided yesterday to compile i915 into the kernel. One problem with the failing TuxOnIce resume attempts is that I never got any failure messages: either it worked, or the machine locked up at "Doing atomic copy/restore.", which is where a mode switch seems to occur in the working case (module version).

With i915 compiled in I at least get some messages about what may be going on. And I'm rather glad I did this, as the GPU resets that now get triggered seem to make resuming more reliable here.

This is an HP Compaq 6730b laptop running openSUSE 13.1, but with updated drivers. For Xorg, Mesa and GFX-related .rpms I use pontostroy's repository.

Here I attach the dmesg log of a series of resumes from hibernation.

In the next attachment I'll add the /sys/class/drm/card0/error crash dump.
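Since the dump has to be collected after each hang, a small helper can make that less tedious. The following is only a sketch under stated assumptions: the function name and the output file name pattern are invented here, and the check for a "no error state" placeholder is a guess about what the file contains when no hang has been recorded. Reading the sysfs file will normally need root.

```python
import time
from pathlib import Path


def save_error_state(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the GPU error state to a timestamped file.

    Returns the destination path, or None if there is nothing to save.
    """
    text = Path(src).read_text(errors="replace")
    # Assumption: an empty file or a "no error state ..." placeholder
    # means no hang has been recorded since the last reset.
    if not text.strip() or text.lower().startswith("no error state"):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_error.{stamp}.txt"
    dest.write_text(text)
    return dest
```

Calling save_error_state() right after a resume that logged a GPU hang would leave a file in the current directory that can be compressed and attached.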

Comment 2 Manuel Krause 2015-11-27 18:45:53 UTC

Created attachment 120183
_sys_class_drm_card0_error.20151127.MK.tar.bz2

This is the latest gathered crash dump.

I've read somewhere that if the IPEHR value is identical (here 0x01000000), it's likely to be the same bug, so I hope I'm attaching this to the right report.
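The comparison could be sketched roughly like this in Python. It assumes the dump contains lines of the form "IPEHR: 0x...", and the sample strings are made up for illustration rather than taken from the attachments. A shared value is only a hint that two hangs are related, not proof.

```python
import re

# Assumption: the error dump lists the register as "IPEHR: 0x<hex>".
IPEHR_RE = re.compile(r"^\s*IPEHR:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def ipehr_values(dump_text):
    """Return all IPEHR values found in a dump, as integers."""
    return [int(value, 16) for value in IPEHR_RE.findall(dump_text)]


def share_ipehr(dump_a, dump_b):
    """True if the two dumps have at least one IPEHR value in common."""
    return bool(set(ipehr_values(dump_a)) & set(ipehr_values(dump_b)))


# Made-up excerpts, only to show the call; real dumps are much longer.
dump_a = "  IPEHR: 0x01000000\n"
dump_b = "  IPEHR: 0x7a000003\n  IPEHR: 0x01000000\n"
print(share_ipehr(dump_a, dump_b))
```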

OK, please let me know if any further information is needed.

Thank you for your time and work in advance,
best regards,

Manuel Krause

Comment 3 yann 2016-09-27 14:16:28 UTC

We seem to have neglected the bug a bit, apologies.

It sounds like the hang happens in the bsd ring. Improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see whether this issue still occurs.

Comment 4 yann 2016-11-18 12:53:42 UTC

(In reply to yann from comment #3)
> We seem to have neglected the bug a bit, apologies.
> 
> It sounds that hung happen in bsd ring. There were improvements pushed in
> kernel and Mesa that will benefit to your system, so please re-test with
> latest kernel & Mesa to see if this issue is still occurring.

Timeout. Assuming that it is fixed by now. If this is not the case, please re-test with the latest kernel and Mesa (12-13) to see whether this issue still occurs, since improvements have been pushed to the kernel and Mesa that should benefit your system, and file a new bug.
