Bug 107808 - [skl] GPU HANG: ecode 9:0:0x85dffffb, in gnome-shell
Summary: [skl] GPU HANG: ecode 9:0:0x85dffffb, in gnome-shell
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-03 14:20 UTC by Mark Thomas
Modified: 2019-02-18 13:08 UTC

See Also:
i915 platform:
i915 features:


Attachments
GPU hang from /sys/class/drm/card0/error (38.44 KB, text/plain)
2018-09-03 18:14 UTC, Mark Thomas
GPU hang from /sys/class/drm/card0/error with GLX disabled (41.00 KB, text/plain)
2018-09-04 11:21 UTC, Mark Thomas
Latest GPU hang on kernel 4.18.9 (44.60 KB, text/plain)
2018-10-07 09:46 UTC, Mark Thomas

Description Mark Thomas 2018-09-03 14:20:33 UTC
An openSUSE Tumbleweed system running the 4.18.5-1-default kernel routinely loses video: the screen and cursor freeze, then the session resets to the login prompt.  After logging in again, the system hangs within a few minutes and does not recover.  journalctl showed the following message:

Sep 03 08:58:40 snorelax kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in gnome-shell [3314], reason: hang on rcs0, action: reset
Sep 03 08:58:40 snorelax kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Sep 03 08:58:40 snorelax kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Sep 03 08:58:40 snorelax kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Sep 03 08:58:40 snorelax kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Sep 03 08:58:40 snorelax kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Sep 03 08:58:40 snorelax kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Unfortunately, the crash log at the location specified in the error message is 0 bytes in length.

I'll be happy to help debug in any way.
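
For anyone triaging similar reports, the "GPU HANG" kernel line quoted above can be picked apart mechanically. The following is a minimal sketch, not an official tool: the regular expression is written against the single log line in this report, and the function name `parse_gpu_hang` and the dictionary keys are hypothetical. The ecode is kept as an opaque string because the meaning of its three fields is not stated in this report.

```python
import re

# Pattern written against the one log line in this report; other kernel
# versions may format the message differently (assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' kernel log line, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # PID is the number in square brackets
    return info


if __name__ == "__main__":
    sample = (
        "Sep 03 08:58:40 snorelax kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, "
        "in gnome-shell [3314], reason: hang on rcs0, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Feeding it the output of journalctl line by line would give one dictionary per hang, which makes it easier to see whether the same process and engine are involved each time.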
Comment 1 Chris Wilson 2018-09-03 14:23:51 UTC
Don't trust ls, just cat /sys/class/drm/card0/error > error and attach it (after a hang).
Comment 2 Mark Thomas 2018-09-03 14:32:48 UTC
I did that and the file says "No error state collected".  Perhaps I need to turn some flag on to collect this information?
Comment 3 Mark Thomas 2018-09-03 14:43:24 UTC
Just tried booting with i915.modeset=0 and video never showed up.  Booted with video=vesafb:off and also got a hang.  Trying now with pti=off and video=vesafb:off.
Comment 4 Chris Wilson 2018-09-03 14:44:24 UTC
The error state file exists in memory and so only contains the GPU dump from after the hang until reboot (or until it is manually cleared).
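
Since the error state only lives in memory, it has to be copied out after a hang and before the next reboot. The following is a minimal sketch of that step, assuming the path from the kernel message and the "No error state collected" placeholder text reported in comment 2. The function name `save_error_state` and the output file name are hypothetical, and reading the sysfs file typically requires root.

```python
from pathlib import Path

# Placeholder text the file contains when no hang has been recorded
# (as quoted in comment 2 of this report).
NO_STATE = b"No error state collected"


def save_error_state(src, dest):
    """Copy the GPU error state from src to dest.

    Returns False (and writes nothing) if src holds no error state.
    The file is read until EOF rather than trusting its reported size,
    because ls shows 0 bytes for it.
    """
    data = Path(src).read_bytes()
    if not data.strip() or data.lstrip().startswith(NO_STATE):
        return False
    Path(dest).write_bytes(data)
    return True


if __name__ == "__main__":
    # Path taken from the kernel log; the output name is arbitrary.
    saved = save_error_state("/sys/class/drm/card0/error", "gpu-error.txt")
    print("error state saved" if saved else "no error state collected yet")
```

Run after a hang, this leaves a regular file that can be attached to the bug.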
Comment 5 Mark Thomas 2018-09-03 14:47:29 UTC
OK, when it hangs again I'll try and get the dump and attach it.  Thanks.
Comment 6 Mark Thomas 2018-09-03 18:14:30 UTC
Created attachment 141432
GPU hang from /sys/class/drm/card0/error

Here is the error file requested.
Comment 7 Mark Thomas 2018-09-03 18:35:57 UTC
Hi Chris, thanks for the fast response.  Is there a workaround for this bug?  It is significantly impacting my work.  Is there any flag or anything else I can add to the kernel?
Comment 8 Mark Thomas 2018-09-04 10:55:47 UTC
Would disabling GLX be a suitable workaround for this issue?
Comment 9 Chris Wilson 2018-09-04 11:02:37 UTC
There's some hope that the bug is already fixed in recent mesa (though a few like this still remain). To avoid the issue, you have to not use mesa for your system compositor or ddx; switch gnome-shell for something like openbox and don't use -modesetting!
Comment 10 Mark Thomas 2018-09-04 11:21:01 UTC
Created attachment 141441
GPU hang from /sys/class/drm/card0/error with GLX disabled
Comment 11 Mark Thomas 2018-09-20 09:32:29 UTC
Tried again with kernel 4.18.7 and Mesa 18.1.7 and the bug is still present.
Comment 12 Denis 2018-09-20 09:39:18 UTC
Hi Mark. Could you please clarify the steps that lead to the crash? Are any apps in use, or are the crashes random while navigating the system?
Comment 13 Mark Thomas 2018-09-20 09:45:40 UTC
It's random during navigation.  Apps in use are terminal, Chrome and IntelliJ.
Comment 14 Mark Thomas 2018-09-25 12:38:25 UTC
Not sure if this matters but thought I'd mention it.  My setup is a tri-head setup using hybrid graphics with both Intel and Nvidia chips.

00:02.0 VGA compatible controller: Intel Corporation HD Graphics P530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M2000M] (rev a2)
Comment 15 Mark Thomas 2018-10-07 09:41:19 UTC
Just tried again with kernel 4.18.9 and Mesa-18.1.7-208.1.x86_64.  Problem still exists.
Comment 16 Mark Thomas 2018-10-07 09:46:53 UTC
Created attachment 141928
Latest GPU hang on kernel 4.18.9
Comment 17 Mark Thomas 2019-02-18 13:08:21 UTC
Any update here?  No progress in the last 4 months?

