An openSUSE Tumbleweed system running the 4.18.5-1-default kernel routinely crashes video: the screen and cursor freeze, then the display resets to the login prompt. Logging in again hangs the system after a few minutes with no recovery. journalctl showed the following messages:
Sep 03 08:58:40 snorelax kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in gnome-shell , reason: hang on rcs0, action: reset
Sep 03 08:58:40 snorelax kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Sep 03 08:58:40 snorelax kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Sep 03 08:58:40 snorelax kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Sep 03 08:58:40 snorelax kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Sep 03 08:58:40 snorelax kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Sep 03 08:58:40 snorelax kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
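To check how often this happens, the "GPU HANG" lines can be pulled out of the kernel log, for example from `journalctl -k` output. The following is a minimal Python sketch that assumes the message layout shown above. The regex was inferred from these log lines and is not taken from a documented i915 format, and `parse_gpu_hang` and `GpuHang` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log above (assumption, not a documented format):
#   [drm] GPU HANG: ecode 9:0:0x85dffffb, in gnome-shell , reason: hang on rcs0, action: reset
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<process>.*?)\s*, "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)


class GpuHang(NamedTuple):
    ecode: str    # error code reported by i915
    process: str  # process that submitted the hanging batch
    reason: str   # e.g. "hang on rcs0" (the render ring)
    action: str   # what the driver did, e.g. "reset"


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the fields of an i915 'GPU HANG' message, or None for any other line."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(m["ecode"], m["process"], m["reason"], m["action"])
```

Calling `parse_gpu_hang` on each line of the journal output and keeping the non-None results gives a list of hangs with their ecode and offending process, which can be useful to include in the report.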
Unfortunately, the crash log at the location specified in the error message is 0 bytes in length.
I'll be happy to help debug in any way.
Don't trust ls; just cat /sys/class/drm/card0/error > error and attach it (after a hang).
I did that and the file says "No error state collected". Perhaps I need to turn some flag on to collect this information?
Just tried booting with i915.modeset=0, and video never showed up. Booted with video=vesafb:off and also got a hang. Trying now with pti=off and video=vesafb:off.
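When testing several boot parameters, it helps to confirm which ones the running kernel actually received. The kernel exposes its command line in `/proc/cmdline`. Below is a minimal Python sketch that splits it into a dictionary. `parse_cmdline` and `current_cmdline` are hypothetical helpers, and the parser is deliberately simplified: it ignores quoted values containing spaces, and a repeated parameter keeps only its last value.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(text: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {param: value}.

    Flags without '=' (e.g. 'quiet') map to None. Simplification: quoted
    values with spaces are not handled, and repeated keys keep the last value.
    """
    params: Dict[str, Optional[str]] = {}
    for token in text.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        params[key] = value if sep else None
    return params


def current_cmdline() -> Dict[str, Optional[str]]:
    """Parameters the running kernel was booted with."""
    return parse_cmdline(Path("/proc/cmdline").read_text())
```

For example, `current_cmdline().get("video")` should return `"vesafb:off"` on a boot where that parameter was applied, and `None` if it was not passed.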
The error state file exists in memory and so only contains the GPU dump from after the hang until reboot (or until it is manually cleared).
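Because the dump only exists in memory until reboot, it has to be copied out after a hang and before restarting. Here is a minimal Python sketch of the `cat /sys/class/drm/card0/error > error` step. It reads the file's contents instead of relying on the size `ls` reports, and it skips the copy when the file holds the "No error state collected" text seen earlier in this report. The path assumes the Intel GPU is card0, reading it likely requires root, and `save_error_state` is a hypothetical helper name.

```python
from pathlib import Path

# Location named in the kernel message; assumes the Intel GPU is card0.
ERROR_STATE = Path("/sys/class/drm/card0/error")

# Text the file contained in this report when no hang had been captured yet.
NO_ERROR_MARKER = "No error state collected"


def save_error_state(dest: Path, src: Path = ERROR_STATE) -> bool:
    """Copy the i915 error state to dest, like `cat src > dest`.

    Reads the actual contents rather than trusting the file size shown by ls.
    Returns False and writes nothing if no GPU hang has been captured.
    """
    data = src.read_bytes()
    if data.strip().startswith(NO_ERROR_MARKER.encode()):
        return False
    dest.write_bytes(data)
    return True
```

Run right after a hang, e.g. `save_error_state(Path("error"))`, and attach the resulting file to the bug.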
OK, when it hangs again I'll try and get the dump and attach it. Thanks.
Created attachment 141432
GPU hang from /sys/class/drm/card0/error
Here is the error file requested.
Hi Chris, thanks for the fast response. Is there a workaround for this bug? It is significantly impacting my work. Is there any flag or anything I can add to the kernel command line?
Would disabling GLX be a suitable workaround for this issue?
There's some hope that the bug is already fixed in recent Mesa (though a few hangs like this still remain). To avoid the issue, you must not use Mesa for your system compositor or DDX: switch from gnome-shell to something like openbox, and don't use the modesetting DDX!
Created attachment 141441
GPU hang from /sys/class/drm/card0/error with GLX disabled
Tried again with kernel 4.18.7 and Mesa 18.1.7 and the bug is still present.
Hi Mark. Could you please clarify the steps that lead to the crash? Are any particular apps in use, or does it crash randomly while navigating the system?
It's random during navigation. Apps in use are terminal, Chrome and IntelliJ.
Not sure if this matters, but I thought I'd mention it: my setup is tri-head, using hybrid graphics with both Intel and Nvidia chips.
00:02.0 VGA compatible controller: Intel Corporation HD Graphics P530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M2000M] (rev a2)
Just tried again with kernel 4.18.9 and Mesa-18.1.7-208.1.x86_64. Problem still exists.
Created attachment 141928
Latest GPU hang on kernel 4.18.9
Any update here? Has there been no progress in the last 4 months?