Bug 104520 - Intermittent X crashes: GPU HANG: ecode 9:0:0x85dffffb, in Xorg [443], reason: Hang on rcs0, action: reset
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest major
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-07 08:16 UTC by Amy
Modified: 2018-12-06 10:00 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
GPU error dump. (53.53 KB, text/plain)
2018-01-07 08:16 UTC, Amy
Dmesg log (58.80 KB, text/plain)
2018-01-07 08:17 UTC, Amy
Glxinfo output. (31.33 KB, text/plain)
2018-01-07 08:24 UTC, Amy
GPU crash dump from /sys/class/drm/card0/error (39.76 KB, text/plain)
2018-01-30 18:02 UTC, Michael Weitzel
/sys/class/drm/card0/error file (129.84 KB, text/plain)
2018-01-31 14:06 UTC, Eric Blau
dmesg output (62.15 KB, text/plain)
2018-02-27 08:41 UTC, Emilio J. Padrón
xorg log (43.42 KB, text/plain)
2018-02-27 08:41 UTC, Emilio J. Padrón
/sys/class/drm/card0/error (38.21 KB, text/plain)
2018-02-27 08:51 UTC, Emilio J. Padrón
/sys/class/drm/card0/error (23.44 KB, text/plain)
2018-04-18 10:29 UTC, hfekih
Log dump from /sys/class/drm/card0/error after GPU hang (as printed in dmesg output). (696.49 KB, text/plain)
2018-06-11 21:26 UTC, Alif Wahid
/sys/class/drm/card0/error (48.07 KB, text/plain)
2018-12-06 10:00 UTC, John M.

Description Amy 2018-01-07 08:16:36 UTC
Created attachment 136591 [details]
GPU error dump.

1) Run startx.
2) Let i3 load and run its startup scripts (which launch an xterm and Pale Moon); this intermittently crashes.
Result: The GPU hangs and X eventually crashes, with this message in dmesg:

[drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [443], reason: Hang on rcs0, action: reset
[  561.340148] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  561.340148] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  561.340148] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  561.340149] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  561.340149] [drm] GPU crash dump saved to /sys/class/drm/card0/error
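
For anyone comparing hang signatures across reports, the GPU HANG line can be split into its fields mechanically. This is only a minimal sketch in Python, assuming the message layout shown in the dmesg excerpt above. The function name `parse_gpu_hang` and the field names are illustrative and are not part of any i915 tooling.

```python
import re

# Pattern for the i915 "GPU HANG" message as it appears in this report.
# The "in <process> [<pid>]" part is optional because some logs omit it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+?)"
    r"(?:, in (?P<process>.+?) \[(?P<pid>\d+)\])?"
    r", reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the PID to an integer when the process part was present.
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields


line = ("[drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [443], "
        "reason: Hang on rcs0, action: reset")
print(parse_gpu_hang(line))
```

Lines that do not contain a hang message return None, so the function can be applied to every line of a saved dmesg log.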
Comment 1 Amy 2018-01-07 08:17:14 UTC
Created attachment 136592 [details]
Dmesg log
Comment 2 Amy 2018-01-07 08:24:34 UTC
Further info:

Kernel (Arch Linux): 

4.14.12-1-ARCH #1 SMP PREEMPT Fri Jan 5 18:19:34 UTC 2018 x86_64 GNU/Linux

LSPCI info:
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
        Subsystem: Acer Incorporated [ALI] HD Graphics 620
        Kernel driver in use: i915
        Kernel modules: i915
--
01:00.0 3D controller: NVIDIA Corporation Device 179c (rev ff)
        Kernel modules: nouveau, nvidia_drm, nvidia
Comment 3 Amy 2018-01-07 08:24:58 UTC
Created attachment 136593 [details]
Glxinfo output.
Comment 4 Michael Weitzel 2018-01-30 18:01:54 UTC
I had the same crash (for the first time), also on Kaby Lake, Arch Linux, kernel 4.14.15-1-ARCH. I'll attach my crash dump.

[drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [636], reason: Hang on rcs0, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
i915 0000:00:02.0: Resetting rcs0 after gpu hang
i915 0000:00:02.0: Resetting rcs0 after gpu hang
asynchronous wait on fence i915:kwin_x11[820]/1:23321 timed out
i915 0000:00:02.0: Resetting rcs0 after gpu hang
i915 0000:00:02.0: Resetting rcs0 after gpu hang
i915 0000:00:02.0: Resetting rcs0 after gpu hang
Comment 5 Michael Weitzel 2018-01-30 18:02:52 UTC
Created attachment 137059 [details]
GPU crash dump from /sys/class/drm/card0/error
Comment 6 Eric Blau 2018-01-31 14:05:42 UTC
I'm in the same boat. I get frequent hangs as reported:

kernel: [drm] GPU HANG: ecode 8:0:0x2e6b4c79, in Xorg [2680], reason: Hang on rcs0, action: reset
kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
kernel: i915 0000:00:02.0: Resetting chip after gpu hang
kernel: [drm:i915_reset [i915]] *ERROR* GPU recovery failed


Sometimes my laptop stays up and running, but other times it requires a power cycle.
Comment 7 Eric Blau 2018-01-31 14:06:30 UTC
Created attachment 137088 [details]
/sys/class/drm/card0/error file
Comment 8 xman 2018-02-01 03:49:44 UTC
I am also hitting the same issue.
[ 1516.880515] [drm] GPU HANG: ecode 9:0:0x85dffffb, reason: Hang on rcs0, action: reset
[ 1516.880517] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1516.880517] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1516.880518] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1516.880518] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1516.880518] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1516.880526] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1531.801142] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1539.837207] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1547.801180] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1555.801205] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1559.001148] asynchronous wait on fence i915:compiz[1607]/1:243a timed out
[ 1563.801234] i915 0000:00:02.0: Resetting rcs0 after gpu hang
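
In the log above, after the first reset the engine is reset again roughly every eight seconds. The spacing can be computed from a saved dmesg log with a few lines of Python. This is a rough sketch that assumes the bracketed timestamp prefix shown above; `reset_intervals` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# [ 1516.880526] i915 0000:00:02.0: Resetting rcs0 after gpu hang
# "\w+" also accepts "chip", which appears in other logs in this report.
RESET_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s.*Resetting \w+ after gpu hang")


def reset_intervals(dmesg_text):
    """Return the seconds elapsed between consecutive reset messages."""
    times = []
    for line in dmesg_text.splitlines():
        match = RESET_RE.match(line)
        if match:
            times.append(float(match.group("ts")))
    # Pair each timestamp with the next one and take the difference.
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


log = """[ 1516.880526] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1531.801142] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1539.837207] i915 0000:00:02.0: Resetting rcs0 after gpu hang"""
print(reset_intervals(log))
```

Lines without a timestamp prefix, like those pasted from journalctl earlier in this report, are simply ignored.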
Comment 9 Amy 2018-02-18 21:39:50 UTC
Still happening on:

Linux Playful-Plankton 4.15.4-1-ARCH #1 SMP PREEMPT Sat Feb 17 16:01:38 UTC 2018 x86_64 GNU/Linux

version number:    11.0
X.Org version: 1.19.6

xf86-video-intel 1:2.99.917+812+g75795523-1
Comment 10 Amy 2018-02-18 21:43:21 UTC
Additionally, the Mesa version is:

OpenGL version string: 3.0 Mesa 17.3.4
Comment 11 Amy 2018-02-25 23:07:53 UTC
Still happening on Mesa 17.3.5
Comment 12 Emilio J. Padrón 2018-02-27 08:34:49 UTC
Same (or similar) problem here!

ThinkPad T470, Kaby Lake (i5 7200U), running an up-to-date Debian GNU/Linux Sid with kernel 4.15.

I use Awesome 4.2 as my window manager. The GPU hang seems to appear mostly when using Emacs (I'm using the GTK-based emacs25).

I'm attaching my dmesg output and the error dump from /sys/class/drm/card0/error.
Comment 13 Emilio J. Padrón 2018-02-27 08:41:10 UTC
Created attachment 137638 [details]
dmesg output
Comment 14 Emilio J. Padrón 2018-02-27 08:41:36 UTC
Created attachment 137639 [details]
xorg log
Comment 15 Emilio J. Padrón 2018-02-27 08:51:02 UTC
Created attachment 137640 [details]
/sys/class/drm/card0/error
Comment 16 Gennady 2018-03-28 22:47:13 UTC
I have the same or a very similar problem on Skylake.
Comment 17 Gennady 2018-03-28 22:57:13 UTC
I cannot attach the error log; please let me know if it is needed.

The problem is reproducible: if I run a certain Qt 5 app, Xorg hangs.

If I wait for a minute, it restarts.
Comment 18 Gennady 2018-03-28 23:00:55 UTC
GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1035], reason: Hang on rcs0, action: reset
Kernel: 4.15.0-1-amd64
Time: 1522274173 s 645209 us
Boottime: 262 s 812221 us
Uptime: 260 s 247539 us
Active process (on ring render): Xorg [1035], score 0
Reset count: 0
Suspend count: 0
Platform: SKYLAKE
PCI ID: 0x191b
PCI Revision: 0x06
PCI Subsystem: 17aa:222e
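
The error state header pasted above is a series of "Key: value" lines, so it is easy to load into a dictionary when comparing dumps from different machines. Below is a minimal sketch in Python that assumes only that layout. `parse_error_header` is a hypothetical helper name, and the first line (the hang message) is skipped because it is not a simple key/value pair.

```python
def parse_error_header(text):
    """Parse 'Key: value' lines from the top of an i915 error state dump."""
    header = {}
    for line in text.splitlines():
        # Skip the hang summary line and anything without a key/value split.
        if line.startswith("GPU HANG") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        header[key.strip()] = value.strip()
    return header


sample = """GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1035], reason: Hang on rcs0, action: reset
Kernel: 4.15.0-1-amd64
Platform: SKYLAKE
PCI ID: 0x191b
Reset count: 0"""

header = parse_error_header(sample)
print(header["Platform"], header["PCI ID"])
```

Values are kept as strings; converting fields such as the reset count to integers is left to the caller.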
Comment 19 Gennady 2018-03-28 23:04:24 UTC
Linux p50-debian 4.15.0-1-amd64 #1 SMP Debian 4.15.4-1 (2018-02-18) x86_64 GNU/Linux
ii  libgl1-mesa-dri:amd64                         17.3.7-1                             amd64        free implementation of the OpenGL API -- DRI modules
ii  xserver-xorg-video-intel                      2:2.99.917+git20171229-1             amd64        X.Org X server -- Intel i8xx, i9xx display driver
ii  xserver-xorg                                  1:7.7+19                             amd64        X.Org X server
ii  firmware-misc-nonfree                         20170823-1                           all          Binary firmware for various drivers in the Linux kernel
Comment 20 hfekih 2018-04-18 10:28:55 UTC
Same problem here.
I am using an Intel(R) Celeron(R) CPU N3160.
Reproduced on Linux kernels 4.16.2 and 4.14.34 (Mesa 17.3.8).
The problem can be reproduced by starting any application that uses OpenGL ES:

root@ca-linux:/home/cannon$ glmark2-es2-drm 
=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel Open Source Technology Center
    GL_RENDERER:   Mesa DRI Intel(R) HD Graphics 400 (Braswell) 
    GL_VERSION:    OpenGL ES 3.1 Mesa 17.3.8
=======================================================
[build] use-vbo=false:i965: Failed to submit batchbuffer: Input/output error
----------------------------------------------------------------------------
dmesg output:
[   38.859784] [drm] GPU HANG: ecode 8:0:0xe757feff, in glmark2-es2-drm [346], reason: Hang on rcs0, action: reset
[   38.859788] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   38.859789] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   38.859791] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   38.859792] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   38.859794] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   38.859869] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[   41.905511] i915 0000:00:02.0: Resetting chip after gpu hang
[   41.952424] asynchronous wait on fence i915:glmark2-es2-drm[346]/2:2 timed out
[   47.072637] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
[   51.372311] i915 0000:00:02.0: Failed to reset chip
Comment 21 hfekih 2018-04-18 10:29:53 UTC
Created attachment 138905 [details]
/sys/class/drm/card0/error
Comment 22 Alif Wahid 2018-06-11 21:26:39 UTC
Created attachment 140127 [details]
Log dump from /sys/class/drm/card0/error after GPU hang (as printed in dmesg output).

I see this error intermittently when running the Xilinx Vivado v2016.4 software on my Ubuntu 16.04 LTS desktop (kernel 4.4.0, Intel Core i5-6400 CPU with Intel Skylake GT2 GPU). I've attached the full dump from /sys/class/drm/card0/error, as instructed by the dmesg output below.

[  255.622496] [drm] stuck on render ring
[  255.623115] [drm] GPU HANG: ecode 9:0:0x84dffff8, in Xorg [1006], reason: Engine(s) hung, action: reset
[  255.623120] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  255.623122] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  255.623125] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  255.623127] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  255.623130] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  255.624623] drm/i915: Resetting chip after gpu hang
[  257.622605] [drm] RC6 on
[  311.628103] [drm] stuck on render ring
[  311.628686] [drm] GPU HANG: ecode 9:0:0x84dffff8, in Xorg [1006], reason: Engine(s) hung, action: reset
[  311.630227] drm/i915: Resetting chip after gpu hang
[  313.628765] [drm] RC6 on
Comment 23 John M. 2018-12-06 10:00:32 UTC
Created attachment 142740 [details]
/sys/class/drm/card0/error

Hello,

Here's the error message I got:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The crash appeared when I tried to resize a GNOME Terminal window.

