Bug 94021

Summary: Skylake GPU hang ecode 9:0:0x86dffffd
Product: Mesa
Reporter: Markus Schauler <myemailu>
Component: Drivers/DRI/i965
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: REOPENED ---
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: major
Priority: medium
CC: bugs.freedesktop.org, bugzilla, intel-gfx-bugs, marc, me, zhangchi866
Version: 11.0
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: SKL
i915 features: GPU hang
Attachments: GPU crash dump
crash dump
GPU crash dump on kernel v4.10.11 Arch Linux x64
Crash dump, /sys/class/drm/card0/error

Description Markus Schauler 2016-02-06 09:58:56 UTC
Created attachment 121546 [details]
GPU crash dump

From time to time, X11 crashes because of a GPU hang.

x86_64 
kernel 4.4.0-8.g9f68b90-default

Distribution: openSUSE Leap 42

Hardware: Intel NUC6i5SHY
CPU:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 78
model name      : Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
stepping        : 3
microcode       : 0x6a


log message:
Feb 06 10:42:37 linux.suse kernel: [drm] stuck on render ring
Feb 06 10:42:37 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:42:37 linux.suse kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspa
Feb 06 10:42:37 linux.suse kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 06 10:42:37 linux.suse kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kerne
Feb 06 10:42:37 linux.suse kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 06 10:42:37 linux.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 06 10:42:37 linux.suse kernel: drm/i915: Resetting chip after gpu hang
Feb 06 10:42:38 linux.suse kernel: [drm] RC6 on
Feb 06 10:44:12 linux.suse kernel: [drm] stuck on render ring
Feb 06 10:44:12 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x85dfbfff, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:44:12 linux.suse kernel: drm/i915: Resetting chip after gpu hang
Feb 06 10:44:14 linux.suse kernel: [drm] RC6 on
Feb 06 10:44:26 linux.suse kernel: [drm] stuck on render ring
Feb 06 10:44:26 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:44:26 linux.suse kernel: drm/i915: Resetting chip after gpu hang
Feb 06 10:44:28 linux.suse kernel: [drm] RC6 on
Feb 06 10:44:32 linux.suse kernel: [drm] stuck on render ring
Feb 06 10:44:32 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:44:32 linux.suse kernel: drm/i915: Resetting chip after gpu hang
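For triage, the hang signatures in a log like the one above can be tallied with a short script. This is only a minimal sketch, not an i915 tool: the field names gen, engine and ecode are my own labels for the three parts of the ecode string (the first value is presumably the hardware generation, 9 for Skylake), and SAMPLE simply repeats the hang lines from this report.

```python
import re
from collections import Counter

# Matches the "GPU HANG" line format seen in this report.
# The group names are illustrative labels, not official i915 terminology.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)

SAMPLE = """\
Feb 06 10:42:37 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:44:12 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x85dfbfff, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:44:26 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [1415], reason: Ring hung, action: reset
Feb 06 10:44:32 linux.suse kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1415], reason: Ring hung, action: reset
"""


def parse_hangs(log_text):
    """Return one dict per GPU HANG line found in log_text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append(match.groupdict())
    return hangs


def count_ecodes(log_text):
    """Count how often each ecode value appears in log_text."""
    return Counter(hang["ecode"] for hang in parse_hangs(log_text))


if __name__ == "__main__":
    for ecode, count in count_ecodes(SAMPLE).most_common():
        print(ecode, count)
```

The same pattern also matches the later "Engine(s) hung" and "Hang on rcs0" variants of the reason field, since it only relies on the text between "reason:" and ", action:".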
Comment 1 Markus Schauler 2016-02-06 12:53:20 UTC
I can often trigger the bug by opening maps.google.com in Firefox 44
and then rapidly zooming in/out and panning around the map.
Comment 2 John Boero 2016-02-29 08:43:56 UTC
Appears to be duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=93049

I've been experiencing this too, though, and I'm on kernel 4.4.2-301. It happens with both the Intel driver from the Fedora repo and the one from Intel's own repository, so it seems this isn't over yet.
Comment 3 John Boero 2016-02-29 09:19:58 UTC
Hmm, after scouring the forums and trying various kernel options, I seem to have stabilized mine a bit with "i915.enable_rc6=0", as recommended on the Intel community site. Annoying, as this is far from my first Skylake issue. May as well have named the series Skyflake.

https://communities.intel.com/thread/98226
Comment 4 yann 2016-06-02 07:45:26 UTC
*** Bug 96318 has been marked as a duplicate of this bug. ***
Comment 5 yann 2016-06-02 07:58:23 UTC
(In reply to yann from comment #4)
> *** Bug 96318 has been marked as a duplicate of this bug. ***
I was mistaken about the duplicate: bug 96318 is not a duplicate of this one.
Comment 6 yann 2016-09-13 11:58:05 UTC
Workarounds for SKL have been pushed to the kernel since your version (4.4.0-8.g9f68b90-default), so please re-test with the latest kernel to see whether they help.

In parallel, assigning to Mesa product.

From this error dump, the hang is happening in the render ring batch with the active head at 0xf031f498 and 0x79000002 (3DSTATE_DRAWING_RECTANGLE) as IPEHR.

Batch extract (around 0xf031f498):

0xf031f474:      0x784e0002: 3D UNKNOWN: 3d_965 opcode = 0x784e
0xf031f478:      0x00000000: MI_NOOP
0xf031f47c:      0x00000000: MI_NOOP
0xf031f480:      0x00000000: MI_NOOP
0xf031f484:      0x78140000: 3D UNKNOWN: 3d_965 opcode = 0x7814
0xf031f488:      0x80000044: UNKNOWN
0xf031f48c:      0x780f0000: 3DSTATE_SCISSOR_POINTERS
0xf031f490:      0x00007c80:    scissor rect offset
0xf031f494:      0x79000002: 3DSTATE_DRAWING_RECTANGLE
0xf031f498:      0x00000000:    top left: 0,0
0xf031f49c:      0x00000000:    bottom right: 0,0
0xf031f4a0:      0x00000000:    origin: 0,0
0xf031f4a4:      0x784b0000: 3D UNKNOWN: 3d_965 opcode = 0x784b
0xf031f4a8:      0x00000005: MI_NOOP
0xf031f4ac:      0x784a0000: 3D UNKNOWN: 3d_965 opcode = 0x784a
0xf031f4b0:      0x0000c001: MI_NOOP
0xf031f4b4:      0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
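To make the extract above easier to read, here is a rough sketch that splits each command header into bit fields and walks the batch packet by packet. It assumes the usual Intel 3D command header layout as I understand it (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a DWord length in bits 7:0 that is the total packet length minus 2), and it assumes that 0xf031f474 is the start of a packet. The function names are hypothetical; this is not the decoder that produced the dump.

```python
def decode_3d_header(dword):
    """Split a 32-bit 3D command header into its assumed bit fields."""
    return {
        "command_type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        # Assumption: the length field is biased by 2 (header + 1 dword = 0).
        "total_dwords": (dword & 0xFF) + 2,
    }


def packet_starts(base, dwords):
    """Return the address of every packet header, assuming dwords[0] is one."""
    starts = []
    index = 0
    while index < len(dwords):
        starts.append(base + 4 * index)
        index += decode_3d_header(dwords[index])["total_dwords"]
    return starts


# The dwords from the batch extract, starting at 0xf031f474.
BASE = 0xF031F474
EXTRACT = [
    0x784E0002, 0x00000000, 0x00000000, 0x00000000,
    0x78140000, 0x80000044,
    0x780F0000, 0x00007C80,
    0x79000002, 0x00000000, 0x00000000, 0x00000000,
    0x784B0000, 0x00000005,
    0x784A0000, 0x0000C001,
    0x78490001,
]

if __name__ == "__main__":
    for address in packet_starts(BASE, EXTRACT):
        header = EXTRACT[(address - BASE) // 4]
        print(hex(address), hex(header), decode_3d_header(header))
```

Under these assumptions, 0x79000002 decodes to a four-dword packet (the header at 0xf031f494 plus three payload dwords), which puts the active head at 0xf031f498 on the first payload dword of 3DSTATE_DRAWING_RECTANGLE. It also suggests that dwords such as 0x00000005 and 0x0000c001, which the dump labels MI_NOOP, are really the payloads of the preceding unrecognised commands.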
Comment 7 szbart.op 2016-10-05 10:19:00 UTC
Created attachment 127021 [details]
crash dump
Comment 8 szbart.op 2016-10-05 10:20:51 UTC
I've been experiencing a similar problem since I built my PC ~10 months ago. It happens when I'm using rdesktop connected to a server running Windows Server 2008 R2. It can happen multiple times a day.

A single GPU hang is not a big problem, because the GPU can recover, but if hangs occur multiple times in a row, my session is killed, which is extremely annoying.


distro: archlinux

pc specs:
i5-6500
Asrock B150M-HDV
2 monitors connected to vga and dvi ports

log:
Oct 05 11:39:21 overkill kernel: [drm] stuck on render ring
Oct 05 11:39:22 overkill kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [555], reason: Engine(s) hung, action: reset
Oct 05 11:39:22 overkill kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Oct 05 11:39:22 overkill kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Oct 05 11:39:22 overkill kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Oct 05 11:39:22 overkill kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Oct 05 11:39:22 overkill kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Oct 05 11:39:22 overkill kernel: drm/i915: Resetting chip after gpu hang
Oct 05 11:39:23 overkill kernel: [drm] RC6 off
Comment 9 Matt Turner 2016-11-04 00:18:12 UTC
Please test a new version of Mesa (12 or 13) and mark as REOPENED
if you can reproduce and RESOLVED/* if you cannot reproduce.
Comment 10 szbart.op 2016-12-23 12:57:31 UTC
I upgraded my archlinux 3 weeks ago to:
mesa 13.0.2-2 
kernel 4.8.11-1-ARCH

Since then I was unable to reproduce the error.
Comment 11 Adhokshaj Mishra 2017-04-26 07:57:12 UTC
Created attachment 131045 [details]
GPU crash dump on kernel v4.10.11 Arch Linux x64
Comment 12 Adhokshaj Mishra 2017-04-26 07:58:57 UTC
GPU crashes on kernel v 4.10.11-1-Arch (vanilla kernel @ Arch Linux x86_64).
Comment 13 AWe 2017-04-27 15:07:05 UTC
[4921632.170633] [drm] stuck on render ring
[4921632.171991] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [26710], reason: Engine(s) hung, action: reset
[4921632.174252] drm/i915: Resetting chip after gpu hang
[4921634.146966] [drm] RC6 on

00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)

vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
stepping        : 3
microcode       : 0x84
cpu MHz         : 3790.649
cache size      : 6144 KB

Happens several times per hour while I am writing simple text in soffice (package libreoffice-common 1:5.2.4-2).

mesa-va-driver 13.0.5-1

xserver-xorg-core 2:1.19.0-3

Linux 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux

"KWin window manager" pops up with the words
"Desktop effects were restarted due to a graphics reset"
Comment 14 pilot104 2018-01-31 18:43:56 UTC
Hello, the bug happened to me several times today, also on my Arch Linux laptop with Skylake.

dmesg:
[ 4671.266830] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [522], reason: Hang on rcs0, action: reset
[ 4671.266832] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 4671.266832] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 4671.266833] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 4671.266833] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 4671.266834] [drm] GPU crash dump saved to /sys/class/drm/card1/error
[ 4671.266840] i915 0000:00:02.0: Resetting rcs0 after gpu hang

lspci | grep VGA:
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)

Anything else I can do or post?
Comment 15 Kenneth Graunke 2018-02-01 05:25:25 UTC
(In reply to pilot104 from comment #14)
> Hello, the bug happened to me several times today, also on my Arch Linux
> laptop with Skylake.

My guess is that you've gotten bit by a more recent issue.  Mesa master / 18.0 ought to work, and the fixes should hopefully hit stable releases before too long.
Comment 16 Zhang Chi 2018-02-26 01:51:39 UTC
Similar situation here:

[ 5087.853310] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [862], reason: Hang on rcs0, action: reset
[ 5087.853316] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 5095.844690] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 5107.844711] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 5117.832724] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 5125.828738] i915 0000:00:02.0: Resetting rcs0 after gpu hang

Then X restarts (without log).

CPU: i3 6320

$ lspci
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)

Very reproducible when editing text in LibreOffice Impress 6.0.1.1 (other similar versions should also trigger it). Other applications do not seem to trigger the problem.

I believe the problem started a few months ago and also happens with kernel 4.9, but I cannot remember clearly.

Adding "i915.enable_rc6=0" alone to boot parameters does NOT fix the problem. Adding "intel_iommu=igfx_off" does seem to fix it.
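To check which of the two workarounds mentioned in this thread is actually active on a running system, the kernel command line can be inspected. A minimal sketch follows; the helper names are made up for illustration, and it only looks at the boot parameters, not at module options set elsewhere (for example in modprobe configuration).

```python
def parse_cmdline(cmdline):
    """Turn a kernel command line into a dict; flags without '=' map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def workaround_status(cmdline):
    """Report whether the two boot parameters from this thread are set."""
    params = parse_cmdline(cmdline)
    # intel_iommu accepts a comma-separated list, e.g. "on,igfx_off".
    iommu_options = (params.get("intel_iommu") or "").split(",")
    return {
        "rc6_disabled": params.get("i915.enable_rc6") == "0",
        "igfx_iommu_off": "igfx_off" in iommu_options,
    }


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as handle:
        print(workaround_status(handle.read()))
```

For example, a command line containing only "intel_iommu=igfx_off" would be reported as igfx_iommu_off being true and rc6_disabled being false.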
Comment 17 Zhang Chi 2018-02-26 01:52:29 UTC
Created attachment 137598 [details]
Crash dump, /sys/class/drm/card0/error
Comment 18 Elizabeth 2018-02-26 16:26:05 UTC
(In reply to Zhang Chi from comment #16)
> Similar situation here:
> 
Which mesa version? Thanks.
Comment 19 Zhang Chi 2018-02-26 23:05:38 UTC
(In reply to Elizabeth from comment #18)
> (In reply to Zhang Chi from comment #16)
> > Similar situation here:
> > 
> Which mesa version? Thanks.

$ dpkg --list | fgrep -i mesa
ii  libegl-mesa0:amd64                                          17.3.3-1                             amd64        free implementation of the EGL API -- Mesa vendor library
ii  libegl1-mesa:amd64                                          17.3.3-1                             amd64        transitional dummy package
ii  libegl1-mesa-dev:amd64                                      17.3.3-1                             amd64        free implementation of the EGL API -- development files
ii  libgl1-mesa-dev:amd64                                       17.3.3-1                             amd64        free implementation of the OpenGL API -- GLX development files
ii  libgl1-mesa-dri:amd64                                       17.3.3-1                             amd64        free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-dri:i386                                        17.3.3-1                             i386         free implementation of the OpenGL API -- DRI modules
ii  libglapi-mesa:amd64                                         17.3.3-1                             amd64        free implementation of the GL API -- shared library
ii  libglapi-mesa:i386                                          17.3.3-1                             i386         free implementation of the GL API -- shared library
ii  libglu1-mesa:amd64                                          9.0.0-2.1                            amd64        Mesa OpenGL utility library (GLU)
ii  libglu1-mesa-dev:amd64                                      9.0.0-2.1                            amd64        Mesa OpenGL utility library -- development files
ii  libglx-mesa0:amd64                                          17.3.3-1                             amd64        free implementation of the OpenGL API -- GLX vendor library
ii  libglx-mesa0:i386                                           17.3.3-1                             i386         free implementation of the OpenGL API -- GLX vendor library
ii  libosmesa6:amd64                                            17.3.3-1                             amd64        Mesa Off-screen rendering extension
ii  libwayland-egl1-mesa:amd64                                  17.3.3-1                             amd64        implementation of the Wayland EGL platform -- runtime
ii  mesa-common-dev:amd64                                       17.3.3-1                             amd64        Developer documentation for Mesa
ii  mesa-utils                                                  8.3.0-5                              amd64        Miscellaneous Mesa GL utilities
ii  mesa-utils-extra                                            8.3.0-5                              amd64        Miscellaneous Mesa utilies (opengles, egl)
ii  mesa-vdpau-drivers:amd64                                    17.3.3-1                             amd64        Mesa VDPAU video acceleration drivers
ii  mesa-vdpau-drivers:i386                                     17.3.3-1                             i386         Mesa VDPAU video acceleration drivers

However, I wonder whether the bug is specific to this mesa version. The bug first showed up a few months ago, and I am on Debian testing. Chances are the bug spans multiple versions, I think.
Comment 20 Elizabeth 2018-02-27 17:06:23 UTC
There is a report at https://bugs.freedesktop.org/show_bug.cgi?id=105195#c5 where it is mentioned that downgrading to mesa 17.2.4 fixes the issue. Could you confirm whether your case is the same and, if possible, try mesa 18rc4?
Thank you.
Comment 21 Mark Janes 2018-02-27 17:30:28 UTC
17.3.6 is out, and should fix this as well.
Comment 22 Zhang Chi 2018-03-01 05:53:27 UTC
(In reply to Mark Janes from comment #21)
> 17.3.6 is out, and should fix this as well.

Seems 17.3.6 from Debian unstable has fixed my case.
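For anyone comparing their installed package against the 17.3.6 release named in comment 21, a small sketch of the version check is given below. It handles only plain numeric Debian-style version strings (optional epoch, optional revision), the function names are hypothetical, and the threshold applies only to the 17.3.x hang discussed in the last few comments, not to the older reports in this bug.

```python
import re

# Assumption: 17.3.6 is the first release carrying the fix, per comment 21.
FIXED_IN = (17, 3, 6)


def upstream_version(debian_version):
    """Extract the numeric upstream version, e.g. '17.3.3-1' -> (17, 3, 3)."""
    version = debian_version.split(":", 1)[-1]  # drop an epoch such as '1:'
    version = version.rsplit("-", 1)[0]         # drop the Debian revision
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric version in %r" % debian_version)
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(debian_version):
    """True if the version is at or above the assumed fixed release."""
    return upstream_version(debian_version) >= FIXED_IN


if __name__ == "__main__":
    for candidate in ("17.3.3-1", "17.3.6-1"):
        print(candidate, has_fix(candidate))
```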