Bug 93852 - GPU HANG: ecode 9:0:0x84df7efc
Summary: GPU HANG: ecode 9:0:0x84df7efc
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: 17.1
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-25 13:55 UTC by Falk Alexander
Modified: 2019-09-18 19:39 UTC

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
The error dump said directly after rebooting only: no error state collected (25 bytes, text/plain)
2016-01-25 13:57 UTC, Falk Alexander
GPU crash dump (283.64 KB, text/plain)
2016-10-21 14:14 UTC, Falk Alexander

Description Falk Alexander 2016-01-25 13:55:30 UTC
Description of problem:
The whole system freezes, and sound stops or loops.
In my experience this happens with OpenGL applications such as games. For me, the crash occurs when playing Minetest, Xonotic, and Tesseract (all OpenGL games; only Tesseract is not from the Fedora repositories).
It usually happens after about 10 minutes of play, sometimes immediately after the game starts, and sometimes later.


[drm] stuck on render ring
[drm] GPU HANG: ecode 9:0:0x84df7efc, in minetest [3180], reason: Ring hung, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
drm/i915: Resetting chip after gpu hang
[drm] RC6 on
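
The hang line in the log above packs several fields into one message. A minimal Python sketch for pulling them apart, assuming the log format shown in this report. The names `GpuHang` and `parse_gpu_hang` are hypothetical, and reading the three ecode fields as generation, ring, and a hash is an assumption that this report does not confirm:

```python
import re
from typing import NamedTuple, Optional


class GpuHang(NamedTuple):
    gen: int      # first ecode field (assumed: GPU generation)
    ring: int     # second ecode field (assumed: ring/engine id)
    code: int     # third ecode field, the hexadecimal value
    process: str  # name of the process the kernel blamed
    pid: int
    reason: str
    action: str


# Matches e.g. "GPU HANG: ecode 9:0:0x84df7efc, in minetest [3180],
# reason: Ring hung, action: reset"
HANG_RE = re.compile(
    r"GPU HANG: ecode (\d+):(\d+):(0x[0-9a-fA-F]+), "
    r"in (.+?) \[(\d+)\], reason: (.+?), action: (\w+)"
)


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed fields, or None if the line is not a hang report."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    gen, ring, code, process, pid, reason, action = match.groups()
    return GpuHang(int(gen), int(ring), int(code, 16),
                   process, int(pid), reason, action)
```

Running this over a whole kernel log makes it easy to compare the hex value and the affected process across several hangs.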


Intel® HD Graphics 520 (Skylake GT2)
Intel® Core™ i7-6500U

Additional info:
cmdline:        BOOT_IMAGE=/vmlinuz-4.3.3-301.fc23.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=de_DE.UTF-8
kernel:         4.3.3-301.fc23.x86_64
runlevel:       unknown
type:           Kerneloops
Comment 1 Falk Alexander 2016-01-25 13:57:35 UTC
Created attachment 121265 [details]
The error dump said directly after rebooting only: no error state collected
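
Since the error state was gone after the reboot, one option is to copy it out of sysfs right after the hang, before restarting. A minimal sketch, assuming the path printed in the kernel log above; `has_error_state` and `save_error_state` are hypothetical helper names, and reading the file may require root:

```python
from pathlib import Path

# Path taken from the kernel log message in this report.
ERROR_STATE = Path("/sys/class/drm/card0/error")
# Text seen in the first attachment when nothing was captured.
EMPTY_MARKER = "no error state collected"


def has_error_state(text: str) -> bool:
    """True if the text looks like a real dump rather than the empty marker."""
    return bool(text.strip()) and EMPTY_MARKER not in text.lower()


def save_error_state(dest: Path, source: Path = ERROR_STATE) -> bool:
    """Copy the error state to dest; return False if there is nothing to save."""
    text = source.read_text(errors="replace")
    if not has_error_state(text):
        return False
    dest.write_text(text)
    return True
```

For example, `save_error_state(Path("gpu-hang-error.txt"))` would write the dump into the current directory, where it survives a reboot.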
Comment 2 yann 2016-09-13 15:31:41 UTC
Workarounds for SKL are available in the latest kernel, and fixes have been pushed to Mesa, which may resolve your issue. Please update your system (kernel & Mesa) and confirm whether the issue still occurs.
Comment 3 yann 2016-10-21 14:03:25 UTC
(In reply to yann from comment #2)
> There were workarounds for SKL available on latest kernel as well fixed push
> in Mesa that may fix your issue. Please update your system (kernel & Mesa)
> and confirm if that issue is still occurring or not.

Timeout. Assuming that this is fixed by now. If that is not the case, please re-test with the latest kernel & Mesa to see whether the issue still occurs, since improvements that should benefit your system have been pushed to both.
Comment 4 Falk Alexander 2016-10-21 14:14:34 UTC
Created attachment 127450 [details]
GPU crash dump

Arch Linux x64
Kernel: 4.8.2-1-ARCH
Mesa: Mesa 12.0.3

Okt 21 16:07:24 faultierfarm kernel: [drm] GPU HANG: ecode 9:0:0x849f7efc, in Doorways.x86 [5610], reason: Hang on render ring, action: reset
Okt 21 16:07:24 faultierfarm kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Okt 21 16:07:24 faultierfarm kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Okt 21 16:07:24 faultierfarm kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Okt 21 16:07:24 faultierfarm kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Okt 21 16:07:24 faultierfarm kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Okt 21 16:07:24 faultierfarm kernel: drm/i915: Resetting chip after gpu hang
Okt 21 16:07:24 faultierfarm kernel: [drm] GuC firmware load skipped

Intel® HD Graphics 520 (Skylake GT2)
Intel® Core™ i7-6500U

Description:
Same as before: the hang appears in some OpenGL applications (mostly games).
Comment 5 yann 2016-10-21 14:49:25 UTC
(In reply to Falk Alexander from comment #4)

Thanks for your feedback. Re-opening the bug, then.
Comment 6 yann 2016-10-21 14:54:44 UTC
You may also capture a trace with apitrace and attach it: http://apitrace.github.io/

In the meantime, reassigning to the Mesa product.

Kernel: 4.8.2-1-ARCH
Platform: Skylake (pci id: 0x1916 - PCI Revision: 0x07 - PCI Subsystem: 1558:2425)
Mesa: Mesa 12.0.3

According to this error dump, the hang occurs in the render ring batch, with the active head at 0xf5f89330 and 0x7b000005 (3DPRIMITIVE) as IPEHR.

Note also ERROR: 0x00000001 and an "Invalid PTE Fault" in the ring.

Batch extract (around 0xf5f89330):

0xf5f892f4:      0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xf5f892f8:      0x00000001: MI_NOOP
0xf5f892fc:      0x00000000: MI_NOOP
0xf5f89300:      0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xf5f89304:      0x00000002: MI_NOOP
0xf5f89308:      0x00000000: MI_NOOP
0xf5f8930c:      0x780c0000: 3D UNKNOWN: 3d_965 opcode = 0x780c
0xf5f89310:      0x00000000: MI_NOOP
Bad length 7 in (null), expected 6-6
0xf5f89314:      0x7b000005: 3DPRIMITIVE: fail sequential
0xf5f89318:      0x00000104:    vertex count
0xf5f8931c:      0x00002f64:    start vertex
0xf5f89320:      0x00000000:    instance count
0xf5f89324:      0x00000001:    start instance
0xf5f89328:      0x00000000:    index bias
0xf5f8932c:      0x00000000: MI_NOOP
0xf5f89330:      0x78230000: 3D UNKNOWN: 3d_965 opcode = 0x7823
0xf5f89334:      0x00007cc0: MI_NOOP
0xf5f89338:      0x78150009: 3D UNKNOWN: 3d_965 opcode = 0x7815
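
One way to cross-check the batch extract is to decode the command header dwords by hand. The sketch below assumes the usual Intel 3D command header layout (type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 that stores the total dword count minus 2). That layout is an assumption brought in from Intel's public hardware documentation, not something stated in this thread, and `decode_header` and `next_command` are hypothetical helper names:

```python
def decode_header(dword: int) -> dict:
    """Split a 3D command header dword into its assumed bit fields."""
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "subopcode": (dword >> 16) & 0xFF,
        # Assumed encoding: the length field holds (total dwords - 2).
        "total_dwords": (dword & 0xFF) + 2,
    }


def next_command(address: int, header: int) -> int:
    """Address of the command following the one that starts at address."""
    return address + 4 * decode_header(header)["total_dwords"]


if __name__ == "__main__":
    fields = decode_header(0x7B000005)
    print(fields)
    print(hex(next_command(0xF5F89314, 0x7B000005)))
```

Under that assumption, 0x7b000005 at 0xf5f89314 spans 7 dwords, which agrees with the decoder's "Bad length 7" message, and the following command starts at 0xf5f89330, the reported active head.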
Comment 7 Falk Alexander 2017-06-07 10:44:44 UTC
The GPU Hang is still happening.

Linux tuxedo 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux

OpenGL version string: 3.0 Mesa 17.1.1

Intel HD 520 Graphics (Skylake GT2)

Jun 07 09:39:22 tuxedo kernel: [drm] GPU HANG: ecode 9:0:0x84df7cfc, in xonotic-sdl [1467], reason: Hang on render ring, action: reset
Jun 07 09:39:22 tuxedo kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 07 09:39:22 tuxedo kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 07 09:39:22 tuxedo kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jun 07 09:39:22 tuxedo kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 07 09:39:22 tuxedo kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 07 09:39:22 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:22 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:22 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:30 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:30 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:30 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:38 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:38 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:38 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:46 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:46 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:46 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:54 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:54 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:54 tuxedo kernel: [drm] GuC firmware load skipped
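
The log above shows the chip being reset repeatedly. A small sketch for measuring the spacing between resets, assuming journal lines in the format shown here; `reset_intervals` is a hypothetical helper, and it only looks at the time of day, so it is not valid across midnight:

```python
import re
from datetime import datetime

# Captures the HH:MM:SS timestamp on lines that report a chip reset.
RESET_RE = re.compile(r"(\d{2}:\d{2}:\d{2})\s.*Resetting chip after gpu hang")


def reset_intervals(lines):
    """Return the seconds elapsed between consecutive reset messages."""
    times = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]
```

Applied to the lines in this comment, the resets come out evenly spaced at 8 seconds.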

I made several apitrace attempts and recorded while the GPU hang occurred. After the hang, the application closes immediately and apitrace stops recording.

The .trace files can be found here:

application: xonotic-glx
md5: 10f4481e7a7e49c2ed79aae97c518b4e
size: 1824170391 bytes / 1.8 GB
link: https://dl.terminal.run/apitrace/xonotic-glx.trace
info: the GPU hang happened at the end; the freeze is where the trace ends

application: minetest
md5: af2a7d60f1955ff6d45cc5316c31ad9a
size: 656264517 bytes / 656.3 MB
link: https://dl.terminal.run/apitrace/minetest.trace
info: the GPU hang happened at the end; the freeze is where the trace ends
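
Because the traces are large, it can be worth checking a download against the size and md5 listed above before replaying it. A minimal sketch using only the Python standard library; `verify_download` is a hypothetical helper name:

```python
import hashlib
from pathlib import Path


def verify_download(path, expected_md5, expected_size, chunk_size=1 << 20):
    """Check a file's size first, then its md5, reading in chunks."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte traces do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

For the minetest trace this would be called as `verify_download("minetest.trace", "af2a7d60f1955ff6d45cc5316c31ad9a", 656264517)`.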

The freeze happens only when the notebook is connected to the charger. On battery there are no hangs, only low FPS etc.
RC6 is off, but the problem is the same with RC6 on.
Comment 8 Falk Alexander 2017-07-03 09:26:43 UTC
This problem does not seem to be triggered by an OpenGL bug or anything similar, because the GPU hang/freeze also happens when an OpenGL application has merely been opened and its window is in the background, or when the player is AFK and the game keeps running. Furthermore, the GPU hang cannot be reproduced by replaying an apitrace, even when the same frame is replayed where the hang happened during recording. That is why I think something is wrong with power management, or possibly thermal throttling is involved. On the power management side, the RC6 power-saving mode comes to mind, but disabling it does not help: the GPU hangs do not disappear.
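
To test the charger and thermal hypotheses, it could help to log the power source and temperatures alongside the hangs. A rough sketch, assuming the standard Linux sysfs layout under /sys/class/power_supply and /sys/class/thermal; `on_ac_power` and `zone_temps_celsius` are hypothetical helpers, and some chargers (for example USB-C) may report a type other than "Mains":

```python
from pathlib import Path


def on_ac_power(root=Path("/sys/class/power_supply")):
    """True/False from the first 'Mains' supply, or None if none is found."""
    for supply in sorted(root.glob("*")):
        try:
            if (supply / "type").read_text().strip() == "Mains":
                return (supply / "online").read_text().strip() == "1"
        except OSError:
            continue  # entry lacks the expected files; try the next one
    return None


def zone_temps_celsius(root=Path("/sys/class/thermal")):
    """Map each thermal zone name to its temperature in degrees Celsius."""
    temps = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            # sysfs reports the temperature in millidegrees Celsius.
            temps[zone.name] = int((zone / "temp").read_text()) / 1000.0
        except (OSError, ValueError):
            continue
    return temps
```

Sampling both values every few seconds while a game runs would show whether hangs line up with charger state or with rising temperatures.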
Comment 9 GitLab Migration User 2019-09-18 19:39:18 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/760.

