Summary: | GPU HANG: ecode 9:0:0x84df7efc | | |
---|---|---|---|
Product: | Mesa | Reporter: | Falk Alexander <falkse> |
Component: | Drivers/DRI/i915 | Assignee: | Ian Romanick <idr> |
Status: | RESOLVED MOVED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | | |
Priority: | medium | CC: | intel-gfx-bugs, nicolaspok |
Version: | 17.1 | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | SKL | i915 features: | GPU hang |
Attachments: | The error dump said directly after rebooting only: no error state collected; GPU crash dump | | |
Description
Falk Alexander
2016-01-25 13:55:30 UTC
Created attachment 121265 [details]
The error dump said directly after rebooting only: no error state collected
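A note on the attachment above: the i915 error state is kept in kernel memory, so reading it only after a reboot yields "no error state collected". The following is a minimal sketch, not an official tool, of saving the dump right after a hang. It assumes the sysfs path printed in the kernel log later in this report (`/sys/class/drm/card0/error`); the function name `save_error_state` is hypothetical, the card index may differ on other systems, and reading the file usually requires root.

```python
from pathlib import Path

# Path taken from the kernel log in this report; the card index is an assumption.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(dst, src=ERROR_STATE):
    """Copy the GPU error state to dst.

    Returns False (and writes nothing) if the kernel reports that no
    error state was collected, e.g. because the machine was rebooted.
    """
    with Path(src).open("rb") as f:
        data = f.read()
    # Case-insensitive check for the placeholder text quoted in this report.
    if data.strip().lower().startswith(b"no error state collected"):
        return False
    Path(dst).write_bytes(data)
    return True
```

Calling `save_error_state("gpu-error.txt")` before rebooting would produce a file that can be attached to the report.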
Workarounds for SKL are available in the latest kernel, and fixes have been pushed to Mesa that may resolve your issue. Please update your system (kernel & Mesa) and confirm whether the issue still occurs.

(In reply to yann from comment #2)

Timeout. Assuming that it is fixed by now. If this is not the case, please re-test with the latest kernel & Mesa to see whether the issue still occurs, since improvements that will benefit your system have been pushed to both.

Created attachment 127450 [details]
GPU crash dump
Arch Linux x64
Kernel: 4.8.2-1-ARCH
Mesa: Mesa 12.0.3
Okt 21 16:07:24 faultierfarm kernel: [drm] GPU HANG: ecode 9:0:0x849f7efc, in Doorways.x86 [5610], reason: Hang on render ring, action: reset
Okt 21 16:07:24 faultierfarm kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Okt 21 16:07:24 faultierfarm kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Okt 21 16:07:24 faultierfarm kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Okt 21 16:07:24 faultierfarm kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Okt 21 16:07:24 faultierfarm kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Okt 21 16:07:24 faultierfarm kernel: drm/i915: Resetting chip after gpu hang
Okt 21 16:07:24 faultierfarm kernel: [drm] GuC firmware load skipped
Intel® HD Graphics 520 (Skylake GT2)
Intel® Core™ i7-6500U
Description:
The same, the hang is appearing in some OpenGL applications (mostly games).
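For comparing hangs across kernel versions, the fields of the "GPU HANG" log line above can be pulled out with a regular expression. This is only a sketch: the pattern is written against the line quoted in this report, and reading the first two ecode fields as hardware generation and engine index is an assumption about the i915 log format, not something stated in the thread. The names `HANG_RE` and `parse_hang` are hypothetical.

```python
import re

# Pattern derived from the log line in this report; other kernel
# versions may format the message differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang(line):
    """Return the fields of a GPU HANG log line as a dict, or None."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # Convert the numeric fields; the hash stays a hex string.
    for key in ("gen", "engine", "pid"):
        info[key] = int(info[key])
    return info


line = (
    "Okt 21 16:07:24 faultierfarm kernel: [drm] GPU HANG: ecode "
    "9:0:0x849f7efc, in Doorways.x86 [5610], reason: Hang on render ring, "
    "action: reset"
)
print(parse_hang(line))
```

Under that reading, the leading 9 would match the Skylake (gen9) hardware listed above.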
(In reply to Falk Alexander from comment #4)

Thanks for your feedback. Re-opening it then.

You may also collect and attach logs gathered with apitrace: http://apitrace.github.io/

In parallel, assigning to the Mesa product.

Kernel: 4.8.2-1-ARCH
Platform: Skylake (pci id: 0x1916 - PCI Revision: 0x07 - PCI Subsystem: 1558:2425)
Mesa: Mesa 12.0.3

From this error dump, the hang is happening in the render ring batch with active head at 0xf5f89330, with 0x7b000005 (3DPRIMITIVE) as IPEHR. We can also note ERROR: 0x00000001 and, in the ring, "Invalid PTE Fault".
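The IPEHR value can be split into its header fields to see why it is read as 3DPRIMITIVE. The sketch below assumes the usual Intel 3D command header layout (command type in bits 31:29, sub-type in bits 28:27, opcode in bits 26:24, sub-opcode in bits 23:16, and a dword length field in bits 7:0 that is biased by 2); that layout is an assumption brought in for illustration and is not taken from this thread. The function name `decode_gfx_header` is hypothetical.

```python
def decode_gfx_header(dword):
    """Split a 3D pipeline command header into its bit fields.

    Field positions are assumed, see the note above the code.
    """
    return {
        "command_type": (dword >> 29) & 0x7,
        "sub_type": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        # The length field stores the total number of dwords minus 2.
        "total_dwords": (dword & 0xFF) + 2,
    }


print(decode_gfx_header(0x7B000005))
# {'command_type': 3, 'sub_type': 3, 'opcode': 3, 'sub_opcode': 0, 'total_dwords': 7}
```

Under this layout the command spans 7 dwords, from 0xf5f89314 through 0xf5f8932c in the batch extract, which would put the active head at 0xf5f89330 directly behind it. It also suggests that the decoder's "Bad length 7 ... expected 6-6" message and its per-dword labels come from an older command definition rather than from a malformed batch.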
Batch extract (around 0xf5f89330):
0xf5f892f4: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xf5f892f8: 0x00000001: MI_NOOP
0xf5f892fc: 0x00000000: MI_NOOP
0xf5f89300: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xf5f89304: 0x00000002: MI_NOOP
0xf5f89308: 0x00000000: MI_NOOP
0xf5f8930c: 0x780c0000: 3D UNKNOWN: 3d_965 opcode = 0x780c
0xf5f89310: 0x00000000: MI_NOOP
Bad length 7 in (null), expected 6-6
0xf5f89314: 0x7b000005: 3DPRIMITIVE: fail sequential
0xf5f89318: 0x00000104: vertex count
0xf5f8931c: 0x00002f64: start vertex
0xf5f89320: 0x00000000: instance count
0xf5f89324: 0x00000001: start instance
0xf5f89328: 0x00000000: index bias
0xf5f8932c: 0x00000000: MI_NOOP
0xf5f89330: 0x78230000: 3D UNKNOWN: 3d_965 opcode = 0x7823
0xf5f89334: 0x00007cc0: MI_NOOP
0xf5f89338: 0x78150009: 3D UNKNOWN: 3d_965 opcode = 0x7815

The GPU Hang is still happening.

Linux tuxedo 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux
OpenGL version string: 3.0 Mesa 17.1.1
Intel HD 520 Graphics (Skylake GT2)

Jun 07 09:39:22 tuxedo kernel: [drm] GPU HANG: ecode 9:0:0x84df7cfc, in xonotic-sdl [1467], reason: Hang on render ring, action: reset
Jun 07 09:39:22 tuxedo kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 07 09:39:22 tuxedo kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 07 09:39:22 tuxedo kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jun 07 09:39:22 tuxedo kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 07 09:39:22 tuxedo kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 07 09:39:22 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:22 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:22 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:30 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:30 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:30 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:38 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:38 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:38 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:46 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:46 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:46 tuxedo kernel: [drm] GuC firmware load skipped
Jun 07 09:39:54 tuxedo kernel: drm/i915: Resetting chip after gpu hang
Jun 07 09:39:54 tuxedo kernel: [drm] RC6 off
Jun 07 09:39:54 tuxedo kernel: [drm] GuC firmware load skipped

I made several apitrace attempts and recorded while the GPU hang occurred. After the hang the application closes immediately and apitrace stops recording. The .trace files can be found here:

application: xonotic-glx
md5: 10f4481e7a7e49c2ed79aae97c518b4e
size: 1824170391 bytes / 1.8 GB
link: https://dl.terminal.run/apitrace/xonotic-glx.trace
info: GPU hang happened at the end; the freeze was where the trace ends

application: minetest
md5: af2a7d60f1955ff6d45cc5316c31ad9a
size: 656264517 bytes / 656.3 MB
link: https://dl.terminal.run/apitrace/minetest.trace
info: GPU hang happened at the end; the freeze was where the trace ends

The freeze happens only if the notebook is connected to the charger. There are no problems in battery mode, apart from low FPS etc.

RC6 is off, but the problem is the same with RC6 on.

This problem does not seem to be triggered by an OpenGL bug or anything like that, because the GPU hang / freeze also happens when an OpenGL application is merely open with its window in the background, or when the player is just AFK and the game / application keeps running. Furthermore, it is not possible to reproduce the GPU hang by replaying an apitrace, even when the same frame is replayed where the GPU hang happened during recording.

That's why I think something is wrong with the power management; it is also possible that thermal throttling is involved. Thinking about power management, the RC6 power-saving mode comes to mind, but disabling it does not help: the GPU hangs do not disappear.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/760.