Bug 107017 - GPU HANG: ecode 9:0:0xfffffffe, reason: hang on rcs0, bcs0, vcs0, vecs0, action: reset
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Duplicates: 107147
Depends on:
Blocks:
 
Reported: 2018-06-24 20:03 UTC by Alexandre
Modified: 2018-07-07 07:02 UTC

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
Crash file extracted from /sys/class/drm/card0/error (15.55 KB, text/plain)
2018-06-24 20:03 UTC, Alexandre
dmesg with drm.debug=0x1e log_buf_len=4M (195.17 KB, text/plain)
2018-06-26 00:54 UTC, Alexandre
Patch for the commit that, if reverted, fixes the problem (3.82 KB, patch)
2018-06-27 23:34 UTC, Alexandre

Description Alexandre 2018-06-24 20:03:37 UTC
Created attachment 140304
Crash file extracted from /sys/class/drm/card0/error

On 4.18.0-rc2 and rc1 this error occurs during kernel boot:
[    0.342047] [drm] VT-d active for gfx access
[    0.342049] fb: switching to inteldrmfb from EFI VGA
[    0.342092] [drm] Replacing VGA console driver
[    0.342835] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    0.342835] [drm] Driver supports precise vblank timestamp query.
[    0.343538] [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
[    0.343787] [drm] Disabling framebuffer compression (FBC) to prevent screen flicker with VT-d enabled
[    4.801171] [drm] GPU HANG: ecode 9:0:0xfffffffe, reason: hang on rcs0, bcs0, vcs0, vecs0, action: reset
[    4.801172] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[    4.801172] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[    4.801173] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[    4.801173] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[    4.801173] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   10.817596] [drm] Initialized i915 1.6.0 20180514 for 0000:00:02.0 on minor 0
[   10.847297] fbcon: inteldrmfb (fb0) is primary device
[   10.927807] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   11.207607] ata1.00: supports DRM functions and may not be fully accessible
[   11.210268] ata1.00: supports DRM functions and may not be fully accessible

All 4.17.x kernels boot fine. The GPU crash file is attached.
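For anyone triaging similar logs, the hang line above can be split into its fields mechanically. The following is only a minimal sketch: the function name `parse_gpu_hang` and the field names are hypothetical, and reading the three `ecode` fields as generation, engine and error code is an assumption about the i915 message format rather than something stated in this report.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern for the "[drm] GPU HANG: ..." line as it appears in the dmesg excerpt.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>[0-9a-fA-F]+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    gen: int            # assumed: graphics generation (9 here)
    engine: str         # assumed: engine field of the ecode triple
    ecode: int          # error code, parsed from hex
    engines: List[str]  # engines named in the "hang on ..." reason
    action: str         # action the driver reports, e.g. "reset"


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed hang record, or None if the line is not a GPU HANG line."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    reason = m.group("reason")
    prefix = "hang on "
    engines = reason[len(prefix):].split(", ") if reason.startswith(prefix) else []
    return GpuHang(
        gen=int(m.group("gen")),
        engine=m.group("engine"),
        ecode=int(m.group("ecode"), 16),
        engines=engines,
        action=m.group("action"),
    )


if __name__ == "__main__":
    sample = ("[    4.801171] [drm] GPU HANG: ecode 9:0:0xfffffffe, "
              "reason: hang on rcs0, bcs0, vcs0, vecs0, action: reset")
    print(parse_gpu_hang(sample))
```

In this report all four engines (rcs0, bcs0, vcs0, vecs0) are listed in the reason field.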
Comment 1 Chris Wilson 2018-06-25 08:52:14 UTC
The GPU didn't even pretend to start. One quick test would be intel_iommu=igfx_off but it doesn't seem that related in this case. A bisection would be very, very helpful.
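For readers unfamiliar with bisection, the idea is a binary search over the commits between a known-good and a known-bad kernel. The sketch below only illustrates that search; it is not how `git bisect` is implemented. It assumes a linear history, and the names `first_bad` and `is_bad` are hypothetical. In practice `is_bad` would mean building and booting the kernel at that commit and checking for the hang.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Find the first bad commit in an ordered, linear list of commits.

    Assumes commits[0] is known good and commits[-1] is known bad,
    and that every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Toy history: commits 0..99, with the regression introduced at commit 37.
    print(first_bad(list(range(100)), lambda c: c >= 37))
```

Each step halves the remaining range, so even several thousand commits between two releases need only around a dozen build-and-boot cycles.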
Comment 2 Jani Saarinen 2018-06-25 09:49:13 UTC
I assume you are using drm-tip, as in rc2?

As Chris said, it would be nice to get this bisected.

Could you confirm that you are using https://cgit.freedesktop.org/drm-tip and also attach a dmesg captured with drm.debug=0x1e log_buf_len=4M?
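Before capturing the log, it can help to confirm that the running kernel was actually booted with the requested parameters. A minimal sketch, assuming a Linux system where the boot command line is readable from /proc/cmdline; the helper name `missing_boot_params` is hypothetical.

```python
from typing import Iterable, List

# Parameters requested in this comment.
WANTED = ("drm.debug=0x1e", "log_buf_len=4M")


def missing_boot_params(cmdline: str, wanted: Iterable[str] = WANTED) -> List[str]:
    """Return the requested parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        missing = missing_boot_params(f.read())
    if missing:
        print("not set on this boot:", " ".join(missing))
    else:
        print("all requested parameters are set")
```

The check is an exact string match, so a different spelling of the same value (for example another debug mask) is reported as missing.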
Comment 3 Alexandre 2018-06-26 00:54:02 UTC
Created attachment 140338
dmesg with drm.debug=0x1e log_buf_len=4M
Comment 4 Alexandre 2018-06-26 00:55:12 UTC
The kernel is 4.18.0-rc2 from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6. I will work on bisecting the kernel.
Comment 5 Jani Saarinen 2018-06-26 06:32:35 UTC
Thank you
Comment 6 Alexandre 2018-06-27 23:33:25 UTC
Sorry for the delay. I finally narrowed it down to this commit: ab96746aaa344fb720a198245a837e266fad3b62. I made a patch from the commit and uploaded it here (0001-iommu-vt-d-Clean-up-pasid-quirk-for-pre-production-d.patch). I think it has nothing to do with the GPU and is related to the IOMMU. Should we close this bug and open one against the kernel?
Comment 7 Alexandre 2018-06-27 23:34:19 UTC
Created attachment 140373
Patch for the commit that, if reverted, fixes the problem
Comment 8 Alexandre 2018-06-27 23:37:20 UTC
I also tested the latest kernel, 4.18.0-rc2+ (commit 813835028e9ae1f18cd11bb0ec591d0f0577d96a), with the patch reverse-applied, and it boots correctly without any GPU failures.
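For anyone who wants to repeat this test, reverse-applying a patch to a kernel tree can be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the patch file is assumed to be the attachment from this report saved locally, and `git apply --reverse` is used here as one way to do it, not necessarily the command the reporter ran.

```python
import subprocess
from typing import List


def reverse_apply_command(patch_path: str, check_only: bool = False) -> List[str]:
    """Build the git command line that reverse-applies a patch file."""
    cmd = ["git", "apply", "--reverse"]
    if check_only:
        cmd.append("--check")  # only verify that the reversal applies cleanly
    cmd.append(patch_path)
    return cmd


def reverse_apply(patch_path: str, repo_dir: str = ".") -> None:
    """Check that the reversal applies cleanly, then apply it in repo_dir."""
    subprocess.run(reverse_apply_command(patch_path, check_only=True),
                   cwd=repo_dir, check=True)
    subprocess.run(reverse_apply_command(patch_path), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Assumed local file name, taken from the attachment described in comment 6.
    print(" ".join(reverse_apply_command(
        "0001-iommu-vt-d-Clean-up-pasid-quirk-for-pre-production-d.patch")))
```

Running the check first means a tree where the reversal no longer applies is left untouched.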
Comment 9 Jani Saarinen 2018-06-28 09:16:48 UTC
I think NOTOURBUG is a good resolution for this, and you should open a new bug against the kernel.
Comment 10 Jani Saarinen 2018-06-28 09:17:07 UTC
Alexandre, can you close this if you agree?
Comment 11 Jani Nikula 2018-07-03 07:41:41 UTC
Alexandre, did you report another bug about this? Please add a reference here. Thanks.
Comment 12 Adric Blake 2018-07-04 15:33:45 UTC
I'm not the reporter, but I found the new bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=200327
I have no idea whether he took it to email or not.
Comment 13 Alexandre 2018-07-04 17:22:28 UTC
I was traveling. I opened that bug but did not send an email. Nobody has picked up the bug report yet. Can you send an email?
Comment 14 Chris Wilson 2018-07-07 07:02:22 UTC
*** Bug 107147 has been marked as a duplicate of this bug. ***

