|Summary:||[cfl iommu] GPU Hang with Gallium3D iris driver on FC31|
|Component:||DRM/Intel||Assignee:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|Status:||RESOLVED MOVED||QA Contact:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|i915 platform:||CFL||i915 features:||GPU hang|
Description ryan 2019-11-19 17:49:51 UTC
Created attachment 146000: GPU crash dump

As per dmesg:

[ 1156.640672] DMAR: DRHD: handling fault status reg 3
[ 1156.640677] DMAR: [DMA Write] Request device [00:02.0] PASID ffffffff fault addr fffffffeffef6000 [fault reason 07] Next page table ptr is invalid
[ 1162.782665] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x85dffffb, in alacritty , hang on rcs0
[ 1162.782667] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1162.782667] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1162.782667] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1162.782667] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 1162.782668] GPU crash dump saved to /sys/class/drm/card0/error
[ 1162.783676] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

ThinkPad X1Y4 (i7-8665U/UHD 620) with kernel 5.4-rc8, Mesa 19.3-rc3 and MESA_LOADER_DRIVER_OVERRIDE=iris. Crash dump attached.
Comment 1 Chris Wilson 2019-11-19 18:47:57 UTC
The DMAR error right before the hang is definitely suspect. The odd part with DMAR is that there are no unmapped GPU addresses...
Comment 2 Lakshmi 2019-11-20 10:28:14 UTC
rcs0 command stream:
IDLE?: no
START: 0x0615c000
HEAD: 0x00201e38 [0x00001de0]
TAIL: 0x00001e90 [0x00001e40, 0x00001e90]
CTL: 0x00003001
MODE: 0x00000000
HWS: 0xffffe000
ACTHD: 0x0000fffe fffd7b0c
IPEIR: 0x00000000
IPEHR: 0x7a000004
INSTDONE: 0xffdfffff
SC_INSTDONE: 0xffffff90
SAMPLER_INSTDONE: 0xffffffff
SAMPLER_INSTDONE: 0xffffffff
SAMPLER_INSTDONE: 0xffffffff
ROW_INSTDONE: 0xfe10ffbc
ROW_INSTDONE: 0xfe10ffbc
ROW_INSTDONE: 0xfe10ffbc
batch: [0x0000fffe_fffd7000, 0x0000fffe_fffeb000]
BBADDR: 0x0000fffe_fffd7b0d
BB_STATE: 0x00000020
INSTPS: 0x00009080
INSTPM: 0x00000000
FADDR: 0x0000fffe fffd7d00
RC PSMI: 0x00000010
FAULT_REG: 0x00000000
GFX_MODE: 0x00008000
PDP0: 0x000000006d572000
PDP1: 0x0000000000000000
PDP2: 0x0000000000000000
PDP3: 0x0000000000000000
ring->head: 0x00001d90
ring->tail: 0x00001e90
hangcheck timestamp: 0ms (4295824000; epoch)
engine reset count: 0
ELSP: pid 4420, seqno 52:00000288+, prio 4096, emitted -120ms, start 0615c000, head 00001de0, tail 00001e90
ELSP: pid 2184, seqno 10:00004126+, prio 4096, emitted -112ms, start 00005000, head 00001e10, tail 00001eb8
Active context: alacritty hw_id 33, prio 0, guilty 0 active 0
rcs0 (submitted by alacritty ) --- gtt_offset = 0x0000fffe fffd7000

Head != Tail != ACTHD. Based on this, can we conclude it is NOTOURBUG?
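A dump like the one in comment #2 can also be checked mechanically, for example to see whether the active head (ACTHD) points inside the batch buffer range printed on the `batch:` line. This is a minimal sketch assuming the field formatting quoted in comment #2; `parse_fields` and `acthd_in_batch` are hypothetical names, not part of i915 or any existing tool, and treating the batch range as half-open is an assumption.

```python
import re
from typing import Dict

# Hypothetical helper, not part of i915 or any existing tool. The regular
# expressions assume the exact formatting quoted in comment #2.
ACTHD_RE = re.compile(r"ACTHD: 0x([0-9a-f]{8}) ([0-9a-f]{8})")
BATCH_RE = re.compile(
    r"batch: \[0x([0-9a-f]{8})_([0-9a-f]{8}), 0x([0-9a-f]{8})_([0-9a-f]{8})\]"
)


def parse_fields(dump: str) -> Dict[str, int]:
    """Extract ACTHD and the batch range from an error-state excerpt as 64-bit ints."""
    fields: Dict[str, int] = {}
    m = ACTHD_RE.search(dump)
    if m:
        # ACTHD is printed as upper and lower 32-bit halves; join them.
        fields["ACTHD"] = int(m.group(1) + m.group(2), 16)
    m = BATCH_RE.search(dump)
    if m:
        # The batch bounds are printed as "upper_lower" 32-bit halves.
        fields["BATCH_START"] = int(m.group(1) + m.group(2), 16)
        fields["BATCH_END"] = int(m.group(3) + m.group(4), 16)
    return fields


def acthd_in_batch(fields: Dict[str, int]) -> bool:
    """True if ACTHD falls within [BATCH_START, BATCH_END) (half-open: an assumption)."""
    return fields["BATCH_START"] <= fields["ACTHD"] < fields["BATCH_END"]
```

With the values from comment #2, ACTHD 0x0000fffefffd7b0c lies between 0x0000fffefffd7000 and 0x0000fffefffeb000, so `acthd_in_batch(parse_fields(dump))` returns True.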
Comment 3 Chris Wilson 2019-11-20 10:30:41 UTC
(In reply to Lakshmi from comment #2)
> Head != Tail != ACTHD. Based on this, can we conclude it is NOTOURBUG?

The DMAR [iommu] fault makes it very much our problem.
Comment 4 ryan 2019-11-21 11:43:10 UTC
Another very similar one. This seems to happen semi-regularly and has been doing so with the last few 5.4-rc kernels. I've noticed it always happens during a 5-10 second hang where music keeps playing etc., but input and the display become unresponsive (which makes sense given the GPU hang, I guess). Another log from the latest one, for posterity:

[ 6307.875466] DMAR: DRHD: handling fault status reg 2
[ 6307.875477] DMAR: [DMA Write] Request device [00:02.0] PASID ffffffff fault addr fffffffefffbf000 [fault reason 07] Next page table ptr is invalid
[ 6315.062880] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x85dffffb, in alacritty , hang on rcs0
[ 6315.062884] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 6315.062885] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 6315.062886] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 6315.062887] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 6315.062889] GPU crash dump saved to /sys/class/drm/card0/error
[ 6315.063909] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
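Both dmesg logs in this report show the same pattern: a DMA write fault from the GPU (device 00:02.0) logged about 6 to 7 seconds before the GPU HANG message. A small script can check whether that pairing holds across a longer dmesg capture. This is a minimal sketch of a hypothetical helper (`pair_faults_with_hangs` is not part of i915 or any existing tool), and its regular expressions assume the message formats quoted in this report.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper. Patterns assume the dmesg formats quoted in this report.
DMAR_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] DMAR: \[DMA (?:Read|Write)\] Request device "
    r"\[(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\].*?fault addr (?P<addr>[0-9a-f]+)"
)
HANG_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\] i915 \S+: GPU HANG:")


def pair_faults_with_hangs(
    log: str, device: str = "00:02.0"
) -> List[Tuple[float, Optional[int], Optional[float]]]:
    """For each GPU HANG line, return (hang timestamp, address of the most
    recent preceding DMAR fault from `device`, seconds between the two).
    The last two fields are None if no such fault preceded the hang."""
    last_fault: Optional[Tuple[float, int]] = None
    results: List[Tuple[float, Optional[int], Optional[float]]] = []
    for line in log.splitlines():
        m = DMAR_RE.search(line)
        if m and m.group("dev") == device:
            # Remember the latest fault from the GPU.
            last_fault = (float(m.group("ts")), int(m.group("addr"), 16))
            continue
        m = HANG_RE.search(line)
        if m:
            ts = float(m.group("ts"))
            if last_fault is None:
                results.append((ts, None, None))
            else:
                results.append((ts, last_fault[1], round(ts - last_fault[0], 3)))
            # Attribute each fault to at most one hang.
            last_fault = None
    return results
```

Feed it the text of a saved dmesg capture, one message per line. For the first log in this report it returns one entry: the hang at 1162.782665, fault address 0xfffffffeffef6000, and a delay of about 6.142 seconds.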
Comment 5 ryan 2019-11-21 11:44:08 UTC
Created attachment 146004: Another GPU hang crash output
Comment 6 Martin Peres 2019-11-29 19:50:37 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/622.