Summary: | [CI][RESUME] igt@* - fail - Failed assertion: !"GPU hung" | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | major | ||
Priority: | high | CC: | intel-gfx-bugs, stanislav.lisovskiy |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | TGL | i915 features: | GEM/Other |
Description
Martin Peres
2019-09-09 08:14:24 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* TGL: all tests - fail - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-self-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_switch@legacy-blt-queue.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_parallel@bcs0.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_await@wide-all.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-chain-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-other-chain-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_shared@q-out-order-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_switch@bcs0-heavy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_balancer@full-pulse.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_store@pages-bcs0.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-chain-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@smoketest-all.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@smoketest-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-other-bsd1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_switch@queue-light.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_nop@basic-parallel.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-contexts-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_parallel@fds.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@semaphore-user.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_await@wide-contexts.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@promotion-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-other-chain-bsd1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-chain-vebox.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_basic@readonly-all.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_shared@q-in-order-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_basic@gtt-bcs0.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-chain-vebox.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_shared@q-smoketest-bsd1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_busy@extended-semaphore-vcs1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_parallel@vcs1-contexts.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_reuse@single.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_parallel@vcs1-fds.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-bsd2.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_switch@legacy-blt-heavy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@prime_busy@wait-before-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_switch@bcs0-heavy-queue.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-self-bsd2.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@independent-bsd2.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_balancer@indices.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_ctx_switch@all-light.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-bsd1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_reuse@baggage.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@out-order-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-self-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@pi-ringfull-blt.html

<7> [294.989903] hangcheck bcs0
<7> [294.989915] hangcheck Awake? 2
<7> [294.989920] hangcheck Hangcheck: 6016 ms ago
<7> [294.989924] hangcheck Reset count: 56 (global 39)
<7> [294.989928] hangcheck Requests:
<7> [294.989935] hangcheck MMIO base: 0x00022000
<7> [294.989947] hangcheck RING_START: 0x0160f000
<7> [294.989954] hangcheck RING_HEAD: 0x00000000
<7> [294.989962] hangcheck RING_TAIL: 0x00000068
<7> [294.989972] hangcheck RING_CTL: 0x00003001
<7> [294.989983] hangcheck RING_MODE: 0x00000000
<7> [294.989990] hangcheck RING_IMR: 00000000
<7> [294.990003] hangcheck ACTHD: 0x00000000_0a000140
<7> [294.990017] hangcheck BBADDR: 0x00000000_00000000
<7> [294.990031] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [294.990038] hangcheck IPEIR: 0x00000000
<7> [294.990045] hangcheck IPEHR: 0x11081003

Another failed context restore.
Getting closer with https://patchwork.freedesktop.org/patch/329718/?series=66415&rev=2

*** Bug 111604 has been marked as a duplicate of this bug. ***

(In reply to Chris Wilson from comment #2)
> Another failed context restore. Getting closer with
> https://patchwork.freedesktop.org/patch/329718/?series=66415&rev=2

Unfortunately, that was a fluke. Normal service resumed on the next run.

I am now doing some testing on TGL. When I submit multiple gpgpu_fill commands, I consistently get this:

(kms_plane_stress:3092) gpu_cmds-CRITICAL: Test assertion failure function gen7_render_flush, file ../lib/gpu_cmds.c:36:
(kms_plane_stress:3092) gpu_cmds-CRITICAL: Failed assertion: ret == 0
(kms_plane_stress:3092) gpu_cmds-CRITICAL: Last errno: 5, Input/output error
Pausing GPU thread 0
Stack trace:
  #0 ../lib/igt_core.c:1694 __igt_fail_assert()
  #1 ../lib/gpu_cmds.c:36 gen7_render_flush()
  #2 ../lib/gpgpu_fill.c:356 gen12p1_gpgpu_fillfunc()
  #3 ../tests/kms_plane_stress.c:318 gpu_load()
  #4 /build/glibc-OTsEL5/glibc-2.27/nptl/pthread_create.c:463 start_thread()
  #5 ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 __clone()

The same test works fine on ICL and other platforms. In dmesg I have this:

[ 3108.643351] hangcheck rcs0
[ 3108.643420] hangcheck Awake? 2
[ 3108.643428] hangcheck Hangcheck: 6016 ms ago
[ 3108.643434] hangcheck Reset count: 0 (global 0)
[ 3108.643440] hangcheck Requests:
[ 3108.643628] hangcheck active 1a:4* prio=2 @ 7900ms: kms_plane_stres[1347]
[ 3108.643689] hangcheck ring->start: 0x00008000
[ 3108.643708] hangcheck ring->head: 0x00000048
[ 3108.643724] hangcheck ring->tail: 0x00003078
[ 3108.643733] hangcheck ring->emit: 0x00003080
[ 3108.643738] hangcheck ring->space: 0x00000f88
[ 3108.643745] hangcheck ring->hwsp: 0xffff81c0
[ 3108.643753] hangcheck [head 0080, postfix 00c8, tail 0100, batch 0x00000000_007ea000]:
[ 3108.643820] hangcheck [0000] 7a000004 21144c1c fffff080 00000000 00000000 00000000 02800000 00000000
[ 3108.643832] hangcheck [0020] 10400002 ffff81c0 00000000 00000003 04000001 18800101 007ea000 00000000
[ 3108.643841] hangcheck [0040] 04000000 00000000 7a000004 111050a1 ffff81c0 00000000 00000004 00000000
[ 3108.643849] hangcheck [0060] 01000000 04000001 0e40c002 00000000 ffffe0c8 00000000 02800000 00000000
[ 3108.644037] hangcheck MMIO base: 0x00002000
[ 3108.644085] hangcheck RING_START: 0x00008000
[ 3108.644098] hangcheck RING_HEAD: 0x000000c0
[ 3108.644110] hangcheck RING_TAIL: 0x00003078
[ 3108.644139] hangcheck RING_CTL: 0x00003001
[ 3108.644158] hangcheck RING_MODE: 0x00000000
[ 3108.644173] hangcheck RING_IMR: 00000000
[ 3108.644198] hangcheck ACTHD: 0x00000000_007ea884
[ 3108.644223] hangcheck BBADDR: 0x00000000_007ea885
[ 3108.644246] hangcheck DMA_FADDR: 0x00000000_007eaa80
[ 3108.644257] hangcheck IPEIR: 0x00000000
[ 3108.644267] hangcheck IPEHR: 0x25014100
[ 3108.644286] hangcheck Execlist status: 0x00002098 00000040, entries 12
[ 3108.644295] hangcheck Execlist CSB read 8, write 8, tasklet queued? no (enabled)
[ 3108.644318] hangcheck Active[0: ring:{start:00008000, hwsp:ffff81c0, seqno:00000003}, rq: 1a:c2 prio=2 @ 7748ms: kms_plane_stres[1347]
[ 3108.644343] hangcheck E 1a:4* prio=2 @ 7901ms: kms_plane_stres[1347]
[ 3108.644352] hangcheck E 1a:6 prio=2 @ 7900ms: kms_plane_stres[1347]
[ 3108.644360] hangcheck E 1a:8 prio=2 @ 7899ms: kms_plane_stres[1347]
[ 3108.644368] hangcheck E 1a:a prio=2 @ 7898ms: kms_plane_stres[1347]
[ 3108.644377] hangcheck E 1a:c prio=2 @ 7898ms: kms_plane_stres[1347]
[ 3108.644384] hangcheck E 1a:e prio=2 @ 7897ms: kms_plane_stres[1347]
[ 3108.644392] hangcheck E 1a:10 prio=2 @ 7896ms: kms_plane_stres[1347]
[ 3108.644442] hangcheck ...skipping 88 executing requests...
[ 3108.644450] hangcheck E 1a:c2 prio=2 @ 7748ms: kms_plane_stres[1347]
[ 3108.644457] hangcheck HWSP:
[ 3108.644470] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3108.644475] hangcheck *
[ 3108.644486] hangcheck [0040] 00010001 00010005 00010001 00010005 00010001 00010005 00010001 00010005
[ 3108.644491] hangcheck *
[ 3108.644499] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000008
[ 3108.644508] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3108.644513] hangcheck *
[ 3108.644563] hangcheck Idle? no
[ 3108.644578] hangcheck Signals:
[ 3108.644676] hangcheck [1a:44] @ 7846ms
[ 3108.651414] i915 0000:00:02.0: GPU HANG: ecode 12:1:0xdadebeff, in kms_plane_stres [1347], hang on rcs0
[ 3108.651930] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3108.651945] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3108.651953] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3108.651958] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 3108.651965] GPU crash dump saved to /sys/class/drm/card0/error

kms_plane_stress is not yet in IGT. However, I think there is definitely a bug here, although I have no clue what the GPU hang might mean.

Most likely a dup of '593. Once that critical and wide-reaching bug is resolved, we will have a better indication of what else is broken.

*** This bug has been marked as a duplicate of bug 111593 ***

Re-opening since 111513 has been fixed, but the problem still persists.

A CI Bug Log filter associated to this bug has been updated:

{- TGL: all tests - fail - Failed assertion: !"GPU hung" -}
{+ TGL: all tests - fail / warn - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5188/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6908/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5184/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6901/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6902/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6903/fi-tgl-u/igt@gem_exec_fence@basic-await-default.html

(In reply to CI Bug Log from comment #8)
> A CI Bug Log filter associated to this bug has been updated:
>
> {- TGL: all tests - fail - Failed assertion: !"GPU hung" -}
> {+ TGL: all tests - fail / warn - Failed assertion: !"GPU hung" +}
>
> New failures caught by the filter:
>
> * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5188/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6908/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5184/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6901/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6902/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6903/fi-tgl-u/igt@gem_exec_fence@basic-await-default.html

That's a very particular hang. Not related to the earlier report.

(In reply to Chris Wilson from comment #9)
> (In reply to CI Bug Log from comment #8)
> > https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5188/fi-tgl-u/igt@gem_exec_fence@nb-await-default.html
>
> That's a very particular hang. Not related to the earlier report.

https://patchwork.freedesktop.org/series/66703/
https://patchwork.freedesktop.org/series/66718/

Chris, can I mark this fixed given the above patches? I see they are reviewed/acked-by, but I'm not sure if they went in.

Not the original bug, but

commit c45e788d95b470e9f68fabe1f3cb44beb5dd7840
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Sep 19 16:18:11 2019 +0100

    drm/i915/tgl: Suspend pre-parser across GTT invalidations

nevertheless.
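
The two hangcheck register dumps quoted in this report (bcs0 and rcs0) are easier to compare once the register values are in a structured form. The following is a minimal sketch, assuming the dmesg line format shown above. The function name `parse_hangcheck_registers` and the regular expression are illustrative only and are not part of i915 or IGT. The sample lines are copied from the rcs0 dump in this report.

```python
import re

# Matches lines such as "[ 3108.644098] hangcheck RING_HEAD: 0x000000c0".
# Only all-caps register names directly followed by a colon are accepted,
# so lines like "MMIO base:" or "Reset count:" are skipped on purpose.
REG_LINE = re.compile(
    r"hangcheck\s+([A-Z][A-Z0-9_]*):\s+((?:0x)?[0-9a-fA-F_]+)\s*$"
)


def parse_hangcheck_registers(dmesg_text):
    """Return a dict mapping register name to integer value (hypothetical helper)."""
    regs = {}
    for line in dmesg_text.splitlines():
        match = REG_LINE.search(line)
        if match:
            name, raw = match.groups()
            # 64-bit values are printed as "0x00000000_007ea884"; drop the "_".
            regs[name] = int(raw.replace("_", ""), 16)
    return regs


SAMPLE = """\
[ 3108.644037] hangcheck MMIO base: 0x00002000
[ 3108.644085] hangcheck RING_START: 0x00008000
[ 3108.644098] hangcheck RING_HEAD: 0x000000c0
[ 3108.644110] hangcheck RING_TAIL: 0x00003078
[ 3108.644173] hangcheck RING_IMR: 00000000
[ 3108.644198] hangcheck ACTHD: 0x00000000_007ea884
[ 3108.644267] hangcheck IPEHR: 0x25014100
"""

if __name__ == "__main__":
    for name, value in parse_hangcheck_registers(SAMPLE).items():
        print(f"{name:12s} 0x{value:08x}")
```

Feeding both dumps through the same helper makes it straightforward to diff fields such as RING_HEAD, RING_TAIL, and IPEHR between the two hangs.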