Summary: | [CI] igt@gem_exec_* - fail - Failed assertion: !"GPU hung" | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Francesco Balestrieri <francesco.balestrieri> |
Status: | CLOSED INVALID | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | BYT, CNL, GLK, HSW, SKL | i915 features: | GPU hang |
Description
Marta Löfstedt
2018-04-05 05:55:28 UTC
Also, https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4432/shard-glkb6/igt@gem_exec_whisper@normal.html https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4064/shard-glk1/igt@gem_exec_flush@basic-wb-set-default.html (gem_exec_flush:3819) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481: (gem_exec_flush:3819) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-wb-set-default failed. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4092/shard-glk3/igt@gem_exec_store@cachelines-bsd.html (gem_exec_store:1466) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481: (gem_exec_store:1466) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest cachelines-bsd failed. https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-cnl-y3/igt@gem_exec_await@wide-contexts.html (gem_exec_await:2291) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481: (gem_exec_await:2291) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest wide-contexts failed. (In reply to Martin Peres from comment #4) > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-cnl-y3/ > igt@gem_exec_await@wide-contexts.html > > (gem_exec_await:2291) igt_aux-CRITICAL: Test assertion failure function > sig_abort, file ../lib/igt_aux.c:481: > (gem_exec_await:2291) igt_aux-CRITICAL: Failed assertion: !"GPU hung" > Subtest wide-contexts failed. This is a different issue. The GPU hang here is a result of hitting a blocking ioctl in the test. The earlier hangs look more like those fixed by commit ea491b23b2ffba069537a8216060d4d3400931a7 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed May 2 23:03:12 2018 +0100 drm/i915: Reset the hangcheck timestamp before repeating a seqno (In reply to Chris Wilson from comment #5) > (In reply to Martin Peres from comment #4) > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-cnl-y3/ > > igt@gem_exec_await@wide-contexts.html > > > > (gem_exec_await:2291) igt_aux-CRITICAL: Test assertion failure function > > sig_abort, file ../lib/igt_aux.c:481: > > (gem_exec_await:2291) igt_aux-CRITICAL: Failed assertion: !"GPU hung" > > Subtest wide-contexts failed. > > This is a different issue. The GPU hang here is a result of hitting a > blocking ioctl in the test. Using Execlists submission Ring size: 143 batches If we can only fit 143 batches in a ring, why did we submit 144?... <7>[ 134.752155] hangcheck bcs0 <7>[ 134.752174] hangcheck current seqno c707, last c797, hangcheck c707 [4031 ms] <7>[ 134.752178] hangcheck Reset count: 0 (global 0) <7>[ 134.752181] hangcheck Requests: <7>[ 134.752192] hangcheck first c708 [4e0a:1] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752196] hangcheck last c797 [4e0a:90] prio=0 @ 4037ms: gem_exec_await[2291]/3 <7>[ 134.752207] hangcheck active c708 [4e0a:1] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752212] hangcheck [head 0000, postfix 0030, tail 0050, batch 0x00000000_00040000] <7>[ 134.752215] hangcheck ring->start: 0x035a8000 <7>[ 134.752218] hangcheck ring->head: 0x00000000 <7>[ 134.752221] hangcheck ring->tail: 0x00002cf8 <7>[ 134.752224] hangcheck ring->emit: 0x00002d00 <7>[ 134.752227] hangcheck ring->space: 0x000012c0 <7>[ 134.752232] hangcheck RING_START: 0x035a8000 <7>[ 134.752237] hangcheck RING_HEAD: 0x00000020 <7>[ 134.752241] hangcheck RING_TAIL: 0x00002cf8 <7>[ 134.752246] hangcheck RING_CTL: 0x00003001 <7>[ 134.752252] hangcheck RING_MODE: 0x00000000 <7>[ 134.752256] hangcheck RING_IMR: feffffff <7>[ 134.752263] hangcheck ACTHD: 0x00000000_00040000 <7>[ 134.752271] hangcheck BBADDR: 0x00000000_00040001 <7>[ 134.752277] hangcheck DMA_FADDR: 0x00000000_00040200 <7>[ 134.752282] hangcheck IPEIR: 0x00000000 <7>[ 134.752286] hangcheck IPEHR: 0x18800101 <7>[ 134.752292] hangcheck Execlist status: 0x00024049 0000057e <7>[ 134.752297] hangcheck Execlist CSB read 1 [1 cached], write 1 [1 from hws], interrupt posted? no, tasklet queued? no (enabled) <7>[ 134.752302] hangcheck ELSP[0] count=1, rq: c797 [4e0a:90] prio=0 @ 4037ms: gem_exec_await[2291]/3 <7>[ 134.752305] hangcheck ELSP[1] idle <7>[ 134.752308] hangcheck HW active? 0x5 <7>[ 134.752355] hangcheck E c708 [4e0a:1] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752392] hangcheck E c709 [4e0a:2] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752396] hangcheck E c70a [4e0a:3] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752399] hangcheck E c70b [4e0a:4] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752403] hangcheck E c70c [4e0a:5] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752407] hangcheck E c70d [4e0a:6] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752411] hangcheck E c70e [4e0a:7] prio=0 @ 4047ms: gem_exec_await[2291]/3 <7>[ 134.752427] hangcheck ...skipping 136 executing requests... <7>[ 134.752431] hangcheck E c797 [4e0a:90] prio=0 @ 4037ms: gem_exec_await[2291]/3 <7>[ 134.752434] hangcheck Queue priority: -2147483648 (In reply to Chris Wilson from comment #6) > (In reply to Chris Wilson from comment #5) > > (In reply to Martin Peres from comment #4) > > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-cnl-y3/ > > > igt@gem_exec_await@wide-contexts.html > > > > > > (gem_exec_await:2291) igt_aux-CRITICAL: Test assertion failure function > > > sig_abort, file ../lib/igt_aux.c:481: > > > (gem_exec_await:2291) igt_aux-CRITICAL: Failed assertion: !"GPU hung" > > > Subtest wide-contexts failed. > > > > This is a different issue. The GPU hang here is a result of hitting a > > blocking ioctl in the test. > > Using Execlists submission > Ring size: 143 batches > > If we can only fit 143 batches in a ring, why did we submit 144?... Nah, last batch has seqno 144. Just an off-by-one (or at least misleading) comment about the number of skipped batches. The ring that stuck was: <7>[ 134.752727] hangcheck vecs0 <7>[ 134.752730] hangcheck current seqno c707, last c740, hangcheck c707 [4031 ms] <7>[ 134.752733] hangcheck Reset count: 0 (global 0) <7>[ 134.752736] hangcheck Requests: <7>[ 134.752740] hangcheck first c708 [4e17:1] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752744] hangcheck last c740 [4e17:39] prio=0 @ 4033ms: gem_exec_await[2291]/4 <7>[ 134.752748] hangcheck active c708 [4e17:1] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752752] hangcheck [head 0000, postfix 0030, tail 0050, batch 0x00000000_00040000] <7>[ 134.752755] hangcheck ring->start: 0x035ac000 <7>[ 134.752758] hangcheck ring->head: 0x00000000 <7>[ 134.752761] hangcheck ring->tail: 0x000011c8 <7>[ 134.752764] hangcheck ring->emit: 0x000011d0 <7>[ 134.752767] hangcheck ring->space: 0x00002df0 <7>[ 134.752772] hangcheck RING_START: 0x035ac000 <7>[ 134.752776] hangcheck RING_HEAD: 0x00000020 <7>[ 134.752780] hangcheck RING_TAIL: 0x000011c8 <7>[ 134.752786] hangcheck RING_CTL: 0x00003001 <7>[ 134.752791] hangcheck RING_MODE: 0x00000000 <7>[ 134.752795] hangcheck RING_IMR: fffffeff <7>[ 134.752802] hangcheck ACTHD: 0x00000000_00040000 <7>[ 134.752809] hangcheck BBADDR: 0x00000000_00040001 <7>[ 134.752816] hangcheck DMA_FADDR: 0x00000000_00040200 <7>[ 134.752821] hangcheck IPEIR: 0x00000000 <7>[ 134.752825] hangcheck IPEHR: 0x18800101 <7>[ 134.752830] hangcheck Execlist status: 0x00044052 0000057f <7>[ 134.752835] hangcheck Execlist CSB read 1 [1 cached], write 1 [1 from hws], interrupt posted? no, tasklet queued? no (enabled) <7>[ 134.752840] hangcheck ELSP[0] count=1, rq: c740 [4e17:39] prio=0 @ 4033ms: gem_exec_await[2291]/4 <7>[ 134.752843] hangcheck ELSP[1] idle <7>[ 134.752846] hangcheck HW active? 0x5 <7>[ 134.752850] hangcheck E c708 [4e17:1] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752853] hangcheck E c709 [4e17:2] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752857] hangcheck E c70a [4e17:3] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752861] hangcheck E c70b [4e17:4] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752865] hangcheck E c70c [4e17:5] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752868] hangcheck E c70d [4e17:6] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752872] hangcheck E c70e [4e17:7] prio=0 @ 4036ms: gem_exec_await[2291]/4 <7>[ 134.752880] hangcheck ...skipping 49 executing requests... <7>[ 134.752884] hangcheck E c740 [4e17:39] prio=0 @ 4033ms: gem_exec_await[2291]/4 <7>[ 134.752887] hangcheck Queue priority: -2147483648 <7>[ 134.752890] hangcheck IRQ? 0x1 (breadcrumbs? yes) (execlists? no) <7>[ 134.752893] hangcheck HWSP: <7>[ 134.752897] hangcheck 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7>[ 134.752900] hangcheck * <7>[ 134.752904] hangcheck 00000040 00008002 0000057f 00008002 0000057f 00008002 0000057f 00008002 0000057f <7>[ 134.752909] hangcheck 00000060 00008002 0000057f 00008002 0000057f 00000000 00000000 00000000 00000000 <7>[ 134.752913] hangcheck 00000080 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7>[ 134.752917] hangcheck 000000a0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 <7>[ 134.752922] hangcheck 000000c0 0000c707 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7>[ 134.752926] hangcheck 000000e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7>[ 134.752929] hangcheck * <7>[ 134.752932] hangcheck Idle? no which isn't out of ring space... Oh, maybe it is just the premature hangcheck, but how? Hmm. Reminder to self: dump the kernel stack from the GPU hang signal handler. https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_30/fi-glk-j4005/igt@gem_exec_await@wide-contexts.html (gem_exec_await:2505) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481: (gem_exec_await:2505) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest wide-contexts failed. https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_30/fi-skl-guc/igt@gem_exec_await@wide-contexts.html (gem_exec_await:2704) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:481: (gem_exec_await:2704) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest wide-contexts failed. Waiting for a debug dump after commit 91b5a3ef5516b29584ea4567b0f5ffa18219b29f (upstream/master) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu May 3 21:53:41 2018 +0100 lib: Show the kernel stack when reporting a GPU hang In a few tests (like gem_exec_await, gem_exec_schedule) we use the GPU hang to break a deadlock hit during test setup. In these case we would like to see where in kernel the process is stuck (if at all). References: https://bugs.freedesktop.org/show_bug.cgi?id=105900 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Antonio Argenziano <antonio.argenziano@intel.com> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4477/fi-byt-n2820/igt@gem_exec_fence@basic-await-default.html (gem_exec_fence:1381) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500: (gem_exec_fence:1381) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-await-default failed. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4286/fi-hsw-peppy/igt@gem_exec_suspend@basic-s4-devices.html (gem_exec_suspend:2755) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500: (gem_exec_suspend:2755) igt_aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed. Too many conflating *valid* fails end up being claimed by this bug. Please refine. Closing, thanks. (In reply to Martin Peres from comment #13) > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4286/fi-hsw-peppy/ > igt@gem_exec_suspend@basic-s4-devices.html > > (gem_exec_suspend:2755) igt_aux-CRITICAL: Test assertion failure function > sig_abort, file ../lib/igt_aux.c:500: > (gem_exec_suspend:2755) igt_aux-CRITICAL: Failed assertion: !"GPU hung" > Subtest basic-S4-devices failed. Moved the igt@gem_exec_await@wide-contexts failures to https://bugs.freedesktop.org/show_bug.cgi?id=106680 Moved the BYT failures to https://bugs.freedesktop.org/show_bug.cgi?id=107861 |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.