Summary: | [CI] igt@gem_exec_schedule@* - fail - !"GPU hung" | |
---|---|---|---
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt>
Component: | DRM/Intel | Assignee: | Kimmo Nikkanen <knikkane>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | high | CC: | intel-gfx-bugs
Version: | DRI git | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | ReadyForDev | |
i915 platform: | BXT, GLK, KBL | i915 features: |
Attachments: |
|
Description
Marta Löfstedt
2017-09-19 06:15:37 UTC
Also, on APL-shards CI_DRM_3200 igt@gem_exec_schedule@reorder-wide-bsd

(gem_exec_schedule:1482) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:446:
(gem_exec_schedule:1482) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest reorder-wide-bsd failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3200/shard-apl4/igt@gem_exec_schedule@reorder-wide-bsd.html

Something weird happened here:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3210/shard-apl1/igt@gem_exec_schedule@deep-vebox.html

The igt@gem_exec_schedule@deep-vebox subtest has been skipped on all runs for APL-shards except this one.

(gem_exec_schedule:1476) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:446:
(gem_exec_schedule:1476) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest deep-vebox failed.

So, is there a context leak from previous tests? Here are the previous tests in this shard:

pass: igt/gem_exec_reloc/basic-gtt-wc-active
skip: igt/kms_concurrent/pipe-E
skip: igt/gem_exec_parallel/bsd1-contexts
skip: igt/kms_frontbuffer_tracking/fbcpsr-2p-primscrn-spr-indfb-fullscreen
pass: igt/kms_chv_cursor_fail/pipe-A-128x128-right-edge
skip: igt/gem_mmap_gtt/basic-write-cpu-read-gtt
skip: igt/chamelium/vga-edid-read
skip: igt/kms_flip/2x-absolute-wf_vblank
pass: igt/perf/i915-ref-count
pass: igt/kms_cursor_crc/cursor-256x85-offscreen
skip: igt/kms_psr_sink_crc/cursor_mmap_gtt
pass: igt/gem_persistent_relocs/forked-faulting-reloc-thrashing
skip: igt/kms_cursor_legacy/2x-long-cursor-vs-flip-legacy
pass: igt/gem_mmap_gtt/basic-write-read
skip: igt/kms_plane_multiple/atomic-pipe-D-tiling-yf
pass: igt/syncobj_wait/multi-wait-for-submit-submitted
pass: igt/kms_draw_crc/draw-method-xrgb2101010-mmap-wc-xtiled
skip: igt/kms_frontbuffer_tracking/psr-1p-primscrn-pri-indfb-draw-mmap-wc
pass: igt/drv_hangman/error-state-sysfs-entry
pass: igt/kms_draw_crc/draw-method-rgb565-mmap-gtt-xtiled
pass: igt/kms_legacy_colorkey

The hang means that it took too long to copy i915_engines_info, longer than give or take 6s (depending on how the hangcheck aligns). The only change there is that we now use the drm_printer indirection.

(In reply to Chris Wilson from comment #4)
> The hang means that it took too long to copy i915_engines_info, longer than
> give or take 6s (depending on how the hangcheck aligns). The only change
> there is that we now use the drm_printer indirection.

I don't like the behavior of igt, where we assert out before deciding if the test should be skipped or not. It causes noise.

We don't know how long the kernel will take to do the operations, but the *kernel* is also imposing the time constraint for the entire sequence of operations. For CI the problem is compounded by lockdep making it even slower; predicting the limits is impossible and subject to change. The best way around it is to disable the limitations the kernel imposes upon userspace, but those patches fell on an unreceptive audience.

Also, on another subtest:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3228/shard-apl2/igt@gem_exec_schedule@reorder-wide-blt.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3239/shard-kbl1/igt@gem_exec_schedule@deep-bsd2.html

CI_DRM_3277 KBL-shards igt@gem_exec_schedule@preempt-other-vebox fail:

(gem_exec_schedule:1541) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:1541) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-other-vebox failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3277/shard-kbl3/igt@gem_exec_schedule@preempt-other-vebox.html

new subtest on: CI_DRM_3288 shard-kbl2 igt@gem_exec_schedule@preempt-other-bsd

(gem_exec_schedule:2576) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:2576) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-other-bsd failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3288/shard-kbl2/igt@gem_exec_schedule@preempt-other-bsd.html

new subtest on: CI_DRM_3288 shard-apl6 igt@gem_exec_schedule@wide-blt

(gem_exec_schedule:1842) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:1842) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-blt failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3288/shard-apl6/igt@gem_exec_schedule@wide-blt.html

(In reply to Marta Löfstedt from comment #0)
> On CI_DRM_3099
>
> (drv_module_reload:1435) igt-aux-CRITICAL: Test assertion failure function
> sig_abort, file igt_aux.c:444:
> (drv_module_reload:1435) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3099/shard-snb2/
> igt@drv_module_reload@basic-reload.html

Just noticed that the original bug is for a completely different problem than gem_exec_schedule. The original bug is for ring initialisation failure on module load, which may be the same one that's been plaguing snb since 2010.

Also see bug 103514, which is about the BAT machine fi-glk-dsi, where I filed wedged GPU and issues that happen after.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3305/shard-apl8/igt@gem_exec_schedule@preempt-self-vebox.html

(gem_exec_schedule:18610) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:18610) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-self-vebox failed.

<7>[ 2792.812533] [IGT] gem_exec_schedule: starting subtest preempt-self-vebox
<7>[ 2796.776113] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=eb6, last=eba
<6>[ 2803.831581] [drm] GPU HANG: ecode 9:1:0xe77ffef2, in gem_exec_schedu [18610], reason: Hang on bcs0, action: reset
<6>[ 2803.831614] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[ 2803.831635] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[ 2803.831656] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[ 2803.831677] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[ 2803.831698] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<7>[ 2803.831922] [drm:i915_reset_device [i915]] resetting chip
<5>[ 2803.832018] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 2803.835942] [drm:i915_gem_reset_engine [i915]] context gem_exec_schedu[18610]/1 marked guilty (score 10) banned? no
<7>[ 2803.836020] [drm:i915_gem_reset_engine [i915]] resetting bcs0 to restart from tail of request 0x2c9
<6>[ 2803.836123] [drm] RC6 on
<7>[ 2803.836261] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[ 2803.836354] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 12
<7>[ 2803.836480] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[ 2803.836606] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[ 2803.836728] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<7>[ 2804.017517] [IGT] gem_exec_schedule: exiting, ret=99

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3319/shard-kbl7/igt@gem_exec_schedule@preempt-self-blt.html

(gem_exec_schedule:1382) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:1382) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-self-blt failed.

APL-shards new subtest:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/shard-apl4/igt@gem_exec_schedule@preempt-other-blt.html

(gem_exec_schedule:8484) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:8484) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-other-blt failed.

<7>[ 842.217263] [IGT] gem_exec_schedule: executing
<4>[ 842.240089] Setting dangerous option reset - tainting kernel
<7>[ 842.241912] [IGT] gem_exec_schedule: starting subtest preempt-other-blt
<7>[ 846.210788] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=25143, last=25147
<7>[ 857.224472] [drm:i915_reset_device [i915]] resetting chip
<5>[ 857.224663] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 857.225994] [drm:i915_gem_reset_engine [i915]] context gem_exec_schedu[8484]/2 marked guilty (score 10) banned? no
<7>[ 857.226214] [drm:i915_gem_reset_engine [i915]] resetting rcs0 to restart from tail of request 0x25144
<7>[ 857.226409] [drm:i915_gem_reset_engine [i915]] context gem_exec_schedu[8484]/2 marked guilty (score 20) banned? no
<7>[ 857.226572] [drm:i915_gem_reset_engine [i915]] resetting bcs0 to restart from tail of request 0x6a8c
<7>[ 857.226754] [drm:i915_gem_reset_engine [i915]] context gem_exec_schedu[8484]/2 marked guilty (score 30) banned? no
<7>[ 857.226832] [drm:i915_gem_reset_engine [i915]] resetting vcs0 to restart from tail of request 0x2018
<7>[ 857.226931] [drm:i915_gem_reset_engine [i915]] context gem_exec_schedu[8484]/2 marked guilty (score 40) banned? yes
<7>[ 857.227003] [drm:i915_gem_reset_engine [i915]] client gem_exec_schedu[8484]/2 has had 1 context banned
<7>[ 857.227074] [drm:i915_gem_reset_engine [i915]] resetting vecs0 to restart from tail of request 0xa051
<6>[ 857.227204] [drm] RC6 on
<7>[ 857.227373] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[ 857.227469] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 12
<7>[ 857.227599] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[ 857.227725] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[ 857.227851] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<7>[ 857.237878] [IGT] gem_exec_schedule: exiting, ret=99

We're hitting the same issue randomly in test igt@gem_sync@basic-store-each:

$ : sudo -E ./gem_sync --r basic-store-each
IGT-Version: 1.20-ge6c4968 (x86_64) (Linux: 4.14.0-rc8-drm-intel-qa-ww45-commit-8eba051+ x86_64)
Using GuC submission
Has kernel scheduler
 - With priority sorting
 - With preemption enabled
blt completed 14336 cycles: 354.177 us
bsd1 completed 14336 cycles: 355.300 us
bsd2 completed 14336 cycles: 358.892 us
render completed 12288 cycles: 420.673 us
(gem_sync:1602) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:1602) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Stack trace:
#0 [__igt_fail_assert+0x101]
#1 [sig_abort+0x3a]
#2 [killpg+0x40]
#3 [__wait+0x1e]
#4 [igt_waitchildren+0x68]
#5 [igt_waitchildren_timeout+0xe]
#6 [store_ring+0x291]
#7 [__real_main785+0x5ae]
#8 [main+0x23]
#9 [__libc_start_main+0xf1]
#10 [_start+0x29]
#11 [<unknown>+0x29]
Subtest basic-store-each failed.
**** DEBUG ****
(gem_sync:1602) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_sync:1602) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_sync:1602) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:1602) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
(gem_sync:1602) igt-core-INFO: Stack trace:
(gem_sync:1602) igt-core-INFO: #0 [__igt_fail_assert+0x101]
(gem_sync:1602) igt-core-INFO: #1 [sig_abort+0x3a]
(gem_sync:1602) igt-core-INFO: #2 [killpg+0x40]
(gem_sync:1602) igt-core-INFO: #3 [__wait+0x1e]
(gem_sync:1602) igt-core-INFO: #4 [igt_waitchildren+0x68]
(gem_sync:1602) igt-core-INFO: #5 [igt_waitchildren_timeout+0xe]
(gem_sync:1602) igt-core-INFO: #6 [store_ring+0x291]
(gem_sync:1602) igt-core-INFO: #7 [__real_main785+0x5ae]
(gem_sync:1602) igt-core-INFO: #8 [main+0x23]
(gem_sync:1602) igt-core-INFO: #9 [__libc_start_main+0xf1]
(gem_sync:1602) igt-core-INFO: #10 [_start+0x29]
(gem_sync:1602) igt-core-INFO: #11 [<unknown>+0x29]
**** END ****
Subtest basic-store-each: FAIL (9.969s)

Created attachment 135362 [details]
kernl_log_sig_abort
In this case it failed 1 of 5.
Created attachment 135363 [details]
error_state
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3326/shard-apl4/igt@gem_exec_schedule@wide-vebox.html

(gem_exec_schedule:7972) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:7972) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-vebox failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3354/shard-kbl7/igt@gem_exec_schedule@preempt-self-render.html

(gem_exec_schedule:1592) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:1592) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-self-render failed.

no *ERROR* in dmesg. However,

<7>[ 74.590165] [IGT] gem_exec_schedule: starting subtest preempt-self-render
<3>[ 74.697875] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
...
<7>[ 78.729290] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5f/0x80 [i915], irq posted? yes, current seqno=1e4b8, last=1e4bc
<3>[ 80.710935] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3355/shard-kbl5/igt@gem_exec_schedule@wide-bsd1.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3364/shard-kbl3/igt@gem_exec_schedule@preempt-other-render.html

(gem_exec_schedule:2134) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:2134) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-other-render failed.

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3994/shard-kbl7/igt@gem_exec_schedule@preempt-self-bsd.html

(gem_exec_schedule:1950) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:1950) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-self-bsd failed.
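
The failure signature repeated throughout this report has a regular shape, so it can be picked out of IGT output mechanically. The following is a minimal sketch, not part of IGT or cibuglog: the function name `find_gpu_hung_failures` and both regular expressions are assumptions derived only from the log excerpts quoted in this bug.

```python
import re

# Signature lines as they appear in the IGT output quoted in this report.
# Both patterns are assumptions based on those excerpts only.
ASSERT_RE = re.compile(
    r'\((?P<binary>[\w-]+):(?P<pid>\d+)\) igt-aux-CRITICAL: '
    r'Failed assertion: !"GPU hung"'
)
SUBTEST_RE = re.compile(r'Subtest (?P<subtest>[\w-]+) failed\.')


def find_gpu_hung_failures(text):
    """Return (binary, pid, subtest) for each 'GPU hung' assertion.

    The subtest is taken from the first 'Subtest ... failed.' line that
    follows the assertion, or None if no such line follows.
    """
    results = []
    for match in ASSERT_RE.finditer(text):
        sub = SUBTEST_RE.search(text, match.end())
        results.append((
            match.group('binary'),
            int(match.group('pid')),
            sub.group('subtest') if sub else None,
        ))
    return results


sample = '''(gem_exec_schedule:7972) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:7972) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-vebox failed.'''

print(find_gpu_hung_failures(sample))
```

Matching on the assertion text alone would also catch unrelated tests (gem_sync, gem_exec_suspend, gem_exec_whisper all show the same signature in this thread), which is why the sketch returns the binary and subtest names so that hits can be triaged separately.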
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/shard-kbl6/igt@gem_exec_schedule@deep-bsd.html

(gem_exec_schedule:1734) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_schedule:1734) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest deep-bsd failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3416/shard-kbl3/igt@gem_exec_schedule@wide-bsd2.html

(gem_exec_schedule:1585) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_schedule:1585) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-bsd2 failed.

We have it on today's KBL results with test igt@gem_exec_suspend@basic-s4-devices:

(gem_exec_suspend:9573) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_suspend:9573) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-S4-devices failed.

IGT-Version: 1.20-g476c4b4 (x86_64) (Linux: 4.15.0-rc1-drm-intel-qa-ww48-commit-807db75+ x86_64)
Stack trace:
#0 [__igt_fail_assert+0x101]
#1 [sig_abort+0x3a]
#2 [killpg+0x40]
#3 [__write_nocancel+0x7]
#4 [igt_sysfs_write+0x43]
#5 [igt_sysfs_set+0x2e]
#6 [igt_system_suspend_autoresume+0x420]
#7 [run_test+0x486]
#8 [__real_main243+0x13c]
#9 [main+0x23]
#10 [__libc_start_main+0xf1]
#11 [_start+0x29]
#12 [<unknown>+0x29]
Subtest basic-S4-devices: FAIL (16.261s)

(In reply to Elizabeth from comment #28)
> We have it on today's KBL results with test
> igt@gem_exec_suspend@basic-s4-devices:
>
> (gem_exec_suspend:9573) igt-aux-CRITICAL: Test assertion failure function
> sig_abort, file igt_aux.c:482:
> (gem_exec_suspend:9573) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
> Subtest basic-S4-devices failed.
>
> IGT-Version: 1.20-g476c4b4 (x86_64) (Linux:
> 4.15.0-rc1-drm-intel-qa-ww48-commit-807db75+ x86_64)
> Stack trace:
> #0 [__igt_fail_assert+0x101]
> #1 [sig_abort+0x3a]
> #2 [killpg+0x40]
> #3 [__write_nocancel+0x7]
> #4 [igt_sysfs_write+0x43]
> #5 [igt_sysfs_set+0x2e]
> #6 [igt_system_suspend_autoresume+0x420]
> #7 [run_test+0x486]
> #8 [__real_main243+0x13c]
> #9 [main+0x23]
> #10 [__libc_start_main+0xf1]
> #11 [_start+0x29]
> #12 [<unknown>+0x29]
> Subtest basic-S4-devices: FAIL (16.261s)

which is nothing to do with this bug.

This bug is either the sporadic SNB hang on module load, or that the timeout for gem_exec_schedule results in a reported hung GPU. (Strange dup.)

(In reply to Chris Wilson from comment #29)
> (In reply to Elizabeth from comment #28)
> > ...
> which is nothing to do with this bug.
>
> This bug is either the sporadic SNB hang on module load, or that the timeout
> for gem_exec_schedule results in a reported hung GPU. (Strange dup.)

Understood. Should this then go to bug 104020? It seems to be the same behavior with the gem_exec_suspend@basic-s* tests.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3450/shard-kbl3/igt@gem_exec_schedule@preempt-other-bsd2.html

(gem_exec_schedule:4029) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_schedule:4029) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-other-bsd2 failed.

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4068/shard-kbl4/igt@gem_exec_schedule@preempt-other-bsd1.html

(gem_exec_schedule:1619) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_schedule:1619) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-other-bsd1 failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3633/shard-apl8/igt@gem_exec_schedule@wide-bsd.html

(gem_exec_schedule:1720) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_schedule:1720) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest wide-bsd failed.

There is hangcheck data in dmesg:

<7>[ 244.339804] [IGT] gem_exec_schedule: starting subtest wide-bsd
...
<7>[ 253.900331] hangcheck Idle? no
<6>[ 253.919540] [drm] GPU HANG: ecode 9:2:0x3d7fffff, in gem_exec_schedu [1720], reason: Hang on vcs0, action: reset

I cleaned this bug up a bit from a cibuglog perspective. The GPU hung failure has not been reproduced on:

igt@gem_exec_schedule@preempt-self-bsd
igt@drv_module_reload@basic-reload
igt@gem_exec_schedule@preempt-self-bsd1

for a very long time, and there was another issue happening on the last hit. So I removed them from impact in cibuglog.

These appear to be the "real" occurrences of this issue from the past 50 runs up until CI_DRM_3684:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3658/shard-glkb4/igt@gem_exec_schedule@reorder-wide-bsd.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3656/shard-apl6/igt@gem_exec_schedule@reorder-wide-vebox.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3658/shard-glkb4/igt@gem_exec_schedule@reorder-wide-bsd.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4050/shard-apl5/igt@gem_exec_schedule@deep-bsd.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3673/shard-kbl5/igt@gem_exec_schedule@deep-bsd.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4174/shard-kbl1/igt@gem_exec_schedule@deep-bsd2.html

So, this is still very much an issue on GLK-, APL- and KBL-shards.

The failure has not been seen on SNB-shards since CI_DRM_3393 (2017-11-27, 404 runs ago), so I will remove that from cibuglog impact.
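
Since the discussion hinges on how long a subtest runs before hangcheck reports a hang, it can help to compute that delay directly from dmesg. This is a rough sketch under stated assumptions: the function name `hang_reports` and the three regular expressions are hypothetical and were written only against the dmesg excerpts quoted in this bug, so other kernel versions may format these lines differently.

```python
import re

# dmesg line shape as quoted in this report: "<level>[ seconds.micros] message"
LINE_RE = re.compile(r"<\d>\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)")
START_RE = re.compile(r"\[IGT\] \S+: starting subtest (?P<subtest>\S+)")
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: Hang on (?P<engine>\w+), action: (?P<action>\w+)"
)


def hang_reports(dmesg_lines):
    """Return one dict per GPU HANG line, including the delay since the
    most recent 'starting subtest' line (None if none was seen)."""
    reports = []
    start_ts, subtest = None, None
    for line in dmesg_lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # elided ("...") or non-dmesg lines are skipped
        ts, msg = float(m.group("ts")), m.group("msg")
        started = START_RE.search(msg)
        if started:
            start_ts, subtest = ts, started.group("subtest")
            continue
        hang = HANG_RE.search(msg)
        if hang:
            report = hang.groupdict()
            report["subtest"] = subtest
            report["seconds_after_start"] = (
                None if start_ts is None else ts - start_ts
            )
            reports.append(report)
    return reports


sample = [
    "<7>[ 244.339804] [IGT] gem_exec_schedule: starting subtest wide-bsd",
    "...",
    "<7>[ 253.900331] hangcheck Idle? no",
    "<6>[ 253.919540] [drm] GPU HANG: ecode 9:2:0x3d7fffff, in gem_exec_schedu [1720], reason: Hang on vcs0, action: reset",
]

for report in hang_reports(sample):
    print(report["subtest"], report["engine"], report["seconds_after_start"])
```

The delay is measured from the subtest start rather than from the stuck request, so it is only an upper bound on how long the engine was actually stalled.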
I notice that gem_exec_whisper has been lumped into this one from cibuglog; if that hangs that is a different issue (not the spurious slowdown that is affecting gem_exec_schedule from time to time) -- please could you track it separately.

(In reply to Chris Wilson from comment #36)
> I notice that gem_exec_whisper has been lumped into this one from cibuglog;
> if that hangs that is a different issue (not the spurious slowdown that is
> affecting gem_exec_schedule from time to time) -- please could you track it
> separately.

OK done. I.e. please disregard Comment 22

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3726/shard-glkb3/igt@gem_exec_schedule@preempt-self-bsd.html

(gem_exec_schedule:1494) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_schedule:1494) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest preempt-self-bsd failed.

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4240/shard-apl6/igt@gem_exec_whisper@normal.html

(gem_exec_whisper:2800) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_whisper:2800) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest normal failed.

My understanding is that the original bug is resolved with

commit 6db24416fdcdf5571125f9005089241cc6ba2652
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 3 18:09:09 2018 +0000

lib/gem: Reset the global seqno at the start of each test

When we require GEM, reset the global seqno. This gives each test a clean slate to work with, and avoids left-over state from previous tests impacting on the next. In particular, somes tests may be setting up long sequence of stalling batches not expecting to hit a seqno wraparound (leftover from, for example, gem_exec_whisper), causing long GPU hangs and incompletes in CI if they do.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>

However, we may have a few false dupes left which need their own tracking.

(In reply to Chris Wilson from comment #40)
> My understanding is that the original bug is resolved with
>
> commit 6db24416fdcdf5571125f9005089241cc6ba2652
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Wed Jan 3 18:09:09 2018 +0000
>
> lib/gem: Reset the global seqno at the start of each test
>
> When we require GEM, reset the global seqno. This gives each test a
> clean slate to work with, and avoids left-over state from previous tests
> impacting on the next. In particular, somes tests may be setting up long
> sequence of stalling batches not expecting to hit a seqno wraparound
> (leftover from, for example, gem_exec_whisper), causing long GPU hangs
> and incompletes in CI if they do.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
>
> However, we may have a few false dupes left which need their own tracking.

OK Chris I will close and archive the bug and we'll see what pops up. |
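
The commit message above refers to a seqno wraparound. As an illustration only, the sketch below shows how a 32-bit sequence counter wraps and how a wrap-tolerant "has this seqno passed?" check can be written with the usual signed-difference idiom. The helper names `next_seqno` and `seqno_passed` are hypothetical, and whether the driver of that era compared seqnos exactly this way is an assumption; the sketch does not model why a wrap leads to long hangs in these tests.

```python
MASK32 = 0xFFFFFFFF  # sequence numbers are treated as unsigned 32-bit values


def next_seqno(seqno):
    """Advance a 32-bit sequence number, wrapping 0xFFFFFFFF back to 0."""
    return (seqno + 1) & MASK32


def seqno_passed(current, target):
    """True if `current` is at or after `target`, tolerating wraparound.

    Equivalent to checking that the difference, interpreted as a signed
    32-bit integer, is non-negative.
    """
    return ((current - target) & MASK32) < 0x80000000


# Values from the missed-breadcrumb line quoted earlier in this report:
# "current seqno=eb6, last=eba" means the last request has not completed.
print(seqno_passed(0xEB6, 0xEBA))

# Just after a wrap, a plain ">=" comparison gives the wrong answer,
# while the wrap-tolerant check still orders the two values correctly.
before_wrap, after_wrap = 0xFFFFFFFE, 0x00000002
print(after_wrap >= before_wrap, seqno_passed(after_wrap, before_wrap))
```

Resetting the counter at the start of each test, as the commit describes, keeps a test's batches far from the wrap point regardless of how many requests earlier tests submitted.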