https://bugs.freedesktop.org/show_bug.cgi?id=111108

<7>[ 538.919490] [IGT] i915_selftest: starting subtest live_execlists
...
<6>[ 539.473261] i915: Performing live selftests with st_random_seed=0xe061e48a st_timeout=1000
...
<6>[ 539.544358] i915: Running intel_execlists_live_selftests/live_unlite_switch
...
<3>[ 539.772486] igt_live_test_end+0x1f/0xf0 [i915] timed out, cancelling all further testing.
(In reply to Imre Deak from comment #0)
> https://bugs.freedesktop.org/show_bug.cgi?id=111108

Absolutely nothing to do with that.
(In reply to Chris Wilson from comment #1)
> (In reply to Imre Deak from comment #0)
> > https://bugs.freedesktop.org/show_bug.cgi?id=111108
>
> Absolutely nothing to do with that.

Arg, wrong URL I copied. The correct one:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14640/fi-bsw-n3050/igt@i915_selftest@live_execlists.html
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* BSW: igt@i915_selftest@live_execlists - dmesg-fail - igt_live_test_end+0x1f/0xf0 [i915] timed out
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6994/fi-bsw-kefka/igt@i915_selftest@live_execlists.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14640/fi-bsw-n3050/igt@i915_selftest@live_execlists.html
<0> [561.060235] i915_sel-4367 0.... 539658733us : __i915_subtests: Running intel_execlists_live_selftests/live_unlite_switch
<0> [561.060594] i915_sel-4367 0.... 539662660us : __intel_context_do_pin: rcs0 context:efc0 pin ring:{head:0000, tail:0000}
<0> [561.060953] i915_sel-4367 0.... 539664389us : __intel_context_do_pin: rcs0 context:efc1 pin ring:{head:0000, tail:0000}
<0> [561.061312] i915_sel-4367 0.... 539665442us : __engine_unpark: rcs0
<0> [561.061671] i915_sel-4367 0.... 539665496us : __gt_unpark:
<0> [561.062057] i915_sel-4367 0.... 539669438us : __i915_request_commit: rcs0 fence efc0:2
<0> [561.062444] i915_sel-4367 0d..1 539669467us : __i915_request_submit: rcs0 fence efc0:2, current 0
<0> [561.062810] i915_sel-4367 0d..1 539669474us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
<0> [561.063177] i915_sel-4367 0d..1 539669479us : trace_ports: rcs0: submit { efc0:2, 0:0 }
<0> [561.063544] jbd2/sda-215 1..s. 539669531us : process_csb: rcs0 cs-irq head=3, tail=4
<0> [561.063910] jbd2/sda-215 1..s. 539669535us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
<0> [561.064279] jbd2/sda-215 1..s. 539669539us : trace_ports: rcs0: promote { efc0:2*, 0:0 }
<0> [561.064646] jbd2/sda-215 1d.s1 539669547us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:no
<0> [561.065032] i915_sel-4367 0.... 539669654us : __i915_request_commit: rcs0 fence efc1:2
<0> [561.065419] i915_sel-4367 0d..1 539669692us : __i915_request_submit: rcs0 fence efc1:2, current 0
<0> [561.065787] i915_sel-4367 0d..1 539669697us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
<0> [561.066155] i915_sel-4367 0d..1 539669701us : trace_ports: rcs0: submit { efc0:2*, efc1:2 }
<0> [561.066542] i915_sel-4367 0.... 539669796us : __i915_request_commit: rcs0 fence efc0:4
<0> [561.066908] ksoftirq-16 1..s. 539669909us : process_csb: rcs0 cs-irq head=4, tail=5
<0> [561.067274] ksoftirq-16 1..s. 539669911us : process_csb: rcs0 csb[5]: status=0x00008002:0x00000001
<0> [561.067642] ksoftirq-16 1..s. 539669914us : trace_ports: rcs0: preempted { efc0:2*, 0:0 }
<0> [561.068009] ksoftirq-16 1..s. 539669916us : trace_ports: rcs0: promote { efc0:2*, efc1:2 }
<0> [561.068377] ksoftirq-16 1d.s1 539669948us : __execlists_submission_tasklet: rcs0: expired last=efc0:2, prio=3, hint=3
<0> [561.068764] ksoftirq-16 1d.s1 539669952us : __i915_request_unsubmit: rcs0 fence efc1:2, current 0
<0> [561.069151] ksoftirq-16 1d.s1 539670042us : __i915_request_unsubmit: rcs0 fence efc0:2, current 1
<0> [561.069537] ksoftirq-16 1d.s1 539670050us : __i915_request_submit: rcs0 fence efc0:2, current 1
<0> [561.069924] ksoftirq-16 1d.s1 539670056us : __i915_request_submit: rcs0 fence efc1:2, current 0
<0> [561.070290] ksoftirq-16 1d.s1 539670060us : __execlists_submission_tasklet: rcs0: queue_priority_hint:2, submit:yes
<0> [561.070657] ksoftirq-16 1d.s1 539670062us : trace_ports: rcs0: submit { efc0:2*, efc1:2 }
<0> [561.071043] i915_sel-4367 0.... 539670069us : i915_gem_wait_for_idle: flags=2 (locked), timeout=200
<0> [561.071408] ksoftirq-16 1..s. 539670171us : process_csb: rcs0 cs-irq head=5, tail=0
<0> [561.071775] ksoftirq-16 1..s. 539670172us : process_csb: rcs0 csb[0]: status=0x00008002:0x00000001
<0> [561.072142] ksoftirq-16 1..s. 539670174us : trace_ports: rcs0: preempted { efc0:2!, efc1:2 }
<0> [561.072509] ksoftirq-16 1..s. 539670177us : trace_ports: rcs0: promote { efc0:2!, efc1:2 }
<0> [561.072896] ksoftirq-16 1d.s1 539670184us : __i915_request_submit: rcs0 fence efc0:4, current 2
<0> [561.073263] ksoftirq-16 1d.s1 539670188us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
<0> [561.073630] ksoftirq-16 1d.s1 539670190us : trace_ports: rcs0: submit { efc1:2, efc0:4 }
<0> [561.073996] ksoftirq-16 1..s. 539670222us : process_csb: rcs0 cs-irq head=0, tail=1
<0> [561.074363] ksoftirq-16 1..s. 539670224us : process_csb: rcs0 csb[1]: status=0x00000814:0x00000001
<0> [561.074730] ksoftirq-16 1..s. 539670325us : trace_ports: rcs0: completed { efc0:2!, efc1:2 }
<0> [561.075095] ksoftirq-16 1..s. 539670329us : process_csb: rcs0 cs-irq head=1, tail=4
<0> [561.075462] ksoftirq-16 1..s. 539670330us : process_csb: rcs0 csb[2]: status=0x00008002:0x00000001
<0> [561.075829] ksoftirq-16 1..s. 539670331us : trace_ports: rcs0: preempted { efc1:2!, 0:0 }
<0> [561.076197] ksoftirq-16 1..s. 539670333us : trace_ports: rcs0: promote { efc1:2!, efc0:4 }
<0> [561.076564] ksoftirq-16 1..s. 539670334us : process_csb: rcs0 csb[3]: status=0x00000814:0x00000001
<0> [561.076931] ksoftirq-16 1..s. 539670335us : trace_ports: rcs0: completed { efc1:2!, efc0:4 }
<0> [561.077298] ksoftirq-16 1..s. 539670338us : process_csb: rcs0 csb[4]: status=0x00000818:0x00000001
<0> [561.077665] ksoftirq-16 1..s. 539670340us : trace_ports: rcs0: completed { efc0:4!, 0:0 }
<0> [561.078032] ksoftirq-16 1d.s1 539670342us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:no
<0> [561.078418] i915_sel-4367 0.... 539670348us : i915_request_retire: rcs0 fence efc1:2, current 2
<0> [561.078776] i915_sel-4367 0.... 539670357us : intel_context_unpin: rcs0 context:efc1 retire
<0> [561.079135] i915_sel-4367 0.... 539670418us : __intel_context_retire: rcs0 context:efc1 retire
<0> [561.079522] i915_sel-4367 0.... 539671236us : i915_request_retire: rcs0 fence efc0:2, current 4
<0> [561.079909] i915_sel-4367 0.... 539671302us : i915_request_retire: rcs0 fence efc0:4, current 4
<0> [561.080295] i915_sel-4367 0d... 539671354us : __i915_request_commit: rcs0 fence efb0:8
<0> [561.080681] i915_sel-4367 0d..1 539671396us : __i915_request_submit: rcs0 fence efb0:8, current 6
<0> [561.081049] i915_sel-4367 0d..1 539671421us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
<0> [561.081416] i915_sel-4367 0d..1 539671427us : trace_ports: rcs0: submit { efb0:8, 0:0 }
<0> [561.081774] i915_sel-4367 0.... 539671438us : intel_context_unpin: rcs0 context:efc0 retire
<0> [561.082132] i915_sel-4367 0.... 539671498us : __intel_context_retire: rcs0 context:efc0 retire
<0> [561.082497] kworker/-155 1..s. 539671601us : process_csb: rcs0 cs-irq head=4, tail=5
<0> [561.082864] kworker/-155 1..s. 539671603us : process_csb: rcs0 csb[5]: status=0x00000001:0x00000000
<0> [561.083231] kworker/-155 1..s. 539671606us : trace_ports: rcs0: promote { efb0:8!, 0:0 }
<0> [561.083597] kworker/-155 1d.s1 539671611us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:no
<0> [561.083963] kworker/-155 1..s. 539671613us : process_csb: rcs0 cs-irq head=5, tail=0
<0> [561.084330] kworker/-155 1..s. 539671614us : process_csb: rcs0 csb[0]: status=0x00000818:0x00000000
<0> [561.084697] kworker/-155 1..s. 539671616us : trace_ports: rcs0: completed { efb0:8!, 0:0 }
<0> [561.085064] kworker/-155 1d.s1 539671620us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:no
<0> [561.085450] i915_sel-4367 0.N.. 539672266us : i915_request_retire: rcs0 fence efb0:8, current 8
<0> [561.085809] i915_sel-4367 0.... 539672832us : __engine_park: rcs0
<0> [561.086170] kworker/-2966 0.... 539672877us : __gt_park:
<0> [561.086529] i915_sel-4367 0.... 539676058us : __intel_context_do_pin: bcs0 context:efc2 pin ring:{head:0000, tail:0000}
<0> [561.086889] i915_sel-4367 0.... 539677466us : __intel_context_do_pin: bcs0 context:efc3 pin ring:{head:0000, tail:0000}
<0> [561.087247] i915_sel-4367 0.... 539677497us : __engine_unpark: bcs0
<0> [561.087607] i915_sel-4367 0.... 539677511us : __gt_unpark:
<0> [561.087993] i915_sel-4367 0.... 539679446us : __i915_request_commit: bcs0 fence efc2:2
<0> [561.088380] i915_sel-4367 0d..1 539679473us : __i915_request_submit: bcs0 fence efc2:2, current 0
<0> [561.088747] i915_sel-4367 0d..1 539679480us : __execlists_submission_tasklet: bcs0: queue_priority_hint:-2147483648, submit:yes
<0> [561.089115] i915_sel-4367 0d..1 539679485us : trace_ports: bcs0: submit { efc2:2, 0:0 }
<0> [561.089481] jbd2/sda-215 1.Ns1 539679542us : process_csb: bcs0 cs-irq head=3, tail=4
<0> [561.089848] jbd2/sda-215 1.Ns1 539679544us : process_csb: bcs0 csb[4]: status=0x00000001:0x00000000
<0> [561.090215] jbd2/sda-215 1.Ns1 539679548us : trace_ports: bcs0: promote { efc2:2*, 0:0 }
<0> [561.090581] jbd2/sda-215 1dNs2 539679555us : __execlists_submission_tasklet: bcs0: queue_priority_hint:-2147483648, submit:no
<0> [561.090967] i915_sel-4367 0.... 539679561us : __i915_request_commit: bcs0 fence efc3:2
<0> [561.091355] i915_sel-4367 0d..1 539679585us : __i915_request_submit: bcs0 fence efc3:2, current 0
<0> [561.091721] i915_sel-4367 0d..1 539679590us : __execlists_submission_tasklet: bcs0: queue_priority_hint:-2147483648, submit:yes
<0> [561.092088] i915_sel-4367 0d..1 539679593us : trace_ports: bcs0: submit { efc2:2*, efc3:2 }
<0> [561.092474] i915_sel-4367 0.... 539679674us : __i915_request_commit: bcs0 fence efc2:4
<0> [561.092839] ksoftirq-16 1..s. 539679717us : process_csb: bcs0 cs-irq head=4, tail=5
<0> [561.093205] ksoftirq-16 1..s. 539679718us : process_csb: bcs0 csb[5]: status=0x00008802:0x00000001
<0> [561.093573] ksoftirq-16 1..s. 539679721us : trace_ports: bcs0: preempted { efc2:2*, 0:0 }
<0> [561.093940] ksoftirq-16 1..s. 539679724us : trace_ports: bcs0: promote { efc2:2*, efc3:2 }
<0> [561.094311] ksoftirq-16 1d.s1 539679731us : __execlists_submission_tasklet: bcs0: expired last=efc2:2, prio=3, hint=3
<0> [561.094698] ksoftirq-16 1d.s1 539679734us : __i915_request_unsubmit: bcs0 fence efc3:2, current 0
<0> [561.095086] ksoftirq-16 1d.s1 539679739us : __i915_request_unsubmit: bcs0 fence efc2:2, current 1
<0> [561.095473] ksoftirq-16 1d.s1 539679746us : __i915_request_submit: bcs0 fence efc2:2, current 1
<0> [561.095860] ksoftirq-16 1d.s1 539679752us : __i915_request_submit: bcs0 fence efc3:2, current 0
<0> [561.096227] ksoftirq-16 1d.s1 539679756us : __execlists_submission_tasklet: bcs0: queue_priority_hint:2, submit:yes
<0> [561.096595] ksoftirq-16 1d.s1 539679758us : trace_ports: bcs0: submit { efc2:2*, efc3:2 }
<0> [561.096977] i915_sel-4367 0.... 539679795us : i915_gem_wait_for_idle: flags=2 (locked), timeout=200
<0> [561.097678] i915_sel-4367 0.... 539887173us : igt_flush_test: igt_live_test_end+0x1f/0xf0 [i915] timed out.

That's the entire sequence. Looks suspiciously like we succeeded in killing the HW as intended... There is a redundant preemption we can fix up, though.
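As a reading aid for the trace above, the second dword pair in each `process_csb` line is the context-status event the tasklet decodes. A minimal sketch of that decoding, using the `GEN8_CTX_STATUS_*` bit assignments as I recall them from the driver's execlists code (gen8 CSB format; note the extra bit 11 visible in 0x814/0x818 is not covered here, so treat this as illustrative rather than authoritative):

```python
# Sketch: decode the execlists CSB status dwords seen in the trace,
# e.g. "csb[5]: status=0x00008002". Bit names follow the driver's
# GEN8_CTX_STATUS_* flags (assumed, not copied from source).
CSB_BITS = {
    1 << 0:  "IDLE_ACTIVE",    # idle -> active: a new context was promoted
    1 << 1:  "PREEMPTED",
    1 << 2:  "ELEMENT_SWITCH",
    1 << 3:  "ACTIVE_IDLE",
    1 << 4:  "COMPLETE",
    1 << 15: "LITE_RESTORE",   # same context resubmitted without a full switch
}

def decode_csb(status: int) -> list[str]:
    """Return the named status bits set in a CSB status dword."""
    return [name for bit, name in sorted(CSB_BITS.items()) if status & bit]

# The back-to-back "preempted" + "promote" trace_ports lines come from a
# single event such as 0x00008002:
print(decode_csb(0x00008002))  # ['PREEMPTED', 'LITE_RESTORE']
```

So 0x00008002 is exactly the lite-restore-with-preemption event the unlite-switch test is trying to provoke, while 0x00000814/0x00000818 are the completion events.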
In another example,

<0> [629.616137] i915_sel-4301 0.... 618083258us : __engine_unpark: vcs0
<0> [629.616327] i915_sel-4301 0.... 618083271us : __gt_unpark:
<0> [629.616530] i915_sel-4301 0.... 618084676us : __i915_request_commit: vcs0 fence e80d:2
<0> [629.616734] i915_sel-4301 0d..1 618084704us : __i915_request_submit: vcs0 fence e80d:2, current 0
<0> [629.616928] i915_sel-4301 0d..1 618084711us : __execlists_submission_tasklet: vcs0: queue_priority_hint:-2147483648, submit:yes
<0> [629.617121] i915_sel-4301 0d..1 618084716us : trace_ports: vcs0: submit { e80d:2, 0:0 }
<0> [629.617325] i915_sel-4301 0.... 618084786us : __i915_request_commit: vcs0 fence e80e:2
<0> [629.617518] dmesg-922 1..s. 618084787us : process_csb: vcs0 cs-irq head=3, tail=4
<0> [629.617711] dmesg-922 1..s. 618084792us : process_csb: vcs0 csb[4]: status=0x00000001:0x00000000
<0> [629.617905] dmesg-922 1..s. 618084798us : trace_ports: vcs0: promote { e80d:2*, 0:0 }
<0> [629.618109] dmesg-922 1d.s1 618084814us : __i915_request_submit: vcs0 fence e80e:2, current 0
<0> [629.618302] dmesg-922 1d.s1 618084823us : __execlists_submission_tasklet: vcs0: queue_priority_hint:-2147483648, submit:yes
<0> [629.618496] dmesg-922 1d.s1 618084829us : trace_ports: vcs0: submit { e80d:2*, e80e:2 }
<0> [629.618700] i915_sel-4301 0.... 618084861us : __i915_request_commit: vcs0 fence e80d:4
<0> [629.618892] dmesg-922 1..s. 618084907us : process_csb: vcs0 cs-irq head=4, tail=5
<0> [629.619085] dmesg-922 1..s. 618084908us : process_csb: vcs0 csb[5]: status=0x00008002:0x00000001
<0> [629.619279] dmesg-922 1..s. 618084912us : trace_ports: vcs0: preempted { e80d:2*, 0:0 }
<0> [629.619472] dmesg-922 1..s. 618084916us : trace_ports: vcs0: promote { e80d:2*, e80e:2 }
<0> [629.619666] dmesg-922 1d.s1 618084926us : __execlists_submission_tasklet: vcs0: expired last=e80d:2, prio=3, hint=3
<0> [629.619869] dmesg-922 1d.s1 618084930us : __i915_request_unsubmit: vcs0 fence e80e:2, current 0
<0> [629.620073] dmesg-922 1d.s1 618084936us : __i915_request_unsubmit: vcs0 fence e80d:2, current 1
<0> [629.620277] dmesg-922 1d.s1 618084950us : __i915_request_submit: vcs0 fence e80d:2, current 1
<0> [629.620481] dmesg-922 1d.s1 618084955us : __i915_request_submit: vcs0 fence e80e:2, current 0
<0> [629.620674] dmesg-922 1d.s1 618084958us : __execlists_submission_tasklet: vcs0: queue_priority_
<7> [629.633154] __intel_gt_set_wedged vcs0
<7> [629.633159] __intel_gt_set_wedged 	Awake? 2
<7> [629.633165] __intel_gt_set_wedged 	Hangcheck: 0 ms ago
<7> [629.633170] __intel_gt_set_wedged 	Reset count: 0 (global 0)
<7> [629.633175] __intel_gt_set_wedged 	Requests:
<7> [629.633183] __intel_gt_set_wedged 	MMIO base:  0x00012000
<7> [629.633201] __intel_gt_set_wedged 	RING_START: 0x00000000
<7> [629.633211] __intel_gt_set_wedged 	RING_HEAD:  0x00000000
<7> [629.633220] __intel_gt_set_wedged 	RING_TAIL:  0x00000000
<7> [629.633232] __intel_gt_set_wedged 	RING_CTL:   0x00000000
<7> [629.633245] __intel_gt_set_wedged 	RING_MODE:  0x00000200 [idle]
<7> [629.633254] __intel_gt_set_wedged 	RING_IMR:   fffffeff
<7> [629.633270] __intel_gt_set_wedged 	ACTHD:      0x00000000_00000000
<7> [629.633286] __intel_gt_set_wedged 	BBADDR:     0x00000000_00000000
<7> [629.633303] __intel_gt_set_wedged 	DMA_FADDR:  0x00000000_00000000
<7> [629.633312] __intel_gt_set_wedged 	IPEIR:      0x00000000
<7> [629.633321] __intel_gt_set_wedged 	IPEHR:      0x00000000
<7> [629.633334] __intel_gt_set_wedged 	Execlist status: 0x00000301 00000000, entries 6
<7> [629.633339] __intel_gt_set_wedged 	Execlist CSB read 4, write 4, tasklet queued? no (enabled)
<7> [629.633421] __intel_gt_set_wedged 		E e80d:2!  prio=3 @ 11565ms: [i915]
<7> [629.633469] __intel_gt_set_wedged 		E e80e:2!+  prio=3 @ 11565ms: signaled
<7> [629.633477] __intel_gt_set_wedged 		E e80d:4!  prio=2 @ 11565ms: [i915]
<7> [629.633482] __intel_gt_set_wedged 	HWSP:
<7> [629.633490] __intel_gt_set_wedged 	[0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [629.633494] __intel_gt_set_wedged 	*
<7> [629.633502] __intel_gt_set_wedged 	[0040] 00008002 00000001 00000814 00000001 00000818 00000001 00000001 00000000
<7> [629.633508] __intel_gt_set_wedged 	[0060] 00000818 00000001 00008002 00000001 00000000 00000000 00000000 00000004
<7> [629.633515] __intel_gt_set_wedged 	[0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [629.633520] __intel_gt_set_wedged 	*

Here it is much clearer that the unlite-restore hit its mark and tricked the HW into doing something naughty.
Hopefully it is the problem I was aiming for in the test...

commit 2935ed5339c495066d901b97de384d86c55a3f6d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 4 14:40:08 2019 +0100

    drm/i915: Remove logical HW ID

    With the introduction of ctx->engines[] we allow multiple logical
    contexts to be used on the same engine (e.g. with virtual engines).
    According to bspec, each logical context requires a unique tag in order
    for context-switching to occur correctly between them. [Simple
    experiments show that it is not so easy to trick the HW into performing
    a lite-restore with matching logical IDs, though my memory of early
    Broadwell experiments does suggest that it should be generating
    lite-restores.]

    We only need to keep a unique tag for the active lifetime of the
    context, and for as long as we need to identify that context. The HW
    uses the tag to determine if it should use a lite-restore (why not the
    LRCA?) and passes the tag back for various status identifiers. The only
    status we need to track is for OA, so when using perf, we assign the
    specific context a unique tag.

    v2: Calculate required number of tags to fill ELSP.

    Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk
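To illustrate the scheme the commit message describes (a hypothetical sketch, not the driver's actual code — all names here are invented): instead of a globally unique logical HW ID per context, a small tag is assigned only while a context is resident in the ELSP and recycled afterwards, so the pool only needs enough tags to refill the ELSP ports (a context pinned for OA/perf keeps its tag for as long as perf needs to identify it).

```python
# Hypothetical sketch of per-engine tag recycling as per the commit
# message ("v2: Calculate required number of tags to fill ELSP").
# Names and structure are invented for illustration only.
NUM_ELSP_PORTS = 2  # gen8 execlists submit up to two contexts at once

class TagPool:
    def __init__(self, num_tags: int):
        self.free = list(range(1, num_tags + 1))  # tag 0 reserved as invalid
        self.active: dict[str, int] = {}

    def acquire(self, ctx: str) -> int:
        """Assign a tag when the context is submitted to the ELSP."""
        if ctx not in self.active:
            self.active[ctx] = self.free.pop(0)
        return self.active[ctx]

    def release(self, ctx: str) -> None:
        """Return the tag once the context has left the hardware."""
        self.free.append(self.active.pop(ctx))

pool = TagPool(NUM_ELSP_PORTS + 1)  # enough tags to refill the ELSP
a = pool.acquire("efc0")
b = pool.acquire("efc1")
assert a != b               # distinct tags while both are in flight,
pool.release("efc0")        # so the HW never sees a stale lite-restore match
c = pool.acquire("efc2")    # tag storage recycled from the freed pool
```

The invariant that matters for this bug is only that two contexts simultaneously in the ELSP never share a tag; a context's tag may change between submissions.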