Summary: | [CI][BAT] : igt@i915_selftest@live_hangcheck - incomplete - GEM_BUG_ON(!assert_pending_valid(execlists, "promote")) | ||
---|---|---|---|
Product: | DRI | Reporter: | Lakshmi <lakshminarayana.vudum> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | not set | ||
Priority: | not set | CC: | intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | CFL | i915 features: | GEM/Other |
Description
Lakshmi
2019-10-18 20:10:12 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* CFL: igt@i915_selftest@live_hangcheck - incomplete - GEM_BUG_ON(!assert_pending_valid(execlists, "promote")) - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7130/fi-cfl-guc/igt@i915_selftest@live_hangcheck.html

Still the same incomplete bug as before. We need the full trace.

The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* CFL: igt@i915_selftest@live_hangcheck - incomplete - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7153/fi-cfl-8700k/igt@i915_selftest@live_hangcheck.html

Found the missing telltale on bsw! So simple! Just a missing sync in the selftest before starting the manual reset.

Going from the hit on bsw, and assuming this is one and the same bug:

commit 93100fdeb4de5b13a7f9113ede93cd062ba779f1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Oct 24 00:24:43 2019 +0100

drm/i915/selftests: Flush interrupts before disabling tasklets

When setting up the system to perform the atomic reset, we need to serialise with any ongoing interrupt tasklet or else:

<0> [472.951428] i915_sel-4442 0d..1 466527056us : __i915_request_submit: rcs0 fence 11659:2, current 0
<0> [472.951554] i915_sel-4442 0d..1 466527059us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
<0> [472.951681] i915_sel-4442 0d..1 466527061us : trace_ports: rcs0: submit { 11659:2, 0:0 }
<0> [472.951805] i915_sel-4442 0.... 466527114us : __igt_atomic_reset_engine: i915_reset_engine(rcs0:active) under hardirq
<0> [472.951932] i915_sel-4442 0d... 466527115us : intel_engine_reset: rcs0 flags=11d
<0> [472.952056] i915_sel-4442 0d... 466527117us : execlists_reset_prepare: rcs0: depth<-1
<0> [472.952179] i915_sel-4442 0d... 466527119us : intel_engine_stop_cs: rcs0
<0> [472.952305] <idle>-0 1..s1 466527119us : process_csb: rcs0 cs-irq head=3, tail=4
<0> [472.952431] i915_sel-4442 0d... 466527122us : __intel_gt_reset: engine_mask=1
<0> [472.952557] <idle>-0 1..s1 466527124us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
<0> [472.952683] <idle>-0 1..s1 466527130us : trace_ports: rcs0: promote { 11659:2*, 0:0 }
<0> [472.952808] i915_sel-4442 0d... 466527131us : execlists_reset: rcs0
<0> [472.952933] i915_sel-4442 0d..1 466527133us : process_csb: rcs0 cs-irq head=3, tail=4
<0> [472.953059] i915_sel-4442 0d..1 466527134us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
<0> [472.953185] i915_sel-4442 0d..1 466527136us : trace_ports: rcs0: preempted { 11659:2*, 0:0 }
<0> [472.953310] i915_sel-4442 0d..1 466527150us : assert_pending_valid: Nothing pending for promotion!
<0> [472.953436] i915_sel-4442 0d..1 466527158us : process_csb: process_csb:1930 GEM_BUG_ON(!assert_pending_valid(execlists, "promote"))

We have the same CSB events being seen by process_csb() on two different processors. One being issued by the reset in the test, the other by the interrupt; this scenario is supposed to be prevented by flushing the interrupt tasklet with tasklet_disable() before we enter the atomic reset.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112069
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023232443.17450-1-chris@chris-wilson.co.uk

A CI Bug Log filter associated with this bug has been updated:

{- CFL: igt@i915_selftest@live_hangcheck - incomplete - GEM_BUG_ON(!assert_pending_valid(execlists, "promote")) -}
{+ CFL: igt@i915_selftest@live_hangcheck - incomplete - GEM_BUG_ON(!assert_pending_valid(execlists, "promote")) +}

No new failures caught with the new filter.

A CI Bug Log filter associated with this bug has been updated:

{- CFL: igt@i915_selftest@live_hangcheck - incomplete - GEM_BUG_ON(!assert_pending_valid(execlists, "promote")) -}
{+ CFL TGL: igt@i915_selftest@live_hangcheck - incomplete - GEM_BUG_ON(!assert_pending_valid(execlists, "promote")) +}

No new failures caught with the new filter.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7129/fi-tgl-u2/igt@i915_selftest@live_execlists.html

Adding the TGL failure for reference.

The reproduction rate of this issue was once in 8 runs. It was last seen in CI_DRM_7153 (2 weeks, 1 day old), and the current run is 7277. Closing and archiving this issue.

The CI Bug Log issue associated with this bug has been archived. New failures matching the above filters will not be associated with this bug anymore.