As per Chris' comment in https://bugs.freedesktop.org/show_bug.cgi?id=106560 specifying that any new failure in igt@drv_selftest@live_hangcheck should be treated as a separate bug, I am opening a new bug: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4811/fi-cfl-s3/igt@drv_selftest@live_hangcheck.html [ 503.668876] Setting dangerous option live_selftests - tainting kernel [ 526.201555] i915_reset_engine(vcs0:others-active): failed to idle after reset [ 526.206678] i915/intel_hangcheck_live_selftests: igt_reset_engines failed with error -5 [ 526.314861] i915: probe of 0000:00:02.0 failed with error -5
What happened here is the tasklet never runs. It's scheduled, enabled and ready... <3>[ 526.201555] i915_reset_engine(vcs0:others-active): failed to idle after reset <6>[ 526.201561] i915 0000:00:02.0: [drm] vcs0 <6>[ 526.201563] i915 0000:00:02.0: [drm] current seqno 15874, last 15874, hangcheck 0 [10481 ms] <6>[ 526.201564] i915 0000:00:02.0: [drm] Reset count: 16464 (global 1) <6>[ 526.201566] i915 0000:00:02.0: [drm] Requests: <6>[ 526.201574] i915 0000:00:02.0: [drm] first 15874! [71d7:6fa] prio=0 @ 52ms: (null) <6>[ 526.201576] i915 0000:00:02.0: [drm] last 15874! [71d7:6fa] prio=0 @ 52ms: (null) <6>[ 526.201612] i915 0000:00:02.0: [drm] RING_START: 0x002f0000 <6>[ 526.201623] i915 0000:00:02.0: [drm] RING_HEAD: 0x00000680 <6>[ 526.201625] i915 0000:00:02.0: [drm] RING_TAIL: 0x00000680 <6>[ 526.201627] i915 0000:00:02.0: [drm] RING_CTL: 0x00000000 <6>[ 526.201629] i915 0000:00:02.0: [drm] RING_MODE: 0x00000200 [idle] <6>[ 526.201631] i915 0000:00:02.0: [drm] RING_IMR: fffffeff <6>[ 526.201633] i915 0000:00:02.0: [drm] ACTHD: 0x00000000_00000680 <6>[ 526.201636] i915 0000:00:02.0: [drm] BBADDR: 0x00000000_00000000 <6>[ 526.201638] i915 0000:00:02.0: [drm] DMA_FADDR: 0x00000000_00000000 <6>[ 526.201640] i915 0000:00:02.0: [drm] IPEIR: 0x00000000 <6>[ 526.201641] i915 0000:00:02.0: [drm] IPEHR: 0x00000000 <6>[ 526.201644] i915 0000:00:02.0: [drm] Execlist status: 0x00000301 00000000 <6>[ 526.201646] i915 0000:00:02.0: [drm] Execlist CSB read 5, write 1 [mmio:1], tasklet queued? yes (enabled) <6>[ 526.201648] i915 0000:00:02.0: [drm] Execlist CSB[0]: 0x00000001 [mmio:0x00000001], context: 0 [mmio:0] <6>[ 526.201650] i915 0000:00:02.0: [drm] Execlist CSB[1]: 0x00000018 [mmio:0x00000018], context: 2 [mmio:2] <6>[ 526.201653] i915 0000:00:02.0: [drm] ELSP[0] count=1, ring->start=002f0000, rq: 15874! [71d7:6fa] prio=0 @ 52ms: (null) <6>[ 526.201654] i915 0000:00:02.0: [drm] ELSP[1] idle <6>[ 526.201655] i915 0000:00:02.0: [drm] HW active? 0x1 <6>[ 526.201682] i915 0000:00:02.0: [drm] E 15874! [71d7:6fa] prio=0 @ 52ms: (null) <6>[ 526.201700] i915 0000:00:02.0: [drm] Queue priority: -2147483648 <6>[ 526.201702] i915 0000:00:02.0: [drm] IRQ? 0x0 (breadcrumbs? no) <6>[ 526.201703] i915 0000:00:02.0: [drm] HWSP: <6>[ 526.201706] i915 0000:00:02.0: [drm] [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6>[ 526.201707] i915 0000:00:02.0: [drm] * <6>[ 526.201709] i915 0000:00:02.0: [drm] [0040] 00000001 00000000 00000018 00000002 00000001 00000000 00000001 00000000 <6>[ 526.201711] i915 0000:00:02.0: [drm] [0060] 00000018 00000008 00000001 00000000 00000000 00000000 00000000 00000001 <6>[ 526.201713] i915 0000:00:02.0: [drm] [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6>[ 526.201714] i915 0000:00:02.0: [drm] * <6>[ 526.201717] i915 0000:00:02.0: [drm] [00c0] 00015874 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6>[ 526.201718] i915 0000:00:02.0: [drm] [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6>[ 526.201720] i915 0000:00:02.0: [drm] * <6>[ 526.201721] i915 0000:00:02.0: [drm] Idle? no Loss for words. To cap it all we even try and kick the tasklet when checking idle. To skip that kick means the tasklet was locked?
commit 22495b68f954c9018fd52bc620292cc80df2e418 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Sep 14 09:00:16 2018 +0100 drm/i915: Flush the tasklet when checking for idle In order to reduce latency when checking for idle we kick the tasklet directly. Sometimes this is not enough as it is queued on another cpu and so to improve the accuracy of this idle-check (and so to reduce latency overall by avoiding another pass, or worse declaring a timeout!) wait for the tasklet to complete. References: https://bugs.freedesktop.org/show_bug.cgi?id=107916 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-2-chris@chris-wilson.co.uk Hopefully.
This issue occurred only once with CI_DRM_4811 (1 month / 415 runs ago).
(In reply to Lakshmi from comment #3) > This issue occurred only once with CI_DRM_4811 (1 month / 415 runs ago). Closing and archiving. Thanks!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.