Bug 112096

Summary: [CI][BAT]igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22
Product: DRI
Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BXT, CFL, CML, GLK, ICL, KBL, SKL
i915 features: GEM/Other

Description Lakshmi 2019-10-22 11:16:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7143/fi-cfl-8700k/igt@i915_selftest@live_gt_heartbeat.html
(i915_selftest:4816) igt_kmod-WARNING: i915/intel_heartbeat_live_selftests: live_idle_flush failed with error -22
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling always-on
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling DC off
(i915_selftest:4816) igt_kmod-WARNING: [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling power well 2
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling DDI A/E IO power well
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling DDI B IO power well
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling DDI C IO power well
(i915_selftest:4816) igt_kmod-WARNING: [drm:intel_power_well_enable [i915]] enabling DDI D IO power well
(i915_selftest:4816) igt_kmod-WARNING: i915: probe of 0000:00:02.0 failed with error -22
(i915_selftest:4816) igt_kmod-CRITICAL: Test assertion failure function igt_kselftest_execute, file ../lib/igt_kmod.c:548:
(i915_selftest:4816) igt_kmod-CRITICAL: Failed assertion: err == 0
(i915_selftest:4816) igt_kmod-CRITICAL: kselftest "i915 igt__20__live_gt_heartbeat=1 live_selftests=-1 disable_display=1 st_filter=" failed: Invalid argument [22]
(i915_selftest:4816) igt_core-INFO: Stack trace:
(i915_selftest:4816) igt_core-INFO:   #0 ../lib/igt_core.c:1716 __igt_fail_assert()
(i915_selftest:4816) igt_core-INFO:   #1 [igt_kselftest_execute+0x2e5]
(i915_selftest:4816) igt_core-INFO:   #2 ../lib/igt_kmod.c:582 igt_kselftests()
(i915_selftest:4816) igt_core-INFO:   #3 ../tests/i915/i915_selftest.c:43 __real_main29()
(i915_selftest:4816) igt_core-INFO:   #4 ../tests/i915/i915_selftest.c:29 main()
(i915_selftest:4816) igt_core-INFO:   #5 ../csu/libc-start.c:344 __libc_start_main()
(i915_selftest:4816) igt_core-INFO:   #6 [_start+0x2a]
****  END  ****
Subtest live_gt_heartbeat: FAIL (0.452s)
Comment 1 CI Bug Log 2019-10-22 11:17:06 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cfl-8109u/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cfl-8700k/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cfl-guc/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cml-s/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cml-u/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cml-u2/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-kbl-7500u/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-kbl-8809g/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-kbl-guc/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-kbl-r/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-kbl-soraka/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-kbl-x1275/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7143/fi-cfl-8700k/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7144/fi-cml-u/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3592/fi-cml-u2/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14908/fi-kbl-8809g/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_5203/fi-kbl-r/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14910/fi-cml-u2/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14910/fi-kbl-8809g/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14911/fi-cfl-8109u/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14912/fi-cml-s/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14912/fi-kbl-guc/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7146/fi-kbl-r/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14913/fi-cml-u2/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14913/fi-kbl-guc/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14913/fi-kbl-r/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14914/fi-kbl-r/igt@i915_selftest@live_gt_heartbeat.html
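
The result URLs above appear to follow a `<run>/<machine>/<test>.html` layout. A minimal sketch for tallying failures per platform from such a list, assuming that layout holds and that the second dash-separated field of an `fi-*` machine name is the platform tag (both are inferences from the links in this report, not a documented scheme; `parse_result_url` and `platform_of` are hypothetical helper names):

```python
import re
from collections import Counter

# Layout inferred from the links in this report (an assumption):
#   .../tree/drm-tip/<run>/<machine>/<test>.html
RESULT_URL = re.compile(
    r"/tree/drm-tip/(?P<run>[^/]+)/(?P<machine>[^/]+)/(?P<test>[^/]+)\.html$"
)


def parse_result_url(url):
    """Split a CI result URL into its run, machine and test components."""
    match = RESULT_URL.search(url)
    if match is None:
        raise ValueError(f"unrecognised result URL: {url}")
    return match.groupdict()


def platform_of(machine):
    """Guess the platform tag from an 'fi-*' machine name, e.g. 'fi-cfl-8700k'."""
    # Assumption: the field after 'fi-' names the platform.
    return machine.split("-")[1].upper()


urls = [
    "https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14904/fi-cfl-8109u/igt@i915_selftest@live_gt_heartbeat.html",
    "https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7143/fi-cfl-8700k/igt@i915_selftest@live_gt_heartbeat.html",
    "https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14914/fi-kbl-r/igt@i915_selftest@live_gt_heartbeat.html",
]

per_platform = Counter(platform_of(parse_result_url(u)["machine"]) for u in urls)
print(per_platform)
```

Shard machines such as `shard-skl3` do not fit the `fi-*` naming assumption and would need their own rule.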
Comment 2 CI Bug Log 2019-10-23 06:59:36 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* CFL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7148/fi-cfl-8109u/igt@i915_selftest@live_gt_heartbeat.html
Comment 3 CI Bug Log 2019-10-23 07:00:54 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 -}
{+ SKL KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7151/fi-skl-iommu/igt@i915_selftest@live_gt_heartbeat.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7153/fi-skl-iommu/igt@i915_selftest@live_gt_heartbeat.html
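
The `{- ... -}` / `{+ ... +}` pair above is an old/new diff of the filter text, and here the only change is the platform list. A small sketch, assuming the platform tags are the space-separated upper-case words before the first colon (an inference from the filter lines in this report; `filter_platforms` is a hypothetical helper), that reports which platforms an update added:

```python
import re

# Assumed shape of a filter diff line, inferred from this report:
#   {- PLAT PLAT ...: <filter description> -}
#   {+ PLAT PLAT ...: <filter description> +}
FILTER_LINE = re.compile(r"^\{[-+] (?P<platforms>[A-Z0-9 ]+): .* [-+]\}$")


def filter_platforms(line):
    """Return the set of platform tags at the start of a filter diff line."""
    match = FILTER_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a filter diff line: {line!r}")
    return set(match.group("platforms").split())


old = "{- KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 -}"
new = "{+ SKL KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 +}"

added = filter_platforms(new) - filter_platforms(old)
removed = filter_platforms(old) - filter_platforms(new)
print("added:", sorted(added), "removed:", sorted(removed))
```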
Comment 4 CI Bug Log 2019-10-23 07:03:34 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 -}
{+ BXT SKL KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7153/fi-bxt-dsi/igt@i915_selftest@live_gt_heartbeat.html
Comment 5 Chris Wilson 2019-10-23 15:50:50 UTC
Tried commit f79520bb333792fb23a32352f83d8d59a525cec9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 22 12:21:11 2019 +0100

    drm/i915/selftests: Synchronize checking active status with retirement

to remove one possible cause of not noticing the callback being run. Hmm, easy to write off as a test bug, but we do have lots of mysterious timeouts, all possibly due to the callbacks going astray. Hmm, that sounds like a coincidence!
Comment 6 CI Bug Log 2019-10-23 16:32:04 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BXT SKL KBL CFL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 -}
{+ BXT SKL KBL CFL WHL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7159/fi-whl-u/igt@i915_selftest@live_gt_heartbeat.html
Comment 7 CI Bug Log 2019-10-24 05:17:55 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BXT SKL KBL CFL WHL CML: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 -}
{+ BXT SKL KBL CFL WHL CML ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7162/fi-icl-u2/igt@i915_selftest@live_gt_heartbeat.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7167/fi-icl-dsi/igt@i915_selftest@live_gt_heartbeat.html
Comment 8 CI Bug Log 2019-10-24 05:18:30 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BXT SKL KBL CFL WHL CML ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 -}
{+ BXT SKL KBL CFL WHL CML ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_flush failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7166/fi-glk-dsi/igt@i915_selftest@live_gt_heartbeat.html
Comment 9 CI Bug Log 2019-10-24 16:24:33 UTC
A CI Bug Log filter associated to this bug has been updated:

{- CFL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 -}
{+ CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7171/fi-icl-dsi/igt@i915_selftest@live_gt_heartbeat.html
Comment 10 CI Bug Log 2019-10-25 06:03:35 UTC
A CI Bug Log filter associated to this bug has been updated:

{- CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 -}
{+ APL SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7167/shard-skl3/igt@i915_selftest@live_gt_heartbeat.html
Comment 11 CI Bug Log 2019-10-25 15:11:02 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BYT: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_heartbeat_fast failed with error -22
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_5181/fi-byt-n2820/igt@i915_selftest@live_gt_heartbeat.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7186/fi-byt-n2820/igt@i915_selftest@live_gt_heartbeat.html
Comment 12 CI Bug Log 2019-10-29 09:48:54 UTC
A CI Bug Log filter associated to this bug has been updated:

{- APL SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 -}
{+ APL BXT SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5249/fi-bxt-dsi/igt@i915_selftest@live_gt_heartbeat.html
Comment 13 CI Bug Log 2019-10-31 12:18:41 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_heartbeat_fast failed with error -22 -}
{+ BYT SKL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_heartbeat_fast failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7222/shard-skl5/igt@i915_selftest@live_gt_heartbeat.html
Comment 14 Francesco Balestrieri 2019-11-01 06:01:36 UTC
Overall reproduction rate so far is 76 / 179 runs (42.5%), setting severity to high.
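
For reference, the percentage follows directly from the two counts quoted above; a quick check of the arithmetic:

```python
# Counts taken from the comment above.
failures, runs = 76, 179

rate = failures / runs
print(f"{failures} / {runs} runs = {rate:.1%}")
```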
Comment 15 Chris Wilson 2019-11-01 15:56:42 UTC
vcs0: heartbeat pulse did not flush idle tasks
pulse active pulse_active+0x0/0x10 [i915]:pulse_retire+0x0/0x10 [i915]
pulse 	count: 0
pulse 	preallocated barriers? no

So just a synchronisation problem. :|
Comment 16 Chris Wilson 2019-11-02 08:40:06 UTC
commit 38813767c7c5d9f8e0bd6b14136add861cc79b33 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 1 18:10:22 2019 +0000

    drm/i915/selftests: Flush all active callbacks
    
    Flushing the outer i915_active is not enough, as we need the barrier to
    be applied across all the active dma_fence callbacks. So we must
    serialise with each outstanding fence.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112096
    References: f79520bb3337 ("drm/i915/selftests: Synchronize checking active status with retirement")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Andi Shyti <andi.shyti@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191101181022.25633-1-chris@chris-wilson.co.uk
Comment 17 CI Bug Log 2019-11-21 07:49:17 UTC
A CI Bug Log filter associated to this bug has been updated:

{- APL BXT SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 -}
{+ APL BXT SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7391/fi-cml-u2/igt@i915_selftest@live_gt_heartbeat.html
Comment 18 CI Bug Log 2019-11-21 07:50:35 UTC
A CI Bug Log filter associated to this bug has been updated:

{- APL BXT SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 -}
{+ APL BXT SKL KBL CFL CML ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 +}


  No new failures caught with the new filter
Comment 19 Lakshmi 2019-11-21 07:54:48 UTC
(In reply to CI Bug Log from comment #18)
> A CI Bug Log filter associated to this bug has been updated:
> 
> {- APL BXT SKL KBL CFL ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 -}
> {+ APL BXT SKL KBL CFL CML ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 +}

Still happening, here are some of the latest failures.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7374/fi-kbl-7560u/igt@i915_selftest@live_gt_heartbeat.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7337/fi-skl-6770hq/igt@i915_selftest@live_gt_heartbeat.html
Comment 20 Chris Wilson 2019-11-21 08:28:39 UTC
And the annoying thing is...

<3> [383.431019] vcs0: heartbeat pulse did not flush idle tasks
<3> [383.431123] *ERROR* pulse active pulse_active+0x0/0x10 [i915]:pulse_retire+0x0/0x10 [i915]
<3> [383.431128] *ERROR* pulse 	count: 0
<3> [383.431132] *ERROR* pulse 	preallocated barriers? no

it is nothing more than bad timing; a missing CPU barrier. But where? I'm running out of places to put them!
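
When collecting these messages from many CI runs it can help to split each dmesg line into its parts. A minimal sketch, assuming the `<level> [timestamp] message` shape shown in the excerpt above (the leading `<3>` is the kernel log level, where 3 is KERN_ERR; `parse_dmesg_line` is a hypothetical helper, not part of IGT or the CI tooling):

```python
import re

# Standard kernel log levels (0 = KERN_EMERG ... 7 = KERN_DEBUG).
KERN_LEVELS = {
    0: "emerg", 1: "alert", 2: "crit", 3: "err",
    4: "warning", 5: "notice", 6: "info", 7: "debug",
}

# Assumed line shape, taken from the excerpt in this report:
#   <3> [383.431019] vcs0: heartbeat pulse did not flush idle tasks
DMESG_LINE = re.compile(
    r"^<(?P<level>\d)>\s*\[\s*(?P<ts>\d+\.\d+)\]\s(?P<msg>.*)$"
)


def parse_dmesg_line(line):
    """Return (level name, seconds since boot, message) for one dmesg line."""
    match = DMESG_LINE.match(line)
    if match is None:
        raise ValueError(f"not a dmesg line: {line!r}")
    level = KERN_LEVELS[int(match.group("level"))]
    return level, float(match.group("ts")), match.group("msg")


line = "<3> [383.431019] vcs0: heartbeat pulse did not flush idle tasks"
print(parse_dmesg_line(line))
```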
Comment 21 CI Bug Log 2019-11-26 08:06:06 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* KBL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -62
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7417/fi-kbl-guc/igt@i915_selftest@live_gt_heartbeat.html
Comment 22 Lakshmi 2019-11-26 08:07:25 UTC
(In reply to Chris Wilson from comment #20)
> And the annoying thing is...
> 
> <3> [383.431019] vcs0: heartbeat pulse did not flush idle tasks
> <3> [383.431123] *ERROR* pulse active pulse_active+0x0/0x10 [i915]:pulse_retire+0x0/0x10 [i915]
> <3> [383.431128] *ERROR* pulse 	count: 0
> <3> [383.431132] *ERROR* pulse 	preallocated barriers? no
> 
> it is nothing more than bad timing; a missing CPU barrier. But where? I'm
> running out of places to put them!


> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7417/fi-kbl-guc/igt@i915_selftest@live_gt_heartbeat.html

Another failure
<3> [686.205010] i915/intel_heartbeat_live_selftests: live_idle_pulse failed with error -62
Comment 23 Chris Wilson 2019-11-26 08:07:52 UTC
(In reply to CI Bug Log from comment #21)
> The CI Bug Log issue associated to this bug has been updated.
> 
> ### New filters associated
> 
> * KBL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -62
>   - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7417/fi-kbl-guc/igt@i915_selftest@live_gt_heartbeat.html

Note that is a different class of failure entirely.
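
The two return codes seen in this report map to different errno names, which is one way to tell the failure classes apart at a glance. A small sketch using the generic Linux errno numbering (22 is EINVAL, 62 is ETIME); the table is hard-coded because errno numbers differ between operating systems, and `describe_errno` is a hypothetical helper:

```python
# Generic Linux errno values (asm-generic); other systems number them differently.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    62: ("ETIME", "Timer expired"),
}


def describe_errno(ret):
    """Translate a negative kernel return code into 'NAME (description)'."""
    name, text = LINUX_ERRNO.get(-ret, ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


for ret in (-22, -62):
    print(ret, describe_errno(ret))
```

The `Invalid argument [22]` text in the IGT assertion output in the description matches the first entry.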
Comment 24 CI Bug Log 2019-11-27 08:57:28 UTC
The CI Bug Log issue associated to this bug has been updated.

### Removed filters

* APL BXT SKL KBL CFL CML ICL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -22 (added on 6 days, 1 hour ago)
* KBL: igt@i915_selftest@live_gt_heartbeat - dmesg-fail - live_idle_pulse failed with error -62 (added on 1 day ago)
Comment 25 Chris Wilson 2019-11-27 16:59:41 UTC
*** Bug 112406 has been marked as a duplicate of this bug. ***
Comment 26 Martin Peres 2019-11-29 19:43:13 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/541.
