Bug 102250 - [CI] igt@pm_rps@* - fail - Test assertion failure function waitboost - Failed assertion: *_freqs[CUR] [==|<] *_freqs[*]
Summary: [CI] igt@pm_rps@* - fail - Test assertion failure function waitboost - Failed...
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 102248 105186
Depends on:
Blocks:
 
Reported: 2017-08-16 10:55 UTC by Martin Peres
Modified: 2019-05-15 05:46 UTC
CC List: 2 users

See Also:
i915 platform: BSW/CHT, BXT, BYT, CFL, CNL, GLK, HSW, KBL, SKL
i915 features: power/Other


Description Martin Peres 2017-08-16 10:55:35 UTC
The test igt@pm_rps@reset hits the following assertion on our Haswell system:

(pm_rps:2652) CRITICAL: Test assertion failure function waitboost, file pm_rps.c:614:
(pm_rps:2652) CRITICAL: Failed assertion: boost_freqs[CUR] == boost_freqs[MAX]
(pm_rps:2652) CRITICAL: error: 1300 != 1250

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw2/igt@pm_rps@reset.html
Comment 1 Chris Wilson 2017-08-16 11:02:33 UTC
The test assumes that the hw cannot overrule our requests, and that we sample during the boost (which itself is deferred to a task). It's not even a sensible test: it says that we must always boost a wait, but we may wish to change that policy.
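To illustrate the assumption described above, here is a minimal Python sketch of the strict-equality check from the log in the description. It is a hypothetical model: `boost_check` and the dictionary layout are invented for illustration and are not the code in pm_rps.c. Only the assertion text and the values 1300 and 1250 come from the log.

```python
# Hypothetical model of the failing check; key names mirror the log output.
CUR, MAX = "cur", "max"


def boost_check(boost_freqs):
    """Return None when CUR equals MAX, otherwise a log-style error string."""
    if boost_freqs[CUR] == boost_freqs[MAX]:
        return None
    return f"error: {boost_freqs[CUR]} != {boost_freqs[MAX]}"


# Values reported on the Haswell shard: the sampled frequency is above the
# limit the test expected, so a strict equality check cannot pass.
print(boost_check({CUR: 1300, MAX: 1250}))
```

Any sample that differs from the expected limit fails a check of this shape, whether the difference comes from the hardware or from sampling outside the boost.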
Comment 2 Chris Wilson 2017-08-16 11:25:54 UTC
*** Bug 102248 has been marked as a duplicate of this bug. ***
Comment 3 Martin Peres 2017-08-23 08:03:42 UTC
Also seen on Sandybridge.

So far, the failure rate on both HSW and SNB is close to 100% for all tests (SNB results will appear in an hour): https://intel-gfx-ci.01.org/cibuglog/index.html%3Faction_failures_history=204.html
Comment 5 Jani Saarinen 2017-09-01 08:01:17 UTC
Latest 2 shards show green:
https://intel-gfx-ci.01.org/tree/drm-tip/shards-all.html
Comment 6 Martin Peres 2017-09-06 09:36:15 UTC
Been good for > 20 runs. Closing.
Comment 7 Marta Löfstedt 2017-10-17 07:27:51 UTC
Now on APL- and KBL-shards:

(pm_rps:1459) CRITICAL: Test assertion failure function waitboost, file pm_rps.c:644:
(pm_rps:1459) CRITICAL: Failed assertion: boost_freqs[CUR] == boost_freqs[BOOST]
(pm_rps:1459) CRITICAL: error: 250 != 750
Subtest waitboost failed.

(pm_rps:1458) CRITICAL: Test assertion failure function waitboost, file pm_rps.c:645:
(pm_rps:1458) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1458) CRITICAL: error: 950 >= 950
Subtest waitboost failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3245/shard-apl1/igt@pm_rps@waitboost.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3243/shard-kbl5/igt@pm_rps@waitboost.html
Comment 8 krisman 2017-10-19 02:54:16 UTC
I can reproduce a funny one:

root@cubi2:~/work/igt-gpu-tools/build# tests/gem_sync --r basic-store-all
IGT-Version: 1.19-gf4de2cdf7fd5 (x86_64) (Linux: 4.14.0-rc5.intel-boxes+ x86_64)
Using Execlists submission
Completed 11264 cycles: 487.101 us
Subtest basic-store-all: SUCCESS (5.534s)
root@cubi2:~/work/igt-gpu-tools/build# tests/pm_rps --r reset
IGT-Version: 1.19-gf4de2cdf7fd5 (x86_64) (Linux: 4.14.0-rc5.intel-boxes+ x86_64)
(pm_rps:1415) CRITICAL: Test assertion failure function waitboost, file ../../tests/pm_rps.c:615:
(pm_rps:1415) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1415) CRITICAL: error: 1000 >= 1000

On APL.
Comment 9 krisman 2017-11-15 21:10:24 UTC
(In reply to krisman from comment #8)
> I can reproduce a funny one:
> 
> root@cubi2:~/work/igt-gpu-tools/build# tests/gem_sync --r basic-store-all
> IGT-Version: 1.19-gf4de2cdf7fd5 (x86_64) (Linux: 4.14.0-rc5.intel-boxes+
> x86_64)
> Using Execlists submission
> Completed 11264 cycles: 487.101 us
> Subtest basic-store-all: SUCCESS (5.534s)
> root@cubi2:~/work/igt-gpu-tools/build# tests/pm_rps --r reset
> IGT-Version: 1.19-gf4de2cdf7fd5 (x86_64) (Linux: 4.14.0-rc5.intel-boxes+
> x86_64)
> (pm_rps:1415) CRITICAL: Test assertion failure function waitboost, file
> ../../tests/pm_rps.c:615:
> (pm_rps:1415) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
> (pm_rps:1415) CRITICAL: error: 1000 >= 1000
> 
> On APL.

Which is no longer reproducible after the patch.

This has also not been seen on shards in the past ~90 runs. Closing.
Comment 10 Marta Löfstedt 2017-12-01 09:25:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4029/shard-apl1/igt@pm_rps@min-max-config-idle.html

(pm_rps:23290) CRITICAL: Test assertion failure function idle_check, file pm_rps.c:528:
(pm_rps:23290) CRITICAL: Failed assertion: freqs[CUR] == freqs[RPn]
(pm_rps:23290) CRITICAL: Last errno: 22, Invalid argument
(pm_rps:23290) CRITICAL: error: 433 != 100
Subtest min-max-config-idle failed.

Since it is the same assert, I am reopening this bug although it is a new subtest.
Comment 11 Chris Wilson 2017-12-01 10:22:40 UTC
(In reply to Marta Löfstedt from comment #10)
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4029/shard-apl1/igt@pm_rps@min-
> max-config-idle.html
> 
> (pm_rps:23290) CRITICAL: Test assertion failure function idle_check, file
> pm_rps.c:528:
> (pm_rps:23290) CRITICAL: Failed assertion: freqs[CUR] == freqs[RPn]
> (pm_rps:23290) CRITICAL: Last errno: 22, Invalid argument
> (pm_rps:23290) CRITICAL: error: 433 != 100
> Subtest min-max-config-idle failed.
> 
> Since it is the same assert I reopen this bug although it is a new subtest.

This test is unlike the reset scenario; it's a very basic failure. Can we move this to a new report so I don't get confused?
Comment 12 Marta Löfstedt 2017-12-18 11:12:22 UTC
The tests filed on this bug no longer fail as described here. I will archive and close this.
Comment 13 Marta Löfstedt 2018-01-16 07:45:10 UTC
Restored, but so far only on the igt@pm_rps@waitboost subtest.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3634/shard-apl5/igt@pm_rps@waitboost.html

(pm_rps:4799) CRITICAL: Test assertion failure function waitboost, file pm_rps.c:548:
(pm_rps:4799) CRITICAL: Failed assertion: boost_freqs[CUR] == boost_freqs[BOOST]
(pm_rps:4799) CRITICAL: error: 250 != 750
Subtest waitboost failed.
Comment 14 Marta Löfstedt 2018-01-18 07:14:32 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3642/shard-glkb3/igt@pm_rps@waitboost.html

(pm_rps:8957) CRITICAL: Test assertion failure function waitboost, file pm_rps.c:549:
(pm_rps:8957) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:8957) CRITICAL: error: 750 >= 750
Subtest waitboost failed.
Comment 15 Marta Löfstedt 2018-02-21 07:44:58 UTC
*** Bug 105186 has been marked as a duplicate of this bug. ***
Comment 16 Marta Löfstedt 2018-03-23 07:38:24 UTC
IGT_4234: 2018-02-10 / 338 runs ago
Comment 17 Marta Löfstedt 2018-03-28 06:08:14 UTC
Reopened due to:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4385/shard-hsw2/igt@pm_rps@reset.html

(pm_rps:21511) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:568:
(pm_rps:21511) CRITICAL: Failed assertion: boost_freqs[CUR] == boost_freqs[BOOST]
(pm_rps:21511) CRITICAL: error: 750 != 1300
Subtest reset failed.
Comment 18 Marta Löfstedt 2018-04-06 08:59:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_12/fi-skl-6600u/igt@pm_rps@waitboost.html

(pm_rps:1247) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:568:
(pm_rps:1247) CRITICAL: Failed assertion: boost_freqs[CUR] == boost_freqs[BOOST]
(pm_rps:1247) CRITICAL: error: 300 != 1050
Subtest waitboost failed.
Comment 19 Marta Löfstedt 2018-04-13 05:53:55 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4050/shard-apl5/igt@pm_rps@reset.html

(pm_rps:1294) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:568:
(pm_rps:1294) CRITICAL: Failed assertion: boost_freqs[CUR] == boost_freqs[BOOST]
(pm_rps:1294) CRITICAL: error: 250 != 750
Subtest reset failed.
Comment 20 Martin Peres 2018-04-20 13:52:39 UTC
Also visible on fi-skl-6770hq and fi-cnl-y3:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_23/fi-skl-6770hq/igt@pm_rps@min-max-config-loaded.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_23/fi-cnl-y3/igt@pm_rps@min-max-config-loaded.html

(pm_rps:1485) CRITICAL: Test assertion failure function loaded_check, file ../tests/pm_rps.c:464:
(pm_rps:1485) CRITICAL: Failed assertion: freqs[MAX] <= freqs[CUR]
(pm_rps:1485) CRITICAL: Last errno: 22, Invalid argument
(pm_rps:1485) CRITICAL: error: 450 > 417
Subtest min-max-config-loaded failed.
Comment 21 Martin Peres 2018-08-28 15:19:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@pm_rps@waitboost.html

(pm_rps:1402) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:568:
(pm_rps:1402) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1402) CRITICAL: error: 600 >= 600
Comment 22 Martin Peres 2018-08-28 15:38:08 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@pm_rps@reset.html

(pm_rps:1791) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:568:
(pm_rps:1791) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1791) CRITICAL: error: 600 >= 600
Comment 23 Martin Peres 2018-08-31 10:28:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_96/fi-icl-u/igt@pm_rps@min-max-config-loaded.html

(pm_rps:2952) CRITICAL: Test assertion failure function loaded_check, file ../tests/pm_rps.c:463:
(pm_rps:2952) CRITICAL: Failed assertion: freqs[MAX] <= freqs[CUR]
(pm_rps:2952) CRITICAL: error: 600 > 300
Comment 24 Martin Peres 2018-09-04 07:29:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-kbl-soraka/igt@pm_rps@waitboost.html

(pm_rps:1331) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:568:
(pm_rps:1331) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1331) CRITICAL: error: 900 >= 900
Subtest waitboost failed.
Comment 25 Chris Wilson 2018-09-04 07:35:42 UTC
A "light" load:

(pm_rps:1331) DEBUG: Apply low load again...
(pm_rps:1331) DEBUG: gt freq (MHz):  cur=900  min=300  max=900  RP0=900  RP1=300  RPn=300  boost=900
(pm_rps:1331) DEBUG: gt freq (MHz):  cur=900  min=300  max=900  RP0=900  RP1=300  RPn=300  boost=900
(pm_rps:1331) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(pm_rps:1331) igt_debugfs-DEBUG: i915_rps_boost_info:
RPS enabled? 1
GPU busy? yes [1 requests]
CPU waiting? 0
Boosts outstanding? 0
Interactive? 0
Frequency requested 900
  min hard:300, soft:300; max soft:900, hard:900
  idle:300, efficient:300, boost:900
pm_rps [1331]: 1 boosts
pm_rps [1331]: 0 boosts
Kernel (anonymous) boosts: 0

RPS Autotuning (current "high power" window):
  Avg. up: 99% [above threshold? 85%]
  Avg. down: 99% [below threshold? 60%]

where the HW thinks it is 100% busy over the course of an EI (evaluation interval). That's a test bug.
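The autotuning figures above can be read as follows. This is a minimal Python sketch under stated assumptions: that an up-average above the up threshold produces an upclock request, that a down-average below the down threshold produces a downclock request, and that the result is clamped to the soft limits. The function names and the 50 MHz step are hypothetical; the percentages and frequencies are taken from the debug output.

```python
def rps_direction(avg_up, up_threshold, avg_down, down_threshold):
    """Pick a frequency direction from busyness averages (percent)."""
    if avg_up > up_threshold:
        return "up"
    if avg_down < down_threshold:
        return "down"
    return "hold"


def next_freq(cur, soft_min, soft_max, direction, step=50):
    """Apply one hypothetical step and clamp to the soft limits (MHz)."""
    delta = {"up": step, "down": -step, "hold": 0}[direction]
    return max(soft_min, min(soft_max, cur + delta))


# Values from the debug output: 99% up-average vs. 85% threshold,
# 99% down-average vs. 60% threshold, cur=900, soft min=300, soft max=900.
direction = rps_direction(99, 85, 99, 60)
post_cur = next_freq(900, 300, 900, direction)
print(direction, post_cur, post_cur < 900)
```

Under these assumptions the frequency stays pinned at the soft max of 900 MHz for as long as the previous load keeps the averages high, so the check post_freqs[CUR] < post_freqs[MAX] cannot hold.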
Comment 26 Chris Wilson 2018-09-04 14:02:13 UTC
Gone at last?

commit 93a1b39fcbbe18199165ce6000e26a0c1b082fb3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 13:43:12 2018 +0100

    igt/pm_rps: Clear previous high load on high->low transition
    
    Make sure we do flush out the previous spinner and delay signaling
    transition completion until we do.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=102250
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Katarzyna Dec <katarzyna.dec@intel.com>
    Reviewed-by: Katarzyna Dec <katarzyna.dec@intel.com>
Comment 27 Martin Peres 2018-09-07 17:42:26 UTC
(In reply to Chris Wilson from comment #26)
> Gone at last?
> 
> commit 93a1b39fcbbe18199165ce6000e26a0c1b082fb3
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Sep 4 13:43:12 2018 +0100
> 
>     igt/pm_rps: Clear previous high load on high->low transition
>     
>     Make sure we do flush out the previous spinner and delay signaling
>     transition completion until we do.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=102250
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Katarzyna Dec <katarzyna.dec@intel.com>
>     Reviewed-by: Katarzyna Dec <katarzyna.dec@intel.com>

Nope, and this is mostly seen on ICL:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_105/fi-icl-u/igt@pm_rps@waitboost.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_105/fi-icl-u/igt@pm_rps@reset.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_105/fi-icl-u/igt@pm_rps@min-max-config-loaded.html
Comment 28 Chris Wilson 2018-09-07 17:48:03 UTC
But that just looks like rps on icl doesn't work as intended yet... A different kettle of fish to the test not quite working. :)
Comment 29 Martin Peres 2018-09-10 10:03:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_106/fi-kbl-soraka/igt@pm_rps@reset.html

(pm_rps:1357) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:608:
(pm_rps:1357) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1357) CRITICAL: error: 900 >= 900
Comment 30 Lakshmi 2018-09-25 13:28:28 UTC
For ICL, a separate bug has been created:
https://bugs.freedesktop.org/show_bug.cgi?id=108059
Comment 31 Martin Peres 2018-11-19 11:37:21 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5153/shard-skl9/igt@pm_rps@waitboost.html

Starting subtest: waitboost
(pm_rps:1010) CRITICAL: Test assertion failure function waitboost, file ../tests/pm_rps.c:608:
(pm_rps:1010) CRITICAL: Failed assertion: post_freqs[CUR] < post_freqs[MAX]
(pm_rps:1010) CRITICAL: error: 850 >= 850
Comment 32 CI Bug Log 2018-12-31 14:40:51 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL APL GLK CNL ICL: igt@pm_rps@min-max-config-loaded - Failed assertion: freqs[MAX] <= freqs[CUR] -}
{+ SKL APL KBL GLK CNL ICL: igt@pm_rps@min-max-config-loaded - Failed assertion: freqs[MAX] <= freqs[CUR] +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_171/fi-kbl-7560u/igt@pm_rps@min-max-config-loaded.html
Comment 34 CI Bug Log 2019-01-29 16:10:45 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* fi-skl-iommu: igt@pm_rps@min-max-config-idle - fail - Failed assertion: freqs[CUR] == freqs[RPn]
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_181/fi-skl-iommu/igt@pm_rps@min-max-config-idle.html
Comment 35 CI Bug Log 2019-03-13 09:34:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion: post_freqs[CUR] < post_freqs[MAX] -}
{+ KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion: post_freqs[CUR] < post_freqs[MAX] +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_241/fi-whl-u/igt@i915_pm_rps@reset.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_242/fi-whl-u/igt@i915_pm_rps@reset.html
Comment 36 Lakshmi 2019-03-13 09:35:05 UTC
Also seen on WHL.
Comment 37 CI Bug Log 2019-03-13 10:53:27 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion: post_freqs[CUR] < post_freqs[MAX] -}
{+ SKL KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion: post_freqs[CUR] < post_freqs[MAX] +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_241/fi-skl-6260u/igt@i915_pm_rps@reset.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_241/fi-skl-lmem/igt@i915_pm_rps@reset.html
Comment 38 Chris Wilson 2019-03-13 10:56:32 UTC
(In reply to CI Bug Log from comment #37)
> A CI Bug Log filter associated to this bug has been updated:
> 
> {- KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion:
> post_freqs[CUR] < post_freqs[MAX] -}
> {+ SKL KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion:
> post_freqs[CUR] < post_freqs[MAX] +}

This bug should not be capturing ICL, as that is a very different, more general type of RPS failure, whereas this one is more of an issue with residual frequencies after reset.
Comment 39 CI Bug Log 2019-03-13 11:27:14 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion: post_freqs[CUR] < post_freqs[MAX] -}
{+ SKL KBL CFL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion: post_freqs[CUR] < post_freqs[MAX] +}

 No new failures caught with the new filter
Comment 40 CI Bug Log 2019-03-13 11:29:24 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT BSW SKL APL KBL GLK CNL ICL: igt@pm_rps@min-max-config-loaded - Failed assertion: freqs[MAX] <= freqs[CUR] -}
{+ BYT BSW SKL APL KBL GLK CNL: igt@pm_rps@min-max-config-loaded - Failed assertion: freqs[MAX] <= freqs[CUR] +}

 No new failures caught with the new filter
Comment 41 Lakshmi 2019-03-13 11:37:27 UTC
(In reply to Chris Wilson from comment #38)
> (In reply to CI Bug Log from comment #37)
> > A CI Bug Log filter associated to this bug has been updated:
> > 
> > {- KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion:
> > post_freqs[CUR] < post_freqs[MAX] -}
> > {+ SKL KBL CFL ICL: igt@pm_rps@(reset|waitboost) - fail - Failed assertion:
> > post_freqs[CUR] < post_freqs[MAX] +}
> 
> This bug should not be capturing ICL as that is a very different type of
> general failure with RPS. Rather than this which is more of an issue with
> residual frequencies after reset.

ICL failures are tracked as part of Bug 108059. Updated filters accordingly.
Comment 42 CI Bug Log 2019-03-21 09:24:53 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GLK APL KBL: pm_rps@waitboost - Failed assertion: post_freqs[CUR] < post_freqs[MAX] -}
{+ HSW APL KBL GLK: pm_rps@waitboost - Failed assertion: post_freqs[CUR] < post_freqs[MAX] +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5784/shard-hsw8/igt@i915_pm_rps@waitboost.html
Comment 43 CI Bug Log 2019-05-15 05:46:30 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BSW: igt@i915_pm_rps@min-max-config-idle - fail - Failed assertion: freqs[CUR] == freqs[RPn]
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_284/fi-bsw-kefka/igt@i915_pm_rps@min-max-config-idle.html

