Bug 105010 - [CI] igt@perf_pmu@rc6-runtime-pm*- fail/dmesg-fail - Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept) (perf_pmu:1745) CRITICAL: 'idle - prev' != 'slept'
Summary: [CI] igt@perf_pmu@rc6-runtime-pm*- fail/dmesg-fail - Failed assertion: (doubl...
Status: ASSIGNED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Francesco Balestrieri
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-08 06:48 UTC by Marta Löfstedt
Modified: 2019-06-03 14:58 UTC (History)
1 user (show)

See Also:
i915 platform: CFL, CNL, GLK
i915 features: Perf/PMU


Attachments
dmesg and kern-logs (66.52 KB, application/gzip)
2018-04-05 20:37 UTC, Ricardo Perez
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marta Löfstedt 2018-02-08 06:48:42 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3739/shard-glkb3/igt@perf_pmu@rc6-runtime-pm.html

(perf_pmu:1745) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295:
(perf_pmu:1745) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:1745) CRITICAL: 'idle - prev' != 'slept' (2292681666.000000 not within 5.000000% tolerance of 2000298407.000000)
Subtest rc6-runtime-pm failed.
Comment 1 Marta Löfstedt 2018-02-08 09:10:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4231/shard-glkb3/igt@perf_pmu@rc6-runtime-pm-long.html

(perf_pmu:10969) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295:
(perf_pmu:10969) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:10969) CRITICAL: 'idle - prev' != 'slept' (235000000.000000 not within 5.000000% tolerance of 2000297046.000000)
Subtest rc6-runtime-pm-long failed.
Comment 3 Marta Löfstedt 2018-02-12 10:10:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4245/shard-hsw5/igt@perf_pmu@rc6-runtime-pm.html

(perf_pmu:7743) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295:
(perf_pmu:7743) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:7743) CRITICAL: 'idle - prev' != 'slept' (1876740160.000000 not within 5.000000% tolerance of 2000093247.000000)
Subtest rc6-runtime-pm failed.
Comment 4 Marta Löfstedt 2018-03-08 08:11:59 UTC
Note this was green for a while, but now only on GLK from:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3881/shard-glkb1/igt@perf_pmu@rc6-runtime-pm.html

(perf_pmu:2880) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377:
(perf_pmu:2880) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0])
(perf_pmu:2880) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (3304153333.000000 not within +5.000000%/-5.000000% tolerance of 2000332850.000000)
Subtest rc6-runtime-pm failed.
Comment 5 Marta Löfstedt 2018-03-09 08:54:42 UTC
(In reply to Marta Löfstedt from comment #4)
> Note this was green for a while, but now only on GLK from:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3881/shard-glkb1/
> igt@perf_pmu@rc6-runtime-pm.html
> 
> (perf_pmu:2880) CRITICAL: Test assertion failure function test_rc6, file
> perf_pmu.c:1377:
> (perf_pmu:2880) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 +
> (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 -
> (tolerance)) * (double)(ts[1] - ts[0])
> (perf_pmu:2880) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]'
> (3304153333.000000 not within +5.000000%/-5.000000% tolerance of
> 2000332850.000000)
> Subtest rc6-runtime-pm failed.

This pattern is probably due to problem with building IGT that has caused all runs CI_DRM_3880 - CI_DRM_3902 to have the same test order.
Comment 7 Ricardo Perez 2018-04-05 20:37:46 UTC
Created attachment 138643 [details]
dmesg and kern-logs
Comment 8 Ricardo Perez 2018-04-05 20:39:41 UTC
CNL QA System is failing the following test:
igt@perf_pmu@rc6-runtime-pm


Software Version:
	
IGT-Version: 1.22-g0721161 (x86_64) (Linux: 4.16.0-rc6-drm-intel-qa-ww12-commit-4db112a+ x86_64)

Stdout:

Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:122:
Test requirement: gem_has_engine(gem_fd, class, instance)
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #1 [test_rc6+0x38e]
  #2 [<unknown>+0x38e]
Subtest rc6-runtime-pm: FAIL (6.482s)
Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:122:
Test requirement: gem_has_engine(gem_fd, class, instance)

Stderr:

	
(perf_pmu:10868) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377:
(perf_pmu:10868) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0])
(perf_pmu:10868) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (6298901760.000000 not within +5.000000%/-5.000000% tolerance of 2000086080.000000)
Subtest rc6-runtime-pm failed.
**** DEBUG ****
(perf_pmu:10868) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(perf_pmu:10868) DEBUG: Test requirement passed: !(fd < 0 && errno == ENODEV)
(perf_pmu:10868) igt-kms-DEBUG: VT: graphics mode set (mode was 0x0)
(perf_pmu:10868) igt-pm-DEBUG: Saved audio power management as '0' and ''
(perf_pmu:10868) igt-pm-DEBUG: Saved runtime power management as '10000' and 'on'
(perf_pmu:10868) DEBUG: Test requirement passed: igt_setup_runtime_pm()
(perf_pmu:10868) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 2865ms
(perf_pmu:10868) DEBUG: Test requirement passed: igt_wait_for_pm_status(IGT_RUNTIME_PM_STATUS_SUSPENDED)
(perf_pmu:10868) DEBUG: Test requirement passed: wait_for_rc6(fd)
(perf_pmu:10868) DEBUG: slept=2000072995 perf=2000086080
(perf_pmu:10868) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377:
(perf_pmu:10868) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0])
(perf_pmu:10868) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (6298901760.000000 not within +5.000000%/-5.000000% tolerance of 2000086080.000000)
(perf_pmu:10868) igt-core-INFO: Stack trace:
(perf_pmu:10868) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(perf_pmu:10868) igt-core-INFO:   #1 [test_rc6+0x38e]
(perf_pmu:10868) igt-core-INFO:   #2 [<unknown>+0x38e]
****  END  ****
Comment 10 Lakshmi 2018-09-04 07:33:19 UTC
Priority is raised to high as it fails every round with drmtip/igt Full/CI DRM FUll (Shards).
Comment 11 Chris Wilson 2018-09-04 08:36:06 UTC
Saying it fails every time is a bit of an overstatement. It's just a counter error across a sleep. We know that it may be the any of the sleep time, rc6 counter, or pm suspend time that may be adrift. The test is using our bkm of measuring its own sleep and pmu interval, so hopefully that just means its the combination of rc6 plus suspend time that's going wrong.

It's not a crucial feature of the driver, and under reporting rc6 time is far better than over!
Comment 12 Chris Wilson 2018-09-04 08:47:08 UTC
The biggest problem is that we only sync between rpm suspend and the rc6 counter at the time of sampling, so there's a gray error where the combined pmu result may go adrift, and our only limit is that we make sure the reported pmu value is monotonic.
Comment 13 Francesco Balestrieri 2018-09-04 10:16:29 UTC
Lowering priority based on Chris' comments.
Comment 14 CI Bug Log 2019-02-18 15:17:25 UTC
A CI Bug Log filter associated to this bug has been updated:

{- all machines: igt@perf_pmu@rc6-runtime-pm(-long) - Failed assertion: (double)(idle - prev) &lt;= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) &amp;&amp; ... -}
{+ all machines: igt@perf_pmu@rc6-runtime-pm(-long) - fail/dmesg-fail-Failed assertion: (double)(idle - prev) &lt;= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) &amp;&amp; . +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-bsw-kefka/igt@perf_pmu@rc6-runtime-pm-long.html


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.