https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3739/shard-glkb3/igt@perf_pmu@rc6-runtime-pm.html (perf_pmu:1745) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295: (perf_pmu:1745) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept) (perf_pmu:1745) CRITICAL: 'idle - prev' != 'slept' (2292681666.000000 not within 5.000000% tolerance of 2000298407.000000) Subtest rc6-runtime-pm failed.
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4231/shard-glkb3/igt@perf_pmu@rc6-runtime-pm-long.html (perf_pmu:10969) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295: (perf_pmu:10969) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept) (perf_pmu:10969) CRITICAL: 'idle - prev' != 'slept' (235000000.000000 not within 5.000000% tolerance of 2000297046.000000) Subtest rc6-runtime-pm-long failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3752/shard-kbl3/igt@perf_pmu@rc6-runtime-pm-long.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4245/shard-hsw5/igt@perf_pmu@rc6-runtime-pm.html (perf_pmu:7743) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295: (perf_pmu:7743) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept) (perf_pmu:7743) CRITICAL: 'idle - prev' != 'slept' (1876740160.000000 not within 5.000000% tolerance of 2000093247.000000) Subtest rc6-runtime-pm failed.
Note this was green for a while, but now only on GLK from: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3881/shard-glkb1/igt@perf_pmu@rc6-runtime-pm.html (perf_pmu:2880) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377: (perf_pmu:2880) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:2880) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (3304153333.000000 not within +5.000000%/-5.000000% tolerance of 2000332850.000000) Subtest rc6-runtime-pm failed.
(In reply to Marta Löfstedt from comment #4) > Note this was green for a while, but now only on GLK from: > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3881/shard-glkb1/ > igt@perf_pmu@rc6-runtime-pm.html > > (perf_pmu:2880) CRITICAL: Test assertion failure function test_rc6, file > perf_pmu.c:1377: > (perf_pmu:2880) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + > (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - > (tolerance)) * (double)(ts[1] - ts[0]) > (perf_pmu:2880) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' > (3304153333.000000 not within +5.000000%/-5.000000% tolerance of > 2000332850.000000) > Subtest rc6-runtime-pm failed. This pattern is probably due to problem with building IGT that has caused all runs CI_DRM_3880 - CI_DRM_3902 to have the same test order.
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-byt-j1900/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-glk-1/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-kbl-7500u/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-bsw-n3050/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-kbl-7567u/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-skl-6700hq/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-skl-6770hq/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-cfl-u/igt@perf_pmu@rc6-runtime-pm-long.html
Created attachment 138643 [details] dmesg and kern-logs
CNL QA System is failing the following test: igt@perf_pmu@rc6-runtime-pm Software Version: IGT-Version: 1.22-g0721161 (x86_64) (Linux: 4.16.0-rc6-drm-intel-qa-ww12-commit-4db112a+ x86_64) Stdout: Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:122: Test requirement: gem_has_engine(gem_fd, class, instance) Stack trace: #0 [__igt_fail_assert+0x101] #1 [test_rc6+0x38e] #2 [<unknown>+0x38e] Subtest rc6-runtime-pm: FAIL (6.482s) Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:122: Test requirement: gem_has_engine(gem_fd, class, instance) Stderr: (perf_pmu:10868) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377: (perf_pmu:10868) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:10868) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (6298901760.000000 not within +5.000000%/-5.000000% tolerance of 2000086080.000000) Subtest rc6-runtime-pm failed. **** DEBUG **** (perf_pmu:10868) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (perf_pmu:10868) DEBUG: Test requirement passed: !(fd < 0 && errno == ENODEV) (perf_pmu:10868) igt-kms-DEBUG: VT: graphics mode set (mode was 0x0) (perf_pmu:10868) igt-pm-DEBUG: Saved audio power management as '0' and '' (perf_pmu:10868) igt-pm-DEBUG: Saved runtime power management as '10000' and 'on' (perf_pmu:10868) DEBUG: Test requirement passed: igt_setup_runtime_pm() (perf_pmu:10868) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 2865ms (perf_pmu:10868) DEBUG: Test requirement passed: igt_wait_for_pm_status(IGT_RUNTIME_PM_STATUS_SUSPENDED) (perf_pmu:10868) DEBUG: Test requirement passed: wait_for_rc6(fd) (perf_pmu:10868) DEBUG: slept=2000072995 perf=2000086080 (perf_pmu:10868) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377: (perf_pmu:10868) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:10868) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (6298901760.000000 not within +5.000000%/-5.000000% tolerance of 2000086080.000000) (perf_pmu:10868) igt-core-INFO: Stack trace: (perf_pmu:10868) igt-core-INFO: #0 [__igt_fail_assert+0x101] (perf_pmu:10868) igt-core-INFO: #1 [test_rc6+0x38e] (perf_pmu:10868) igt-core-INFO: #2 [<unknown>+0x38e] **** END ****
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_21/fi-bdw-5557u/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_21/fi-byt-n2820/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_21/fi-cfl-s3/igt@perf_pmu@rc6-runtime-pm-long.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_20/fi-skl-guc/igt@perf_pmu@rc6-runtime-pm-long.html
Priority is raised to high as it fails every round with drmtip/igt Full/CI DRM FUll (Shards).
Saying it fails every time is a bit of an overstatement. It's just a counter error across a sleep. We know that it may be the any of the sleep time, rc6 counter, or pm suspend time that may be adrift. The test is using our bkm of measuring its own sleep and pmu interval, so hopefully that just means its the combination of rc6 plus suspend time that's going wrong. It's not a crucial feature of the driver, and under reporting rc6 time is far better than over!
The biggest problem is that we only sync between rpm suspend and the rc6 counter at the time of sampling, so there's a gray error where the combined pmu result may go adrift, and our only limit is that we make sure the reported pmu value is monotonic.
Lowering priority based on Chris' comments.
A CI Bug Log filter associated to this bug has been updated: {- all machines: igt@perf_pmu@rc6-runtime-pm(-long) - Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && ... -} {+ all machines: igt@perf_pmu@rc6-runtime-pm(-long) - fail/dmesg-fail-Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && . +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-bsw-kefka/igt@perf_pmu@rc6-runtime-pm-long.html
This also seen on https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6179/re-icl-u/igt@perf_pmu@rc6-runtime-pm-long.html
commit 16ffe73c186b7897a1a6888ffb4949a8b4026624 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Sep 12 13:48:13 2019 +0100 drm/i915/pmu: Use GT parked for estimating RC6 while asleep As we track when we put the GT device to sleep upon idling, we can use that callback to sample the current rc6 counters and record the timestamp for estimating samples after that point while asleep. v2: Stick to using ktime_t v3: Track user_wakerefs that interfere with the new intel_gt_pm_wait_for_idle v4: No need for parked/unparked estimation if !CONFIG_PM v5: Keep timer park/unpark logic as was v6: Refactor duplicated estimate/update rc6 logic v7: Pull intel_get_pm_get_if_awake() out from the pmu->lock. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190912124813.19225-1-chris@chris-wilson.co.uk One hopes for the last time.
Reproduction rate of 100% 2 months ago(drmtip_364), not seen in the last 37 runs, so closing and archiving this.
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.