Bug 105010 - [CI] igt@perf_pmu@rc6-runtime-pm*- fail/dmesg-fail - Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept) (perf_pmu:1745) CRITICAL: 'idle - prev' != 'slept'
Summary: [CI] igt@perf_pmu@rc6-runtime-pm*- fail/dmesg-fail - Failed assertion: (doubl...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Francesco Balestrieri
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-08 06:48 UTC by Marta Löfstedt
Modified: 2019-11-13 16:36 UTC (History)
1 user (show)

See Also:
i915 platform: CFL, CNL, GLK
i915 features: Perf/PMU


Attachments
dmesg and kern-logs (66.52 KB, application/gzip)
2018-04-05 20:37 UTC, Ricardo Perez
no flags Details

Description Marta Löfstedt 2018-02-08 06:48:42 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3739/shard-glkb3/igt@perf_pmu@rc6-runtime-pm.html

(perf_pmu:1745) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295:
(perf_pmu:1745) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:1745) CRITICAL: 'idle - prev' != 'slept' (2292681666.000000 not within 5.000000% tolerance of 2000298407.000000)
Subtest rc6-runtime-pm failed.
Comment 1 Marta Löfstedt 2018-02-08 09:10:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4231/shard-glkb3/igt@perf_pmu@rc6-runtime-pm-long.html

(perf_pmu:10969) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295:
(perf_pmu:10969) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:10969) CRITICAL: 'idle - prev' != 'slept' (235000000.000000 not within 5.000000% tolerance of 2000297046.000000)
Subtest rc6-runtime-pm-long failed.
Comment 3 Marta Löfstedt 2018-02-12 10:10:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4245/shard-hsw5/igt@perf_pmu@rc6-runtime-pm.html

(perf_pmu:7743) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1295:
(perf_pmu:7743) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(slept) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:7743) CRITICAL: 'idle - prev' != 'slept' (1876740160.000000 not within 5.000000% tolerance of 2000093247.000000)
Subtest rc6-runtime-pm failed.
Comment 4 Marta Löfstedt 2018-03-08 08:11:59 UTC
Note this was green for a while, but now only on GLK from:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3881/shard-glkb1/igt@perf_pmu@rc6-runtime-pm.html

(perf_pmu:2880) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377:
(perf_pmu:2880) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0])
(perf_pmu:2880) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (3304153333.000000 not within +5.000000%/-5.000000% tolerance of 2000332850.000000)
Subtest rc6-runtime-pm failed.
Comment 5 Marta Löfstedt 2018-03-09 08:54:42 UTC
(In reply to Marta Löfstedt from comment #4)
> Note this was green for a while, but now only on GLK from:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3881/shard-glkb1/
> igt@perf_pmu@rc6-runtime-pm.html
> 
> (perf_pmu:2880) CRITICAL: Test assertion failure function test_rc6, file
> perf_pmu.c:1377:
> (perf_pmu:2880) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 +
> (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 -
> (tolerance)) * (double)(ts[1] - ts[0])
> (perf_pmu:2880) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]'
> (3304153333.000000 not within +5.000000%/-5.000000% tolerance of
> 2000332850.000000)
> Subtest rc6-runtime-pm failed.

This pattern is probably due to problem with building IGT that has caused all runs CI_DRM_3880 - CI_DRM_3902 to have the same test order.
Comment 7 Ricardo Perez 2018-04-05 20:37:46 UTC
Created attachment 138643 [details]
dmesg and kern-logs
Comment 8 Ricardo Perez 2018-04-05 20:39:41 UTC
CNL QA System is failing the following test:
igt@perf_pmu@rc6-runtime-pm


Software Version:
	
IGT-Version: 1.22-g0721161 (x86_64) (Linux: 4.16.0-rc6-drm-intel-qa-ww12-commit-4db112a+ x86_64)

Stdout:

Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:122:
Test requirement: gem_has_engine(gem_fd, class, instance)
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #1 [test_rc6+0x38e]
  #2 [<unknown>+0x38e]
Subtest rc6-runtime-pm: FAIL (6.482s)
Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:122:
Test requirement: gem_has_engine(gem_fd, class, instance)

Stderr:

	
(perf_pmu:10868) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377:
(perf_pmu:10868) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0])
(perf_pmu:10868) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (6298901760.000000 not within +5.000000%/-5.000000% tolerance of 2000086080.000000)
Subtest rc6-runtime-pm failed.
**** DEBUG ****
(perf_pmu:10868) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(perf_pmu:10868) DEBUG: Test requirement passed: !(fd < 0 && errno == ENODEV)
(perf_pmu:10868) igt-kms-DEBUG: VT: graphics mode set (mode was 0x0)
(perf_pmu:10868) igt-pm-DEBUG: Saved audio power management as '0' and ''
(perf_pmu:10868) igt-pm-DEBUG: Saved runtime power management as '10000' and 'on'
(perf_pmu:10868) DEBUG: Test requirement passed: igt_setup_runtime_pm()
(perf_pmu:10868) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 2865ms
(perf_pmu:10868) DEBUG: Test requirement passed: igt_wait_for_pm_status(IGT_RUNTIME_PM_STATUS_SUSPENDED)
(perf_pmu:10868) DEBUG: Test requirement passed: wait_for_rc6(fd)
(perf_pmu:10868) DEBUG: slept=2000072995 perf=2000086080
(perf_pmu:10868) CRITICAL: Test assertion failure function test_rc6, file perf_pmu.c:1377:
(perf_pmu:10868) CRITICAL: Failed assertion: (double)(idle - prev) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(idle - prev) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0])
(perf_pmu:10868) CRITICAL: 'idle - prev' != 'ts[1] - ts[0]' (6298901760.000000 not within +5.000000%/-5.000000% tolerance of 2000086080.000000)
(perf_pmu:10868) igt-core-INFO: Stack trace:
(perf_pmu:10868) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(perf_pmu:10868) igt-core-INFO:   #1 [test_rc6+0x38e]
(perf_pmu:10868) igt-core-INFO:   #2 [<unknown>+0x38e]
****  END  ****
Comment 10 Lakshmi 2018-09-04 07:33:19 UTC
Priority is raised to high as it fails every round with drmtip/igt Full/CI DRM FUll (Shards).
Comment 11 Chris Wilson 2018-09-04 08:36:06 UTC
Saying it fails every time is a bit of an overstatement. It's just a counter error across a sleep. We know that it may be the any of the sleep time, rc6 counter, or pm suspend time that may be adrift. The test is using our bkm of measuring its own sleep and pmu interval, so hopefully that just means its the combination of rc6 plus suspend time that's going wrong.

It's not a crucial feature of the driver, and under reporting rc6 time is far better than over!
Comment 12 Chris Wilson 2018-09-04 08:47:08 UTC
The biggest problem is that we only sync between rpm suspend and the rc6 counter at the time of sampling, so there's a gray error where the combined pmu result may go adrift, and our only limit is that we make sure the reported pmu value is monotonic.
Comment 13 Francesco Balestrieri 2018-09-04 10:16:29 UTC
Lowering priority based on Chris' comments.
Comment 14 CI Bug Log 2019-02-18 15:17:25 UTC
A CI Bug Log filter associated to this bug has been updated:

{- all machines: igt@perf_pmu@rc6-runtime-pm(-long) - Failed assertion: (double)(idle - prev) &lt;= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) &amp;&amp; ... -}
{+ all machines: igt@perf_pmu@rc6-runtime-pm(-long) - fail/dmesg-fail-Failed assertion: (double)(idle - prev) &lt;= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) &amp;&amp; . +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-bsw-kefka/igt@perf_pmu@rc6-runtime-pm-long.html
Comment 16 Chris Wilson 2019-09-12 20:01:49 UTC
commit 16ffe73c186b7897a1a6888ffb4949a8b4026624
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 12 13:48:13 2019 +0100

    drm/i915/pmu: Use GT parked for estimating RC6 while asleep
    
    As we track when we put the GT device to sleep upon idling, we can use
    that callback to sample the current rc6 counters and record the
    timestamp for estimating samples after that point while asleep.
    
    v2: Stick to using ktime_t
    v3: Track user_wakerefs that interfere with the new
    intel_gt_pm_wait_for_idle
    v4: No need for parked/unparked estimation if !CONFIG_PM
    v5: Keep timer park/unpark logic as was
    v6: Refactor duplicated estimate/update rc6 logic
    v7: Pull intel_get_pm_get_if_awake() out from the pmu->lock.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190912124813.19225-1-chris@chris-wilson.co.uk

One hopes for the last time.
Comment 17 swathi.dhanavanthri 2019-11-13 16:36:33 UTC
Reproduction rate of 100% 2 months ago(drmtip_364), not seen in the last 37 runs, so closing and archiving this.
Comment 18 CI Bug Log 2019-11-13 16:36:50 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.