https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3773/shard-snb4/igt@perf_pmu@busy-start-vcs0.html (perf_pmu:1787) CRITICAL: Test assertion failure function busy_start, file perf_pmu.c:246: (perf_pmu:1787) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:1787) CRITICAL: 'val' != 'ts[1] - ts[0]' (455000000.000000 not within +5.000000%/-5.000000% tolerance of 506593737.000000) Subtest busy-start-vcs0 failed.
What's telling about this is the (perf_pmu:1787) DEBUG: slept=506588707 perf=506593737 (perf_pmu:1787) CRITICAL: 'val' != 'ts[1] - ts[0]' (455000000.000000 not within +5.000000%/-5.000000% tolerance of 506593737.000000) so it's not just a timing issue of how long we slept as we had hoped. Plan B was to wait for the spin batch to start.
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4307/shard-snb5/igt@perf_pmu@busy-vcs0.html (perf_pmu:3148) CRITICAL: Test assertion failure function single, file perf_pmu.c:211: (perf_pmu:3148) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(flags & (1) ? slept : 0.f) && (double)(val) >= (1.0 - (tolerance)) * (double)(flags & (1) ? slept : 0.f) (perf_pmu:3148) CRITICAL: 'val' != 'flags & (1) ? slept : 0.f' (410000000.000000 not within +5.000000%/-5.000000% tolerance of 500072256.000000) Subtest busy-vcs0 failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3841/shard-snb1/igt@perf_pmu@busy-check-all-vcs0.html (perf_pmu:1619) CRITICAL: Test assertion failure function busy_check_all, file perf_pmu.c:380: (perf_pmu:1619) CRITICAL: Failed assertion: (double)(val[busy_idx]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[busy_idx]) >= (1.0 - (tolerance)) * (double)(slept) (perf_pmu:1619) CRITICAL: 'val[busy_idx]' != 'slept' (475000000.000000 not within +5.000000%/-5.000000% tolerance of 510344059.000000) Subtest busy-check-all-vcs0 failed. Not sure this is considered the same issue.
this was heavily hit on IVB on the drmtip_1 and drmtip_2 runs with shards testlist on BAT machines. I am not going to spam the bug with hits but here are some examples: check cibuglog history and or respecive https://intel-gfx-ci.01.org/tree/drm-tip/drmtip.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-ivb-3520m/igt@perf_pmu@render-node-busy-vcs0.html (perf_pmu:2153) CRITICAL: Test assertion failure function single, file ../tests/perf_pmu.c:213: (perf_pmu:2153) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(flags & (1) ? slept : 0.f) && (double)(val) >= (1.0 - (tolerance)) * (double)(flags & (1) ? slept : 0.f) (perf_pmu:2153) CRITICAL: 'val' != 'flags & (1) ? slept : 0.f' (240000000.000000 not within +5.000000%/-5.000000% tolerance of 504372512.000000) Subtest render-node-busy-vcs0 failed. https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-ivb-3770/igt@perf_pmu@render-node-busy-vcs0.html (perf_pmu:1993) CRITICAL: Test assertion failure function single, file ../tests/perf_pmu.c:213: (perf_pmu:1993) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(flags & (1) ? slept : 0.f) && (double)(val) >= (1.0 - (tolerance)) * (double)(flags & (1) ? slept : 0.f) (perf_pmu:1993) CRITICAL: 'val' != 'flags & (1) ? slept : 0.f' (145000000.000000 not within +5.000000%/-5.000000% tolerance of 509359744.000000) Subtest render-node-busy-vcs0 failed. https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-ivb-3770/igt@perf_pmu@busy-vcs0.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-ivb-3520m/igt@perf_pmu@busy-vcs0.html
Should be improved with: commit a9741da52ad1963f7632ef1e852cbe1c3bcc601e (u-180326) Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Thu Mar 22 17:24:17 2018 +0000 tests/perf_pmu: Improve accuracy by waiting on spinner to start
Was seen on drmtip_9 on blb hardware: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-blb-e6850/igt@perf_pmu@cpu-hotplug.html (perf_pmu:1378) CRITICAL: Test assertion failure function cpu_hotplug, file ../tests/perf_pmu.c:1166: (perf_pmu:1378) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:1378) CRITICAL: 'val' != 'ts[1] - ts[0]' (45000000.000000 not within +5.000000%/-5.000000% tolerance of 48568116.000000) Subtest cpu-hotplug failed.
Wait, there's more! commit d502f055ac4500cada758876a512ac4f14b34851 Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Wed Apr 4 10:51:52 2018 +0100 tests/perf_pmu: Avoid RT thread for accuracy test Realtime scheduling interferes with execlists submission (tasklet) so try to simplify the PWM loop in a few ways: * Drop RT. * Longer batches for smaller systematic error. * More truthful test duration calculation. * Less clock queries. * No self-adjust - instead just report the achieved cycle and let the parent check against it. * Report absolute cycle error. v2: * Bring back self-adjust. (Chris Wilson) (But slightly fixed version with no overflow.) v3: * Log average and mean calibration for each pass. v4: * Eliminate development leftovers. * Fix variance logging. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Presumptively closing.
(In reply to Martin Peres from comment #6) > Was seen on drmtip_9 on blb hardware: > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-blb-e6850/ > igt@perf_pmu@cpu-hotplug.html > > (perf_pmu:1378) CRITICAL: Test assertion failure function cpu_hotplug, file > ../tests/perf_pmu.c:1166: > (perf_pmu:1378) CRITICAL: Failed assertion: (double)(val) <= (1.0 + > (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - > (tolerance)) * (double)(ts[1] - ts[0]) > (perf_pmu:1378) CRITICAL: 'val' != 'ts[1] - ts[0]' (45000000.000000 not > within +5.000000%/-5.000000% tolerance of 48568116.000000) > Subtest cpu-hotplug failed. Please be really careful not to mix up subtests inside perf_pmu, there are quite a few distinct issues to worry about.
Also seen on HSW: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_31/fi-hsw-peppy/igt@perf_pmu@cpu-hotplug.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_39/fi-ivb-3520m/igt@perf_pmu@multi-client-vcs0.html (In reply to Chris Wilson from comment #8) > (In reply to Martin Peres from comment #6) > > Was seen on drmtip_9 on blb hardware: > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-blb-e6850/ > > igt@perf_pmu@cpu-hotplug.html > > > > (perf_pmu:1378) CRITICAL: Test assertion failure function cpu_hotplug, file > > ../tests/perf_pmu.c:1166: > > (perf_pmu:1378) CRITICAL: Failed assertion: (double)(val) <= (1.0 + > > (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - > > (tolerance)) * (double)(ts[1] - ts[0]) > > (perf_pmu:1378) CRITICAL: 'val' != 'ts[1] - ts[0]' (45000000.000000 not > > within +5.000000%/-5.000000% tolerance of 48568116.000000) > > Subtest cpu-hotplug failed. > > Please be really careful not to mix up subtests inside perf_pmu, there are > quite a few distinct issues to worry about. We still see a lot of issues. Should I create a new bug?
Re-opening since it is still a problem in almost all drmtip runs...
Another try. The accuracy ones should be about done; pm-runtime measurements seem to be the hold out.
(In reply to Chris Wilson from comment #12) > Another try. The accuracy ones should be about done; pm-runtime measurements > seem to be the hold out. The only test failing is https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_105/fi-elk-e7500/igt@perf_pmu@cpu-hotplug.html, on a wide range of machines: (perf_pmu:1285) CRITICAL: Test assertion failure function cpu_hotplug, file ../tests/perf_pmu.c:1157: (perf_pmu:1285) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:1285) CRITICAL: 'val' != 'ts[1] - ts[0]' (35280953.000000 not within +5.000000%/-5.000000% tolerance of 37769634.000000) Subtest cpu-hotplug failed.
(In reply to Martin Peres from comment #13) > (In reply to Chris Wilson from comment #12) > > Another try. The accuracy ones should be about done; pm-runtime measurements > > seem to be the hold out. > > The only test failing is > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_105/fi-elk-e7500/ > igt@perf_pmu@cpu-hotplug.html, on a wide range of machines: > > (perf_pmu:1285) CRITICAL: Test assertion failure function cpu_hotplug, file > ../tests/perf_pmu.c:1157: > (perf_pmu:1285) CRITICAL: Failed assertion: (double)(val) <= (1.0 + > (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - > (tolerance)) * (double)(ts[1] - ts[0]) > (perf_pmu:1285) CRITICAL: 'val' != 'ts[1] - ts[0]' (35280953.000000 not > within +5.000000%/-5.000000% tolerance of 37769634.000000) > Subtest cpu-hotplug failed. 3.5 vs 3.77. Close enough that we should just call that victory and be done with it. Let's hope there aren't any worse fails.
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_128/fi-apl-guc/igt@perf_pmu@cpu-hotplug.html Starting subtest: cpu-hotplug (perf_pmu:918) CRITICAL: Test assertion failure function cpu_hotplug, file ../tests/perf_pmu.c:1157: (perf_pmu:918) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && (double)(val) >= (1.0 - (tolerance)) * (double)(ts[1] - ts[0]) (perf_pmu:918) CRITICAL: 'val' != 'ts[1] - ts[0]' (70394268.000000 not within +5.000000%/-5.000000% tolerance of 66944504.000000) Subtest cpu-hotplug failed.
> 3.5 vs 3.77. Close enough that we should just call that victory and be done with it. > Let's hope there aren't any worse fails. So can we call this a victory? Martin reported another failure above, but the values aren't that far off either (I think).
Ping. I'm inclined to close this if there are no objections.
Or if this is really OK, maybe change the tolerance in the testcase so that it's not reported as an error?
A CI Bug Log filter associated to this bug has been updated: {- CTG BWR IVB BLB ELK ILK BYT SNB HSW APL: igt@perf_pmu@.* - fail - Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && ... -} {+ CTG ELK BWR IVB BLB ELK ILK BYT SNB HSW APL: igt@perf_pmu@.* - fail - Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(ts[1] - ts[0]) && ... +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_374/fi-elk-e7500/igt@perf_pmu@cpu-hotplug.html
A CI Bug Log filter associated to this bug has been updated: {- snb ivb: igt@perf_pmu@busy(-idle)-check-all-vcs[01] - Failed assertion: (double)(val[busy_idx]) <= (1.0 + (tolerance)) * (double)(slept) &&... -} {+ snb ivb: igt@perf_pmu@busy(-idle)-check-all-vcs[01] - Failed assertion: (double)(val[busy_idx]) <= (1.0 + (tolerance)) * (double)(slept) &&... +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_404/fi-bsw-kefka/igt@perf_pmu@busy-idle-check-all-vcs0.html
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/75.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.