Bug 105157

Summary: [CI] igt@perf_pmu@busy-accuracy-* - fail - Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)((double)target_busy_pct / 100.0) && (double)(busy_r) >= (1.0 - (0.15)) * (double)((double)target_busy_pct / 100.0)
Product: DRI Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel Assignee: Francesco Balestrieri <francesco.balestrieri>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs, martin.peres
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BSW/CHT, BXT, CFL, GLK, KBL, SKL i915 features: Perf/PMU

Description Marta Löfstedt 2018-02-19 07:05:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3793/shard-glkb1/igt@perf_pmu@busy-accuracy-2-vecs0.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3791/shard-glkb1/igt@perf_pmu@busy-accuracy-50-vecs0.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3790/shard-glkb1/igt@perf_pmu@busy-accuracy-2-bcs0.html


(perf_pmu:1500) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1544:
(perf_pmu:1500) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)((double)target_busy_pct / 100.0) && (double)(busy_r) >= (1.0 - (0.15)) * (double)((double)target_busy_pct / 100.0)
(perf_pmu:1500) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1500) CRITICAL: 'busy_r' != '(double)target_busy_pct / 100.0' (0.016899 not within +15.000000%/-15.000000% tolerance of 0.020000)

Note: so far this looks like low-frequency flip-flopping that only fails on shard-glkb1. Let's see whether it continues to fail only on GLKB1; if so, that machine will have trouble staying in the lab.
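
The failed assertion above is a relative-tolerance check: busy_r has to lie within 15% of the target busy fraction. A minimal Python sketch of that arithmetic follows. It is not the IGT source; the helper name within_relative_tolerance is hypothetical, and the numbers are copied from the log above.

```python
def within_relative_tolerance(measured: float, expected: float, tol: float = 0.15) -> bool:
    """Same condition as the assertion: (1 - tol) * expected <= measured <= (1 + tol) * expected."""
    return (1.0 - tol) * expected <= measured <= (1.0 + tol) * expected


target_busy_pct = 2
expected = target_busy_pct / 100.0  # 0.02, the target busy fraction
busy_r = 0.016899                   # measured busy fraction from the log

# Bounds of the accepted window and the relative deviation of the measurement.
lower = (1.0 - 0.15) * expected
upper = (1.0 + 0.15) * expected
relative_error = (busy_r - expected) / expected

print(f"accepted window: [{lower:.4f}, {upper:.4f}]")
print(f"relative error: {relative_error:+.2%}")
print("pass" if within_relative_tolerance(busy_r, expected) else "fail")
```

At a 2% target the accepted window runs from 1.7% to 2.3%, so it is only 0.6 percentage points wide and a small absolute timing error is enough to fail the subtest.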
Comment 2 Marta Löfstedt 2018-02-20 08:29:30 UTC
This is NOT only on GLKB1:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3798/shard-glkb2/igt@perf_pmu@busy-accuracy-50-bcs0.html
	

(perf_pmu:1650) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1544:
(perf_pmu:1650) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)((double)target_busy_pct / 100.0) && (double)(busy_r) >= (1.0 - (0.15)) * (double)((double)target_busy_pct / 100.0)
(perf_pmu:1650) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1650) CRITICAL: 'busy_r' != '(double)target_busy_pct / 100.0' (0.420954 not within +15.000000%/-15.000000% tolerance of 0.500000)
Subtest busy-accuracy-50-bcs0 failed.
Comment 3 Marta Löfstedt 2018-02-20 12:42:04 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4271/shard-glkb6/igt@perf_pmu@busy-accuracy-2-rcs0.html

(perf_pmu:1505) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1550:
(perf_pmu:1505) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)((double)target_busy_pct / 100.0) && (double)(busy_r) >= (1.0 - (0.15)) * (double)((double)target_busy_pct / 100.0)
(perf_pmu:1505) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1505) CRITICAL: 'busy_r' != '(double)target_busy_pct / 100.0' (0.016381 not within +15.000000%/-15.000000% tolerance of 0.020000)
Subtest busy-accuracy-2-rcs0 failed.
Comment 4 Marta Löfstedt 2018-02-21 07:25:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4273/shard-apl6/igt@perf_pmu@busy-accuracy-2-bcs0.html	

(perf_pmu:1586) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1550:
(perf_pmu:1586) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)((double)target_busy_pct / 100.0) && (double)(busy_r) >= (1.0 - (0.15)) * (double)((double)target_busy_pct / 100.0)
(perf_pmu:1586) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1586) CRITICAL: 'busy_r' != '(double)target_busy_pct / 100.0' (0.016973 not within +15.000000%/-15.000000% tolerance of 0.020000)
Subtest busy-accuracy-2-bcs0 failed.
Comment 5 Chris Wilson 2018-02-22 08:37:46 UTC
We're trying a new method to see how that fares:

commit 1ecc978a69a531858ba799425770062ebeb13888 (upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 20 13:00:37 2018 +0000

    igt/perf_pmu: Use a self-correcting busy pwm
Comment 6 Marta Löfstedt 2018-02-22 08:56:35 UTC
The patch is integrated in IGT_4281. There is no CI_DRM_ run with it yet, but it should be in CI_DRM_3820. The failure frequency of this issue looks quite low, but we'll see...
Comment 7 Marta Löfstedt 2018-02-22 13:23:29 UTC
Patch from Comment #5 is in:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3820/shard-apl2/igt@perf_pmu@busy-accuracy-98-vecs0.html

(perf_pmu:1953) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1545:
(perf_pmu:1953) CRITICAL: Failed assertion: (double)(1 - busy_r) <= (1.0 + (0.15)) * (double)(1 - expected) && (double)(1 - busy_r) >= (1.0 - (0.15)) * (double)(1 - expected)
(perf_pmu:1953) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1953) CRITICAL: '1 - busy_r' != '1 - expected' (0.023063 not within +15.000000%/-15.000000% tolerance of 0.019999)
Subtest busy-accuracy-98-vecs0 failed.
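
The assertion in this log compares the idle fractions (1 - busy_r against 1 - expected) rather than the busy fractions. A short Python sketch, using only the values printed above and a hypothetical helper name, shows how the same absolute deviation looks from each side:

```python
def relative_error(measured: float, expected: float) -> float:
    """Signed deviation of measured from expected, as a fraction of expected."""
    return (measured - expected) / expected


# Idle fractions exactly as printed in the failure message.
idle_measured = 0.023063
idle_expected = 0.019999

# The corresponding busy fractions.
busy_measured = 1.0 - idle_measured
busy_expected = 1.0 - idle_expected

idle_err = relative_error(idle_measured, idle_expected)
busy_err = relative_error(busy_measured, busy_expected)

print(f"idle side: {idle_err:+.2%}")
print(f"busy side: {busy_err:+.2%}")
```

The deviation is about 0.3 percentage points either way, but it is more than 15% of the small idle fraction and well under 1% of the busy fraction, which is why this subtest trips the 15% limit.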
Comment 8 Marta Löfstedt 2018-02-23 06:48:57 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3824/shard-glkb6/igt@perf_pmu@busy-accuracy-98-vcs0.html
	

(perf_pmu:2875) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1545:
(perf_pmu:2875) CRITICAL: Failed assertion: (double)(1 - busy_r) <= (1.0 + (0.15)) * (double)(1 - expected) && (double)(1 - busy_r) >= (1.0 - (0.15)) * (double)(1 - expected)
(perf_pmu:2875) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:2875) CRITICAL: '1 - busy_r' != '1 - expected' (0.023030 not within +15.000000%/-15.000000% tolerance of 0.020000)
Subtest busy-accuracy-98-vcs0 failed.
Comment 9 Octavio 2018-02-26 18:54:28 UTC
This test is failing on CFL QA 

igt@perf_pmu@busy-accuracy-2-vcs0

IGT-Version: 1.21-ga2664f8 (x86_64) (Linux: 4.16.0-rc2-drm-intel-qa-ww9-commit-01a067a+ x86_64)
	
(perf_pmu:2528) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1544:
(perf_pmu:2528) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)(expected) && (double)(busy_r) >= (1.0 - (0.15)) * (double)(expected)
(perf_pmu:2528) CRITICAL: Last errno: 9, Bad file descriptor
(perf_pmu:2528) CRITICAL: 'busy_r' != 'expected' (0.023442 not within +15.000000%/-15.000000% tolerance of 0.020000)
Subtest busy-accuracy-2-vcs0 failed.
**** DEBUG ****
(perf_pmu:2528) DEBUG: Test requirement passed: gem_has_execlists(gem_fd)
(perf_pmu:2528) INFO: calibration=1000000us, test=1000000us; ratio=2.00% (2500us/122500us)
(perf_pmu:2528) DEBUG: Test requirement passed: !(fd < 0 && errno == ENODEV)
(perf_pmu:2528) INFO: error=17.21% (2.34% vs 2.00%)
(perf_pmu:2528) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1544:
(perf_pmu:2528) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)(expected) && (double)(busy_r) >= (1.0 - (0.15)) * (double)(expected)
(perf_pmu:2528) CRITICAL: Last errno: 9, Bad file descriptor
(perf_pmu:2528) CRITICAL: 'busy_r' != 'expected' (0.023442 not within +15.000000%/-15.000000% tolerance of 0.020000)
(perf_pmu:2528) igt-core-INFO: Stack trace:
(perf_pmu:2528) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(perf_pmu:2528) igt-core-INFO:   #1 [__real_main1548+0x2bf2]
(perf_pmu:2528) igt-core-INFO:   #2 [main+0x23]
(perf_pmu:2528) igt-core-INFO:   #3 [__libc_start_main+0xf1]
(perf_pmu:2528) igt-core-INFO:   #4 [_start+0x29]
(perf_pmu:2528) igt-core-INFO:   #5 [<unknown>+0x29]
****  END  ****
Comment 10 Marta Löfstedt 2018-03-01 07:06:38 UTC
New subtest failing on GLK:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3846/shard-glkb2/igt@perf_pmu@busy-accuracy-98-bcs0.html

(perf_pmu:2037) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1545:
(perf_pmu:2037) CRITICAL: Failed assertion: (double)(1 - busy_r) <= (1.0 + (0.15)) * (double)(1 - expected) && (double)(1 - busy_r) >= (1.0 - (0.15)) * (double)(1 - expected)
(perf_pmu:2037) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:2037) CRITICAL: '1 - busy_r' != '1 - expected' (0.023298 not within +15.000000%/-15.000000% tolerance of 0.019966)
Subtest busy-accuracy-98-bcs0 failed.
Comment 11 Marta Löfstedt 2018-03-13 06:34:24 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4350/shard-glkb3/igt@perf_pmu@busy-accuracy-50-vcs0.html

(perf_pmu:1769) CRITICAL: Test assertion failure function accuracy, file ../tests/perf_pmu.c:1606:
(perf_pmu:1769) CRITICAL: Failed assertion: (double)(100.0 * busy_r) <= ((double)(100.0 * expected) + (2)) && (double)(100.0 * busy_r) >= ((double)(100.0 * expected) - (2))
(perf_pmu:1769) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1769) CRITICAL: 47.652167 not within +2.000000/-2.000000 of 50.004811! ('100.0 * busy_r' vs '100.0 * expected')
Subtest busy-accuracy-50-vcs0 failed.
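
The assertion in this log is an absolute check in percentage points (within +2/-2 of the expected percentage) instead of the earlier relative 15% check. A minimal Python sketch of that comparison, with a hypothetical helper name and the values from the failure message:

```python
def within_absolute_tolerance(measured_pct: float, expected_pct: float, tol_pct: float = 2.0) -> bool:
    """Same condition as the assertion: expected - tol <= measured <= expected + tol."""
    return expected_pct - tol_pct <= measured_pct <= expected_pct + tol_pct


measured_pct = 47.652167  # 100.0 * busy_r from the log
expected_pct = 50.004811  # 100.0 * expected from the log

deviation = measured_pct - expected_pct
relative = deviation / expected_pct

print(f"deviation: {deviation:+.6f} percentage points")
print(f"relative:  {relative:+.2%}")
print("pass" if within_absolute_tolerance(measured_pct, expected_pct) else "fail")
```

At a 50% target, +/-2 percentage points is a much tighter window than +/-15% relative (42.5% to 57.5%), so this measurement fails here although its relative error is under 5%.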
Comment 12 Martin Peres 2018-03-13 08:29:24 UTC
*** Bug 105462 has been marked as a duplicate of this bug. ***
Comment 13 Octavio 2018-03-13 15:57:42 UTC
This test fails on CFL QA 

igt@perf_pmu@busy-accuracy-2-bcs0

IGT-Version: 1.22-g5d71d77 (x86_64) (Linux: 4.16.0-rc4-drm-intel-qa-ww11-commit-73f9dfa+ x86_64)
calibration=1000000us, test=1000000us; ratio=2.00% (2500us/122500us)
0: busy 21295us, idle 1043658us: 2.00% (target: 2%)
1: busy 40179us, idle 1968831us: 2.00% (target: 2%)
error=-25.32% (1.49% vs 2.00%)
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #1 [accuracy+0x779]
  #2 [__real_main1597+0x2009]
  #3 [main+0x23]
  #4 [__libc_start_main+0xf1]
  #5 [_start+0x29]
  #6 [<unknown>+0x29]
Subtest busy-accuracy-2-bcs0: FAIL (3.083s)
Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:120:
Test requirement: gem_has_engine(gem_fd, class, instance)
Test requirement not met in function gem_require_engine, file ./../lib/igt_gt.h:120:
Test requirement: gem_has_engine(gem_fd, class, instance)

(perf_pmu:1536) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1593:
(perf_pmu:1536) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)(expected) && (double)(busy_r) >= (1.0 - (0.15)) * (double)(expected)
(perf_pmu:1536) CRITICAL: Last errno: 9, Bad file descriptor
(perf_pmu:1536) CRITICAL: 'busy_r' != 'expected' (0.014935 not within +15.000000%/-15.000000% tolerance of 0.020000)
Subtest busy-accuracy-2-bcs0 failed.
**** DEBUG ****
(perf_pmu:1536) DEBUG: Test requirement passed: gem_has_execlists(gem_fd)
(perf_pmu:1536) INFO: calibration=1000000us, test=1000000us; ratio=2.00% (2500us/122500us)
(perf_pmu:1536) DEBUG: Test requirement passed: !(fd < 0 && errno == ENODEV)
(perf_pmu:1536) INFO: error=-25.32% (1.49% vs 2.00%)
(perf_pmu:1536) CRITICAL: Test assertion failure function accuracy, file perf_pmu.c:1593:
(perf_pmu:1536) CRITICAL: Failed assertion: (double)(busy_r) <= (1.0 + (0.15)) * (double)(expected) && (double)(busy_r) >= (1.0 - (0.15)) * (double)(expected)
(perf_pmu:1536) CRITICAL: Last errno: 9, Bad file descriptor
(perf_pmu:1536) CRITICAL: 'busy_r' != 'expected' (0.014935 not within +15.000000%/-15.000000% tolerance of 0.020000)
(perf_pmu:1536) igt-core-INFO: Stack trace:
(perf_pmu:1536) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(perf_pmu:1536) igt-core-INFO:   #1 [accuracy+0x779]
(perf_pmu:1536) igt-core-INFO:   #2 [__real_main1597+0x2009]
(perf_pmu:1536) igt-core-INFO:   #3 [main+0x23]
(perf_pmu:1536) igt-core-INFO:   #4 [__libc_start_main+0xf1]
(perf_pmu:1536) igt-core-INFO:   #5 [_start+0x29]
(perf_pmu:1536) igt-core-INFO:   #6 [<unknown>+0x29]
****  END  ****
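
The percentages in this log can be reproduced from the printed microsecond values if the ratio is taken as busy / (busy + idle). That reading is an assumption inferred from the numbers, not from the test source. A small Python sketch with a hypothetical helper name:

```python
def duty_cycle(busy_us: float, idle_us: float) -> float:
    """Busy share of one period, assuming ratio = busy / (busy + idle)."""
    return busy_us / (busy_us + idle_us)


# "ratio=2.00% (2500us/122500us)"
nominal = duty_cycle(2_500, 122_500)

# "0: busy 21295us, idle 1043658us" and "1: busy 40179us, idle 1968831us"
pass0 = duty_cycle(21_295, 1_043_658)
pass1 = duty_cycle(40_179, 1_968_831)

# "error=-25.32% (1.49% vs 2.00%)", using the rounded values from the assertion
measured, expected = 0.014935, 0.020000
error = (measured - expected) / expected

print(f"nominal {nominal:.2%}, pass 0 {pass0:.2%}, pass 1 {pass1:.2%}")
print(f"error {error:+.2%}")
```

Both reported passes come out at 2.00% while the measured busy fraction is 1.49%, which suggests the shortfall is not in the duty cycle the loop reports but in what ends up being counted as busy.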
Comment 14 Marta Löfstedt 2018-03-15 11:09:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3932/shard-kbl6/igt@perf_pmu@busy-accuracy-50-vcs1.html

(perf_pmu:3688) CRITICAL: Test assertion failure function accuracy, file ../tests/perf_pmu.c:1606:
(perf_pmu:3688) CRITICAL: Failed assertion: (double)(100.0 * busy_r) <= ((double)(100.0 * expected) + (2)) && (double)(100.0 * busy_r) >= ((double)(100.0 * expected) - (2))
(perf_pmu:3688) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:3688) CRITICAL: 47.999050 not within +2.000000/-2.000000 of 49.999746! ('100.0 * busy_r' vs '100.0 * expected')
Subtest busy-accuracy-50-vcs1 failed.
Comment 16 Tvrtko Ursulin 2018-04-03 10:45:23 UTC
There is some hope that https://patchwork.freedesktop.org/series/40662/ could improve these failures; it just needs a re-spin that includes fewer of the proposed changes.
Comment 18 Chris Wilson 2018-05-11 21:21:39 UTC
commit d502f055ac4500cada758876a512ac4f14b34851
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Wed Apr 4 10:51:52 2018 +0100

    tests/perf_pmu: Avoid RT thread for accuracy test
    
    Realtime scheduling interferes with execlists submission (tasklet) so try
    to simplify the PWM loop in a few ways:
    
     * Drop RT.
     * Longer batches for smaller systematic error.
     * More truthful test duration calculation.
     * Less clock queries.
     * No self-adjust - instead just report the achieved cycle and let the
       parent check against it.
     * Report absolute cycle error.
Comment 19 Martin Peres 2018-05-22 20:57:01 UTC
(In reply to Chris Wilson from comment #18)
> commit d502f055ac4500cada758876a512ac4f14b34851
> Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Date:   Wed Apr 4 10:51:52 2018 +0100
> 
>     tests/perf_pmu: Avoid RT thread for accuracy test
>     
>     Realtime scheduling interferes with execlists submission (tasklet) so try
>     to simplify the PWM loop in a few ways:
>     
>      * Drop RT.
>      * Longer batches for smaller systematic error.
>      * More truthful test duration calculation.
>      * Less clock queries.
>      * No self-adjust - instead just report the achieved cycle and let the
>        parent check against it.
>      * Report absolute cycle error.

Still visible every single run...
Comment 20 Martin Peres 2018-05-28 16:56:35 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_43/fi-bxt-j4205/igt@perf_pmu@busy-accuracy-50-rcs0.html

(perf_pmu:1698) CRITICAL: Test assertion failure function accuracy, file ../tests/perf_pmu.c:1651:
(perf_pmu:1698) CRITICAL: Failed assertion: (double)(100.0 * busy_r) <= ((double)(100.0 * expected) + (2)) && (double)(100.0 * busy_r) >= ((double)(100.0 * expected) - (2))
(perf_pmu:1698) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1698) CRITICAL: 51.924130 not within +2.000000/-2.000000 of 49.911359! ('100.0 * busy_r' vs '100.0 * expected')
Subtest busy-accuracy-50-rcs0 failed.


https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_43/fi-kbl-guc/igt@perf_pmu@busy-accuracy-50-vecs0.html

(perf_pmu:1262) CRITICAL: Test assertion failure function accuracy, file ../tests/perf_pmu.c:1651:
(perf_pmu:1262) CRITICAL: Failed assertion: (double)(100.0 * busy_r) <= ((double)(100.0 * expected) + (2)) && (double)(100.0 * busy_r) >= ((double)(100.0 * expected) - (2))
(perf_pmu:1262) CRITICAL: Last errno: 9, Bad file descriptor
(perf_pmu:1262) CRITICAL: 54.022868 not within +2.000000/-2.000000 of 49.843047! ('100.0 * busy_r' vs '100.0 * expected')
Subtest busy-accuracy-50-vecs0 failed.
Comment 21 Martin Peres 2018-07-16 13:08:32 UTC
Also seen on GLK: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4468/shard-glk6/igt@perf_pmu@busy-accuracy-50-rcs0.html

(perf_pmu:1295) CRITICAL: Test assertion failure function accuracy, file ../tests/perf_pmu.c:1652:
(perf_pmu:1295) CRITICAL: Failed assertion: (double)(100.0 * busy_r) <= ((double)(100.0 * expected) + (2)) && (double)(100.0 * busy_r) >= ((double)(100.0 * expected) - (2))
(perf_pmu:1295) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1295) CRITICAL: 52.936172 not within +2.000000/-2.000000 of 50.592464! ('100.0 * busy_r' vs '100.0 * expected')
Subtest busy-accuracy-50-rcs0 failed.
Comment 22 Martin Peres 2018-08-29 09:56:28 UTC
Also seen on BSW: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-bsw-kefka/igt@perf_pmu@busy-accuracy-50-rcs0.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-bsw-kefka/igt@perf_pmu@busy-accuracy-50-bcs0.html

(perf_pmu:1270) CRITICAL: Test assertion failure function accuracy, file ../tests/perf_pmu.c:1655:
(perf_pmu:1270) CRITICAL: Failed assertion: (double)(100.0 * busy_r) <= ((double)(100.0 * expected) + (2)) && (double)(100.0 * busy_r) >= ((double)(100.0 * expected) - (2))
(perf_pmu:1270) CRITICAL: Last errno: 2, No such file or directory
(perf_pmu:1270) CRITICAL: 52.409599 not within +2.000000/-2.000000 of 49.659115! ('100.0 * busy_r' vs '100.0 * expected')
Subtest busy-accuracy-50-bcs0 failed.
Comment 23 Chris Wilson 2018-08-31 13:48:15 UTC
Once more into the breach,

commit 1754cbd35005605a80b06d808b4f891555a151cd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 8 14:20:26 2018 +0100

    igt/perf_pmu: Aim for a fixed number of iterations for calibrating accuracy
    
    Our observation is that the systematic error is proportional to the
    number of iterations we perform; the suspicion is that it directly
    correlates with the number of sleeps. Reduce the number of iterations,
    to try and keep the error in check.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
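
One way to picture the observation in this commit message is a toy model in which every sleep overshoots by a fixed amount. The Python sketch below is only an illustration under that assumption: the function name, the 50us overshoot, and the iteration counts are invented for the example and are not taken from the test or from any measurement.

```python
def achieved_duty(target: float, total_us: float, iterations: int, overshoot_us: float) -> float:
    """Duty cycle actually achieved when each idle sleep runs overshoot_us too long.

    The nominal test time total_us is split into `iterations` equal cycles, each
    with a busy part (target share) followed by an idle sleep.
    """
    cycle_us = total_us / iterations
    busy_us = target * cycle_us
    idle_us = (cycle_us - busy_us) + overshoot_us  # hypothetical per-sleep overshoot
    return busy_us / (busy_us + idle_us)


target = 0.02
for iterations in (8, 80, 800):
    duty = achieved_duty(target, 1_000_000, iterations, overshoot_us=50.0)
    error = (duty - target) / target
    print(f"{iterations:4d} iterations: achieved {duty:.4%}, error {error:+.2%}")
```

In this model the achieved duty is target / (1 + iterations * overshoot / total), so for small overshoots the error grows roughly in proportion to the number of sleeps, and running fewer, longer cycles keeps it small.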
Comment 24 Martin Peres 2018-09-07 17:38:28 UTC
(In reply to Chris Wilson from comment #23)
> Once more into the breach,
> 
> commit 1754cbd35005605a80b06d808b4f891555a151cd
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Aug 8 14:20:26 2018 +0100
> 
>     igt/perf_pmu: Aim for a fixed number of iterations for calibrating accuracy
>     
>     Our observation is that the systematic error is proportional to the
>     number of iterations we perform; the suspicion is that it directly
>     correlates with the number of sleeps. Reduce the number of iterations,
>     to try and keep the error in check.
>     
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

That finally fixed it! Thanks :)
