Bug 104013 - [CI] igt@perf_pmu@semaphore-wait-[vcs0|vecs0|bcs0] - Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Marta Löfstedt
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-01 09:15 UTC by Marta Löfstedt
Modified: 2018-01-12 08:24 UTC

See Also:
i915 platform: BXT, GLK
i915 features: Perf/OA



Description Marta Löfstedt 2017-12-01 09:15:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4028/shard-apl6/igt@perf_pmu@semaphore-wait-vcs0.html


(perf_pmu:11040) CRITICAL: Test assertion failure function sema_wait, file perf_pmu.c:433:
(perf_pmu:11040) CRITICAL: Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:11040) CRITICAL: 'val[1] - val[0]' != 'slept' (455000000.000000 not within 5.000000% tolerance of 500107097.000000)
Subtest semaphore-wait-vcs0 failed.
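
The failed assertion is a two-sided relative tolerance check on the counter delta against the time slept. A minimal sketch of that check in C, for reading the log numbers; the function names `within_tolerance` and `relative_deviation` are illustrative and are not taken from perf_pmu.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the failed assertion: measured must lie within
 * [(1 - tolerance) * expected, (1 + tolerance) * expected]. */
bool within_tolerance(double measured, double expected, double tolerance)
{
	assert(tolerance >= 0.0);
	return measured <= (1.0 + tolerance) * expected &&
	       measured >= (1.0 - tolerance) * expected;
}

/* Signed relative deviation of measured from expected
 * (a negative value means the measurement came up short). */
double relative_deviation(double measured, double expected)
{
	assert(expected != 0.0);
	return (measured - expected) / expected;
}
```

With the values from the log (455000000 measured against 500107097 slept), the measurement is roughly 9% short, so the 5% tolerance is exceeded.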
Comment 1 Marta Löfstedt 2017-12-04 07:36:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3438/shard-apl1/igt@perf_pmu@semaphore-wait-bcs0.html

	
(perf_pmu:2639) CRITICAL: Test assertion failure function sema_wait, file perf_pmu.c:433:
(perf_pmu:2639) CRITICAL: Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:2639) CRITICAL: 'val[1] - val[0]' != 'slept' (450000000.000000 not within 5.000000% tolerance of 500083573.000000)
Subtest semaphore-wait-bcs0 failed.
Comment 2 Marta Löfstedt 2017-12-04 08:10:17 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3445/shard-glkb6/igt@perf_pmu@semaphore-wait-vecs0.html

(perf_pmu:3452) CRITICAL: Test assertion failure function sema_wait, file perf_pmu.c:433:
(perf_pmu:3452) CRITICAL: Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:3452) CRITICAL: 'val[1] - val[0]' != 'slept' (450000000.000000 not within 5.000000% tolerance of 500341224.000000)
Subtest semaphore-wait-vecs0 failed.
Comment 3 Marta Löfstedt 2017-12-04 08:10:40 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3439/shard-apl5/igt@perf_pmu@semaphore-wait-vecs0.html

(perf_pmu:9937) CRITICAL: Test assertion failure function sema_wait, file perf_pmu.c:433:
(perf_pmu:9937) CRITICAL: Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:9937) CRITICAL: 'val[1] - val[0]' != 'slept' (450000000.000000 not within 5.000000% tolerance of 500105128.000000)
Subtest semaphore-wait-vecs0 failed.
Comment 4 Jani Saarinen 2017-12-04 10:28:30 UTC
reference: https://patchwork.freedesktop.org/series/34818/
Comment 5 Chris Wilson 2017-12-04 21:40:34 UTC
commit c7b20c950276a41badd994324e1983760e44842b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Dec 2 00:14:52 2017 +0000

    igt/perf_pmu: Tighten semaphore-wait measurement
    
    Record the before/after semaphore-wait values around the sleep to try to
    reduce the inaccuracy from scheduler delays. Previously, the samples
    were taken before submitting the batch and then after synchronising its
    completion. The measurement will then be the total that the semaphore
    was being sampled, but with the extra syscalls intervening may have
    drifted from the sleep duration. To further reduce the disparity, wait
    for the batch to start executing before taking our samples.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=104013
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Once again, hopefully the last tweak required.
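
The difference between the two sampling schemes described in the commit message can be illustrated with a toy timeline model. This is only a sketch under assumptions: the `timeline` struct, the idealised counter, and all timestamps are hypothetical and do not come from perf_pmu.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timeline of one semaphore-wait run, all values in ns. */
struct timeline {
	uint64_t submit;      /* batch submitted */
	uint64_t wait_start;  /* batch starts executing; semaphore wait begins */
	uint64_t sleep_begin; /* test starts sleeping */
	uint64_t sleep_end;   /* test wakes up */
	uint64_t wait_end;    /* batch completes; semaphore wait ends */
};

/* Value an idealised wait counter has accumulated by time t. */
uint64_t counter_at(const struct timeline *tl, uint64_t t)
{
	if (t <= tl->wait_start)
		return 0;
	if (t >= tl->wait_end)
		return tl->wait_end - tl->wait_start;
	return t - tl->wait_start;
}

/* Earlier scheme: sample before submission and after synchronising
 * on completion, so the delta covers the whole wait. */
uint64_t delta_around_batch(const struct timeline *tl)
{
	return counter_at(tl, tl->wait_end) - counter_at(tl, tl->submit);
}

/* Tightened scheme: sample immediately before and after the sleep. */
uint64_t delta_around_sleep(const struct timeline *tl)
{
	assert(tl->sleep_begin <= tl->sleep_end);
	return counter_at(tl, tl->sleep_end) - counter_at(tl, tl->sleep_begin);
}
```

In this model, sampling around the sleep matches the sleep duration only if the wait has already started when the first sample is taken, which is why the commit also waits for the batch to start executing.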
Comment 6 Marta Löfstedt 2017-12-05 08:47:29 UTC
Fix included in CI_DRM_3451. This has been quite flip-floppy, so I'll let it simmer for a while.
Comment 7 Marta Löfstedt 2017-12-07 09:11:19 UTC
The fix doesn't appear to work:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3463/shard-glkb6/igt@perf_pmu@semaphore-wait-rcs0.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3462/shard-glkb2/igt@perf_pmu@semaphore-wait-rcs0.html

	

(perf_pmu:1948) CRITICAL: Test assertion failure function sema_wait, file perf_pmu.c:443:
(perf_pmu:1948) CRITICAL: Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:1948) CRITICAL: 'val[1] - val[0]' != 'slept' (450000000.000000 not within 5.000000% tolerance of 500397783.000000)
Subtest semaphore-wait-rcs0 failed.
Comment 8 Chris Wilson 2018-01-11 17:39:08 UTC
commit 4900727d35bb20028f9bd83146ec4bf78afffe30
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 11 07:30:31 2018 +0000

    drm/i915/pmu: Reconstruct active state on starting busy-stats
    
    We have a hole in our busy-stat accounting if the pmu is enabled during
    a long running batch, the pmu will not start accumulating busy-time
    until the next context switch. This then fails tests that are only
    sampling a single batch.
    
    v2: Count each active port just once (context in/out events are only on
    the first and last assignment to a port).
    v3: Avoid hardcoding knowledge of 2 submission ports
    
    Fixes: 30e17b7847f5 ("drm/i915: Engine busy time tracking")
    Testcase: igt/perf_pmu/busy-start
    Testcase: igt/perf_pmu/busy-double-start
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180111073031.14614-1-chris@chris-wilson.co.uk
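
The accounting hole described in the commit message can be illustrated with a toy model. This is a sketch under assumptions, not the i915 implementation: `engine_model` and every function below are hypothetical names, and the model only tracks a count of contexts on the engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of per-engine busy accounting; not the i915 code. */
struct engine_model {
	unsigned int ports_in_use; /* contexts currently on the engine */
	bool stats_enabled;
	unsigned int active;       /* contexts the accounting believes are running */
	uint64_t start;            /* start of the current busy period, ns */
	uint64_t total;            /* accumulated busy time, ns */
};

/* Context-in event: the engine state always changes, but the
 * accounting only notices it while stats are enabled. */
void context_in(struct engine_model *e, uint64_t now)
{
	e->ports_in_use++;
	if (e->stats_enabled && e->active++ == 0)
		e->start = now;
}

void context_out(struct engine_model *e, uint64_t now)
{
	assert(e->ports_in_use > 0);
	e->ports_in_use--;
	if (e->stats_enabled && e->active > 0 && --e->active == 0)
		e->total += now - e->start;
}

/* Enabling stats mid-batch: without reconstruction the accounting
 * stays idle until the next context-in event. */
void enable_stats(struct engine_model *e, uint64_t now, bool reconstruct)
{
	e->stats_enabled = true;
	e->active = 0;
	if (reconstruct && e->ports_in_use > 0) {
		e->active = e->ports_in_use;
		e->start = now;
	}
}

uint64_t busy_time(const struct engine_model *e, uint64_t now)
{
	uint64_t total = e->total;

	if (e->stats_enabled && e->active > 0)
		total += now - e->start;
	return total;
}
```

In this model, enabling stats at t=100 while a batch is already running reports zero busy time at t=600 unless the active state is reconstructed, in which case it reports 500.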
Comment 9 Marta Löfstedt 2018-01-12 08:24:04 UTC
Fix integrated in CI_DRM_3622; tests are green.

