Bug 102252

Summary: [CI][HSW] igt@perf@[blocking|polling] fails
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: IGT
Assignee: Lionel Landwerlin <lionel.g.landwerlin>
Status: CLOSED WONTFIX
QA Contact:
Severity: critical
Priority: highest
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: HSW
i915 features: Perf/OA

Description Martin Peres 2017-08-16 12:13:29 UTC
The test igt@perf@blocking fails the following assert:

(perf:1521) CRITICAL: Test assertion failure function test_blocking, file perf.c:1831:
(perf:1521) CRITICAL: Failed assertion: n <= (max_iterations + n_extra_iterations)

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw4/igt@perf@blocking.html
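
For readers unfamiliar with the check, here is a minimal sketch of the shape of the failed assertion. It is not the code from perf.c: the `iteration_bound` formula (one wakeup per sampling period, plus one) and the 10 s / 40 ms numbers are assumptions chosen for illustration only. The names `max_iterations` and `n_extra_iterations` are taken from the log above.

```python
def iteration_bound(test_duration_ns: int, oa_period_ns: int) -> int:
    """Hypothetical upper bound on wakeups: one per sampling period,
    plus one to cover a partial period. Assumed, not taken from perf.c."""
    return test_duration_ns // oa_period_ns + 1


def check_iterations(n: int, max_iterations: int, n_extra_iterations: int) -> bool:
    """Same comparison as the failed assertion in the log."""
    return n <= max_iterations + n_extra_iterations


# Illustrative numbers only: a 10 s run with a 40 ms sampling period.
max_iterations = iteration_bound(10_000_000_000, 40_000_000)  # 251
print(check_iterations(251, max_iterations, 0))  # True: exactly at the bound
print(check_iterations(252, max_iterations, 0))  # False: one iteration too many
```

With a bound of this kind, a single unexpected extra wakeup is enough to trip the assertion.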
Comment 1 Martin Peres 2017-08-16 12:15:08 UTC
Need to create a new section for OA bugs. Will try to discuss that with the main dev.
Comment 2 Lionel Landwerlin 2017-08-16 17:40:38 UTC
I have this series to help with the flakiness of this test:
https://patchwork.freedesktop.org/series/28373/

Not landed yet, hopefully soon!
Comment 3 Martin Peres 2017-08-17 07:01:35 UTC
This is also visible on the following test: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2970/shard-hsw3/igt@perf@polling.html
Comment 4 Jari Tahvanainen 2017-09-04 08:15:39 UTC
Tested-by: series 28373 seems to improve the situation on my dev-skl-i5-6600k, with the following results.

Without series (IGT-Version: 1.19-g5ce65a9a):
Subtest oa-exponents: FAIL (0,214s)
Subtest per-context-mode-unprivileged: FAIL (0,004s)
Subtest polling: FAIL (10,032s)
Subtest short-reads: FAIL (0,001s)
Subtest mi-rpc: FAIL (0,001s)
Subtest rc6-disable: FAIL (0,001s)
Subtest create-destroy-userspace-config: FAIL (0,003s)

With series 28373 applied on above:
Subtest i915-ref-count: SUCCESS (0,043s)
Subtest sysctl-defaults: SUCCESS (0,000s)
Subtest non-system-wide-paranoid: SUCCESS (0,015s)
Subtest invalid-open-flags: SUCCESS (0,000s)
Subtest invalid-oa-metric-set-id: SUCCESS (0,007s)
Subtest invalid-oa-format-id: SUCCESS (0,008s)
Subtest missing-sample-flags: SUCCESS (0,000s)
Subtest oa-formats: SUCCESS (0,073s)
Subtest invalid-oa-exponent: SUCCESS (0,007s)
Subtest low-oa-exponent-permissions: SUCCESS (0,015s)
Subtest oa-exponents: SUCCESS (15,035s)
Test requirement not met in function __real_main4515, file perf.c:4580:
Test requirement: IS_HASWELL(devid)
Subtest per-context-mode-unprivileged: SKIP (0,000s)
Subtest buffer-fill: SUCCESS (1,734s)
Subtest disabled-read-error: SUCCESS (0,037s)
Subtest non-sampling-read-error: SUCCESS (0,007s)
Subtest enable-disable: SUCCESS (1,730s)
Subtest blocking: SUCCESS (10,022s)
Subtest polling: SUCCESS (10,010s)
Subtest short-reads: SUCCESS (0,020s)
Subtest mi-rpc: SUCCESS (0,009s)
Test requirement not met in function __real_main4515, file perf.c:4608:
Test requirement: IS_HASWELL(devid)
Subtest unprivileged-single-ctx-counters: SKIP (0,000s)
Subtest gen8-unprivileged-single-ctx-counters: SUCCESS (0,027s)
Subtest rc6-disable: SUCCESS (1,510s)
Subtest invalid-create-userspace-config: SUCCESS (0,000s)
Subtest invalid-remove-userspace-config: SUCCESS (0,007s)
Subtest create-destroy-userspace-config: SUCCESS (0,022s)
Subtest whitelisted-registers-userspace-config: SUCCESS (0,000s)

For HSW, APL, and KBL, see the shard results: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_132/shards.html
E.g. HSW has Test perf:
        Subgroup polling:
                fail       -> PASS       (shard-hsw) fdo#102252
        Subgroup oa-exponents:
                fail       -> PASS       (shard-hsw) fdo#102254
Comment 5 Jari Tahvanainen 2017-09-04 08:17:17 UTC
Changing the component to IGT since the fault is in the intel-gpu-tools git repository.
Comment 6 Lionel Landwerlin 2017-10-04 12:52:18 UTC
Just pushed a series that should make this failure go away.
Feel free to reopen if needed.
The two commits related to this issue:

https://cgit.freedesktop.org/drm/igt-gpu-tools/commit/?id=eafaf4fb49ba7a02c11def787b5de2a14de532f2

https://cgit.freedesktop.org/drm/igt-gpu-tools/commit/?id=f1514a6320f65a1524f36407f7f22d6fc7c7679e
Comment 7 Marta Löfstedt 2017-10-05 12:03:08 UTC
The issue has not been reproduced since:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3175
Comment 8 Marta Löfstedt 2017-10-06 11:38:47 UTC
Unfortunately this has been reproduced:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3183/shard-hsw6/igt@perf@polling.html
Comment 9 Daniel Vetter 2017-11-08 13:29:56 UTC
I think perf/OA on hsw has some hw limitations that we try to work around, but fundamentally it's not possible. So even with best effort, every once in a while this will go wrong.

Note: This _only_ applies to hsw. I think later platforms should work better.
Comment 10 Lionel Landwerlin 2017-11-08 13:47:43 UTC
Not sure what's wrong with HSW. It seems to be just off by 1, so it could be that the test doesn't account for an edge case scenario.
Unfortunately I haven't had time to investigate HSW so far :(
Comment 11 Marta Löfstedt 2018-04-11 09:00:11 UTC
I bow to Jani Nikula and set this bug to closed/wontfix.
