The test igt@perf@oa-exponents hits the following assertion:
(perf:1635) CRITICAL: Test assertion failure function read_2_oa_reports, file perf.c:1201:
(perf:1635) CRITICAL: Failed assertion: !"reached"
Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw5/igt@perf@oa-exponents.html
I have this series to help with the flakiness of this test: https://patchwork.freedesktop.org/series/28373/ It has not landed yet; hopefully soon!
*** Bug 102421 has been marked as a duplicate of this bug. ***
Tested-by: series 28373 seems to improve the situation on my dev-skl-i5-6600k.

Without the series (IGT-Version: 1.19-g5ce65a9a):
Subtest oa-exponents: FAIL (0,214s)
Subtest per-context-mode-unprivileged: FAIL (0,004s)
Subtest polling: FAIL (10,032s)
Subtest short-reads: FAIL (0,001s)
Subtest mi-rpc: FAIL (0,001s)
Subtest rc6-disable: FAIL (0,001s)
Subtest create-destroy-userspace-config: FAIL (0,003s)

With series 28373 applied on top:
Subtest i915-ref-count: SUCCESS (0,043s)
Subtest sysctl-defaults: SUCCESS (0,000s)
Subtest non-system-wide-paranoid: SUCCESS (0,015s)
Subtest invalid-open-flags: SUCCESS (0,000s)
Subtest invalid-oa-metric-set-id: SUCCESS (0,007s)
Subtest invalid-oa-format-id: SUCCESS (0,008s)
Subtest missing-sample-flags: SUCCESS (0,000s)
Subtest oa-formats: SUCCESS (0,073s)
Subtest invalid-oa-exponent: SUCCESS (0,007s)
Subtest low-oa-exponent-permissions: SUCCESS (0,015s)
Subtest oa-exponents: SUCCESS (15,035s)
Test requirement not met in function __real_main4515, file perf.c:4580:
Test requirement: IS_HASWELL(devid)
Subtest per-context-mode-unprivileged: SKIP (0,000s)
Subtest buffer-fill: SUCCESS (1,734s)
Subtest disabled-read-error: SUCCESS (0,037s)
Subtest non-sampling-read-error: SUCCESS (0,007s)
Subtest enable-disable: SUCCESS (1,730s)
Subtest blocking: SUCCESS (10,022s)
Subtest polling: SUCCESS (10,010s)
Subtest short-reads: SUCCESS (0,020s)
Subtest mi-rpc: SUCCESS (0,009s)
Test requirement not met in function __real_main4515, file perf.c:4608:
Test requirement: IS_HASWELL(devid)
Subtest unprivileged-single-ctx-counters: SKIP (0,000s)
Subtest gen8-unprivileged-single-ctx-counters: SUCCESS (0,027s)
Subtest rc6-disable: SUCCESS (1,510s)
Subtest invalid-create-userspace-config: SUCCESS (0,000s)
Subtest invalid-remove-userspace-config: SUCCESS (0,007s)
Subtest create-destroy-userspace-config: SUCCESS (0,022s)
Subtest whitelisted-registers-userspace-config: SUCCESS (0,000s)

For HSW, APL and KBL, see the shard results: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_132/shards.html
E.g. HSW has:
Test perf:
  Subgroup polling: fail -> PASS (shard-hsw) fdo#102252
  Subgroup oa-exponents: fail -> PASS (shard-hsw) fdo#102254
Changing component to IGT since fix identified in intel-gpu-tools git.
Should be fixed by https://cgit.freedesktop.org/drm/igt-gpu-tools/commit/?id=f1514a6320f65a1524f36407f7f22d6fc7c7679e
The issue has not been reproduced on HSW shards since https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3172
The issue is reproduced on APL shards: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3190/shard-apl3/igt@perf@oa-exponents.html
Also seen on HSW: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3249/shard-hsw4/igt@perf@oa-exponents.html
Also seen on GLK: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3326/shard-glkb3/igt@perf@oa-exponents.html
I've been thinking about this a bit. Since https://patchwork.freedesktop.org/patch/180544/ seems to have fixed the problem on big cores, we probably have a power-management issue on the Atoms...
Also seen on KBL: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3335/shard-kbl6/igt@perf@oa-exponents.html
I noticed that this test also quite frequently fails like this:
(perf:1560) CRITICAL: Test assertion failure function test_oa_exponents, file perf.c:1922:
(perf:1560) CRITICAL: Failed assertion: n_reports == (sizeof(reports)/sizeof(reports[0]))
(perf:1560) CRITICAL: error: 10 != 30
Subtest oa-exponents failed.
For example:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3602/shard-apl5/igt@perf@oa-exponents.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3600/shard-glkb4/igt@perf@oa-exponents.html
I've got a rewrite of that test that seems a lot better than the current one: https://patchwork.freedesktop.org/series/38372/ It completes in a fraction of the time (~1s vs ~10s) and seems a lot more reliable.
Note: since around CI_DRM_3817, this test has quite often ended as incomplete due to an owatch timeout on APL and KBL shards. Here are some examples:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3827/shard-kbl2/igt@perf@oa-exponents.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3817/shard-apl1/igt@perf@oa-exponents.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3819/shard-apl2/igt@perf@oa-exponents.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3822/shard-apl5/igt@perf@oa-exponents.html
Are those tests manually stopped, or is the machine hanging? https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3827/shard-kbl2/pstore6-1519347265_Oops_1.log
(In reply to Lionel Landwerlin from comment #15)
> Are those tests manually stopped or is the machine hanging? :
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3827/shard-kbl2/pstore6-1519347265_Oops_1.log

Lionel, owatch triggers the kernel softdog after 370 seconds of "inactivity".
Lionel, owatch is part of ezbench: https://cgit.freedesktop.org/ezbench
Thanks. I'm also trying to understand why the enable-disable subtest occasionally fails with 0 reports. I think this might be the same issue. The current theory is that, after a context switch to the preempt context, the value stored in the OACONTROL register gets corrupted, so the OA unit stops outputting reports.
With the following commit landed, I really hope this is finally fixed for good:

commit 41d3fdcd15d5ecf29cc73e8b79c2327ebb54b960
Author: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date:   Thu Mar 1 11:06:13 2018 +0000

    drm/i915/perf: fix perf stream opening lock
The patch was integrated in CI_DRM_3860. So far, the last softdog was at https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3859/shard-apl8/igt@perf@oa-exponents.html. I will monitor this over the weekend.