Bug 110728

Summary: [CI][SHARDS] igt@perf@(blocking|polling) - fail - Failed assertion: kernel_ns <= (test_duration_ns / 100ull)
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: ICL, KBL, SKL i915 features: Perf/OA

Description Martin Peres 2019-05-22 14:01:04 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5000/shard-iclb8/igt@perf@blocking.html
	
Starting subtest: blocking
(perf:1571) CRITICAL: Test assertion failure function test_blocking, file ../tests/perf.c:2127:
(perf:1571) CRITICAL: Failed assertion: kernel_ns <= (test_duration_ns / 100ull)
Comment 1 CI Bug Log 2019-05-22 14:01:42 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@perf@(blocking|polling) - fail - Failed assertion: kernel_ns &lt;= (test_duration_ns / 100ull)
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4275/shard-iclb2/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6087/shard-iclb3/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4298/shard-iclb2/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5000/shard-iclb7/igt@perf@polling.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5000/shard-iclb8/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6112/shard-iclb7/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13055/shard-iclb8/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6114/shard-iclb4/igt@perf@polling.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6114/shard-iclb7/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13056/shard-iclb7/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13058/shard-iclb2/igt@perf@polling.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13058/shard-iclb3/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13059/shard-iclb2/igt@perf@blocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13059/shard-iclb7/igt@perf@polling.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13062/shard-iclb2/igt@perf@polling.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13064/shard-iclb6/igt@perf@blocking.html
Comment 2 CI Bug Log 2019-06-03 12:02:44 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@perf@(blocking|polling) - fail - Failed assertion: kernel_ns &lt;= (test_duration_ns / 100ull) -}
{+ SKL ICL: igt@perf@(blocking|polling) - fail - Failed assertion: kernel_ns &lt;= (test_duration_ns / 100ull) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6176/shard-skl6/igt@perf@blocking.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6179/shard-skl1/igt@perf@polling.html
Comment 3 emersion 2019-06-10 11:44:02 UTC
In [1], 170ms (±10ms) are spent blocking in the kernel. This is more than 1% of the test duration (10s = 10000ms, time spent in kernel is 1.7%). In other logs, similar issues can be observed.

This means that we're burning too many CPU cycles in the kernel when reading from Observation Architecture.

User impact: some performance issues when measuring performance.

I suspect these performance issues only happen when actively using OA, so most users shouldn't be affected, but I'm not familiar with OA, so don't quote me on this.

[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6176/shard-skl6/igt@perf@blocking.html
Comment 4 emersion 2019-06-10 12:06:06 UTC
Reproduction rate: 3%

Only ICL and SKL is affected. The bug is hit daily.
Comment 5 emersion 2019-06-24 08:28:55 UTC
Posted a patch to improve logging: https://patchwork.freedesktop.org/patch/313695/
Comment 6 emersion 2019-06-24 08:29:45 UTC
(In reply to emersion from comment #5)
> Posted a patch to improve logging:
> https://patchwork.freedesktop.org/patch/313695/

Erf, please disregard that, it's unrelated.
Comment 7 emersion 2019-08-16 12:51:50 UTC
Issue is still hit daily (every 5 hours or so). However it seems like ICL is not hit anymore: last ICL hit was 1 month, 3 weeks ago. So now this seems to be SKL-specific.
Comment 8 CI Bug Log 2019-09-02 09:33:25 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL ICL: igt@perf@(blocking|polling) - fail - Failed assertion: kernel_ns &lt;= (test_duration_ns / 100ull) -}
{+ SKL KBL ICL: igt@perf@(blocking|polling) - fail - Failed assertion: kernel_ns &lt;= (test_duration_ns / 100ull) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_354/fi-kbl-soraka/igt@perf@polling.html
Comment 9 umesh 2019-10-21 18:47:02 UTC
The test itself runs for 10 seconds. In this time, I see these extra logs in the failed dmesg compared to the succeeding cases.

<7> [710.600658] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [710.601387] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [710.601896] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02

Issue is still hit daily. Last report was 1 week 3 days ago. Importance stays high.

Impact: Needs further root cause before impact can be assessed.
Comment 10 Martin Peres 2019-11-29 19:07:49 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/297.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.