Bug 106539

Summary: [CI] IGT suspend issue: CRITICAL: Failed assertion: wait_for_suspended() | igt_wait_for_pm_status(IGT_RUNTIME_PM_STATUS_SUSPENDED)
Product: DRI
Reporter: Tomi Sarvela <tomi.p.sarvela>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard: ReadyForDev
i915 platform: CNL, GLK, HSW, KBL, SKL
i915 features: power/runtime PM
Attachments:
Description Flags
dmesg hsw8 igt@pm_rpm@modeset-non-lpsp-stress none

Description Tomi Sarvela 2018-05-16 07:53:01 UTC
Starting with the recent DRM-Tip 4.17-rc5 pull, one of the GLKs (shard-glk8) has seen suspend issues with the following IGT tests:

igt@pm_rpm@fences:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4179/shard-glk8/igt@pm_rpm@fences.html

igt@pm_rpm@dpms-mode-unset-non-lpsp:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4179/shard-glk8/igt@pm_rpm@dpms-mode-unset-non-lpsp.html

igt@kms_vblank@pipe-b-ts-continuation-dpms-rpm:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4179/shard-glk8/igt@kms_vblank@pipe-b-ts-continuation-dpms-rpm.html

igt@drv_suspend@fence-restore-untiled:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4179/shard-glk8/igt@drv_suspend@fence-restore-untiled.html
Comment 1 Martin Peres 2018-05-29 08:03:40 UTC
Also seen on CNL: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_49/fi-cnl-psr/igt@pm_rpm@legacy-planes.html

(pm_rpm:1701) CRITICAL: Test assertion failure function test_one_plane, file ../tests/pm_rpm.c:1572:
(pm_rpm:1701) CRITICAL: Failed assertion: wait_for_suspended()
Subtest legacy-planes failed.
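
The failed assertion means the test waited for the device to report a runtime-suspended state and the wait timed out. IGT's own implementation is not shown in this report; the following is only a minimal sketch of that kind of wait, assuming the status is read from a Linux runtime PM sysfs attribute such as power/runtime_status, which reads "suspended" once the device has runtime-suspended. The function name, the timeout and the poll interval are hypothetical choices for illustration.

```python
import time
from pathlib import Path


def wait_for_runtime_status(status_file, wanted="suspended",
                            timeout_s=10.0, poll_s=0.1):
    """Poll a runtime PM status file until it reads `wanted`.

    Returns True if the wanted status was observed before the timeout,
    False otherwise. The timeout and poll interval are illustrative
    defaults, not values taken from IGT.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            current = Path(status_file).read_text().strip()
        except OSError:
            # Treat an unreadable file as "status unknown" and keep polling.
            current = ""
        if current == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

On a real machine the argument would be the GPU's power/runtime_status file under sysfs; the exact device path depends on the system and is not given in this report.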
Comment 3 Martin Peres 2018-07-17 12:40:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_78/fi-skl-iommu/igt@pm_rpm@system-suspend-execbuf.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_81/fi-skl-iommu/igt@pm_rpm@system-suspend.html

(pm_rpm:1796) CRITICAL: Test assertion failure function system_suspend_subtest, file ../tests/pm_rpm.c:1368:
(pm_rpm:1796) CRITICAL: Failed assertion: wait_for_suspended()
Subtest system-suspend failed.
Comment 4 Martin Peres 2018-07-27 13:34:16 UTC
Also seen on KBL: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4542/shard-kbl2/igt@pm_rpm@drm-resources-equal.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4544/shard-kbl3/igt@pm_rpm@drm-resources-equal.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4543/shard-kbl4/igt@pm_rpm@drm-resources-equal.html

(pm_rpm:2770) CRITICAL: Test assertion failure function drm_resources_equal_subtest, file ../tests/pm_rpm.c:801:
(pm_rpm:2770) CRITICAL: Failed assertion: wait_for_suspended()
Subtest drm-resources-equal failed.

It was visible for 5 runs in a row, then went back to working. It may be a separate issue from this bug, but it is impossible to tell.
Comment 5 Tomi Sarvela 2018-08-09 07:52:55 UTC
Created attachment 141023 [details]
dmesg hsw8 igt@pm_rpm@modeset-non-lpsp-stress
Comment 6 Tomi Sarvela 2018-08-09 07:54:39 UTC
This pm_rpm issue has shown a test-order dependency, so I took a bad shard testlist, shards/x0002, from IGT_4588. It gave the following behaviour after a fresh boot:

running: igt/pm_rpm/modeset-non-lpsp-stress
pass: igt/pm_rpm/modeset-non-lpsp-stress   
[1/1] pass: 1  
Thank you for running Piglit!
Results have been written to /home/testrunner/results
running: igt/drv_module_reload/basic-reload-inject
pass: igt/drv_module_reload/basic-reload-inject   
[1/1] pass: 1  
Thank you for running Piglit!
Results have been written to /home/testrunner/results
running: igt/pm_rpm/modeset-non-lpsp-stress
fail: igt/pm_rpm/modeset-non-lpsp-stress   
[1/1] fail: 1  
Thank you for running Piglit!
Results have been written to /home/testrunner/results

Full dmesg from this single run attached.
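
The pass/fail lines in the log above are enough to spot which test ran between the last pass and the first failure of the pm_rpm test. A minimal sketch of that check follows. It assumes the log format shown in this comment, only looks at "pass:" and "fail:" lines, and uses hypothetical helper names.

```python
import re

# Matches result lines such as "pass: igt/pm_rpm/modeset-non-lpsp-stress".
# Lines like "running: ..." and "[1/1] pass: 1" are ignored on purpose.
RESULT_RE = re.compile(r"^(pass|fail):\s+(\S+)$")


def parse_results(log_text):
    """Return (test_name, status) tuples in the order they appear."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results.append((match.group(2), match.group(1)))
    return results


def suspects_for(results, test_name):
    """List the tests that ran between the most recent pass of
    `test_name` and its first following failure."""
    suspects = []
    seen_pass = False
    for name, status in results:
        if name == test_name:
            if status == "pass":
                seen_pass = True
                suspects = []
            elif status == "fail" and seen_pass:
                return suspects
        elif seen_pass:
            suspects.append(name)
    return []
```

Fed the log from this comment, suspects_for() with igt/pm_rpm/modeset-non-lpsp-stress as the test name returns a list containing only igt/drv_module_reload/basic-reload-inject.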
Comment 7 Tomi Sarvela 2018-08-10 08:52:33 UTC
I ran through the full shard testlist with igt@pm_rpm@modeset-non-lpsp-stress interleaved in. Only igt@drv_module_reload@basic-reload-inject leaks context into the pm_rpm test.
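
The interleaving can be sketched as follows. This is not the script used for the run above; it is a hypothetical helper that places the probe test before the first entry and after every other entry, so that the first failing probe points at the test that ran just before it.

```python
def interleave_probe(testlist, probe):
    """Return a new test order with `probe` run first and again after
    every other test. Existing occurrences of `probe` are dropped so it
    is not run twice in a row."""
    ordered = [probe]  # baseline run on a clean system
    for test in testlist:
        if test == probe:
            continue
        ordered.append(test)
        ordered.append(probe)
    return ordered
```

For a testlist of N other tests this yields 2N + 1 runs, which is slow but needs no assumptions about which test is at fault.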
Comment 8 Chris Wilson 2018-08-10 10:06:17 UTC
commit fce9638b2e60afce872b3056c19a729b1b3708be (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 10 10:01:08 2018 +0100

    intel-ci: Skip module reloads
    
    Reloading the module may impact subsequent tests by destabilising the
    system. As we do for BAT, if we want to test reloads, it should be
    handled explicitly at the end of the run, rather than placed at random
    in the middle of the test list.
    
    v2: Commentary
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106539
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
    Acked-by: Petri Latvala <petri.latvala@intel.com>

for temporary relief.
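
The commit message suggests running module reloads at the end of the run instead of at a random point in the test list. The commit itself is not reproduced here, so the following is only a sketch of that ordering idea under stated assumptions: the function name is hypothetical, and reload tests are recognised by the igt@drv_module_reload@ prefix seen in this report.

```python
RELOAD_PREFIX = "igt@drv_module_reload@"  # prefix as it appears in this report


def defer_module_reloads(testlist, prefix=RELOAD_PREFIX):
    """Return the testlist with module reload tests moved to the end.

    The relative order within each group is preserved.
    """
    regular = [t for t in testlist if not t.startswith(prefix)]
    reloads = [t for t in testlist if t.startswith(prefix)]
    return regular + reloads
```

Skipping the reload tests entirely, as the commit subject describes, corresponds to returning only the first group.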
Comment 9 Chris Wilson 2018-08-10 15:14:50 UTC
Isolated the problem to drv_module_reload/inject
Comment 10 Lakshmi 2018-10-11 19:59:52 UTC
This issue was last seen 2 months ago, or 835 rounds ago. It used to appear every two rounds. Closing this bug as WORKSFORME.
