Summary: [CI][BAT] igt@pm_rpm@(module-reload|basic-rte) - fail - Failed assertion: setup_environment()
Product: DRI
Component: DRM/Intel
Status: RESOLVED FIXED
Severity: normal
Priority: high
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BSW/CHT, BXT, BYT, CNL, SKL
i915 features: power/runtime PM
Reporter: Martin Peres <martin.peres>
Assignee: Vanshidhar Konda <vanshidhar.r.konda>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: chris, ida.jankowska, intel-gfx-bugs, sudeep.dutt

Description
Martin Peres
2018-11-19 16:45:49 UTC
Also seen on SKL:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5186_157/fi-skl-6600u/igt@pm_rpm@basic-rte.html

Starting subtest: basic-rte
(pm_rpm:3173) CRITICAL: Test assertion failure function main, file ../tests/pm_rpm.c:1948:
(pm_rpm:3173) CRITICAL: Failed assertion: setup_environment()
Subtest basic-rte failed.

Also seen on BSW and BYT:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5322/fi-byt-j1900/igt@pm_rpm@basic-rte.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5321/fi-bsw-cyan/igt@pm_rpm@basic-rte.html

Starting subtest: basic-rte
(pm_rpm:2797) CRITICAL: Test assertion failure function main, file ../tests/pm_rpm.c:1967:
(pm_rpm:2797) CRITICAL: Failed assertion: setup_environment()
Subtest basic-rte failed.

The device fails to suspend when idle, which leads to higher battery consumption than expected. Setting the priority to High based on the impact of this bug.

This test seems to fail mostly on the BSW and BYT systems. Here are the observations I've made so far:

1) The kernel device usage count for suspending the device is higher (+1) in the failed execution of the test than in the successful execution.

2) When the test disables all screens (through modeset), the power-wells in the display domain are turned off, except the display and always-on power-wells.

3) In the failed case, there is a difference in the wakeref acquire/release log. The following wakerefs don't happen for the successful executions.

Wakeref last acquired:
intel_display_power_get+0x18/0x50 [i915]
intel_power_domains_init_hw+0x90/0x500 [i915]
intel_power_domains_resume+0x3d/0x70 [i915]
i915_pm_resume_early+0x9d/0x130 [i915]
dpm_run_callback+0x64/0x280
device_resume_early+0xa6/0xe0
async_resume_early+0x14/0x40
async_run_entry_fn+0x34/0x160

Wakeref last released:
i915_drm_suspend_late+0xad/0x120 [i915]
dpm_run_callback+0x64/0x280
__device_suspend_late+0xad/0x140
async_suspend_late+0x15/0x90
async_run_entry_fn+0x34/0x160
process_one_work+0x245/0x610
worker_thread+0x37/0x380
kthread+0x119/0x130

Possible reasons for failure:
The display and always-on power-wells have a higher reference count than the rest of the power-wells in the display domain.

Next steps:
1) Confirm that the reference count on the display and always-on power-wells differs between the failing and the successful case.
2) If confirmed, try to figure out the reason for the extra reference count on the power-wells in question.

(In reply to Vanshidhar Konda from comment #4)
> Possible reasons for failure:
> The display and always-on power-wells have a higher reference count than the
> rest of the power-wells in the display domain.

This could be due to the audio driver not suspending.

Could you provide the contents of /sys/kernel/debug/dri/0/i915_power_domain_info after all screens get disabled (with runtime PM enabled for both the i915 and the audio driver)?

Note that you could get the equivalent info by running the test with
https://patchwork.freedesktop.org/patch/290262/?series=57526&rev=1

Is there something that

(In reply to Imre Deak from comment #5)
> Could you provide the contents of
> /sys/kernel/debug/dri/0/i915_power_domain_info after all screens get
> disabled (with runtime PM enabled for both the i915 and the audio driver)?

On the CI systems it seems like runtime PM is enabled for i915. How can I set up/check if runtime PM is enabled for the audio driver?

CI reported a few failures on BSW/BYT systems after taking the patch to IGT from Chris. This shows that there is 1 reference remaining even after disabling all the screens. Also, like you pointed out earlier, there is a reference from the audio driver that is not present in successful runs.

Wakeref x1 taken at:
intel_display_power_get+0x18/0x50 [i915]
i915_audio_component_get_power+0x11/0x20 [i915]
snd_hdac_display_power+0x6a/0x100 [snd_hda_core]
hda_codec_runtime_resume+0x52/0x60 [snd_hda_codec]
pm_runtime_force_resume+0x6a/0xd0
dpm_run_callback+0x64/0x280
device_resume+0xb3/0x1e0
async_resume+0x14/0x40

But why would this reference be taken only in the failure cases, which happen only in some of the runs?

Links to a few failures after the patch to IGT from Chris:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5718/fi-byt-squawks/igt@i915_pm_rpm@basic-rte.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5718/fi-byt-j1900/igt@i915_pm_rpm@basic-rte.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5718/fi-bsw-cyan/igt@i915_pm_rpm@basic-rte.html

Created attachment 143745
Basic-rte logs
Hello,
we've executed subtest basic-rte 1000 times on ICL-U. The issue does not reproduce.
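
On the question raised earlier in this thread (how to check whether runtime PM is enabled for the audio driver), one option is to read the generic runtime PM attributes that the kernel exposes under each device's `power/` directory in sysfs. The sketch below is only an illustration, not something taken from the CI setup: the helper names `read_attr` and `runtime_pm_state` are made up for this example, and the sysfs directory of the audio device has to be supplied by the caller, since it differs between machines.

```python
from pathlib import Path
from typing import Optional


def read_attr(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def runtime_pm_state(device_dir: Path) -> dict:
    """Summarise the runtime PM attributes found under <device_dir>/power.

    device_dir is the sysfs directory of one device (hypothetical example:
    a PCI audio controller under /sys/bus/pci/devices/).
    """
    power = device_dir / "power"
    control = read_attr(power / "control")
    return {
        # "auto" lets the kernel runtime-suspend the device, "on" keeps it active
        "control": control,
        # e.g. "active" or "suspended"; None if the attribute cannot be read
        "runtime_status": read_attr(power / "runtime_status"),
        "runtime_pm_allowed": control == "auto",
    }
```

Calling `runtime_pm_state(Path("/sys/bus/pci/devices/<address>"))` with the address of the audio controller should then show whether runtime PM is allowed and whether the device is currently suspended. If the controller is driven by snd_hda_intel (an assumption based on the snd_hda_* frames in the traces), the module's `power_save` parameter under /sys/module/snd_hda_intel/parameters/ is probably worth checking as well.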
(In reply to Ida from comment #9)
> we've executed subtest basic-rte 1000 times on ICL-U. The issue does not
> reproduce.

Hello, the issue is only seen on Braswell and Baytrail systems - mostly Chromebooks.

Looks like either -rc1 or Petri's static analysis cleanup of lib/igt_pm.c made this disappear from BAT. At least, I can't see any residual wakeref caused by snd_hda over the last couple of weeks.

A CI Bug Log filter associated to this bug has been updated:

{- BYT BSW SKL CNL: igt@pm_rpm@(module-reload|basic-rte) - fail - Failed assertion: setup_environment() -}
{+ BYT BSW APL SKL CNL: igt@pm_rpm@(module-reload|basic-rte) - fail - Failed assertion: setup_environment() +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-apl-guc/igt@i915_pm_rpm@basic-rte.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@i915_pm_rpm@basic-rte.html

A CI Bug Log filter associated to this bug has been updated:

{- BYT BSW APL SKL CNL: igt@pm_rpm@(module-reload|basic-rte) - fail - Failed assertion: setup_environment() -}
{+ BYT BSW APL SKL CFL CNL: igt@pm_rpm@(module-reload|basic-rte) - fail - Failed assertion: setup_environment() +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-cfl-guc/igt@i915_pm_rpm@basic-rte.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-cfl-guc/igt@i915_pm_rpm@basic-rte.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-cfl-guc/igt@i915_pm_rpm@basic-rte.html

A CI Bug Log filter associated to this bug has been updated:

{- BYT BSW APL SKL CFL CNL: igt@pm_rpm@(module-reload|basic-rte) - fail - Failed assertion: setup_environment() -}
{+ BYT BSW APL SKL CFL CNL: igt@pm_rpm@(module-reload|basic-rte) - fail/dmesg-fail - Failed assertion: setup_environment() +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_274/fi-apl-guc/igt@i915_pm_rpm@basic-rte.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_274/fi-cfl-guc/igt@i915_pm_rpm@basic-rte.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_274/fi-skl-guc/igt@i915_pm_rpm@basic-rte.html

The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
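
For anyone triaging similar failures, the residual audio wakeref discussed in this thread can be spotted mechanically by looking at the module tags in a "Wakeref ... taken at:" dump. The following is a rough sketch only: the regular expression and the helper `modules_in_trace` are assumptions made for this example (they are not part of IGT or the kernel), and the trace text is the one quoted earlier in this bug.

```python
import re

# One stack frame as printed in the dumps, e.g.
#   "intel_display_power_get+0x18/0x50 [i915]" or "dpm_run_callback+0x64/0x280"
FRAME_RE = re.compile(
    r"^(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?$"
)


def modules_in_trace(trace: str) -> list:
    """Return the kernel modules named in a trace, in order of first appearance."""
    seen = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("module") and match.group("module") not in seen:
            seen.append(match.group("module"))
    return seen


TRACE = """\
Wakeref x1 taken at:
intel_display_power_get+0x18/0x50 [i915]
i915_audio_component_get_power+0x11/0x20 [i915]
snd_hdac_display_power+0x6a/0x100 [snd_hda_core]
hda_codec_runtime_resume+0x52/0x60 [snd_hda_codec]
pm_runtime_force_resume+0x6a/0xd0
dpm_run_callback+0x64/0x280
device_resume+0xb3/0x1e0
async_resume+0x14/0x40
"""

modules = modules_in_trace(TRACE)
# Modules other than i915 point at an outside holder of the display power reference.
foreign = [name for name in modules if name != "i915"]
print(foreign)
```

For the trace above, the non-i915 modules are the snd_hda ones, which matches the observation that the extra reference comes from the audio driver.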