Summary: | [CI][BAT] igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | arkadiusz.hiler, intel-gfx-bugs |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | GLK, ICL, SKL | i915 features: | display/audio |
Description
Martin Peres
2019-01-31 08:33:06 UTC
The CI Bug Log issue associated to this bug has been updated. ### New filters associated * SKL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2277/fi-skl-6700hq/igt@pm_rpm@module-reload.html - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_3733/fi-skl-6260u/igt@pm_rpm@module-reload.html - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5511/fi-skl-6700hq/igt@pm_rpm@module-reload.html Overlapping WARNs, I feel we are missing some serialisation -- hopefully just at the debug level. A CI Bug Log filter associated to this bug has been updated: {- SKL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero -} {+ SKL ICL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5551/fi-icl-y/igt@pm_rpm@module-reload.html A CI Bug Log filter associated to this bug has been updated: {- SKL ICL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero -} {+ SKL WHL ICL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5534/fi-whl-u/igt@pm_rpm@module-reload.html The CI Bug Log issue associated to this bug has been updated. ### New filters associated * SKL WHL ICL: igt@runner@aborted - fail - Previous test: pm_rpm (module-reload) - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2253/fi-whl-u/igt@runner@aborted.html - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12144/fi-whl-u/igt@runner@aborted.html - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2334/fi-whl-u/igt@runner@aborted.html - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5534/fi-whl-u/igt@runner@aborted.html A CI Bug Log filter associated to this bug has been updated: {- SKL WHL ICL: igt@runner@aborted - fail - Previous test: pm_rpm (module-reload) -} {+ SKL WHL ICL: igt@runner@aborted - fail - Previous test: i915_pm_rpm (module-reload) +} No new failures caught with the new filter A CI Bug Log filter associated to this bug has been updated: {- SKL WHL ICL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero -} {+ SKL KBL WHL ICL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5670/fi-kbl-7560u/igt@i915_pm_rpm@module-reload.html This issue does not reproduce on ICL U (HDA Legacy Audio). This is the Kernel we use: https://gfx-assets.igk.intel.com/artifactory/webapp/#/artifacts/browse/tree/General/gfx-osgc-assets-igk/builds/DRM/CI_DRM/5675 (In reply to Ida from comment #8) > This issue does not reproduce on ICL U (HDA Legacy Audio). Hey, Thanks for letting us know. CI_DRM_5675 is from 28th of February, but we still have hit the issue after that: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5811/fi-whl-u/igt@i915_pm_rpm@module-reload.html https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5788/fi-whl-u/igt@i915_pm_rpm@module-reload.html etc. Seems like the reproduction rate is once every 150-200 runs. We see it mostly on WHL nowadays, but it will take about 1k runs to be sensibly sure that its gone for ICL for good. For customer, the impact is negligible but for CI its annoying. Setting the priority to Medium. A CI Bug Log filter associated to this bug has been updated: {- SKL KBL WHL ICL: igt@pm_rpm@module-reload - dmesg-warn - Use count on domain AUDIO is already zero -} {+ SKL KBL GLK WHL ICL: igt@pm_rpm@module-reload|system-suspend-execbuf - dmesg-warn/fail - Use count on domain AUDIO is already zero +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5873/shard-glk2/igt@i915_pm_rpm@system-suspend-execbuf.html A CI Bug Log filter associated to this bug has been updated: {- SKL KBL GLK WHL ICL: igt@pm_rpm@module-reload|system-suspend-execbuf - dmesg-warn/fail - Use count on domain AUDIO is already zero -} {+ SKL KBL GLK WHL ICL: igt@pm_rpm@module-reload|system-suspend-execbuf|pm-tiling - dmesg-warn/fail - Use count on domain AUDIO is already zero +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4929/shard-glk6/igt@i915_pm_rpm@pm-tiling.html The issue seems to be still there: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12702/shard-glk5/dmesg1.log I think it's a significant issue that someone from the audio team should track down and fix. A CI Bug Log filter associated to this bug has been updated: {- SKL KBL GLK WHL ICL: igt@pm_rpm@module-reload|system-suspend-execbuf|pm-tiling - dmesg-warn/fail - Use count on domain AUDIO is already zero -} {+ SKL KBL GLK WHL ICL: igt@(i915_)?pm_rpm@* - dmesg-warn/fail - Use count on domain AUDIO is already zero +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5878/shard-glk4/igt@i915_pm_rpm@dpms-mode-unset-non-lpsp.html * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5878/shard-glk7/igt@i915_pm_rpm@cursor-dpms.html * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5880/shard-glk4/igt@i915_pm_rpm@cursor-dpms.html Hi, First of all, as I understand CI is not testing audio per se, but can you consider adding snd_hda_core.dyndbg="+pfl" snd_soc_hdac_hdmi.dyndbg="+pfl" for debugging HDMI Audio functionality to kernel parameters? It will give more useful logs from audio side. Can you tell me what exactly is being tested in this test? One of the things I see, that in all logs, moments before warning device goes to sleep and then is being woken up. I've took a look at where calls are performed from audio side and I don't see anything obviously wrong with them. But considering that we just finished resume it looks to me like something tries to call __intel_display_power_put() on device that is still down. Another thing is the "enabling always-on" & "disabling always-on" sequence, in failing case it seems suspiciously short to me (in comparison with working cases), but I'm not well versed in graphics code/states, so can't tell if it's really correct. Suspend: <7>[ 244.855833] [drm:intel_runtime_suspend [i915]] Suspending device <7>[ 244.856535] [drm:gen9_set_dc_state [i915]] Setting DC state from 01 to 00 <7>[ 244.856695] [drm:intel_power_well_disable [i915]] disabling power well 1 <7>[ 244.856863] [drm:hsw_wait_for_power_well_disable.isra.7 [i915]] power well 1 forced on (bios:1 driver:0 kvmr:0 debug:0) <7>[ 244.857358] [drm:bxt_enable_dc9 [i915]] Enabling DC9 <7>[ 244.857415] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 08 <7>[ 244.857580] [drm:intel_runtime_suspend [i915]] Device suspended ... <7>[ 244.858101] [drm:intel_atomic_check [i915]] New cdclk calculated to be logical 79200 kHz, actual 79200 kHz <7>[ 244.858208] [drm:intel_atomic_check [i915]] New voltage level calculated to be logical 4, actual 4 <7>[ 244.858903] [drm:drm_mode_setcrtc] [CRTC:125:pipe C] Resume: <7>[ 244.881249] [drm:intel_runtime_resume [i915]] Resuming device <7>[ 244.881317] [drm:bxt_disable_dc9 [i915]] Disabling DC9 <7>[ 244.881376] [drm:gen9_set_dc_state [i915]] Setting DC state from 08 to 00 <7>[ 244.881458] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 00 <7>[ 244.881577] [drm:intel_power_well_enable [i915]] enabling power well 1 <7>[ 244.881726] [drm:intel_dump_cdclk_state [i915]] Current CDCLK 19200 kHz, VCO 0 kHz, ref 19200 kHz, bypass 19200 kHz, voltage level 1 <7>[ 244.881790] [drm:bxt_init_cdclk [i915]] Sanitizing cdclk programmed by pre-os <7>[ 244.882371] [drm:gen9_enable_dc5 [i915]] Enabling DC5 <7>[ 244.882425] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01 <7>[ 244.886970] [drm:__bxt_hpd_detection_setup [i915]] Invert bit setting: hp_ctl:10001818 hp_port:38 <7>[ 244.892311] [drm:__bxt_hpd_detection_setup [i915]] Invert bit setting: hp_ctl:10001818 hp_port:30 <7>[ 244.895516] [drm:intel_runtime_resume [i915]] Device resumed Sequence between "enabling always-on" & "disabling always-on": <7>[ 244.895677] [drm:intel_power_well_enable [i915]] enabling always-on <7>[ 244.895732] [drm:intel_power_well_enable [i915]] enabling DC off <7>[ 244.896797] [drm:gen9_set_dc_state [i915]] Setting DC state from 01 to 00 <7>[ 244.897856] [drm:intel_atomic_commit_tail [i915]] [ENCODER:126:DDI B] <7>[ 244.897920] [drm:intel_atomic_commit_tail [i915]] [ENCODER:135:DDI C] <7>[ 244.897981] [drm:verify_single_dpll_state.isra.108 [i915]] PORT PLL A <7>[ 244.898053] [drm:verify_single_dpll_state.isra.108 [i915]] PORT PLL B <7>[ 244.898283] [drm:verify_single_dpll_state.isra.108 [i915]] PORT PLL C <7>[ 244.898372] [drm:intel_power_well_disable [i915]] disabling DC off <7>[ 244.898434] [drm:gen9_enable_dc5 [i915]] Enabling DC5 <7>[ 244.898488] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01 <7>[ 244.902735] [drm:intel_power_well_disable [i915]] disabling always-on Warning: <4>[ 244.904316] ------------[ cut here ]------------ <4>[ 244.904321] Use count on domain AUDIO is already zero <4>[ 244.904403] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/i915/intel_runtime_pm.c:1928 __intel_display_power_put+0xe9/0x100 [i915] Summing up, I see few suspicious things, but without additional logs enabled it's bit hard to tell anything concrete. (In reply to Amadeusz Sławiński from comment #15) > Hi, > > First of all, as I understand CI is not testing audio per se, but can you > consider adding > snd_hda_core.dyndbg="+pfl" snd_soc_hdac_hdmi.dyndbg="+pfl" > for debugging HDMI Audio functionality to kernel parameters? > It will give more useful logs from audio side. Hey Amadeusz, First we would have to see how much noise it would cause using some pre-merge testing. That would require us to send a kernel "HAX" patches that force those options on kernel side to intel-gfx@lists.freedesktop.org. Then, if it's not causing too much havoc, we can enable it for some of the offending machines to gather more data over time. If you can cook up such patch and send it - that would be appreciated. > Can you tell me what exactly is being tested in this test? Test: https://gitlab.freedesktop.org/drm/igt-gpu-tools/blob/master/tests/i915/i915_pm_rpm.c#L2101 Preparing the environment: https://gitlab.freedesktop.org/drm/igt-gpu-tools/blob/master/tests/i915/i915_pm_rpm.c#L740 It reloads i915 with display disabled, waits for the graphics to enter d3, reloads again with display enabled, waits for d3 again, enables one screen and then unloads the driver altogether. Setting this highest as blocking our CI GLK DSI to proceed. http://gfx-ci.fi.intel.com/tree/drm-tip/fi-glk-dsi.html commit 4bdc42094d9c4af75f90ee49b9009ea8dfe41444 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued) Author: Takashi Iwai <tiwai@suse.de> Date: Wed Apr 10 13:03:22 2019 +0200 ALSA: hda: Fix racy display power access snd_hdac_display_power() doesn't handle the concurrent calls carefully enough, and it may lead to the doubly get_power or put_power calls, when a runtime PM and an async work get called in racy way. This patch addresses it by reusing the bus->lock mutex that has been used for protecting the link state change in ext bus code, so that it can protect against racy display state changes. The initialization of bus->lock was moved from snd_hdac_ext_bus_init() to snd_hdac_bus_init() as well accordingly. Testcase: igt/i915_pm_rpm/module-reload #glk-dsi Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/s5h8swiunph.wl-tiwai@suse.de (In reply to Chris Wilson from comment #18) > commit 4bdc42094d9c4af75f90ee49b9009ea8dfe41444 (HEAD -> > drm-intel-next-queued, > drm-intel/for-linux-next, drm-intel/drm-intel-next-queued) > Author: Takashi Iwai <tiwai@suse.de> > Date: Wed Apr 10 13:03:22 2019 +0200 > > ALSA: hda: Fix racy display power access > > snd_hdac_display_power() doesn't handle the concurrent calls carefully > enough, and it may lead to the doubly get_power or put_power calls, > when a runtime PM and an async work get called in racy way. > > This patch addresses it by reusing the bus->lock mutex that has been > used for protecting the link state change in ext bus code, so that it > can protect against racy display state changes. The initialization of > bus->lock was moved from snd_hdac_ext_bus_init() to > snd_hdac_bus_init() as well accordingly. > > Testcase: igt/i915_pm_rpm/module-reload #glk-dsi > Reported-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Imre Deak <imre.deak@intel.com> > Signed-off-by: Takashi Iwai <tiwai@suse.de> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Link: > https://patchwork.freedesktop.org/patch/msgid/s5h8swiunph.wl-tiwai@suse.de It definitely fixed the issue! Thanks! The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.