Summary: | [CI][BAT] igt@i915_module_load@reload - timeout - Unable to sync register 0x2f8100 | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Cezary Rojewski <cezary.j.rojewski> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | major | ||
Priority: | high | CC: | intel-gfx-bugs, kai.vehmanen, stanislav.lisovskiy |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | ICL | i915 features: | display/audio |
Description
Martin Peres
2019-07-25 08:29:22 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@i915_module_load@reload - timeout - Unable to sync register 0x2f8100
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6546/fi-icl-u2/igt@i915_module_load@reload.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6546/fi-icl-u3/igt@i915_module_load@reload.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6546/fi-icl-y/igt@i915_module_load@reload.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6547/fi-icl-u2/igt@i915_module_load@reload.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6547/fi-icl-u3/igt@i915_module_load@reload.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3292/fi-icl-u2/igt@i915_module_load@reload.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3292/fi-icl-u3/igt@i915_module_load@reload.html

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@i915_module_load@reload - timeout - Unable to sync register 0x2f8100 -}
{+ ICL: igt@i915_module_load@(reload|module-reload) - timeout - Unable to sync register 0x2f8100 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6550/fi-icl-y/igt@i915_pm_rpm@module-reload.html

Filip, this bug needs assessment. Can you please add your comments and set the appropriate severity and priority?

The issue seems to be connected with SGPC (Shared Graphics Power Component). That is, during a power state change on the gfx side, the codec or clock (not entirely sure which one) may be turned off without notifying the audio side about the change. From the audio side, the codec is still present on the link and is thus counted as active, so verbs are still being sent to it, and they time out as a consequence.

The problem seems to exist since SKL and isn't strictly tied to ICL. The only reason it is visible now is probably yet another issue present on ICL platforms. We need to fix the root cause though. GFX guys must be engaged and comment on this.
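As a reading aid, the register number in the summary can be unpacked as an HDA codec command. The sketch below assumes the 12-bit-verb layout that the kernel's make_codec_cmd() in sound/pci/hda/hda_codec.c builds (codec address in bits 31:28, node ID in bits 27:20, verb in bits 19:8, payload in bits 7:0); the struct and helper names are hypothetical. Under that assumption, the "last cmd=0x202f8100" reported by the azx_get_response timeout messages later in this thread decodes to codec address 2 (matching hdaudioC0D2), NID 0x02, verb 0xf81 and payload 0x00, and stripping the address nibble gives 0x2f8100, the register that fails to sync.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of an HDA command word that
 * carries a 12-bit verb (layout assumed from make_codec_cmd()):
 *   bits 31:28 codec address, 27:20 node ID, 19:8 verb, 7:0 payload. */
struct hda_cmd {
    unsigned int addr;
    unsigned int nid;
    unsigned int verb;
    unsigned int parm;
};

/* Split a raw command word into its fields. */
static struct hda_cmd hda_cmd_decode(uint32_t cmd)
{
    struct hda_cmd c = {
        .addr = (cmd >> 28) & 0xfu,
        .nid  = (cmd >> 20) & 0xffu,
        .verb = (cmd >> 8) & 0xfffu,
        .parm = cmd & 0xffu,
    };
    return c;
}

/* Rebuild the command word; fields must fit their bit widths. */
static uint32_t hda_cmd_encode(struct hda_cmd c)
{
    assert(c.addr <= 0xfu && c.nid <= 0xffu && c.verb <= 0xfffu && c.parm <= 0xffu);
    return ((uint32_t)c.addr << 28) | ((uint32_t)c.nid << 20) |
           ((uint32_t)c.verb << 8) | (uint32_t)c.parm;
}

/* Drop the codec address nibble, leaving NID + verb + payload, which is
 * assumed here to be how the "register" in the sync error is numbered. */
static uint32_t hda_cmd_strip_addr(uint32_t cmd)
{
    return cmd & 0x0fffffffu;
}
```

Re-encoding the decoded fields yields the original word, which is a quick way to check that the assumed bit layout is self-consistent.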
The issue is indeed related to power management and suspend/resume logic. I could reproduce it by suspending/resuming the machine; azx_response_timeout can easily be reproduced by enabling the codec while disabling the corresponding power domain. I also discovered that, at least in Ubuntu, if a DP audio device is available before suspend, it disappears after resume. There seems to be some issue/mismatch in the intel_audio_codec_enable/intel_audio_codec_disable calls. I'm now also investigating it, to check which issues we might need to fix on the GFX side.

It looks like the power well might not be the issue, as power wells 2 and 3 were enabled just before the timeout. However, the reason seems to be in the codec itself, at least according to this mailing-list thread: https://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg198762.html

Note that although BAT is not reporting the timeout anymore, it is still present as
icl-u2: 198.37 igt@i915_module_load@reload dmesg-warn

(In reply to Chris Wilson from comment #7)
> Note that although BAT is not reporting the timeout anymore, it is still
> present as icl-u2: 198.37 igt@i915_module_load@reload dmesg-warn

Do I understand correctly that this is supposed to be fixed by Takashi Iwai? Patches: https://patchwork.freedesktop.org/patch/319867/

Comments from Stan: This issue can lead to unpredictable timeouts when verbs are sent to the codec, but it doesn't break the system, not even audio, I guess. Most likely this is going to be fixed by this: https://patchwork.freedesktop.org/patch/319867/

Updating the priority and severity.

Based on CI results, last seen one month ago. The above-mentioned issue in the codec was probably fixed. Resolving as fixed.

Module reload still takes over 190s. It is not fixed.

(In reply to Chris Wilson from comment #11)
> Module reload still takes over 190s. It is not fixed.
I thought that was about the audio issue, i.e.:

    snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to polling mode: last cmd=0x202f8100
    <4> [389.635866] snd_hda_intel 0000:00:1f.3: No response from codec, disabling MSI: last cmd=0x202f8100
    <3> [581.484906] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f8100. -11

The CI bug log filter doesn't catch that issue anymore: http://gfx-ci.fi.intel.com/cibuglog-ng/issue/1623

So we need to either edit the filter or create another bug.

(In reply to Stanislav Lisovskiy from comment #12)
> (In reply to Chris Wilson from comment #11)
> > Module reload still takes over 190s. It is not fixed.
>
> I thought that was about audio issue, i.e "snd_hda_intel 0000:00:1f.3:
> azx_get_response timeout, switching to polling mode: last cmd=0x202f8100
> <4> [389.635866] snd_hda_intel 0000:00:1f.3: No response from codec,
> disabling MSI: last cmd=0x202f8100
> <3> [581.484906] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register
> 0x2f8100. -11"
>
> CI bug log filter doesn't catch that issue anymore -
> http://gfx-ci.fi.intel.com/cibuglog-ng/issue/1623
>
> So we need to either edit filter or create another bug.

Stan, that's part of Bug 110595.

(In reply to Lakshmi from comment #13)
> (In reply to Stanislav Lisovskiy from comment #12)
> > (In reply to Chris Wilson from comment #11)
> > > Module reload still takes over 190s. It is not fixed.
> >
> > I thought that was about audio issue, i.e "snd_hda_intel 0000:00:1f.3:
> > azx_get_response timeout, switching to polling mode: last cmd=0x202f8100
> > <4> [389.635866] snd_hda_intel 0000:00:1f.3: No response from codec,
> > disabling MSI: last cmd=0x202f8100
> > <3> [581.484906] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register
> > 0x2f8100. -11"
> >
> > CI bug log filter doesn't catch that issue anymore -
> > http://gfx-ci.fi.intel.com/cibuglog-ng/issue/1623
> >
> > So we need to either edit filter or create another bug.
> Stan, that's part of Bug 110595.
Although it is also about module reload, I don't think those have anything in common except that this issue also happens in module_reload. 110595 is about a dmesg-warn which we get because Port D was incorrectly marked as a non-legacy TypeC port. I believe that is not related in any way to the audio codec issue we are dealing with here. Also, I don't think it affects the module_load@reload execution time, as it is just a dmesg-warn. I guess we really need to avoid this kind of generic bug to which we attach all kinds of issues that happen during this particular test case. Initially, this bug was about the azx_response_timeout message we get during module_load, which is related to the audio codec.

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@i915_module_load@reload - dseg-warn - azx_get_response timeout, switching to polling mode: last cmd=.*
  (No new failures associated)

(In reply to CI Bug Log from comment #15)
> The CI Bug Log issue associated to this bug has been updated.
>
> ### New filters associated
>
> * ICL: igt@i915_module_load@reload - dseg-warn - azx_get_response timeout,
> switching to polling mode: last cmd=.*
> (No new failures associated)

Lakshmi, there is a typo: "dseg-warn"

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@i915_module_load@reload - dseg-warn - azx_get_response timeout, switching to polling mode: last cmd=.* -}
{+ ICL: igt@i915_module_load@reload - dmesg-warn - azx_get_response timeout, switching to polling mode: last cmd=.* +}

No new failures caught with the new filter

(In reply to Cezary Rojewski from comment #4)
> Problem seems to exist since SKL and isn't strictly tied to ICL. The only
> reason it is visible now, is probably of yet another issue present for ICL
> platforms. We need to fix the root cause though. GFX guys must be engaged
> and comment on this.

This seems to be the case.
I'm debugging this with the SOF audio driver and it seems to match this description:
"[ICL] occasional display audio codec probe failure - fail to get afg sub nodes"
https://github.com/thesofproject/linux/issues/1184

The reproduction rate goes up when an arbitrary delay is added between display probe and i915 audio component setup. The bug is fixed if i915 power domains are kept up during the whole sequence. Latest analysis in this comment:
https://github.com/thesofproject/linux/issues/1184#issuecomment-532765982

Potential bugfix for this issue submitted on the intel-gfx list:
https://lists.freedesktop.org/archives/intel-gfx/2019-October/214621.html

OK, some encouraging results from CI (if I'm interpreting the CI results correctly):
https://patchwork.freedesktop.org/series/67460/

...
* igt@i915_module_load@reload:
  - fi-icl-u3: [DMESG-WARN][7] ([fdo#107724] / [fdo#111214]) -> [PASS][8]

[7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6986/fi-icl-u3/igt@i915_module_load@reload.html
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14617/fi-icl-u3/igt@i915_module_load@reload.html
...

Based on the logs, it seems HDMI audio probe fails in the baseline, but now succeeds with the patch!

commit 1580d3cdddbba4a5ef78a04a5289e32844e6af24
Author: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Date:   Thu Oct 3 11:55:30 2019 +0300

    drm/i915: Fix audio power up sequence for gen10+ display

    On platforms with gen10+ display, driver must set the enable bit of
    AUDIO_PIN_BUF_CTL register before transactions with the HDA controller
    can proceed. Add setting this bit to the audio power up sequence.

    Failing to do this resulted in errors during display audio codec probe,
    and failures during resume from suspend.

    Note: We may also need to disable the bit afterwards, but there are
    still unresolved issues with that.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111214
    Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191003085531.30990-1-kai.vehmanen@linux.intel.com
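The ordering requirement in the commit message (set the AUDIO_PIN_BUF_CTL enable bit before any HDA transaction) can be illustrated with a small user-space sketch. This is not the i915 code: the register is simulated by a struct, the offset 0x48414 and the use of bit 31 as the enable bit are assumed values that should be checked against i915_reg.h, and the names fake_mmio, mmio_read, mmio_write, audio_power_up_gen10plus and hda_transactions_allowed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, modelled on the i915 register definitions;
 * verify against i915_reg.h before relying on them. */
#define AUDIO_PIN_BUF_CTL    0x48414u
#define AUDIO_PIN_BUF_ENABLE (1u << 31)

/* Hypothetical stand-in for display MMIO: only the one register
 * this sketch cares about is backed by storage. */
struct fake_mmio {
    uint32_t pin_buf_ctl;
};

static uint32_t mmio_read(const struct fake_mmio *m, uint32_t reg)
{
    return reg == AUDIO_PIN_BUF_CTL ? m->pin_buf_ctl : 0u;
}

static void mmio_write(struct fake_mmio *m, uint32_t reg, uint32_t val)
{
    if (reg == AUDIO_PIN_BUF_CTL)
        m->pin_buf_ctl = val;
}

/* Hypothetical gen10+ audio power-up step: read-modify-write that sets
 * only the enable bit and preserves any other bits in the register. */
static void audio_power_up_gen10plus(struct fake_mmio *m)
{
    assert(m != NULL);
    uint32_t val = mmio_read(m, AUDIO_PIN_BUF_CTL);
    mmio_write(m, AUDIO_PIN_BUF_CTL, val | AUDIO_PIN_BUF_ENABLE);
}

/* Hypothetical gate modelling the rule from the commit message:
 * HDA transactions may proceed only once the enable bit is set. */
static int hda_transactions_allowed(const struct fake_mmio *m)
{
    return (mmio_read(m, AUDIO_PIN_BUF_CTL) & AUDIO_PIN_BUF_ENABLE) != 0u;
}
```

Starting from a register value of 0x5, hda_transactions_allowed() reports false; after audio_power_up_gen10plus() the register holds 0x80000005 and transactions are allowed. A read-modify-write is used rather than a plain write so that whatever else the register holds is left untouched.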