Bug 102937 - [GLK-RVP]/[APL-Electro]HDA verb command response timedout; board to board variation with same S/W
Summary: [GLK-RVP]/[APL-Electro]HDA verb command response timedout; board to board var...
Status: ASSIGNED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: high major
Assignee: Abhay Kumar
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: PatchSubmitted
Keywords:
: 105841 (view as bug list)
Depends on:
Blocks:
 
Reported: 2017-09-21 23:03 UTC by Abhay Kumar
Modified: 2018-11-08 19:44 UTC (History)
3 users (show)

See Also:
i915 platform: GLK
i915 features: display/audio


Attachments
logs (48.62 KB, text/plain)
2017-09-21 23:04 UTC, Abhay Kumar
no flags Details
logs with drm.debug=0x0e (106.19 KB, text/plain)
2018-04-09 23:59 UTC, Abhay Kumar
no flags Details
vbt file output of "attach /sys/kernel/debug/dri/0/i915_vbt" (6.00 KB, text/plain)
2018-04-10 19:00 UTC, Abhay Kumar
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Abhay Kumar 2017-09-21 23:03:26 UTC
1. Environment 
               GLK-RVP2C/GLK-RVP2D, Chrome OS.
               During Probe, we are observing "HDA verb commands response getting timedout

When HDMI is connected then it works fine and in absence it works on some board and not on some board.

2. Expected Behavior 

    HDA verb commands should not timeout.

3. Actual Behavior 
    HDA verb commands getting timeout.
    As timeout occurring, codec probe failed. Because of this, sound card not created and hence Audio fails

power well registers are same in passing and fail usecase.
localhost ~ # intel_reg read 0x45404
Warning: stat '/usr/share/intel-gpu-tools/registers' failed: No such file or directory. Using builtin register spec.
(0x00045404): 0xf03f000c
localhost ~ # intel_reg read 0x45405
Warning: stat '/usr/share/intel-gpu-tools/registers' failed: No such file or directory. Using builtin register spec.
(0x00045405): 0xffffffff
localhost ~ # intel_reg read 0x45406
Warning: stat '/usr/share/intel-gpu-tools/registers' failed: No such file or directory. Using builtin register spec.
(0x00045406): 0xffffffff
localhost ~ # intel_reg read 0x45407
Warning: stat '/usr/share/intel-gpu-tools/registers' failed: No such file or directory. Using builtin register spec.
(0x00045407): 0xffffffff
Comment 1 Abhay Kumar 2017-09-21 23:04:18 UTC
Created attachment 134421 [details]
logs
Comment 2 Sathyanarayana Nujella 2017-09-22 00:15:55 UTC
This is the location where HDA verb command issue observed:


https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git/tree/sound/soc/intel/skylake/skl.c?h=for-next#n523
static int probe_codec(struct hdac_ext_bus *ebus, int addr)
{
	struct hdac_bus *bus = ebus_to_hbus(ebus);
	unsigned int cmd = (addr << 28) | (AC_NODE_ROOT << 20) |
		(AC_VERB_PARAMETERS << 8) | AC_PAR_VENDOR_ID;
	unsigned int res = -1;

	mutex_lock(&bus->cmd_mutex);
	snd_hdac_bus_send_cmd(bus, cmd);
	snd_hdac_bus_get_response(bus, addr, &res);
	
==>
For the above hadc bus got response timedout and log shows this response:

[ 6.388520] addr=0x2 cmd=0x200f0000 res=0xffffffff
Comment 3 anusha 2017-09-22 19:57:46 UTC
Hi try giving i915.disable_power_well=0 as the module parameter. This will keep all power well enabled all the time.
Comment 4 Sathyanarayana Nujella 2017-09-22 21:02:41 UTC
Yes, I did the change for test.. I still see the same problem:

This is what I did and what I see in dmesg:

--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -40,7 +40,7 @@ struct i915_params i915 __read_mostly = {
        .enable_ppgtt = -1,
        .enable_psr = -1,
        .alpha_support = IS_ENABLED(CONFIG_DRM_I915_ALPHA_SUPPORT),
-       .disable_power_well = -1,
+       .disable_power_well = 0,



From dmesg:

[    2.674730] sathya intel_power_domains_init_hw i915.disable_power_well=0
...
Comment 5 Chintan Patel 2017-09-29 17:56:50 UTC
After experimenting and doing several things I wanted to make below notes here:

Case 1. 
Surprisingly, this could also be reproduced on APL/Electro simply by adding "drm.debug=14 log_buf_len=1M". On my device this is 100% reproducible.
So lets no say its GLK-RVP  Specific bug. Add APL also in bug description.

Case 2.[NO HDA Verb Timeout, as there no Codec #%d probe error; disabling it...\n", c); happened here]
Interestingly, I one time just added simple printk(3 lines additions) in skl.c and I could see this issue to, but less rate of reproduction. In this case I saw:

[   11.548286] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio Port
[   11.548298] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: CPU DAI System Pin not registered
[   11.557716] chintan: skl_i915_init err:0
[   11.560182] chintan: skl_probe_work err: 0
[   11.566803] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio Port
[   11.566816] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: CPU DAI System Pin not registered
[   11.567440] snd_soc_skl 0000:00:0e.0: codec #2 probed OK
[   11.580011] HDMI HDA Codec ehdaudio0D2: Max dais supported: 3
[   11.580021] HDMI HDA Codec ehdaudio0D2: codec register ehdaudio0D2
[   11.580030] HDMI HDA Codec ehdaudio0D2: ASoC: dai register ehdaudio0D2 #3
[   11.580032] HDMI HDA Codec ehdaudio0D2: ASoC: dynamically register DAI ehdaudio0D2
[   11.580035] HDMI HDA Codec ehdaudio0D2: ASoC: Registered DAI 'intel-hdmi-hifi1'
[   11.580037] HDMI HDA Codec ehdaudio0D2: ASoC: dynamically register DAI ehdaudio0D2
[   11.580040] HDMI HDA Codec ehdaudio0D2: ASoC: Registered DAI 'intel-hdmi-hifi2'
[   11.580042] HDMI HDA Codec ehdaudio0D2: ASoC: dynamically register DAI ehdaudio0D2
[   11.580045] HDMI HDA Codec ehdaudio0D2: ASoC: Registered DAI 'intel-hdmi-hifi3'
[   11.580048] HDMI HDA Codec ehdaudio0D2: ASoC: Registered codec 'ehdaudio0D2'
[   11.580184] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio Port
[   11.580192] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: CPU DAI System Pin not registered
[   11.580303] chintan: probe_codec probe_codec_ret: 0

Case 3 [NO HDA Verb Timeout, as there is no Codec #%d probe error; disabling it...\n", c); happened here]:
Below case also shows no sound card:

[ 4799.781611] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio Port
[ 4799.781640] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio Capture Port
[ 4799.781651] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio Reference cap
[ 4799.781661] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt Audio DMIC cap
[ 4799.781671] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt HDMI Port1
[ 4799.781682] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt HDMI Port2
[ 4799.781692] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding Bxt HDMI Port3
[ 4799.781702] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding SSP5-Codec
[ 4799.781712] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding SSP1-Codec
[ 4799.781722] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding dmic01
[ 4799.781731] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: binding iDisp1
[ 4799.781740] bxt_da7219_max98357a_i2s bxt_da7219_max98357a_i2s: ASoC: CODEC DAI intel-hdmi-hifi1 not registered

As you can also see whenever there is no sound card it also always shows ASoC DAI registration error : 

ASoC: CODEC DAI intel-hdmi-hifi1 not registered

OR

ASoC: CPU DAI System Pin not registered


At this point I feel we should also explore more area and not just HDA Verb responses.
Comment 6 Abhay Kumar 2017-10-24 19:06:55 UTC
CD Clock frequency is 79.2 MHz in failing case and is 316.8 MHz in passing case. Minimum frequency required for Audio Codec to operate on GLK is 96 MHz. Hence the failing case is expected to fail. This is statement from windows driver team.

Basically for 1080p panel where cdclk is 79.2 it fails and for 4k panel where cdclk is 316.8 it passes. Looks like we need to check design. Any thoughts?
Comment 7 Jani Nikula 2017-10-25 10:59:37 UTC
Please try current drm-tip kernel from https://cgit.freedesktop.org/drm/drm-tip
Comment 8 Abhay Kumar 2017-10-25 19:02:50 UTC
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/intel_cdclk.c

we are not entering into func "intel_crtc_compute_min_cdclk"

if (intel_crtc_has_dp_encoder(crtc_state) &&
	    crtc_state->has_audio &&
	    crtc_state->port_clock >= 540000 &&
	    crtc_state->lane_count == 4) {
		if (IS_CANNONLAKE(dev_priv) || IS_GEMINILAKE(dev_priv)) {
			/* Display WA #1145: glk,cnl */
			min_cdclk = max(316800, min_cdclk);
		} else if (IS_GEN9(dev_priv) || IS_BROADWELL(dev_priv)) {
			/* Display WA #1144: skl,bxt */
			min_cdclk = max(432000, min_cdclk);
		}
	}


Looks like it will only enter inside for HBR2 and not for HBR or RBR.
Comment 9 Abhay Kumar 2017-10-26 01:51:49 UTC
Tried last night DINQ and same behaviour.

pushed a patch for review.

https://patchwork.freedesktop.org/series/32657/
Comment 10 Jani Nikula 2018-02-14 17:46:28 UTC
(In reply to Abhay Kumar from comment #8)
> https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/
> intel_cdclk.c
> 
> we are not entering into func "intel_crtc_compute_min_cdclk"
> 
> if (intel_crtc_has_dp_encoder(crtc_state) &&
> 	    crtc_state->has_audio &&
> 	    crtc_state->port_clock >= 540000 &&
> 	    crtc_state->lane_count == 4) {
> 		if (IS_CANNONLAKE(dev_priv) || IS_GEMINILAKE(dev_priv)) {
> 			/* Display WA #1145: glk,cnl */
> 			min_cdclk = max(316800, min_cdclk);
> 		} else if (IS_GEN9(dev_priv) || IS_BROADWELL(dev_priv)) {
> 			/* Display WA #1144: skl,bxt */
> 			min_cdclk = max(432000, min_cdclk);
> 		}
> 	}
> 
> 
> Looks like it will only enter inside for HBR2 and not for HBR or RBR.

Look right below that:

	/* According to BSpec, "The CD clock frequency must be at least twice
	 * the frequency of the Azalia BCLK." and BCLK is 96 MHz by default.
	 */
	if (crtc_state->has_audio && INTEL_GEN(dev_priv) >= 9)
		min_cdclk = max(2 * 96000, min_cdclk);
Comment 11 Jani Nikula 2018-02-14 18:00:09 UTC
Fresh comments on the patch.
Comment 12 Jani Saarinen 2018-03-29 07:09:58 UTC
First of all. Sorry about spam.
This is mass update for our bugs. 

Sorry if you feel this annoying but with this trying to understand if bug still valid or not.
If bug investigation still in progress, please ignore this and I apologize!

If you think this is not anymore valid, please comment to the bug that can be closed.
If you haven't tested with our latest pre-upstream tree(drm-tip), can you do that also to see if issue is valid there still and if you cannot see issue there, please comment to the bug.
Comment 13 Abhay Kumar 2018-03-29 17:55:39 UTC
Hi Jani,

   Bug is still there and looks like symptom getting worse. We are trying on one custom board as well and even after bumping cdclk it is not working but after enabling powerwell all the time "i915.disable_power_well=0" it works fine. investigation continues.
Comment 14 Abhay Kumar 2018-04-09 23:59:24 UTC
Created attachment 138715 [details]
logs with drm.debug=0x0e
Comment 15 qwang13 2018-04-10 02:21:05 UTC
*** Bug 105841 has been marked as a duplicate of this bug. ***
Comment 16 Jani Nikula 2018-04-10 07:58:08 UTC
Please attach /sys/kernel/debug/dri/0/i915_vbt on the affected machine.
Comment 17 Abhay Kumar 2018-04-10 19:00:19 UTC
Created attachment 138743 [details]
vbt file output of "attach /sys/kernel/debug/dri/0/i915_vbt"
Comment 18 Jani Saarinen 2018-04-20 14:29:12 UTC
cc Imre
Comment 19 Jani Saarinen 2018-05-04 09:49:19 UTC
Jani, do you see anything from info provided?
Comment 20 Abhay Kumar 2018-05-04 17:07:00 UTC
Last patch needed to resolve this is in review.

https://patchwork.kernel.org/patch/10370679/
Comment 21 Abhay Kumar 2018-08-18 19:33:26 UTC
Hi,

 we had this https://patchwork.kernel.org/patch/10480401/ which can solve this issue with bumping cdclk while probing at bootup and once probing done bump down cdclk. This will save power and also workaround audio architecture issue.

Problem with this approach is that it needs modeset and due to that at bootup it flicker twice and also takes more time to comeup which is not acceptable both in terms of time and user experience. 

we need to comeup with some solution where we can chagne cdclk without modeset and also audio working with cdclk toggle.

Thanks.
Comment 22 Ville Syrjala 2018-08-23 14:21:27 UTC
(In reply to Abhay Kumar from comment #21)
> Hi,
> 
>  we had this https://patchwork.kernel.org/patch/10480401/ which can solve
> this issue with bumping cdclk while probing at bootup and once probing done
> bump down cdclk. This will save power and also workaround audio architecture
> issue.
> 
> Problem with this approach is that it needs modeset and due to that at
> bootup it flicker twice and also takes more time to comeup which is not
> acceptable both in terms of time and user experience. 
> 
> we need to comeup with some solution where we can chagne cdclk without
> modeset and also audio working with cdclk toggle.

We cannot change cdclk without a modeset. That's just how the hw works.
Comment 23 Abhay Kumar 2018-08-29 16:45:06 UTC
Thanks, Ville for the patch.https://patchwork.freedesktop.org/patch/246133/


However, we have two issue to address before we close this
1. get rid of cdclk mismatch warning when doing suspend resume.
2. improvement suggested by ville for atomic commit.
Comment 24 Lakshmi 2018-10-19 12:43:10 UTC
Abhay, Any updates here?
Comment 25 Jani Saarinen 2018-11-08 19:44:32 UTC
Abhay, ping.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.