Bug 109189 - [CI][DRMTIP] igt@pm_rpm@system-suspend-* - fail - <3> ...: azx_get_response timeout, switching to single_cmd mode
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-31 10:29 UTC by Martin Peres
Modified: 2019-07-09 08:02 UTC

See Also:
i915 platform: BYT, ICL
i915 features: display/audio


Attachments
Dump additional logs (452 bytes, text/x-sh)
2019-02-19 11:17 UTC, Cezary Rojewski
Dump additional traces (557 bytes, text/x-sh)
2019-02-19 11:17 UTC, Cezary Rojewski

Description Martin Peres 2018-12-31 10:29:19 UTC
Starting subtest: system-suspend-execbuf
(pm_rpm:1305) CRITICAL: Test assertion failure function system_suspend_execbuf_subtest, file ../tests/pm_rpm.c:1476:
(pm_rpm:1305) CRITICAL: Failed assertion: wait_for_suspended()
Subtest system-suspend-execbuf failed.

<4> [307.965661] snd_hda_intel 0000:00:1b.0: No response from codec, disabling MSI: last cmd=0x204f0900
<3> [308.973666] snd_hda_intel 0000:00:1b.0: azx_get_response timeout, switching to single_cmd mode: last cmd=0x204f0900
<7> [308.974737] [drm:i915_audio_component_get_eld [i915]] Not valid for port B
<6> [308.986686] acpi LNXPOWER:02: Turning OFF
<6> [308.988673] acpi LNXPOWER:01: Turning OFF
<6> [308.989711] Bluetooth: hci0: RTL: rtl: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
<6> [308.990539] acpi LNXPOWER:00: Turning OFF
<6> [308.991186] OOM killer enabled.
<6> [308.991197] Restarting tasks ...
<6> [308.992383] Bluetooth: hci0: RTL: rom_version status=0 version=1
<6> [308.992415] Bluetooth: hci0: RTL: rtl: loading rtl_bt/rtl8723b_fw.bin
<4> [308.994397] done.
<6> [309.002877] Bluetooth: hci0: RTL: rtl: loading rtl_bt/rtl8723b_config.bin
<4> [309.003231] bluetooth hci0: Direct firmware load for rtl_bt/rtl8723b_config.bin failed with error -2
<6> [309.003284] Bluetooth: hci0: RTL: cfg_sz -2, total sz 22496
<6> [309.009486] PM: suspend exit
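
For anyone triaging similar runs, a log in the format above can be scanned for the snd_hda_intel timeout automatically. The following is only a minimal sketch: it assumes each line looks like `<level> [timestamp] message`, as in this excerpt, and the function names `parse_line` and `find_hda_timeouts` are made up for illustration rather than taken from the CI tooling.

```python
import re

# Assumed line shape, taken from the excerpt above: "<3> [308.973666] message".
LINE_RE = re.compile(r"^<(?P<level>\d)>\s*\[\s*(?P<ts>\d+\.\d+)\]\s(?P<msg>.*)$")


def parse_line(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return int(match["level"]), float(match["ts"]), match["msg"]


def find_hda_timeouts(lines):
    """Collect (timestamp, message) for every snd_hda_intel response timeout."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # blank lines and test-runner output are skipped
        _level, timestamp, message = parsed
        if "snd_hda_intel" in message and "azx_get_response timeout" in message:
            hits.append((timestamp, message))
    return hits
```

Passing the lines of a saved dmesg dump to `find_hda_timeouts` returns one entry per timeout, which makes it easy to check whether a given run hit this signature.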
Comment 2 Cezary Rojewski 2019-02-19 11:17:11 UTC
Created attachment 143409
Dump additional logs
Comment 3 Cezary Rojewski 2019-02-19 11:17:35 UTC
Created attachment 143410
Dump additional traces
Comment 4 Cezary Rojewski 2019-02-19 11:20:52 UTC
Hello,

After thousands of iterations we are still unable to reproduce the issue.
I am attaching simple debug and trace scripts that dump additional driver logs.

Please retest using the following steps:
1.	Upload dmesg_logs.sh and trace_logs.sh to the tested machine
2.	Run chmod +x on both files
3.	Start "./dmesg_logs.sh > dmesg.txt" in the first terminal and "./trace_logs.sh > trace.txt" in a second terminal
4.	Note: the scripts call some commands with sudo, so you may need to enter a password
5.	Execute your test
6.	Interrupt the scripts with Ctrl+C
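
Steps 3 to 6 can also be driven from a single wrapper instead of two terminals. This is a rough sketch only: the contents of the attached scripts are not shown in this thread, so the helper `run_with_capture` and the commands in the usage comment are assumptions for illustration, not a reproduction of dmesg_logs.sh or trace_logs.sh.

```python
import signal
import subprocess


def run_with_capture(test_cmd, captures):
    """Run test_cmd while background capture commands write to files.

    captures maps an output path to the capture command (an argv list).
    Returns the exit code of test_cmd.
    """
    procs, files = [], []
    try:
        for path, cmd in captures.items():
            out = open(path, "w")
            files.append(out)
            procs.append(subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT))
        # Step 5: execute the test while the captures are running.
        return subprocess.run(test_cmd).returncode
    finally:
        # Step 6: interrupt the captures, the same signal Ctrl+C would send.
        for proc in procs:
            proc.send_signal(signal.SIGINT)
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        for out in files:
            out.close()


# Hypothetical usage (commands are assumptions, adjust to the real scripts):
# run_with_capture(
#     ["sudo", "./pm_rpm", "--run-subtest", "system-suspend-execbuf"],
#     {"dmesg.txt": ["./dmesg_logs.sh"], "trace.txt": ["./trace_logs.sh"]},
# )
```

Starting the captures before the test and stopping them in a `finally` block means the logs are still closed cleanly when the test itself fails, which is the case of interest here.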
Comment 5 Martin Peres 2019-02-19 14:49:40 UTC
Sorry for forgetting to add the link of the ICL failure. After a fair bit of archaeology, I found it: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5164/shard-iclb1/igt@pm_rpm@system-suspend-modeset.html

(In reply to cezary.j.rojewski from comment #4)

Thanks for doing this.

So far, we have had a reproduction rate of ~1/500. Which test did you execute in a loop? Was it on ICL (please do not disclose the HW revision here, though)? How many thousand times did you run the test?

Thanks to your work, we will be able to drop the priority of this issue for ICL, but the problem is still very visible on our fi-byt-j1900[1], which you probably still want to look into.

[1] https://intel-gfx-ci.01.org/hardware.html#fi-byt-j1900 , https://www.gigabyte.com/Mini-PcBarebone/GB-BXBT-1900-rev-10
Comment 6 Ida 2019-03-15 10:42:25 UTC
We tried to reproduce this issue on ICL U.
We installed pm-graph:
https://01.org/pm-graph/downloads/pm-graph-v5.3
Then we ran analyze_suspend.py 2k times:
sudo ./analyze_suspend.py -rtcwake 3 -multi 2000 1 -m mem

We could not reproduce the issue.
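
As a rough sanity check on what 2000 clean iterations mean against the ~1/500 rate quoted in comment 5, the chance of seeing no failure at all can be estimated. This sketch assumes independent runs with a constant per-run failure probability, and that the CI rate carries over to the analyze_suspend.py loop, which this thread does not establish. The function names are illustrative.

```python
import math


def prob_no_failure(p, n):
    """Probability of zero failures in n independent runs, each failing with probability p."""
    return (1.0 - p) ** n


def runs_needed(p, confidence):
    """Smallest n such that at least one failure shows up with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# With p = 1/500, 2000 clean runs would happen by chance a bit under 2% of the time.
print(prob_no_failure(1 / 500, 2000))
# About 1500 runs are needed to expect a reproduction with 95% confidence.
print(runs_needed(1 / 500, 0.95))
```

Under those assumptions, 2000 runs without a failure is fairly strong evidence that this setup does not hit the issue at the CI rate, though it says nothing about whether the two workloads exercise the same path.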
Comment 7 Martin Peres 2019-04-23 12:30:28 UTC
The failure has not been seen on ICL since. As for BYT, we used to see the issue once every 4.5 runs on average, but it has not been seen since drmtip_240. Let's wait until drmtip_285 before closing!
Comment 8 Lakshmi 2019-07-09 08:01:52 UTC
The current drmtip is 321 and the issue has still not been seen. Closing this issue as WORKSFORME.
Comment 9 CI Bug Log 2019-07-09 08:02:15 UTC
The CI Bug Log issue associated with this bug has been archived.

New failures matching the above filters will no longer be associated with this bug.

