Bug 100125 - [KBL][BAT] gem_exec_suspend@basic-s4-devices dmesg warning
Summary: [KBL][BAT] gem_exec_suspend@basic-s4-devices dmesg warning
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: highest critical
Assignee: David Weinehall
QA Contact: David Weinehall
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 100428
Depends on:
Blocks:
 
Reported: 2017-03-08 20:21 UTC by Jani Saarinen
Modified: 2017-07-17 08:35 UTC

See Also:
i915 platform: KBL, SKL, SNB
i915 features: power/suspend-resume


Description Jani Saarinen 2017-03-08 20:21:11 UTC
In CI, gem_exec_suspend@basic-s4-devices causes a dmesg warning on the Joule bxt-t5700 machine.

https://intel-gfx-ci.01.org/CI/CI_DRM_2306/fi-bxt-t5700/igt@gem_exec_suspend@basic-s4-devices.html

Dmesg:
[  468.379071] Suspending console(s) (use no_console_suspend to debug)
[  473.772679] usb usb1: root hub lost power or was reset
[  473.772881] usb usb2: root hub lost power or was reset
[  474.370142] ------------[ cut here ]------------
[  474.370183] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x225/0x230
[  474.370197] Modules linked in: ax88179_178a usbnet mii x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mei_me snd_pcm mei mmc_block i915 sdhci_pci sdhci mmc_core prime_numbers i2c_hid pinctrl_broxton pinctrl_intel
[  474.370409] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc1-CI-CI_DRM_2306+ #1
[  474.370412] Hardware name: Intel Corp. Broxton M/SDS, BIOS GTPPA16A.X64.0143.B30.1608112014 08/11/2016
[  474.370414] Call Trace:
[  474.370417]  <IRQ>
[  474.370425]  dump_stack+0x67/0x92
[  474.370432]  __warn+0xc6/0xe0
[  474.370437]  warn_slowpath_fmt+0x4a/0x50
[  474.370445]  dev_watchdog+0x225/0x230
[  474.370449]  ? qdisc_rcu_free+0x40/0x40
[  474.370452]  ? qdisc_rcu_free+0x40/0x40
[  474.370456]  call_timer_fn+0x92/0x380
[  474.370459]  ? process_timeout+0x10/0x10
[  474.370463]  ? qdisc_rcu_free+0x40/0x40
[  474.370467]  expire_timers+0x150/0x1f0
[  474.370472]  run_timer_softirq+0x7c/0x160
[  474.370480]  __do_softirq+0x116/0x4c0
[  474.370486]  irq_exit+0xa9/0xc0
[  474.370491]  smp_apic_timer_interrupt+0x38/0x50
[  474.370496]  apic_timer_interrupt+0x90/0xa0
[  474.370502] RIP: 0010:cpuidle_enter_state+0x135/0x380
[  474.370505] RSP: 0018:ffffc90000087e88 EFLAGS: 00000216 ORIG_RAX: ffffffffffffff10
[  474.370510] RAX: ffff88017a878040 RBX: 00000000080f0a0c RCX: 0000000000000001
[  474.370512] RDX: 0000000000000000 RSI: ffffffff81ca163e RDI: ffffffff81c7ce58
[  474.370515] RBP: ffffc90000087ec0 R08: ffff88017fd16f84 R09: 0000000000000018
[  474.370517] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000007
[  474.370520] R13: ffff88017fd236e0 R14: ffffffff81ec5798 R15: 0000006e6a99133f
[  474.370522]  </IRQ>
[  474.370532]  ? cpuidle_enter_state+0x131/0x380
[  474.370538]  cpuidle_enter+0x12/0x20
[  474.370542]  call_cpuidle+0x1e/0x40
[  474.370545]  do_idle+0x17e/0x1f0
[  474.370549]  cpu_startup_entry+0x18/0x20
[  474.370553]  start_secondary+0x102/0x120
[  474.370559]  start_cpu+0x14/0x14
[  474.370568] ---[ end trace e3af8012fdbe43a9 ]---
[  480.516639] xhci_hcd 0000:00:15.0: WARN: unexpected TRB Type 4
Comment 2 Chris Wilson 2017-03-16 12:34:23 UTC
[  474.370191] NETDEV WATCHDOG: enx000acd2892fb (ax88179_178a): transmit queue 0 timed out
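The line above names the interface and the driver behind the dev_watchdog warning. As a minimal sketch, assuming the message keeps the wording shown in this one log line, those fields could be pulled out of dmesg output as follows. The pattern and the function name parse_watchdog are illustrative and not part of any CI tooling.

```python
import re

# Pattern modelled on the single log line quoted above (an assumption);
# other kernel versions may word the message differently.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def parse_watchdog(line):
    """Return (interface, driver, queue) for a watchdog line, else None."""
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    return match.group("iface"), match.group("driver"), int(match.group("queue"))


line = (
    "[  474.370191] NETDEV WATCHDOG: enx000acd2892fb (ax88179_178a): "
    "transmit queue 0 timed out"
)
print(parse_watchdog(line))
```

Keying on the driver name would make it possible to tell warnings raised by the USB network dongle apart from warnings raised by i915.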
Comment 4 Martin Peres 2017-03-23 10:01:37 UTC
Raising the priority, because it reduces our code coverage.

Failure rate 16/123 run(s) (13%)
Comment 5 Imre Deak 2017-03-29 11:18:29 UTC
*** Bug 100428 has been marked as a duplicate of this bug. ***
Comment 6 Martin Peres 2017-03-29 11:38:20 UTC
Also seen on fi-snb-2600 and fi-kbl-7560u.
Comment 7 Martin Peres 2017-03-31 11:50:34 UTC
Updated failure statistics:

 - bxt-t5700: Failure rate 21/184 run(s) (11%)
 - fi-kbl-7560u: Failure rate 16/41 run(s) (39%)
 - fi-snb-2600: Failure rate 2/22 run(s) (9%)
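The percentages above follow from the run counts. A minimal sketch that reproduces them, assuming they are rounded to the nearest whole percent; the helper name failure_rate is hypothetical.

```python
# Hypothetical helper: recompute the failure percentages quoted above.
def failure_rate(failures, runs):
    """Return the failure rate as a whole-number percentage."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return round(100 * failures / runs)


# Counts copied from the statistics in this comment.
stats = {
    "bxt-t5700": (21, 184),
    "fi-kbl-7560u": (16, 41),
    "fi-snb-2600": (2, 22),
}

for machine, (failures, runs) in stats.items():
    rate = failure_rate(failures, runs)
    print(f"{machine}: Failure rate {failures}/{runs} run(s) ({rate}%)")
```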
Comment 8 Maarten Lankhorst 2017-04-13 09:51:43 UTC
KBL is failing for a different reason in S4:

[  272.313155] [drm:intel_sbi_read [i915]] *ERROR* error during SBI read of reg 2a00
[  272.313182] [drm:intel_sbi_write [i915]] *ERROR* error during SBI write of 0 to reg 2a00
Comment 9 Jani Saarinen 2017-05-02 09:27:27 UTC
KBL now fails for a different reason on fi-kbl-7500u (igt@gem_exec_suspend@basic-s4-devices):
https://intel-gfx-ci.01.org/CI/CI_DRM_2569/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html

[  242.936931] [drm:intel_dp_aux_ch [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[  242.936950] [drm:intel_dp_aux_ch [i915]] *ERROR* dp_aux_ch not done status 0xac1003ff
Comment 10 Jari Tahvanainen 2017-05-02 10:22:26 UTC
Jani, please create a new bug for this new failure. Let's not mix several issues in one bug.
Comment 11 Jani Saarinen 2017-05-02 10:30:30 UTC
Yep, will do.
Comment 12 Jani Saarinen 2017-05-02 10:38:07 UTC
(In reply to Jani Saarinen from comment #11)
> Yep, will do.

This will be tracked in https://bugs.freedesktop.org/show_bug.cgi?id=100904
Comment 13 Ricardo 2017-05-09 16:45:21 UTC
Adding the "ReadyForDev" tag to the "Whiteboard" field.
The bug is still active.
* Status is correct
* Platform is included
* Feature is included
* Priority and Severity are correctly set
Comment 14 David Weinehall 2017-05-18 14:33:26 UTC
Tomi replaced the last ax88179_178a USB-net dongle in CI yesterday, so this particular warning *should* be fixed (of course the "real" fix would be for the ax88179_178a driver to handle power management properly, but that's out of our hands).

The "error during SBI read" is a different issue and should be reported separately (I can only see that one in the logs for BDW GVT-D though, not KBL?).

SBI_DBUFF0 (0x2a00) seems to be specific to LynxPoint, though; it shouldn't be possible on anything other than Haswell & Broadwell.

@Maarten: Do you have a link to logs where the SBI error occurred on KBL?

Tentatively marking this one as fixed.
Comment 15 Jani Saarinen 2017-05-18 18:09:14 UTC
Still an issue on SNB: https://patchwork.freedesktop.org/series/24635/
Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-snb-2600) fdo#100125

KBL may now be fixed.
Comment 17 Jani Saarinen 2017-06-07 10:50:43 UTC
Issues are still being seen:
https://intel-gfx-ci.01.org/CI/igt@gem_exec_suspend@basic-s4-devices.html
Comment 18 Jari Tahvanainen 2017-07-04 11:51:26 UTC
Still a problem on KBL and SKL (SNB to be followed up):
fi-kbl-r: 30 minutes / 0 runs ago, with result 'dmesg-warn'
fi-kbl-7560u: 3 hours / 1 run ago, with result 'dmesg-warn'
fi-skl-6600u: 1 day / 5 runs ago, with result 'dmesg-warn'
fi-snb-2600: 2017-06-21, with result 'dmesg-warn'
Removing BXT from the platforms.
Comment 19 Martin Peres 2017-07-17 08:35:19 UTC
This bug is not going anywhere, so I moved it here: https://bugzilla.kernel.org/show_bug.cgi?id=196399

