Bug 107139

Summary: [CI] KBL-G Hades Canyon doesn't survive igt@gem_exec_suspend@basic-s4-devices
Product: DRI
Component: DRM/Intel
Status: RESOLVED DUPLICATE
Severity: normal
Priority: medium
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: KBL
i915 features:
Reporter: Tomi Sarvela <tomi.p.sarvela>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs

Description Tomi Sarvela 2018-07-06 12:59:36 UTC
Intel Hades Canyon (Kaby Lake-G CPU with Vega-M graphics) appears to panic during the IGT test igt@gem_exec_suspend@basic-s4-devices.

Panic traces available:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4443/fi-kbl-8809g/pstore0-1530866512_Panic_1.log

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4444/fi-kbl-8809g/pstore0-1530875939_Panic_1.log
Comment 1 Chris Wilson 2018-07-06 13:03:28 UTC
The system hung after resume and owatch panicked, judging by the 60s gap in timestamps. I'd postulate it's related to the warnings we see if it does "successfully" resume.
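The "60s gap in timestamps" check can be automated when scanning a saved panic log. The following is a minimal sketch, not part of the CI tooling: the function name `find_gaps` and the default threshold are assumptions chosen for illustration, and it only relies on the standard `[seconds.microseconds]` prefix that printk puts on each line.

```python
import re

# Matches the printk timestamp prefix, e.g. "[  201.720189]",
# optionally preceded by a log-level marker such as "<3> ".
TIMESTAMP = re.compile(r"^\s*(?:<\d+>\s*)?\[\s*(\d+\.\d+)\]")


def find_gaps(lines, threshold=60.0):
    """Return (previous, current) timestamp pairs at least `threshold` seconds apart.

    `threshold` is a hypothetical default picked to match the 60s gap
    mentioned in the comment above.
    """
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        current = float(match.group(1))
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Calling `find_gaps(open(path))` on a downloaded pstore log (with `path` being wherever the file was saved) would list the places where the kernel stopped logging for a minute or more.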
Comment 2 Tomi Sarvela 2018-07-11 08:29:35 UTC
Trace caught. Moving the bug to the DRM/AMDgpu side.

[  201.720189] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[  201.781782] done (allocated 209842 pages)
[  202.536562] DMAR: DRHD: handling fault status reg 3
[  202.536585] DMAR: [DMA Read] Request device [01:00.0] fault addr 527000 [fault reason 06] PTE Read access is not set
[  203.554655] [drm:amdgpu_vce_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[  203.554684] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 15 (-110).
[  203.554687] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[  207.647706] usb usb1: root hub lost power or was reset
[  207.647709] usb usb2: root hub lost power or was reset
[  207.648779] usb usb3: root hub lost power or was reset
[  207.648783] usb usb4: root hub lost power or was reset
[  207.977164] [drm:uvd_v6_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed (0xCAFEDEAD)
[  207.977177] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -22
[  207.977189] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
[  207.977192] dpm_run_callback(): pci_pm_restore+0x0/0xa0 returns -22
[  207.977198] PM: Device 0000:01:00.0 failed to restore async: error -22
[  208.256034] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[  208.256036] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[  208.459780] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[  208.459782] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[  208.664209] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[  208.664211] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[  208.671947] Setting dangerous option reset - tainting kernel
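
To pick the failing amdgpu steps out of an excerpt like the one above, the `*ERROR*` lines can be filtered and their error codes named. This is a rough sketch under stated assumptions: `drm_errors` is a hypothetical helper, the line layout is inferred only from the lines quoted here, and the errno table covers just the two Linux values that appear in this log (-110 and -22).

```python
import re

# Linux errno values that appear in the excerpt (asm-generic numbering).
ERRNO_NAMES = {22: "EINVAL", 110: "ETIMEDOUT"}

# "[  203.554684] [drm:func [module]] *ERROR* message"; the module part is optional.
DRM_ERROR = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"\[drm:(?P<func>\w+)(?: \[(?P<module>\w+)\])?\]\s+"
    r"\*ERROR\*\s+(?P<message>.*)"
)

# A trailing negative errno, written either as "(-110)." or as "failed -22".
TRAILING_ERRNO = re.compile(r"-(\d+)\)?\.?$")


def drm_errors(lines):
    """Return (time, function, module, errno_name) for each DRM *ERROR* line."""
    results = []
    for line in lines:
        match = DRM_ERROR.search(line)
        if not match:
            continue
        code = TRAILING_ERRNO.search(match.group("message"))
        # errno_name is None when the line carries no code or an unlisted one.
        name = ERRNO_NAMES.get(int(code.group(1))) if code else None
        results.append(
            (float(match.group("time")), match.group("func"), match.group("module"), name)
        )
    return results
```

Applied to the excerpt, this would report the IB ring test failures as ETIMEDOUT and the `uvd_v6_0` resume failure as EINVAL, which matches the order of events in the log: the ring tests time out first, then the device restore fails with -22.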

Full trace:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4467/fi-kbl-8809g/igt@gem_exec_suspend@basic-s4-devices.html

History:
https://intel-gfx-ci.01.org/tree/drm-tip/fi-kbl-8809g.html

Hardware: Intel Hades Canyon NUC8i7HVK (Kaby Lake CPU with "Vega-M" graphics)

Kconfig: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4467/kernel.config.bz2
Comment 3 ashutosh.dixit 2019-10-24 20:58:05 UTC
Bug assessment: The IGT test submits batch buffers both before and after hibernating the system and ensures they complete successfully. This bug has not been updated in over a year. The system is a KBL NUC with an AMD GPU. From the earlier comments, it seems there was a panic on the amdgpu side, and the component was assigned as DRM/AMDgpu.

At this point, from what I can tell, the test itself is passing except for the following trace in dmesg:

<3> [76.484258] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout!

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5237/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html

This trace comes from the i915 DisplayPort code and is already being tracked as part of bug 105128. As a result of these findings I am:

a. Setting the component of this bug to DRM/Intel
b. Marking this bug as a duplicate of bug 105128
Comment 4 ashutosh.dixit 2019-10-24 21:01:41 UTC

*** This bug has been marked as a duplicate of bug 105128 ***
