Bug 76152

Summary: [IVB/HSW/BDW bisected]WARNING: CPU: 3 PID: 4178 at drivers/gpu/drm/i915/intel_pm.c:5570 check_power_well_state+0x69/0x6f [i915]() after resume form S3
Product: DRI Reporter: Guo Jinxian <jinxianx.guo>
Component: DRM/IntelAssignee: Imre Deak <imre.deak>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg
none
resume power wells during audio resume
none
dmesg
none
move the i915 resume handling before audio resume
none
dmesg on next-queued 4726e0b
none
Call trace on latest next-queued none

Description Guo Jinxian 2014-03-14 04:07:57 UTC
Created attachment 95769 [details]
dmesg

system Environment:
--------------------------
platform: BDW HSW IVB 
Kernel:	(drm-intel-next-queued) 065f2ec2afc850960dcebc3b00766bc31c4ffd3b

Bug detailed description:
-------------------------
CallTrace"WARNING: CPU: 3 PID: 4178 at drivers/gpu/drm/i915/intel_pm.c:5570 check_power_well_state+0x69/0x6f [i915]() after resume form S3" after resume form S3

It's a regression bug.

Reproduce steps:
--------------------
1. echo mem > /sys/power/state
2. resume the OS
3. dmesg|grep Call

CallTrace
--------------------
[   55.632355] WARNING: CPU: 3 PID: 4178 at drivers/gpu/drm/i915/intel_pm.c:5570 check_power_well_state+0x69/0x6f [i915]()
[   55.632358] state mismatch for 'display' (always_on 0 hw state 0 use-count 2 disable_power_well 1
[   55.632378] Modules linked in: dm_mod iTCO_wdt pcspkr iTCO_vendor_support i2c_i801 snd_hda_codec_hdmi lpc_ich mfd_core snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore battery ppdev parport_pc parport ac acpi_cpufreq i915 video button drm_kms_helper drm
[   55.632383] CPU: 3 PID: 4178 Comm: kworker/u16:11 Not tainted 3.14.0-rc6_drm-intel-next-queued_065f2e_20140313+ #682
[   55.632394] Workqueue: events_unbound async_run_entry_fn
[   55.632399]  0000000000000000 0000000000000009 ffffffff817170e3 ffff88009edf1cb8
[   55.632402]  ffffffff81035052 0000000000000000 ffffffffa0063f69 ffff88014f4d16c0
[   55.632405]  ffffffffa00e7f18 ffffffffa00e7f18 0000000000000001 0000000000400000
[   55.632406] Call Trace:
[   55.632418]  [<ffffffff817170e3>] ? dump_stack+0x41/0x51
[   55.632423]  [<ffffffff81035052>] ? warn_slowpath_common+0x73/0x8b
[   55.632437]  [<ffffffffa0063f69>] ? check_power_well_state+0x69/0x6f [i915]
[   55.632442]  [<ffffffff81035102>] ? warn_slowpath_fmt+0x45/0x4a
[   55.632455]  [<ffffffffa0063f69>] ? check_power_well_state+0x69/0x6f [i915]
[   55.632470]  [<ffffffffa00683f0>] ? intel_display_power_get+0x7f/0xb6 [i915]
[   55.632476]  [<ffffffffa0221715>] ? azx_resume+0x35/0x103 [snd_hda_intel]
[   55.632482]  [<ffffffff812f9297>] ? pci_pm_restore+0x94/0x94
[   55.632488]  [<ffffffff81381d69>] ? dpm_run_callback.isra.8+0x24/0x52
[   55.632493]  [<ffffffff813822d4>] ? device_resume+0x10b/0x14c
[   55.632498]  [<ffffffff81382329>] ? async_resume+0x14/0x38
[   55.632502]  [<ffffffff8104f4bf>] ? async_run_entry_fn+0x55/0x10b
[   55.632506]  [<ffffffff81046660>] ? process_one_work+0x1bc/0x2ed
[   55.632510]  [<ffffffff81046bce>] ? worker_thread+0x1c7/0x2bc
[   55.632514]  [<ffffffff81046a07>] ? rescuer_thread+0x251/0x251
[   55.632518]  [<ffffffff8104b53e>] ? kthread+0xc5/0xcd
[   55.632524]  [<ffffffff8104b479>] ? kthread_freezable_should_stop+0x40/0x40
[   55.632530]  [<ffffffff817216bc>] ? ret_from_fork+0x7c/0xb0
[   55.632535]  [<ffffffff8104b479>] ? kthread_freezable_should_stop+0x40/0x40
[   55.632538] ---[ end trace 8103dbb1bb4c1e34 ]---
[   55.632820] e1000e 0000:00:19.0: System wakeup disabled by ACPI
[   55.632866] e1000e 0000:00:19.0: irq 57 for MSI/MSI-X
[   55.632877] snd_hda_intel 0000:00:03.0: irq 60 for MSI/MSI-X
Comment 1 Daniel Vetter 2014-03-14 06:18:19 UTC
We've had a massive pile of runtime pm patches and Paulo is now on vacation ... can you please bisect?
Comment 2 Guo Jinxian 2014-03-14 07:34:14 UTC
(In reply to comment #1)
> We've had a massive pile of runtime pm patches and Paulo is now on vacation
> ... can you please bisect?

The first bad commit is 8b48463f89429af408ff695244dc627e1acff4f7, thanks.
commit 8b48463f89429af408ff695244dc627e1acff4f7
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Tue Dec 3 08:49:16 2013 +0800

    ACPI: Clean up inclusions of ACPI header files
Comment 3 Imre Deak 2014-03-14 13:37:03 UTC
Created attachment 95803 [details] [review]
resume power wells during audio resume

Looks as if the audio resume handler ran before the i915 one. After resume the HW state is not in sync with the SW state, so we have to sync them manually. This is nicely shown now after we added the state checks for power domains (commit 25eaa003bd186e4), but I suspect this issue has been always around and could cause the audio resume path to run with the required power well disabled. It will get enabled as soon as the i915 resume handler gets to run though.

I don't think this can happen on IVB, there the audio driver won't ask for the power well to begin with.

The attached patch should fix this (only compile tested), could you try it?
Comment 4 Guo Jinxian 2014-03-17 08:05:04 UTC
Created attachment 95912 [details]
dmesg

The patch unable to boot, there shows many Call Trace errors during booting. please check the attached dmesg for details. thanks.
Comment 5 Imre Deak 2014-03-19 12:25:47 UTC
Created attachment 96038 [details] [review]
move the i915 resume handling before audio resume

(In reply to comment #4)
> Created attachment 95912 [details]
> dmesg
> 
> The patch unable to boot, there shows many Call Trace errors during booting.
> please check the attached dmesg for details. thanks.

That was due to another bug fixed now by:

commit 05efeebd2838f0dedf765179244a7fb543fdca8a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Mar 18 16:26:25 2014 +0100

    drm/i915: Fix up the forcewake timer initialization

I also found that the fix I attached didn't do a proper resume of the i915 device, it left the pci device itself disabled.

I attached a new fix and it should get rid of the warning in this bug report. Could you give it a try? Note that there can be other independent warnings that you see even after applying this patch. The ones you see both before and after the fix is applied should be handled separately from this bugzilla ticket.
Comment 6 Guo Jinxian 2014-03-21 02:36:59 UTC
(In reply to comment #5)
> Created attachment 96038 [details] [review] [review]
> move the i915 resume handling before audio resume
> 
> (In reply to comment #4)
> > Created attachment 95912 [details]
> > dmesg
> > 
> > The patch unable to boot, there shows many Call Trace errors during booting.
> > please check the attached dmesg for details. thanks.
> 
> That was due to another bug fixed now by:
> 
> commit 05efeebd2838f0dedf765179244a7fb543fdca8a
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Tue Mar 18 16:26:25 2014 +0100
> 
>     drm/i915: Fix up the forcewake timer initialization
> 
> I also found that the fix I attached didn't do a proper resume of the i915
> device, it left the pci device itself disabled.
> 
> I attached a new fix and it should get rid of the warning in this bug
> report. Could you give it a try? Note that there can be other independent
> warnings that you see even after applying this patch. The ones you see both
> before and after the fix is applied should be handled separately from this
> bugzilla ticket.

this patch had merged into latest next-queued branch(4726e0b045b80c514377da35ca01467ef6a4de53), this CallTrace unable to reproduce, bug another CallTrace still exist.
"WARNING: CPU: 0 PID: 1417 at drivers/gpu/drm/drm_modes.c:119 drm_mode_probed_add+0x27/0x41 [drm]()"

And here has some other CallTraces after booting. please check dmesg in attachment for details.
Comment 7 Guo Jinxian 2014-03-21 02:38:35 UTC
Created attachment 96133 [details]
dmesg on next-queued 4726e0b
Comment 8 Chris Wilson 2014-03-21 09:32:20 UTC
Original bug fixed, new WARN is tracked elsewhere.
Comment 9 Guo Jinxian 2014-03-27 03:31:43 UTC
Checked on latest -next-queued(), this bug still can be reproduce.

[   41.430017] WARNING: CPU: 1 PID: 4206 at drivers/gpu/drm/i915/intel_pm.c:5569 check_power_well_state+0x69/0x6f [i915]()
[   41.430019] state mismatch for 'display' (always_on 0 hw state 0 use-count 2 disable_power_well 1
[   41.430034] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 snd_hda_codec_hdmi lpc_ich mfd_core snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore battery ppdev parport_pc parport ac acpi_cpufreq i915 video button drm_kms_helper drm
[   41.430037] CPU: 1 PID: 4206 Comm: kworker/u16:11 Tainted: G        W    3.14.0-rc7_drm-intel-next-queued_57e00b_20140326+ #1042
[   41.430044] Workqueue: events_unbound async_run_entry_fn
[   41.430047]  0000000000000000 0000000000000009 ffffffff81716de3 ffff8800a1631cb8
[   41.430049]  ffffffff81035052 0000000000000000 ffffffffa0064da8 ffff88014f4116c0
[   41.430052]  ffffffffa00e8fd8 ffffffffa00e8fd8 0000000000000001 0000000000400000
[   41.430052] Call Trace:
[   41.430058]  [<ffffffff81716de3>] ? dump_stack+0x41/0x51
[   41.430061]  [<ffffffff81035052>] ? warn_slowpath_common+0x73/0x8b
[   41.430070]  [<ffffffffa0064da8>] ? check_power_well_state+0x69/0x6f [i915]
[   41.430074]  [<ffffffff81035102>] ? warn_slowpath_fmt+0x45/0x4a
[   41.430087]  [<ffffffffa0064da8>] ? check_power_well_state+0x69/0x6f [i915]
[   41.430101]  [<ffffffffa006928d>] ? intel_display_power_get+0x85/0xbd [i915]
[   41.430105]  [<ffffffffa01de715>] ? azx_resume+0x35/0x103 [snd_hda_intel]
[   41.430110]  [<ffffffff812f91f7>] ? pci_pm_restore+0x94/0x94
[   41.430114]  [<ffffffff81381cb1>] ? dpm_run_callback.isra.8+0x24/0x52
[   41.430118]  [<ffffffff8138221c>] ? device_resume+0x10b/0x14c
[   41.430122]  [<ffffffff81382271>] ? async_resume+0x14/0x38
[   41.430125]  [<ffffffff8104f4bf>] ? async_run_entry_fn+0x55/0x10b
[   41.430129]  [<ffffffff81046660>] ? process_one_work+0x1bc/0x2ed
[   41.430132]  [<ffffffff81046bce>] ? worker_thread+0x1c7/0x2bc
[   41.430134]  [<ffffffff81046a07>] ? rescuer_thread+0x251/0x251
[   41.430138]  [<ffffffff8104b53e>] ? kthread+0xc5/0xcd
[   41.430143]  [<ffffffff8104b479>] ? kthread_freezable_should_stop+0x40/0x40
[   41.430146]  [<ffffffff817213fc>] ? ret_from_fork+0x7c/0xb0
[   41.430150]  [<ffffffff8104b479>] ? kthread_freezable_should_stop+0x40/0x40
[   41.430152] ---[ end trace 47b212a61ac14e5f ]---
[   41.430381] snd_hda_intel 0000:00:03.0: irq 57 for MSI/MSI-X
[   41.430469] e1000e 0000:00:19.0: System wakeup disabled by ACPI
[   41.430746] e1000e 0000:00:19.0: irq 60 for MSI/MSI-X
Comment 10 Guo Jinxian 2014-03-27 03:34:02 UTC
Created attachment 96439 [details]
Call trace on latest next-queued

The latest -next-queued commit is 57e00b2e1b348b3855f4a3b681d583f8c9cf39fb
Comment 11 Daniel Vetter 2014-03-27 08:35:09 UTC
Another one for Paulo to look at ... Paulo, do you have a branch with all the latest stuff?
Comment 12 Imre Deak 2014-03-27 09:13:24 UTC
(In reply to comment #11)
> Another one for Paulo to look at ... Paulo, do you have a branch with all
> the latest stuff?

The attached fix solves the issue based on comment 6.
Comment 13 Imre Deak 2014-04-15 18:27:43 UTC
The fix for this is merged now, so closing it.
Comment 14 Guo Jinxian 2014-05-15 06:40:37 UTC
Fixed on latest -nightly(c74cad3c2599b47438b168ca5629fbb00ab63f95). Thanks.
Comment 15 Elizabeth 2017-10-06 14:39:25 UTC
Closing old verified.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.