Bug 86702 - [HSW] power domain refcount underrun
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-11-25 13:52 UTC by Jonathan McDowell
Modified: 2017-07-24 22:50 UTC

Description Jonathan McDowell 2014-11-25 13:52:45 UTC
Seen on kernel 3.18-rc5 (the + in the version string is a DVB-T2 driver patch, confined to drivers/media/), but I've seen it with previous kernels as well. It seems to happen quite a lot, with different call paths that always end up in intel_display_power_put. The laptop is undocked with only the internal LCD operational. Userspace is Debian/testing (jessie).

First instance on this boot:


Nov 21 08:20:48 mixian kernel: [83274.888687] ------------[ cut here ]------------
Nov 21 08:20:48 mixian kernel: [83274.888693] WARNING: CPU: 1 PID: 865 at drivers/gpu/drm/i915/intel_pm.c:6590 intel_display_power_put+0x14c/0x160()
Nov 21 08:20:48 mixian kernel: [83274.888694] Modules linked in: ctr ccm si2157 si2168 i2c_mux dvb_usb_cxusb dib0070 dvb_usb dvb_core rc_core binfmt_misc bnep nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc arc4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 btusb pcspkr psmouse serio_raw hid_multitouch bluetooth i2c_i801 iwlwifi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 snd_timer xhci_pci xhci_hcd tpm_tis tpm battery ac evdev processor fuse autofs4 algif_skcipher af_alg hid_generic usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sg ehci_pci sdhci_pci ehci_hcd thermal i2c_hid hid sdhci_acpi sdhci mmc_core
Nov 21 08:20:48 mixian kernel: [83274.888724] CPU: 1 PID: 865 Comm: Xorg Not tainted 3.18.0-rc5+ #11
Nov 21 08:20:48 mixian kernel: [83274.888725] Hardware name: Dell Inc. Latitude E7240/0V120R, BIOS A08 02/18/2014
Nov 21 08:20:48 mixian kernel: [83274.888726]  0000000000000000 0000000000000009 ffffffff8169d202 0000000000000000
Nov 21 08:20:48 mixian kernel: [83274.888727]  ffffffff810a7152 ffff880409ae002c ffff880409ae8810 ffff88041e01b000
Nov 21 08:20:48 mixian kernel: [83274.888729]  0000000000000001 ffff880409ae0000 ffffffff8141d98c ffff8803bd715f80
Nov 21 08:20:48 mixian kernel: [83274.888730] Call Trace:
Nov 21 08:20:48 mixian kernel: [83274.888736]  [<ffffffff8169d202>] ? dump_stack+0x41/0x51
Nov 21 08:20:48 mixian kernel: [83274.888739]  [<ffffffff810a7152>] ? warn_slowpath_common+0x72/0x90
Nov 21 08:20:48 mixian kernel: [83274.888741]  [<ffffffff8141d98c>] ? intel_display_power_put+0x14c/0x160
Nov 21 08:20:48 mixian kernel: [83274.888745]  [<ffffffff8146b0cb>] ? intel_crtc_control+0x7b/0x100
Nov 21 08:20:48 mixian kernel: [83274.888746]  [<ffffffff8146b1b6>] ? intel_crtc_update_dpms+0x66/0x80
Nov 21 08:20:48 mixian kernel: [83274.888748]  [<ffffffff81474401>] ? intel_connector_dpms+0x51/0x70
Nov 21 08:20:48 mixian kernel: [83274.888751]  [<ffffffff81405875>] ? drm_mode_obj_set_property_ioctl+0x345/0x350
Nov 21 08:20:48 mixian kernel: [83274.888753]  [<ffffffff814058ac>] ? drm_mode_connector_property_set_ioctl+0x2c/0x40
Nov 21 08:20:48 mixian kernel: [83274.888755]  [<ffffffff813f6d63>] ? drm_ioctl+0x1c3/0x5a0
Nov 21 08:20:48 mixian kernel: [83274.888759]  [<ffffffff81325448>] ? lockref_put_or_lock+0x48/0x80
Nov 21 08:20:48 mixian kernel: [83274.888762]  [<ffffffff811efefc>] ? dput+0x1c/0x1a0
Nov 21 08:20:48 mixian kernel: [83274.888764]  [<ffffffff811ec9f0>] ? do_vfs_ioctl+0x2d0/0x4a0
Nov 21 08:20:48 mixian kernel: [83274.888766]  [<ffffffff810c131c>] ? task_work_run+0x9c/0xd0
Nov 21 08:20:48 mixian kernel: [83274.888767]  [<ffffffff811ecc39>] ? SyS_ioctl+0x79/0x90
Nov 21 08:20:48 mixian kernel: [83274.888770]  [<ffffffff816a373f>] ? int_signal+0x12/0x17
Nov 21 08:20:48 mixian kernel: [83274.888771]  [<ffffffff816a34d2>] ? system_call_fastpath+0x12/0x17
Nov 21 08:20:48 mixian kernel: [83274.888772] ---[ end trace 2646903d2d0b00ab ]---
Nov 21 08:20:48 mixian kernel: [83274.888773] ------------[ cut here ]------------

Other more recent instances (same boot, hence tainted):

[165509.209043] ------------[ cut here ]------------
[165509.209050] WARNING: CPU: 3 PID: 771 at drivers/gpu/drm/i915/intel_pm.c:6594 intel_display_power_put+0xf9/0x160()
[165509.209051] Modules linked in: cdc_acm cpuid ctr ccm si2157 si2168 i2c_mux dvb_usb_cxusb dib0070 dvb_usb dvb_core rc_core binfmt_misc bnep nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc arc4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 btusb pcspkr psmouse serio_raw hid_multitouch bluetooth i2c_i801 iwlwifi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 snd_timer xhci_pci xhci_hcd tpm_tis tpm battery ac evdev processor fuse autofs4 algif_skcipher af_alg hid_generic usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sg ehci_pci sdhci_pci ehci_hcd thermal i2c_hid hid sdhci_acpi sdhci mmc_core
[165509.209102] CPU: 3 PID: 771 Comm: systemd-logind Tainted: G        W      3.18.0-rc5+ #11
[165509.209104] Hardware name: Dell Inc. Latitude E7240/0V120R, BIOS A08 02/18/2014
[165509.209105]  0000000000000000 0000000000000009 ffffffff8169d202 0000000000000000
[165509.209109]  ffffffff810a7152 ffffffff81c82780 ffff880409ae8810 0000000000000000
[165509.209112]  0000000000002000 ffff880409ae0000 ffffffff8141d939 0000000000000041
[165509.209116] Call Trace:
[165509.209121]  [<ffffffff8169d202>] ? dump_stack+0x41/0x51
[165509.209127]  [<ffffffff810a7152>] ? warn_slowpath_common+0x72/0x90
[165509.209130]  [<ffffffff8141d939>] ? intel_display_power_put+0xf9/0x160
[165509.209135]  [<ffffffff81494ec6>] ? intel_hdmi_set_edid+0x56/0xe0
[165509.209138]  [<ffffffff81495038>] ? intel_hdmi_detect+0x48/0xa0
[165509.209142]  [<ffffffff813fda63>] ? status_show+0x33/0x70
[165509.209146]  [<ffffffff814a9c47>] ? dev_attr_show+0x17/0x50
[165509.209150]  [<ffffffff8124a22a>] ? sysfs_kf_seq_show+0xaa/0x150
[165509.209154]  [<ffffffff811fbb7d>] ? seq_read+0xcd/0x3c0
[165509.209158]  [<ffffffff811d97da>] ? vfs_read+0x8a/0x170
[165509.209161]  [<ffffffff811da2bd>] ? SyS_read+0x3d/0xa0
[165509.209166]  [<ffffffff816a34d2>] ? system_call_fastpath+0x12/0x17
[165509.209168] ---[ end trace 2646903d2d0b012d ]---

[167256.763647] ------------[ cut here ]------------
[167256.763653] WARNING: CPU: 3 PID: 865 at drivers/gpu/drm/i915/intel_pm.c:6590 intel_display_power_put+0x14c/0x160()
[167256.763655] Modules linked in: cdc_acm cpuid ctr ccm si2157 si2168 i2c_mux dvb_usb_cxusb dib0070 dvb_usb dvb_core rc_core binfmt_misc bnep nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc arc4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 btusb pcspkr psmouse serio_raw hid_multitouch bluetooth i2c_i801 iwlwifi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 snd_timer xhci_pci xhci_hcd tpm_tis tpm battery ac evdev processor fuse autofs4 algif_skcipher af_alg hid_generic usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sg ehci_pci sdhci_pci ehci_hcd thermal i2c_hid hid sdhci_acpi sdhci mmc_core
[167256.763687] CPU: 3 PID: 865 Comm: Xorg Tainted: G        W      3.18.0-rc5+ #11
[167256.763688] Hardware name: Dell Inc. Latitude E7240/0V120R, BIOS A08 02/18/2014
[167256.763689]  0000000000000000 0000000000000009 ffffffff8169d202 0000000000000000
[167256.763691]  ffffffff810a7152 ffff880409ae002c ffff880409ae8810 0000000000000000
[167256.763693]  0000000000000000 ffff880409ae0000 ffffffff8141d98c 0000000000000000
[167256.763694] Call Trace:
[167256.763699]  [<ffffffff8169d202>] ? dump_stack+0x41/0x51
[167256.763703]  [<ffffffff810a7152>] ? warn_slowpath_common+0x72/0x90
[167256.763705]  [<ffffffff8141d98c>] ? intel_display_power_put+0x14c/0x160
[167256.763708]  [<ffffffff8148bc0f>] ? intel_edp_backlight_power+0x2f/0xa0
[167256.763711]  [<ffffffff8149abc2>] ? intel_backlight_device_update_status+0xe2/0x160
[167256.763713]  [<ffffffff81374a09>] ? brightness_store+0xc9/0xe0
[167256.763716]  [<ffffffff812494e3>] ? kernfs_fop_write+0xe3/0x160
[167256.763720]  [<ffffffff811d996d>] ? vfs_write+0xad/0x1e0
[167256.763721]  [<ffffffff811da35d>] ? SyS_write+0x3d/0xa0
[167256.763724]  [<ffffffff816a34d2>] ? system_call_fastpath+0x12/0x17
[167256.763725] ---[ end trace 2646903d2d0b0132 ]---
[167256.763728] ------------[ cut here ]------------

[168741.159842] ------------[ cut here ]------------
[168741.159848] WARNING: CPU: 2 PID: 16909 at drivers/gpu/drm/i915/intel_pm.c:6590 intel_display_power_put+0x14c/0x160()
[168741.159849] Modules linked in: cdc_acm cpuid ctr ccm si2157 si2168 i2c_mux dvb_usb_cxusb dib0070 dvb_usb dvb_core rc_core binfmt_misc bnep nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc arc4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 btusb pcspkr psmouse serio_raw hid_multitouch bluetooth i2c_i801 iwlwifi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 snd_timer xhci_pci xhci_hcd tpm_tis tpm battery ac evdev processor fuse autofs4 algif_skcipher af_alg hid_generic usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sg ehci_pci sdhci_pci ehci_hcd thermal i2c_hid hid sdhci_acpi sdhci mmc_core
[168741.159882] CPU: 2 PID: 16909 Comm: kworker/2:3 Tainted: G        W      3.18.0-rc5+ #11
[168741.159883] Hardware name: Dell Inc. Latitude E7240/0V120R, BIOS A08 02/18/2014
[168741.159887] Workqueue: events edp_panel_vdd_work
[168741.159888]  0000000000000000 0000000000000009 ffffffff8169d202 0000000000000000
[168741.159890]  ffffffff810a7152 ffff880409ae002c ffff880409ae8810 ffff88041eb127c0
[168741.159891]  0000000000000000 ffff880409ae0000 ffffffff8141d98c ffff8804085a7508
[168741.159893] Call Trace:
[168741.159898]  [<ffffffff8169d202>] ? dump_stack+0x41/0x51
[168741.159901]  [<ffffffff810a7152>] ? warn_slowpath_common+0x72/0x90
[168741.159903]  [<ffffffff8141d98c>] ? intel_display_power_put+0x14c/0x160
[168741.159906]  [<ffffffff810bdd22>] ? process_one_work+0x142/0x3e0
[168741.159908]  [<ffffffff810be293>] ? worker_thread+0x63/0x480
[168741.159910]  [<ffffffff810be230>] ? rescuer_thread+0x270/0x270
[168741.159912]  [<ffffffff810c28fe>] ? kthread+0xce/0xf0
[168741.159914]  [<ffffffff810c2830>] ? kthread_create_on_node+0x180/0x180
[168741.159916]  [<ffffffff816a342c>] ? ret_from_fork+0x7c/0xb0
[168741.159918]  [<ffffffff810c2830>] ? kthread_create_on_node+0x180/0x180
[168741.159919] ---[ end trace 2646903d2d0b014d ]---
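
To compare the call paths across warnings like the ones above, a small script can pull out the frame listed directly after intel_display_power_put in each trace. This is only a sketch: the function name callers_of and the frame regular expression are assumptions based on the log format shown in this report, not an existing tool.

```python
import re

# Matches one call-trace frame as it appears in this report, e.g.
# "[<ffffffff8141d98c>] ? intel_display_power_put+0x14c/0x160"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def callers_of(log_text, function="intel_display_power_put"):
    """Return, for each trace, the frame listed right after `function`."""
    callers = []
    previous = None
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            previous = None  # not a frame line, so we are outside a call trace
            continue
        name = match.group(1)
        if previous == function:
            callers.append(name)
        previous = name
    return callers
```

Fed the first trace in this report, the function reports intel_crtc_control as the caller. Frames that the compiler inlined do not appear in the trace, so the reported caller may be one level further up than the real one.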
Comment 1 Daniel Vetter 2014-11-26 08:32:31 UTC
It's an imbalance in the power domain reference counts. Once we've managed to get there, only a reboot will stop these.

We have a few other reports, but so far no one has a quick way to reproduce these, which is what we need in order to understand where the refcount goes amiss. Unfortunately, all the backtraces here are from well past the point of failure.

So if you could try to figure out what sequence is required to get there (really needs some good luck I guess) that might help.
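
Why the backtraces land after the real failure can be shown with a toy model of per-domain use counts. The class below is a minimal sketch under stated assumptions: PowerDomainRefs and its methods are hypothetical names, and the logic is not the i915 implementation, only the general get/put counting pattern.

```python
import warnings


class PowerDomainRefs:
    """Toy model of per-domain use counts (not the i915 code)."""

    def __init__(self):
        self.use_count = {}

    def get(self, domain):
        """Take one reference on `domain`."""
        self.use_count[domain] = self.use_count.get(domain, 0) + 1

    def put(self, domain):
        """Drop one reference, warning if there is none left to drop."""
        count = self.use_count.get(domain, 0)
        if count <= 0:
            # Underrun: an earlier put had no matching get.
            warnings.warn(f"refcount underrun on {domain}")
        self.use_count[domain] = count - 1


refs = PowerDomainRefs()
refs.get("PORT_DDI_A_4_LANES")
refs.put("PORT_DDI_A_4_LANES")
refs.put("PORT_DDI_A_4_LANES")  # the one unbalanced put: warns here
refs.get("PORT_DDI_A_4_LANES")
refs.put("PORT_DDI_A_4_LANES")  # a correctly paired caller: warns again
```

In this model a single unbalanced put leaves the count one too low, so every later, correctly paired get and put trips the warning as well. The caller named in a warning is therefore not necessarily the one that introduced the imbalance.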
Comment 2 Damien Lespiau 2014-12-03 14:59:09 UTC
If you manage to reproduce the failure, could you dump /sys/kernel/debug/dri/0/i915_power_domain_info? It should contain useful information for narrowing the problem down to a particular area.
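
Since the failure is hard to catch, it may help to save a timestamped copy of that file whenever a warning shows up. The helper below is a minimal sketch: the snapshot function and the output file naming are assumptions for illustration, and reading debugfs usually requires root and a mounted debugfs.

```python
import time
from pathlib import Path

# Path taken from the request above; the DRM minor number (0) may differ.
DEBUGFS_FILE = "/sys/kernel/debug/dri/0/i915_power_domain_info"


def snapshot(src=DEBUGFS_FILE, dest_dir="."):
    """Copy `src` into a timestamped text file in `dest_dir`; return its path."""
    text = Path(src).read_text()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_power_domain_info.{stamp}.txt"
    dest.write_text(text)
    return dest
```

Calling snapshot() right after the first warning on a boot keeps the counts from being overwritten by later changes.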
Comment 3 Jonathan McDowell 2014-12-03 17:01:05 UTC
I failed to see a reproduction all week with a fresh boot into 3.18-rc6; I have rebooted today into 3.18-rc7 and will continue to keep an eye out, grabbing that file if/when I see the problem again.
Comment 4 Jonathan McDowell 2014-12-26 16:47:41 UTC
I didn't manage to catch this at the first occurrence but hopefully it's still helpful:

Power well/domain         Use count
always-on                 -2
  PIPE_A                  1
  TRANSCODER_EDP          1
  PORT_DDI_A_2_LANES      0
  PORT_DDI_A_4_LANES      -4
  PORT_DDI_B_2_LANES      0
  PORT_DDI_B_4_LANES      0
  PORT_DDI_C_2_LANES      0
  PORT_DDI_C_4_LANES      0
  PORT_DDI_D_2_LANES      0
  PORT_DDI_D_4_LANES      0
  PORT_CRT                0
  PLLS                    0
  INIT                    0
display                   1
  PIPE_B                  0
  PIPE_C                  0
  PIPE_A_PANEL_FITTER     0
  PIPE_B_PANEL_FITTER     0
  PIPE_C_PANEL_FITTER     0
  TRANSCODER_A            0
  TRANSCODER_B            0
  TRANSCODER_C            0
  PORT_DSI                0
  PORT_OTHER              0
  VGA                     0
  AUDIO                   1
  INIT                    0
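
A dump like the one above can be checked mechanically for negative use counts. The parser below is a sketch that assumes the layout shown here (power wells in the first column, their domains indented beneath them, the count as the last field); parse_power_domain_info and negative_counts are hypothetical helper names.

```python
def parse_power_domain_info(text):
    """Parse the dump into {power_well: (well_count, {domain: count})}."""
    wells = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("Power well/domain"):
            continue  # skip blank lines and the header row
        name, count = line.rsplit(None, 1)
        if line[0].isspace():
            # Indented line: a domain belonging to the current power well.
            wells[current][1][name.strip()] = int(count)
        else:
            current = name.strip()
            wells[current] = (int(count), {})
    return wells


def negative_counts(wells):
    """List (name, count) for every well or domain whose count is below zero."""
    found = []
    for well, (count, domains) in wells.items():
        if count < 0:
            found.append((well, count))
        for domain, domain_count in domains.items():
            if domain_count < 0:
                found.append((f"{well}/{domain}", domain_count))
    return found
```

For the dump in this comment, the negative entries are the always-on well (-2) and its PORT_DDI_A_4_LANES domain (-4).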
Comment 5 Jesse Barnes 2015-03-30 22:00:27 UTC
Still an issue with current bits?  The power domain handling has seen some fixes since then iirc.
Comment 6 Jonathan McDowell 2015-03-30 22:35:18 UTC
I haven't seen this in over a month (that's as far back as logs go). Currently running 4.0-rc3 and seeing a warning in intel_check_page_flip() but it looks like that's fixed by 6c51d46f135b00c00373fcd029786ccef2b02b5b so I will update to 4.0-rc6.

Feel free to close this and I can re-open if I see a power_put problem again?
Comment 7 Jani Nikula 2015-04-01 08:41:32 UTC
Thanks for the report, don't hesitate to reopen if the problem reappears.

