Bug 93768

Summary: [BAT SKL DMC] GPU death starting with first symptom: WARNING backtrace: "DC6 already programmed to be enabled."
Product: DRI
Reporter: Mika Kuoppala <mika.kuoppala>
Component: DRM/Intel
Assignee: Patrik Jakobsson <patrik.r.jakobsson>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: highest
CC: daniel, gary.c.wang, intel-gfx-bugs, rodrigo.vivi, zinigor+freedesktop
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: SKL
i915 features: power/Other

Description Mika Kuoppala 2016-01-19 07:53:29 UTC
As of now, in two runs:

archive/results/CI_IGT_test/CI_DRM_985/skl-i7k-2/
archive/results/CI_IGT_test/CI_DRM_974/skl-i7k-2/

the GPU has become unusable (initialization fails, and all rings except render hang every time).

The first symptom is a warning that DC6 is already enabled. Then the problems mount up and flips start to fail:

[  228.570754] kms_flip: executing
[  228.574100] kms_flip: starting subtest basic-plain-flip
[  230.658351] [drm] RC6 on
[  240.643520] [drm] RC6 on
[  249.367727] ------------[ cut here ]------------
[  249.367748] WARNING: CPU: 2 PID: 6202 at drivers/gpu/drm/i915/intel_runtime_pm.c:578 skl_enable_dc6+0x16a/0x190 [i915]()
[  249.367750] DC6 already programmed to be enabled.
[  249.367751] Modules linked in: i915 ax88179_178a usbnet mii x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul mei_me mei i2c_hid e1000e ptp pps_core [last unloaded: i915]
[  249.367768] CPU: 2 PID: 6202 Comm: kms_flip Tainted: G     U          4.4.0-gfxbench+ #1
[  249.367770] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 0505 11/16/2015
[  249.367771]  ffffffffa03296d8 ffff8800bbeafaa0 ffffffff813e00ac ffff8800bbeafae8
[  249.367775]  ffff8800bbeafad8 ffffffff810746f1 ffff88008b430000 ffff8800bc49c520
[  249.367779]  0000000000000002 ffff88008b430000 0000000004000000 ffff8800bbeafb38
[  249.367782] Call Trace:
[  249.367787]  [<ffffffff813e00ac>] dump_stack+0x4e/0x82
[  249.367791]  [<ffffffff810746f1>] warn_slowpath_common+0x81/0xc0
[  249.367793]  [<ffffffff81074777>] warn_slowpath_fmt+0x47/0x50
[  249.367805]  [<ffffffffa0259cba>] skl_enable_dc6+0x16a/0x190 [i915]
[  249.367816]  [<ffffffffa025a1f6>] gen9_dc_off_power_well_disable+0x136/0x240 [i915]
[  249.367827]  [<ffffffffa0257e67>] intel_power_well_disable+0x27/0x50 [i915]
[  249.367838]  [<ffffffffa025a937>] intel_display_power_put+0xb7/0x130 [i915]
[  249.367859]  [<ffffffffa02bc80a>] intel_atomic_commit+0x74a/0x17c0 [i915]
[  249.367862]  [<ffffffff81511285>] ? drm_atomic_check_only+0x145/0x660
[  249.367865]  [<ffffffff81510d68>] ? drm_atomic_set_crtc_for_connector+0x38/0xe0
[  249.367868]  [<ffffffff815117d2>] drm_atomic_commit+0x32/0x50
[  249.367871]  [<ffffffff814ee6d5>] drm_atomic_helper_set_config+0x75/0xb0
[  249.367874]  [<ffffffff81500650>] drm_mode_set_config_internal+0x60/0x110
[  249.367876]  [<ffffffff815053e6>] drm_mode_setcrtc+0x186/0x4f0
[  249.367879]  [<ffffffff81184658>] ? __might_fault+0x48/0xa0
[  249.367882]  [<ffffffff814f746d>] drm_ioctl+0x13d/0x590
[  249.367884]  [<ffffffff81505260>] ? drm_mode_setplane+0x1b0/0x1b0
[  249.367888]  [<ffffffff811d53fc>] do_vfs_ioctl+0x2fc/0x550
[  249.367890]  [<ffffffff8118dd4a>] ? vm_munmap+0x4a/0x60
[  249.367892]  [<ffffffff811e0e6a>] ? __fget_light+0x6a/0x90
[  249.367894]  [<ffffffff811d568c>] SyS_ioctl+0x3c/0x70
[  249.367898]  [<ffffffff8179e6db>] entry_SYSCALL_64_fastpath+0x16/0x73
[  249.367900] ---[ end trace 5fefebc7a8054607 ]---
[  249.512297] ------------[ cut here ]------------
[  249.512313] WARNING: CPU: 7 PID: 6202 at drivers/gpu/drm/drm_irq.c:1271 drm_wait_one_vblank+0x150/0x1a0()
[  249.512318] vblank wait timed out on crtc 2
[  249.512321] Modules linked in: i915 ax88179_178a usbnet mii x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul mei_me mei i2c_hid e1000e ptp pps_core [last unloaded: i915]
[  249.512358] CPU: 7 PID: 6202 Comm: kms_flip Tainted: G     U  W       4.4.0-gfxbench+ #1
[  249.512362] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 0505 11/16/2015
[  249.512366]  ffffffff81aa6387 ffff8800bbeafae0 ffffffff813e00ac ffff8800bbeafb28
[  249.512376]  ffff8800bbeafb18 ffffffff810746f1 ffff8800bc49c520 0000000000000002
[  249.512385]  0000000000000018 ffff88022cdabc30 0000000000000000 ffff8800bbeafb78
[  249.512393] Call Trace:
[  249.512402]  [<ffffffff813e00ac>] dump_stack+0x4e/0x82
[  249.512408]  [<ffffffff810746f1>] warn_slowpath_common+0x81/0xc0
[  249.512413]  [<ffffffff81074777>] warn_slowpath_fmt+0x47/0x50
[  249.512421]  [<ffffffff810bae19>] ? finish_wait+0x59/0x70
[  249.512428]  [<ffffffff814f9490>] drm_wait_one_vblank+0x150/0x1a0
[  249.512434]  [<ffffffff810baf80>] ? wait_woken+0x90/0x90
[  249.512490]  [<ffffffffa02bc7d1>] intel_atomic_commit+0x711/0x17c0 [i915]
[  249.512498]  [<ffffffff81511285>] ? drm_atomic_check_only+0x145/0x660
[  249.512504]  [<ffffffff81510d9a>] ? drm_atomic_set_crtc_for_connector+0x6a/0xe0
[  249.512511]  [<ffffffff815117d2>] drm_atomic_commit+0x32/0x50
[  249.512516]  [<ffffffff814ee6d5>] drm_atomic_helper_set_config+0x75/0xb0
[  249.512523]  [<ffffffff81500650>] drm_mode_set_config_internal+0x60/0x110
[  249.512530]  [<ffffffff815053e6>] drm_mode_setcrtc+0x186/0x4f0
[  249.512537]  [<ffffffff814f746d>] drm_ioctl+0x13d/0x590
[  249.512543]  [<ffffffff81505260>] ? drm_mode_setplane+0x1b0/0x1b0
[  249.512550]  [<ffffffff811d53fc>] do_vfs_ioctl+0x2fc/0x550
[  249.512556]  [<ffffffff8179e866>] ? int_ret_from_sys_call+0x52/0x9f
[  249.512561]  [<ffffffff811e0e6a>] ? __fget_light+0x6a/0x90
[  249.512567]  [<ffffffff811d568c>] SyS_ioctl+0x3c/0x70
[  249.512573]  [<ffffffff8179e6db>] entry_SYSCALL_64_fastpath+0x16/0x73
[  249.512578] ---[ end trace 5fefebc7a8054608 ]---
[  250.658658] [drm] RC6 on
[  252.743728] kms_flip: exiting, ret=99
[  312.744229] ------------[ cut here ]------------
[  312.744293] WARNING: CPU: 5 PID: 6202 at drivers/gpu/drm/i915/intel_display.c:3946 intel_atomic_commit+0x1741/0x17c0 [i915]()
[  312.744297] Removing stuck page flip
[  312.744300] Modules linked in: i915 ax88179_178a usbnet mii x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul mei_me mei i2c_hid e1000e ptp pps_core [last unloaded: i915]
[  312.744334] CPU: 5 PID: 6202 Comm: kms_flip Tainted: G     U  W       4.4.0-gfxbench+ #1
[  312.744338] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 0505 11/16/2015
[  312.744342]  ffffffffa03317b0 ffff8800bbeafb70 ffffffff813e00ac ffff8800bbeafbb8
[  312.744350]  ffff8800bbeafba8 ffffffff810746f1 ffff8800bc49c8a8 0000000000000000
[  312.744358]  ffff880230b62290 0000000000000000 ffff88022cad8348 ffff8800bbeafc08
[  312.744366] Call Trace:
[  312.744376]  [<ffffffff813e00ac>] dump_stack+0x4e/0x82
[  312.744383]  [<ffffffff810746f1>] warn_slowpath_common+0x81/0xc0
[  312.744388]  [<ffffffff81074777>] warn_slowpath_fmt+0x47/0x50
[  312.744439]  [<ffffffffa02bd801>] intel_atomic_commit+0x1741/0x17c0 [i915]
[  312.744447]  [<ffffffff810baf80>] ? wait_woken+0x90/0x90
[  312.744453]  [<ffffffff814ed5e1>] ? __drm_atomic_helper_crtc_duplicate_state+0x51/0x70
[  312.744461]  [<ffffffff815117d2>] drm_atomic_commit+0x32/0x50
[  312.744470]  [<ffffffff814eceb9>] drm_atomic_helper_connector_dpms+0xe9/0x1a0
[  312.744478]  [<ffffffff81506545>] drm_mode_obj_set_property_ioctl+0x235/0x240
[  312.744485]  [<ffffffff8150657b>] drm_mode_connector_property_set_ioctl+0x2b/0x30
[  312.744492]  [<ffffffff814f746d>] drm_ioctl+0x13d/0x590
[  312.744499]  [<ffffffff81506550>] ? drm_mode_obj_set_property_ioctl+0x240/0x240
[  312.744506]  [<ffffffff811d53fc>] do_vfs_ioctl+0x2fc/0x550
[  312.744512]  [<ffffffff81184658>] ? __might_fault+0x48/0xa0
[  312.744517]  [<ffffffff811e0e6a>] ? __fget_light+0x6a/0x90
[  312.744522]  [<ffffffff811d568c>] SyS_ioctl+0x3c/0x70
[  312.744529]  [<ffffffff8179e6db>] entry_SYSCALL_64_fastpath+0x16/0x73
[  312.744534] ---[ end trace 5fefebc7a8054609 ]---
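The first trace shows how DC6 gets re-enabled. intel_display_power_put() drops the last reference on the "DC off" power well. gen9_dc_off_power_well_disable() then calls skl_enable_dc6(), and its sanity check finds DC6 already set in DC_STATE_EN.

The sketch below is a minimal user-space model of that sequence, not i915 code. The struct, the helper names and the 0x2 value for "DC6 enabled" are assumptions for illustration. The real driver also chooses between DC5 and DC6 depending on what is allowed, while this model always re-enables DC6.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding of "DC6 enabled" in DC_STATE_EN (illustrative only). */
#define DC_STATE_EN_UPTO_DC6 0x2u

/* Hypothetical model of the "DC off" power well. While it is referenced,
 * DC states are blocked; dropping the last reference re-enables DC6. */
struct dc_off_well {
	unsigned int refcount;
	uint32_t dc_state_en;   /* simulated DC_STATE_EN register */
	unsigned int warnings;  /* times the sanity check fired */
};

/* Take a reference; the first one turns the well on, clearing DC6. */
static void dc_off_get(struct dc_off_well *w)
{
	if (w->refcount++ == 0)
		w->dc_state_en &= ~DC_STATE_EN_UPTO_DC6;
}

/* Drop a reference; the last one turns the well off, enabling DC6. */
static void dc_off_put(struct dc_off_well *w)
{
	assert(w->refcount > 0);
	if (--w->refcount > 0)
		return;

	/* Mirrors the "DC6 already programmed to be enabled." check:
	 * DC6 is expected to still be clear at this point. */
	if (w->dc_state_en & DC_STATE_EN_UPTO_DC6)
		w->warnings++;

	w->dc_state_en |= DC_STATE_EN_UPTO_DC6;
}
```

A plain dc_off_get()/dc_off_put() pair leaves warnings at 0. If the DC6 bit is set by hand between the two calls, as if the hardware had flipped it back on, the check fires once.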
Comment 1 Mika Kuoppala 2016-01-19 07:57:05 UTC
The dmesg traces in both cases are nearly identical; the triggering test is

kms_flip --r basic-plain-flip

[  232.182718] kms_flip: starting subtest basic-plain-flip
[  233.614716] [drm] RC6 on
[  244.593971] [drm] RC6 on
[  252.955029] ------------[ cut here ]------------
[  252.955049] WARNING: CPU: 7 PID: 6205 at drivers/gpu/drm/i915/intel_runtime_pm.c:578 skl_enable_dc6+0x16a/0x190 [i915]()
[  252.955050] DC6 already programmed to be enabled.
...
Comment 2 Chris Wilson 2016-01-19 09:12:09 UTC
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/inte
index bbca527..a90d4d0 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -497,7 +497,8 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_p
        val &= ~mask;
        val |= state;
        I915_WRITE(DC_STATE_EN, val);
-       POSTING_READ(DC_STATE_EN);
+       if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 500))
+               DRM_ERROR("Timeout whilst enabling DC6\n");
 }
 
 void bxt_enable_dc9(struct drm_i915_private *dev_priv)
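The patch replaces the single POSTING_READ() with a bounded poll. i915's wait_for(COND, MS) keeps re-evaluating COND until it is true or roughly MS milliseconds have passed. It returns nonzero (-ETIMEDOUT) on timeout, which is why the DRM_ERROR() sits inside the if.

Below is a user-space sketch of the same idea. It assumes a hypothetical scripted fake register (fake_reg) and a read budget in place of a real millisecond timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an MMIO register: each read returns the next
 * scripted value, then keeps returning the last one. */
struct fake_reg {
	const uint32_t *values;
	size_t count;
	size_t next;
};

static uint32_t fake_read(struct fake_reg *reg)
{
	uint32_t v = reg->values[reg->next];

	if (reg->next + 1 < reg->count)
		reg->next++;
	return v;
}

/* Poll until (value & mask) == state, giving up after max_polls reads.
 * Returns true on success; wait_for() instead returns 0 on success and
 * -ETIMEDOUT on timeout, and counts time rather than reads. */
static bool poll_for_state(struct fake_reg *reg, uint32_t mask,
			   uint32_t state, unsigned int max_polls)
{
	assert(reg->count > 0 && max_polls > 0);

	for (unsigned int i = 0; i < max_polls; i++) {
		if ((fake_read(reg) & mask) == state)
			return true;
	}
	return false;
}
```

Take reads scripted as 0x0, 0x0, 0x2, with mask 0x3 and target 0x2. With a budget of 10 reads, poll_for_state() succeeds on the third read; with a budget of 2, it fails.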
Comment 3 Daniel Vetter 2016-01-19 13:59:12 UTC
Just a note: we've also seen this happen in patchwork runs for patches that only change code specific to other platforms (i.e. not even generic code). This is definitely real, at least on this specific machine. It would be good to have more SKL machines up and running to figure out whether it's just a broken machine or a larger issue with our SKL support.
Comment 4 Mika Kuoppala 2016-01-19 17:03:15 UTC
(In reply to Chris Wilson from comment #2)
> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/inte
> index bbca527..a90d4d0 100644
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -497,7 +497,8 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_p
>         val &= ~mask;
>         val |= state;
>         I915_WRITE(DC_STATE_EN, val);
> -       POSTING_READ(DC_STATE_EN);
> +       if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 500))
> +               DRM_ERROR("Timeout whilst enabling DC6\n");
>  }
>  
>  void bxt_enable_dc9(struct drm_i915_private *dev_priv)

Indeed, something goes wrong around here. Sometimes when we disable DC6, it stays disabled for only a short time (and/or a few reads), but then it pops back up as enabled. That's still a mystery.

What happens next is that DC6 is on when we try to reset, and apparently a reset while in DC6 does not succeed.
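If a disabled DC6 can silently flip back to enabled, a single read right after the write proves little. One hedged way to guard against that:

- Require several consecutive matching reads.
- Rewrite the register whenever a read disagrees.
- Give up after a fixed number of rewrites.

The sketch below uses a hypothetical flaky_reg fake whose value reverts for a scripted number of reads. The read and rewrite limits are arbitrary. It is an illustration only, not a copy of the patches mentioned later in this bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fake register whose value "pops back" to a stale value
 * on the next `flips` reads, as described for DC6 above. */
struct flaky_reg {
	uint32_t value;       /* current register contents */
	uint32_t stale;       /* value the register reverts to */
	unsigned int flips;   /* remaining reads that revert to `stale` */
	unsigned int writes;  /* number of writes performed */
};

static void flaky_write(struct flaky_reg *r, uint32_t v)
{
	r->value = v;
	r->writes++;
}

static uint32_t flaky_read(struct flaky_reg *r)
{
	if (r->flips > 0) {
		r->flips--;
		r->value = r->stale;
	}
	return r->value;
}

/* Write `state`, then require `stable_reads` consecutive matching reads.
 * Any mismatch triggers a rewrite and restarts the count; give up once
 * `max_rewrites` rewrites have not helped. */
static bool write_and_verify(struct flaky_reg *r, uint32_t state,
			     unsigned int stable_reads,
			     unsigned int max_rewrites)
{
	unsigned int matches = 0;
	unsigned int rewrites = 0;

	assert(stable_reads > 0);
	flaky_write(r, state);

	while (matches < stable_reads) {
		if (flaky_read(r) == state) {
			matches++;
			continue;
		}
		if (rewrites == max_rewrites)
			return false;
		flaky_write(r, state);
		rewrites++;
		matches = 0;
	}
	return true;
}
```

With one reverting read, write_and_verify() needs a single rewrite (two writes in total). When the value keeps reverting, it gives up after max_rewrites rewrites, and the caller can then log an error, much like the DRM_ERROR() in the proposed patch.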
Comment 5 Patrik Jakobsson 2016-01-26 00:11:43 UTC
It seems we're accessing DC_STATE_EN with PG0 off. The DMC should trap this access and fix it up, but apparently fails to; it looks like it is not waiting for PG0 to come back up. As a result, we mess up the DMC hardware state bits and get unexpected results.

Reading DC_STATE_EN with PG0 off gives me all bits set to 1. We could poll until we get sensible data from the register, or perhaps just power on PG0 manually.
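Patrik's first option is to poll until DC_STATE_EN returns something other than all ones. The sketch below shows roughly how that could work. It assumes a hypothetical read callback (reg_read_fn) and a hypothetical fake that reports 0xffffffff while PG0 is off.

All ones can be a legitimate value for some registers. This heuristic only makes sense for a register that, presumably, never legitimately reads as all ones. The alternative he mentions, powering PG0 on before the access, avoids that guesswork.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value observed when reading the register with PG0 powered off. */
#define REG_READ_POWERED_OFF 0xffffffffu

/* Hypothetical read callback so the sketch runs without hardware. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Read until the value is not all ones, at most max_polls times.
 * On success store the value in *out and return true. */
static bool read_when_powered(reg_read_fn read, void *ctx,
			      unsigned int max_polls, uint32_t *out)
{
	assert(max_polls > 0);

	for (unsigned int i = 0; i < max_polls; i++) {
		uint32_t v = read(ctx);

		if (v != REG_READ_POWERED_OFF) {
			*out = v;
			return true;
		}
	}
	return false;
}

/* Hypothetical fake: all ones for the first `off_reads` reads, then
 * the real register value once "PG0" is back up. */
struct pg0_fake {
	unsigned int off_reads;
	uint32_t real_value;
};

static uint32_t pg0_fake_read(void *ctx)
{
	struct pg0_fake *f = ctx;

	if (f->off_reads > 0) {
		f->off_reads--;
		return REG_READ_POWERED_OFF;
	}
	return f->real_value;
}
```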
Comment 6 cprigent 2016-01-26 17:07:09 UTC
Bug scrub:
----------
Assigned to Patrik
Comment 7 Patrik Jakobsson 2016-01-26 17:23:41 UTC
*** Bug 93697 has been marked as a duplicate of this bug. ***
Comment 8 Patrik Jakobsson 2016-04-21 11:38:26 UTC
Mika, I believe this got fixed by patches sent by Imre (or do we still have hangs?). Did we pinpoint it to a specific patch? Can this bug be closed?
Comment 9 Mika Kuoppala 2016-04-22 06:08:35 UTC
(In reply to Patrik Jakobsson from comment #8)
> Mika, I believe this got fixed by patches sent by Imre (or do we still have
> hangs?). Did we pinpoint it to a specific patch? Can this bug be closed?

I tried to pinpoint it, but I think it was the combination of Imre's runtime PM fixes with the DMC state hardening ones. We haven't seen this since then.
Comment 10 Igor Zinovyev 2017-06-08 15:56:55 UTC
I seem to be affected by this bug as well; here is the kern.log snippet:

 [  537.187920] ------------[ cut here ]------------
 [  537.187949] WARNING: CPU: 2 PID: 5 at drivers/gpu/drm/i915/intel_runtime_pm.c:686 skl_enable_dc6+0xb5/0
 [  537.187949] DC6 already programmed to be enabled.
 [  537.187950] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat tun
 bnep i2c_designware_platform i2c_designware_core dell_wmi dell_rbtn dell_laptop dm_crypt snd_hda_codec_hdm
p_thermal dell_smbios intel_powerclamp dcdbas coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek 
codec_generic aesni_intel aes_x86_64 crypto_simd glue_helper cryptd joydev snd_usb_audio snd_usb_toneport s
o_raw snd_usb_line6 snd_hda_codec snd_hwdep snd_seq_midi snd_seq_midi_event snd_hda_core snd_rawmidi snd_se
ds uvcvideo videobuf2_vmalloc snd_seq_device iwlmvm videobuf2_memops snd_timer videobuf2_v4l2 usblp videobu
 [  537.187970]  videodev mac80211 snd media soundcore rtsx_pci_ms btusb memstick btrtl idma64 iwlwifi mei_
ss_pci processor_thermal_device shpchp intel_soc_dts_iosf hci_uart btbcm btqca btintel int3403_thermal blue
i intel_lpss int3402_thermal int3400_thermal int340x_thermal_zone mac_hid acpi_pad acpi_thermal_rel intel_h
rqfd irqbypass vfio_iommu_type1 vfio pci_stub parport_pc ppdev lp parport autofs4 rtsx_pci_sdmmc mmc_core n
ttm psmouse i2c_algo_bit drm_kms_helper firewire_ohci firewire_core syscopyarea sysfillrect crc_itu_t sysim
tsx_pci drm i2c_hid wmi pinctrl_sunrisepoint pinctrl_intel
 [  537.187993] CPU: 2 PID: 5 Comm: kworker/u16:0 Tainted: G        W       4.11.3 #36
 [  537.187993] Hardware name: Dell Inc. Precision 5510/08R8KJ, BIOS 1.2.13 08/08/2016
 [  537.188007] Workqueue: i915-dp i915_digport_work_func [i915]
 [  537.188008] Call Trace:
 [  537.188010]  dump_stack+0x4d/0x66
 [  537.188012]  __warn+0xc6/0xe0
 [  537.188013]  warn_slowpath_fmt+0x55/0x80
 [  537.188025]  ? fwtable_read32+0x9c/0x1c0 [i915]
 [  537.188034]  ? skl_set_power_well+0x143/0x5e0 [i915]
 [  537.188043]  skl_enable_dc6+0xb5/0xc0 [i915]
 [  537.188051]  gen9_dc_off_power_well_disable+0x2b/0x30 [i915]
 [  537.188059]  intel_power_well_disable+0x39/0x40 [i915]
 [  537.188068]  intel_display_power_put+0xcf/0x140 [i915]
 [  537.188080]  intel_dp_hpd_pulse+0x146/0x2f0 [i915]
 [  537.188092]  i915_digport_work_func+0x88/0x100 [i915]
 [  537.188094]  process_one_work+0x1ec/0x480
 [  537.188095]  worker_thread+0x43/0x4d0
 [  537.188096]  kthread+0x103/0x140
 [  537.188097]  ? process_one_work+0x480/0x480
 [  537.188098]  ? kthread_create_on_node+0x60/0x60
 [  537.188099]  ret_from_fork+0x29/0x40
 [  537.188100] ---[ end trace 612d405cb24a9eda ]---


I'm running Linux 4.11.3 on Ubuntu 17.04 on a Dell Precision M5510, whose integrated Intel GPU is driven by i915:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics P530 [8086:191d] (rev 06)
	DeviceName:  Onboard IGD
	Subsystem: Dell HD Graphics P530 [1028:06e5]
	Kernel driver in use: i915
	Kernel modules: i915
01:00.0 3D controller [0302]: NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1] (rev a2)
	Subsystem: Dell GM107GLM [Quadro M1000M] [1028:06e5]
	Kernel driver in use: nouveau
	Kernel modules: nvidiafb, nouveau

I'm using graphics drivers from the Padoka PPA; the current versions are:
libdrm-intel1:amd64 2.4.81+git1706051541.16444e1~z~padoka0
*mesa* 1:17.2~git170605162900.4b1e6ed~z~padoka0
libwayland* 1.13.0+git201705130017.0eefe99~z~padoka0

Please let me know if you need any more info from me. I'm happy to do a bisect if I can find a way to reproduce the error. It usually happens when I leave the laptop sitting idle. After that I can't even use REISUB; the machine becomes completely unresponsive.
Comment 11 Mika Kuoppala 2017-06-08 16:42:25 UTC
Hi Igor,

Is it possible for you to try to reproduce with drm-tip? (https://cgit.freedesktop.org/drm-tip/log/)
Comment 12 Igor Zinovyev 2017-06-22 11:19:38 UTC
Hi Mika!

Thanks for the reply, and sorry to keep you waiting so long; I forgot to sign up for notifications from this thread.

Sure, I'd be happy to try. Do I need to enable more verbose debugging or something of that sort?
Comment 13 Igor Zinovyev 2017-06-22 15:11:25 UTC
I have installed 4.12-rc6 from drm-tip, and this is what the kern.log entry looks like now:

Jun 22 17:27:06 precision kernel: [ 8177.720306] DC6 already programmed to be enabled.
Jun 22 17:27:06 precision kernel: [ 8177.720330] ------------[ cut here ]------------
Jun 22 17:27:06 precision kernel: [ 8177.720366] WARNING: CPU: 6 PID: 7082 at drivers/gpu/drm/i915/intel_runtime_pm.c:725 skl_enable_dc6+0x9f/0xb0 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720366] Modules linked in: uinput ccm rfcomm hid_multitouch cmac bnep i2c_designware_platform i2c_designware_core dell_wmi nls_iso8859_1 snd_hda_codec_hdmi intel_rapl dell_rbtn x86_pkg_temp_thermal dell_laptop intel_powerclamp dell_smbios dcdbas coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm irqbypass iwlmvm snd_usb_toneport snd_usb_line6 joydev mac80211 snd_hda_intel snd_usb_audio serio_raw snd_hda_codec snd_usbmidi_lib snd_hwdep snd_hda_core snd_rawmidi snd_seq snd_pcm snd_seq_device uvcvideo input_leds iwlwifi rtsx_pci_ms snd_timer videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 memstick snd videobuf2_core mei_me videodev soundcore usblp mei media idma64 btusb intel_lpss_pci intel_pch_thermal btrtl processor_thermal_device ie31200_edac shpchp intel_soc_dts_iosf hci_uart
Jun 22 17:27:06 precision kernel: [ 8177.720385]  btbcm serdev btqca int3403_thermal btintel bluetooth dell_smo8800 ecdh_generic intel_lpss_acpi intel_lpss int3402_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad intel_hid mac_hid parport_pc ppdev lp parport efivarfs autofs4 btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mirror dm_region_hash dm_log rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc i915 aesni_intel nouveau aes_x86_64 crypto_simd glue_helper cryptd mxm_wmi psmouse ttm prime_numbers i2c_algo_bit firewire_ohci drm_kms_helper firewire_core crc_itu_t syscopyarea sysfillrect sysimgblt nvme fb_sys_fops nvme_core rtsx_pci drm i2c_hid wmi pinctrl_sunrisepoint pinctrl_intel
Jun 22 17:27:06 precision kernel: [ 8177.720405] CPU: 6 PID: 7082 Comm: kworker/u16:1 Tainted: G        W       4.12.0-rc6+ #1
Jun 22 17:27:06 precision kernel: [ 8177.720406] Hardware name: Dell Inc. Precision 5510/08R8KJ, BIOS 1.2.25 05/07/2017
Jun 22 17:27:06 precision kernel: [ 8177.720424] Workqueue: i915-dp i915_digport_work_func [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720425] task: ffff969ca176c440 task.stack: ffffa81a86888000
Jun 22 17:27:06 precision kernel: [ 8177.720439] RIP: 0010:skl_enable_dc6+0x9f/0xb0 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720439] RSP: 0018:ffffa81a8688bd50 EFLAGS: 00010282
Jun 22 17:27:06 precision kernel: [ 8177.720440] RAX: 0000000000000025 RBX: ffff969cd4380000 RCX: 0000000000000000
Jun 22 17:27:06 precision kernel: [ 8177.720440] RDX: 0000000000000000 RSI: ffff969cfdd8cc88 RDI: ffff969cfdd8cc88
Jun 22 17:27:06 precision kernel: [ 8177.720441] RBP: ffffa81a8688bd58 R08: 0000000000000001 R09: 00000000000006b4
Jun 22 17:27:06 precision kernel: [ 8177.720441] R10: 0000000000000040 R11: 0000000000000000 R12: ffff969cd4380000
Jun 22 17:27:06 precision kernel: [ 8177.720442] R13: ffffffffc06ccde0 R14: ffff969cd4380000 R15: 0000000020000000
Jun 22 17:27:06 precision kernel: [ 8177.720443] FS:  0000000000000000(0000) GS:ffff969cfdd80000(0000) knlGS:0000000000000000
Jun 22 17:27:06 precision kernel: [ 8177.720443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 22 17:27:06 precision kernel: [ 8177.720443] CR2: 0000364890f0e000 CR3: 000000076e20a000 CR4: 00000000003406e0
Jun 22 17:27:06 precision kernel: [ 8177.720444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 22 17:27:06 precision kernel: [ 8177.720444] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 22 17:27:06 precision kernel: [ 8177.720445] Call Trace:
Jun 22 17:27:06 precision kernel: [ 8177.720458]  gen9_dc_off_power_well_disable+0x2b/0x30 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720470]  intel_power_well_disable+0x39/0x40 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720482]  intel_display_power_put+0xb5/0x110 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720499]  intel_dp_hpd_pulse+0x229/0x310 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720515]  i915_digport_work_func+0x88/0x100 [i915]
Jun 22 17:27:06 precision kernel: [ 8177.720518]  process_one_work+0x1d9/0x3e0
Jun 22 17:27:06 precision kernel: [ 8177.720519]  worker_thread+0x43/0x3e0
Jun 22 17:27:06 precision kernel: [ 8177.720520]  kthread+0x103/0x140
Jun 22 17:27:06 precision kernel: [ 8177.720521]  ? trace_event_raw_event_workqueue_work+0xa0/0xa0
Jun 22 17:27:06 precision kernel: [ 8177.720522]  ? kthread_create_on_node+0x60/0x60
Jun 22 17:27:06 precision kernel: [ 8177.720543]  ret_from_fork+0x22/0x30
Jun 22 17:27:06 precision kernel: [ 8177.720544] Code: 05 67 c0 15 00 01 e8 c1 76 fb e1 0f ff eb 99 80 3d 56 c0 15 00 00 75 a7 48 c7 c7 48 cb 6c c0 c6 05 46 c0 15 00 01 e8 a1 76 fb e1 <0f> ff eb 90 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 83 bf 38 
Jun 22 17:27:06 precision kernel: [ 8177.720559] ---[ end trace 6339127b4bfb1449 ]---
Jun 22 17:27:06 precision kernel: [ 8177.720859] [drm:gen9_set_dc_state [i915]] *ERROR* DC state mismatch (0x0 -> 0x2)
Jun 22 17:28:16 precision kernel: [ 8247.122737] [drm:gen9_set_dc_state [i915]] *ERROR* DC state mismatch (0x0 -> 0x2)

The most frustrating thing is that it only seems to happen when I'm away, although I don't have a suspend timeout set. It also only seems to happen when I use an external monitor via a Thunderbolt port.

Can I provide any more details here?
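The "DC state mismatch (0x0 -> 0x2)" lines can be read as follows. The driver last programmed no DC state (0x0), but the DC5/DC6 field of DC_STATE_EN now reads 0x2, i.e. DC6 enabled.

That reading rests on two assumptions:

- The message prints the driver's tracked state first and the hardware value second.
- Bits 1:0 of DC_STATE_EN encode 0x1 = up to DC5 and 0x2 = up to DC6, as in i915_reg.h of that era.

Check both against the kernel source before relying on them. A small decoder under those assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed gen9 DC_STATE_EN encoding (bits 1:0); verify in i915_reg.h. */
#define DC_STATE_EN_UPTO_DC5          0x1u
#define DC_STATE_EN_UPTO_DC6          0x2u
#define DC_STATE_EN_UPTO_DC5_DC6_MASK 0x3u

enum dc_state { DC_NONE, DC_UPTO_DC5, DC_UPTO_DC6, DC_INVALID };

/* Map the DC5/DC6 field of a DC_STATE_EN value to a named state. */
static enum dc_state decode_dc_state(uint32_t val)
{
	switch (val & DC_STATE_EN_UPTO_DC5_DC6_MASK) {
	case 0:
		return DC_NONE;
	case DC_STATE_EN_UPTO_DC5:
		return DC_UPTO_DC5;
	case DC_STATE_EN_UPTO_DC6:
		return DC_UPTO_DC6;
	default:
		return DC_INVALID; /* both bits set */
	}
}

static const char *dc_state_name(enum dc_state s)
{
	static const char *const names[] = {
		[DC_NONE] = "no DC state",
		[DC_UPTO_DC5] = "up to DC5",
		[DC_UPTO_DC6] = "up to DC6",
		[DC_INVALID] = "invalid",
	};

	assert((unsigned int)s < sizeof(names) / sizeof(names[0]));
	return names[s];
}
```

For the logged pair, decode_dc_state(0x0) gives DC_NONE and decode_dc_state(0x2) gives DC_UPTO_DC6. Under these assumptions, the hardware had DC6 enabled while the driver believed no DC state was programmed. That is consistent with the "DC6 already programmed to be enabled." warning just before.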
Comment 14 Mika Kuoppala 2017-06-27 08:22:44 UTC
Igor, you have a different bug, although with a similar symptom regarding the DC6 state.

Please try to reproduce the bug again with a drm-tip kernel and drm.debug=0xe.

Can you attach the whole dmesg? I take it that you can log in through ssh? Also attach the error state, if there is one ('cat /sys/class/drm/card0/error').
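drm.debug is a bitmask of debug categories. This assumes the 4.x-era values from include/drm/drmP.h: 0x01 CORE, 0x02 DRIVER, 0x04 KMS, 0x08 PRIME. Under that assumption, 0xe enables driver, KMS and PRIME messages but not core ones.

A small illustrative decoder follows. Its table covers only these four bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed drm.debug category bits (include/drm/drmP.h, 4.x era). */
static const struct {
	uint32_t bit;
	const char *name;
} drm_debug_categories[] = {
	{ 0x01, "CORE" },
	{ 0x02, "DRIVER" },
	{ 0x04, "KMS" },
	{ 0x08, "PRIME" },
};

/* Store the names of the categories enabled in `mask` in out[] (at most
 * `max` entries) and return how many were stored. Bits not in the table
 * are ignored. */
static size_t drm_debug_decode(uint32_t mask, const char **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	for (size_t i = 0; i < ARRAY_LEN(drm_debug_categories) && n < max; i++) {
		if (mask & drm_debug_categories[i].bit)
			out[n++] = drm_debug_categories[i].name;
	}
	return n;
}
```

drm_debug_decode(0xe, names, 4) stores DRIVER, KMS and PRIME and returns 3. The same mask can also be set at runtime through /sys/module/drm/parameters/debug instead of on the kernel command line.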
Comment 15 Ricardo 2017-06-27 14:07:14 UTC
closing

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.