Bug 94534 - Kernel 4.5.0 crashes when docking Lenovo T450s (with external monitor connected): GPF(drm_dp_payload_send_msg)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-14 16:18 UTC by cs_gon
Modified: 2016-04-29 08:00 UTC
CC List: 9 users

See Also:
i915 platform: BDW
i915 features: display/DP, display/DP MST


Attachments
dmesg output with drm.debug=0x1e (987.14 KB, text/plain)
2016-03-15 09:57 UTC, cs_gon

Description cs_gon 2016-03-14 16:18:58 UTC
I have also reported this bug on bugzilla.kernel.org ( https://bugzilla.kernel.org/show_bug.cgi?id=114601 ), as I was not sure where the best place to report it is.

When docking the T450s into the docking station (Lenovo Pro Dock) with an external monitor connected, the kernel frequently crashes.

I have enabled kernel crash dumps and drm.debug=0x1e, and at least got the dmesg output. I currently don't have the debugging symbols (the kernel is from the Ubuntu mainline PPA), so I cannot provide a full backtrace at the moment.

Please let me know if this is needed. And is compiling with CONFIG_DEBUG_INFO=y enough, or are additional steps needed?

I also tested the drm-intel-nightly kernel (linux-image-4.5.0-994-generic_4.5.0-994.201603132257_amd64.deb from the Ubuntu kernel PPA), which causes the same crash (the stack trace is identical apart from the addresses).
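
On the debug-symbol question: as far as I know, a module built with CONFIG_DEBUG_INFO=y is normally enough for gdb to map a `symbol+0xoffset` frame from an oops to a source line, using `list *(symbol+0xoffset)` on the unstripped .ko file. The Python sketch below only splits such a frame into its parts and prints the matching gdb command. The frame text is the IP line from the dmesg output of this report; `parse_frame` and `gdb_list_command` are hypothetical helper names, not part of any kernel tooling.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE [module]" as printed in kernel oops frames.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Split one oops frame into symbol, offset, size and module (or None)."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for frames in vmlinux itself
    }


def gdb_list_command(frame):
    """Build the gdb command that lists the source around the frame."""
    return f"list *({frame['symbol']}+{frame['offset']:#x})"


# IP line from the dmesg output of this report.
line = ("[   49.748749] IP: [<ffffffffc018028b>] "
        "drm_dp_payload_send_msg+0x15b/0x210 [drm_kms_helper]")
frame = parse_frame(line)
print(frame["module"], gdb_list_command(frame))
```

The printed command would then be typed into a gdb session opened on the debug build of the named module; where that file lives depends on the distribution and build setup.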
Comment 1 cs_gon 2016-03-15 09:53:25 UTC
Was it really intended to set the i915 platform to "SKL"? This stands for Skylake, right? The Lenovo T450s is a Broadwell system.
Comment 2 cs_gon 2016-03-15 09:57:05 UTC
Created attachment 122312
dmesg output with drm.debug=0x1e

Hmm, I thought I had attached the dmesg output when submitting the bug, but I cannot find the attachment. So here is the dmesg output again.
Comment 3 Chris Wilson 2016-03-15 10:50:04 UTC
[   49.748726] BUG: unable to handle kernel paging request at 000001280000002d
[   49.748749] IP: [<ffffffffc018028b>] drm_dp_payload_send_msg+0x15b/0x210 [drm_kms_helper]
[   49.748778] PGD 0 
[   49.748784] Oops: 0000 [#1] SMP 
[   49.748795] Modules linked in: drbg ansi_cprng ctr ccm dm_crypt ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ip_tables x_tables cmac cdc_mbim cdc_wdm cdc_ncm usbnet cdc_acm mii btusb btrtl btbcm btintel rfcomm bnep bluetooth intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul arc4 crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel binfmt_misc aesni_intel iwlmvm aes_x86_64 mac80211 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec iwlwifi input_leds joydev serio_raw rtsx_pci_ms cfg80211 intel_pch_thermal snd_hda_core mei_me snd_hwdep thinkpad_acpi memstick
[   49.749030]  snd_pcm lpc_ich mei nvram shpchp snd_timer parport_pc ppdev nf_conntrack_ftp nf_conntrack snd soundcore lp parport mac_hid rtsx_pci_sdmmc i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse ahci e1000e rtsx_pci libahci drm ptp pps_core wmi fjes video
[   49.749120] CPU: 3 PID: 1544 Comm: Xorg Tainted: G        W       4.5.0-040500-generic #201603140130
[   49.749144] Hardware name: LENOVO 20BWS00V00/20BWS00V00, BIOS JBET51WW (1.16 ) 07/08/2015
[   49.749166] task: ffff88034cf1aac0 ti: ffff88034cf3c000 task.ti: ffff88034cf3c000
[   49.749186] RIP: 0010:[<ffffffffc018028b>]  [<ffffffffc018028b>] drm_dp_payload_send_msg+0x15b/0x210 [drm_kms_helper]
[   49.749218] RSP: 0018:ffff88034cf3fad0  EFLAGS: 00010282
[   49.749233] RAX: ffff88034cf1aac0 RBX: 0000012800000005 RCX: ffff88034a329910
[   49.749252] RDX: 0000000080000000 RSI: 0000012800000005 RDI: ffff88034a329910
[   49.749271] RBP: ffff88034cf3fb20 R08: 00000000000b6ca4 R09: 0000000000002c51
[   49.749290] R10: 00000000ffff0b9b R11: 0000000000002c51 R12: ffff880349b364c0
[   49.749309] R13: ffff88034a329658 R14: ffff88034a329658 R15: ffff880035c3d000
[   49.749328] FS:  00007fc2936678c0(0000) GS:ffff88035dcc0000(0000) knlGS:0000000000000000
[   49.749350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   49.749365] CR2: 000001280000002d CR3: 000000034a293000 CR4: 00000000003406e0
[   49.749384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   49.749403] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   49.749422] Stack:
[   49.749428]  0258032002f34f60 ffff88034a329910 ffff88034cf3faf0 000000003463293f
[   49.749450]  000000003463293f 0000000000000000 ffff880349b364c0 ffff88034a329658
[   49.749471]  0000000000000000 ffff88034a3299a8 ffff88034cf3fb60 ffffffffc01808d2
[   49.749493] Call Trace:
[   49.749504]  [<ffffffffc01808d2>] drm_dp_update_payload_part2+0xc2/0x130 [drm_kms_helper]
[   49.749546]  [<ffffffffc024af51>] intel_mst_post_disable_dp+0x41/0xb0 [i915]
[   49.749580]  [<ffffffffc0220228>] haswell_crtc_disable+0x138/0x300 [i915]
[   49.749612]  [<ffffffffc02293e3>] intel_atomic_commit+0x3a3/0xb30 [i915]
[   49.749640]  [<ffffffffc004f0ac>] ? drm_ut_debug_printk+0x6c/0x90 [drm]
[   49.749667]  [<ffffffffc0066c17>] drm_atomic_commit+0x37/0x60 [drm]
[   49.749688]  [<ffffffffc0184bb6>] drm_atomic_helper_set_config+0x76/0xb0 [drm_kms_helper]
[   49.749717]  [<ffffffffc0055c32>] drm_mode_set_config_internal+0x62/0x100 [drm]
[   49.749745]  [<ffffffffc005a5b0>] drm_mode_setcrtc+0x3e0/0x500 [drm]
[   49.749768]  [<ffffffffc004b812>] drm_ioctl+0x152/0x540 [drm]
[   49.749790]  [<ffffffffc005a1d0>] ? drm_mode_setplane+0x1b0/0x1b0 [drm]
[   49.749810]  [<ffffffff81228bd1>] do_vfs_ioctl+0xa1/0x5b0
[   49.749827]  [<ffffffff8108ddc1>] ? __set_task_blocked+0x41/0xa0
[   49.749844]  [<ffffffff81090746>] ? __set_current_blocked+0x36/0x60
[   49.749862]  [<ffffffff81229159>] SyS_ioctl+0x79/0x90
[   49.749876]  [<ffffffff81090a0e>] ? SyS_rt_sigprocmask+0x8e/0xc0
[   49.749894]  [<ffffffff818235f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[   49.749911] Code: e0 eb d1 49 8d 8e b8 02 00 00 49 8b 9f 30 04 00 00 48 89 cf 48 89 4d b8 e8 63 11 6a c1 49 83 be e8 02 00 00 00 48 8b 4d b8 74 6f <4c> 8b 63 28 4d 85 e4 74 66 49 3b 5c 24 20 74 46 49 8b 9c 24 30 
[   49.749998] RIP  [<ffffffffc018028b>] drm_dp_payload_send_msg+0x15b/0x210 [drm_kms_helper]
[   49.750025]  RSP <ffff88034cf3fad0>
[   49.750034] CR2: 000001280000002d
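
A quick arithmetic check on the oops above, offered as a sketch rather than a diagnosis: the faulting address in CR2 is exactly 0x28 above RBX, and RBX does not look like a kernel pointer, which would be consistent with a load through a stale or corrupted pointer. The register values are copied from the trace. The helper names and the 0xffff800000000000 threshold (start of the upper canonical half on x86-64 with 4-level paging) are assumptions made for illustration.

```python
import re

# Register lines copied from the oops in this comment.
OOPS = """\
RAX: ffff88034cf1aac0 RBX: 0000012800000005 RCX: ffff88034a329910
RDX: 0000000080000000 RSI: 0000012800000005 RDI: ffff88034a329910
CR2: 000001280000002d CR3: 000000034a293000 CR4: 00000000003406e0
"""

KERNEL_HALF_START = 0xFFFF800000000000  # assumption: 4-level paging layout


def parse_registers(text):
    """Return {name: value} for every 'NAME: 16-hex-digits' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def looks_like_kernel_pointer(value):
    """True if the value lies in the upper (kernel) half of the address space."""
    return value >= KERNEL_HALF_START


regs = parse_registers(OOPS)
offset = regs["CR2"] - regs["RBX"]
print(hex(offset))                             # 0x28
print(looks_like_kernel_pointer(regs["RBX"]))  # False
print(looks_like_kernel_pointer(regs["RCX"]))  # True
```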
Comment 4 Nobody 2016-03-15 21:35:23 UTC
(In reply to cs_gon from comment #1)
> Was it really intended to set the i915 platform to "SKL"? This stands for
> Skylake, right? The Lenovo T450s is a Broadwell system.

You are right; updating the i915 platform to BDW.
Comment 5 cs_gon 2016-03-18 12:30:16 UTC
I rebuilt the 4.5.0 kernel from git, and tried to get more information/a better stack trace with the crash tool, but sadly that didn't work. 

Does anyone have an idea what to do? I tried a newer version of the crash tool (7.1.4, from Xenial), but got the same result. Could a corrupted stack be the reason for this?


Here is the output I got:

root@localhost:/var/crash/201603181202# crash /usr/lib/debug/lib/modules/4.5.0/vmlinux dump.201603181202 

crash 7.0.3
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

please wait... (gathering module symbol data)   
crash: invalid structure member offset: module_num_symtab
       FILE: kernel.c  LINE: 3049  FUNCTION: module_init()

[/usr/bin/crash] error trace: 467986 => 4d37d7 => 519f4a => 519ecc

  519ecc: (undetermined)
  519f4a: OFFSET_verify+42
  4d37d7: module_init+1255
  467986: main_loop+214
Comment 6 cs_gon 2016-04-18 12:06:06 UTC
As I expected after reading the changelog of kernel 4.5.1, the problem still exists in kernel 4.5.1, but I wanted to report back anyway.

Is there any information you need from me to fix this bug? Chris Wilson raised the importance to highest/blocker, but for over a month now nothing has happened. If I can be of any help to get this resolved, please let me know! Thanks!
Comment 7 kowalski marcin 2016-04-21 10:01:48 UTC
I sometimes see similar issues on an HP 840 G3 when booting with DisplayPort plugged in.

I had an easily reproducible setup: booting the Arch Linux installation image from February 2016 crashed every time at boot with DP plugged in and printed some MCE information. Other systems would produce different errors.
Comment 8 Jani Nikula 2016-04-22 09:02:09 UTC
Please try drm-intel-nightly branch of http://cgit.freedesktop.org/drm-intel and report back.
Comment 9 cs_gon 2016-04-22 13:30:25 UTC
I have tested with the current drm-intel-nightly branch. So far (after about 30 mins of testing) I haven't been able to reproduce the crash when only one external monitor is connected to the docking station, but with two external monitors I still ran into the same problem:


[ 6434.242848] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 6434.242893] IP: [<ffffffffc01559c5>] drm_dp_get_last_connected_port_to_mstb+0x5/0x30 [drm_kms_helper]
[ 6434.242950] PGD 0 
[ 6434.242963] Oops: 0000 [#1] SMP 
[ 6434.242981] Modules linked in: hid_cherry hid_generic usbhid hid usb_serial_simple usbserial des_generic md4 nls_utf8 cifs fscache drbg ansi_cprng ctr ccm ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ip_tables x_tables dm_crypt cmac rfcomm bnep arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel binfmt_misc kvm iwlmvm irqbypass mac80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi cdc_acm aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd cfg80211 cdc_mbim cdc_wdm cdc_ncm joydev usbnet input_leds mii serio_raw btusb btrtl rtsx_pci_ms btbcm memstick btintel snd_hda_codec_realtek intel_pch_thermal snd_hda_codec_generic                                                                                                              
[ 6434.243399]  bluetooth lpc_ich snd_hda_codec_hdmi shpchp snd_hda_intel snd_hda_codec parport_pc snd_hda_core thinkpad_acpi ppdev nvram nf_conntrack_ftp nf_conntrack snd_hwdep snd_pcm lp parport snd_timer mei_me mac_hid snd soundcore mei rtsx_pci_sdmmc i915 psmouse i2c_algo_bit drm_kms_helper syscopyarea ahci sysfillrect sysimgblt e1000e libahci fb_sys_fops rtsx_pci drm ptp pps_core wmi fjes video                                                            
[ 6434.243582] CPU: 2 PID: 1495 Comm: Xorg Tainted: G        W       4.6.0-rc4+ #1
[ 6434.243613] Hardware name: LENOVO 20BWS00V00/20BWS00V00, BIOS JBET51WW (1.16 ) 07/08/2015
[ 6434.243648] task: ffff88034adbc740 ti: ffff88034bf98000 task.ti: ffff88034bf98000
[ 6434.243689] RIP: 0010:[<ffffffffc01559c5>]  [<ffffffffc01559c5>] drm_dp_get_last_connected_port_to_mstb+0x5/0x30 [drm_kms_helper]
[ 6434.243762] RSP: 0018:ffff88034bf9ba20  EFLAGS: 00010286
[ 6434.243787] RAX: ffff88034adbc740 RBX: 0000000000000000 RCX: 0000000000000000
[ 6434.243816] RDX: ffff88032833eeb8 RSI: 0000000000000000 RDI: 0000000000000000
[ 6434.243846] RBP: ffff88034bf9ba78 R08: 000000000001a040 R09: ffffffffc0159eea
[ 6434.243888] R10: ffff88035dc9a040 R11: ffffea000c292000 R12: ffff880349fac658
[ 6434.243925] R13: ffff8802fad05800 R14: ffff880349fac910 R15: ffff8802fad05c38
[ 6434.243959] FS:  00007f217ed298c0(0000) GS:ffff88035dc80000(0000) knlGS:0000000000000000
[ 6434.243999] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6434.244034] CR2: 0000000000000028 CR3: 000000034be4b000 CR4: 00000000003406e0
[ 6434.244065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6434.244097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6434.244130] Stack:
[ 6434.244140]  ffffffffc0159f4c 0000000000000000 ffff880300000003 ffffffffc0153c00
[ 6434.244177]  00000009000002c0 00000000385f3add 0000000000000001 ffff880349fac658
[ 6434.244215]  0000000000000001 ffff880349570350 ffff8802fad05c38 ffff88034bf9bac0
[ 6434.244253] Call Trace:
[ 6434.244273]  [<ffffffffc0159f4c>] ? drm_dp_payload_send_msg+0x14c/0x1c0 [drm_kms_helper]
[ 6434.244316]  [<ffffffffc0153c00>] ? drm_dp_link_train_channel_eq_delay+0x20/0x40 [drm_kms_helper]
[ 6434.244374]  [<ffffffffc015a51c>] drm_dp_update_payload_part2+0xcc/0x130 [drm_kms_helper]
[ 6434.244420]  [<ffffffffc0153d2b>] ? drm_dp_dpcd_read+0x1b/0x20 [drm_kms_helper]
[ 6434.244486]  [<ffffffffc025ce92>] intel_mst_enable_dp+0x122/0x1b0 [i915]
[ 6434.244547]  [<ffffffffc023c79e>] haswell_crtc_enable+0x34e/0x900 [i915]
[ 6434.244606]  [<ffffffffc0237660>] intel_atomic_commit+0x1380/0x1fb0 [i915]
[ 6434.244660]  [<ffffffffc00635b7>] ? drm_atomic_set_crtc_for_connector+0x57/0xe0 [drm]
[ 6434.244713]  [<ffffffffc0064227>] drm_atomic_commit+0x37/0x60 [drm]
[ 6434.244751]  [<ffffffffc015e57b>] drm_atomic_helper_set_config+0x7b/0xb0 [drm_kms_helper]
[ 6434.244807]  [<ffffffffc00541e4>] drm_mode_set_config_internal+0x64/0x100 [drm]
[ 6434.244857]  [<ffffffffc0057d0d>] drm_mode_setcrtc+0xdd/0x500 [drm]
[ 6434.244899]  [<ffffffffc0049b2d>] drm_ioctl+0x25d/0x4f0 [drm]
[ 6434.244940]  [<ffffffffc0057c30>] ? drm_mode_setplane+0x1c0/0x1c0 [drm]
[ 6434.244977]  [<ffffffff81221376>] do_vfs_ioctl+0x96/0x590
[ 6434.245007]  [<ffffffff8108cf52>] ? __set_task_blocked+0x32/0x80
[ 6434.245040]  [<ffffffff81210951>] ? __sb_end_write+0x21/0x30
[ 6434.245080]  [<ffffffff8108f5d6>] ? __set_current_blocked+0x36/0x60
[ 6434.245112]  [<ffffffff812218e9>] SyS_ioctl+0x79/0x90
[ 6434.245138]  [<ffffffff8108f866>] ? SyS_rt_sigprocmask+0x86/0xb0
[ 6434.245170]  [<ffffffff817ff536>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 6434.245200] Code: 00 00 ba 80 ff ff ff eb c1 ba 04 00 00 00 01 c0 89 c1 83 f1 13 a8 10 0f 45 c1 83 ea 01 75 ef 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 47 28 48 85 c0 74 06 48 39 78 20 74 01 c3 55 48 8b b8 30 
[ 6434.245346] RIP  [<ffffffffc01559c5>] drm_dp_get_last_connected_port_to_mstb+0x5/0x30 [drm_kms_helper]
[ 6434.247571]  RSP <ffff88034bf9ba20>
[ 6434.249750] CR2: 0000000000000028
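
For what it is worth, the marked byte in the Code: line can be decoded by hand. The Python sketch below handles only one instruction form (a 64-bit `mov` load with an 8-bit displacement and no SIB byte), which is all this trace needs; it is not a general disassembler, and the function names are hypothetical. Applied to the bytes above it yields `mov 0x28(%rdi),%rax`, and with RDI = 0 that gives the faulting address 0x28 reported in CR2. Since the fault is at offset +0x5 of the function and RDI carries the first argument in the x86-64 calling convention, this suggests (but does not prove) that the function was called with a NULL pointer.

```python
# Tail of the Code: line from the oops in this comment.
CODE_LINE = ("5d c3 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 47 28 "
             "48 85 c0 74 06 48 39 78 20 74 01 c3 55 48 8b b8 30")

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the <..>-marked byte."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_load(insn):
    """Decode 'REX.W 8B /r' with mod=01 (disp8) and no SIB byte.

    Returns (destination register, base register, displacement).
    Any other encoding raises ValueError.
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex & 0xF0 != 0x40 or not rex & 0x08 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only disp8 addressing without SIB is handled")
    dest = REG64[reg | ((rex & 0x04) << 1)]  # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x01) << 3)]   # REX.B extends the r/m field
    disp = int.from_bytes(insn[3:4], "little", signed=True)
    return dest, base, disp


dest, base, disp = decode_mov_load(faulting_bytes(CODE_LINE))
print(f"mov {disp:#x}(%{base}),%{dest}")  # mov 0x28(%rdi),%rax
```

The same function applied to the marked bytes of the oops in comment 3 (`4c 8b 63 28`) gives `mov 0x28(%rbx),%r12`, so both crashes read the same 0x28 offset from a bad pointer.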
Comment 10 Jani Nikula 2016-04-25 07:47:37 UTC
Please try the current drm-intel-nightly branch of http://cgit.freedesktop.org/drm-intel; there are plenty of DP MST fixes.

If that doesn't work, also try this patch on top:
http://patchwork.freedesktop.org/patch/msgid/1461355726-13616-1-git-send-email-cpaul@redhat.com
Comment 11 cs_gon 2016-04-27 09:28:11 UTC
I cannot reproduce the crash any longer, using the drm-intel-nightly kernel from Monday up to commit 

commit b29d3f4720fd2b385caca40ccd6f95ad9736edd8
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Apr 25 09:36:12 2016 +0200

    drm-intel-nightly: 2016y-04m-25d-07h-35m-42s UTC integration manifest


I cannot say for sure whether this is fixed, because the crash was somewhat sporadic. Maybe some of the other affected people can weigh in on this?


With this kernel, however, I now get a kernel crash on resume in a specific use case. I will open a new bug report for that shortly.
Comment 12 yann 2016-04-27 11:36:08 UTC
Closing this bug since it no longer occurs on the latest drm-intel-nightly. Please reopen if it happens again.
The new issue is now tracked in bug 95165.