Bug 105367 - Dell XPS 13 and TB16: null pointer upon resume from hibernation/sleep
Summary: Dell XPS 13 and TB16: null pointer upon resume from hibernation/sleep
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-06 12:16 UTC by aappddeevv
Modified: 2018-04-24 06:59 UTC
CC List: 1 user

See Also:
i915 platform: SKL
i915 features: display/DP MST


Attachments
journalctl -k -b -1 log (4.30 MB, text/plain)
2018-03-07 12:08 UTC, aappddeevv
journal -k -b -1 (2.04 MB, text/plain)
2018-03-07 12:21 UTC, aappddeevv
i915 crash after suspend resume, journalctl -k -r logs (372.45 KB, application/gzip)
2018-03-09 12:19 UTC, aappddeevv

Description aappddeevv 2018-03-06 12:16:54 UTC
I'm running Fedora with kernel 4.15.7. If I sleep or hibernate while attached to the dock, I get a null pointer dereference upon resume. The computer is still running, but graphics are lost and I must reboot. Oddly, running systemctl reboot does not work and I must physically power cycle.

 ---[ end trace b65a0397584a1c12 ]---
Mar 06 06:37:48 nc6910p kernel: CR2: 0000000000000000
Mar 06 06:37:48 nc6910p kernel: RIP: drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper] RSP: ffffa9358917bd28
Mar 06 06:37:48 nc6910p kernel: Code: 02 eb a4 48 8b 6f 20 48 85 ed 75 e0 45 0f b6 04 24 44 89 e9 48 c7 c2 28 c0 52 c0 31 f6 48 c7 c7 32 d3 52 c0 e8 0c 64 f7 ff eb 0a <f0> ff 45 00 0f 88 99 d4 00 00 48 89 df e8 38 26 38 ea 48 89
Mar 06 06:37:48 nc6910p kernel:  ret_from_fork+0x35/0x40
Mar 06 06:37:48 nc6910p kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Mar 06 06:37:48 nc6910p kernel:  kthread+0x113/0x130
Mar 06 06:37:48 nc6910p kernel:  ? process_one_work+0x390/0x390
Mar 06 06:37:48 nc6910p kernel:  worker_thread+0x2e/0x380
Mar 06 06:37:48 nc6910p kernel:  process_one_work+0x175/0x390
Mar 06 06:37:48 nc6910p kernel:  i915_digport_work_func+0x8d/0x110 [i915]
Mar 06 06:37:48 nc6910p kernel:  ? __clear_rsb+0x15/0x3d
Mar 06 06:37:48 nc6910p kernel:  intel_dp_hpd_pulse+0x19c/0x310 [i915]
Mar 06 06:37:48 nc6910p kernel:  intel_dp_check_mst_status+0xc1/0x200 [i915]
Mar 06 06:37:48 nc6910p kernel:  ? intel_dp_check_mst_status+0xc1/0x200 [i915]
Mar 06 06:37:48 nc6910p kernel:  drm_dp_mst_hpd_irq+0x10b/0x8d0 [drm_kms_helper]
Mar 06 06:37:48 nc6910p kernel: Call Trace:
Mar 06 06:37:48 nc6910p kernel: CR2: 0000000000000000 CR3: 00000001f420a002 CR4: 00000000003606f0
Mar 06 06:37:48 nc6910p kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 06 06:37:48 nc6910p kernel: FS:  0000000000000000(0000) GS:ffff8cdd3ec00000(0000) knlGS:0000000000000000
Mar 06 06:37:48 nc6910p kernel: R13: 0000000000000001 R14: ffff8cdd27a39748 R15: ffff8cdd27a39710
Mar 06 06:37:48 nc6910p kernel: R10: ffffa9358917bd28 R11: 0000000000000035 R12: ffff8cdd27a39880
Mar 06 06:37:48 nc6910p kernel: RBP: 0000000000000000 R08: ffff8cdd3ec214a0 R09: 0000000000000000
Mar 06 06:37:48 nc6910p kernel: RDX: 0000000000000000 RSI: ffff8cdbfa4e9fc0 RDI: ffff8cdd27a399e0
Mar 06 06:37:48 nc6910p kernel: RAX: 0000000000000000 RBX: ffff8cdd27a399d8 RCX: ffff8cdbfa4e9fc1
Mar 06 06:37:48 nc6910p kernel: RSP: 0018:ffffa9358917bd28 EFLAGS: 00010246
Mar 06 06:37:48 nc6910p kernel: RIP: 0010:drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]
Mar 06 06:37:48 nc6910p kernel: Workqueue: i915-dp i915_digport_work_func [i915]
Mar 06 06:37:48 nc6910p kernel: Hardware name: Dell Inc. XPS 13 9350/0H67KH, BIOS 1.6.1 12/14/2017
Mar 06 06:37:48 nc6910p kernel: CPU: 0 PID: 26397 Comm: kworker/u8:122 Tainted: G        W        4.15.7-300.fc27.x86_64 #1
Mar 06 06:37:48 nc6910p kernel:  int3400_thermal pinctrl_sunrisepoint int340x_thermal_zone acpi_pad acpi_thermal_rel pinctrl_intel nfsd auth_rpcgss binfmt_misc nfs_acl lockd grace sunrpc i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit
Mar 06 06:37:48 nc6910p kernel:  snd_soc_sst_dsp snd_hda_codec_hdmi snd_soc_sst_ipc snd_soc_acpi iTCO_wdt iTCO_vendor_support snd_soc_core intel_rapl dell_wmi wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec
Mar 06 06:37:48 nc6910p kernel: Modules linked in: uinput cmac rfcomm bnep btusb btrtl btbcm btintel bluetooth ecdh_generic fuse hid_plantronics snd_usb_audio snd_usbmidi_lib snd_rawmidi cdc_ether usbnet r8152 mii thunderbolt nv
Mar 06 06:37:48 nc6910p kernel: Oops: 0002 [#1] SMP PTI
Mar 06 06:37:48 nc6910p kernel: PGD 0 P4D 0 
Mar 06 06:37:48 nc6910p kernel: IP: drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]
Mar 06 06:37:48 nc6910p kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
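
The "Oops: 0002" line in the trace above carries the x86 page-fault error code, which says what kind of access faulted. As a minimal sketch (the helper name decode_oops_code is hypothetical and only the low five bits are handled), the code can be decoded like this:

```python
# Low bits of the x86 page-fault error code printed after "Oops:".
# Each entry is (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(line):
    """Return human-readable flags for a log line containing 'Oops: NNNN'."""
    # The code is the first hexadecimal token after 'Oops:'.
    code = int(line.split("Oops:")[1].split()[0], 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            flags.append(text)
    return flags


line = "Mar 06 06:37:48 nc6910p kernel: Oops: 0002 [#1] SMP PTI"
print(decode_oops_code(line))
```

For this trace the result is a kernel-mode write to a page that is not present, which fits a write through a null pointer (CR2 and RBP are both 0 in the register dump).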
Comment 1 aappddeevv 2018-03-06 12:18:52 UTC
I always have the laptop screen and 2 monitors running. The 2 monitors are connected through the dock.
Comment 2 aappddeevv 2018-03-06 12:21:23 UTC
Just prior to this message I also noticed another message which may be related:

Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0: 1: reading drom (length: 0x6e)
Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0:    unknown1: 0x0 unknown4: 0x0
Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0:    Upstream Port Number: 1 Depth: 1 Route String: 0x1 Enabled: 1, PlugEventsDelay: 254ms
Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0:   Config:
Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0:   Max Port Number: 11
Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0:  Switch: 8086:1578 (Revision: 4, TB Version: 2)
Mar 06 06:37:48 nc6910p kernel: thunderbolt 0000:03:00.0: current switch config:
Comment 3 Elizabeth 2018-03-06 16:01:28 UTC
Could you please attach a full log with debug information, booting with the drm.debug=0xe kernel parameter set in GRUB and covering everything from boot until the issue occurs? Thank you.
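
For reference, drm.debug is a bitmask of DRM log categories. The sketch below decodes a value into category names. The bit assignments are my recollection of the kernel's DRM debug categories from around the 4.15 era and should be checked against the headers of the kernel actually in use; the helper name decode_drm_debug is hypothetical.

```python
# Assumed drm.debug category bits (DRM_UT_* in kernels of roughly this era).
# Newer kernels may define additional bits above these.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
}


def decode_drm_debug(value):
    """Return the names of the categories enabled by a drm.debug value."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if value & bit]


print(decode_drm_debug(0xe))
```

Under these assumptions, 0xe enables the DRIVER, KMS and PRIME categories and leaves the noisier CORE messages off.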
Comment 4 aappddeevv 2018-03-07 12:08:08 UTC
Created attachment 137854 [details]
journalctl -k -b -1 log

Bad display after resume. Had to reboot. This does *not* show the null pointer problem.
Comment 5 aappddeevv 2018-03-07 12:21:14 UTC
Created attachment 137855 [details]
journal -k -b -1

Another bad resume after hibernate. This does *not* show the null pointer error.
Comment 6 aappddeevv 2018-03-07 12:22:32 UTC
It will take me another day or so to generate the null condition as it usually happens after I use the computer all day.
Comment 7 aappddeevv 2018-03-09 12:16:58 UTC
Another log showing a crash/hang although not showing the null pointer error. The errors all seem different and dependent on the accumulation of what I was doing on the computer at the time.
Comment 8 aappddeevv 2018-03-09 12:19:51 UTC
Created attachment 137935 [details]
i915 crash after suspend resume, journalctl -k -r logs

Logs. This time I was resuming from suspend only (not hibernate) and then attached the dock. The screen never appeared, and the log is attached. Eventually the window manager crashed and I lost my session. I get these errors after an overnight sleep/hibernate when I resume without the dock and then attach it. This is *not* the null pointer error, but it's in the same spirit: a crash that requires a restart to fully recover.
Comment 9 Elizabeth 2018-03-09 18:18:03 UTC
I guess these may be related:

Channel equalization failed 5 times
*ERROR* Timed out waiting for DP idle patterns
Comment 10 aappddeevv 2018-03-28 22:14:38 UTC
Is the DP idle patterns error being addressed somewhere? I have been trying to track kernel i915 commits but I don't see any activity. I'm not sure what to do. Ever since 4.15.10 (or so), this error has wiped out my entire session daily.
Comment 12 Jani Saarinen 2018-03-29 07:11:23 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think the bug is no longer valid, please comment on it so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), could you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 13 aappddeevv 2018-04-05 12:32:35 UTC
Ouch, I turned off i915 debugging and actually started running 4.16. Of course, the next morning I ran into this bug again. I don't have debug logs, but here's some text. I think 4.16 is more stable with the dock, but not that much more. My machine froze and I had to reboot. The scenario was that I undocked at night, hibernated, then resumed in the morning and plugged the dock back in. It froze immediately.

Apr 05 07:55:44 nc6910p kernel: RIP: drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper] RSP: ffffaf348944fce8
Apr 05 07:55:44 nc6910p kernel: Code: 02 eb a4 48 8b 6f 20 48 85 ed 75 e0 45 0f b6 04 24 44 89 e9 48 c7 c2 98 50 49 c0 31 f6 48 c7 c7 fc 63 49 c0 e8 6c 94 f8 ff eb 0a <f0> ff 45 00 0f 88 19 dd 00 00 48 89 df e8 78 8d 43 e8 48 89 e8 
Apr 05 07:55:44 nc6910p kernel:  ret_from_fork+0x35/0x40
Apr 05 07:55:44 nc6910p kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Apr 05 07:55:44 nc6910p kernel:  kthread+0x113/0x130
Apr 05 07:55:44 nc6910p kernel:  ? process_one_work+0x360/0x360
Apr 05 07:55:44 nc6910p kernel:  worker_thread+0x2e/0x380
Apr 05 07:55:44 nc6910p kernel:  process_one_work+0x175/0x360
Apr 05 07:55:44 nc6910p kernel:  i915_digport_work_func+0x8d/0x110 [i915]
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x40/0x70
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to+0xa2/0x4c0
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x34/0x70
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x40/0x70
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x34/0x70
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x40/0x70
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x34/0x70
Apr 05 07:55:44 nc6910p kernel:  ? __switch_to_asm+0x40/0x70
Apr 05 07:55:44 nc6910p kernel:  intel_dp_hpd_pulse+0x1f9/0x380 [i915]
Apr 05 07:55:44 nc6910p kernel:  intel_dp_check_mst_status+0xc1/0x200 [i915]
Apr 05 07:55:44 nc6910p kernel:  ? intel_dp_check_mst_status+0xc1/0x200 [i915]
Apr 05 07:55:44 nc6910p kernel:  drm_dp_mst_hpd_irq+0x10b/0x8d0 [drm_kms_helper]
Apr 05 07:55:44 nc6910p kernel: Call Trace:
Apr 05 07:55:44 nc6910p kernel: CR2: 0000000000000000 CR3: 000000031020a005 CR4: 00000000003606e0
Apr 05 07:55:44 nc6910p kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 05 07:55:44 nc6910p kernel: FS:  0000000000000000(0000) GS:ffff8fc23ed00000(0000) knlGS:0000000000000000
Apr 05 07:55:44 nc6910p kernel: R13: 0000000000000001 R14: ffff8fc227865750 R15: ffff8fc227865718
Apr 05 07:55:44 nc6910p kernel: R10: ffffaf348944fce8 R11: 000000000000006b R12: ffff8fc227865888
Apr 05 07:55:44 nc6910p kernel: RBP: 0000000000000000 R08: ffff8fc23ed21560 R09: 0000000000000000
Apr 05 07:55:44 nc6910p kernel: RDX: 0000000000000000 RSI: ffff8fc23ece0000 RDI: ffff8fc2278659e8
Apr 05 07:55:44 nc6910p kernel: RAX: 0000000000000000 RBX: ffff8fc2278659e0 RCX: ffff8fc23ece0001
Apr 05 07:55:44 nc6910p kernel: RSP: 0018:ffffaf348944fce8 EFLAGS: 00010246
Apr 05 07:55:44 nc6910p kernel: RIP: 0010:drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]
Apr 05 07:55:44 nc6910p kernel: Workqueue: i915-dp i915_digport_work_func [i915]
Apr 05 07:55:44 nc6910p kernel: Hardware name: Dell Inc. XPS 13 9350/0H67KH, BIOS 1.6.1 12/14/2017
Apr 05 07:55:44 nc6910p kernel: CPU: 2 PID: 22385 Comm: kworker/u8:52 Tainted: G           OE    4.16.0-1.vanilla.knurd.1.fc27.x86_64 #1
Apr 05 07:55:44 nc6910p kernel:  intel_lpss_pci intel_lpss wmi intel_hid sparse_keymap int3403_thermal int3400_thermal pinctrl_sunrisepoint acpi_thermal_rel int340x_thermal_zone pinctrl_intel acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper crc32
Apr 05 07:55:44 nc6910p kernel:  videobuf2_common videodev media ecdh_generic arc4 snd_hda_codec_hdmi snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi mac80211 snd_soc_core dell_wmi dell_laptop snd_hda_codec_realtek snd_hda_codec_generic intel_rapl iTCO_wdt iTCO_vendor_support 
Apr 05 07:55:44 nc6910p kernel: Modules linked in: ccm iwlmvm uinput hid_plantronics cdc_ether usbnet snd_usb_audio snd_usbmidi_lib snd_rawmidi r8152 mii thunderbolt nvmem_core rfcomm vmnet(OE) vmmon(OE) nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set n
Apr 05 07:55:44 nc6910p kernel: Oops: 0002 [#1] SMP PTI
Apr 05 07:55:44 nc6910p kernel: PGD 0 P4D 0 
Apr 05 07:55:44 nc6910p kernel: IP: drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]
Apr 05 07:55:44 nc6910p kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
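
To check whether this oops is the same crash site as the one in the original description, the RIP lines from the two traces can be compared. This is a rough sketch: the regular expression and the helper name parse_rip are my own, and it only handles RIP lines of the "RIP: 0010:symbol+0xoff/0xsize [module]" form seen in these logs.

```python
import re

# Matches e.g. "RIP: 0010:drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]".
SYMBOL_RE = re.compile(
    r"RIP: (?:\d{4}:)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]"
)


def parse_rip(line):
    """Split a RIP log line into function, offset, function size and module."""
    m = SYMBOL_RE.search(line)
    if m is None:
        return None
    func, offset, size, module = m.groups()
    return {
        "function": func,
        "offset": int(offset, 16),
        "size": int(size, 16),
        "module": module,
    }


march = parse_rip(
    "Mar 06 06:37:48 nc6910p kernel: RIP: 0010:drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]"
)
april = parse_rip(
    "Apr 05 07:55:44 nc6910p kernel: RIP: 0010:drm_dp_get_mst_branch_device+0xc6/0xf0 [drm_kms_helper]"
)
print(march == april)
```

Both reports fault in drm_dp_get_mst_branch_device at offset 0xc6, even though one is from 4.15.7 and the other from 4.16.0.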
Comment 14 aappddeevv 2018-04-23 12:19:16 UTC
I'm having much better luck under 4.17 and the latest linux firmware (which helped a lot, it seems). I'm going to close this.
Comment 15 Jani Saarinen 2018-04-24 06:59:50 UTC
Thank you, closed.

