Bug 100727 - [skl] [drm] BUG: unable to handle kernel NULL pointer dereference at (null) [drm_kms_helper]
Summary: [skl] [drm] BUG: unable to handle kernel NULL pointer dereference at (null) ...
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-19 20:49 UTC by kang
Modified: 2018-05-04 07:56 UTC
CC List: 2 users

See Also:
i915 platform: SKL
i915 features: display/USB-C


Attachments
i915_display_info with dock connected (3.48 KB, text/plain)
2017-04-19 20:49 UTC, kang
i915_dp_mst_info (1.33 KB, text/plain)
2017-04-19 20:51 UTC, kang
t440_oops (4.16 KB, text/plain)
2017-08-16 22:31 UTC, carbenium
t440p_display_info (11.68 KB, text/plain)
2017-08-16 22:32 UTC, carbenium
t400p_dp_mst_info (1.67 KB, text/plain)
2017-08-16 22:34 UTC, carbenium
debug patch with workaround (802 bytes, patch)
2018-04-01 10:54 UTC, Jiri Slaby

Description kang 2017-04-19 20:49:41 UTC
Created attachment 130927
i915_display_info with dock connected

-- Reboot --
Apr 19 13:08:31 xps13 kernel: IP: drm_dp_get_mst_branch_device+0xa2/0x110 [drm_kms_helper]
Apr 19 13:08:31 xps13 kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)


Sadly that is all I have as it freezes here - took a few tries to get any data written to log before reboot. This happens when re-connecting a USB-C display on a dock that has multiple video inputs (only one screen connected in addition to laptop screen though).

Hardware: Dell XPS 13 9350 / i915 / i7-6560U
Kernel tested: 4.10.10 and 4.11-rc7
xorg-server: 1.19.3-2
xf86-video-intel: 1:2.99.917+770+gcb6ba2da-1 (git snap)

Will try to gather more data when it occurs - tips welcome though
Comment 1 kang 2017-04-19 20:51:42 UTC
Created attachment 130928
i915_dp_mst_info
Comment 2 Blaine Gardner 2017-05-02 18:46:58 UTC
I am seeing the same issue on a Dell Precision 5510 when I plug my docking station (with 2 connected screens) into my laptop. The system works if it is started with the dock attached. This is a new issue that has popped up sometime after early March.

> May 02 09:47:47 grim kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
> May 02 09:47:47 grim kernel: IP: drm_dp_get_mst_branch_device+0xc8/0x110 [drm_kms_helper]
> -- Reboot --

Hardware: Dell Precision 5510 / i915 / i7-6820HQ
Kernel: 4.10.10-1-default
xorg-server: xorg-x11-server-1.19.3-1.1.x86_64
xf86-video-intel: not installed
Comment 3 kang 2017-05-02 19:16:01 UTC
Interestingly, same time frame here.
I suspect we should bisect the kernel, or at least try 4.8 or 4.9.
Comment 4 kang 2017-05-08 20:42:39 UTC
To which I'll add: I have now tried 4.9.26 and it does not have this problem.
Comment 5 Blaine Gardner 2017-05-12 00:37:12 UTC
I haven't had time to do any more testing on this, but I just confirmed that if I plug my laptop into the dock while it is in hibernation state, there are no issues when it comes back online with the dock plugged in. This may be a tolerable workaround until the issue is fixed.
Comment 6 kang 2017-05-19 22:26:22 UTC
On the bright side, something seems to have fixed it after upgrading to kernel 4.12-rc1.

At least, after a couple of days I did not run into this. Fingers crossed ;-)
Comment 7 Jani Nikula 2017-05-30 12:32:07 UTC
There's insufficient info here to go on with. The full oops backtrace would be a start, full dmesg with drm.debug=14 even better.

I suspected dupe of bug 97666 but that's been fixed in v4.9.
Comment 8 Blaine Gardner 2017-06-07 18:24:57 UTC
Seems to be fixed on OpenSUSE TW's 4.11.3-1-default kernel.
Comment 9 Elizabeth 2017-06-21 16:12:32 UTC
(In reply to Blaine Gardner from comment #8)
> Seems to be fixed on OpenSUSE TW's 4.11.3-1-default kernel.

It seems that the problem has been fixed in recent kernel versions, so I'm closing the bug. If that changes, please share the details and set the status to REOPENED. Thanks.
Comment 10 carbenium 2017-08-16 22:29:59 UTC
I just hit this on 4.12.4-041204-generic x86_64

Hardware: Lenovo T440p / i7-4710MQ / NVIDIA GK208

Laptop is connected to a docking station and the two Dell U2515H displays are chained (DP 1.2).
The crash was triggered by switching inputs on the display connected directly to the docking station.
Comment 11 carbenium 2017-08-16 22:31:02 UTC
Created attachment 133559
t440_oops
Comment 12 carbenium 2017-08-16 22:32:40 UTC
Created attachment 133562
t440p_display_info
Comment 13 carbenium 2017-08-16 22:34:05 UTC
Created attachment 133563
t400p_dp_mst_info
Comment 14 Elizabeth 2017-10-25 16:54:25 UTC
(In reply to carbenium from comment #10)
> I just hit this on 4.12.4-041204-generic x86_64
> 
> Hardware: Lenovo T440p / i7-4710MQ / NVIDIA GK208
> 
> Laptop is connected to a docking station and the two Dell U2515H displays
> are chained (DP 1.2).
> The crash was triggered by switching inputs on the display connected
> directly to the docking station.
Hello Carbenium, could you try to reproduce this with the latest kernel versions or drm-tip and, if it is reproducible, attach dmesg with debug info, booting with drm.debug=0x1e log_buf_len=2M on the kernel command line in GRUB? Thank you.

https://cgit.freedesktop.org/drm-tip
https://www.kernel.org
Comment 15 Jani Saarinen 2018-03-29 07:10:41 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If investigation of the bug is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), could you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 16 Jiri Slaby 2018-04-01 10:53:23 UTC
I hit it with 4.15.14 right now. I don't know how to reproduce though. 

What about this (mgr->mst_primary seems to be NULL here)?
@@ -1288,7 +1288,10 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
                        }
                }
        }
-       kref_get(&mstb->kref);
+       if (WARN_ON_ONCE(!mstb))
+               ;
+       else
+               kref_get(&mstb->kref);
 out:
        mutex_unlock(&mgr->lock);
        return mstb;
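
A minimal user-space sketch of the guard in the patch above, assuming the oops comes from taking a reference through a NULL branch pointer when the manager has no primary branch. The names `branch`, `topology_mgr`, and `get_branch` are hypothetical stand-ins for illustration, not the kernel's types or API:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an MST branch device. */
struct branch {
    int refcount;
};

/* Hypothetical stand-in for the topology manager. */
struct topology_mgr {
    struct branch *primary;   /* assumed NULL once the hub is gone */
};

/*
 * Return the primary branch with one extra reference taken,
 * or NULL if the manager currently has no primary branch.
 */
struct branch *get_branch(struct topology_mgr *mgr)
{
    struct branch *b = mgr->primary;

    if (b == NULL)
        return NULL;          /* guard: never touch b->refcount through NULL */

    b->refcount++;            /* plays the role of kref_get() in this sketch */
    return b;
}
```

With the guard in place, a caller sees NULL instead of faulting, so it has to handle the "no branch" case itself. Whether returning NULL is the right behaviour for the real function is a separate question that this sketch does not answer.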
Comment 17 Jiri Slaby 2018-04-01 10:54:40 UTC
Created attachment 138472
debug patch with workaround
Comment 18 Jani Saarinen 2018-04-20 14:07:57 UTC
More information provided.
Comment 19 Jani Saarinen 2018-04-25 06:43:15 UTC
Jiri,
Can you try with the latest tip: https://cgit.freedesktop.org/drm-tip
And attach dmesg with debug info, booting with drm.debug=0x1e log_buf_len=2M on the kernel command line in GRUB.
Comment 20 Jani Saarinen 2018-04-25 06:43:54 UTC
There have been MST fixes in drm-tip lately.
Comment 21 Jani Saarinen 2018-05-04 07:56:37 UTC
Closing; please reopen if it occurs again.

