Created attachment 141349 [details]
Kernel log from journalctl output
-- chipset: Intel HD Graphics 520 (GT2)
-- system architecture: x86_64
-- xorg-server: 1.20.1 (using generic modesetting driver)
-- libdrm: 2.4.93
-- kernel version: 4.18.5-arch1-1-ARCH
-- Linux distribution: Arch Linux
-- Machine: Lenovo T470 20JN (Intel Core i5-6300U)
-- Display connector: HDMI over a DP MST adapter plugged into a Thunderbolt 3 port
-- Adapter reference: Cable Matters USB-C Multiport Travel Dock with Dual HDMI and PD
When I plug in the TB3 dock with two monitors already attached to it, the system hangs (I cannot switch to a TTY, nor log in blind to reboot cleanly).
If the dock is plugged in before booting, everything works fine, including the two monitors.
Additionally, if only one monitor is attached to the dock, hot-plugging appears to work too.
I am not sure of the root cause, as the kernel shows several oopses, one of them in the Xorg process context (see below and the attachment), but I think the problem is either in the generic DRM/KMS code or in the Intel DRM code.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
CPU: 2 PID: 1216 Comm: Xorg Tainted: G O 4.18.5-arch1-1-ARCH #1
? drm_mode_connector_property_set_ioctl+0x60/0x60 [drm]
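A note on reading the oops above: a small, non-zero fault address such as 0x320 usually suggests that a structure member was accessed through a NULL base pointer, so the faulting address equals the member's offset. The sketch below only illustrates that arithmetic. `struct demo_object` and `member_address` are hypothetical names, and the layout is invented so that the member lands at 0x320; it does not mirror any real DRM structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that `target` sits at offset 0x320.
 * This is NOT a real kernel structure. */
struct demo_object {
    unsigned char padding[0x320];
    void *target;
};

/* Address that an access to `obj->target` would touch, computed as
 * base address plus member offset, without dereferencing anything. */
static uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct demo_object, target);
}
```

With a NULL base (0), the computed address is exactly the member offset, which is the pattern seen in the "NULL pointer dereference at 0000000000000320" line.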
I attached the kernel log from journalctl; the dock was re-connected at 16:32:30 (line 1029 in the log).
I don't have much time these days, but I will do my best to try to reproduce the bug with the drm-tip branch and to update the issue as soon as I have new inputs.
Please let me know if some other information is required.
Created attachment 141350 [details]
I was able to reproduce the bug with the drm-tip branch (commit 3c17d3c5703ec98d04ef2b5b735f297081ec0531).
I will attach the kernel log, but it seems to be the same error as with my Arch Linux stock kernel.
However, with drm-tip, I was unable to boot with the dock attached (several DRM-related errors during the systemd boot, and it hangs just before the auto-login).
Should I open another bug report for that, or is this kind of regression expected on the drm-tip branch?
Created attachment 141366 [details]
Kernel log using drm-tip kernel
Leo, could you attach a dmesg log with kernel parameters drm.debug=0x1e log_buf_len=4M?
How often does it occur?
It is recommended that the dmesg log contain the whole boot information.
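For reference, `drm.debug` is a bitmask of debug categories, so the requested value 0x1e can be decoded bit by bit. The sketch below assumes the category values used by kernels of that period (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08, ATOMIC 0x10, VBL 0x20); the table and the helper `describe_drm_debug` are illustrative names, not kernel APIs.

```c
#include <stdio.h>
#include <string.h>

/* Assumed drm.debug category bits; names are for illustration only. */
struct drm_debug_category {
    unsigned int bit;
    const char *name;
};

static const struct drm_debug_category categories[] = {
    { 0x01, "CORE" },  { 0x02, "DRIVER" }, { 0x04, "KMS" },
    { 0x08, "PRIME" }, { 0x10, "ATOMIC" }, { 0x20, "VBL" },
};

/* Write the space-separated names of the categories enabled in `mask`
 * into `out`, truncating if the buffer is too small. */
static void describe_drm_debug(unsigned int mask, char *out, size_t out_len)
{
    size_t i;

    if (out_len == 0)
        return;
    out[0] = '\0';
    for (i = 0; i < sizeof(categories) / sizeof(categories[0]); i++) {
        if (!(mask & categories[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", out_len - strlen(out) - 1);
        strncat(out, categories[i].name, out_len - strlen(out) - 1);
    }
}
```

Under these assumptions, 0x1e enables the DRIVER, KMS, PRIME and ATOMIC categories and leaves the noisier CORE and VBL messages off.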
(In reply to Lakshmi from comment #4)
> Leo, could you attach a dmesg log with kernel parameters drm.debug=0x1e
> How often does it occur?
Thank you for looking at the issue Lakshmi.
Indeed, I forgot to put the debug flags...
This time the bug caused only one of the kernel oopses I saw in the previous cases, and it occurred in a kworker context instead of the Xorg one (time 37.197 in the attached dmesg).
It occurs 100% of the time, as far as I tested, when I hotplug the adapter.
(In reply to Lakshmi from comment #5)
> It is recommended that the dmesg log contain the whole boot information.
Maybe I missed something; I believed my attachments contained the whole boot info. Which information is missing, and do you have any pointers on how to get it?
Created attachment 141371 [details]
Kernel log, drm-tip with DRM debug info
Booted without the adapter, opened an X session and then plugged the adapter (at ~36s in the logs) with the 2 monitors attached to it.
Created attachment 141372 [details]
Kernel log, stock 4.18.5 kernel, without hotplug (no issue)
In case it helps, I wanted to attach the dmesg of a case in which the adapter works without any issue.
Unfortunately, as some probably unrelated bugs happen with the drm-tip kernel, I had to use my distribution's stock kernel (4.18.5).
The adapter was connected _before_ powering on the laptop. In this case, everything works as expected: early KMS detects and uses the 3 monitors (the laptop display and the 2 external ones), and Xorg can extend across them without issue.
(In reply to Léo Grange from comment #7)
> Created attachment 141371 [details]
> Kernel log, drm-tip with DRM debug info
> Booted without the adapter, opened an X session and then plugged the adapter
> (at ~36s in the logs) with the 2 monitors attached to it.
For now, this is enough. I will get back to you if I need more info.
Created attachment 141847 [details] [review]
Possible NULL dereference fix
Reporter, can you please check whether the attached experimental patch prevents the oops?
(In reply to Stanislav Lisovskiy from comment #10)
> Created attachment 141847 [details] [review]
> Possible NULL dereference fix
> Reporter, can you please check if this attached experimental patch helps
> against oops.
I just had enough time for a quick test this afternoon, using the latest drm-tip (commit 6b7a44d1597) with your patch applied.
I tried a few different configurations (coldplug, hotplug a few times...): everything appears to work as expected!
I think you can close this issue for now; if a related issue appears during further tests, I will let you know.
Thanks a lot for your work and for the quality of the Intel graphics support on Linux in general!
(In reply to Léo Grange from comment #11)
> (In reply to Stanislav Lisovskiy from comment #10)
> > Created attachment 141847 [details] [review]
> > Possible NULL dereference fix
> > Reporter, can you please check if this attached experimental patch helps
> > against oops.
> Just had enough time to test quickly this afternoon, using the latest
> drm-tip (commit 6b7a44d1597) with your patch applied.
> Tried a few different configurations (coldplug, hotplug a few times...):
> everything appear to work as expected!
> I think you can close this issue for now, if a related issue appear during
> further tests I will let you know.
> Thanks a lot for your work and for the quality of the Intel graphics support
> on Linux in general!
Great! Now I need to get this patch upstream :)
(In reply to Léo Grange from comment #11)
BTW: can you also try without this patch? It could be that the issue was simply fixed in recent drm-tip. Also, considering that I've only added a "not NULL" check against mgr->mst_primary, if that really helps, it means there are some internal logic problems that probably need to be fixed somewhere else, and this check merely fixes a symptom.
Leo, can you verify whether the issue can be reproduced with the latest drm-tip without the patch (comment 10)? This information is very much needed.
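To make the distinction concrete, here is a minimal user-space sketch of what a "not NULL" guard of this kind looks like. It is not the actual patch: `struct demo_mgr`, `struct demo_branch` and `demo_primary_port_count` are hypothetical stand-ins, and only the field name `mst_primary` comes from the comment above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the MST topology manager and its primary
 * branch device; the real kernel structures are far larger. */
struct demo_branch {
    int num_ports;
};

struct demo_mgr {
    struct demo_branch *mst_primary;
};

/* Return the primary branch's port count, or -1 when there is no primary
 * branch yet. The guard avoids the crash, but it does not explain why the
 * caller got here with an unset mst_primary. */
static int demo_primary_port_count(const struct demo_mgr *mgr)
{
    if (mgr == NULL || mgr->mst_primary == NULL)
        return -1;
    return mgr->mst_primary->num_ports;
}
```

The guard turns a crash into an error return, which is why it addresses the symptom only: whatever allowed the call with an unset `mst_primary` is still there.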
(In reply to Lakshmi from comment #14)
> Leo, can you verify whether the issue can be reproduced with the latest
> drm-tip without the patch (comment 10)? This information is very much needed.
I think I had a repro on fairly recent drm-tip on https://bugs.freedesktop.org/show_bug.cgi?id=108616
I've been running with Stanislav's patch for a few hours and so far no hang, even after a few rapid unplug/replug.
Sorry for the lack of response during all this time...
I was quite busy over the last few weeks and had no access to the affected device.
I will do my best to test with and without the patch during the next week, using the latest drm-tip branch.
The description of bug #108616 does seem similar to my issue, but in my case it occurs only when plugging in the dock, not when unplugging it.
Created attachment 142386 [details] [review]
Assign intel_dp->is_mst only after mgr structure is properly initialized
Please check my new patch, which attempts to cure the origin of the problem rather than the symptom. Also, please remove my first patch before testing it; otherwise it will hide the problem.
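The idea in the patch title (set the flag only once the manager is ready) can be sketched as follows. This is a simplified, single-threaded illustration under assumed names: `struct demo_dp`, `demo_mgr_setup` and `demo_enable_mst` are hypothetical, and only `is_mst`, `mgr` and `mst_primary` echo names used in this report. The real driver code also has locking and hotplug ordering that are not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_branch {
    int id;
};

struct demo_mgr {
    struct demo_branch *mst_primary;
};

struct demo_dp {
    struct demo_mgr mgr;
    bool is_mst;
};

/* Hypothetical setup step: fails when no primary branch is available. */
static int demo_mgr_setup(struct demo_mgr *mgr, struct demo_branch *primary)
{
    if (primary == NULL)
        return -1;
    mgr->mst_primary = primary;
    return 0;
}

/* Publish is_mst only after the manager has been set up, so code that
 * checks is_mst never observes a manager without a primary branch. */
static bool demo_enable_mst(struct demo_dp *dp, struct demo_branch *primary)
{
    dp->is_mst = false;
    if (demo_mgr_setup(&dp->mgr, primary) != 0)
        return false;
    dp->is_mst = true;
    return true;
}
```

In this sketch, any reader that tests `is_mst` first can rely on `mgr.mst_primary` being set, which is the invariant the earlier NULL check only papered over.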
Presumably fixed by
Author: Stanislav Lisovskiy <email@example.com>
Date: Fri Nov 9 11:00:12 2018 +0200
drm/dp_mst: Check if primary mstb is null