Bug 108616 - USB-C dock unplug trigger NULL pointer access
Summary: USB-C dock unplug trigger NULL pointer access
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
: medium major
Assignee: Stanislav Lisovskiy
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-31 20:55 UTC by Lionel Landwerlin
Modified: 2018-11-14 11:00 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features: display/USB-C


Attachments
Kernel backtrace (28.05 KB, text/plain)
2018-10-31 20:56 UTC, Lionel Landwerlin
no flags Details
Assign intel_dp->is_mst only after mgr structure is properly initialized (1.31 KB, patch)
2018-11-06 08:41 UTC, Stanislav Lisovskiy
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Lionel Landwerlin 2018-10-31 20:55:40 UTC
That's on drm-tip : 

commit 3aa71a0d803ee01605f9a3026ddd989a591a73c6 (drm-tip-ssh/drm-tip)
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Wed Oct 31 13:19:06 2018 +0200

    drm-tip: 2018y-10m-31d-11h-18m-05s UTC integration manifest


The dock is a Dell TB16 with an XPS 13 9360 laptop.
Comment 1 Lionel Landwerlin 2018-10-31 20:56:04 UTC
Created attachment 142311 [details]
Kernel backtrace
Comment 2 Lakshmi 2018-11-01 11:09:16 UTC
Lionel, can you attach the full dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M?
What is the impact of this issue? Apart from the 
null pointer references in the log, any other impact as a user?
Comment 3 Jani Saarinen 2018-11-01 11:13:57 UTC
Do we have any dups from this behaviour? Stan? 
At least from same docking station we have bugs...
Comment 4 Jani Saarinen 2018-11-01 11:15:00 UTC
Eg. https://bugs.freedesktop.org/show_bug.cgi?id=106250?
Comment 5 Lionel Landwerlin 2018-11-01 11:22:48 UTC
(In reply to Lakshmi from comment #2)
> Lionel, can you attach the full dmesg from boot with kernel parameters
> drm.debug=0x1e log_buf_len=4M?
> What is the impact of this issue? Apart from the 
> null pointer references in the log, any other impact as a user?

The machine is completely unusable, I thought of a hard hang first, but it seems it was just the kernel completely blocked.

I don't have the fully log since boot.
Comment 6 Stanislav Lisovskiy 2018-11-01 11:58:32 UTC
(In reply to Jani Saarinen from comment #3)
> Do we have any dups from this behaviour? Stan? 
> At least from same docking station we have bugs...

That can be with a pretty high probability duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=107738. At least stack trace and circumstance when it happens look very similar.

I had attached an experimental fix, which seemed to help reporter of 107738. 
I would suggest to try it here as well.
Comment 7 Lionel Landwerlin 2018-11-01 12:14:50 UTC
(In reply to Stanislav Lisovskiy from comment #6)
> (In reply to Jani Saarinen from comment #3)
> > Do we have any dups from this behaviour? Stan? 
> > At least from same docking station we have bugs...
> 
> That can be with a pretty high probability duplicate of
> https://bugs.freedesktop.org/show_bug.cgi?id=107738. At least stack trace
> and circumstance when it happens look very similar.
> 
> I had attached an experimental fix, which seemed to help reporter of 107738. 
> I would suggest to try it here as well.

Thanks, I think this bug is duplicate too, proving we can reproduce on drm-tip.
I'll give a try to your patch.
Comment 8 Lionel Landwerlin 2018-11-02 08:47:48 UTC
What additional info do you need?
Comment 9 Stanislav Lisovskiy 2018-11-02 09:23:50 UTC
(In reply to Lionel Landwerlin from comment #8)
> What additional info do you need?

So does the fix help from 107738 help? My main question to the reported of this bug was that is this really reproducible with most recent drm-tip.

If so and the fix I attached really helps, it means that there is an internal logic issue with dp mst handling, which makes primary reference assigned to NULL,
even though it is still used. 
So even if my fix helps, there will be more serious fixes needed.
Comment 10 Lionel Landwerlin 2018-11-02 10:00:57 UTC
(In reply to Stanislav Lisovskiy from comment #9)
> (In reply to Lionel Landwerlin from comment #8)
> > What additional info do you need?
> 
> So does the fix help from 107738 help? My main question to the reported of
> this bug was that is this really reproducible with most recent drm-tip.

Yes the fix seems to help (still no hang/crash so far).

I'll retry without the patch. Here is the list of commit between the filing of this bug and what I'm currently running (+ your patch) :

$ git log --no-merges 5a4712f472bf67dc81b4d7cc571edca11cf52c87..drm-tip/drm-tip ./drivers/gpu/drm/i915/ | grep ^commit
commit 47164e0b0df88494235e6416a68ffcb993503b8d
commit ca05359f1e64cf8303ee532e50efe4ab7563d4a9
commit f9776280c29e77a18cbc7ebb6d48f7885e494990
commit 49af5d95b9b3c21a84ad115a9db9acbc036d849a
commit de9f8eea5a44b0b756d3d6345af7f8e630a3c8c0
commit 48197bc564c7a1888c86024a1ba4f956e0ec2300
commit 835fe6d75d14c1513910ed7f5665127fee12acc8
commit b4ec5f39e4a0bc9844634faa88dd08ca94dca39d
commit 83db37385306072eca403ed80c0a8cf7b0d39e05
commit a9b84b4492774668237b19aa65536e934df51567
commit d9a515867bdba59ebf196a6ade10faae8e8be36a
commit ab0d6a141843e0b4b2709dfd37b53468b5452c3a
commit e3118a038dfd1d6d902ea966e0ce3ce4e91e503b
commit 041444458835d7fb2c9f042598bfe16bf375b15d
commit 4bbf0d4749e707b6b262d576a9a9ef5c63b52dd4
commit c02ba4ef16eefe663fdefcccaa57fad32d5481bf
commit 80c188695a77eddaa6e8885510ff4ef59fd478c3
commit 708ea872601e56ef9ea8398c5a11938482bcbb4d
commit 9b27390139dbe0dc10d1899545248862fe826b61
commit 7cada4d0b7a0fb813dbc9777fec092e9ed0546e9
commit 3f6d5ba173da9a41c799146a3ad999da2158716a
commit 4c9613ce556fdeb671e779668b627ea3a2b61728
commit 4ca8ca9fe7dc792000c3762de5081a4d6dc33667
commit bda6b1c95751bbf1779562b8eca3c20b270c0f53
commit 7759ca3aac79648d01c9edcb3b00503c02bec2f5
commit a1ac5f0943019bfd76345fe05a42cbc400da685c
commit d817de3bc186c305b8e72a52547df2971c06499d
commit db7c8f1e5f1c1a5e1aaec04e50be6721c1cb4dff
commit 17dc7af70e89db773a7213f0b4270c69236a63ab
commit 92a6803149465e2339f8f7f8f6415d75be80073d
commit 792fab2c0d45758ad3d187bd252570d2bb627fa9
commit 0a1b60d76b0abcc2a0de4eb96d5dd379cd855f30
commit 2b82435cb90bed2c5f8398730d964dd11602217c
commit 4fe967912ee83048beb45a6b4f0f6774fddcfa0a
commit 399334708b4f07b107094e5db4a390f0f25d2d4f
commit 54ff01fd0d44b9681615f77c15fe9ea6dfadb501
commit b244ffa15c8b1aabdc117c0b6008086df7b668b7
commit b2b599fb54f90ae395ddc51f0d49e4f28244a8f8
commit b9b824a55876275f8506c1c187558ab22d879f73
commit c8ab5ac30ccc20a31672ab0f8938a6271dfe4122
commit 9174c1d6196d612799808009ec2796df021ab625
Comment 11 Lionel Landwerlin 2018-11-06 00:13:02 UTC
I had another unrecoverable crash with your patch, right after unplugging the laptop from the dock, but unfortunately no traces were written to the disk...
Comment 12 Lionel Landwerlin 2018-11-06 01:26:24 UTC
(In reply to Lionel Landwerlin from comment #11)
> I had another unrecoverable crash with your patch, right after unplugging
> the laptop from the dock, but unfortunately no traces were written to the
> disk...

Sorry, I meant without your patch on drm-tip : 

commit 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 (drm-tip-ssh/drm-tip)
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Nov 5 15:18:11 2018 +1000

    drm-tip: 2018y-11m-05d-05h-17m-28s UTC integration manifest
Comment 13 Stanislav Lisovskiy 2018-11-06 07:47:48 UTC
(In reply to Lionel Landwerlin from comment #12)
> (In reply to Lionel Landwerlin from comment #11)
> > I had another unrecoverable crash with your patch, right after unplugging
> > the laptop from the dock, but unfortunately no traces were written to the
> > disk...
> 
> Sorry, I meant without your patch on drm-tip : 
> 
> commit 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 (drm-tip-ssh/drm-tip)
> Author: Dave Airlie <airlied@redhat.com>
> Date:   Mon Nov 5 15:18:11 2018 +1000
> 
>     drm-tip: 2018y-11m-05d-05h-17m-28s UTC integration manifest

OK, but no issues with the patch? If so I need to proceed with the fix.
Comment 14 Stanislav Lisovskiy 2018-11-06 08:41:35 UTC
Created attachment 142385 [details] [review]
Assign intel_dp->is_mst only after mgr structure is properly initialized

Please check my new patch, which attempts to cure the origin of a problem, but not the symptom. Also please remove my first patch before that - otherwise it will hide the problem.
Comment 15 Lakshmi 2018-11-14 10:17:32 UTC
Lionel, did new patches helped the situation?
Comment 16 Ville Syrjala 2018-11-14 11:00:29 UTC
Presumably fixed by

commit 23d8003907d094f77cf959228e2248d6db819fa7
Author: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Date:   Fri Nov 9 11:00:12 2018 +0200

    drm/dp_mst: Check if primary mstb is null


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.