Created attachment 117412 [details]
Path to make drm_dp_send_link_address set link_address_sent before hotplug handler changes the pointer to it
When I "hot plug" insert a DisplayPort multistream transport ("MST") hub into a computer that is running a Linux 4.2-rc4 kernel or any recent version, I get a kernel memory fault in drm_dp_add_port. In earlier kernels, I traced it to the line after drm_dp_add_port's call to drm_dp_send_link_address:
port->mstb->link_address_sent = true;
I believe that the problem is that drm_dp_send_link_address can call a hotplug handler, which can change port->mstb so that it cannot be dereferenced (I think it is set to NULL, but I have forgotten).
Since drm_dp_send_link_address is a static function in this file, its address is never taken, and there are only two call sites, both of which set link_address_sent to true, the patch that I have attached changes drm_dp_send_link_address so that it sets that flag, reducing duplication of code, and also allowing drm_dp_send_link_address to skip setting the flag if sending the link address fails.
An intentional behavior change introduced by this patch is that the mstb->link_address_sent is not set if sending the link address was aborted due to the memory allocation in drm_dp_send_link_address failing.
Perhaps it should also not be set if the action receives a NAK reply, but I haven't studied the code and DisplayPort documentation enough to guess. Whoever examines this patch should feel free to revise it as they see fit.
I should also warn that after I made this change in linux-4.2-rc4, the resulting kernel eventually got another kernel memory fault in i2c_transfer (called by drm_do_probe_ddc_edid), but I believe that that is a separate issue.
I am happy to answer questions about this patch. If nobody sees any problems with it, I would ask that whoever takes this bug report submit the patch upstream.
Thanks for considering this bug report and proposed patch.
Thank you, Dave Arlie, and anyone else involved, for the DisplayPort MultiStream Transport fixes in Linux 4.1.6 and 4.2 (and a few of the subreleases and release candidates leading up to these versions).
However, I think that some change along the lines of the patch that I mentioned may still be necessary. I cannot be sure, because when I try to boot linux-4.1.6, I get a other kernels bugs either before I get a chance to test this or while I am testing it. I am pretty sure I tried Linux-4.2 and observed the kernel getting fault that would go away when I would apply the attached patch.
It looks to me like, toward the end of drm_dp_add_port in Linux-4.2, I still see the following code:
port->mstb->link_address_sent = true;
...and drm_dp_send_link_address has this line, which I think can call a function that can change port->mstb:
I've missed this bug, but I've posted a patch that is like yours just not as large a change.
Hopefully it works for you.
Thank you for posting your patch to the dri-devel mailing list (at http://article.gmane.org/gmane.comp.video.dri.devel/136173 ).
Your patch is probably fine in practice, although I am not sure it deals correctly with the the ways of drm_dp_send_link_address can fail. Part of the reason I made my patch the way I did was that I guessed that if the kzalloc at the beginning of drm_dp_send_link_address failed (which probably will never happen), that it might be better not to set link_address_sent to true. On the other hand, I was not sure what to do with the "link address failed" and "link address nak received" cases in that function. Perhaps link_address_sent should not be set in those cases either.
Anyhow, pushing your patch upstream would certainly be an improvement. I'm all for it if you think you patch is actually correct in these cases too, or if you want to push your patch upstream and think about corner cases later.
If you do have any thoughts on these corner cases, your comments would be most welcome of course.
Thanks again for addressing this bug.