Summary: | DisplayPort MST (multi-stream transport) "atomic sleep" Linux kernel bug |
---|---|
Product: | DRI |
Reporter: | Adam J. Richter <adam_richter2004> |
Component: | General |
Assignee: | Dave Airlie <airlied> |
Status: | RESOLVED FIXED |
QA Contact: | |
Severity: | normal |
Priority: | medium |
CC: | j, kijiki0 |
Version: | unspecified |
Hardware: | x86 (IA32) |
OS: | Linux (All) |
Whiteboard: | |
i915 platform: | |
i915 features: | |
Attachments: | |

Description
Adam J. Richter
2015-02-28 07:19:18 UTC
Dave has a fix!

Hi, Jesse. Thanks for your encouraging response. Can you tell me what I should do to obtain the possible fix to test? I assume that by "Dave" you mean Dave Airlie. When I pull the git tree at git://people.freedesktop.org/~airlied/linux , I do not see any *_mst_* files in linux/drivers/gpu/drm. Thanks for your response, and thanks in advance for any guidance on this.

Just to keep the information current, I'll mention that the problem is still present in Linux 4.0-rc3.

linux-4.0-rc4 has the possible partial fix that I suggested (in drm_dp_mst_topology.c, remove mutex_{,un}lock from check_txmsg_state). Unfortunately, as I originally mentioned was the case when I tried that patch, I also get a kernel memory fault from it when I plug in a DisplayPort multi-stream transport (MST) device, in this case at drm_dp_add_port+0x2dc. I expect I'll try to track this down further, although doing so in linux-4.0-rc4 is slightly more complex because rc4 seems to be crashing a couple of other X programs that seem to run fine with rc3, implying that there may now be another kernel-release graphics bug creating symptoms at the same time. I'll try to post an update if I have further news. Thanks for pushing my suggested change through though. I believe that change will be part of the complete fix.

I wrote that I would provide an update if I had news, so here goes. I think I have found the source of the next crash. In drm_dp_mst_topology.c, drm_dp_send_link_address() can call a hotplug handler that can change port->mstb, but the callers of this function assume the value has not changed, and sometimes get a null pointer dereference when attempting to set port->mstb->link_address_sent to true. So, I made a change that consolidates all the uses of link_address_sent inside drm_dp_send_link_address() to avoid this. Unfortunately, there is another crash after that.
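For readers following the drm_dp_send_link_address() discussion, here is a minimal userspace sketch of the pattern described in that comment. It is not the kernel code or the actual patch: `struct branch`, `struct port`, `hotplug_fn`, `hotplug_unplug`, `hotplug_noop` and `send_link_address` are hypothetical, simplified stand-ins. It only illustrates re-reading the pointer after the callback and setting the flag in one place instead of in each caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct branch {
    bool link_address_sent;
};

struct port {
    struct branch *mstb; /* may be cleared by a hotplug handler */
};

typedef void (*hotplug_fn)(struct port *port);

/* Simulates a hotplug event that removes the branch device. */
static void hotplug_unplug(struct port *port)
{
    port->mstb = NULL;
}

/* Simulates a hotplug event that leaves the topology untouched. */
static void hotplug_noop(struct port *port)
{
    (void)port;
}

/*
 * Assumed shape of the consolidated approach: the flag is set here,
 * after the callback has run, using a freshly read pointer.
 * Returns true if the flag was set, false if the branch is gone.
 */
static bool send_link_address(struct port *port, hotplug_fn hotplug)
{
    assert(port != NULL);

    if (port->mstb == NULL)
        return false;

    hotplug(port); /* may change port->mstb */

    struct branch *mstb = port->mstb; /* re-read after the callback */
    if (mstb == NULL)
        return false;

    mstb->link_address_sent = true;
    return true;
}
```

A caller that cached `port->mstb` before the call and dereferenced it afterwards would hit the NULL dereference in the `hotplug_unplug` case. Real kernel code would presumably also need locking or reference counting around the pointer, which this sketch leaves out.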
So, I think I'm probably stretching this bug report too much by trying to cover it, since I narrowly wrote the subject as just being about the atomic sleep symptom, which was addressed in 4.0-rc4. So, I think I should mark this bug as resolved, and then open a new bug with a broader, more functional problem description like "MST hotplug causes kernel memory fault" or something like that. I'll leave this bug report open for at least the next ~15 hours to see if anyone asks me to do otherwise. If I don't see any objections, I'll close this bug, open the new one, and put a comment in with a link to the new bug report.

My comment was based on an IRC discussion with Dave. I'll ping him again (yes, Dave Airlie of Red Hat, to whom this bug is assigned).

Thank you, Jesse.

Please file a new bug for the new bugs.

I'm not having much luck reproducing these. I'm not sure if maybe differing userspaces might have different access patterns. Can you give some more detailed info on the hardware you have? The only DP MST machine I have is a Haswell Lenovo t440s with dock, and a few Dell DP monitors.

I've seen what appears to be the same bug (well, same as the last incarnation): the kernel takes a #PF in drm_dp_check_and_send_link_address because the 2nd param (mstb) is NULL. It happens when I plug my T440p into a dock connected to 2x Dell 2001FPs. Each monitor is connected to the dock's DisplayPort ports via a DVI<->DisplayPort adapter, as the monitors only support DVI.

Oh, forgot to mention, I've seen it with Ubuntu's 3.19.0-18, as well as a mainline 4.1-rc2.

Interestingly, I removed one of the DP->DVI adapters and connected that monitor to the dock's DVI port, and it seems to work fine now. So to summarize:
2x DP->DVI = BAD
1x DP->DVI, 1x DVI = GOOD

I have a t440s laptop with a dock and a single monitor (monoprice) attached via a DisplayPort cable.
I am hitting the error mentioned in comment #5 very regularly, and finally got a panic that saved the backtrace:
https://bugs.archlinux.org/task/45369
https://bbs.archlinux.org/viewtopic.php?pid=1537752#p1537752

I can very easily reproduce this issue locally with the archlinux vanilla kernel. Please let me know if there is anything I can do to help debug this. If there is another bug opened for what Adam described, please let me know and I'll remove this comment and add it there.

Created attachment 116586
cancel work to avoid oops

This should fix this, by cancelling the work queue earlier.

Created attachment 116587
try again, this should handle things better

The last patch had lockdep warnings.

Created attachment 116591
dmesg from second patch

Hmmm. Well, it didn't seem to do too well. It survived for a bit longer, but when I disconnected from the dock, the laptop froze. It also seemed to really spam my dmesg. I've attached the kernel logs; unfortunately the kernel freeze (if it was that) was not captured in these logs.

Well, it was just meant to stop the oops you were seeing. I suspect the dock firmware is failing to deal with the MST monitor:

Jun 18 19:33:06 nevada kernel: [drm:drm_dp_mst_handle_down_rep] Got NAK reply: req 0x21, reason 0x08, nak data 0x10

Those are never good. Not sure what is causing all the reprobing in that log; that is just weird.

Thanks for looking into it! If there's anything else you'd like me to test please let me know. It's a standard Lenovo dock (the "ultra" dock I believe) at the latest firmware, with a monoprice monitor. Will look into what the NAKs mean.

With 4.1.6 I don't see these issues anymore on Arch with an Ultra Dock (glorified port expander essentially). Just reporting this here in case other people on this bug still see this in 4.1.6 or later so they can update with what they see. Thanks for all the help again Dave, you really made using my laptop bearable in the interim with your information.

I am sorry I failed to mention earlier that around 2015-07-28, I opened a separate freedesktop.org bug report for the problem I mentioned in comment #5, at https://bugs.freedesktop.org/show_bug.cgi?id=91481 , which includes an illustrative if probably incorrect patch, and to which I will add a note about why I think a patch like the one I provided there is probably still necessary in 4.1.6. I think that the discussion after comment #5 here is all about this other bug, so I am marking this ticket as resolved. However, if you think you are discussing a problem covered by this bug report and not the new one, please feel free to change this ticket's status from resolved to whatever status you believe is more appropriate.
Otherwise, I invite everyone to continue the discussion at the new bug report. Thank you all for your fixes and information.