Bug 98563 - Xorg segfaults with displaylink attached and mesa version >= 13.0
Summary: Xorg segfaults with displaylink attached and mesa version >= 13.0
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: 13.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: mesa-dev
QA Contact: mesa-dev
 
Reported: 2016-11-03 02:04 UTC by Trevor Bramble
Modified: 2016-12-07 17:02 UTC

Attachments
Photo of kernel panic when plugging in UGA-4KDP with libdrm 677cd97dc4a930af508388713f5016baf664ed18 (93.15 KB, image/jpeg)
2016-11-05 13:14 UTC, Andrew Poelstra
Xorg log of crash (34.49 KB, text/plain)
2016-12-07 15:57 UTC, David Rosenstrauch

Description Trevor Bramble 2016-11-03 02:04:47 UTC
Hello,

After upgrading mesa to 13.0, Xorg will crash when a DisplayLink adapter is connected.

Please see this thread on the Arch Linux forum for more details. https://bbs.archlinux.org/viewtopic.php?pid=1666430

Downgrading to 12.0.3, changing nothing else, resolves the issue.

Thanks,
Trevor
Comment 1 Mark Janes 2016-11-04 17:24:01 UTC
Can you indicate a specific device that we can obtain to reproduce this?

A bisection of mesa will help speed up resolution of this issue.
Comment 2 Trevor Bramble 2016-11-04 17:35:29 UTC
Mark,

I'm using a Diamond BVU5500H. http://www.diamondmm.com/bvu5500-video-graphics-adapter.html

I'll ask others in that Arch thread to chime in with theirs if it will help. I'm sure there are a variety.

Not sure what you mean by bisection of mesa. As in `git bisect` across changes between the known good and bad releases? If that's accurate, I have no idea how to go about accessing the source or building it locally. (I'm just another user, not the maintainer of Arch's mesa packages.)
Comment 3 Emil Velikov 2016-11-04 17:45:56 UTC
The following libdrm commit should fix it. Please apply it locally and let me know if it works.

commit 677cd97dc4a930af508388713f5016baf664ed18
Author: Rob Herring <robh@kernel.org>
Date:   Fri Oct 21 10:07:59 2016 -0700

    Return an -ENODEV from drmGetDevice() when no device was found.

https://cgit.freedesktop.org/mesa/drm/commit/?id=677cd97dc4a930af508388713f5016baf664ed18

Btw you can fetch the patch alone or just rebuild libdrm/master.

There is another pending fix in the drmDevice area, so as soon as we get a confirmation [from a few devs], I'll see that we get another libdrm release.
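
To illustrate why the return code matters here, below is a minimal sketch of the negative-errno convention the commit title describes. It does not use libdrm: `fake_device`, `fake_get_device()` and `bus_or_unknown()` are hypothetical stand-ins made up for this example, and the real drmGetDevice() takes different arguments. The point is only that a lookup which finds nothing should report -ENODEV, and that the caller should check that value before dereferencing the result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device record; NOT a libdrm type. */
struct fake_device {
    const char *bus;   /* e.g. "pci" or "usb" */
    const char *node;  /* e.g. "/dev/dri/card0" */
};

/*
 * Hypothetical lookup by node path. Returns 0 and sets *out on success.
 * When nothing matches, returns -ENODEV and leaves *out as NULL, so
 * "not found" can never be mistaken for success.
 */
int fake_get_device(const struct fake_device *table, size_t n,
                    const char *node, const struct fake_device **out)
{
    assert(out != NULL);

    *out = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].node, node) == 0) {
            *out = &table[i];
            return 0;
        }
    }
    return -ENODEV;
}

/* Caller side: check the return code before touching the result. */
const char *bus_or_unknown(const struct fake_device *table, size_t n,
                           const char *node)
{
    const struct fake_device *dev;
    int ret = fake_get_device(table, n, node, &dev);

    if (ret < 0)
        return "unknown";  /* no device: fall back instead of dereferencing */
    return dev->bus;
}
```

With a one-entry table, `bus_or_unknown()` returns the bus string for the known node and falls back to "unknown" for any other path, rather than reading through a pointer that was never filled in.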
Comment 4 Andrew Poelstra 2016-11-05 13:07:08 UTC
(In reply to Emil Velikov from comment #3)
> The following libdrm commit should fix it. Please apply it locally and let
> me know if it works.
> 
> commit 677cd97dc4a930af508388713f5016baf664ed18
> Author: Rob Herring <robh@kernel.org>
> Date:   Fri Oct 21 10:07:59 2016 -0700
> 
>     Return an -ENODEV from drmGetDevice() when no device was found.
> 
> https://cgit.freedesktop.org/mesa/drm/commit/
> ?id=677cd97dc4a930af508388713f5016baf664ed18
> 
> Btw you can fetch the patch alone or just rebuild libdrm/master.
> 
> There is another pending fix in the drmDevice area, so as soon as we get a
> confirmation [from a few devs], I'll see that we get another libdrm release.

I also see this bug; my hardware is a UGA-4KDP DisplayLink HDMI device branded "pluggable".

With Rob's patch, if X is running and I plug in the device (or if I plug it in then start X), my kernel panics with "PAX: overwritten function pointer or return address detected: 0000 [01] PREEMPT SMP".

(Actually this might not be the panic -- several lines down grsec says "Halting the system due to suspicious kernel crash caused by root" so possibly with a non-grsec kernel it would keep running. I'd prefer not to try :))
Comment 5 Andrew Poelstra 2016-11-05 13:14:27 UTC
Created attachment 127790 [details]
Photo of kernel panic when plugging in UGA-4KDP with libdrm 677cd97dc4a930af508388713f5016baf664ed18
Comment 6 Emil Velikov 2016-11-07 15:03:47 UTC
Andrew, the affected/new codepaths shouldn't do anything that causes such behaviour. Thus I'm inclined to think that this is a separate bug.

Can you please bisect to find the offending mesa commit and, in parallel (or first), try mesa built without libdrm/HW drivers*. The latter will rule out the libdrm/loader rework that is affected here.

Please keep all the information in a separate bug and add me in the cc-list.
Thanks

* Check that libdrm isn't installed/accessible.
* Use ./configure --with-dri-drivers=swrast --with-gallium-drivers=swrast ...
Comment 7 Andrew Poelstra 2016-11-09 15:36:30 UTC
With

./configure --with-dri-drivers=swrast --with-gallium-drivers=swrast

I find that I cannot start Xorg -- I get "cannot find libdrm.so.2". I don't fully understand how Arch has packaged the pieces of X, what packages I need to modify/rebuild to make this work (and how to do it).

For now I'm going to ignore it and see if I can bisect the kernel panic (cherry-picking Rob's patch onto each bisect step so I don't hit the glamoregl bug).
Comment 8 Andrew Poelstra 2016-11-09 16:49:58 UTC
Ok, with a grsec kernel the panic occurs as far back as libdrm 2.4.60, so I suspect that drm is not to blame, and that this is some unrelated grsec interaction that may have been around forever.

With a non-grsec kernel Rob's patch appears to work and restores full DisplayLink functionality!
Comment 9 Emil Velikov 2016-11-09 17:08:55 UTC
(In reply to Andrew Poelstra from comment #7)
> With
> 
> ./configure --with-dri-drivers=swrast --with-gallium-drivers=swrast
> 
> I find that I cannot start Xorg -- I get "cannot find libdrm.so.2". I don't
> fully understand how Arch has packaged the pieces of X, what packages I need
> to modify/rebuild to make this work (and how to do it).
> 
I might have been unclear there - libdrm should be missing only for the mesa build/install stage. The easiest option is to:

$ mv /usr/lib/pkgconfig/libdrm{,-foo}.pc
$ # rebuild and install mesa here
$ mv /usr/lib/pkgconfig/libdrm{-foo,}.pc

AFAICT for Xorg libdrm is a must.
Comment 10 Emil Velikov 2016-11-09 17:18:10 UTC
Thanks for the confirmation !

On the grsec [related] topic - let's keep that as a separate bug. If you want to simulate the same/similar behaviour w/o running Xorg [as root], you can try the drmdevice binary from libdrm.

Marking this as resolved.
Comment 11 David Rosenstrauch 2016-12-07 02:05:59 UTC
Can you please clarify?  If this is resolved, then what new package release provides the fixed behavior?  I'm still seeing this issue today (also on Arch Linux) and had to downgrade back to Mesa 12 to work around.
Comment 12 Mark Janes 2016-12-07 02:13:13 UTC
David:  please reopen this bug if it is not fixed by libdrm 2.4.74
Comment 13 David Rosenstrauch 2016-12-07 14:42:33 UTC
2.4.74 is what I'm running, so I'm going to re-open.
Comment 14 Emil Velikov 2016-12-07 15:23:11 UTC
(In reply to David Rosenstrauch from comment #11)
> Can you please clarify?  If this is resolved, then what new package release
> provides the fixed behavior?  I'm still seeing this issue today (also on
> Arch Linux) and had to downgrade back to Mesa 12 to work around.

David, the original report/issue (segfaults when using DisplayLink, and non-PCI devices in general) should be fixed with commit 677cd97dc4a (libdrm 2.4.72).

Please be specific when you say "seeing this issue". Do attach Xorg.log and/or relevant section of dmesg/journalctl.
Comment 15 David Rosenstrauch 2016-12-07 15:57:57 UTC
Created attachment 128369 [details]
Xorg log of crash
Comment 16 Chris Wilson 2016-12-07 16:01:41 UTC
(In reply to David Rosenstrauch from comment #15)
> Created attachment 128369 [details]
> Xorg log of crash

That was my fault, I didn't check opendir() for failure.
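
For anyone following along, here is a minimal sketch of the kind of check that was missing. It is not the xf86-video-intel code: `count_dir_entries()` is a hypothetical helper written only for this example. The backtrace in comment 17 faults inside readdir() at address 0x4, which would be consistent with readdir() being handed the NULL that opendir() returns on failure.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the entries in a directory (including
 * "." and ".."). Returns -1 if the directory cannot be opened.
 */
int count_dir_entries(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;  /* opendir() failed; errno holds the reason */

    int count = 0;
    while (readdir(dir) != NULL)
        count++;

    closedir(dir);
    return count;
}
```

Without the NULL check, a missing or unreadable directory would pass a NULL DIR pointer straight into readdir(), which is undefined behaviour and typically a segfault.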
Comment 17 David Rosenstrauch 2016-12-07 16:03:39 UTC
(In reply to Emil Velikov from comment #14)
> David the original report/issue (segfaults when using DisplayLink (non-PCI
> devices in general) should be fixed with commit 677cd97dc4a (libdrm 2.4.72).
> 
> Please be specific when you say "seeing this issue". Do attach Xorg.log
> and/or relevant section of dmesg/journalctl.


My apologies - this may not be the exact same bug.  However, since I was seeing the same stack trace from Xorg (see below) I assumed it was.

The issue I'm having is that when I connect my laptop to my docking station with an external monitor, I receive the same Xorg crash as the other person.  (Intel driver crashes with the below stack trace.)

I actually don't know if the monitor or the docking station is displaylink or not.  (Monitor is Samsung u28e590d; dock is Dell e-port replicator.)

However, downgrading from Mesa 13 to 12 does fix the issue.  And the upgrade to libdrm 2.4.72 didn't fix it.

---

Stack trace:

[  6496.994] (II) intel(0): Enabled output DP1-1
[  6497.010] (II) intel(0): Enabled output DP1-2
[  6497.011] (II) intel(0): Enabled output DP1-3
[  6497.058] (--) intel(0): HDMI max TMDS frequency 300000KHz
[  6497.061] (EE) 
[  6497.061] (EE) Backtrace:
[  6497.061] (EE) 0: /usr/lib/xorg-server/Xorg (OsLookupColor+0x139) [0x59cd49]
[  6497.065] (EE) 1: /usr/lib/libc.so.6 (__restore_rt+0x0) [0x7f2d13c670af]
[  6497.066] (EE) 2: /usr/lib/libc.so.6 (readdir+0x29) [0x7f2d13ce77e9]
[  6497.067] (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x63998) [0x7f2d0f2dd728]
[  6497.068] (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x662b3) [0x7f2d0f2e3103]
[  6497.069] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x6d985) [0x7f2d0f2f1fb5]
[  6497.069] (EE) 6: /usr/lib/xorg-server/Xorg (xf86Wakeup+0x197) [0x479697]
[  6497.069] (EE) 7: /usr/lib/xorg-server/Xorg (WakeupHandler+0x6d) [0x43b2dd]
[  6497.070] (EE) 8: /usr/lib/xorg-server/Xorg (WaitForSomething+0x1e9) [0x5954f9]
[  6497.070] (EE) 9: /usr/lib/xorg-server/Xorg (SendErrorToClient+0x10e) [0x4365ee]
[  6497.070] (EE) 10: /usr/lib/xorg-server/Xorg (remove_fs_handlers+0x463) [0x43a7f3]
[  6497.071] (EE) 11: /usr/lib/libc.so.6 (__libc_start_main+0xf1) [0x7f2d13c54291]
[  6497.072] (EE) 12: /usr/lib/xorg-server/Xorg (_start+0x29) [0x4246e9]
[  6497.073] (EE) 13: ? (?+0x29) [0x29]
[  6497.073] (EE) 
[  6497.073] (EE) Segmentation fault at address 0x4
[  6497.073] (EE) 
Fatal server error:
[  6497.073] (EE) Caught signal 11 (Segmentation fault). Server aborting
Comment 18 Emil Velikov 2016-12-07 16:12:08 UTC
(In reply to David Rosenstrauch from comment #17)
> (In reply to Emil Velikov from comment #14)
> > David the original report/issue (segfaults when using DisplayLink (non-PCI
> > devices in general) should be fixed with commit 677cd97dc4a (libdrm 2.4.72).
> > 
> > Please be specific when you say "seeing this issue". Do attach Xorg.log
> > and/or relevant section of dmesg/journalctl.
> 
> 
> My apologies - this may not be the exact same bug.  However, since I was
> seeing the same stack trace from Xorg (see below) I assumed it was.
> 
Stack trace is completely different from the original one [on the Arch forum] :-)

In the future, please track down which component (updated package) breaks things and work from there. As you can see, a simple log can give you the answer in a few minutes, as opposed to "I'm having the same issue" :-P

Fwiw the opendir handling in the xf86-video-intel package was fixed with commit a1b39eb6dd1.

If there is an issue with the above commit, please open a separate bug report.

Thanks
Comment 19 David Rosenstrauch 2016-12-07 16:50:28 UTC
Well my stack trace is actually almost identical to one someone posted later in that Arch forum thread (see https://bbs.archlinux.org/viewtopic.php?pid=1672878#p1672878) which is how I found that thread - and this bug report - in the first place.

Still, point taken - I probably am responding about a different bug, and my apologies for that.
Comment 20 David Rosenstrauch 2016-12-07 17:00:09 UTC
BTW, is there a separate bug report for that opendir issue that I can track to know when a fix gets released?
Comment 21 Mark Janes 2016-12-07 17:02:19 UTC
David, thank you for your input on this bug.  Even though the root cause of your crash was different, I would have come to the same initial conclusion as you based on the different behavior of mesa 12/13.

I agree that users will likely be confused by the lack of a bug filed against xf86-video-intel, even though it is already fixed upstream.

