Created attachment 142275
Created attachment 142276
Xorg Segfault Error
Ugh, wiped the summary box while submitting this, so here's the summary:
Found a regression in 18.1 relative to 18.0. I use xscale for multi-monitor DPI handling, and video playback under 18.1 with scaling enabled regularly crashes the server. It doesn't happen in 18.0, and it still happens on trunk. It doesn't happen without scaling enabled.
Versions: Arch, xorg-server 1.20.3, kernel 4.18.16. Like I said, there are no segfaults on 18.0, but 18.1+ segfaults in every video player I can find, HW accelerated or not (Firefox, VLC, MPV). It doesn't seem to crash without video playback: I've had a desktop last several hours while actively avoiding any video.
Please make sure debugging symbols are available for /usr/lib/xorg/modules/drivers/amdgpu_drv.so and /usr/lib/Xorg, reproduce the crash again and attach the full corresponding Xorg log file.
Created attachment 142332
Xorg + drv with debug symbols
Here's a full log with debug symbols as requested. It still doesn't seem to give the name of whatever it's calling into in amdgpu, though.
Thanks, but yeah I'm afraid that's still not very useful. Can you try, in order of preference:
1. Get a backtrace with gdb. See https://www.x.org/wiki/Development/Documentation/ServerDebugging/ for some detailed information about that.
2. Make sure xserver is compiled with --enable-libunwind, and attach another log file from that.
3. Provide the output of
addr2line -e /usr/lib/xorg/modules/drivers/amdgpu_drv.so 0xc8e1
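addr2line resolves addresses relative to the object file, so for a shared object such as amdgpu_drv.so it needs the module-relative offset (the part after the + in a log frame), not the runtime address in square brackets. Below is a minimal sketch, assuming the backtrace frame format shown in this thread, that pulls those fields out of a log line and prints the matching command. The struct and function names are hypothetical helpers, not part of the X server, and the result is only meaningful against the exact amdgpu_drv.so build that produced the log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One frame of the backtrace the X server writes to its log, e.g.
 *   [ 9275.266] (EE) 3: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7fa221cb8000+0xc4fb) [0x7fa221cc44fb]
 * (hypothetical helper type, for illustration only) */
struct xorg_frame {
    int index;                  /* frame number after "(EE)" */
    char module[256];           /* path of the shared object */
    unsigned long long base;    /* runtime load address of the module */
    unsigned long long offset;  /* address relative to the module's load address */
};

/* Returns 0 on success, -1 if the line is not a "(base+offset)" module frame
 * (frames that show "(symbol+0x...)" instead are rejected). */
static int parse_xorg_frame(const char *line, struct xorg_frame *f)
{
    const char *p;

    assert(line != NULL && f != NULL);
    p = strstr(line, "(EE) ");
    if (p == NULL)
        return -1;
    /* %255s stops at whitespace, so module paths containing spaces are not handled. */
    if (sscanf(p, "(EE) %d: %255s (0x%llx+0x%llx)",
               &f->index, f->module, &f->base, &f->offset) != 4)
        return -1;
    return 0;
}

/* Print the addr2line invocation for a frame: pass the module-relative
 * offset, since the runtime address only exists inside the crashed process. */
static void print_addr2line_cmd(const struct xorg_frame *f, FILE *out)
{
    fprintf(out, "addr2line -f -e %s 0x%llx\n", f->module, f->offset);
}
```

For the frame 3 line quoted later in this thread, print_addr2line_cmd would print `addr2line -f -e /usr/lib/xorg/modules/drivers/amdgpu_drv.so 0xc4fb`.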
Created attachment 142349
gdb of segfault
#1 Attached the gdb backtrace. I got a SIGPIPE that seems unrelated and didn't crash the server, but I kept it in case it's relevant. The segfault starts at line 108. Would it help if I rebuilt without optimizations to get rid of the <optimized out> values?
#2 The meson build of X doesn't support libunwind, and the AUR PKGBUILD for X git is completely broken (and seems non-trivial to fix). I could try updating the older autotools-based 1.19 script if necessary.
#3 addr2line doesn't give any useful output on the debug trunk drv, just ??:0.
Does https://gitlab.freedesktop.org/daenzer/xf86-video-amdgpu/commit/5d9dee908543c141641fe8b6178874f772179937 help by any chance?
*** Bug 108459 has been marked as a duplicate of this bug. ***
(In reply to Michel Dänzer from comment #8)
> 5d9dee908543c141641fe8b6178874f772179937 help by any chance?
Built it last night, and after a good four hours so far I haven't been able to reproduce any crashes. Before, I could pretty reliably kill the server by rapidly opening and closing applications while moving them between screens.
Going to mark as resolved and fixed. If it crashes again I'll produce some more gdb logs and reopen. Thank you so much for the quick response!
Karma is really a fierce one. It did crash, with the same frames in amdgpu; it just took a really long while this time. I'll generate new, unoptimized gdb logs against the latest Git versions of the server and DDX. Reopened.
> [ 9275.266] (EE) 3: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7fa221cb8000+0xc4fb) [0x7fa221cc44fb]
> [ 9275.266] (EE) 4: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7fa221cb8000+0xc8f0) [0x7fa221cc48f0]
Thanks for testing, will need to look at the gdb backtrace with that patch applied.
FYI, bug reports should only be resolved once a fix lands on the main Git master branch.
Created attachment 142401
second gdb log from crash
New gdb logs from the revised DDX.
The <optimized out> values are still there because I didn't realize makepkg's options override optimization flags set in environment variables. Next run, I'll set -Og in the build step.
I got another SIGPIPE first again, included for posterity, but the crash is still happening in amdgpu_drm_handle_event.
Can you try this branch:
If it still happens with that, in addition to a new gdb backtrace, can you try running Xorg in valgrind and attaching valgrind's output? Let me know if you run into trouble with that.
Been using the second patchset for 3 days now (2 in a release build), and I worked from home over the weekend, so I put in a good ~20 hours of uptime. It seems to have worked: no segfaults.
If you still want a valgrind run or any more logs or info, I can run some more traces to get you more data on this.
Otherwise I'll leave the bug open and you can close it when you merge the branch. Thank you so much for all the hard work.
Thanks for the report and testing, fixed in Git master:
Author: Michel Dänzer <email@example.com>
Date: Fri Nov 9 11:00:04 2018 +0100
Move deferred vblank events to separate drm_vblank_deferred list
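The commit title says only that deferred vblank events now live on their own drm_vblank_deferred list. Below is a minimal sketch of why that kind of split can help, under loudly stated assumptions: every name in it (struct list, vblank_event, flip_pending, process_signalled, release_deferred) is hypothetical, and deferring an event while a page flip is pending is an illustrative assumption, not a description of the driver's actual code. The idea shown is that once deferred events no longer sit in the signalled list, that list can be drained by repeatedly taking its head, with no iterator caching a "next" pointer across a callback.

```c
#include <assert.h>

/* Hypothetical circular doubly-linked list in the style of the X server's
 * struct xorg_list; these helpers are illustrative, not the real API. */
struct list {
    struct list *prev, *next;
};

static void list_init(struct list *l) { l->prev = l->next = l; }
static int list_is_empty(const struct list *l) { return l->next == l; }

static void list_append(struct list *e, struct list *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

static void list_unlink(struct list *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

/* Hypothetical vblank event; `link` must stay the first member so a
 * struct list pointer can be converted back to the event. */
struct vblank_event {
    struct list link;
    int flip_pending;  /* assumption: an event is deferred while a flip is pending */
    int handled;
};

static struct list signalled;  /* events ready to be delivered */
static struct list deferred;   /* events parked until the flip completes */

static void queues_init(void)
{
    list_init(&signalled);
    list_init(&deferred);
}

static void deliver(struct vblank_event *ev)
{
    assert(!ev->handled);  /* each event is delivered exactly once */
    ev->handled = 1;
}

/* Drain the signalled list by always taking its current head instead of
 * caching a "next" pointer, so a handler that modifies the list cannot
 * leave this loop holding a stale node. Deferred events are moved to
 * their own list rather than being skipped in place. */
static void process_signalled(void)
{
    while (!list_is_empty(&signalled)) {
        struct vblank_event *ev = (struct vblank_event *)signalled.next;

        list_unlink(&ev->link);
        if (ev->flip_pending)
            list_append(&ev->link, &deferred);
        else
            deliver(ev);
    }
}

/* Once the flip has completed, move deferred events back and deliver them. */
static void release_deferred(void)
{
    while (!list_is_empty(&deferred)) {
        struct vblank_event *ev = (struct vblank_event *)deferred.next;

        list_unlink(&ev->link);
        ev->flip_pending = 0;
        list_append(&ev->link, &signalled);
    }
    process_signalled();
}
```

Keeping deferred entries off the signalled list means the processing loop never has to step over entries it must not touch, which is what makes the simple "pop the head until empty" loop possible.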