Bug 108600 - Regression: Segfault on video playback with XScale
Summary: Regression: Segfault on video playback with XScale
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/AMDgpu
Version: git
Hardware: Other All
Importance: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
Duplicates: 108459
Reported: 2018-10-30 13:54 UTC by Matthew Scheirer
Modified: 2018-11-16 15:54 UTC

Attachments
Xrandr output (2.09 KB, text/plain)
2018-10-30 13:54 UTC, Matthew Scheirer
Xorg Segfault Error (3.98 KB, text/plain)
2018-10-30 13:57 UTC, Matthew Scheirer
Xorg + drv with debug symbols (230.49 KB, text/plain)
2018-11-01 23:31 UTC, Matthew Scheirer
gdb of segfault (33.68 KB, text/plain)
2018-11-02 20:11 UTC, Matthew Scheirer
second gdb log from crash (40.58 KB, text/plain)
2018-11-07 20:11 UTC, Matthew Scheirer

Description Matthew Scheirer 2018-10-30 13:54:31 UTC
Created attachment 142275
Xrandr output
Comment 1 Matthew Scheirer 2018-10-30 13:57:08 UTC
Created attachment 142276
Xorg Segfault Error
Comment 2 Matthew Scheirer 2018-10-30 13:59:04 UTC
Ugh, wiped the summary box while submitting this, so here's the summary:

Found a regression in 18.1 from 18.0. I use xscale for multi-monitor DPI handling, and video playback under 18.1 with scaling enabled crashes the server regularly. It doesn't happen in 18.0, and it still happens on trunk. It doesn't happen without scaling enabled.
Comment 3 Matthew Scheirer 2018-10-30 14:02:17 UTC
Versions: Arch, Xorg-server 1.20.3, kernel 4.18.16. Like I said, no segfaults on 18.0; 18.1+ segfaults in every video player I can find (HW accelerated or not: it happens in Firefox, VLC, MPV). It doesn't seem to crash without playing video. I've had a desktop last several hours while actively avoiding any video playback.
Comment 4 Michel Dänzer 2018-10-30 17:45:48 UTC
Please make sure debugging symbols are available for /usr/lib/xorg/modules/drivers/amdgpu_drv.so and /usr/lib/Xorg, reproduce the crash again and attach the full corresponding Xorg log file.
Comment 5 Matthew Scheirer 2018-11-01 23:31:34 UTC
Created attachment 142332
Xorg + drv with debug symbols

Here's a full log with debug symbols as requested. It doesn't seem to want to give the name of whatever it's calling into AMDGPU for, though.
Comment 6 Michel Dänzer 2018-11-02 10:59:05 UTC
Thanks, but yeah I'm afraid that's still not very useful. Can you try, in order of preference:

1. Get a backtrace with gdb. See https://www.x.org/wiki/Development/Documentation/ServerDebugging/ for some detailed information about that.

2. Make sure xserver is compiled with --enable-libunwind, and attach another log file from that.

3. Provide the output of

   addr2line -e /usr/lib/xorg/modules/drivers/amdgpu_drv.so 0x7f727ae89000+0xc8e1
Comment 7 Matthew Scheirer 2018-11-02 20:11:13 UTC
Created attachment 142349
gdb of segfault

#1 Attached the gdb log. I got a SIGPIPE that seems unrelated and didn't crash the server, but I kept it in case it was relevant. The segfault starts at line 108. Would the <optimized out> stuff be useful if I rebuilt without optimizations?

#2 The meson build of X doesn't support libunwind and the PKGBUILD in the AUR for X git is completely broken (and seems non-trivial to fix). I could try updating the older 1.19 autotools based script if necessary.

#3 addr2line doesn't give any useful output on the debug trunk drv, just ??:0.
Comment 9 Michel Dänzer 2018-11-06 11:16:19 UTC
*** Bug 108459 has been marked as a duplicate of this bug. ***
Comment 10 Matthew Scheirer 2018-11-06 22:50:58 UTC
(In reply to Michel Dänzer from comment #8)
> Does
> https://gitlab.freedesktop.org/daenzer/xf86-video-amdgpu/commit/5d9dee908543c141641fe8b6178874f772179937
> help by any chance?

Built it last night, and after a good four hours so far I haven't been able to reproduce any crashes. Before, I could pretty reliably kill the server by rapidly opening and closing applications while moving them between screens.

Going to mark as resolved and fixed. If it crashes again I'll produce some more gdb logs and reopen. Thank you so much for the quick response!
Comment 11 Matthew Scheirer 2018-11-06 23:05:24 UTC
Karma is really a fierce one. It did crash, at the same offsets into amdgpu. It took a really long while this time, it seems! I'll generate new, unoptimized gdb logs against the latest git of the server and ddx. Reopened.

> [  9275.266] (EE) 3: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7fa221cb8000+0xc4fb) [0x7fa221cc44fb]
> [  9275.266] (EE) 4: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7fa221cb8000+0xc8f0) [0x7fa221cc48f0]
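For reference, each of these backtrace lines has the form `module (base+offset) [address]`, and the part addr2line needs is the offset into the module, not the absolute address. A minimal sketch of pulling that out with Python follows. The helper names `parse_frame` and `addr2line_command` are hypothetical, and it assumes the printed offset can be passed to addr2line as-is, which only yields file and line information when the driver was built with debug symbols.

```python
import re

# Matches Xorg backtrace frames of the form:
#   (EE) 3: /path/to/module.so (0xBASE+0xOFFSET) [0xADDRESS]
FRAME_RE = re.compile(
    r"\(EE\)\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<base>0x[0-9a-fA-F]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Return (module_path, offset) for a backtrace line, or None if no match."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    base = int(match.group("base"), 16)
    offset = int(match.group("offset"), 16)
    addr = int(match.group("addr"), 16)
    # Sanity check: the bracketed address should be base + offset.
    if base + offset != addr:
        raise ValueError("inconsistent frame: %s" % line.strip())
    return match.group("module"), offset


def addr2line_command(module, offset):
    """Build an addr2line argument list (-f also prints the function name)."""
    return ["addr2line", "-f", "-e", module, hex(offset)]


if __name__ == "__main__":
    line = ("> [  9275.266] (EE) 3: /usr/lib/xorg/modules/drivers/amdgpu_drv.so "
            "(0x7fa221cb8000+0xc4fb) [0x7fa221cc44fb]")
    module, offset = parse_frame(line)
    print(" ".join(addr2line_command(module, offset)))
```

Frames that the server already resolved to a symbol name (a `symbol+0x...` form instead of a numeric base) do not match this pattern, so `parse_frame` returns None for them.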
Comment 12 Michel Dänzer 2018-11-07 08:51:55 UTC
Thanks for testing, will need to look at the gdb backtrace with that patch applied.

FYI, bug reports should only be resolved once a fix lands on the main Git master branch.
Comment 13 Matthew Scheirer 2018-11-07 20:11:58 UTC
Created attachment 142401
second gdb log from crash

New gdb logs from the revised ddx.

The <optimized out> values are there because I didn't realize makepkg options override optimizations from envvars. Next run I'll set -Og in the build step.

I got another SIGPIPE first again, included for posterity, but this crash is still happening in amdgpu_drm_handle_event.
Comment 14 Michel Dänzer 2018-11-09 10:41:35 UTC
Can you try this branch:

https://gitlab.freedesktop.org/daenzer/xf86-video-amdgpu/commits/amdgpu_drm_queue_alloc-is_flip

If it still happens with that, in addition to a new gdb backtrace, can you try running Xorg in valgrind and attaching valgrind's output? Let me know if you run into trouble with that.
Comment 15 Matthew Scheirer 2018-11-12 22:31:01 UTC
I've been using the second patchset for 3 days now (2 of them in a release build) and worked from home over the weekend, so I put in a good ~20 hours of uptime. It seems to have worked, no segfaults.

If you still want a valgrind run or any more logs or info, I can run some more traces.

Otherwise I'll leave the bug open and you can close it when you merge the branch. Thank you so much for all the hard work.
Comment 16 Michel Dänzer 2018-11-16 15:54:37 UTC
Thanks for the report and testing, fixed in Git master:

commit 51ba6dddee40c3688d4c7b12eabeab516ed153b7
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Fri Nov 9 11:00:04 2018 +0100

    Move deferred vblank events to separate drm_vblank_deferred list

