Bug 91491 - DRI_PRIME crash with fullscreen programs.
Summary: DRI_PRIME crash with fullscreen programs.
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
 
Reported: 2015-07-28 15:05 UTC by Kristian Klausen
Modified: 2019-11-19 07:51 UTC

Attachments
Xorg.0.log (68.85 KB, text/plain)
2015-07-28 15:05 UTC, Kristian Klausen
dmesg (243.87 KB, text/plain)
2015-07-28 15:06 UTC, Kristian Klausen
gdb (9.79 KB, text/plain)
2015-07-28 15:43 UTC, Kristian Klausen
GDB with 7.5.0.r88.g5510cd6-1 debug symbols. (10.15 KB, text/plain)
2015-07-28 16:44 UTC, Kristian Klausen

Description Kristian Klausen 2015-07-28 15:05:53 UTC
Created attachment 117420 [details]
Xorg.0.log

Hello

I have an HP Pavilion dv6-6145eo with:
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] BeaverCreek [Radeon HD 6520G]
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6630M/6650M/6750M/7670M/7690M] (rev ff)
The last one being a 6750M.

Every time I try to run a fullscreen program with DRI_PRIME, X crashes. Windowed programs run without problems.

[  1470.130] (EE) 0: /usr/lib/xorg-server/Xorg (OsLookupColor+0x139) [0x596d09]
[  1470.158] (EE) 1: /usr/lib/libc.so.6 (__restore_rt+0x0) [0x7f2851e955af]
[  1470.159] (EE) 2: /usr/lib/xorg/modules/drivers/radeon_drv.so (_init+0x3c722) [0x7f284c9ba7a2]
[  1470.160] (EE) 3: /usr/lib/xorg-server/Xorg (BlockHandler+0x4a) [0x43d84a]
[  1470.160] (EE) 4: /usr/lib/xorg-server/Xorg (WaitForSomething+0x163) [0x58f4c3]
[  1470.161] (EE) 5: /usr/lib/xorg-server/Xorg (SendErrorToClient+0x111) [0x438c61]
[  1470.161] (EE) 6: /usr/lib/xorg-server/Xorg (remove_fs_handlers+0x41b) [0x43cf4b]
[  1470.163] (EE) 7: /usr/lib/libc.so.6 (__libc_start_main+0xf0) [0x7f2851e82790]
[  1470.163] (EE) 8: /usr/lib/xorg-server/Xorg (_start+0x29) [0x427319]
[  1470.165] (EE) 9: ? (?+0x29) [0x29]
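
A minimal sketch for reading a backtrace like the one above, assuming the Xorg.0.log frame format shown here (index, module path, `symbol+offset`, address). The names `Frame`, `parse_backtrace` and `driver_frames` are hypothetical helpers, not part of any Xorg tool. Frame 2 is reported as `_init+0x3c722` in radeon_drv.so, which most likely just means the nearest exported symbol was used because the driver had no debug symbols.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
# [  1470.159] (EE) 2: /path/radeon_drv.so (_init+0x3c722) [0x7f284c9ba7a2]
FRAME_RE = re.compile(
    r"\(EE\)\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<symbol>[^+\s]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


class Frame(NamedTuple):
    index: int
    module: str
    symbol: str
    offset: int
    address: int


def parse_backtrace(log_text: str) -> List[Frame]:
    """Extract backtrace frames from Xorg log text; other lines are ignored."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                int(m["index"]), m["module"], m["symbol"],
                int(m["offset"], 16), int(m["address"], 16),
            ))
    return frames


def driver_frames(frames: List[Frame]) -> List[Frame]:
    """Keep only frames that live in a driver module (file name ends in _drv.so)."""
    return [f for f in frames if f.module.endswith("_drv.so")]


# Two frames copied from the report above.
SAMPLE = """\
[  1470.130] (EE) 0: /usr/lib/xorg-server/Xorg (OsLookupColor+0x139) [0x596d09]
[  1470.159] (EE) 2: /usr/lib/xorg/modules/drivers/radeon_drv.so (_init+0x3c722) [0x7f284c9ba7a2]
"""

if __name__ == "__main__":
    for frame in driver_frames(parse_backtrace(SAMPLE)):
        print(frame.index, frame.module, frame.symbol, hex(frame.offset))
```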

Tested with Counter-Strike: Source, Minecraft and Rust.
The dmesg and Xorg logs are from a Minecraft test.
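
For anyone trying to reproduce this, a test run of this kind can be sketched as follows. This is only an illustration: `prime_env` and `run_on_secondary_gpu` are hypothetical helper names, and the value `1` for `DRI_PRIME` is an assumption about which GPU is the offload target on a given machine.

```python
import os
import subprocess
from typing import Dict, List, Optional


def prime_env(base: Optional[Dict[str, str]] = None, gpu_index: int = 1) -> Dict[str, str]:
    """Return a copy of the environment with DRI_PRIME set to gpu_index."""
    env = dict(os.environ if base is None else base)
    env["DRI_PRIME"] = str(gpu_index)
    return env


def run_on_secondary_gpu(argv: List[str]) -> int:
    """Run argv with DRI_PRIME=1 and return its exit code."""
    return subprocess.run(argv, env=prime_env()).returncode
```

Calling `run_on_secondary_gpu` with the command line of a fullscreen game and then with a windowed program would mirror the two cases described in this report.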


-Kristian
Comment 1 Kristian Klausen 2015-07-28 15:06:38 UTC
Created attachment 117421 [details]
dmesg

Note: I ran a couple of DRI_PRIME tests, so some of the dmesg output is from the other tests.
Comment 2 Alex Deucher 2015-07-28 15:12:34 UTC
Can you install the debugging symbols and get a proper backtrace with gdb?

http://www.x.org/wiki/Development/Documentation/ServerDebugging/
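
As a rough sketch of what that involves (the wiki page above is the authoritative guide): attach gdb to the running server from an SSH session or another machine, let it continue, and print a full backtrace when it stops on the crash. The helper name `gdb_backtrace_command` is hypothetical, and the exact set of gdb commands is an assumption that may need adjusting per setup.

```python
import shlex
from typing import List


def gdb_backtrace_command(xorg_pid: int) -> List[str]:
    """Build a gdb command line that attaches to Xorg and dumps a backtrace.

    -batch makes gdb exit after running the -ex commands in order:
    ignore SIGPIPE stops, resume the server, then print a full
    backtrace once the server stops again (for example on SIGSEGV).
    """
    return [
        "gdb", "-batch",
        "-p", str(xorg_pid),
        "-ex", "handle SIGPIPE nostop",
        "-ex", "continue",
        "-ex", "bt full",
    ]


if __name__ == "__main__":
    # 1234 is a placeholder PID; use the real PID of the Xorg process.
    print(" ".join(shlex.quote(arg) for arg in gdb_backtrace_command(1234)))
```

The command is only built and printed here, not executed, because the display freezes while gdb holds the server and the session should be driven from outside X.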
Comment 3 Kristian Klausen 2015-07-28 15:43:17 UTC
Created attachment 117422 [details]
gdb
Comment 4 Kristian Klausen 2015-07-28 16:44:52 UTC
Created attachment 117423 [details]
GDB with 7.5.0.r88.g5510cd6-1 debug symbols.
Comment 5 Greg Turner 2016-01-20 01:05:58 UTC
FWIW I have an r600+si dual-radeon system that until recently has been trucking along like a champ (after I fixed the output name duplication bug, for which I have filed a bug and a patch here).  Lately I get these lockups all the damn time.  I see stack traces somewhat like yours, OP.

Best way I have to trigger them, although they are infuriatingly intermittent and unpredictable, is to play wine video games -- I didn't realize the fullscreen component but I usually have something fullscreen on one screen or another so that may be the same here.  FWIW I run the bleeding-edge KF5 compositor.

I am reluctant to leave my system in the state these lockups leave me and debug them because I'm scared about shit burning up on me (although sometimes it happens when I'm not around and sits there for hours in this state I admit).
 
I haven't entirely ruled out bad hardware on either side of the PCIe bus, but I have more-or-less ruled out thermal crisis as the root cause.
Comment 6 Michel Dänzer 2016-01-20 03:33:35 UTC
Looks like it might be memory corruption. Can one of you try running Xorg in valgrind and see if that gives more clues?
Comment 7 Kristian Klausen 2016-01-20 09:46:22 UTC
I have switched to DRI3 where the bug is gone. If Greg Turner can't test with valgrind, maybe I can find time for it.
Comment 8 Greg Turner 2016-01-24 01:11:43 UTC
After rebuilding lots of stuff I haven't seen these crashes for a conspicuous period of time.  Could have just gotten lucky so far, but I'm beginning to wonder if this is a subtle reverse build-time dependency or something like that.
Comment 9 Greg Turner 2016-01-24 01:22:19 UTC
Next time it happens to me, if ever, I'll immediately cook up some backtraces and try the valgrind thing.

I like the memory corruption hypothesis, that's consistent with the sort of intermittent and seemingly inexplicable behavior I was having.
Comment 10 Greg Turner 2016-01-27 13:03:04 UTC
Ugh, well behavior keeps changing on me.

Lately I get DOS'ed on all interfaces most of the time which isn't helpful.

Time I looked up how to access the simulated scroll-lock on my fancy-pants tiny keyboard -- or better yet printed it out and taped it to my wall :)

Currently trying not to crash as I'm rebuilding Everything (capital e!) just to rule out any subtle toolchain ABI shenanigans, as I did recently upgrade gcc (51->53).

Unfortunately, I tried DRI3 and it was way, way worse, suggesting Kristian and I don't have the same problem (except maybe "memory corruption").

Also, looked into Valgrind.  It's going to be an uphill battle without changing the cflags I used here (march=bdver2).  I didn't adjust them for fear of heisen-fixing the problem; I'll build it one more time if need be -- takes a couple days, but for me it'd be well worth it if it leads to bugsquishification.
Comment 11 Greg Turner 2016-01-29 10:41:04 UTC
(In reply to Greg Turner from comment #10)
> Unfortunately, I tried DRI3 and it was way, way worse, suggesting Kristian
> and I don't have the same problem (except maybe "memory corruption").

Scratch that.

Just found that backporting (maybe that's not technically the right word -- I simply applied the patch without incident) commit 25eca802656 from xorg-server to 1.18.0 fixes the DRI3-mode trouble I was formerly able to repeatably cause by moving GL windows around between CRTCs.

That makes it much more plausible OP and I suffer from the same bug since, but for this known and already-fixed-upstream problem, my issue also seems to be fixed by moving to DRI3.

Also, after my huge full-system rebuild I see much better behavior overall on this box.  So, as I kind of suspected, some kind of toolchain or reverse-dependency issue remains a plausible cause of my non-repeatable lockups under DRI2.

I'll move back to DRI2 now*, stress the hell out of this machine (I do so naturally in the course of my day-to-day activities) and if everything works we can infer my issue was caused by something other than a bug in xorg.

--

* For a while, at least.  DRI3 is /so/ much less janky here than DRI2 was, I'm fairly eager to migrate for good.
Comment 12 Greg Turner 2016-02-23 13:17:42 UTC
Just an update.  Unfortunately with kernel 4.4.1 and xorg 1.18.1 I can't keep the server running long enough to reach this bug.  Instead, within minutes the primary display begins having what appears to be some sort of memory geometry crisis.  Content from the framebuffer starts bleeding all over the place, and quickly it resolves to a state where the affected output is frozen or reduced to rendering some kind of non-3-dimensional hyper-shape that barely intersects the plane of the screen.

Maybe I simply have a hardware problem on top of the original bug.  FWIW since I upgraded I've had uncharacteristically long stable runs without hitting the old lockup -- it could very well have been fixed by my kernel/xorg update.  Indeed at one point I almost came on here and posted the bug was fixed (simply never got around to it and then observed the new crash which gave me pause).

I'm going to pull the card and play some musical hardware.  Even if it's software-related the new behavior seems orthogonal to the old and prevents me from testing further.  I'm comfortable closing this bug on the basis that it's not reproducible for anybody ATM.
Comment 13 Martin Peres 2019-11-19 07:51:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/issues/136.

