Bug 89534

Summary: radeonsi GPU lockup / crash with wine
Product: Mesa
Reporter: John <john.ettedgui>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: farmboy0+freedesktop
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: dmesg part for the lockup
part of DMESG where BAC lockup but came back after about 20 seconds
and 2 minutes later BAC crashed the computer and didn't recover
dmsg with dpm on and hyperz off after it crashed
dmesg with a cpu stuck instead of gpu, might be unrelated
regdump when the computer starts
regdump with dpm on and hyperz off, after it crashed
regdump with dpm off and hyperz on, after it crashed
xorg log when it crashed, though I don't see anything of interest in it.
R600_DEBUG=ps,gs,vs for Tera Online
Tera Online dmesg errors
Tera Online dmesg errors (fixed)
output of the command given by Tom
dmesg
R600_DEBUG=ps,gs,vs ouput

Description John 2015-03-11 13:13:06 UTC
Created attachment 114220 [details]
dmesg part for the lockup

Hi there,

I have found some games that cause some GPU lockups, some more violent than others. All of them unfortunately are running through wine.
I am not sure whether they are all related, but here is a short summary. They all use Unreal Engine 3.

I started with Injustice: Gods Among Us. I didn't see any crash with pure wine, only with nine. The crashes seemed more likely with higher graphical settings, though there was one stage that I had to play without nine as it always crashed.

With the same configuration using Nine I played Batman Arkham Asylum with no crash (older game same engine, maybe less demanding?).

Then Batman Arkham City has been similar to Injustice, but it also has crashed with standard wine (meaning without nine), just less frequently.

And finally, the one that really brings me here now: Batman Arkham Origins. I can't get past the main screen; with nine or pure wine, it crashes at the same place.
I've tried with both linux 3.19 and linux 4.0rc3, with the radeon and the generic modesetting ddx and it's all the same.
I am on latest mesa-git and llvm-svn.

The end result is that the screen is dead, but most of the time I can still SSH into the machine. Even then, there's not much I can do but restart it...
I'll attach the dmesg I grabbed from my last try with Batman Arkham Origins.

I am not exactly sure where the issue really is, so for now I've put it in DRI.

My specs are:
Radeon 280x
Xorg 1.17
Linux 3.19/4.0rc3 x64
mesa-git
llvm-svn

As always I'm open to trying patches, bisecting, etc., though since I never played these games before, I cannot say whether they worked any better in previous releases.

Thanks!
Comment 1 John 2015-03-11 14:18:03 UTC
Created attachment 114222 [details]
part of DMESG where BAC lockup but came back after about 20 seconds
Comment 2 John 2015-03-11 14:19:10 UTC
Created attachment 114224 [details]
and 2 minutes later BAC crashed the computer and didn't recover

Though I could not log in through SSH right away...
It took a few minutes before it worked, not sure what the kernel was blocked on then, but I thought I'd mention it.
Comment 3 John 2015-03-12 07:40:32 UTC
I have tried various combinations of Linux and Mesa with no better result, so I'm *guessing* it is not a regression but something that never worked.

I've tried Linux 3.14.5, 3.19, 3.19.1 and 4.0rc1 to 4.0rc3.
I've tried Mesa 10.1, 10.2, 10.3, 10.4 and git (with the corresponding llvm builds).

I have more dmesg saved, but I am not sure if they would be helpful, especially using older versions...

I'm out of ideas of what to try now...
Comment 4 John 2015-03-12 13:34:43 UTC
I tested it and it works fine with the Intel driver (well fine... if you call 2fps or so fine... but at least no crash)
Comment 5 John 2015-03-13 07:16:54 UTC
I was able to finally play the game by disabling dpm and disabling HYPERZ, but then in my last try it crashed again even with these disabled. So I am not sure how much they helped... maybe I just got lucky?

I'll add some logs.
Comment 6 John 2015-03-13 07:17:51 UTC
Created attachment 114270 [details]
dmsg with dpm on and hyperz off after it crashed
Comment 7 John 2015-03-13 07:18:20 UTC
Created attachment 114271 [details]
dmesg with a cpu stuck instead of gpu, might be unrelated
Comment 8 John 2015-03-13 07:18:43 UTC
Created attachment 114272 [details]
regdump when the computer starts
Comment 9 John 2015-03-13 07:19:06 UTC
Created attachment 114273 [details]
regdump with dpm on and hyperz off, after it crashed
Comment 10 John 2015-03-13 07:20:10 UTC
Created attachment 114274 [details]
regdump with dpm off and hyperz on, after it crashed
Comment 11 John 2015-03-13 07:20:50 UTC
Created attachment 114275 [details]
xorg log when it crashed, though I don't see anything of interest in it.
Comment 12 John 2015-03-19 21:09:56 UTC
I've tried a bit more digging, but the many crashes eventually corrupted my FS, so now I'm less inclined to dig on my own.
Since I don't want to completely corrupt my FS for no result, please tell me what to try. :)
Comment 13 John 2015-04-13 07:25:07 UTC
Is there anything I can add to the report to help?
Thanks!
Comment 14 Tom Stellard 2015-04-29 14:58:41 UTC
Can you run the program with the environment variable R600_DEBUG=ps,gs,vs and post the output?
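
For anyone following along, here is a minimal Python sketch of one way to capture that output into a file. It assumes the driver prints its shader dumps to stderr, so both output streams are redirected into the same log. The function name `run_with_r600_debug`, the log file name, and the example command line are hypothetical and not part of this report.

```python
import os
import subprocess


def run_with_r600_debug(cmd, log_path, flags="ps,gs,vs"):
    """Run cmd with R600_DEBUG set and save its stdout/stderr to log_path."""
    env = dict(os.environ)
    env["R600_DEBUG"] = flags  # variable and flags as requested in the comment above
    with open(log_path, "wb") as log:
        # Assumption: the shader dumps go to stderr, so stderr is merged
        # into stdout and both end up in the same file.
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Hypothetical usage (replace with the real launcher command):
# run_with_r600_debug(["wine", "Game.exe"], "r600_debug.log")
```

The resulting log can get large, so compressing it before attaching it is probably a good idea.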
Comment 15 Macera 2015-04-30 20:29:14 UTC
Created attachment 115486 [details]
R600_DEBUG=ps,gs,vs for Tera Online

I am having the same crash with another Unreal Engine 3 game, Tera Online.
I am also using wine with gallium nine. Pure wine works but is slow.

I have attached the whole output of R600_DEBUG=ps,gs,vs before the crash.


Gpu:
radeon r7 250x

Software:
Archlinux
linux 4.0.1-1-ARCH x86_64
mesa 10.5.4-1
llvm-libs 3.6.0-5
xorg-server 1.17.1-5
Comment 16 Macera 2015-04-30 20:31:17 UTC
Created attachment 115487 [details]
Tera Online dmesg errors

Parts of dmesg showing the errors.
Comment 17 John 2015-04-30 20:33:56 UTC
I haven't tried Tom's suggestion yet, as I need my PC to be stable for the rest of the day, but I've played Tera Online extensively, so I doubt the issue is related.

Plus I get my issue both with nine and without it (just more often with nine).
Comment 18 Macera 2015-04-30 21:27:08 UTC
Created attachment 115489 [details]
Tera Online dmesg errors (fixed)

The last file had a line missing at the end.

Well just in case it matters, I only get the crash sometimes when using the warrior combo attack skill.
Comment 19 Michel Dänzer 2015-05-01 01:01:31 UTC
Macera, please file your own report.
Comment 20 John 2015-05-01 02:57:18 UTC
Created attachment 115492 [details]
output of the command given by Tom

I don't know if it matters, but both times I tried running the game today (with and without the debug option), it died even quicker than it used to, at the launch screen. Since the crashes were random, though, that may not really mean anything.

(I had to zip the file as it was too big to attach...)
Comment 21 John 2015-09-22 10:45:41 UTC
Since it had been a long time since I last tried, and another similar game that used to crash (IGAU) doesn't seem to anymore, I thought I'd try again.

Well, the result is mixed.
The game still crashes, but this time it did not lock up my computer. I was able to switch to a tty to look at dmesg, and eventually Xorg restarted automatically (though I didn't check whether that session would work; I preferred to restart the machine completely).
I will attach the dmesg.

This was done on mesa-git, llvm-svn and linux 4.2.0.
Comment 22 John 2015-09-22 10:47:35 UTC
Created attachment 118395 [details]
dmesg
Comment 23 John 2015-10-03 09:33:36 UTC
This time I thought of trying with a clean wine profile (yes I'm still using wine for this...) and did not install anything but the game in it (no DX or other lib).

Well... that didn't change much.
Without nine my computer didn't crash for a few seconds in game, but then I was taken to tty1 with a lot of
[  178.083898] radeon 0000:01:00.0: ring 4 stalled for more than 10070msec
[  178.083901] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000010d3c last fence id 0x0000000000010df7 on ring 4)

I have the full dmesg if needed.
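
To pull just these lines out of a long dmesg, something like the following Python sketch could be used. The regular expressions are written against the two message formats quoted above only. The `pending` value is simply the difference between the two fence ids, read here (as an assumption) as a rough count of work submitted to the ring but not yet completed. The function name `summarize_lockups` is hypothetical.

```python
import re

# Patterns match only the two radeon message formats quoted above.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def summarize_lockups(dmesg_text):
    """Return a list of dicts describing ring stalls and GPU lockups."""
    events = []
    for line in dmesg_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            events.append(
                {"kind": "stall", "ring": int(m.group(1)), "msec": int(m.group(2))}
            )
            continue
        m = LOCKUP_RE.search(line)
        if m:
            current = int(m.group(1), 16)
            last = int(m.group(2), 16)
            events.append(
                {
                    "kind": "lockup",
                    "ring": int(m.group(3)),
                    "current_fence": current,
                    "last_fence": last,
                    # Assumption: gap between the ids ~ unfinished submissions.
                    "pending": last - current,
                }
            )
    return events
```

For the two lines above this reports one stall and one lockup, both on ring 4.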


With nine on, it crashed quicker and completely killed the computer, I had to restart it manually.
I am guessing it's an issue within the kernel for the hard lock, but no clue why nothing else runs into it, and why this game does so easily..
Comment 24 John 2015-10-03 09:46:18 UTC
Created attachment 118641 [details]
R600_DEBUG=ps,gs,vs ouput

(zipped file as the file itself was too big for here)
Comment 25 John 2015-10-03 09:50:52 UTC
I've just added the current output of R600_DEBUG=ps,gs,vs in case it shows anything more interesting than last time Tom had asked...

I don't get most of it, but the end seems moot:
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 241 requests (241 known processed) with 0 events remaining.
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 1080 requests (928 known processed) with 0 events remaining

which makes me think that the "error" didn't happen within whatever this monitor was looking at, but somewhere else, and by the time we get there, oops, there's no more X... but maybe the llvm functions path can be useful to someone who knows what's going on...


I'm hoping to get this working one day, but apart from testing every now and then with a newer radeon and/or wine stack I'm out of ideas... It reproduces itself very easily (alas...).
Comment 26 John 2015-10-12 23:07:51 UTC
Someone on irc suggested adding an apitrace trace.

Here are 2 (in case one crashed before the trace got the right stuff... I don't know). These were made with standard wine/ogl, no nine.

They are compressed with xz as they are quite big. Alas, replaying them doesn't crash the computer, so I don't know if they are that useful, but I see a lot of warnings and errors on the command prompt, so maybe? You do get to see the part where the computer "locks" (but the replay doesn't lock...).

https://mega.nz/#!howWwJBD!o1Lr8b5NlSOfrW45TFo3fNuS-EzyWN-yYaq5ctvEyvc
https://mega.nz/#!01IhxboB!lCHxUD6gY65yYmHQ8w3MWq8L_YO0il54kR83LJvWHTE


Thanks!
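
For reference, recording and replaying such a trace can be sketched in Python as below. It only builds the `apitrace trace` and `apitrace replay` command lines; the game command and the trace file name are hypothetical placeholders, not the ones used for the files linked above.

```python
def trace_command(app_cmd, trace_file):
    """Build the command that records app_cmd's GL calls into trace_file."""
    return ["apitrace", "trace", "-o", trace_file] + list(app_cmd)


def replay_command(trace_file):
    """Build the command that replays a previously recorded trace."""
    return ["apitrace", "replay", trace_file]


# Hypothetical usage:
# import subprocess
# subprocess.run(trace_command(["wine", "Game.exe"], "game.trace"))
# subprocess.run(replay_command("game.trace"))
```

As noted above, a replay that does not hang can still be useful to a developer, since it shows what the application asked the driver to do.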
Comment 27 PsychoDariusz 2015-12-02 11:23:39 UTC
It happens to me too in wine with Rocket League, Orion Prelude and Hard Reset. After a few minutes the game stalls, and if I switch to a terminal the output I get is [this](http://imgur.com/a/PYyR0).

it happens both with the r7 250 i had before and my new r7 260x on ubuntu 15.10 with [padoka ppa](https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/mesa).

any suggestions how to post better debug info?
Comment 28 John 2016-05-23 23:05:16 UTC
I try every now and then, but it's still the same :/
Comment 29 hedlx 2016-07-03 19:17:19 UTC
(In reply to PsychoDariusz from comment #27)
> It happens to me too in wine with Rocket League, Orion Prelude and Hard
> Reset. After a few minutes the game stalls, and if I switch to a terminal
> the output I get is [this](http://imgur.com/a/PYyR0).
> 
> it happens both with the r7 250 i had before and my new r7 260x on ubuntu
> 15.10 with [padoka
> ppa](https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/mesa).
> 
> any suggestions how to post better debug info?

Same here: GPU lockup after 10-15 minutes in Rocket League, with the 4.7-rc3 kernel and the latest mesa+llvm from padoka-ppa as of today (12.1). The wine version is 1.9.13.

It always locks up with a "radeon failed to deallocate virtual address for buffer" error, even in wine without the gallium nine patches.
Comment 30 Marek Olšák 2016-07-05 15:40:19 UTC
The GPU no longer hangs with the apitraces for me.

I'm using LLVM git with this fix:
http://reviews.llvm.org/D21961
Comment 31 John 2016-07-05 19:00:50 UTC
Aaaah a dev is back here.
Marek, if you are talking about my traces, they didn't crash my computer either, but I hoped they might show something... I was out of ideas of what to add here.
Comment 32 Marek Olšák 2016-12-08 16:30:21 UTC
GPU hangs in Batman Arkham: Origins were fixed by this commit:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=6dc96de303290e8d1fc294da478c4f370be98dea

Closing. You can create a new bug for hangs in other apps or search existing bugs.
Comment 33 John 2016-12-08 16:45:10 UTC
WoooooW!

I had given up on this, thank you Marek!
And thank you for finding this bug report as well!
