Bug 89785 - GPU Fault 147 and Ring Stalls and Tests Fail in Pillars of Eternity
Summary: GPU Fault 147 and Ring Stalls and Tests Fail in Pillars of Eternity
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: 10.5
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2015-03-27 00:46 UTC by Matthew Scheirer
Modified: 2017-02-15 21:37 UTC

Attachments
Kernel log of GPU Hang (12.16 KB, text/plain)
2015-03-27 00:46 UTC, Matthew Scheirer
player.log R9 270X for crash after char creation (499.52 KB, application/gzip)
2015-03-28 00:21 UTC, Andreas Grois
player.log R9 270X loading screen crash with patches from ~tstellar (323.02 KB, application/gzip)
2015-03-28 21:00 UTC, Andreas Grois
Fix for bug described in comment 4 (2.85 KB, patch)
2015-03-31 19:04 UTC, Tom Stellard

Description Matthew Scheirer 2015-03-27 00:46:25 UTC
Created attachment 114656
Kernel log of GPU Hang

On my R9 290 GPU, the newly released Pillars of Eternity causes a GPU fault and kernel panic with Mesa 10.5.1. It works fine on an Intel GPU. Every occurrence requires a REISUB to reset the system.

Attached is the kernel log of the crash. It is reproducible every time I run the game, as soon as it hits a loading screen. The kernel is 3.19.2.
Comment 1 Michel Dänzer 2015-03-27 01:14:28 UTC
> radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x01000000
> radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0C8001
> VM fault (0x01, vmid 6) at page 16777216, read from 'TC6' (0x54433600) (200)

Looks like a shader tries to read from virtual address 0x1000000000, which is probably not what was intended, so the shader code might be incorrect.
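
For anyone following the arithmetic, here is a minimal Python sketch of how the address and the client name fall out of the log lines above. It assumes the fault address register holds a page number for 4 KiB pages and that the client ID is four packed ASCII bytes; the function names are made up for this illustration and are not part of any driver.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB GPU pages, so address = page << 12


def fault_address(page, page_shift=PAGE_SHIFT):
    """Convert the faulting page number into a byte address."""
    return page << page_shift


def client_name(code):
    """Decode a packed 4-byte ASCII client ID such as 0x54433600."""
    raw = code.to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii")


print(hex(fault_address(0x01000000)))  # 0x1000000000
print(client_name(0x54433600))         # TC6
```

Page 16777216 is 0x01000000, so both forms printed in the kernel log give the same address.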

Please run the game with the environment variable R600_DEBUG=vs,gs,ps , redirect its stderr output to a file and attach that file here after the hang.
Comment 2 Matthew Scheirer 2015-03-27 18:55:01 UTC
The game is capturing stderr and won't give me any shader output. The syntax I was using was R600_DEBUG=vs,gs,ps ./PillarsOfEternity 2> ~/poe-crash-log.txt; I also tried &> log and only got its Unity output. Other Gallium tunables like gallium_hud work, so it seems to be internal stderr capture in Mono or Unity. I'm not well versed enough in either to know whether there is a way to circumvent it, and Google isn't helping.

Any other options? It seems really bizarre that this game works on Intel, Nvidia (blob), and OSX drivers if it has broken GLSL shaders.
Comment 3 Andreas Grois 2015-03-28 00:21:52 UTC
Created attachment 114675
player.log R9 270X for crash after char creation

The STDERR output is redirected to
~/.config/unity3d/Obsidian Entertainment/Pillars of Eternity/player.log

I'm experiencing crashes in Pillars of Eternity as well, with an R9 270X card. Sometimes they happen on the loading screen, sometimes directly after character creation. Attached is the gzipped player.log file of my latest crash (right after character creation, using R600_DEBUG=vs,gs,ps). While I can sometimes REISUB after the screen turns black, most of the time (including the crash the attached file corresponds to) the only option is a hard reset. For this reason I'm not sure whether the relevant information was written to disk, or whether the system froze before that.

I'm using the shim linked below, as Pillars of Eternity fails to detect the VRAM size, but the crashes also happen without this workaround.
https://github.com/dscharrer/void/blob/master/hacks/glamdmeminfo.c
(Description: https://forums.obsidian.net/topic/71852-linux-crashes-after-character-creation/)
Comment 4 Tom Stellard 2015-03-28 00:57:43 UTC
Here is the bug:

s_mov_b64 vcc, s[6:7]  ; Store pointer for s_load_dword* in vcc

<snip>

v_cmpx_le_f32_e32 vcc, 0, v6 ; Kill instructions
v_cmpx_le_f32_e32 vcc, 0, v6
v_cmpx_le_f32_e32 vcc, 0, v6
v_cmpx_le_f32_e32 vcc, 0, v6

<snip>

s_load_dwordx8 s[8:15], vcc, 0x48 ; vcc no longer contains a pointer.


The lowering of kill instructions needs to save and restore VCC if it is live.
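
As an illustration only, the hazard and the proposed save/restore can be modelled with a toy Python simulation. This is not real GCN semantics and not the attached patch: the Wave class, the 4-lane mask and FAKE_POINTER are hypothetical stand-ins, and the only behaviour taken from the listing above is that the kill compare overwrites vcc.

```python
# Toy model of the hazard from comment 4. This is NOT real GCN semantics and
# NOT the attached patch; all names here are hypothetical.

FAKE_POINTER = 0x12345000  # stand-in for the pointer copied from s[6:7]


class Wave:
    """Minimal wavefront state: a 4-lane EXEC mask and the VCC register."""

    def __init__(self, vcc):
        self.vcc = vcc
        self.exec_mask = 0b1111


def kill_if_negative(wave, values):
    """Model of v_cmpx_le_f32 vcc, 0, v: the per-lane result of 0 <= v
    is written to both VCC and EXEC, so whatever VCC held is lost."""
    result = 0
    for lane, v in enumerate(values):
        active = (wave.exec_mask >> lane) & 1
        if active and 0.0 <= v:
            result |= 1 << lane
    wave.vcc = result
    wave.exec_mask = result


def kill_if_negative_preserving_vcc(wave, values):
    """Same kill, but VCC is saved before and restored after."""
    saved = wave.vcc
    kill_if_negative(wave, values)
    wave.vcc = saved


lanes = [1.0, -2.0, 3.0, -4.0]

broken = Wave(FAKE_POINTER)
kill_if_negative(broken, lanes)

fixed = Wave(FAKE_POINTER)
kill_if_negative_preserving_vcc(fixed, lanes)

print(hex(broken.vcc), hex(fixed.vcc))
```

In the broken case a later load that uses vcc as its base would read from the lane mask instead of the pointer; in the preserving case the pointer survives while the EXEC mask is updated identically.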

As a workaround, you can try this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=sched-perf-Mar-27-2015
or take the top two commits and apply them to llvm master.  I think this will avoid the bug.
Comment 5 Andreas Grois 2015-03-28 17:55:33 UTC
While I cannot get portage to compile llvm from git (neither from master, nor from your branch) without getting an error (https://forums.gentoo.org/viewtopic-t-1013718.html), I've tried to apply the commits "R600/SI: Disable register pressure tracking in the scheduler" (11a8738d716fa9f67da9fce2892c30352780e89b) and "XXX: Schedmodul" (b08f2e3be487f399b04363972bfd5dfc05652673) to llvm-3.6. 
The patches apply and the resulting code compiles and installs fine, but the game still crashes at the same spot.
Comment 6 Andreas Grois 2015-03-28 21:00:01 UTC
Created attachment 114688
player.log R9 270X loading screen crash with patches from ~tstellar

This is the Player.log when using llvm-3.6 with the two patches 11a8738d716fa9f67da9fce2892c30352780e89b and b08f2e3be487f399b04363972bfd5dfc05652673 from http://cgit.freedesktop.org/~tstellar/llvm/log/?h=sched-perf-Mar-27-2015. This time the crash happened on the loading screen, before character creation.
Comment 7 Tilman Sauerbeck 2015-03-29 09:48:10 UTC
I uploaded an apitrace that lets me trigger the bug to http://files.code-monkey.de/PillarsOfEternity.trace .
Comment 8 Tilman Sauerbeck 2015-03-29 20:24:27 UTC
(In reply to Tilman Sauerbeck from comment #7)
> I uploaded an apitrace that lets me trigger the bug to
> http://files.code-monkey.de/PillarsOfEternity.trace .

Please ignore this one; it only caused a GPU fault with drivers that didn't have the fix for bug #88301.

With a proper driver (and LLVM trunk r227583; 3.6.0 is known to be "bad"), I can finish character creation and click through the first dialog in the game. Haven't tried doing anything else yet.
Comment 9 Andreas Grois 2015-03-30 18:28:04 UTC
I'm terribly sorry. Please also ignore my previous player.log files, as they were recorded with llvm-3.6.0. I have now finally managed to get a working installation of llvm git master and mesa 10.5.2 (with a few nuts and bolts from mesa git master). While the game stutters slightly, it indeed doesn't crash with the current git version of llvm, at least not in the parts I've tested so far.
Comment 10 Tom Stellard 2015-03-31 19:04:01 UTC
Created attachment 114780
Fix for bug described in comment 4

This patch should fix the problem I described in comment 4. Can you test it?
Comment 11 Alex B 2015-04-01 19:58:07 UTC
I can confirm that the patch from the last comment fixes this issue on my Radeon HD 7750.
Comment 12 Michel Dänzer 2015-04-02 01:43:28 UTC
Comment on attachment 114780
Fix for bug described in comment 4

Review of attachment 114780:
-----------------------------------------------------------------

I think it would be better if SI_KILL could be fixed not to clobber VCC in the first place, e.g. by using the 64-bit encoding of the v_cmp(x)_lt_f32[0] instruction. But failing that, this looks good.

[0] BTW, it's currently using v_cmpx_le_f32, but I think that's incorrect, as the comment says "Clear this thread from the exec mask if the operand is negative".
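
To make the le versus lt question in footnote [0] concrete, here is a small Python sketch of which lanes would stay active under each compare. It assumes the compare is evaluated as src0 OP src1 with src0 = 0, as in the listing from comment 4; the function names are hypothetical and this is not driver code.

```python
def survives_le(v):
    """Lane stays active under v_cmpx_le_f32 vcc, 0, v (0 <= v)."""
    return 0.0 <= v


def survives_lt(v):
    """Lane stays active under v_cmpx_lt_f32 vcc, 0, v (0 < v)."""
    return 0.0 < v


for v in (-1.0, 0.0, 1.0):
    print(v, survives_le(v), survives_lt(v))
```

Under these assumptions the two variants differ only when the operand is exactly zero: le keeps that lane active, lt clears it. Negative operands are cleared by both.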
Comment 13 Andreas Grois 2015-04-03 11:58:31 UTC
(In reply to Alex B from comment #11)
> I can confirm, patch from the last comment fixes this issue on my Radeon HD
> 7750.

Same here. Pillars of Eternity seems to work now on my system (llvm-3.6.0 with the patch and mesa-10.5.2, Radeon R9 270X).
Comment 14 Matthew Scheirer 2015-04-05 21:09:37 UTC
I built mesa-git and llvm-svn and tested quite a bit of the game; after an hour of play nothing has crashed. I then tried to do everything in the game that would generate new shaders (cast all the spells, traveled to several zones, changed equipment, etc.) to see if anything triggered the bug, and it seems to be resolved on my end. Here is the player.log of around five minutes of gameplay containing the debug output from llvm. It's about 20 MB, so I put it on Google Drive. Framerates are good, with no graphical glitches and no crashes.

https://drive.google.com/file/d/0ByAFDLY1V3W2T1ZUYUREZHFjRkE/view?usp=sharing

So, with upstream mesa and llvm, this is fixed on my end.
Comment 15 Samuel Pitoiset 2017-02-15 21:37:43 UTC
At least three people have said the issue is fixed, and I can confirm that Pillars of Eternity works like a charm on my end. I think that's enough to close this very old issue (reported almost two years ago).