Bug 75005

Summary: "Upvoid" segfault in radeonsi/llvm
Product: Mesa
Reporter: Christoph Haag <haagch>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
Version: git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Bug Depends on: 75276
Bug Blocks:
Attachments: gdb "bt full" of the segfault
stderr with R600_DEBUG=ps,vs
GPU fault after it sort of works with patch from #3
stderr of upvoid with R600_DEBUG=ps,vs,gs that triggered GPU fault
dmesg with patched llvm
more recent R600_DEBUG=ps,vs,gs that triggers gpu fault

Description Christoph Haag 2014-02-14 22:08:17 UTC
Created attachment 94105 [details]
gdb "bt full" of the segfault

Software: Latest mesa git etc.

The program causing it is closed source: https://upvoid.com/

On Intel Ivy Bridge it works.

On radeonsi it segfaults every time the game starts.

A full gdb backtrace, with most of the debug information, is attached.
Comment 1 Tom Stellard 2014-02-14 22:10:33 UTC
Can you also post the output produced with the environment variable:
R600_DEBUG=ps,vs
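
A minimal sketch of how that output could be captured, assuming a POSIX shell and that the driver writes the shader dumps to stderr (as the attachment titles in this bug suggest). The function name capture_r600_debug, the log file name, and the ./UpvoidEngine binary path are hypothetical:

```shell
# Hypothetical helper: run a command with R600_DEBUG set and save its stderr.
# Usage: capture_r600_debug <flags> <logfile> <command> [args...]
capture_r600_debug() {
    flags=$1
    logfile=$2
    shift 2
    # R600_DEBUG is set only for this one command; its stderr goes to the log.
    R600_DEBUG="$flags" "$@" 2> "$logfile"
}

# Example (binary path is an assumption):
# capture_r600_debug ps,vs upvoid-stderr.log ./UpvoidEngine
```

Setting the variable on the command line rather than exporting it keeps the debug output limited to that single run.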
Comment 2 Christoph Haag 2014-02-14 22:13:40 UTC
Created attachment 94106 [details]
stderr with R600_DEBUG=ps,vs
Comment 3 Tom Stellard 2014-02-24 17:48:18 UTC
Can you try this patch: https://bugs.freedesktop.org/attachment.cgi?id=94675
Comment 4 Christoph Haag 2014-02-24 22:38:36 UTC
Created attachment 94690 [details]
GPU fault after it sort of works with patch from #3

Well.

At least initially it does not crash anymore. It does start now. Nice!

The problems begin very soon after gameplay starts, with GPU faults in the attached dmesg. Amazingly it keeps running with decent FPS for some time while these GPU faults are dumped into the log. But eventually the system hard locks up (the log had grown to 160 MB before the lockup, so I arbitrarily trimmed it after the first few GPU fault messages).
This was with R600_DEBUG=nohyperz, by the way.
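
Trimming a very large log after the first few fault messages could be sketched as follows. This assumes the kernel messages of interest contain the literal text "GPU fault"; that match string, the function name trim_after_gpu_faults, and the default of 5 are illustrative choices, not something taken from the attached log:

```shell
# Hypothetical helper: copy stdin to stdout up to and including the Nth line
# that contains "GPU fault" (default N=5), then stop reading.
trim_after_gpu_faults() {
    awk -v limit="${1:-5}" '
        { print }
        /GPU fault/ { if (++seen >= limit) exit }
    '
}

# Example: dmesg | trim_after_gpu_faults 5 > dmesg-trimmed.txt
```

Everything before the Nth match is kept, so the boot messages and driver initialization lines stay in the trimmed file.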



I have also once seen another segfault in radeonsi, but I didn't have debug symbols at that time and haven't reproduced it with debug symbols now, because the hard lockup after a few seconds of gameplay is a bit annoying when trying to reproduce something. :)
Maybe I can give a better backtrace later, but maybe it's unrelated.
Comment 5 Tom Stellard 2014-03-03 19:54:18 UTC
Can you try this branch:
http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes

If you experience any crashes or hangs please post the output of R600_DEBUG=ps,vs,gs
Comment 6 Christoph Haag 2014-03-03 22:34:40 UTC
Created attachment 95062 [details]
stderr of upvoid with R600_DEBUG=ps,vs,gs that triggered GPU fault

(In reply to comment #5)
> Can you try this branch:
> http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes
> 
> If you experience any crashes or hangs please post the output of
> R600_DEBUG=ps,vs,gs

I wasted some time compiling, clang was fixed for this branch version in 202737... Just in case anyone else is trying this.

Anyway, my very comprehensive tests of about 5 runs :) suggest that the GPU faults and hangs still happen, mostly when the game window is maximized.

If it is not maximized and runs only in a small window, it runs longer and hangs the GPU much more rarely, though sometimes the game fails in several other ways. That might be partly due to it being an alpha release, but maybe sometimes it's because of the driver? Don't know. Maybe you can decide whether it is caused by the problems here or needs new bugs.

E.g. this results in SIGABRT and then(?) SIGILL:
UpvoidEngine: r600_query.c:749: r600_suspend_nontimer_queries: Assertion `ctx->num_cs_dw_nontimer_queries_suspend == 0' failed.
I can post a full backtrace if you want.



It's currently segfaulting too much in unrelated code to get the segfault backtrace I originally wanted that involves almost only radeonsi, so I'll just attach the stderr output of one of the gpu fault hangs with R600_DEBUG=ps,vs,gs and try again later.
Comment 7 Tom Stellard 2014-03-31 15:13:06 UTC
Is there anything printed to your dmesg log when the game locks up?
Comment 8 Christoph Haag 2014-04-13 13:49:11 UTC
Created attachment 97307 [details]
dmesg with patched llvm

Sorry, I wasn't very active recently.

I'm not sure what you're asking for. In comment #4 there is a whole dmesg, but I can add another one...

I am using linux 3.14 by now and recent mesa git, but with your branch of llvm.

I can maybe add a few details to the behavior:
When starting upvoid it displays a menu over a view of the game world. The GPU faults start appearing in dmesg as soon as it starts displaying this. It doesn't lock up right away and keeps rendering relatively well. After starting the game I can walk and look around a bit, and I noticed that when I look at the ground the messages stop, but when I look into the distance they appear again, so I would think it's directly related to the complexity of what is being rendered.

After a while the game window stops reacting. At this point the game takes up 100% "red" CPU time in htop, and shortly after that the whole machine hard locks up.

I can't say if the lockup is because of excessive error logging or not. dmesg --follow | pv > /dev/null says it's about 35 kilobyte/second.
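
Where pv is not installed, a rough average could be taken with coreutils instead. This is only a sketch, assuming timeout and wc are available; the function name log_rate and the 10-second window are arbitrary:

```shell
# Hypothetical helper: print the average bytes per second a command writes
# to stdout during a fixed time window.
# Usage: log_rate <seconds> <command> [args...]
log_rate() {
    seconds=$1
    shift
    # timeout stops the command after $seconds (its non-zero exit status is
    # ignored); wc -c counts the bytes written during that window.
    bytes=$( { timeout "$seconds" "$@" || true; } | wc -c )
    echo $(( bytes / seconds ))
}

# Example: log_rate 10 dmesg --follow
```

The result is an integer average over the whole window, so short bursts are smoothed out compared with the live figure pv shows.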

The dmesg here is from starting the game, waiting a few seconds and then killing it. If it is killed early enough, it doesn't seem to cause any problems and the system seems to recover nicely.
Comment 9 Christoph Haag 2014-04-13 13:54:42 UTC
Created attachment 97308 [details]
more recent R600_DEBUG=ps,vs,gs that triggers gpu fault

Since there were some updates, maybe this should be updated too...

I have tried to take an apitrace but apitrace segfaults (?) when replaying it, so I can post that too when I can resolve that.
Comment 10 Christoph Haag 2014-04-13 14:42:51 UTC
(In reply to comment #9)
> I have tried to take an apitrace but apitrace segfaults (?) when replaying
> it, so I can post that too when I can resolve that.

That was actually just my failure to use it correctly. With apitrace replay --core it works:

64 Megabyte download, 101 Megabyte uncompressed:
http://w3studi.informatik.uni-stuttgart.de/~haagch/UpvoidEngine.trace.bz2

Replaying this renders on intel without bigger problems, but on my HD 7970M it creates GPU faults (but not so much that it causes a lockup).
Comment 11 Tom Stellard 2014-04-28 20:52:22 UTC
Can you test this branch:
http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v2
Comment 12 Christoph Haag 2014-04-29 16:36:15 UTC
(In reply to comment #11)
> Can you test this branch:
> http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v2

I'm using 3.15-rc3 by now and I don't know whether it's fixes in linux or fixes in llvm, but there are no gpu faults and no crashes anymore.

At least for the few minutes I have "tested" for now it worked absolutely fine.

Awesome!
Comment 13 Tom Stellard 2014-04-29 21:37:30 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Can you test this branch:
> > http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v2
> 
> I'm using 3.15-rc3 by now and I don't know whether it's fixes in linux or
> fixes in llvm, but there are no gpu faults and no crashes anymore.
> 
> At least for the few minutes I have "tested" for now it worked absolutely
> fine.
> 
> Awesome!

Thanks for testing.  I made a few improvements, can you test this branch:
http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v3
Comment 14 Christoph Haag 2014-04-29 22:46:00 UTC
Seems to still work fine for Upvoid, at least when only running it for very few minutes.
Comment 15 Tom Stellard 2014-05-16 19:29:06 UTC
I have committed a fix for this.
