Bug 60929 - [r600-llvm] mono games with opengl are blocking on start
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: Default DRI bug account
Duplicates: 64788 70650

Reported: 2013-02-15 21:29 UTC by Laurent carlier
Modified: 2013-11-11 18:48 UTC (History)

Attachments
apitrace from hanging startup of OpenRA (2.72 KB, application/octet-stream)
2013-08-22 17:17 UTC, Torsten Kaiser
temporary workaround patch (954 bytes, patch)
2013-08-26 23:12 UTC, Nicholas Miell
Another approach (452 bytes, patch)
2013-10-31 21:23 UTC, zuxez

Description Laurent carlier 2013-02-15 21:29:44 UTC
Mono games using OpenGL (Rochard, Bastion, Splice, ...) hang on start when LLVM is in use. The only way to make them start is to set R600_LLVM=0.

* Mesa from git
* llvm from tstellar repo
* radeon HD6870

I've tried to trace it with strace, MONO_LOG_LEVEL=debug and apitrace, without much success. It seems (for Bastion) to always get stuck in 'glXGetCurrentContext'.
Comment 1 Laurent carlier 2013-03-25 13:33:44 UTC
This seems to be fixed with --enable-shared-llvm.
Comment 2 Laurent carlier 2013-05-20 12:21:40 UTC
*** Bug 64788 has been marked as a duplicate of this bug. ***
Comment 3 Tom Stellard 2013-06-03 16:52:15 UTC
Is this still an issue?
Comment 4 Laurent carlier 2013-06-05 07:26:32 UTC
Yes, it's still an issue; I have to disable R600_LLVM to play Mono games.
Comment 5 hadack 2013-07-15 20:22:20 UTC
Any news on this one? It's really bad on radeonsi, since there's no workaround.
Still happens with llvm and mesa from git.
Comment 6 Michel Dänzer 2013-07-16 09:16:23 UTC
(In reply to comment #5)
> Any news on this one? It's really bad on radeonsi, since there's no workaround.

Doesn't --enable-shared-llvm work for you?

Any pointers to freely downloadable games for testing?
Comment 7 Laurent carlier 2013-07-16 10:00:59 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Any news on this one? It's really bad on radeonsi, since there's no workaround.
> 
> Doesn't --enable-shared-llvm work for you?
> 
> Any pointers to freely downloadable games for testing?

It's reproducible with Surgeon Simulator 2013 Demo:
http://downloads.bossastudios.com/ss2013/surgeonsimulator2013_linux.zip
Comment 8 hadack 2013-07-16 19:38:06 UTC
(In reply to comment #6)
 
> Doesn't --enable-shared-llvm work for you?
> 
> Any pointers to freely downloadable games for testing?

--enable-shared-llvm doesn't make a difference.
Here is a small and free game: http://www.desura.com/games/battlemass/download
Comment 9 Tom Stellard 2013-07-18 17:15:24 UTC
I'm not really sure what's happening here, but I don't think these closed-source games are good enough test cases to diagnose the problem.  Could you try to find a very simple open-source Mono program that will reproduce this bug?
Comment 10 Laurent carlier 2013-07-20 07:43:02 UTC
(In reply to comment #9)
> I'm not really sure what's happening here, but I don't think these closed
> source games are good enough test cases to diagnose the problem.  Could you
> try to find a very simple Open Source mono program that will reproduce this
> bug?

It seems I'm able to reproduce the problem with the OpenTK OpenGL examples:
http://www.opentk.com/
http://sourceforge.net/projects/opentk/
Comment 11 Tom Stellard 2013-07-20 08:07:52 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > I'm not really sure what's happening here, but I don't think these closed
> > source games are good enough test cases to diagnose the problem.  Could you
> > try to find a very simple Open Source mono program that will reproduce this
> > bug?
> 
> It seems i'm able to reproduce the problem with opentk opengl examples:
> http://www.opentk.com/
> http://sourceforge.net/projects/opentk/

Can you point me to instructions for how to compile this code?  There are no makefiles, only Visual Studio project files.
Comment 12 Laurent carlier 2013-07-20 09:04:14 UTC
(In reply to comment #11)
> > It seems i'm able to reproduce the problem with opentk opengl examples:
> > http://www.opentk.com/
> > http://sourceforge.net/projects/opentk/
> 
> Can you point me to instructions for how to compile this code?  There are no
> makefiles, only Visual Studio project files.

I've used the "package" from AUR:
https://aur.archlinux.org/packages/opentk/
where you can find a tarball with the source package
Comment 13 hadack 2013-07-22 13:38:05 UTC
It is described in the building from source section here:
http://www.opentk.com/doc/chapter/1/linux

I did this in the opentk folder:

xbuild OpenTK.sln /p:Configuration=Debug
cd Binaries/OpenTK/Debug
mono Examples.exe

The OpenGL examples show similar symptoms: some just stop at some point, others quit with a time-limit-exceeded message.
Comment 14 Torsten Kaiser 2013-08-22 17:17:10 UTC
Created attachment 84467
apitrace from hanging startup of OpenRA

I'm seeing the same problem with the Mono game OpenRA from http://open-ra.org/ on an RV730 PRO [Radeon HD 4650] with mesa-9.2-rc1 (earlier Mesa versions showed the same behaviour).

On Gentoo I'm able to switch R600_LLVM via a USE flag, but as soon as I use a Mesa version with it enabled, OpenRA no longer starts. It just displays a black window; the loading symbols never appear.

Running apitrace gives (full apitrace as attachment):
10 glXChooseVisual(dpy = 0x15fbef0, [snip]) = &{visual = 0x1661f58, [snip]}
11 glXCreateContext(dpy = 0x15fbef0, vis = &{visual = 0x1661f58, [snip]) = 0x16734e0
12 glXMakeCurrent(dpy = 0x15fbef0, drawable = 20971535, ctx = 0x16734e0) = True
43 glXMakeCurrent(dpy = 0x15fbef0, drawable = 20971535, ctx = 0x16734e0) = True

In gdb, it seems one of the Mono threads gets stuck in radeon_drm_cs_emit_ioctl(); the other 7 threads look like Mono-internal threads related to its garbage collector.

strace gives:
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 7
connect(7, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, 20) = 0
[snip]
open("/dev/dri/card0", O_RDWR|O_CLOEXEC) = 9
[snip]
ioctl(9, 0xc010640b, 0x7fffeb471ea0)    = 0
ioctl(9, 0xc00c6469, 0x7fffeb471ec0)    = 0
ioctl(9, 0xc020645d, 0x7fffeb471d10)    = 0
ioctl(9, 0xc020645d, 0x7fffeb471b10)    = 0
ioctl(9, 0xc020645e, 0x7fffeb471b20)    = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0x112992000) = 0x7f921dfe9000
ioctl(9, 0xc020645d, 0x7fffeb471b20)    = 0
ioctl(9, 0xc020645e, 0x7fffeb471b30)    = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0x1129a2000) = 0x7f921dfe8000
ioctl(9, VIDIOC_INT_RESET, 0x24460b0)   = 0
ioctl(9, 0xc020645d, 0x7fffeb471db0)    = 0
Then some more interactions with fd=7 until it gets stuck with:
futex(0x984280, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

At that point only kill -9 helps.

Do you have anything I should try or any info I should provide?
Comment 15 Nicholas Miell 2013-08-26 09:11:58 UTC
r600g initializes LLVM without first setting the llvm::DisablePrettyStackTrace variable to true. If this variable is false (the default), LLVM will register a bunch of signal handlers, including for SIGXCPU and SIGPWR, both of which are used by Mono's garbage collector.

gallivm correctly sets llvm::DisablePrettyStackTrace to true, but it runs after r600g has already started calling into LLVM and the signal handlers have been registered.

If you set a breakpoint on r600_create_context, manually set llvm::DisablePrettyStackTrace to true and then continue, the application will function correctly. I tested this using Fractal (a Unity game which deadlocks in sem_wait on startup), Bastion (a MonoGame, also deadlocks in sem_wait), and RepetierHost (an OpenTK app which dies in the SIGXCPU handler at startup).
Comment 16 Nicholas Miell 2013-08-26 23:12:34 UTC
Created attachment 84675
temporary workaround patch

Here's a temporary workaround patch. Not for merging, obviously.
Comment 17 Tom Stellard 2013-08-27 02:34:15 UTC
(In reply to comment #15)
> r600g initializes LLVM without first setting the
> llvm::DisablePrettyStackTrace variable to true. If this variable is false
> (the default), LLVM will register a bunch of signal handlers, including for
> SIGXCPU and SIGPWR, both of which are used by Mono's garbage collector.
> 
> gallivm correctly sets llvm::DisablePrettyStackTrace to true, but it runs
> after r600g has already started calling into LLVM and the signal handlers
> have been registered.
> 
> If you set a breakpoint on r600_create_context, manually set
> llvm::DisablePrettyStackTrace to true and then continue, the application
> will function correctly. I tested this using Fractal (a Unity game which
> deadlocks in sem_wait on startup), Bastion (a MonoGame, also deadlocks in
> sem_wait), and RepetierHost (an OpenTK app which dies in the SIGXCPU handler
> at startup).

Thanks for tracking this down.  I think we'll need to extend the LLVM C API in order to get access to this variable.  However, looking through the LLVM code it looks like the PrettyStackTrace handler is registered by a static initializer, so I wonder if setting this variable is enough and if we can guarantee that r600g will set this variable before the handler is initialized.

Also, this seems to me like it is a bug in LLVM.  Is it common practice for libraries to override signal handlers of applications?
Comment 18 Nicholas Miell 2013-08-27 05:22:38 UTC
(In reply to comment #17)
> Thanks for tracking this down.  I think we'll need to extend the LLVM C API
> in order to get access to this variable.  However, looking through the LLVM
> code it looks like the PrettyStackTrace handler is registered by a static
> initializer, so I wonder if setting this variable is enough and if we can
> guarantee that r600g will set this variable before the handler is
> initialized.

I don't think this is true -- IIRC, all the stack traces I saw were the result of one of the runOnFunction methods (either BBPassManager or FPPassManager, I wasn't paying attention) creating a PassManagerPrettyStackEntry object.

> Also, this seems to me like it is a bug in LLVM.  Is it common practice for
> libraries to override signal handlers of applications?

Common enough that both Mono and LLVM stomp on each other, but it's unambiguously wrong for a shared library to globally modify signal handlers. (Temporarily setting a new handler on entry to your library and later restoring the saved handler before returning is fine, but that only works in the single-threaded case since handlers aren't per-thread. Arguably modern applications shouldn't use any signals at all.)

Mono (generally) gets away with it because it uses crazy signals that applications never touch (SIGPWR is only sent to PID 1 by the kernel on power failure, SIGXCPU is a relic of obsolete job billing infrastructure that nobody uses), but had the bad luck of LLVM deciding to future-proof itself against all possible fatal signals.

If I were to be prescriptive, llvm::DisablePrettyStackTrace should be true by default, should only ever be set by clang, and shouldn't be a global variable.
Comment 19 hadack 2013-09-14 18:49:33 UTC
I can confirm that changing DisablePrettyStackTrace to true generally in LLVM fixes the startup hang. Tested with different Mono-based games (Expedition Conquistador, Rochard, Bastion) on radeonsi. And I have to say I'm quite happy with the performance in the games. Thanks, guys!
Comment 20 Laurent carlier 2013-10-19 16:50:45 UTC
*** Bug 70650 has been marked as a duplicate of this bug. ***
Comment 21 Radist Morse 2013-10-25 12:51:37 UTC
I can confirm the bug on several Unity3D games.

Quickfix works.
Comment 22 zuxez 2013-10-31 21:23:47 UTC
Created attachment 88452
Another approach

I can confirm the issue with Mono's signal handling and the hanging of the applications. After digging around Mesa, I came up with the attached patch. I have used it successfully with Mesa 9.2.{0,1,2} and git. The finer details are probably still up to the devs, though the static LLVM variable is set in a static context and thus hopefully early enough.

Anyway, I guess the default LLVM value for that flag should probably be inverted.
Comment 23 Kai 2013-11-03 11:28:16 UTC
I can confirm this bug on radeonsi with various Unity-based games. With attachment 88452 applied, everything works.


Stack:
GPU: "PITCAIRN" (ChipID = 0x6819)
Linux: 3.11.6
libdrm: 2.4.47
LLVM: SVN:trunk/r193475
libclc: Git:master/4c18120c1a
Mesa: Git:master/fa8b1514d3
GLAMOR: Git:master/ba209eeef2
DDX: Git:master/f1dc677e79
Comment 24 Laurent carlier 2013-11-04 08:26:41 UTC
Fixed since llvm-3.4svn rev193971. Now the default behavior in LLVM is to have PrettyStackTrace disabled.

Mesa also needs the following patches to build:
http://lists.freedesktop.org/archives/mesa-dev/2013-November/047501.html
http://lists.freedesktop.org/archives/mesa-dev/2013-November/047625.html