Summary: | [r600-llvm] mono games with opengl are blocking on start | ||
---|---|---|---|
Product: | Mesa | Reporter: | Laurent carlier <lordheavym> |
Component: | Drivers/Gallium/r600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | hadack, kai, nmiell, openproggerfreak, x11 |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | http://llvm.org/bugs/show_bug.cgi?id=12109, https://bugzilla.novell.com/show_bug.cgi?id=839074 | ||
Whiteboard: | |||
Attachments: | apitrace from hanging startup of OpenRA; temporary workaround patch; Another approach | ||
Description
Laurent carlier
2013-02-15 21:29:44 UTC
This seems fixed with --enable-shared-llvm

*** Bug 64788 has been marked as a duplicate of this bug. ***

Is this still an issue?

Yes, it's still an issue; I have to disable R600_LLVM to play mono games.

Any news on this one? It's really bad on radeonsi, since there's no workaround. Still happens with llvm and mesa from git.

(In reply to comment #5)
> Any news on this one? It's really bad on radeonsi, since there's no workaround.

Doesn't --enable-shared-llvm work for you?

Any pointers to freely downloadable games for testing?

(In reply to comment #6)
> (In reply to comment #5)
> > Any news on this one? It's really bad on radeonsi, since there's no workaround.
>
> Doesn't --enable-shared-llvm work for you?
>
> Any pointers to freely downloadable games for testing?

It's reproducible with Surgeon Simulator 2013 Demo:
http://downloads.bossastudios.com/ss2013/surgeonsimulator2013_linux.zip

(In reply to comment #6)
> Doesn't --enable-shared-llvm work for you?
>
> Any pointers to freely downloadable games for testing?

--enable-shared-llvm doesn't make a difference.
Here is a small and free game: http://www.desura.com/games/battlemass/download

I'm not really sure what's happening here, but I don't think these closed source games are good enough test cases to diagnose the problem. Could you try to find a very simple Open Source mono program that will reproduce this bug?

(In reply to comment #9)
> I'm not really sure what's happening here, but I don't think these closed
> source games are good enough test cases to diagnose the problem. Could you
> try to find a very simple Open Source mono program that will reproduce this
> bug?

It seems I'm able to reproduce the problem with opentk opengl examples:
http://www.opentk.com/
http://sourceforge.net/projects/opentk/

(In reply to comment #10)
> (In reply to comment #9)
> > I'm not really sure what's happening here, but I don't think these closed
> > source games are good enough test cases to diagnose the problem. Could you
> > try to find a very simple Open Source mono program that will reproduce this
> > bug?
>
> It seems I'm able to reproduce the problem with opentk opengl examples:
> http://www.opentk.com/
> http://sourceforge.net/projects/opentk/

Can you point me to instructions for how to compile this code? There are no makefiles, only visual studio project files.

(In reply to comment #11)
> > It seems I'm able to reproduce the problem with opentk opengl examples:
> > http://www.opentk.com/
> > http://sourceforge.net/projects/opentk/
>
> Can you point me to instructions for how to compile this code? There are no
> makefiles, only visual studio project files.

I've used the "package" from AUR: https://aur.archlinux.org/packages/opentk/ where you can find a tarball with the source package.

It is described in the building from source section here: http://www.opentk.com/doc/chapter/1/linux

I did this in the opentk folder:

    xbuild OpenTK.sln /p:Configuration=Debug
    cd Binaries/OpenTK/Debug
    mono Examples.exe

Some OpenGL examples show similar symptoms: they just stop at some point, while others quit with a "timelimit exceeded" message.

Created attachment 84467
apitrace from hanging startup of OpenRA

I'm seeing the same problem with the mono game OpenRA from http://open-ra.org/ on an RV730 PRO [Radeon HD 4650] with mesa-9.2-rc1 (but earlier mesa versions showed the same behaviour). With Gentoo I'm able to switch R600_LLVM via a USE flag, but as soon as I use a mesa version with this enabled, OpenRA will no longer start. It just displays a black window; the loading symbols never appear.

Running apitrace gives (full apitrace as attachment):

    10 glXChooseVisual(dpy = 0x15fbef0, [snip]) = &{visual = 0x1661f58, [snip]}
    11 glXCreateContext(dpy = 0x15fbef0, vis = &{visual = 0x1661f58, [snip]) = 0x16734e0
    12 glXMakeCurrent(dpy = 0x15fbef0, drawable = 20971535, ctx = 0x16734e0) = True
    43 glXMakeCurrent(dpy = 0x15fbef0, drawable = 20971535, ctx = 0x16734e0) = True

Trying gdb, it seems one of the mono threads gets stuck in radeon_drm_cs_emit_ioctl(); the other 7 threads look like mono internal things relating to its garbage collector.

strace gives:

    socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 7
    connect(7, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, 20) = 0
    [snip]
    open("/dev/dri/card0", O_RDWR|O_CLOEXEC) = 9
    [snip]
    ioctl(9, 0xc010640b, 0x7fffeb471ea0) = 0
    ioctl(9, 0xc00c6469, 0x7fffeb471ec0) = 0
    ioctl(9, 0xc020645d, 0x7fffeb471d10) = 0
    ioctl(9, 0xc020645d, 0x7fffeb471b10) = 0
    ioctl(9, 0xc020645e, 0x7fffeb471b20) = 0
    mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0x112992000) = 0x7f921dfe9000
    ioctl(9, 0xc020645d, 0x7fffeb471b20) = 0
    ioctl(9, 0xc020645e, 0x7fffeb471b30) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0x1129a2000) = 0x7f921dfe8000
    ioctl(9, VIDIOC_INT_RESET, 0x24460b0) = 0
    ioctl(9, 0xc020645d, 0x7fffeb471db0) = 0

Then some more interactions with fd=7 until it gets stuck with:

    futex(0x984280, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

At that point only kill -9 helps.

Do you have anything I should try or any info I should provide?

r600g initializes LLVM without first setting the llvm::DisablePrettyStackTrace variable to true. If this variable is false (the default), LLVM will register a bunch of signal handlers, including for SIGXCPU and SIGPWR, both of which are used by Mono's garbage collector.

gallivm correctly sets llvm::DisablePrettyStackTrace to true, but it runs after r600g has already started calling into LLVM and the signal handlers have been registered.

If you set a breakpoint on r600_create_context, manually set llvm::DisablePrettyStackTrace to true and then continue, the application will function correctly. I tested this using Fractal (a Unity game which deadlocks in sem_wait on startup), Bastion (a MonoGame, also deadlocks in sem_wait), and RepetierHost (an OpenTK app which dies in the SIGXCPU handler at startup).

Created attachment 84675
temporary workaround patch

Here's a temporary workaround patch. Not for merging, obviously.

(In reply to comment #15)
> r600g initializes LLVM without first setting the
> llvm::DisablePrettyStackTrace variable to true. If this variable is false
> (the default), LLVM will register a bunch of signal handlers, including for
> SIGXCPU and SIGPWR, both of which are used by Mono's garbage collector.
>
> gallivm correctly sets llvm::DisablePrettyStackTrace to true, but it runs
> after r600g has already started calling into LLVM and the signal handlers
> have been registered.
>
> If you set a breakpoint on r600_create_context, manually set
> llvm::DisablePrettyStackTrace to true and then continue, the application
> will function correctly. I tested this using Fractal (a Unity game which
> deadlocks in sem_wait on startup), Bastion (a MonoGame, also deadlocks in
> sem_wait), and RepetierHost (an OpenTK app which dies in the SIGXCPU handler
> at startup).

Thanks for tracking this down. I think we'll need to extend the LLVM C API in order to get access to this variable. However, looking through the LLVM code, it looks like the PrettyStackTrace handler is registered by a static initializer, so I wonder if setting this variable is enough and if we can guarantee that r600g will set this variable before the handler is initialized.

Also, this seems to me like it is a bug in LLVM. Is it common practice for libraries to override signal handlers of applications?

(In reply to comment #17)
> Thanks for tracking this down. I think we'll need to extend the LLVM C API
> in order to get access to this variable. However, looking through the LLVM
> code it looks like the PrettyStackTrace handler is registered by a static
> initializer, so I wonder if setting this variable is enough and if we can
> guarantee that r600g will set this variable before the handler is
> initialized.

I don't think this is true -- IIRC, all the stack traces I saw were the result of one of the runOnFunction methods (either BBPassManager or FPPassManager, I wasn't paying attention) creating a PassManagerPrettyStackEntry object.

> Also, this seems to me like it is a bug in LLVM. Is it common practice for
> libraries to override signal handlers of applications?

Common enough that both Mono and LLVM stomp on each other, but it's unambiguously wrong for a shared library to globally modify signal handlers. (Temporarily setting a new handler on entry to your library and later restoring the saved handler before returning is fine, but that only works in the single-threaded case since handlers aren't per-thread. Arguably modern applications shouldn't use any signals at all.)

Mono (generally) gets away with it because it uses crazy signals that applications never touch (SIGPWR is only sent to PID 1 by the kernel on power failure, SIGXCPU is a relic of obsolete job billing infrastructure that nobody uses), but had the bad luck of LLVM deciding to future-proof itself against all possible fatal signals.

If I were to be prescriptive, llvm::DisablePrettyStackTrace should be true by default, should only ever be set by clang, and shouldn't be a global variable.

I can confirm that changing DisablePrettyStackTrace to true generally in llvm fixes the startup hang. Tested with different mono-based games (Expedition Conquistador, Rochard, Bastion) on radeonsi. And I have to say I'm quite happy with the performance in the games. Thanks, guys!

*** Bug 70650 has been marked as a duplicate of this bug. ***

Confirming the bug on several unity3d games. Quickfix works.

Created attachment 88452
Another approach

I can confirm the issue with Mono's signal handling and the hanging of the applications. After digging around Mesa, I came up with the attached patch. Used it successfully with Mesa 9.2.{0,1,2} and git. The fine semantics are probably still up to the devs, though the static llvm variable is set in a static context and thus hopefully early enough. Anyway, I guess the default llvm value for that flag should probably be inverted.

I can confirm this bug with radeonsi with various Unity-based games. With attachment 88452 applied, everything works.

Stack:

    GPU: "PITCAIRN" (ChipID = 0x6819)
    Linux: 3.11.6
    libdrm: 2.4.47
    LLVM: SVN:trunk/r193475
    libclc: Git:master/4c18120c1a
    Mesa: Git:master/fa8b1514d3
    GLAMOR: Git:master/ba209eeef2
    DDX: Git:master/f1dc677e79

Fixed since llvm-3.4svn rev193971. Now the default behavior in LLVM is to have PrettyStackTrace disabled.

Mesa also needs the following patches to build:
http://lists.freedesktop.org/archives/mesa-dev/2013-November/047501.html
http://lists.freedesktop.org/archives/mesa-dev/2013-November/047625.html
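Editorial illustration of the root cause discussed in this thread: signal handlers are process-wide, so a library that installs its own handler for a signal silently replaces whatever the host runtime registered earlier. The sketch below is a Python analogue only, not Mesa, LLVM, or Mono code. All names in it (`app_gc_handler`, `library_crash_handler`, `library_init`, `install_handlers`, `run`) are hypothetical; `install_handlers` loosely plays the inverse role of `llvm::DisablePrettyStackTrace`. It assumes a Unix system where `signal.SIGXCPU` exists and Python 3.8+ for `signal.raise_signal`.

```python
import signal

log = []  # records which handler actually received the signal


def app_gc_handler(signum, frame):
    # Stand-in for the host runtime's handler (e.g. a GC using SIGXCPU).
    log.append("app")


def library_crash_handler(signum, frame):
    # Stand-in for a library's "catch every fatal signal" handler.
    log.append("library")


def library_init(install_handlers=True):
    # Hypothetical library entry point. When install_handlers is True it
    # registers its own SIGXCPU handler, replacing any existing one,
    # because handlers are per-process rather than per-library.
    if install_handlers:
        signal.signal(signal.SIGXCPU, library_crash_handler)


def run(install_handlers):
    # The application claims SIGXCPU first, then loads the library.
    log.clear()
    signal.signal(signal.SIGXCPU, app_gc_handler)
    library_init(install_handlers)
    # Deliver the signal to this process and see who handles it.
    signal.raise_signal(signal.SIGXCPU)
    return list(log)


if __name__ == "__main__":
    print("library installs handlers:", run(True))
    print("library opts out:         ", run(False))
```

With the library installing handlers, the application's handler should never fire, which mirrors the hang reported here: the runtime waits for a signal-driven handshake that is being routed elsewhere. With the library opting out, the application's handler receives the signal as intended.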