Created attachment 133142 [details]
backtrace and dmesg

Ubuntu 16.04.2 LTS x86_64, kernel 4.12.4-041204-generic
Latest Mesa git as of today (07/30)
Latest UE4 4.17 git branch as of today (4.17 is not officially released yet)
clang 3.9 3.9.1-4ubuntu3~16.04.1 (tags/RELEASE_391/rc2), used to build both Mesa and UE4

Starting the editor, or launching a game directly, hangs in both GL3 and GL4 mode. You have to pass -NOSPLASH to avoid another, unrelated issue. My typical command line looks like this:

UnrealEngine> set ver 3.9 ; env LD_LIBRARY_PATH=$HOME/usr/lib:/usr/lib/llvm-$ver/lib LIBGL_DRIVERS_PATH=$HOME/usr/lib/dri:/usr/lib/i386-linux-gnu/dri LIBGL_DEBUG=verbose VK_ICD_FILENAMES=/opt/Valve/mesa/src/amd/vulkan/dev_icd.json ./Engine/Binaries/Linux/UE4Editor-Linux-Debug -opengl4 -NOSPLASH

For GL3, pass -opengl3 on the command line and set the env var MESA_GL_VERSION_OVERRIDE=3.2 (results are the same).

Attaching the backtrace and dmesg.
A driver built in the default/release configuration causes a system hang. A debug build hits an assert instead:

si_shader.c:7417: _Bool si_shader_select_ps_parts(struct si_screen *, LLVMTargetMachineRef, struct si_shader *, struct pipe_debug_callback *): Assertion `G_0286CC_LINEAR_CENTER_ENA(shader->config.spi_ps_input_addr)' failed.
hakzsam on IRC was able to reproduce the issue on his system; note that he uses a different LLVM version than I do. The problem does not reproduce with the nouveau driver, only with radeonsi.

Ran with MESA_GLSL=dump R600_DEBUG=ps,vs,tcs,tes,cs,gs to extract more information: https://gist.github.com/TTimo/4f08718e1c5d9de003d617e3f0daea2a

After some more debugging, to the best of my understanding the assertion fires against the config of 'shader 3' (https://gist.github.com/TTimo/4f08718e1c5d9de003d617e3f0daea2a#file-gistfile1-txt-L411).
Created attachment 133372 [details]
apitrace

Stepped through and commented out code until I could narrow down the last GL call that leads to the crash. Captured an apitrace up to the call that ultimately causes the problem (the call itself is not in the trace). https://gist.github.com/TTimo/05222bd524b534977c5e72bcb3df3dfc
Created attachment 133373 [details]
apitrace FBlackCubeArrayTexture::InitRHI

Another trace: same crash, but while initializing a different resource (FBlackCubeArrayTexture::InitRHI). The skipped call that triggers the crash:

TexSubImage3D #2:
  target  = 36873 (GL_TEXTURE_CUBE_MAP_ARRAY)
  level   = 0
  xoffset = 0
  yoffset = 0
  zoffset = 0
  width   = 1
  height  = 1
  depth   = 1
  format  = 32993 (GL_BGRA)
  type    = 33639 (GL_UNSIGNED_INT_8_8_8_8_REV)
  pixels  = 0
This can't be reproduced with LLVM 6.0svn and a Radeon Fury. Trying LLVM 3.9...
The issue can't be reproduced with the attached traces either. But if you build the UE4 editor from GitHub, the issue can be reproduced with LLVM 6.0svn.
Does the issue occur with either of these?

R600_DEBUG=mono
R600_DEBUG=nooptvariant
Yes for both.
(In reply to Samuel Pitoiset from comment #6)
> The issue can't be reproduced with the attached traces either. But if you
> build the UE4 editor from GitHub, the issue can be reproduced with
> LLVM 6.0svn.

Have you modified the traces to include the call that triggers the crash? The traces I uploaded only recorded the GL calls up to the last call before the crash is triggered, so unless you know how to manually add a call, they won't cause a crash.
A similar crash happens if UE4 is started in Vulkan mode: https://gist.github.com/TTimo/28fa434142fb59e66ae469ed7f7ef034

The SIGFPE happens because pipeline->shaders[4]->config.num_vgprs == 0, which is consistent with the empty config reads from the LLVM shader compilation.
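For context on why a zeroed config ends in SIGFPE rather than a plain crash: the drivers derive per-SIMD occupancy by dividing the physical register budget by the shader's register usage. A minimal sketch of that pattern, with illustrative names and constants (not the exact radv code):

#include <stdio.h>

#define NUM_PHYSICAL_VGPRS 256 /* per-SIMD register budget; illustrative */

/* Occupancy computation of the kind the drivers perform. */
static unsigned max_waves_per_simd(unsigned num_vgprs)
{
    /* When .AMDGPU.config comes back empty, num_vgprs is 0 and this
     * integer division raises SIGFPE instead of returning a count. */
    return NUM_PHYSICAL_VGPRS / num_vgprs;
}

int main(void)
{
    printf("%u\n", max_waves_per_simd(0)); /* faults with SIGFPE */
    return 0;
}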
Samuel, can you share here what you found out when you were looking into the issue? Thanks.
.AMDGPU.config coming from LLVM is empty with UE4Editor. This assertion fails:

diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
index 618b5cf..2fbb575 100644
--- a/src/amd/common/ac_binary.c
+++ b/src/amd/common/ac_binary.c
@@ -148,6 +148,7 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
 	} else if (!strcmp(name, ".AMDGPU.config")) {
 		section_data = elf_getdata(section, section_data);
 		binary->config_size = section_data->d_size;
+		assert(binary->config_size);
 		binary->config = MALLOC(binary->config_size * sizeof(unsigned char));
 		memcpy(binary->config, section_data->d_buf, binary->config_size);
 	} else if (!strcmp(name, ".AMDGPU.disasm")) {

Any ideas?
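A quick way to check what the driver actually sees is a standalone helper (hypothetical, but it mirrors the libelf calls ac_elf_read() makes on the in-memory shader binary; link with -lelf):

#include <gelf.h>
#include <libelf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debugging helper: walk the sections of an in-memory
 * ELF shader binary and print the size of .AMDGPU.config. */
static void dump_amdgpu_config_size(char *elf_data, size_t elf_size)
{
    elf_version(EV_CURRENT);
    Elf *elf = elf_memory(elf_data, elf_size);
    size_t shstrndx;
    elf_getshdrstrndx(elf, &shstrndx);

    for (Elf_Scn *scn = NULL; (scn = elf_nextscn(elf, scn)); ) {
        GElf_Shdr shdr;
        gelf_getshdr(scn, &shdr);
        const char *name = elf_strptr(elf, shstrndx, shdr.sh_name);
        if (name && !strcmp(name, ".AMDGPU.config")) {
            Elf_Data *data = elf_getdata(scn, NULL);
            printf(".AMDGPU.config size: %zu\n",
                   data ? data->d_size : (size_t)0);
        }
    }
    elf_end(elf);
}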
LLVM seems fine; it seems to be libelf that is broken. I've noticed UE4 exports its own libelf symbols, but I haven't yet confirmed that's the problem.
Yup, once I hid the libelf symbols, it all works.
Interesting. So it's a UE4 bug after all. If UE4 didn't export its own libelf functions, it would work.

When the driver is loaded, the dynamic linker loads libelf, but since UE4 exports the same function names as libelf does, the libelf functions are never bound at all and the UE4 functions are exposed to the driver instead. The driver, thinking it's calling libelf, is actually invoking the UE4 functions of the same name.

A temporary workaround is to load the system libelf first:

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libelf.so ./UE4Editor

This has the opposite effect: UE4 will use the system libelf instead of its own, because the symbols conflict and the system one is loaded first.

This bug should be fixed in UE4, though. I'm closing it, because there is nothing Mesa can do here.
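For anyone hitting the same symptom, here is a minimal sketch of the interposition mechanism (hypothetical file, not UE4 code): if the main executable exports a function with the same name as a libelf entry point, every DSO loaded afterwards, including the dlopen()ed radeonsi driver, resolves that name to the executable's copy, because the executable sits first in the global symbol lookup scope.

/* shadow_elf.c -- linked into the main executable, which is built
 * with -rdynamic (or an export-everything linker setup, as UE4
 * effectively had) so its symbols enter the global lookup scope. */
#include <stddef.h>
#include <stdio.h>

/* Same name as the real libelf function. Once exported by the
 * executable, the driver's elf_getdata() calls land here instead
 * of in the system libelf. */
void *elf_getdata(void *scn, void *data)
{
    (void)scn;
    (void)data;
    fprintf(stderr, "executable's elf_getdata() called, not libelf's\n");
    return NULL; /* the driver then reads nothing useful */
}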
Thanks for the help and thorough investigation! I will follow up with Epic from here to get this addressed on the UE4 side.
Enabling the UE4 linker script also fixes this (it got disabled for 4.17). It appears that it versions the symbols on the UE4 side.
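Symbol versioning would explain why the script helps: the system libelf (elfutils) exports versioned symbols, so Mesa's references carry that version, and the engine's copies, once tagged with a UE4-private version node or made local, no longer satisfy them. A minimal GNU ld version script sketch (hypothetical, not the actual UE4 script), passed via -Wl,--version-script=ue4.map:

/* ue4.map -- hypothetical version script, for illustration only */
UE4_SYMBOLS {
    local:
        elf_*;   /* keep the bundled libelf entry points private */
        gelf_*;  /* so they can't shadow the system libelf */
};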
https://github.com/EpicGames/UnrealEngine/pull/3901