Bug 101977

Summary: UE4 4.17 causes Assertion `G_0286CC_LINEAR_CENTER_ENA(shader->config.spi_ps_input_addr)' failed
Product: Mesa
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: RESOLVED NOTOURBUG
Severity: normal
Priority: medium
Reporter: Timothee Besset <ttimo>
Assignee: Default DRI bug account <dri-devel>
QA Contact: Default DRI bug account <dri-devel>
CC: linuxdonald
Whiteboard:
Attachments:
  backtrace and dmesg
  apitrace
  apitrace FBlackCubeArrayTexture::InitRHI

Description Timothee Besset 2017-07-31 00:55:38 UTC
Created attachment 133142 [details]
backtrace and dmesg

Ubuntu 16.04.2 LTS x86_64
kernel 4.12.4-041204-generic
Latest mesa git as of today 07/30
Latest UE4 4.17 out git branch as of today (4.17 not officially released yet)
clang 3.9 3.9.1-4ubuntu3~16.04.1 (tags/RELEASE_391/rc2) (used for mesa and UE4)

Starting the editor, or starting directly on a game, either in gl3 or gl4 mode causes a hang. You have to use -NOSPLASH to avoid another, unrelated issue. My typical command line looks like this:

UnrealEngine> set ver 3.9 ; env LD_LIBRARY_PATH=$HOME/usr/lib:/usr/lib/llvm-$ver/lib LIBGL_DRIVERS_PATH=$HOME/usr/lib/dri:/usr/lib/i386-linux-gnu/dri LIBGL_DEBUG=verbose VK_ICD_FILENAMES=/opt/Valve/mesa/src/amd/vulkan/dev_icd.json ./Engine/Binaries/Linux/UE4Editor-Linux-Debug -opengl4 -NOSPLASH

For GL3, pass -opengl3 on the command line and set env var MESA_GL_VERSION_OVERRIDE=3.2 (results are the same)

Attaching backtrace and dmesg
Comment 1 Timothee Besset 2017-08-02 22:08:58 UTC
Default/release config drivers will cause a system hang.
Debug drivers will cause an assert:

si_shader.c:7417: _Bool si_shader_select_ps_parts(struct si_screen *, LLVMTargetMachineRef, struct si_shader *, struct pipe_debug_callback *): Assertion `G_0286CC_LINEAR_CENTER_ENA(shader->config.spi_ps_input_addr)' failed.
Comment 2 Timothee Besset 2017-08-02 23:26:00 UTC
hakzsam on irc was able to reproduce the issue on his system.
also to note, hakzsam uses a different LLVM version than I do.

problem does not reproduce with nouveau driver, only radeonsi

ran with MESA_GLSL=dump R600_DEBUG=ps,vs,tcs,tes,cs,gs to extract more information:

https://gist.github.com/TTimo/4f08718e1c5d9de003d617e3f0daea2a

after debugging some more, to the best of my understanding, the assertion happens against the config of 'shader 3' (https://gist.github.com/TTimo/4f08718e1c5d9de003d617e3f0daea2a#file-gistfile1-txt-L411)
Comment 3 Timothee Besset 2017-08-07 22:57:25 UTC
Created attachment 133372 [details]
apitrace

Stepped through and commented out code until I could narrow down the last GL call that leads to a crash. Captured an apitrace up until the call that will ultimately cause the problem.

https://gist.github.com/TTimo/05222bd524b534977c5e72bcb3df3dfc
Comment 4 Timothee Besset 2017-08-07 23:49:49 UTC
Created attachment 133373 [details]
apitrace FBlackCubeArrayTexture::InitRHI

Another trace: same crash, but initializing a different resource (FBlackCubeArrayTexture::InitRHI).

The skipped call triggering the crash:

TexSubImage3D #2: target = 36873 level = 0 xoffset = 0 yoffset = 0 zoffset = 0 width = 1 height = 1 depth = 1 format = 32993 type = 33639 pixels = 0
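
The numeric enums in that trace line are easier to read once decoded. The sketch below maps the three values to their names using the constants from the standard OpenGL headers; the function names (`gl_enum_name`, `check_trace_enums`, `print_trace_call`) are made up for this illustration and are not part of apitrace or Mesa.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Enum values as defined in the Khronos OpenGL headers (glcorearb.h).
 * Only the three enums that appear in the trace line are listed. */
static const char *gl_enum_name(unsigned value)
{
    switch (value) {
    case 0x9009: return "GL_TEXTURE_CUBE_MAP_ARRAY";   /* 36873 */
    case 0x80E1: return "GL_BGRA";                     /* 32993 */
    case 0x8367: return "GL_UNSIGNED_INT_8_8_8_8_REV"; /* 33639 */
    default:     return "unknown";
    }
}

/* Self-check: the decimal values from the trace decode to the expected names. */
static void check_trace_enums(void)
{
    assert(strcmp(gl_enum_name(36873), "GL_TEXTURE_CUBE_MAP_ARRAY") == 0);
    assert(strcmp(gl_enum_name(32993), "GL_BGRA") == 0);
    assert(strcmp(gl_enum_name(33639), "GL_UNSIGNED_INT_8_8_8_8_REV") == 0);
}

/* Print the skipped call with symbolic names instead of raw numbers. */
static void print_trace_call(void)
{
    printf("TexSubImage3D(target=%s, level=0, offset=(0,0,0), "
           "size=(1,1,1), format=%s, type=%s, pixels=0)\n",
           gl_enum_name(36873), gl_enum_name(32993), gl_enum_name(33639));
}
```

So the call is a 1x1x1 upload into a cube map array texture in BGRA / UNSIGNED_INT_8_8_8_8_REV format, which matches the FBlackCubeArrayTexture resource being initialized.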
Comment 5 Marek Olšák 2017-08-08 11:07:31 UTC
This can't be reproduced with LLVM 6.0svn and Radeon Fury. Trying LLVM 3.9...
Comment 6 Samuel Pitoiset 2017-08-08 11:19:48 UTC
The issue can't be reproduced with the traces here as well. But if you build the UE4 editor from github, the issue can be reproduced with LLVM6.0svn.
Comment 7 Marek Olšák 2017-08-08 11:31:14 UTC
Does the issue occur with any of these:

R600_DEBUG=mono
R600_DEBUG=nooptvariant
Comment 8 Samuel Pitoiset 2017-08-08 11:44:47 UTC
Yes for both.
Comment 9 Timothee Besset 2017-08-08 12:09:45 UTC
(In reply to Samuel Pitoiset from comment #6)
> The issue can't be reproduced with the traces here as well. But if you build
> the UE4 editor from github, the issue can be reproduced with LLVM6.0svn.

Have you modified the traces to include the call that triggers the crash? The traces I uploaded only recorded the GL calls up to the last call before the crash is triggered, so unless you know how to manually add a call, they won't cause a crash.
Comment 10 Timothee Besset 2017-08-08 16:28:43 UTC
A similar crash happens if UE4 is started in Vulkan mode:

https://gist.github.com/TTimo/28fa434142fb59e66ae469ed7f7ef034

SIGFPE happens because pipeline->shaders[4]->config.num_vgprs == 0, which is consistent with the empty config read back from the LLVM shader compilation
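
For readers unfamiliar with why a zero register count ends in SIGFPE: on x86 Linux an integer division by zero traps and is delivered as that signal. The sketch below is not the driver's code. The struct, the function name, and the 256-register budget are assumptions chosen only to illustrate a division by `num_vgprs` and the guard that would avoid the trap.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's shader config; only the
 * field mentioned in the comment above is modeled here. */
struct shader_config_sketch {
    unsigned num_vgprs;
};

/* Assumed per-SIMD VGPR budget, for illustration only. */
#define VGPR_BUDGET_SKETCH 256u

/* Returns how many waves fit in the assumed budget, or 0 when the
 * config is empty (num_vgprs == 0). */
static unsigned max_waves_sketch(const struct shader_config_sketch *conf)
{
    if (conf->num_vgprs == 0)
        return 0; /* without this guard, the division below would trap */
    return VGPR_BUDGET_SKETCH / conf->num_vgprs;
}
```

A guard like this would only hide the symptom; the empty config itself is the real problem.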
Comment 11 Marek Olšák 2017-08-08 17:03:43 UTC
Samuel, can you share here what you found out when you were looking into the issue? Thanks.
Comment 12 Marek Olšák 2017-08-08 21:12:58 UTC
.AMDGPU.config coming from LLVM is empty with UE4Editor. This assertion fails:

diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
index 618b5cf..2fbb575 100644
--- a/src/amd/common/ac_binary.c
+++ b/src/amd/common/ac_binary.c
@@ -148,6 +148,7 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
                } else if (!strcmp(name, ".AMDGPU.config")) {
                        section_data = elf_getdata(section, section_data);
                        binary->config_size = section_data->d_size;
+                       assert(binary->config_size);
                        binary->config = MALLOC(binary->config_size * sizeof(unsigned char));
                        memcpy(binary->config, section_data->d_buf, binary->config_size);
                } else if (!strcmp(name, ".AMDGPU.disasm")) {

Any ideas?
Comment 13 Dave Airlie 2017-08-08 22:23:56 UTC
llvm seems fine, it seems to be libelf that is broken.

I've noticed UE4 exports its own libelf symbols, but I haven't determined whether that is the problem yet.
Comment 14 Dave Airlie 2017-08-08 22:28:44 UTC
Yup once I hid the libelf symbols, it all works.
Comment 15 Marek Olšák 2017-08-08 22:54:31 UTC
Interesting. So it's a UE4 bug after all. If UE4 didn't export its own libelf functions, it would work.

When the driver is loaded, the dynamic linker loads libelf, but since UE4 exports the same function names as libelf does, libelf functions are not loaded at all and the functions from UE4 are exposed to the driver instead. The driver, thinking it's calling libelf, is actually invoking the UE4 functions of the same name.

A temporary workaround is to load the system libelf first by doing:

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libelf.so ./UE4Editor

It will have the opposite effect. UE4 will use system libelf instead of its own, because the symbols conflict and the system one is loaded first.

This bug should be fixed in UE4 though.
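
As a rough illustration of what "not exporting" means on the application side, GCC and clang let a definition be marked with hidden visibility so that it stays out of the dynamic symbol table and cannot be picked up by a later-loaded driver. This is only a sketch: `APP_PRIVATE` and `bundled_section_size` are hypothetical names, the function body is a placeholder, and real libelf entry points have different names and signatures.

```c
#include <stddef.h>

/* Hypothetical macro: marks a definition as private to this binary.
 * visibility("hidden") is a GCC/clang attribute. */
#define APP_PRIVATE __attribute__((visibility("hidden")))

/* Hypothetical bundled helper standing in for an application's private
 * copy of a library function. Because it is hidden, code in other
 * shared objects cannot bind to it by name. */
APP_PRIVATE size_t bundled_section_size(const unsigned char *data, size_t len)
{
    /* Placeholder logic: report the length only for a non-NULL buffer. */
    return data ? len : 0;
}
```

A linker version script or building with -fvisibility=hidden are other ways to limit which symbols a binary exports.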

I'm closing this bug, because there is nothing Mesa can do here.
Comment 16 Timothee Besset 2017-08-08 23:06:54 UTC
Thanks for the help and thorough investigation! I will follow up from there with Epic to get this addressed UE4 side.
Comment 17 Dave Airlie 2017-08-08 23:25:27 UTC
Enabling the UE4 linker script fixes this also. (It got disabled for 4.17).

It appears that the linker script versions the symbols on the UE side.
Comment 18 Timothee Besset 2017-08-13 16:59:12 UTC
https://github.com/EpicGames/UnrealEngine/pull/3901
