Summary: | Older UnrealEngine v4 demos compute shaders compilation never finishes | ||
---|---|---|---|
Product: | Mesa | Reporter: | Eero Tamminen <eero.t.tamminen> |
Component: | Drivers/DRI/i965 | Assignee: | Ian Romanick <idr> |
Status: | CLOSED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | lemody |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
MESA_GLSL=dump INTEL_DEBUG=perf,cs,vs,fs,hs,ds output
Full Valgrind log |
Created attachment 127560 [details]
Full Valgrind log
Warnings about invalid reads have disappeared with the latest Mesa Git version; only uninitialized jump/move/use warnings remain (in _mesa_gl_vdebug(), libX11 and the demo itself). I.e. the earlier invalid read Valgrind warnings are unrelated to this issue.

The VehicleGame demo also uses CS, and it also freezes at startup, including with today's Mesa git version. A one minute period of perf recording shows the process's 100% CPU usage being split like this (i.e. it's compiling a compute shader forever, like with the others):

# Overhead Command Shared Object Symbol
# ........ ............... .......................
37.66% i965_dri.so ra_add_node_adjacency
18.93% i965_dri.so ra_allocate
8.14% i965_dri.so brw::fs_live_variables::compute_start_end
7.46% i965_dri.so fs_visitor::virtual_grf_interferes
6.75% i965_dri.so decrement_q.isra.2
5.40% i965_dri.so fs_visitor::assign_regs
4.77% i965_dri.so ra_add_node_interference
1.80% libc-2.23.so __memcpy_sse2
1.74% i965_dri.so brw::fs_live_variables::compute_live_variables
1.21% libc-2.23.so _int_malloc
0.69% libc-2.23.so realloc

Some of the other UnrealEngine v4 demos also have a few compute shaders. This issue happens only with the ones where Mesa outputs this warning:
Unsupported form of variable indexing in CS; falling back to very inefficient code generation
(The freeze doesn't happen in the CSDof test, which also outputs that warning a couple of times, though.)

After Jason's "nir, i965/fs: Lower indirect local variables to scratch" patchset:
https://patchwork.freedesktop.org/series/16385/
the frozen Mesa spends its time in "instruction_scheduler::add_dep". After maybe 10 minutes, Mesa outputs:
---------------------------------------
compute shader triggered register spilling. Try reducing the number of live scalar values to improve performance.
Mesa 13.1.0-devel implementation error: Failed to compile compute shader: CS compile failed: Register spilling not supported with m14 used
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Signal 11 caught.
EngineCrashHandler: Signal=11
---------------------------------------

The reason why the Mesa compiler finishes compilation with Martin's setup is that his versions of the demos have different shaders. Only my versions output this when Mesa compiles compute shaders:
Unsupported form of variable indexing in CS; falling back to very inefficient code generation
(My UE4 demos are from June 2014, Martin's from October 2014.)

While this may be a bug in the shaders, Mesa still shouldn't just freeze forever in compilation.
-> Leaving this still open, but it could be considered lower priority.

There is no longer a warning for:
Unsupported form of variable indexing in CS; falling back to very inefficient code generation
Only one about register allocation failure remains. The older versions of the Effects Cave and Subway Reflections demos work now.
-> Marking as fixed.

Note: compilation of the problematic CS shader takes 1/2 hour, and the demo performance is approximately 1/100th of the performance without compute shaders, or of the performance of the new versions of the demos.
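For comparing the perf profiles in this report, it can help to add up the overhead of related symbols. The following is only a minimal sketch: the helper names `parse_perf_line` and `overhead_for_prefixes` are made up for illustration, the sample rows are copied from the one-minute VehicleGame profile, and grouping by the `ra_` prefix is an assumption about which symbols belong to the register allocator.

```python
def parse_perf_line(line):
    """Return (overhead_percent, symbol) for one perf report row, or None."""
    fields = line.split()
    if len(fields) < 3 or not fields[0].endswith("%"):
        return None  # header, separator, or blank line
    # Some locales print "33,23%" instead of "33.23%".
    percent = float(fields[0].rstrip("%").replace(",", "."))
    # Assumes the symbol is the last column and contains no spaces.
    return percent, fields[-1]


def overhead_for_prefixes(lines, prefixes):
    """Sum the overhead of all symbols that start with one of `prefixes`."""
    total = 0.0
    for line in lines:
        parsed = parse_perf_line(line)
        if parsed and parsed[1].startswith(tuple(prefixes)):
            total += parsed[0]
    return total


# Rows copied from the profile quoted in this report.
REPORT = """\
# Overhead Command Shared Object Symbol
37.66% i965_dri.so ra_add_node_adjacency
18.93% i965_dri.so ra_allocate
8.14% i965_dri.so brw::fs_live_variables::compute_start_end
7.46% i965_dri.so fs_visitor::virtual_grf_interferes
6.75% i965_dri.so decrement_q.isra.2
5.40% i965_dri.so fs_visitor::assign_regs
4.77% i965_dri.so ra_add_node_interference
"""

ra_share = overhead_for_prefixes(REPORT.splitlines(), ["ra_"])
print(f"ra_* symbols: {ra_share:.2f}% of samples")
```

The same parser accepts the rows with an extra command column and comma decimals, because it only looks at the first and last fields of each row.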
Created attachment 127553 [details]
MESA_GLSL=dump INTEL_DEBUG=perf,cs,vs,fs,hs,ds output

Mesa 12.0 and the latest Mesa from git (aca491341b57fac05901943d693e264b589925f5) never finish compiling the compute shaders of the Subway Reflections and Effects Cave demos.

This is where it stops progressing:
-----------
$ INTEL_DEBUG=perf Engine/Binaries/Linux/Effects
...
SIMD16 shader failed to compile: FS compile failed: Failure to register allocate. Reduce number of live scalar values to avoid this.
Multi-LOD fast clear - giving up (512x256x7).
Multi-LOD fast clear - giving up (512x256x7).
VS compile took 0.430 ms and stalled the GPU
SIMD16 shader failed to compile: FS compile failed: Failure to register allocate. Reduce number of live scalar values to avoid this.
Unsupported form of variable indexing in VS; falling back to very inefficient code generation
Layered fast clear - giving up. (16x1616)
Layered fast clear - giving up. (16x1616)
Recompiling fragment shader for program 274
number of color buffers 2->1
Multi-LOD fast clear - giving up (128x128x7).
128x57624 miptree too large to blit, falling back to untiledRecompiling fragment shader for program 127
number of color buffers 0->5
Recompiling fragment shader for program 284
number of color buffers 1->0
Recompiling fragment shader for program 286
number of color buffers 1->0
SIMD16 shader failed to compile: FS compile failed: Failure to register allocate. Reduce number of live scalar values to avoid this.
Unsupported form of variable indexing in CS; falling back to very inefficient code generation
compute shader triggered register spilling. Try reducing the number of live scalar values to improve performance.
-----------

I waited until the next day; Mesa output nothing more and the demo did not start. According to perf, Mesa was still compiling shaders. At that point, the demo's memory usage had grown from 2GB to 10GB.

If I use MESA_GL_VERSION_OVERRIDE=4.2 to disable compute shader use, the demos start in <10 seconds and use <2GB memory.
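The workaround above can be scripted when the demo is started from a test harness. This is only a sketch under assumptions: the helper names `env_without_compute` and `launch` are hypothetical, and the binary path in the usage note is the one from the log above. The only Mesa-specific piece is the `MESA_GL_VERSION_OVERRIDE=4.2` setting already mentioned.

```python
import os
import subprocess


def env_without_compute(base_env=None):
    """Copy an environment and make Mesa advertise GL 4.2.

    Compute shaders are core from GL 4.3 onwards, so an application that
    checks the GL version should stop using them.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_GL_VERSION_OVERRIDE"] = "4.2"
    return env


def launch(binary, timeout_s=None):
    """Run the demo with the override; raises TimeoutExpired on a hang."""
    return subprocess.run([binary], env=env_without_compute(), timeout=timeout_s)
```

For example, `launch("Engine/Binaries/Linux/Effects", timeout_s=600)` would start the demo without compute shaders and give up after ten minutes instead of waiting forever.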
According to perf, after 20 hours of compiling, this is where Mesa spends its time (within one 5 minute period):
-----------------------------------------------------
33,23% RenderThread i965_dri.so ra_allocate
20,88% RenderThread i965_dri.so fs_visitor::virtual_grf_interferes
17,70% RenderThread i965_dri.so ra_add_node_adjacency
9,44% RenderThread i965_dri.so fs_visitor::assign_regs
5,51% RenderThread i965_dri.so brw::fs_live_variables::compute_start_end
2,94% RenderThread i965_dri.so decrement_q.isra.2
2,06% RenderThread i965_dri.so ra_add_node_interference
1,08% RenderThread i965_dri.so brw::fs_live_variables::compute_live_variables
-----------------------------------------------------

According to Martin, he cannot reproduce the issue on Arch Linux, so it's possible that it's some bug in the code that manifests only with a certain build / run-time environment.

However, it happens with Mesa compiled on an up-to-date Ubuntu 16.04, both with the default GCC 5.4 and with GCC 4.9. It also happens on another machine, when using the latest drm-intel kernel and X instead of the Ubuntu versions of them.

Attached is the 40MB shader code output from Mesa until that point. Just search for the last "falling back to very inefficient code generation" message to see what Mesa is trying to compile.
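Since the attached dump is about 40MB, the search for the last fallback message is easier to script than to do by hand. A minimal sketch follows; the function names are hypothetical, and the file name in the usage note is a placeholder for wherever the attachment was saved.

```python
MARKER = "falling back to very inefficient code generation"


def last_marker_line(lines, marker=MARKER):
    """Return the 1-based number of the last line containing `marker`, or None."""
    last = None
    for number, line in enumerate(lines, start=1):
        if marker in line:
            last = number
    return last


def last_marker_in_file(path, marker=MARKER):
    """Stream through a large dump without loading it into memory at once."""
    with open(path, errors="replace") as dump:
        return last_marker_line(dump, marker)
```

Calling `last_marker_in_file("mesa-dump.txt")` gives the line to jump to in an editor; the shader Mesa is stuck on is the one dumped around that line.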
Not sure whether it's related, but before the issue, there are several Valgrind warnings about shader compilation accessing already freed memory (in some earlier compiled shaders), like this:
--------------------------------
==16996== Thread 9 RenderThread 1:
==16996== Invalid read of size 1
==16996== at 0x4C30F62: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16996== by 0xEA94D16: ralloc_strdup (ralloc.c:350)
==16996== by 0xEAD1104: nir_shader_clone (nir_clone.c:714)
==16996== by 0xEBC0D00: brw_compile_fs (brw_fs.cpp:6422)
==16996== by 0xEB641FD: brw_codegen_wm_prog (brw_wm.c:147)
==16996== by 0xEB64D34: brw_upload_wm_prog (brw_wm.c:587)
==16996== by 0xEB5CC02: brw_upload_programs (brw_state_upload.c:736)
==16996== by 0xEB5CC02: brw_upload_pipeline_state (brw_state_upload.c:835)
==16996== by 0xEB5CC02: brw_upload_render_state (brw_state_upload.c:896)
==16996== by 0xEB459B9: brw_try_draw_prims (brw_draw.c:584)
==16996== by 0xEB459B9: brw_draw_prims (brw_draw.c:675)
==16996== by 0xE9730AF: vbo_validated_drawrangeelements (vbo_exec_array.c:813)
==16996== by 0xE97314A: vbo_exec_DrawElementsInstanced (vbo_exec_array.c:989)
==16996== at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16996== by 0xEA9490C: unsafe_free (ralloc.c:251)
==16996== by 0xEA9490C: unsafe_free (ralloc.c:251)
==16996== by 0xEB642B9: brw_codegen_wm_prog (brw_wm.c:189)
==16996== by 0xEB64E66: brw_fs_precompile (brw_wm.c:647)
==16996== by 0xEB4CE36: brw_shader_precompile (brw_link.cpp:53)
==16996== by 0xEB4CE36: brw_link_shader (brw_link.cpp:289)
==16996== by 0xE9F95A1: _mesa_glsl_link_shader (ir_to_mesa.cpp:3067)
==16996== Block was alloc'd at
==16996== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16996== by 0xEA94A41: rzalloc_size (ralloc.c:125)
==16996== by 0xEA94D24: ralloc_size (ralloc.c:119)
==16996== by 0xEA94D24: ralloc_array_size (ralloc.c:190)
==16996== by 0xEA94D24: ralloc_strdup (ralloc.c:351)
==16996== by 0xEAD1104: nir_shader_clone (nir_clone.c:714)
==16996== by 0xEBC0D00: brw_compile_fs (brw_fs.cpp:6422)
==16996== by 0xEB641FD: brw_codegen_wm_prog (brw_wm.c:147)
==16996== by 0xEB64E66: brw_fs_precompile (brw_wm.c:647)
==16996== by 0xEB4CE36: brw_shader_precompile (brw_link.cpp:53)
==16996== by 0xEB4CE36: brw_link_shader (brw_link.cpp:289)
==16996== by 0xE9F95A1: _mesa_glsl_link_shader (ir_to_mesa.cpp:3067)
--------------------------------