Bug 98455 - Older UnrealEngine v4 demos compute shaders compilation never finishes
Status: CLOSED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-26 16:49 UTC by Eero Tamminen
Modified: 2017-01-23 14:49 UTC

See Also:
i915 platform:
i915 features:


Attachments
MESA_GLSL=dump INTEL_DEBUG=perf,cs,vs,fs,hs,ds output (4.30 MB, application/gzip)
2016-10-26 16:49 UTC, Eero Tamminen
Full Valgrind log (91.13 KB, text/plain)
2016-10-27 09:40 UTC, Eero Tamminen

Description Eero Tamminen 2016-10-26 16:49:50 UTC
Created attachment 127553
MESA_GLSL=dump INTEL_DEBUG=perf,cs,vs,fs,hs,ds output

Mesa 12.0 and the latest Mesa from git (aca491341b57fac05901943d693e264b589925f5) never finish compiling the compute shaders of the Subway Reflections and Effects Cave demos.

This is where it stops progressing:
-----------
$ INTEL_DEBUG=perf Engine/Binaries/Linux/Effects
...
SIMD16 shader failed to compile: FS compile failed: Failure to register allocate.  Reduce number of live scalar values to avoid this.
Multi-LOD fast clear - giving up (512x256x7).
Multi-LOD fast clear - giving up (512x256x7).
VS compile took 0.430 ms and stalled the GPU
SIMD16 shader failed to compile: FS compile failed: Failure to register allocate.  Reduce number of live scalar values to avoid this.
Unsupported form of variable indexing in VS; falling back to very inefficient code generation
Layered fast clear - giving up. (16x1616)
Layered fast clear - giving up. (16x1616)
Recompiling fragment shader for program 274
  number of color buffers 2->1
Multi-LOD fast clear - giving up (128x128x7).
128x57624 miptree too large to blit, falling back to untiledRecompiling fragment shader for program 127
  number of color buffers 0->5
Recompiling fragment shader for program 284
  number of color buffers 1->0
Recompiling fragment shader for program 286
  number of color buffers 1->0
SIMD16 shader failed to compile: FS compile failed: Failure to register allocate.  Reduce number of live scalar values to avoid this.
Unsupported form of variable indexing in CS; falling back to very inefficient code generation
compute shader triggered register spilling.  Try reducing the number of live scalar values to improve performance.
-----------

I waited until the next day, but Mesa output nothing more and the demo did not start.  According to perf, Mesa was still compiling shaders.  By that point, the demo's memory usage had grown from 2GB to 10GB.

If I use MESA_GL_VERSION_OVERRIDE=4.2 to disable compute shader use, demos start in <10 seconds and use <2GB memory.
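
The workaround above can be scripted. This is only a minimal sketch: the helper names are made up for illustration, and the binary path is whatever your demo install uses (the log above shows Engine/Binaries/Linux/Effects).

```python
import os
import subprocess


def env_without_compute(base_env=None):
    """Return a copy of the environment with the reported GL version capped at 4.2."""
    env = dict(os.environ if base_env is None else base_env)
    # Compute shaders are core in OpenGL 4.3, so a demo that sees 4.2 skips them.
    env["MESA_GL_VERSION_OVERRIDE"] = "4.2"
    return env


def launch_demo(binary):
    """Start the demo (hypothetical helper) with compute shaders disabled."""
    return subprocess.run([binary], env=env_without_compute(), check=False)
```

Calling launch_demo("Engine/Binaries/Linux/Effects") from the demo directory should then behave like the manual override described above.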


According to perf, after 20 hours of compiling, this is where Mesa spends its time (within one 5 minute period):
-----------------------------------------------------
33,23%  RenderThread  i965_dri.so  ra_allocate
20,88%  RenderThread  i965_dri.so  fs_visitor::virtual_grf_interferes
17,70%  RenderThread  i965_dri.so  ra_add_node_adjacency
 9,44%  RenderThread  i965_dri.so  fs_visitor::assign_regs
 5,51%  RenderThread  i965_dri.so  brw::fs_live_variables::compute_start_end
 2,94%  RenderThread  i965_dri.so  decrement_q.isra.2
 2,06%  RenderThread  i965_dri.so  ra_add_node_interference
 1,08%  RenderThread  i965_dri.so  brw::fs_live_variables::compute_live_variables
-----------------------------------------------------
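
To see how much of the sampled time falls on the ra_* functions, the table above can be summed with a few lines of Python. This is a rough sketch: the parsing assumes the exact column layout pasted above (comma as decimal separator, symbol in the last column), and grouping by the "ra_" prefix is just a convenient approximation, not an exact map of Mesa's source layout.

```python
PROFILE = """\
33,23%  RenderThread  i965_dri.so  ra_allocate
20,88%  RenderThread  i965_dri.so  fs_visitor::virtual_grf_interferes
17,70%  RenderThread  i965_dri.so  ra_add_node_adjacency
 9,44%  RenderThread  i965_dri.so  fs_visitor::assign_regs
 5,51%  RenderThread  i965_dri.so  brw::fs_live_variables::compute_start_end
 2,94%  RenderThread  i965_dri.so  decrement_q.isra.2
 2,06%  RenderThread  i965_dri.so  ra_add_node_interference
 1,08%  RenderThread  i965_dri.so  brw::fs_live_variables::compute_live_variables
"""


def parse_profile(text):
    """Return (symbol, percent) pairs from perf report lines like the ones above."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith("%"):
            continue  # skip rulers, headers and blank lines
        percent = float(fields[0].rstrip("%").replace(",", "."))
        rows.append((fields[-1], percent))
    return rows


def share(rows, predicate=lambda symbol: True):
    """Sum the percentages of all symbols accepted by predicate."""
    return sum(percent for symbol, percent in rows if predicate(symbol))


rows = parse_profile(PROFILE)
total = share(rows)
ra_prefixed = share(rows, lambda symbol: symbol.startswith("ra_"))
print(f"listed symbols: {total:.2f}%  ra_* symbols: {ra_prefixed:.2f}%")
```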

According to Martin, he cannot reproduce the issue on Arch Linux, so it is possible that this is a bug that manifests only in certain build or run-time environments.

However, it happens with Mesa compiled on an up-to-date Ubuntu 16.04, both with the default GCC 5.4 and with GCC 4.9.  It also happens on another machine, when using the latest drm-intel kernel and X instead of the Ubuntu versions of them.

Attached is the 40MB shader code output from Mesa until that point.  Just search for the last "falling back to very inefficient code generation" message to see what Mesa is trying to compile.
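
For a dump of this size, locating the last warning can be automated. The following is a small sketch; the function names are made up for illustration, and it assumes the dump is a text file, optionally gzip-compressed like the attachment.

```python
import gzip

MARKER = "falling back to very inefficient code generation"


def last_marker_line(lines, marker=MARKER):
    """Return the 1-based number of the last line containing marker, or None."""
    last = None
    for number, line in enumerate(lines, start=1):
        if marker in line:
            last = number
    return last


def last_marker_in_file(path, marker=MARKER):
    """Scan a plain or gzip-compressed dump without loading it all into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return last_marker_line(handle, marker)
```

The shader that Mesa is stuck on should be the one dumped right around the returned line number.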


Not sure whether it's related, but before the issue appears, Valgrind reports several warnings about shader compilation accessing already freed memory (in some earlier compiled shaders), like this:
--------------------------------
==16996== Thread 9 RenderThread 1:
==16996== Invalid read of size 1
==16996==    at 0x4C30F62: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16996==    by 0xEA94D16: ralloc_strdup (ralloc.c:350)
==16996==    by 0xEAD1104: nir_shader_clone (nir_clone.c:714)
==16996==    by 0xEBC0D00: brw_compile_fs (brw_fs.cpp:6422)
==16996==    by 0xEB641FD: brw_codegen_wm_prog (brw_wm.c:147)
==16996==    by 0xEB64D34: brw_upload_wm_prog (brw_wm.c:587)
==16996==    by 0xEB5CC02: brw_upload_programs (brw_state_upload.c:736)
==16996==    by 0xEB5CC02: brw_upload_pipeline_state (brw_state_upload.c:835)
==16996==    by 0xEB5CC02: brw_upload_render_state (brw_state_upload.c:896)
==16996==    by 0xEB459B9: brw_try_draw_prims (brw_draw.c:584)
==16996==    by 0xEB459B9: brw_draw_prims (brw_draw.c:675)
==16996==    by 0xE9730AF: vbo_validated_drawrangeelements (vbo_exec_array.c:813)
==16996==    by 0xE97314A: vbo_exec_DrawElementsInstanced (vbo_exec_array.c:989)
==16996==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16996==    by 0xEA9490C: unsafe_free (ralloc.c:251)
==16996==    by 0xEA9490C: unsafe_free (ralloc.c:251)
==16996==    by 0xEB642B9: brw_codegen_wm_prog (brw_wm.c:189)
==16996==    by 0xEB64E66: brw_fs_precompile (brw_wm.c:647)
==16996==    by 0xEB4CE36: brw_shader_precompile (brw_link.cpp:53)
==16996==    by 0xEB4CE36: brw_link_shader (brw_link.cpp:289)
==16996==    by 0xE9F95A1: _mesa_glsl_link_shader (ir_to_mesa.cpp:3067)
==16996==  Block was alloc'd at
==16996==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16996==    by 0xEA94A41: rzalloc_size (ralloc.c:125)
==16996==    by 0xEA94D24: ralloc_size (ralloc.c:119)
==16996==    by 0xEA94D24: ralloc_array_size (ralloc.c:190)
==16996==    by 0xEA94D24: ralloc_strdup (ralloc.c:351)
==16996==    by 0xEAD1104: nir_shader_clone (nir_clone.c:714)
==16996==    by 0xEBC0D00: brw_compile_fs (brw_fs.cpp:6422)
==16996==    by 0xEB641FD: brw_codegen_wm_prog (brw_wm.c:147)
==16996==    by 0xEB64E66: brw_fs_precompile (brw_wm.c:647)
==16996==    by 0xEB4CE36: brw_shader_precompile (brw_link.cpp:53)
==16996==    by 0xEB4CE36: brw_link_shader (brw_link.cpp:289)
==16996==    by 0xE9F95A1: _mesa_glsl_link_shader (ir_to_mesa.cpp:3067)
--------------------------------
Comment 1 Eero Tamminen 2016-10-27 09:40:40 UTC
Created attachment 127560
Full Valgrind log
Comment 2 Eero Tamminen 2016-11-07 14:33:08 UTC
The invalid read warnings have disappeared with the latest Mesa git version; only uninitialized jump/move/use warnings remain (in _mesa_gl_vdebug(), libX11 and the demo itself).  I.e. the earlier Valgrind invalid read warnings are unrelated to this issue.
Comment 3 Eero Tamminen 2016-11-21 15:47:44 UTC
The VehicleGame demo also uses CS and also freezes at startup, including with today's Mesa git version.

A one-minute perf recording shows the process's 100% CPU usage being split like this (i.e. it is compiling a compute shader forever, as with the others):

# Overhead  Command          Shared Object    Symbol
# ........  ...............  .......................
37.66%  i965_dri.so   ra_add_node_adjacency
18.93%  i965_dri.so   ra_allocate
 8.14%  i965_dri.so   brw::fs_live_variables::compute_start_end
 7.46%  i965_dri.so   fs_visitor::virtual_grf_interferes
 6.75%  i965_dri.so   decrement_q.isra.2
 5.40%  i965_dri.so   fs_visitor::assign_regs
 4.77%  i965_dri.so   ra_add_node_interference
 1.80%  libc-2.23.so  __memcpy_sse2
 1.74%  i965_dri.so   brw::fs_live_variables::compute_live_variables
 1.21%  libc-2.23.so  _int_malloc
 0.69%  libc-2.23.so  realloc

Some of the other UnrealEngine v4 demos also have a few compute shaders, but this issue happens only with the ones where Mesa outputs this warning:
Unsupported form of variable indexing in CS; falling back to very inefficient code generation

(The freeze doesn't happen in the CSDof test, though, which also outputs that warning a couple of times.)
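
As an illustration of why functions like virtual_grf_interferes and ra_add_node_adjacency can dominate such a profile, here is a toy model of building an interference graph by checking every pair of live ranges. It is a sketch under my own assumptions and not Mesa's actual code: the names and data layout are invented, and the only point is that the number of pair checks grows quadratically with the number of virtual registers, which the "very inefficient code generation" fallback presumably inflates.

```python
from itertools import combinations


def interferes(a, b):
    """Two live ranges (start, end) interfere when they overlap."""
    return not (a[1] <= b[0] or b[1] <= a[0])


def build_interference(ranges):
    """Return (adjacency sets, number of pair checks) for the given live ranges."""
    adjacency = {i: set() for i in range(len(ranges))}
    checks = 0
    for i, j in combinations(range(len(ranges)), 2):
        checks += 1
        if interferes(ranges[i], ranges[j]):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency, checks


def pair_checks(n):
    """Number of pair checks for n virtual registers: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

With 1,000 virtual registers this does 499,500 checks, and with 100,000 it does about 5 billion, before any spilling makes the allocator start over.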
Comment 4 Eero Tamminen 2016-12-07 15:56:40 UTC
After Jason's "nir, i965/fs: Lower indirect local variables to scratch" patchset:
  https://patchwork.freedesktop.org/series/16385/

With the patchset applied, Mesa spends its time in "instruction_scheduler::add_dep" while frozen.

After maybe 10 minutes, Mesa outputs:
---------------------------------------
compute shader triggered register spilling.  Try reducing the number of live scalar values to improve performance.
Mesa 13.1.0-devel implementation error: Failed to compile compute shader: CS compile failed: Register spilling not supported with m14 used


Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Signal 11 caught.
EngineCrashHandler: Signal=11
---------------------------------------
Comment 5 Eero Tamminen 2016-12-20 14:47:21 UTC
The reason the Mesa compiler finishes compilation with Martin's setup is that his versions of the demos have different shaders.

Only my versions output this when Mesa compiles compute shaders:
Unsupported form of variable indexing in CS; falling back to very inefficient code generation

(My UE4 demos are from June 2014, Martin's from October 2014.)

While this may be a bug in the shaders, Mesa still shouldn't freeze forever during compilation. -> Leaving this open, but it could be considered lower priority.
Comment 6 Eero Tamminen 2017-01-23 14:48:58 UTC
The following warning is no longer output:
Unsupported form of variable indexing in CS; falling back to very inefficient code generation

Only the warning about register allocation failure remains.  The older versions of the Effects Cave and Subway Reflections demos work now.

-> Marking as fixed.

Note: compilation of the problematic CS shader takes half an hour, and the demo performance is approximately 1/100th of the performance without compute shaders, or of the performance of the new versions of the demos.

