78468 – Compiling of shader gets stuck in infinite loop

Summary: Compiling of shader gets stuck in infinite loop

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	git
Hardware:	Other Linux (All)

Importance:	medium normal
Assignee:	Kenneth Graunke
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-05-09 04:16 UTC by bugReporter92
Modified:	2014-09-15 07:18 UTC
CC List:	4 users

See Also:
i915 platform:
i915 features:

Attachments
dof.frag Accessed August 3rd, 2014 (3.60 KB, text/plain) 2014-08-03 17:15 UTC, bugReporter92
dof.frag dumped August 3rd, 2014 (4.05 KB, text/plain) 2014-08-03 17:34 UTC, bugReporter92
dof.frag dumped glsl ir (22.03 KB, text/plain) 2014-08-03 17:41 UTC, bugReporter92

Description bugReporter92 2014-05-09 04:16:11 UTC

While compiling a shader from the application supertuxkart, the compiler gets stuck in an infinite loop. This is with the current git checkout of supertuxkart, which can be found at https://github.com/supertuxkart/stk-code.git
The shader in question is data/shaders/dof.frag

No matter how wrong this shader is, I figure the expected outcome is never for the compiler to just hang there.

What follows is a partial stack trace from supertuxkart when compiling that shader. If the full stack trace would be useful, then let me know and I can get it.

#0  ir_expression::accept (this=0x1fb3200, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#1  0x00007fffef29bd1b in ir_swizzle::accept (this=0x20ebf70, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#2  0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fb6250)
    at ../../src/glsl/opt_constant_folding.cpp:87
#3  0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fb6220) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#4  0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fb6220) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#5  0x00007fffef29b8d0 in ir_expression::accept (this=0x1fb6220, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#6  0x00007fffef29bd1b in ir_swizzle::accept (this=0x20df810, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#7  0x00007fffef29b881 in ir_expression::accept (this=0x1fb9240, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:142
#8  0x00007fffef29bd1b in ir_swizzle::accept (this=0x20dfdc0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#9  0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fbc290)
    at ../../src/glsl/opt_constant_folding.cpp:87
#10 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fbc260) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#11 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fbc260) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#12 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fbc260, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#13 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e0300, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#14 0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fbf2b0)
    at ../../src/glsl/opt_constant_folding.cpp:87
#15 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fbf280) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#16 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fbf280) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#17 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fbf280, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#18 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e08b0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#19 0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fc22d0)
    at ../../src/glsl/opt_constant_folding.cpp:87
#20 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fc22a0) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#21 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fc22a0) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#22 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fc22a0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#23 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e0e60, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#24 0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fc52f0)
#25 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fc52c0) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#26 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fc52c0) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#27 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fc52c0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#28 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e1410, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#29 0x00007fffef29b881 in ir_expression::accept (this=0x1fc82e0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:142
#30 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e1950, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#31 0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fcb330)
    at ../../src/glsl/opt_constant_folding.cpp:87
#32 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fcb300) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#33 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fcb300) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#34 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fcb300, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#35 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e1f00, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#36 0x00007fffef29b881 in ir_expression::accept (this=0x1fce320, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:142
#37 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e24b0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#38 0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fd1370)
    at ../../src/glsl/opt_constant_folding.cpp:87
#39 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fd1340) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#40 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fd1340) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#41 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fd1340, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#42 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e2a60, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#43 0x00007fffef2c8d59 in (anonymous namespace)::ir_constant_folding_visitor::handle_rvalue (this=0x7fffffffcd70, rvalue=0x1fd4390)
    at ../../src/glsl/opt_constant_folding.cpp:87
#44 0x00007fffef29e6e1 in ir_rvalue_base_visitor::rvalue_visit (this=0x7fffffffcd70, ir=0x1fd4360) at ../../src/glsl/ir_rvalue_visitor.cpp:43
#45 0x00007fffef29eb89 in ir_rvalue_visitor::visit_leave (this=0x7fffffffcd70, ir=0x1fd4360) at ../../src/glsl/ir_rvalue_visitor.cpp:156
#46 0x00007fffef29b8d0 in ir_expression::accept (this=0x1fd4360, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:156
#47 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e3010, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#48 0x00007fffef29b881 in ir_expression::accept (this=0x1fd7380, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:142
#49 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e35c0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
#50 0x00007fffef29b881 in ir_expression::accept (this=0x1fda3a0, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:142
#51 0x00007fffef29bd1b in ir_swizzle::accept (this=0x20e3b70, v=0x7fffffffcd70) at ../../src/glsl/ir_hv_accept.cpp:243
...
There's more to the backtrace, up to 127 levels, but I wasn't sure how much more would be useful.

OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ivybridge Mobile 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.3.0-devel (git-23e9f06)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 10.3.0-devel (git-23e9f06)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)

Debian Jessie using GNOME3 and a fresh git build of mesa.

Comment 1 Christoph Haag 2014-05-14 12:45:50 UTC

Yes, I have also found that out, but only said something in the IRC channel so far:

http://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2014-05-07

It's also a regression that is not yet in any stable release of mesa. It was introduced with this commit:

http://cgit.freedesktop.org/mesa/mesa/commit/?id=e9822f77a9cc024f528d30382fd5ad21c73a173b

Comment 2 Eric Anholt 2014-05-14 20:49:23 UTC

I haven't pulled down all of stk's data yet, but I ran into an at-least-nearly-infinite loop of a similar flavor with a follow-on patch.  The one I'm looking at appears to be related to dead code elimination problems, so I'm working on that at the moment.

Comment 3 bugReporter92 2014-08-03 17:08:33 UTC

Now that Debian updated, I can confirm this happening on 10.2.4, as well as on the master branch.
I bisected back to e9822f77a9cc024f528d30382fd5ad21c73a173b. It's a little too complex for me to decipher anything from the commit message or code change. Hopefully this helps.
The issue is still happening on the latest code from supertuxkart (https://github.com/supertuxkart/stk-code)

Comment 4 bugReporter92 2014-08-03 17:15:23 UTC

Created attachment 103933
dof.frag Accessed August 3rd, 2014

I've attached the shader. The program (supertuxkart) doesn't make it through the compilation of this file. All I do to test this is build and run supertuxkart. I checked before that the backtrace was getting stuck in the Intel glsl compiler.

Comment 5 bugReporter92 2014-08-03 17:34:38 UTC

Created attachment 103934
dof.frag dumped August 3rd, 2014

After taking a look, I've noticed that the attached shader is put through a pre-processing step before compiling. So I set MESA_GLSL=dump and this is the actual shader that is getting compiled. I think this is more useful. I'll attach the glsl_ir dumped afterwards in a moment.

Comment 6 bugReporter92 2014-08-03 17:41:33 UTC

Created attachment 103935
dof.frag dumped glsl ir

Attaching dof_dumped_ir.txt.

As you can see with the '^C' at the end of this file, this was the last output before I had to kill the process because it hangs.

Comment 7 bugReporter92 2014-08-03 19:46:40 UTC

Looking closer at the backtrace, I think it might actually be in the linking phase. The top of the backtrace looks like this:

#75 0x00007fffee7ab79b in ir_assignment::accept(ir_hierarchical_visitor*) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#76 0x00007fffee7ab216 in visit_list_elements(ir_hierarchical_visitor*, exec_list*, bool) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#77 0x00007fffee7ab377 in ir_function_signature::accept(ir_hierarchical_visitor*) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#78 0x00007fffee7ab216 in visit_list_elements(ir_hierarchical_visitor*, exec_list*, bool) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#79 0x00007fffee7ab3ee in ir_function::accept(ir_hierarchical_visitor*) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#80 0x00007fffee7ab216 in visit_list_elements(ir_hierarchical_visitor*, exec_list*, bool) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#81 0x00007fffee7cdaf1 in do_constant_folding(exec_list*) () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#82 0x00007fffee79ece3 in do_common_optimization(exec_list*, bool, bool, gl_shader_compiler_options const*, bool) ()
   from /media/matto/Programming/src/mesa/lib/i965_dri.so
#83 0x00007fffee876b86 in brw_link_shader () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#84 0x00007fffee733356 in _mesa_glsl_link_shader () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#85 0x00007fffee63fdea in link_program () from /media/matto/Programming/src/mesa/lib/i965_dri.so
#86 0x000000000084899e in LoadProgram<int, char const*, int, char const*> ()
    at /media/matto/Programming/src/supertuxkart/stk-code/src/graphics/glwrap.hpp:164
#87 0x0000000000833d65 in FullScreenShader::DepthOfFieldShader::init ()
    at /media/matto/Programming/src/supertuxkart/stk-code/src/graphics/shaders.cpp:1709
#88 0x0000000000844fde in Shaders::loadShaders (this=0x132f6a0) at /media/matto/Programming/src/supertuxkart/stk-code/src/graphics/shaders.cpp:378
#89 0x00000000008616eb in IrrDriver::initDevice (this=0xff5be0) at /media/matto/Programming/src/supertuxkart/stk-code/src/graphics/irr_driver.cpp:483
#90 0x0000000000657f19 in initRest () at /media/matto/Programming/src/supertuxkart/stk-code/src/main.cpp:1047
#91 0x00000000005a1dc7 in main (argc=<optimized out>, argv=<optimized out>) at /media/matto/Programming/src/supertuxkart/stk-code/src/main.cpp:1186

Comment 8 Ian Romanick 2014-08-04 18:13:11 UTC

It sounds like optimization is getting stuck.  This could happen if two optimization passes in the loop "fight."  Perhaps we should put a hard limit on the number of times the loop can execute?
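
A minimal sketch of the hard limit suggested above, assuming a generic fixed-point loop over passes that each report whether they made progress. The names run_passes, LoopResult and max_rounds are hypothetical and are not Mesa's API; this only illustrates the idea of capping the number of rounds.

```cpp
#include <functional>
#include <vector>

// Hypothetical sketch, not Mesa code: run every pass repeatedly until a
// full round makes no progress, but give up after max_rounds rounds.
struct LoopResult {
    unsigned rounds;   // number of rounds actually executed
    bool converged;    // true if some round made no progress
};

LoopResult run_passes(const std::vector<std::function<bool()>> &passes,
                      unsigned max_rounds)
{
    for (unsigned round = 1; round <= max_rounds; ++round) {
        bool progress = false;
        for (const auto &pass : passes)
            progress = pass() || progress;   // always run every pass
        if (!progress)
            return {round, true};            // fixed point reached
    }
    return {max_rounds, false};              // limit hit, passes may be fighting
}
```

A cap like this only bounds the outer loop. It would not help if a single pass never returns, which is the situation described in comment 9.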

Comment 9 bugReporter92 2014-08-10 00:10:20 UTC

I'm looking at the code for this. Just to confirm, in order to add that limit, you would change the code at brw_link_shader in /drivers/dri/i965/brw_shader.cpp at around line 204?
I'm asking because I don't think the problem is two optimization passes fighting. I'm seeing in the debugger the process gets stuck inside "do_constant_folding" and never returns from that function. To me that sounds like a single optimization pass, which is below the optimization loop.

Comment 10 Matt Turner 2014-08-15 23:44:49 UTC

Want to take a look, Eric?

Comment 11 Iago Toral 2014-09-09 14:10:50 UTC

The bad commit mentioned above seems to avoid making temporary variables for rvalues in certain scenarios where, in theory, we do not need them. After that commit, do_assignment() in ast_to_hir.cpp takes a needs_rvalue parameter to decide if we need to compute the rvalue or not. In that same commit, this parameter is set to false in a number of situations.

I did a quick test switching this parameter to true in all the cases and that seems to fix the problem for me, so I guess there is one case where we are wrongly passing false and for some reason that produces this recurrent behavior, I suppose because it really needs to compute an rvalue that never gets computed. I'll see if I can find more.

Comment 12 Iago Toral 2014-09-09 14:43:16 UTC

This fixes the problem for me:

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 897505c..18ae9c4 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1815,31 +1815,31 @@ ast_expression::do_hir(exec_list *instructions,
 
 ir_rvalue *
 ast_expression_statement::hir(exec_list *instructions,
                               struct _mesa_glsl_parse_state *state)
 {
    /* It is possible to have expression statements that don't have an
     * expression.  This is the solitary semicolon:
     *
     * for (i = 0; i < 5; i++)
     *     ;
     *
     * In this case the expression will be NULL.  Test for NULL and don't do
     * anything in that case.
     */
    if (expression != NULL)
-      expression->hir_no_rvalue(instructions, state);
+      expression->hir(instructions, state);
 
    /* Statements do not have r-values.
     */
    return NULL;
 }

The bad commit assumes that we never need to compute rvalues for expression statements, but it seems that we do... Maybe there is a problem somewhere else and this should not be happening?

Comment 13 Iago Toral 2014-09-10 10:05:00 UTC

FWIW, I can reproduce the problem with a simple fragment shader like this one:

#version 140

uniform sampler2D tex;
out vec4 FragColor;

void main()
{
   vec4 col = texture(tex, vec2(0, 0));  // 1
   for (int i=0; i<30; i++)              // 2
      col += vec4(0.1, 0.1, 0.1, 0.1);   // 3
   col = vec4(col.rgb/ 2.0, col.a);      // 4
   FragColor = col;                      // 5
}

Some things worth noting:

- The hang goes away if I only replace (1) by:
     vec4 col = vec4(0, 0, 0, 0);
  or even something like this
     vec4 col = vec4(texture(tex, vec2(0, 0)).r, 0.0, 0.0, 0.0);

- The hang goes away if I only replace (4) by:
   col = vec4(col.rgb, col.a);

- The hang goes away if we reduce the number of iterations in the loop below a certain threshold. Actually, for some thresholds it seems to hang, but it will eventually finish after a few seconds. It just seems that the time required to complete the constant_folding optimization pass grows exponentially with the number of iterations in the cases where the test seems to hang.

Comment 14 Iago Toral 2014-09-10 10:10:41 UTC

(In reply to comment #13)
> Some things worth noting:
> 
> - The hang goes away if I only replace (1) by:
>      vec4 col = vec4(0, 0, 0, 0);
>   or even something like this
>      vec4 col = vec4(texture(tex, vec2(0, 0)).r, 0.0, 0.0, 0.0);
> 
> - The hang goes away if I only replace (4) by:
>    col = vec4(col.rgb, col.a);
   or even with this:
     col /=  2.0;

Comment 15 Iago Toral 2014-09-11 07:17:43 UTC

That small fragment shader example I mention above does not get fixed by altering the needs_rvalue stuff, so it seems that the problem is more generic or could have multiple causes leading to it.

In the case of that small fragment shader, I traced the problem down to the fact that during the optimization passes we generate large assignment instructions like these:

(assign  (x) (var_ref flattening_tmp_y@116)  (expression float * (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (swiz x (expression float + (var_ref col_y) (constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.100000)) ) )(constant float (0.500000)) ) ) 

These are the result of trying to pack the multiple additions we have in the loop into one single instruction. What is weird about this is that it only happens when we have a certain number of iterations in the loop. If we have too few or too many, it never happens, at least for the example I mentioned above. For example, for a loop with only 10 iterations this is the code we produce (notice how it adds 0.1 nine times to make 0.9 rather than chaining 9 add expressions):

(assign  (x) (var_ref flattening_tmp_y)  (expression float * (expression float + (constant float (0.900000)) (var_ref col_y) ) (constant float (0.500000)) ) ) 

If we have too many iterations in the loop, it does not get unrolled, and again, we avoid the problem.

The optimization pass that introduces the large chain of expressions leading to the problem in do_constant_folding is do_tree_grafting(). Indeed, removing this pass fixes the problem for the small fragment shader I mention above.

Comment 16 Mike Stroyan 2014-09-12 17:17:40 UTC

This extremely slow compilation is not actually an infinite loop.
But the compile time does increase with every unrolled loop step in the shader.
The time to complete grows as 2^N, where N is the number of loop iterations.

The call to
 (*rvalue)->accept(this);
in ir_constant_folding_visitor::handle_rvalue is key to this.
Dropping that call for the case when rvalue is not a constant makes compilation
finish very quickly.  And for at least this shader it produces exactly the
same results.  Constant folding is done very effectively for the y and z channels.

But the x channel still produces a series of adds of constants instead of one add with the sum.  That same pattern made the compilation faster for operations on a full vec4 than for separate channels by limiting the complexity of the expressions.  But it is less efficient than the instructions created for the y and z channels.
That is a separate issue that could still be investigated.
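
The 2^N growth can be illustrated with a toy model. This is a simplified sketch, not Mesa code: it assumes a chain of nested nodes in which the normal traversal visits the child once and the leave handler then walks the same child subtree again, which is the pattern the backtrace suggests. The function names visits_naive and visits_guarded are hypothetical.

```cpp
#include <cstdint>

// Hypothetical toy model, not Mesa code. "depth" is the number of nested
// expression levels sitting above a single leaf.

// Naive visitor: each node is visited once by the normal traversal, and
// its leave handler then walks the whole child subtree a second time.
std::uint64_t visits_naive(unsigned depth)
{
    if (depth == 0)
        return 1;                       // leaf: one visit
    return 1                            // this node
         + visits_naive(depth - 1)      // normal descent into the child
         + visits_naive(depth - 1);     // redundant re-walk on leave
}

// Guarded visitor: the leave handler returns early when the child is not
// a constant, so every node is visited exactly once.
std::uint64_t visits_guarded(unsigned depth)
{
    if (depth == 0)
        return 1;
    return 1 + visits_guarded(depth - 1);
}
```

In this model each extra level doubles the work of the naive visitor (plus one), while the guarded visitor grows by one visit per level.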

Comment 17 Kenneth Graunke 2014-09-12 22:16:42 UTC

Thanks for the great analysis!  With that, the fix was simple:
http://lists.freedesktop.org/archives/mesa-dev/2014-September/067705.html

We might get into trouble with ir_dereference_array still, but you'd have to have constant array literals indexed with constant expressions in huge expression trees...

Comment 18 Darius Spitznagel 2014-09-13 12:24:03 UTC

This patch also fixes the game Dreadout for me on Intel.
I have tested the patch successfully with mesa 10.2.7.
Without this patch Dreadout gets stuck on start and shows only a black screen (not crashing).

Thank you Ken.
Hope this patch lands in 10.2.8 and 10.3.0.

Comment 19 Kenneth Graunke 2014-09-13 18:46:26 UTC

(In reply to comment #18)
> This patch also fixes the game Dreadout for me on Intel.
> I have tested the patch successfully with mesa 10.2.7.
> Without this patch Dreadout gets stuck on start and shows only a black
> screen (not crashing).
> 
> Thank you Ken.
> Hope this patch lands in 10.2.8 and 10.3.0.

Awesome!  I'm glad to hear it.

It looks like Ian committed my patch, so marking this as fixed.  Emil will pick up the patches for the new stable releases.

commit 84a40ce86b1010873b194eb9bf0b8744234b829c
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Fri Sep 12 15:16:57 2014 -0700

    glsl: Speed up constant folding for swizzles.
    
    ir_rvalue::constant_expression_value() recursively walks down an IR
    tree, attempting to reduce it to a single constant value.  This is
    useful when you want to know whether a variable has a constant
    expression value at all, and if so, what it is.
    
    The constant folding optimization pass attempts to replace rvalues with
    their constant expression value from the bottom up.  That way, we can
    optimize subexpressions, and ideally stop as soon as we find a
    non-constant subexpression.
    
    In order to obtain the actual value of an expression, the optimization
    pass calls constant_expression_value().  But it should only do so if it
    knows the value can be combined into a constant.  Otherwise, at each
    step of walking back up the tree, it will walk down the tree again, only
    to discover what it already knew: it isn't constant.
    
    We properly avoided this call for ir_expression nodes, but not for
    ir_swizzle nodes.  This patch fixes that, drastically reducing compile
    times on certain shaders where tree grafting has given us huge
    expression trees.  It also fixes SuperTuxKart.
    
    Thanks to Iago and Mike for help in tracking this down.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78468
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    Cc: mesa-stable@lists.freedesktop.org
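
The idea in the commit message can be sketched on a toy IR. This is an illustration under stated assumptions, not the actual patch: Node, make_node, const_value, fold and eval_calls are hypothetical names, and the only point is the guard that skips the downward walk unless every operand is already a constant.

```cpp
#include <memory>
#include <utility>

// Hypothetical toy IR for illustration only; these are not Mesa's classes.
struct Node {
    enum Kind { Constant, Variable, Add, Swizzle };
    Kind kind = Variable;
    float value = 0.0f;              // meaningful for Constant only
    std::unique_ptr<Node> lhs, rhs;  // Add uses both; Swizzle uses lhs
};
using NodePtr = std::unique_ptr<Node>;

NodePtr make_node(Node::Kind kind, float value = 0.0f,
                  NodePtr lhs = nullptr, NodePtr rhs = nullptr)
{
    NodePtr n(new Node);
    n->kind = kind;
    n->value = value;
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

unsigned long eval_calls = 0;  // counts steps of the downward walk

// Recursive walk that tries to reduce a subtree to one constant.
bool const_value(const Node &n, float &out)
{
    ++eval_calls;
    switch (n.kind) {
    case Node::Constant:
        out = n.value;
        return true;
    case Node::Variable:
        return false;
    case Node::Swizzle:
        return const_value(*n.lhs, out);
    case Node::Add: {
        float a = 0.0f, b = 0.0f;
        if (!const_value(*n.lhs, a) || !const_value(*n.rhs, b))
            return false;
        out = a + b;
        return true;
    }
    }
    return false;
}

// Bottom-up folding: children are folded first, so a foldable child is
// already a Constant by the time its parent is examined.
void fold(NodePtr &n)
{
    if (n->lhs) fold(n->lhs);
    if (n->rhs) fold(n->rhs);
    if (n->kind != Node::Add && n->kind != Node::Swizzle)
        return;
    // Guard: skip the downward walk unless every operand is a constant.
    if (n->lhs->kind != Node::Constant)
        return;
    if (n->rhs && n->rhs->kind != Node::Constant)
        return;
    float v = 0.0f;
    if (const_value(*n, v))
        n = make_node(Node::Constant, v);
}
```

With the guard in place, a long chain of swizzled additions rooted at a variable is traversed once and const_value is never called on it, while a fully constant tree still folds to a single constant.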
