Bug 71254 - Mesa crash on shaders that have large number of active uniforms.
Summary: Mesa crash on shaders that have large number of active uniforms.
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: 9.1
Hardware: Other All
: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-05 10:26 UTC by Kevin Rogovin
Modified: 2014-02-28 23:53 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Machine generate shader. (790.95 KB, text/plain)
2013-11-05 10:26 UTC, Kevin Rogovin
Details
Fragment shader of program. (8.79 KB, text/plain)
2013-11-05 10:26 UTC, Kevin Rogovin
Details

Description Kevin Rogovin 2013-11-05 10:26:01 UTC
Created attachment 88681 [details]
Machine generate shader.

Mesa crash on shaders that have large number of active uniforms.

Attached is a shader with a large number of explicit uniforms (i.e. non-array) that uses each uniform.

Applications crashed from Mesa during glLinkProgram() and corrupts stack.
Comment 1 Kevin Rogovin 2013-11-05 10:26:53 UTC
Created attachment 88683 [details]
Fragment shader of program.
Comment 2 Petri Latvala 2013-11-05 10:39:51 UTC
Reproduced with current git version.

brw_vec4.hpp has

class vec4_visitor {
...
int uniform_size[MAX_UNIFORMS];
int uniform_vector_size[MAX_UNIFORMS];
...
};

Accesses to those arrays are not checked for valid index.

For a test, I added checks when vec4_visitor::uniforms gets increased (that seems to be used as the index for those arrays directly) and called fail(...) to get the compilation abort (after adding enough if (failed) checks elsewhere). That reveals another issue when compiling that shader, which is that visit(ir_expression*) calls visit(ir_expression*) recursively enough times to overflow the stack. The default 8MB stack was overflown, 16MB was enough.

In a nutshell: This is an i965-specific bug, and will crash the process in one of two ways.
Comment 3 Ian Romanick 2013-11-05 16:04:08 UTC
It appears the problem is that MAX_UNIFORMS is 4096.  We treat MAX_UNIFORMS as meaning "maximum number of uniform vec4s," so the maximum number of uniform components is 16k... which means you can legally have 16k uniform variables.  At least some of our tables are sides for 4k uniform variables.

I suspect the fix may be as easy as adding '* 4' to all (or at least most) of the places that use MAX_UNIFORMS to size arrays.

Perhaps one of the guys more familiar with the back-end will have a thought on that.
Comment 4 Kevin Rogovin 2013-11-06 09:12:48 UTC
I do not think changing the array size from MAX_UNIFORMS to 4*MAX_UNIFORMS will fix either crash causes.

The first crash cause, writing past the array. The first main issue is this: the real beans is not so much maximum number of uniforms, but maximum number of -active- uniforms. If that is true, then unless the GLSL front-end walks the IR and removes dead code to then compute the "active" uniforms, then the issue shall remain. If GLSL IR does do that or is modified to do so, or in a pinch modified to simply reject immediately GLSL code that goes past the MAX_UNIFORMS limit then leaving that array as MAX_UNIFORMS in size is fine.

However, the correct solution is to just make the array dynamic in size; the obvious benefit is that it will likely even save a little memory since only room needed to enumerate the uniforms for an active shader will be allocated.


The second cause, the stack overflow on visitors. One can machine generate expressions that only use -one- uniform (or just an in, be it shader global scope of function scope) that cause the stack overflow. Fixing this is non-trivial except to just not bother walking such large expressions.

One can argue that "real application" code would not have such expressions. That argument does not pan out so well when inlining comes into play. Imagine a noise computation where the the arguments fed to it are also noise values and so on. Even ignoring that case, a GL implementation must also guard against malicious shaders (in particular well formed malicious shaders). The severe burning example is using Mesa on a system that has a browser that implement WebGL. At the end of the day, all those shaders from the *web* will be passed to the GL implementation. Crash driver, will potentially crash the browser. For the case of Chrome/Blink, this will crash the GPU process of a browser which will likely bring down all GL state for all WebGL canvases and likely just crash the entire browser too.
Comment 5 Ian Romanick 2013-11-06 17:31:29 UTC
(In reply to comment #4)
> I do not think changing the array size from MAX_UNIFORMS to 4*MAX_UNIFORMS
> will fix either crash causes.
> 
> The first crash cause, writing past the array. The first main issue is this:
> the real beans is not so much maximum number of uniforms, but maximum number
> of -active- uniforms. If that is true, then unless the GLSL front-end walks
> the IR and removes dead code to then compute the "active" uniforms, then the
> issue shall remain. If GLSL IR does do that or is modified to do so, or in a
> pinch modified to simply reject immediately GLSL code that goes past the
> MAX_UNIFORMS limit then leaving that array as MAX_UNIFORMS in size is fine.

It basically does that.  If the shader has more than MAX_UNFORMS*4 components of data, the linker will generate an error... because the shader exceeds hardware limits and won't run.

> However, the correct solution is to just make the array dynamic in size; the
> obvious benefit is that it will likely even save a little memory since only
> room needed to enumerate the uniforms for an active shader will be allocated.

There is no benefit to the added complexity of making this dyanamic.

> The second cause, the stack overflow on visitors. One can machine generate
> expressions that only use -one- uniform (or just an in, be it shader global
> scope of function scope) that cause the stack overflow. Fixing this is
> non-trivial except to just not bother walking such large expressions.
> 
> One can argue that "real application" code would not have such expressions.
> That argument does not pan out so well when inlining comes into play.
> Imagine a noise computation where the the arguments fed to it are also noise
> values and so on. Even ignoring that case, a GL implementation must also
> guard against malicious shaders (in particular well formed malicious
> shaders). The severe burning example is using Mesa on a system that has a
> browser that implement WebGL. At the end of the day, all those shaders from
> the *web* will be passed to the GL implementation. Crash driver, will
> potentially crash the browser. For the case of Chrome/Blink, this will crash
> the GPU process of a browser which will likely bring down all GL state for
> all WebGL canvases and likely just crash the entire browser too.
Comment 6 Petri Latvala 2013-11-07 08:19:15 UTC
(In reply to comment #5)
> It basically does that.  If the shader has more than MAX_UNFORMS*4
> components of data, the linker will generate an error... because the shader
> exceeds hardware limits and won't run.

Okay I edited the array sizes and confirmed that an error is properly generated. I'll prepare a patch to be submitted.

That leaves us with the stack overflow issue. I think it should be a separate bug, and needs a testcase without a humongous amount of uniforms.
Comment 7 Petri Latvala 2013-11-07 11:19:11 UTC
(In reply to comment #6)
> Okay I edited the array sizes and confirmed that an error is properly
> generated. I'll prepare a patch to be submitted.

Patch sent to mesa-dev for review.
Comment 8 Ian Romanick 2014-02-28 23:53:20 UTC
commit 7189fce237cc7f4bc76a85cca8bcf75756d9affc
Author: Petri Latvala <petri.latvala@intel.com>
Date:   Thu Feb 27 16:15:04 2014 +0200

    i965: Allocate vec4_visitor's uniform_size and uniform_vector_size arrays dynamically.
    
    v2: Don't add function parameters, pass the required size in
    prog_data->nr_params.
    
    v3:
    - Use the name uniform_array_size instead of uniform_param_count.
    - Round up when dividing param_count by 4.
    - Use MAX2() instead of taking the maximum by hand.
    - Don't crash if prog_data passed to vec4_visitor constructor is NULL
    
    v4: Rebase for current master
    
    v5 (idr): Trivial whitespace change.
    
    Signed-off-by: Petri Latvala <petri.latvala@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71254
    Cc: "10.1" <mesa-stable@lists.freedesktop.org>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.