Bug 90207

Summary: [r600g, bisected] regression: NI/Turks crash on WebGL Water (most WebGL stuff)
Product: Mesa Reporter: Dieter Nützel <Dieter>
Component: Mesa coreAssignee: mesa-dev
Status: RESOLVED FIXED QA Contact: mesa-dev
Severity: critical    
Priority: medium CC: jfonseca, lemody, mesa-dev
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: konqueror-20150428-051127.kcrash.txt
API Trace of konqueror crash

Description Dieter Nützel 2015-04-28 03:43:40 UTC
Current Mesa git (9143940) crash on WebGL Water demo and most WebGL stuff.

http://madebyevan.com/webgl-water/
http://www.ibiblio.org/e-notes/webgl/webgl.htm

I've bisected it to:

/opt/mesa> git bisect good
a563689a408b7a28c710fb0e382272a0d823f38a is the first bad commit
commit a563689a408b7a28c710fb0e382272a0d823f38a
Author: Tapani Pälli <tapani.palli@intel.com>
Date:   Thu Apr 23 11:13:17 2015 +0300

    mesa: refactor active attrib queries for glGetProgramiv
    
    Main motivation here is to get rid of iterating IR and
    encapsulate queries within program resources.
    No functional changes.
    
    Piglit tests calling the modified functionality:
    
       - gl-get-active-attrib-returns-all-inputs
       - glsl-1.50-get-active-attrib-array
       - getactiveattrib
    
    Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
    Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

:040000 040000 8983df05bac6cc3a455b2e5100a2038939e24eaa cd5fa256be39b9b02e9201a9ce6d313c670f2942 M src

When I revert 'a563689' all is smooth, again.
Comment 1 Dieter Nützel 2015-04-28 03:46:58 UTC
Created attachment 115389 [details]
konqueror-20150428-051127.kcrash.txt
Comment 2 Tapani Pälli 2015-04-28 03:54:53 UTC
seems to work with chrome and firefox, will attempt to reproduce with konqueror
Comment 3 Tapani Pälli 2015-04-28 04:02:00 UTC
My konqueror does not seems to support webgl (?) is there something special I need to do to enable it?
Comment 4 Dieter Nützel 2015-04-28 04:22:15 UTC
(In reply to Tapani Pälli from comment #3)
> My konqueror does not seems to support webgl (?) is there something special
> I need to do to enable it?

Hello Tapani!

My Konqueror 4.14.6 and former versions (on openSUSE x86 and x86-64) support it for some months (years?) out of the box.

What's weird that it do NOT run webgl if I go back to repro package Mesa-10.5.4 (from 27 Apr 2015). I can swear it worked with former versions...
Have to retest on my poor RV730/x86 system tomorrow.

Maybe you have to use WebKit not KHTML.
http://html5test.com/index.html
Comment 5 Dieter Nützel 2015-04-28 04:33:22 UTC
(In reply to Dieter Nützel from comment #4)
> (In reply to Tapani Pälli from comment #3)
> > My konqueror does not seems to support webgl (?) is there something special
> > I need to do to enable it?
> 
> Hello Tapani!

[snip]

> Maybe you have to use WebKit not KHTML.
> http://html5test.com/index.html

It is in since

11./13.10.2011

through WebKit.
Comment 6 Dieter Nützel 2015-04-28 04:36:31 UTC
(In reply to Dieter Nützel from comment #5)
> (In reply to Dieter Nützel from comment #4)
> > (In reply to Tapani Pälli from comment #3)
> > > My konqueror does not seems to support webgl (?) is there something special
> > > I need to do to enable it?
> > 
> > Hello Tapani!
> 
> [snip]
> 
> > Maybe you have to use WebKit not KHTML.
> > http://html5test.com/index.html
> 
> It is in since
> 
> 11./13.10.2011
> 
> through WebKit.

kwebkitpart
kwebkitpart-1.3.3-3.1.5.x86_64
Comment 7 Tapani Pälli 2015-04-28 04:49:14 UTC
(In reply to Dieter Nützel from comment #6)
> (In reply to Dieter Nützel from comment #5)
> > (In reply to Dieter Nützel from comment #4)
> > > (In reply to Tapani Pälli from comment #3)
> > > > My konqueror does not seems to support webgl (?) is there something special
> > > > I need to do to enable it?
> > > 
> > > Hello Tapani!
> > 
> > [snip]
> > 
> > > Maybe you have to use WebKit not KHTML.
> > > http://html5test.com/index.html
> > 
> > It is in since
> > 
> > 11./13.10.2011
> > 
> > through WebKit.
> 
> kwebkitpart
> kwebkitpart-1.3.3-3.1.5.x86_64

Yes, installing this package made it work, demos seem to work for me though .. I will try some more and also investigate possible side-effects caused by my patch.
Comment 8 Tapani Pälli 2015-04-28 09:30:29 UTC
I cannot reproduce this bug. I used following page to verify that Konqueror runs against my Mesa:

http://renderingpipeline.com/webgl-extension-viewer/

"WebGL version: WebGL 1.0 (3.0 Mesa 10.6.0-devel (git-9143940))"

webgl-water and other webgl tests run fine, Konqueror version information:

konqueror-14.12.3-1.fc21.x86_64
kwebkitpart-1.3.4-5.fc21.x86_64

this is on Ivybridge laptop, i965 driver.
Comment 9 Aaron Watry 2015-04-28 15:02:53 UTC
Note that firefox 37 works fine for me on my CEDAR (Radeon 5400) on r600g on current mesa master.

Konqueror 4.14.6 crashes while loading the OpenGL demo.

When running konqueror with LIBGL_DEBUG=verbose, I get the following:

awatry@ws-awatry:~$ LIBGL_DEBUG=verbose konqueror 
libGL: OpenDriver: trying /usr/local/lib/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/local/lib/dri/r600_dri.so
libGL: OpenDriver: trying /usr/local/lib/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/local/lib/dri/r600_dri.so
NOT SANDBOXED
Mesa: User error: GL_INVALID_VALUE in glGetActiveAttrib(index)
KCrash: Application 'konqueror' crashing...
Comment 10 Aaron Watry 2015-04-28 15:10:34 UTC
Created attachment 115405 [details]
API Trace of konqueror crash
Comment 11 Jose Fonseca 2015-04-28 19:36:34 UTC

(In reply to Dieter Nützel from comment #0)
[...]
> a563689a408b7a28c710fb0e382272a0d823f38a is the first bad commit
> commit a563689a408b7a28c710fb0e382272a0d823f38a
> Author: Tapani Pälli <tapani.palli@intel.com>
> Date:   Thu Apr 23 11:13:17 2015 +0300
> 
>     mesa: refactor active attrib queries for glGetProgramiv
>     
>     Main motivation here is to get rid of iterating IR and
>     encapsulate queries within program resources.
>     No functional changes.
>     
>     Piglit tests calling the modified functionality:
>     
>        - gl-get-active-attrib-returns-all-inputs
>        - glsl-1.50-get-active-attrib-array
>        - getactiveattrib
>     
>  [...]

Piglit `getactiveattrib 110` test started to randomly fail on llvmpipe:

$ for i in `seq 1 100`; do .../piglit/bin/getactiveattrib 110 -auto -fbo; done
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }
Mesa: User error: GL_INVALID_VALUE in glGetActiveAttrib(index)

Failing shader:
attribute vec4 vertex;
attribute vec2 alternate;
uniform bool use_alternate;
void main() {
    gl_Position = vertex;
    if (use_alternate) gl_Position = alternate.xyxy;
}

Attribute `alternate' listed multiple times in active list.
PIGLIT: {"result": "fail" }
PIGLIT: {"result": "pass" }
PIGLIT: {"result": "pass" }


> When I revert 'a563689' all is smooth, again.

And likewise, if I revert a563689 it passes consistently.
Comment 12 Jose Fonseca 2015-04-28 19:52:16 UTC
I ran the test inside apitrace multiple times.  When it passes we have

  glGetProgramiv(program = 36, pname = GL_ACTIVE_ATTRIBUTES, params = &2)

when it fails we have 

  glGetProgramiv(program = 36, pname = GL_ACTIVE_ATTRIBUTES, params = &3)


I.e, the bug is somewhere inside _mesa_count_active_attribs.


I also tried to run the test inside valgrind sucessively, but then it stops failing....
Comment 13 Jose Fonseca 2015-04-28 20:48:36 UTC
I think I figured out the problem.

As commented on DECL_RESOURCE_FUNC macro, the RESOURCE_VAR inline function is not type safe, and stuff that's not a ir_variable is wrongly being casted into it. 

This patch seems to do the trick, but I'm not 100% sure.

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index a84ec84..d2ca49b 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -302,8 +302,10 @@ _mesa_count_active_attribs(struct gl_shader_program *shProg)
    struct gl_program_resource *res = shProg->ProgramResourceList;
    unsigned count = 0;
    for (unsigned j = 0; j < shProg->NumProgramResourceList; j++, res++) {
-         if (is_active_attrib(RESOURCE_VAR(res)))
-            count++;
+      if (res->Type == GL_PROGRAM_INPUT &&
+          res->StageReferences & (1 << MESA_SHADER_VERTEX) &&
+          is_active_attrib(RESOURCE_VAR(res)))
+         count++;
    }
    return count;
 }



I think we should invest the time to make RESOURCE_VAR and friends more robust, by adding assertions that the types are indeed correct.  (Replace DECL_RESOURCE_FUNC helper macro by manually written code if necessary.)

This sort of bugs is hard to find otherwise.
Comment 14 Aaron Watry 2015-04-28 21:34:39 UTC
(In reply to José Fonseca from comment #13)
> I think I figured out the problem.
> 
> As commented on DECL_RESOURCE_FUNC macro, the RESOURCE_VAR inline function
> is not type safe, and stuff that's not a ir_variable is wrongly being casted
> into it. 
> 
> This patch seems to do the trick, but I'm not 100% sure.
> 

That patch fixes the crash in Konqueror for me.

Whether or not it's correct, that's up to people who know the code better than myself.
Comment 15 Dieter Nützel 2015-04-29 02:01:34 UTC
(In reply to Aaron Watry from comment #14)
> (In reply to José Fonseca from comment #13)
> > I think I figured out the problem.
> > 
> > As commented on DECL_RESOURCE_FUNC macro, the RESOURCE_VAR inline function
> > is not type safe, and stuff that's not a ir_variable is wrongly being casted
> > into it. 
> > 
> > This patch seems to do the trick, but I'm not 100% sure.
> > 
> 
> That patch fixes the crash in Konqueror for me.
> 
> Whether or not it's correct, that's up to people who know the code better
> than myself.

For me, too.
git-6fe0d4f + José's patch

Thanks José!

Your WebGL implementation:

Renderer: WebKit WebGL
Vendor: WebKit
WebGL version: WebGL 1.0 (3.0 Mesa 10.6.0-devel (git-6fe0d4f))
GLSL version: WebGL GLSL ES 1.0 (1.30)
Supported WebGL extensions:

OES_texture_float                           <-- Were is this defined? ;-)
OES_standard_derivatives
WEBKIT_EXT_texture_filter_anisotropic
OES_vertex_array_object
OES_element_index_uint
WEBKIT_WEBGL_lose_context
WEBKIT_WEBGL_compressed_texture_s3tc
WEBKIT_WEBGL_depth_texture
Comment 16 Tapani Pälli 2015-04-29 05:19:48 UTC
(In reply to José Fonseca from comment #13)
> I think I figured out the problem.
> 
> As commented on DECL_RESOURCE_FUNC macro, the RESOURCE_VAR inline function
> is not type safe, and stuff that's not a ir_variable is wrongly being casted
> into it. 
> 
> This patch seems to do the trick, but I'm not 100% sure.

Yep it is correct, just like with _mesa_longest_attribute_name_length. Not sure why I did not manage to hit this.
 
> diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
> index a84ec84..d2ca49b 100644
> --- a/src/mesa/main/shader_query.cpp
> +++ b/src/mesa/main/shader_query.cpp
> @@ -302,8 +302,10 @@ _mesa_count_active_attribs(struct gl_shader_program
> *shProg)
>     struct gl_program_resource *res = shProg->ProgramResourceList;
>     unsigned count = 0;
>     for (unsigned j = 0; j < shProg->NumProgramResourceList; j++, res++) {
> -         if (is_active_attrib(RESOURCE_VAR(res)))
> -            count++;
> +      if (res->Type == GL_PROGRAM_INPUT &&
> +          res->StageReferences & (1 << MESA_SHADER_VERTEX) &&
> +          is_active_attrib(RESOURCE_VAR(res)))
> +         count++;
>     }
>     return count;
>  }
> 
> 
> 
> I think we should invest the time to make RESOURCE_VAR and friends more
> robust, by adding assertions that the types are indeed correct.  (Replace
> DECL_RESOURCE_FUNC helper macro by manually written code if necessary.)
> 
> This sort of bugs is hard to find otherwise.

I agree, current way is too fragile. IMO a type template would be best solution. Maybe first just adding the assert for the correct type is good start.
Comment 17 Jose Fonseca 2015-04-29 05:54:12 UTC
(In reply to Tapani Pälli from comment #16)
> Yep it is correct, just like with _mesa_longest_attribute_name_length. 

Thanks. I pushed the fix now.

> Not sure why I did not manage to hit this.

It's random -- I think that is_active_attrib ends up interpreting random pointer addresses as enums and decide on them.  Maybe the randomness is higher with gallium drivers.

It was only when I try to print var->name that things started to crash consistently for me.

> > I think we should invest the time to make RESOURCE_VAR and friends more
> > robust, by adding assertions that the types are indeed correct.  (Replace
> > DECL_RESOURCE_FUNC helper macro by manually written code if necessary.)
> > 
> > This sort of bugs is hard to find otherwise.
> 
> I agree, current way is too fragile. IMO a type template would be best
> solution. Maybe first just adding the assert for the correct type is good
> start.

Yep.
Comment 18 Mark Janes 2015-04-29 16:51:31 UTC
The fix for this bug triggered a piglit failure:

spec.!opengl 3_2.get-active-attrib-returns-all-inputs

/tmp/build_root/m32/lib/piglit/bin/gl-get-active-attrib-returns-all-inputs -auto -fbo
piglit_vertex was not counted as active.

Should I write up a separate bug for this?
Comment 19 Tapani Pälli 2015-04-29 17:05:15 UTC
(In reply to Mark Janes from comment #18)
> The fix for this bug triggered a piglit failure:
> 
> spec.!opengl 3_2.get-active-attrib-returns-all-inputs
> 
> /tmp/build_root/m32/lib/piglit/bin/gl-get-active-attrib-returns-all-inputs
> -auto -fbo
> piglit_vertex was not counted as active.
> 
> Should I write up a separate bug for this?

Yes please, assign to me. I think I know what's up here, it's probably the builtin attribs that are not referenced by the vertex stage.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.