Bug 103955 - Using array in structure results in wrong GLSL compilation output
Summary: Using array in structure results in wrong GLSL compilation output
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: glsl-compiler (show other bugs)
Version: 17.3
Hardware: All Linux (All)
: medium minor
Assignee: mesa-dev
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-28 14:08 UTC by Younghun Jang
Modified: 2017-12-18 23:56 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Windows port of the test program (1.90 MB, application/zip)
2017-11-28 14:08 UTC, Younghun Jang
Details
shader_test that repros the issue (503 bytes, text/plain)
2017-11-28 14:25 UTC, Ilia Mirkin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Younghun Jang 2017-11-28 14:08:38 UTC
Created attachment 135756 [details]
Windows port of the test program

Declaring a uniform structure, whose one member is an array of float, results in wrong shader code. Consider the following fragment shader:


	struct Set {
		vec3 color;
		float arr[3];
	};

	uniform Set u_s;


	void main () {
		gl_FragColor = vec4(u_s.color * u_s.arr[0], 1.0);
		// gl_FragColor = vec4(u_s.color * vec3(1.0, 0.0, 0.0), 1.0);
	}


This fragment shader should output FragColor of white(that is `vec4(1.0, 1.0, 1.0, 1.0)`) if the value of `u_s.arr[0]` is 1.0. However, the shader compiled by Mesa driver produces `vec4(1.0, 0.0, 0.0, ?)`, which is red. The alpha component is disregarded since GL_BLEND is disabled in the test application.


I wanted to know if this bug is hardware related. The attachment is the makeshift test program I wrote with an existing example(in the courtesy of opengl-tutorial.org) to demonstrate the bug. I found this bug when I was developing a WebGL web app, so I wrote the test web page as well.

   https://ashegoulding.github.io/attmnts/array_in_struct_uniform.html
   https://ashegoulding.github.io/attmnts/array_in_struct_uniform.zip

I also ported the test program to Windows. The bug is not reproducible on Windows in whichever way.

On Linux running the Mesa driver, Firefox, Google Chrome displays a red framebuffer. The bug is still seen running the browsers/test app with `LIBGL_ALWAYS_SOFTWARE=1`, except for Chrome[1]

I'd also like to mention that I ran the test program and the web page with `MESA_GLSL=dump`, `MESA_GLSL=uniform` to see if there's any form of "shader source translation" like ANGLE does on Windows. The dump of the source code and uniform upload were identical.

The bug is still there on 17.3.0-rc5 and 17.2.6.



[1]
Chrome browser defaults to its own software renderer when a slower renderer(SW or VBox GPU) is detected. The canvas is white with that renderer.
Comment 1 Younghun Jang 2017-11-28 14:18:14 UTC
(correction)
I made a wrong assumption about Chrome's SW renderer. Running Chrome with `LIBGL_ALWAYS_SOFTWARE=1` does not make Chrome default to its own SW renderer. This only happens with VBox GPU.
Comment 2 Ilia Mirkin 2017-11-28 14:25:59 UTC
Created attachment 135757 [details]
shader_test that repros the issue

In the TGSI this comes out as

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0][0..3]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 {    1.0000,     0.0000,     0.0000,     0.0000}
  0: MOV TEMP[0].w, IMM[0].xxxx
  1: MUL TEMP[0].xyz, CONST[0][0].xyzz, CONST[0][1]
  2: MOV TEMP[1], TEMP[0]
  3: MOV OUT[0], TEMP[1]
  4: END

Which is clearly bogus. That needs to be CONST[0][1].xxx

Haven't looked further than that. This sounds familiar though for some reason.
Comment 3 Ilia Mirkin 2017-11-28 14:29:36 UTC
The GLSL IR seems questionable too...

: GLSL IR for linked fragment program 3:
(
(declare (location=2 shader_out ) vec4 gl_FragColor)
(declare (temporary ) vec4 gl_FragColor)
(declare (location=0 uniform ) Set@0x22dfe20 u_s)
( function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 vec_ctor)
      (assign  (w) (var_ref vec_ctor)  (constant float (1.000000)) ) 
      (assign  (xyz) (var_ref vec_ctor)  (expression vec3 * (record_ref (var_ref u_s)  color) (array_ref (record_ref (var_ref u_s)  arr) (constant int (0)) ) ) ) 
      (assign  (xyzw) (var_ref gl_FragColor)  (var_ref vec_ctor) ) 
      (assign  (xyzw) (var_ref gl_FragColor@3)  (var_ref gl_FragColor) ) 
    ))

)

)

It multiplies directly by the array_ref -- there should be a swizzle or other vec3 conversion on there, no? Or perhaps you're allowed to have vecN * float directly in the IR. Not sure.
Comment 4 Ian Romanick 2017-11-29 03:03:30 UTC
(In reply to Ilia Mirkin from comment #3)

>    (expression vec3 * (record_ref (var_ref u_s) color)
>                       (array_ref (record_ref (var_ref u_s) arr) (constant int (0)))
>    )

This should be fine.  For a bunch of things we allow mixing scalars and either vectors or matrices.  The shader_runner test passes on i965, so maybe the problem is in st_glsl_to_tgsi?

(In reply to Younghun Jang from comment #0)
> I wanted to know if this bug is hardware related.

What hardware were you running on?  The output of 'glxinfo | grep "OpenGL renderer string"' would help.
Comment 5 Ian Romanick 2017-11-29 03:09:29 UTC
(In reply to Ian Romanick from comment #4)
> (In reply to Ilia Mirkin from comment #3)
> 
> >    (expression vec3 * (record_ref (var_ref u_s) color)
> >                       (array_ref (record_ref (var_ref u_s) arr) (constant int (0)))
> >    )
> 
> This should be fine.  For a bunch of things we allow mixing scalars and
> either vectors or matrices.  The shader_runner test passes on i965, so maybe
> the problem is in st_glsl_to_tgsi?

This is a thing that SPIR-V does not allow (except for multiply which has special vector-times-scalar opcodes).  For part of the SPIR-V work I wrote a pass that adds the missing swizzles.  It may be easiest to use that to massage the GLSL IR before TGSI conversion.

We don't want to use this in general because may of the optimization passes cannot "see through" a swizzle.  Algebraic optimizations, for example, can't see that (+ (a) (swizzle xxxx (* (b) (c)))) is a MAD.  Also related to the SPIR-V work, this is a topic I've been thinking about lately...
Comment 6 Younghun Jang 2017-11-29 05:33:05 UTC
I don't think this would help because as I mentioned in the original comment, I could find the bug pretty much everywhere. VBox, my PC(AMD), GTX 960(nouveau) with or without `LIBGL_ALWAYS_SOFTWARE=1`. I'm more than confident that any Linux browser running Mesa driver would display the red canvas.



  $ glxinfo | grep "OpenGL renderer string"
  OpenGL renderer string: AMD Radeon (TM) RX 480 Graphics (AMD POLARIS10 / DRM 3.18.0 / 4.13.13-300.fc27.x86_64, LLVM 4.0.1)
  $ LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep "OpenGL renderer string"
  OpenGL renderer string: llvmpipe (LLVM 4.0, 256 bits)
Comment 7 Tapani Pälli 2017-11-29 05:45:16 UTC
(In reply to Younghun Jang from comment #6)
> I don't think this would help because as I mentioned in the original
> comment, I could find the bug pretty much everywhere. VBox, my PC(AMD), GTX
> 960(nouveau) with or without `LIBGL_ALWAYS_SOFTWARE=1`. I'm more than
> confident that any Linux browser running Mesa driver would display the red
> canvas.
> 
> 
> 
>   $ glxinfo | grep "OpenGL renderer string"
>   OpenGL renderer string: AMD Radeon (TM) RX 480 Graphics (AMD POLARIS10 /
> DRM 3.18.0 / 4.13.13-300.fc27.x86_64, LLVM 4.0.1)
>   $ LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep "OpenGL renderer string"
>   OpenGL renderer string: llvmpipe (LLVM 4.0, 256 bits)

It does usually help in figuring out where the bug is. As example I cannot reproduce this bug on i965 driver (on Kabylake) so it indicates that this bug might be gallium specific.
Comment 8 Ilia Mirkin 2017-12-02 16:35:30 UTC
This patch fixes the sample case for me:

https://patchwork.freedesktop.org/patch/191532/
Comment 9 Younghun Jang 2017-12-05 03:01:28 UTC
(In reply to Ilia Mirkin from comment #8)
> This patch fixes the sample case for me:
> 
> https://patchwork.freedesktop.org/patch/191532/

I concur. The patch fixes the bug.

And I believe this was a typo:

   uint16_t swizzle_x = GET_SWZ(op[2]->swizzle, 0);
   uint16_t swizzle_x = GET_SWZ(op[2].swizzle, 0);
Comment 10 Younghun Jang 2017-12-18 23:54:11 UTC
No one seems to care about this easily avoidable bug. Alright. Just waiting for someone to rediscover this and make a duplicate.
Comment 11 Ilia Mirkin 2017-12-18 23:56:57 UTC
(In reply to Younghun Jang from comment #10)
> No one seems to care about this easily avoidable bug. Alright. Just waiting
> for someone to rediscover this and make a duplicate.

No, I care. And I pushed out a fix... just forgot to update the tracker.

https://cgit.freedesktop.org/mesa/mesa/commit/?id=0332c7484b712e56ce1a6648c5fa04c90e286c37


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.