Created attachment 114179 [details] Piglit shader test The attached piglit shader test fails with the i965 driver, and works with softpipe. The line: addr0 = int(floatBitsToInt(temps[10].xxxx)); always retuns 0. (however int(floatBitsToInt(temps[10].x)) works)
Thanks for the report. It's really nice to get shader_test files that demonstrate the bug. :) We're storing temps[] to scratch space -- I expect that has more to do with the problem than int() or floatBitsToInt(). I'll hopefully take a closer look at this soon. If there's a smaller test case, that'd be nice as well. As an aside: we're almost certainly going to generate better shader code for a uniform array than a non-const temporary array. Assuming you hit this bug in a real application, it's something you might want to change if possible.
Hi thanks for confirming the bug (In reply to Matt Turner from comment #1) > I'll hopefully take a closer look at this soon. If there's a smaller test > case, that'd be nice as well. Yeah, I have tried without much success to make the example more succint. > As an aside: we're almost certainly going to generate better shader code for > a uniform array than a non-const temporary array. Assuming you hit this bug > in a real application, it's something you might want to change if possible. The bug can be hit by virglrenderer: the glsl code is "translated" from tgsi dynamically. Can't do much without temps here I think.
(In reply to Marc-Andre Lureau from comment #0) > Created attachment 114179 [details] > Piglit shader test > > The attached piglit shader test fails with the i965 driver, and works with > softpipe. The line: > > addr0 = int(floatBitsToInt(temps[10].xxxx)); > > always retuns 0. mmm... this is not what I see, if addr0 is always 0, then result should be a red color (0.8, 0, 0, 0) and I see black (0.0, 0.0, 0.0, 0.0). Just in case it helps, the expected execution of this to produce the expected result should look like this: Init: temp[0] = 0.2,0,0,0 * 1,1,1,1 = 0.2,0,0,0 temp[9] = 0,0,0,0 temp[10] = 0,0,0,0 It0: addr = 0 temp[addr] = 0.2,0,0,0 temps[9] = 0.2,0,0,0 temps[10].x = i2f(0,0,0,0 + 1,1,1,1).x = i2f(1,1,1,1).x = 1.45e-45 temps[10] = 1.4e-45,0.0.0 It1: temp[11].x = 0 addr = 1 temp[addr] = 0,0.2,0.0 * 1,1,1,1 = 0,0.2,0,0 temps[9] = 0.2,0.2,0,0 temps[10].x = i2f(1,1,1,1 + 1,1,1,1).x = i2f(2,2,2,2).x = 2.8e-45 temps[10] = 2.8e-45,0.0.0 It2: temp[11].x = 0 addr = 2 temp[addr] = 0,0,0.2.0 * 1,1,1,1 = 0,0,0.2,0 temps[9] = 0.2,0.2,0.2,0 temps[10].x = i2f(2,2,2,2 + 1,1,1,1).x = i2f(3,3,3,3).x = 4.2e-45 temps[10] = 4.2e-45,0.0.0 It3: temp[11].x = 0 addr = 3 temp[addr] = 0.2,0.2,0,0 * 1,1,1,1 = 0.2,0.2,0,0 temps[9] = 0.4,0.4,0.2,0 temps[10].x = i2f(3,3,3,3 + 1,1,1,1).x = i2f(4,4,4,4).x = 5.6e-45 temps[10] = 5.6e-45,0.0.0 It4: temp[11] = 1 => break; Expected color = temp[9] = 0.4,0.4,0.2,0.0 I'll try to simplify the shader code, seems like it should be possible. If I am successful I'll attach a new version.
Created attachment 114256 [details] Simplified shader_test file This is a much simpler version of the same shader_test file reproducing the problem.
Created attachment 114257 [details] Even simpler version
I found something interesting: the test expects addr0 to get values 0, 1, 2, 3, then exit the loop in the 5th iteration, so I edited the code in the loop to add this conditional for the assignment of temp[9]: addr0 = int(floatBitsToInt(temps[10].xxxx)); if (addr0 >= 4) temps[9] = vec4(1.0, 0, 0, 0); else temps[9] += temps[addr0]; Basically, I wanted to confirm that addr0 was being assigned a value that was not expected (by forcing a red color as the output), but surprisingly, adding this conditional makes the test pass! Based on this, I think the problem here must be related with an optimization pass that, for some reason, changes its behavior when the conditional is present.
Created attachment 114260 [details] Simplest piglit shader_test showcasing the problem This is the simplest version of the shader test I could get. By comparing the generated vec4 code for the case that fails: addr0 = int(floatBitsToInt(temps[5].xxxx)) and the case that passes: addr0 = int(floatBitsToInt(temps[10].x)); it seems that the problem is that we are losing this code in the failed version: mov(8) g126<1>UD g0<4,4,1>UD { align16 WE_all 1Q }; mov(1) g127<1>UD 0x0000000aUD { align1 WE_all compacted }; mov(1) g127.4<1>UD 0x0000000bUD { align1 WE_all compacted }; send(8) g27<1>F g126<4,4,1>F data ( DC OWORD dual block read, 255, 0) mlen 2 rlen 1 { align16 1Q }; so we are losing a variable load into a register that for some reason.
Created attachment 114261 [details] Vec4 code for version that passes
Created attachment 114262 [details] Vec4 code for version that fails
Ok, I think I know what is happening. The problem is that in case that fails we generate IR like this: (assign (xyzw) (array_ref (var_ref temps) (constant int (1)) ) (expression vec4 + (array_ref (var_ref temps) (constant int (1)) ) (array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i (swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) ) The second operand to '+' has a variable indexing into temps (so it will use a reladdr access), but then the register holding the reladdr is also an access into the same array (so it will also need a scratch load). The code in move_grf_array_access_to_scratch() does not handle this scenario. The reason why the other case passes is because it splits the bitcast and assigns it to a temporary like this: (declare (temporary ) int floatBitsToInt_retval) (assign (x) (var_ref floatBitsToInt_retval) (expression int bitcast_f2i (swiz x (array_ref (var_ref temps) (constant int (2)) ) )) ) (assign (xyzw) (array_ref (var_ref temps) (constant int (1)) ) (expression vec4 + (array_ref (var_ref temps) (constant int (1)) ) (array_ref (var_ref temps) (var_ref floatBitsToInt_retval) ) ) ) This way we resolve the reladdr to a temp variable (so the register holding the reladdr value is no longer an array access that needs a scratch read), so we do not hit the unsupported scenario. I'll see if I can fix this.
Sent a patch for review to the mailing list: http://lists.freedesktop.org/archives/mesa-dev/2015-March/079331.html
Fixed with: commit 3818dfcf3c2d03809774bba613d7dd92752b36db Author: Iago Toral Quiroga <itoral@igalia.com> Date: Tue Mar 17 10:48:04 2015 +0100 i965: Handle scratch accesses where reladdr also points to scratch space This is a problem when we have IR like this: (array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i (swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) ) where we are indexing an array with the result of an expression that accesses the same array. In this scenario, temps will be moved to scratch space and we will need to add scratch reads/writes for all accesses to temps, however, the current implementation does not consider the case where a reladdr pointer (obtained by indexing into temps trough a expression) points to a register that is also stored in scratch space (as in this case, where the expression used to index temps access temps[2]), and thus, requires a scratch read before it is accessed. v2 (Francisco Jerez): - Handle also recursive reladdr addressing. - Do not memcpy dst_reg into src_reg when rewriting reladdr. v3 (Francisco Jerez): - Reduce complexity by moving recursive reladdr scratch access handling to a separate recursive function. - Do not skip demoting reladdr index registers to scratch space if the top level GRF has already been visited. v4 (Francisco Jerez) - Remove redundant checks. - Simplify code by making emit_resolve_reladdr return a register with the original src data except for reg, reg_offset and reladdr. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89508 Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.