Bug 89508

Summary:	Bad int(floatBitsToInt(vec4))
Product:	Mesa	Reporter:	Marc-Andre Lureau <marcandre.lureau>
Component:	Drivers/DRI/i965	Assignee:	Ian Romanick <idr>
Status:	RESOLVED FIXED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	normal
Priority:	medium	CC:	itoral
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Attachments:	Piglit shader test Simplified shader_test file Even simpler version Simplest piglit shader_test showcasing the problem Vec4 code for version that passes Vec4 code for version that fails

Description Marc-Andre Lureau 2015-03-09 23:22:11 UTC

Created attachment 114179 [details]
Piglit shader test

The attached piglit shader test fails with the i965 driver, and works with softpipe. The line:

addr0 = int(floatBitsToInt(temps[10].xxxx));

always retuns 0.

(however int(floatBitsToInt(temps[10].x)) works)

Comment 1 Matt Turner 2015-03-11 01:05:22 UTC

Thanks for the report. It's really nice to get shader_test files that demonstrate the bug. :)

We're storing temps[] to scratch space -- I expect that has more to do with the problem than int() or floatBitsToInt().

I'll hopefully take a closer look at this soon. If there's a smaller test case, that'd be nice as well.


As an aside: we're almost certainly going to generate better shader code for a uniform array than a non-const temporary array. Assuming you hit this bug in a real application, it's something you might want to change if possible.

Comment 2 Marc-Andre Lureau 2015-03-11 01:52:25 UTC

Hi thanks for confirming the bug

(In reply to Matt Turner from comment #1)
> I'll hopefully take a closer look at this soon. If there's a smaller test
> case, that'd be nice as well.

Yeah, I have tried without much success to make the example more succint.

> As an aside: we're almost certainly going to generate better shader code for
> a uniform array than a non-const temporary array. Assuming you hit this bug
> in a real application, it's something you might want to change if possible.

The bug can be hit by virglrenderer: the glsl code is "translated" from tgsi dynamically. Can't do much without temps here I think.

Comment 3 Iago Toral 2015-03-12 11:00:03 UTC

(In reply to Marc-Andre Lureau from comment #0)
> Created attachment 114179 [details]
> Piglit shader test
> 
> The attached piglit shader test fails with the i965 driver, and works with
> softpipe. The line:
> 
> addr0 = int(floatBitsToInt(temps[10].xxxx));
> 
> always retuns 0.

mmm... this is not what I see, if addr0 is always 0, then result should be a red color (0.8, 0, 0, 0) and I see black (0.0, 0.0, 0.0, 0.0).

Just in case it helps, the expected execution of this to produce the expected result should look like this:

Init:

temp[0] = 0.2,0,0,0 * 1,1,1,1 = 0.2,0,0,0
temp[9] = 0,0,0,0
temp[10] = 0,0,0,0

It0:

addr = 0
temp[addr] = 0.2,0,0,0
temps[9] = 0.2,0,0,0
temps[10].x = i2f(0,0,0,0 + 1,1,1,1).x = i2f(1,1,1,1).x = 1.45e-45
temps[10] = 1.4e-45,0.0.0

It1:

temp[11].x = 0
addr = 1
temp[addr] = 0,0.2,0.0 * 1,1,1,1 = 0,0.2,0,0
temps[9] = 0.2,0.2,0,0
temps[10].x = i2f(1,1,1,1 + 1,1,1,1).x = i2f(2,2,2,2).x = 2.8e-45
temps[10] = 2.8e-45,0.0.0

It2:
temp[11].x = 0
addr = 2
temp[addr] = 0,0,0.2.0 * 1,1,1,1 = 0,0,0.2,0
temps[9] = 0.2,0.2,0.2,0
temps[10].x = i2f(2,2,2,2 + 1,1,1,1).x = i2f(3,3,3,3).x = 4.2e-45
temps[10] = 4.2e-45,0.0.0

It3:
temp[11].x = 0
addr = 3
temp[addr] = 0.2,0.2,0,0 * 1,1,1,1 = 0.2,0.2,0,0
temps[9] = 0.4,0.4,0.2,0
temps[10].x = i2f(3,3,3,3 + 1,1,1,1).x = i2f(4,4,4,4).x = 5.6e-45
temps[10] = 5.6e-45,0.0.0

It4:
temp[11] = 1 => break;
Expected color = temp[9] = 0.4,0.4,0.2,0.0

I'll try to simplify the shader code, seems like it should be possible. If I am successful I'll attach a new version.

Comment 4 Iago Toral 2015-03-12 11:38:21 UTC

Created attachment 114256 [details]
Simplified shader_test file

This is a much simpler version of the same shader_test file reproducing the problem.

Comment 5 Iago Toral 2015-03-12 11:51:25 UTC

Created attachment 114257 [details]
Even simpler version

Comment 6 Iago Toral 2015-03-12 12:18:28 UTC

I found something interesting: the test expects addr0 to get values 0, 1, 2, 3, then exit the loop in the 5th iteration, so I edited the code in the loop to add this conditional for the assignment of temp[9]:

addr0 = int(floatBitsToInt(temps[10].xxxx));
if (addr0 >= 4)
   temps[9] = vec4(1.0, 0, 0, 0);
else
   temps[9] += temps[addr0];

Basically, I wanted to confirm that addr0 was being assigned a value that was not expected (by forcing a red color as the output), but surprisingly, adding this conditional makes the test pass!

Based on this, I think the problem here must be related with an optimization pass that, for some reason, changes its behavior when the conditional is present.

Comment 7 Iago Toral 2015-03-12 13:24:59 UTC

Created attachment 114260 [details]
Simplest piglit shader_test showcasing the problem

This is the simplest version of the shader test I could get. By comparing the generated vec4 code for the case that fails:

addr0 = int(floatBitsToInt(temps[5].xxxx))

and the case that passes:

addr0 = int(floatBitsToInt(temps[10].x));

it seems that the problem is that we are losing this code in the failed version:

mov(8)    g126<1>UD     g0<4,4,1>UD        { align16 WE_all 1Q };
mov(1)    g127<1>UD     0x0000000aUD       { align1 WE_all compacted };
mov(1)    g127.4<1>UD   0x0000000bUD       { align1 WE_all compacted };
send(8)   g27<1>F       g126<4,4,1>F
    data ( DC OWORD dual block read, 255, 0) mlen 2 rlen 1 { align16 1Q };

so we are losing a variable load into a register that for some reason.

Comment 8 Iago Toral 2015-03-12 13:26:15 UTC

Created attachment 114261 [details]
Vec4 code for version that passes

Comment 9 Iago Toral 2015-03-12 13:26:41 UTC

Created attachment 114262 [details]
Vec4 code for version that fails

Comment 10 Iago Toral 2015-03-13 10:41:31 UTC

Ok, I think I know what is happening. The problem is that in case that fails we generate IR like this:


(assign  (xyzw) (array_ref (var_ref temps) (constant int (1)) )
   (expression vec4 +
      (array_ref (var_ref temps) (constant int (1)) )
      (array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i (swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) ) 

The second operand to '+' has a variable indexing into temps (so it will use a reladdr access), but then the register holding the reladdr is also an access into the same array (so it will also need a scratch load).

The code in move_grf_array_access_to_scratch() does not handle this scenario.

The reason why the other case passes is because it splits the bitcast and assigns it to a temporary like this:

(declare (temporary ) int floatBitsToInt_retval)
(assign  (x) (var_ref floatBitsToInt_retval)
   (expression int bitcast_f2i (swiz x (array_ref (var_ref temps) (constant int (2)) ) )) ) 
(assign  (xyzw) (array_ref (var_ref temps) (constant int (1)) ) 
   (expression vec4 +
      (array_ref (var_ref temps) (constant int (1)) ) (array_ref (var_ref temps) 
      (var_ref floatBitsToInt_retval) ) ) )

This way we resolve the reladdr to a temp variable (so the register holding the reladdr value is no longer an array access that needs a scratch read), so we do not hit the unsupported scenario.

I'll see if I can fix this.

Comment 11 Iago Toral 2015-03-13 13:44:43 UTC

Sent a patch for review to the mailing list:
http://lists.freedesktop.org/archives/mesa-dev/2015-March/079331.html

Comment 12 Iago Toral 2015-04-01 13:41:30 UTC

Fixed with:

commit 3818dfcf3c2d03809774bba613d7dd92752b36db
Author: Iago Toral Quiroga <itoral@igalia.com>
Date:   Tue Mar 17 10:48:04 2015 +0100

    i965: Handle scratch accesses where reladdr also points to scratch space
    
    This is a problem when we have IR like this:
    
    (array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i
       (swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) )
    
    where we are indexing an array with the result of an expression that
    accesses the same array.
    
    In this scenario, temps will be moved to scratch space and we will need
    to add scratch reads/writes for all accesses to temps, however, the
    current implementation does not consider the case where a reladdr pointer
    (obtained by indexing into temps trough a expression) points to a register
    that is also stored in scratch space (as in this case, where the expression
    used to index temps access temps[2]), and thus, requires a scratch read
    before it is accessed.
    
    v2 (Francisco Jerez):
     - Handle also recursive reladdr addressing.
     - Do not memcpy dst_reg into src_reg when rewriting reladdr.
    
    v3 (Francisco Jerez):
     - Reduce complexity by moving recursive reladdr scratch access handling
       to a separate recursive function.
     - Do not skip demoting reladdr index registers to scratch space if the
       top level GRF has already been visited.
    
    v4 (Francisco Jerez)
     - Remove redundant checks.
     - Simplify code by making emit_resolve_reladdr return a register with
       the original src data except for reg, reg_offset and reladdr.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89508
    Reviewed-by: Francisco Jerez <currojerez@riseup.net>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.