89970 – i965/fs: copy propagate should propagate bitwise not into logic operations

Bug 89970 - i965/fs: copy propagate should propagate bitwise not into logic operations

Summary: i965/fs: copy propagate should propagate bitwise not into logic operations

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965 (show other bugs)
Version:	git
Hardware:	All All

Importance:	medium enhancement
Assignee:	Ian Romanick
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	i965-perf
	Show dependency tree / graph

Reported:	2015-04-09 19:49 UTC by Ian Romanick
Modified:	2019-04-17 22:36 UTC (History)
CC List:	0 users

See Also:
i915 platform:
i915 features:

Attachments

Description Ian Romanick 2015-04-09 19:49:20 UTC

With some other Boolean and gl_FrontFacing work that I'm doing, I've noticed that occasionally something like

not(8)          g37<1>D         g29<8,8,1>D                     { align1 1Q compacted };
and(8)          g38<1>D         g37<8,8,1>D     0x3f800000UD    { align1 1Q };

is generated.  Instead, this should be

and(8)          g38<1>UD        -g29<8,8,1>UD   0x3f800000UD    { align1 1Q };

I'm submitting a bug so that I don't forget about this.

Comment 1 Matt Turner 2015-04-30 18:13:08 UTC

Presumably you're talking about BDW+ where the negate source modifier on logical operations performs a bitwise-NOT?

This doesn't seem like a task for CSE. I think you meant copy propagation.

Comment 2 Matt Turner 2015-10-01 22:33:27 UTC

Okay, talked to idr. The problem seems to be that the not(8) in his example can't be propagated into the and(8) because the types are signed...? I don't see any code in brw_fs_copy_propagate.cpp to handle this, so I don't think it's related to the argument types.

But yeah, we should be propagating NOT into logic operations.

Comment 3 Eero Tamminen 2016-07-12 10:57:18 UTC

Ian, is this still an issue, and if yes, could you attach an example shader?

Comment 4 Ian Romanick 2018-07-18 22:14:40 UTC

At the time, we generated intBitsToFloat(b & floatBitsToInt(1.0)) to do b2f conversions.  Since b could only be 0 or -1, ~b & 0x38f00000 produces the same possible values as -b & 0x37f00000.  However, we now emit float(-b) instead, so I don't think this can occur anymore.

Maybe generating ~b & 0x3f800000 (BDW+) or -b & 0x38f00000 (SNB - HSW) for float(!b) is better?  Right now there's a late optimization in nir_opt_algebraic that does this:

   (('b2f(is_used_more_than_once)', ('inot', a)), ('bcsel', a, 0.0, 1.0)),
   (('fneg(is_used_more_than_once)', ('b2f', ('inot', a))), ('bcsel', a, -0.0, -1.0)),

In at least some cases, this results in some extra load-immediate instructions

mov(16)         g16<1>D         1065353216D                     { align1 1H };
mov.nz.f0(16)   null<1>D        g8<8,8,1>D                      { align1 1H };
(-f0) sel(16)   g126<1>UD       g16<8,8,1>UD    0x00000000UD    { align1 1H };

Which also seems dumb.  This should be

mov.z.f0(16)    null<1>D        g8<8,8,1>D
(-f0) sel(16)   g126<1>F        g8<8,8,1>F     1.0F

Comment 5 Ian Romanick 2018-07-19 04:05:40 UTC

(In reply to Ian Romanick from comment #4)
> At the time, we generated intBitsToFloat(b & floatBitsToInt(1.0)) to do b2f
> conversions.  Since b could only be 0 or -1, ~b & 0x38f00000 produces the
> same possible values as -b & 0x37f00000.  However, we now emit float(-b)
> instead, so I don't think this can occur anymore.
> 
> Maybe generating ~b & 0x3f800000 (BDW+) or -b & 0x38f00000 (SNB - HSW) for
> float(!b) is better?  Right now there's a late optimization in
> nir_opt_algebraic that does this:
> 
>    (('b2f(is_used_more_than_once)', ('inot', a)), ('bcsel', a, 0.0, 1.0)),
>    (('fneg(is_used_more_than_once)', ('b2f', ('inot', a))), ('bcsel', a,
> -0.0, -1.0)),
> 
> In at least some cases, this results in some extra load-immediate
> instructions
> 
> mov(16)         g16<1>D         1065353216D
> mov.nz.f0(16)   null<1>D        g8<8,8,1>D
> (-f0) sel(16)   g126<1>UD       g16<8,8,1>UD    0x00000000UD
> 
> Which also seems dumb.  This should be
> 
> mov.z.f0(16)    null<1>D        g8<8,8,1>D
> (-f0) sel(16)   g126<1>F        g8<8,8,1>F     1.0F

Ignore this last part.  That's rubbish.

Comment 6 Ian Romanick 2019-04-17 22:36:26 UTC

I believe this patch series resolves all the issues that I wanted this bug to record:

1edf67fc3f6 intel/fs: Generate if instructions with inverted conditions
d40640efe8a nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))
7725d609387 intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))
cb3e21cd192 intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+
8eb36c91295 intel/fs: Emit logical-not of operands on Gen8+
06eaaf2de94 intel/fs: Refactor ALU source and destination handling to a separate function
fb3ca9109cb intel/fs: Handle OR source modifiers in algebraic optimization
c9d5bd050c3 intel/fs: Relax type matching rules in cmod propagation from MOV instructions
eae19f5f19f nir/algebraic: Replace i2b used by bcsel or if-statement with comparison

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.