76862 – clamp with bounds inside [0, 1] generates slow code

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	glsl-compiler
Version:	git
Hardware:	Other All

Importance:	medium enhancement
Assignee:	Abdiel Janulgue
QA Contact:	Intel 3D Bugs Mailing List

Blocks:	i965-perf

Reported:	2014-03-31 21:39 UTC by Ian Romanick
Modified:	2014-09-01 22:54 UTC
CC List:	1 user

Description Ian Romanick 2014-03-31 21:39:51 UTC

Code sequences like

    a = clamp(b, 0, .5);
    c = clamp(d, .1, 1);

generate two instructions each (a MAX and a MIN).  However, each of these could be implemented as a single instruction with a SAT modifier.  The first example above should be

    MAX_SAT   a, b, .5;

and the second should be

    MIN_SAT   c, d, .1;

The requirements for this optimization are that the lower bound be zero and the upper bound less than 1, or the lower bound greater than zero and the upper bound equal to 1.

Comment 1 Matt Turner 2014-04-22 17:06:12 UTC

Are you thinking this should be implemented in src/glsl? There's no IR for saturate. We emit saturate modifiers for clamp(x, 0.0, 1.0) in the i965 backend.

I think the IR should grow an ir_unop_saturate operation.

Comment 2 Matt Turner 2014-06-04 01:45:11 UTC

Ian's description has bugs, so let me try again:

GLSL's clamp(A, B, C) clamps A to a lower bound of B and an upper bound of C. We implement this in the compiler with min() and max() operations: min(max(a, b), c).

i965 assembly for clamp(A, B, C) would look like

(select from A, B the argument that is greater than or equal; i.e., max)
sel.ge tmp, A, B
(select from tmp, C the argument that is less than; i.e., min)
sel.l  dst, tmp, C
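
As a rough model of the two-instruction sequence above, the following Python sketch mimics the two selects. The names sel_ge and sel_l are hypothetical stand-ins for the i965 instructions, not Mesa functions, and NaN handling is ignored.

```python
def sel_ge(a, b):
    """Model of sel.ge: pick the argument that is greater than or equal (max)."""
    return a if a >= b else b


def sel_l(a, b):
    """Model of sel.l: pick the argument that is less (min)."""
    return a if a < b else b


def clamp(x, lo, hi):
    """clamp(x, lo, hi) as min(max(x, lo), hi): two dependent selects."""
    tmp = sel_ge(x, lo)    # max(x, lo)
    return sel_l(tmp, hi)  # min(tmp, hi)
```

Every clamp in this model costs two selects, which is the cost the proposed optimization tries to halve.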

Saturate is a special case of clamp, specifically when the bounds are 0.0 to 1.0 (for floating point types). Probably all GPUs can perform saturate for free -- it's a destination modifier in i965 assembly.

The i965 backend's try_emit_saturate() function recognizes min(max(a, 0.0), 1.0) as a saturate operation, and sets the saturate modifier (or emits a MOV instruction with saturate).

i965 assembly for clamp(A, 0.0, 1.0) (after try_emit_saturate()) would turn into

mov.sat dst, A

The proposed optimization idea here is that for immediate arguments that satisfy the condition in comment #0, we can emit a single min/max instruction with a saturate modifier instead of a min and a max instruction.

So, clamp(A, 0, 0.5) would be

(select the least of A and 0.5, saturate the result, and store in dst)
sel.l.sat dst, A, 0.5

and similarly clamp(A, 0.1, 1) -> sel.ge.sat dst, A, 0.1.
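
To sanity-check the two rewrites, here is a small Python sketch that compares the single-select-plus-saturate forms against the plain clamp on a few sample values. The helper names are made up for illustration, saturate() stands in for the .sat destination modifier, and NaN behavior is not modeled.

```python
def saturate(x):
    """Clamp to [0.0, 1.0]; stands in for the .sat destination modifier."""
    return min(max(x, 0.0), 1.0)


def clamp(x, lo, hi):
    """Reference form: min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


def clamp_zero_lower(x, hi):
    """clamp(x, 0, hi) with hi < 1 as one min plus saturate (sel.l.sat)."""
    return saturate(min(x, hi))


def clamp_one_upper(x, lo):
    """clamp(x, lo, 1) with lo > 0 as one max plus saturate (sel.ge.sat)."""
    return saturate(max(x, lo))


samples = [-2.0, 0.0, 0.05, 0.1, 0.3, 0.5, 0.75, 1.0, 3.0]
for x in samples:
    assert clamp_zero_lower(x, 0.5) == clamp(x, 0.0, 0.5)
    assert clamp_one_upper(x, 0.1) == clamp(x, 0.1, 1.0)
```

In both cases the saturate supplies the bound that equals 0.0 or 1.0, so only the other bound needs an explicit select.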

Comment 3 Matt Turner 2014-06-04 02:32:26 UTC

I think the project will look like this:

== Add saturate(x) to the GLSL IR ==

 1) Add a new GLSL IR instruction, "ir_unop_saturate". See commit 499d7a7f as an example.

 2) Implement constant evaluation of saturate. See commit 9c04b8c2 as an example.

 3) Implement a lowering pass to lower saturate(a) to min(max(a, 0.0), 1.0) (to be used for old ARB vertex programs that cannot do saturate). See commit dafd0508 as an example.

 4) Update the backend visitors (ir_to_mesa.cpp, st_glsl_to_tgsi.cpp, brw_fs_visitor.cpp, brw_vec4_visitor.cpp) to handle ir_unop_saturate. All of these emit instructions that have a saturate flag that can be set, so emit a MOV with the saturate flag. See commit 84772629 as an example (for the i965 backends).

Note the small difficulty of not being able to use the saturate flag in certain ARB vertex program versions in ir_to_mesa and st_glsl_to_tgsi, noted in their try_emit_sat functions. For those, modify the call to lower_instructions() to include your new flag according to the condition at the top of try_emit_sat().

 5) Update src/glsl/ir_builder.cpp's saturate() function to use the new operation.

 6) Delete the try_emit_sat functions, since they'll no longer be useful.

This should be one patch series. Send it as soon as it's done.
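
For illustration only, steps 2 and 3 might behave like the Python sketch below. It uses a made-up tuple-based expression tree, not Mesa's actual ir_expression classes, and the function names evaluate and lower_saturate are hypothetical.

```python
def evaluate(expr, env):
    """Constant-evaluate a toy expression tree; env maps variable names to floats."""
    if isinstance(expr, (int, float)):
        return float(expr)
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "min":
        return min(vals)
    if op == "max":
        return max(vals)
    if op == "saturate":
        return min(max(vals[0], 0.0), 1.0)
    raise ValueError(f"unknown op: {op}")


def lower_saturate(expr):
    """Rewrite every saturate(a) as min(max(a, 0.0), 1.0), recursively."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [lower_saturate(a) for a in args]
    if op == "saturate":
        return ("min", ("max", args[0], 0.0), 1.0)
    return (op, *args)
```

For example, lower_saturate(("saturate", "a")) gives ("min", ("max", "a", 0.0), 1.0), which evaluates to the same value as the original expression for any "a".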

== Use saturate to optimize code ==

 1) Write some piglit tests for the last two cases below. Submit them as soon as they're ready.

 2) Extend opt_algebraic.cpp to recognize min(max(a, 0.0), 1.0) as saturate. Make sure tests/spec/glsl-1.10/execution/?s-saturate-*.shader_test pass.

 3) Extend opt_algebraic.cpp to recognize min(max(a, b), c) where (b == 0.0 and c < 1.0) and turn it into saturate(min(a, c)).

 4) Extend opt_algebraic.cpp to recognize min(max(a, b), c) where (b > 0.0 and c == 1.0) and turn it into saturate(max(a, b)).

With these two optimizations on top of the previously laid infrastructure, your piglit tests should generate fewer instructions.
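
A minimal sketch of the pattern matching in steps 2 through 4, again on a hypothetical tuple-based expression tree rather than the real opt_algebraic.cpp visitor. It only handles the operand order min(max(a, b), c) with constant b and c. The extra c >= 0.0 check is an assumption added here so the rewrite stays value-preserving for constants; the steps above state only c < 1.0.

```python
def is_const(e):
    """True for literal numeric operands in the toy expression tree."""
    return isinstance(e, (int, float))


def optimize_clamp(expr):
    """Rewrite min(max(a, b), c) using saturate where the bounds allow it."""
    if not (isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "min"):
        return expr
    _, inner, c = expr
    if not (isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "max"):
        return expr
    _, a, b = inner
    if not (is_const(b) and is_const(c)):
        return expr
    if b == 0.0 and c == 1.0:
        return ("saturate", a)              # step 2: plain saturate
    if b == 0.0 and 0.0 <= c < 1.0:         # lower check on c is an assumption
        return ("saturate", ("min", a, c))  # step 3
    if b > 0.0 and c == 1.0:
        return ("saturate", ("max", a, b))  # step 4
    return expr                             # bounds do not qualify; leave as-is
```

A clamp such as min(max(a, 0.1), 0.5) matches none of the cases and is returned unchanged, so it still costs two instructions.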

Comment 4 Matt Turner 2014-09-01 22:54:59 UTC

Abdiel's series has been committed.
