Bug 66806

Summary:	[softpipe] glxgears floating point exception
Product:	Mesa	Reporter:	Vinson Lee <vlee>
Component:	Mesa core	Assignee:	mesa-dev
Status:	RESOLVED FIXED	QA Contact:
Severity:	critical
Priority:	medium	CC:	jfonseca, jsg, sroland, zackr
Version:	git	Keywords:	regression
Hardware:	x86-64 (AMD64)
OS:	Linux (All)
Whiteboard:
i915 platform:		i915 features:

Description Vinson Lee 2013-07-10 23:58:36 UTC

mesa: a06ee5a09e7072aab29abac7a3afe7b04b6d6ee6 (master)

Run glxgears on softpipe.

$ glxgears
Floating point exception (core dumped)

Core was generated by `glxgears'.
Program terminated with signal 8, Arithmetic exception.
#0  0x00007fe644cece5f in micro_mul (dst=0x7fff545deb50, src0=0x7fff545deb30, src1=0x7fff545deb40)
    at tgsi/tgsi_exec.c:983
983	   dst->f[0] = src0->f[0] * src1->f[0];
(gdb) bt
#0  0x00007fe644cece5f in micro_mul (dst=0x7fff545deb50, src0=0x7fff545deb30, src1=0x7fff545deb40)
    at tgsi/tgsi_exec.c:983
#1  0x00007fe644cf151c in exec_vector_binary (mach=0x7fe642910010, inst=0x19a30c0, 
    op=0x7fe644cece3f <micro_mul>, dst_datatype=TGSI_EXEC_DATA_FLOAT, src_datatype=TGSI_EXEC_DATA_FLOAT)
    at tgsi/tgsi_exec.c:2549
#2  0x00007fe644cf5048 in exec_instruction (mach=0x7fe642910010, inst=0x19a30c0, pc=0x7fff545deca4)
    at tgsi/tgsi_exec.c:3513
#3  0x00007fe644cf7ebc in tgsi_exec_machine_run (mach=0x7fe642910010) at tgsi/tgsi_exec.c:4366
#4  0x00007fe644cc69fa in vs_exec_run_linear (shader=0x19a1dd0, input=0x19a5e64, output=0x19a7764, 
    constants=0x18cbe88, const_size=0x18cbf88, count=80, input_stride=84, output_stride=84)
    at draw/draw_vs_exec.c:155
#5  0x00007fe644cba048 in draw_vertex_shader_run (vshader=0x19a1dd0, constants=0x18cbe88, const_size=0x18cbf88, 
    input_verts=0x7fff545dee00, output_verts=0x7fff545dee20) at draw/draw_pt_fetch_shade_pipeline.c:207
#6  0x00007fe644cba2b0 in fetch_pipeline_generic (middle=0x18d9300, fetch_info=0x0, in_prim_info=0x7fff545def50)
    at draw/draw_pt_fetch_shade_pipeline.c:266
#7  0x00007fe644cba68f in fetch_pipeline_linear_run (middle=0x18d9300, start=82, count=80, prim_flags=0)
    at draw/draw_pt_fetch_shade_pipeline.c:400
#8  0x00007fe644cc312c in vsplit_segment_simple_linear (vsplit=0x18d6710, flags=0, istart=82, icount=80)
    at draw/draw_pt_vsplit_tmp.h:240
#9  0x00007fe644cc3414 in vsplit_run_linear (frontend=0x18d6710, start=82, count=80) at draw/draw_split_tmp.h:60
#10 0x00007fe644cb5b29 in draw_pt_arrays (draw=0x18cb670, prim=7, start=82, count=80) at draw/draw_pt.c:149
#11 0x00007fe644cb68fb in draw_vbo (draw=0x18cb670, info=0x7fff545df0b0) at draw/draw_pt.c:560
#12 0x00007fe644dca6e0 in softpipe_draw_vbo (pipe=0x18324a0, info=0x7fff545df200) at sp_draw_arrays.c:128
#13 0x00007fe644c9c7da in cso_draw_vbo (cso=0x1966930, info=0x7fff545df200) at cso_cache/cso_context.c:1413
#14 0x00007fe644babaa2 in st_draw_vbo (ctx=0x18facc0, prims=0x1959580, nr_prims=2, ib=0x0, 
    index_bounds_valid=1 '\001', min_index=0, max_index=161, tfb_vertcount=0x0)
    at ../../src/mesa/state_tracker/st_draw.c:286
#15 0x00007fe644b8b95c in vbo_save_playback_vertex_list (ctx=0x18facc0, data=0x1958de8)
    at ../../src/mesa/vbo/vbo_save_draw.c:309
#16 0x00007fe644a29eb3 in ext_opcode_execute (ctx=0x18facc0, node=0x1958de0) at ../../src/mesa/main/dlist.c:598
#17 0x00007fe644a3f3f7 in execute_list (ctx=0x18facc0, list=1) at ../../src/mesa/main/dlist.c:7333
#18 0x00007fe644a45880 in _mesa_CallList (list=1) at ../../src/mesa/main/dlist.c:8733
#19 0x0000000000401a72 in ?? ()
#20 0x0000000000403176 in ?? ()
#21 0x00007fe647141ea5 in __libc_start_main (main=0x402aa0, argc=1, ubp_av=0x7fff545dfbf8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff545dfbe8)
    at libc-start.c:260
#22 0x00000000004017b9 in ?? ()
#23 0x00007fff545dfbe8 in ?? ()
#24 0x000000000000001c in ?? ()
#25 0x0000000000000001 in ?? ()
#26 0x00007fff545e0440 in ?? ()
#27 0x0000000000000000 in ?? ()


63386b2f66a6d450889cd5368bc599beb7f1efbf is the first bad commit
commit 63386b2f66a6d450889cd5368bc599beb7f1efbf
Author: Zack Rusin <zackr@vmware.com>
Date:   Mon Jul 8 23:45:55 2013 -0400

    util: treat denorm'ed floats like zero
    
    The D3D10 spec is very explicit about treatment of denorm floats and
    the behavior is exactly the same for them as it would be for -0 or
    +0. This makes our shading code match that behavior, since OpenGL
    doesn't care and on a few cpu's it's faster (worst case the same).
    Float16 conversions will likely break but we'll fix them in a follow
    up commit.
    
    Signed-off-by: Zack Rusin <zackr@vmware.com>
    Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

:040000 040000 a92f3a68cc7b1ea78d521df7ea88d347359d0e7f 6ef306940ae30adc64843a7943458d2e681891cc M	src
bisect run success

Comment 1 Roland Scheidegger 2013-07-11 01:18:56 UTC

I can't reproduce this here for some reason.
That said I think we should probably manipulate mxcsr a bit more, in particular we should explicitly disable all exceptions (they are disabled by default but app may enable them), since a shader can legitimately generate such an exception regardless of denorms, and certainly getting that exception reported is NOT the right answer.
(The other question is why there's an exception there in the first place for simple glxgears shader. IIRC there were some problems with viewport being all zero for the first draw call because some resize was missing or something like that for some of the demos a while ago but I would expect that to produce zeros not denorms.)

Comment 2 Jose Fonseca 2013-07-11 06:48:16 UTC

Indeed 
http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz

suggests to mask certain exceptions together with the FTZ/DAZ bits:

  // UNDERFLOWS
  set_mxcsr_on(FTZ_BIT);
  set_mxcsr_off(UNDERFLOW_EXCEPTION_MASK);
  make_denormal();
  clear_flags();


  // DENORMALS
  set_mxcsr_off(DAZ_BIT);
  set_mxcsr_on(DENORMAL_EXCEPTION_MASK);
  make_denormal();
  clear_flags();

So we should probably do the same.

Comment 3 Roland Scheidegger 2013-07-11 13:22:02 UTC

(In reply to comment #2)
> Indeed 
> http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-
> in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz
> 
> suggests to mask certain exceptions together with the FTZ/DAZ bits:
> So we should probably do the same.

Yes but I think we really should mask ALL exceptions regardless of denorms. A shader might easily do things causing exceptions (divide by zero or whatnot) and there's certainly no signaled exceptions in 3d graphics. I think we've just not seen issues because they are masked by default and no app unmasked them. Might be worth adding a piglit test for that.

Comment 4 Vinson Lee 2013-07-18 17:54:16 UTC

commit 63386b2f66a6d450889cd5368bc599beb7f1efbf also regresses ~4000 piglit tests on softpipe.

Comment 5 Roland Scheidegger 2013-07-23 02:02:25 UTC

Vinson,
since I can't reproduce this here could you do a "info registers mxcsr"? I'm curious what the exception mask actually is. Also, what are the src0->f[0] and src1->f[0] values?

Comment 6 Vinson Lee 2013-07-23 06:14:19 UTC

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff4d4b21f in micro_mul (dst=0x7fffffffd270, src0=0x7fffffffd250, 
    src1=0x7fffffffd260) at tgsi/tgsi_exec.c:983
983	   dst->f[0] = src0->f[0] * src1->f[0];
(gdb) info registers mxcsr
mxcsr          0x8060	[ PE DAZ FZ ]
(gdb) print src0
$1 = (const union tgsi_exec_channel *) 0x7fffffffd250
(gdb) print src1
$2 = (const union tgsi_exec_channel *) 0x7fffffffd260
(gdb) print src0->f
$3 = {3.6500001, 4.33659029, 4.29644442, 3.54915023}
(gdb) print src1->f
$4 = {4.33012676, 4.33012676, 4.33012676, 4.33012676}
(gdb) print dst
$5 = (union tgsi_exec_channel *) 0x7fffffffd270
(gdb) print dst->f
$6 = {1.08687624e-38, 2.80259693e-45, 9.03817331e-39, 0}

Comment 7 Roland Scheidegger 2013-07-23 13:38:23 UTC

(In reply to comment #6)
> (gdb) info registers mxcsr
> mxcsr          0x8060	[ PE DAZ FZ ]
Hmm that is just crazy, somehow all exceptions got unmasked and we get a precision exception (which is really useless as you get that with just about any floating point instruction).
Doesn't make sense to me since the code does:
unsigned fpstate = util_fpstate_get();
util_fpstate_set_denorms_to_zero(fpstate); 

which should preserve all already masked exceptions, as the latter simply adds some bits. Maybe somehow the compiler reorders things or something like that, though I'm pretty sure it shouldn't, or need to add some more keywords like volatile or whatnot, but I don't see anything obvious.

Comment 8 Roland Scheidegger 2013-07-23 16:27:10 UTC

Vinson,

this looks like a compiler issue to me, what compiler are you using? I suspect something funky might be going on when setting/getting mxcsr value. A disassembly of draw_vbo() where these are set might help (there should be some stmxcsr, doing some ors with the return value followed by a ldmxcsr with that value).
Another solution would be to just set the exception mask bits regardless what the value before was. Something like this:

diff --git a/src/gallium/auxiliary/util/u_math.c b/src/gallium/auxiliary/util/u_math.c
index f3fe392..8c394d9 100644
--- a/src/gallium/auxiliary/util/u_math.c
+++ b/src/gallium/auxiliary/util/u_math.c
@@ -110,7 +110,7 @@ util_fpstate_set_denorms_to_zero(unsigned current_mxcsr)
 #if defined(PIPE_ARCH_SSE)
    if (util_cpu_caps.has_sse) {
       /* Enable flush to zero mode */
-      current_mxcsr |= _MM_FLUSH_ZERO_MASK;
+      current_mxcsr |= _MM_FLUSH_ZERO_MASK | _MM_MASK_MASK;
       if (util_cpu_caps.has_sse3) {
          /* Enable denormals are zero mode */
          current_mxcsr |= _MM_DENORMALS_ZERO_MASK;

In fact could potentially skip getting the previously set values entirely and just set them to whatever we want (and setting back to some sane values at the end, as it seems all ABIs require the same bits set. But I'm not 100% certain the same bits are really right in all environments, so setting them back to what they were before might be safer (as long as it works...).

Comment 9 Vinson Lee 2013-07-23 17:30:45 UTC

I can reproduce this bug with gcc 4.4.7 on CentOS 6, gcc 4.6.3 on Ubuntu 12.04, gcc 4.7.3 on Ubuntu 13.04, and gcc 4.8.1 on Ubuntu 13.10.

Comment 10 Roland Scheidegger 2013-07-24 14:00:10 UTC

Fixed by 1e003b44e83dde3912ec48eb3df0e25802b5101e.

Comment 11 Jonathan Gray 2013-08-01 04:35:31 UTC

*** Bug 67587 has been marked as a duplicate of this bug. ***

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.