Bug 108856

Summary: floating-point-exceptions in gallium/auxiliary/tgsi/ functions
Product: Mesa Reporter: popinet
Component: Drivers/OSMesa Assignee: mesa-dev
Status: RESOLVED MOVED QA Contact: mesa-dev
Severity: normal    
Priority: medium    
Version: 18.2   
Hardware: Other   
OS: All   

Description popinet 2018-11-25 10:25:18 UTC
Calling the gallium osmesa implementation generates multiple floating-point exceptions.

For example the src/gallium/auxiliary/tgsi/tgsi_exec.c:554, micro_rsq() function contains the code:

   dst->f[0] = 1.0f / sqrtf(src->f[0]);

and is sometimes called with src->f[0] == 0.

This is an important issue for (scientific) codes which typically turn floating-point-exceptions on.

A workaround is to turn FPE off before each GL/OSmesa call and turn them back on after.

Note that the non-gallium OSMesa implementation does not have this issue.
Comment 1 Roland Scheidegger 2018-11-26 14:27:13 UTC
I do not think this is a mesa problem.
(Note I believe you could trigger fpe even without gallium osmesa, but you might need to try harder.)
As far as I can tell, the "workaround" is the correct solution. Do not call into libraries using non-default floating point environment. It is the responsibility of the caller to ensure this.
Comment 2 popinet 2018-11-26 14:50:21 UTC
Thanks for your quick reply.

IMHO a numerical algorithm/code which uses the results from undefined mathematical operations should be considered wrong.

In effect, I believe that the default floating-point environment you refer to should be FPE on. I realise that this would break a (probably large) number of numerical libraries, and is thus not possible, however this should not be an excuse to keep implementing libraries ignoring undefined operations.
Comment 3 Roland Scheidegger 2018-11-26 21:08:39 UTC
FWIW in the example you gave the result wouldn't even be undefined (it would be infinite).
Either that was an overflow exception, or what was happening there wasn't quite what you used in the example. That said, yes things like 0.0/0.0 can equally happen.
Generally, it is impossible for software rendering to avoid this, since operations with undefined results can come from the application itself; as a trivial example, a shader could do this directly. That might be quite different from scientific computing, but with rendering you never want to deal with FPEs (GPUs do not support exceptions at all, at least not when running graphics; they might for compute kernels). (The graphics APIs may actually define what happens with the resulting NaNs, for instance when you later try to convert them to integers, although in general with OpenGL the results are all undefined; they just must not crash.)
The default floating-point environment is defined by the ABI, not by what you or I think it should be (for my part, IMHO exceptions are a totally broken concept, but I understand that for what you're doing you might think otherwise...). FWIW if you use llvmpipe instead of softpipe with osmesa, we actually do mess with the floating point state (when processing draw commands) - but we only use this to always flush the useless (and I bet you disagree on that part too...) denormals to zero (we do this because GL doesn't care about denormals, D3D10 requires them to be flushed to zero, and they might be very slow). We would restore the floating point state on returning from the draw command, however. But even there, we don't mess with enabling/disabling exceptions, as it's simply not our responsibility; we're just fine with the default of all exceptions masked.
Comment 4 popinet 2018-11-27 07:41:25 UTC
Thanks for your feedback.

I understand we may have a different point-of-view due to different fields of application.

> But even there, we don't mess with enabling/disabling exceptions, as it's simply 
> not our responsibility, we're just fine with the default of all exceptions 
> masked.

I am sure you are fine with it, as are many other libraries. What I am saying is that turning them on may well help you find bugs that you could easily miss otherwise. At least this has been my experience over the years. It is a bit like using assert() and turning assertions on. If you are worried about performance, you can of course revert to the default in production.
Comment 5 GitLab Migration User 2019-09-18 20:13:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/886.
