Summary: | Wrong and unstable image rendering from GLSL fragment shaders | ||
---|---|---|---|
Product: | Mesa | Reporter: | Hugues Evrard <h.evrard> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED INVALID | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | 0xe2.0x9a.0x9b, alastair.donaldson, chemecse, h.evrard, mattst88 |
Version: | 17.0 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
URL: | https://github.com/mc-imperial/shader-compiler-bugs/tree/master/Intel-i7-2600-Mesa-Linux/wrong_images/large-v100-webgl-9803840f5a05935c_inv_variant_77 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Archive with pair of shaders and their expected renderings
apitrace of variant shader |
Description
Hugues Evrard
2017-02-23 11:04:19 UTC
Some info on the system where the bug was found: CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz GPU: "2nd Generation Core Processor Family Integrated Graphics Controller" OS: Arch Linux: 4.9.6-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux Mesa: compiled from 17.0.0 source downloaded from website GCC version: gcc (GCC) 6.3.1 20170109 Created attachment 129905 [details]
apitrace of variant shader
For convenience, attaching a trace of the variant.
Very surprising that this breaks... the diff between shaders is
--- large-v100-webgl-9803840f5a05935c_inv_variant_77/original.frag 2017-02-23 05:18:29.000000000 -0500
+++ large-v100-webgl-9803840f5a05935c_inv_variant_77/variant.frag 2017-02-23 05:18:58.000000000 -0500
@@ -67,12 +70,18 @@
}
else
{
+ for(int i = 0; i < 1; i++) {
if((s2.x < s1.x || s1.x <= 0.0) && s2.x > 0.0)
{
hd.c = ca.y;
+ if(injectionSwitch.x > injectionSwitch.y)
+ {
+ break;
+ }
hd.d = s2.x;
hd.n = s2.yzw;
}
+ }
}
return hd;
}
(and injectionSwitch is set to 0,1 so that should never be true)
Tested on a SKL GT2, so it's not just some weird SNB problem.
Thanks for the report! I can reproduce on Haswell, in addition to your report on Sandybridge. I expect this means it will apply to all generations supported by the i965 driver. The first thing that pops out at me is in the NIR for the variant, we have: vec1 32 ssa_0 = undefined vec1 32 ssa_1 = undefined vec1 32 ssa_2 = undefined [snip] loop { block block_6: /* preds: block_5 block_15 */ vec1 32 ssa_118 = phi block_5: ssa_0, block_15: ssa_130 vec1 32 ssa_119 = phi block_5: ssa_1, block_15: ssa_131 vec1 32 ssa_120 = phi block_5: ssa_2, block_15: ssa_132 whereas there are no undefined SSA values in the original. Since the undefined values are flowing into the loop from the entrance, I believe this indeed indicates a bug in our control flow analysis in NIR. As an aside, for gallium-based drivers this seems to generate a checkerboard. (Tested with at least llvmpipe and softpipe.) Both the original and the variant. radeonsi renders original.png, so reassigning. I can confirm softpipe renders a checkerboard though. Interesting it would render an incorrect checkerboard both with softpipe and llvmpipe (as these typically don't share any bugs wrt shader execution which aren't shared elsewhere too). Is the shader NaN-safe? I think there's a potential issue in the original shader itself. The opU function has an out parameter HitData hd, which should probably be inout. Otherwise depending on some conditions the value returned will be undefined (as it gets initialized before in the calling function). Not sure if these conditions are actually hittable and matter here - I've looked at the tgsi code and indeed the tgsi code was different with this change, but it made no difference to the checkerboard pattern being rendered with llvmpipe. This is not a bug, the shader has undefined output. I've confirmed the path taken through the shader in the opU() function is always the else clause, then not satisfying the if there, so the returned HitData value is always undefined due to using out instead of inout for the hd parameter. Therefore the path taken in render() (using the undefined trc.d value) is completely undefined, and of course such undefined results can change randomly if injecting code which seemingly should have no effect. (As for the checkerboard pattern on llvmpipe/softpipe, I'm not sure but I've verified that with these drivers in the opU() function always the first if clause is taken - hence things are well defined but the result is different. Not sure why exactly the numbers in opU() are different, the shader could use float math which isn't quite stable given the inputs but whatever it's not connected to the shader injection and unless someone says otherwise I'm not convinced it's driver bugs neither.) (In reply to Roland Scheidegger from comment #8) Thanks a lot for the analysis. I agree with your assessment. Changing -HitData opU(out HitData hd, in vec4 s1, in vec4 s2, in vec2 ca) +HitData opU(inout HitData hd, in vec4 s1, in vec4 s2, in vec2 ca) indeed changes the shader program generated for original.frag (but not its output), and the same change to variant.frag "fixes" its output to match original.frag. (In reply to Matt Turner from comment #9) > > [...], and the same change to variant.frag "fixes" its output to match > original.frag. Thanks for the analysis indeed! This bug report comes from experiments on fuzzing shaders to chase bugs in graphics drivers. Unfortunately here the original shader that we fuzzed exposes undefined behavior in opU() as explained by Roland, we didn't spot that earlier. I'll try to confirm the fix suggested by Matt on our setup, but I'm currently travelling so it may take a few days. I confirme that changing the parameter to "inout" fixes the issue on our setup too. Thanks again for the analysis. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.