Bug 105271

Summary: WebGL2 shader crashes i965_dri.so 17.3.3
Product: Mesa
Reporter: Hugues Evrard <hugues.evrard>
Component: Drivers/DRI/i965
Assignee: Timothy Arceri <t_arceri>
Status: RESOLVED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: jason, kinetocore, mark.a.janes
Version: 17.3
Keywords: bisected, regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:
Bug Blocks: 104757

Description Hugues Evrard 2018-02-27 15:56:57 UTC
Running a WebGL2 shader through Firefox crashes the tab, and the crash originates from i965_dri.so. See the Mozilla crash details here:

https://crash-stats.mozilla.com/report/index/89138e8c-9743-4697-b67b-9185b1180227

dmesg indicates what seems like a division by zero:

[ 9949.151041] traps: Web Content[1889] trap divide error ip:7fcb84f87bd0 sp:7ffedcf28570 error:0 in i965_dri.so[7fcb84c48000+7ed000]
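
For readers decoding the trap line: `ip` is the faulting instruction address, and the bracketed pair after i965_dri.so is the library's load base and mapping size. Subtracting the base from `ip` gives an offset inside the library, which could be handed to a symbolizer such as addr2line, assuming matching debug symbols are installed. A minimal sketch of that arithmetic (the helper name `module_offset` is hypothetical, not part of any tool):

```c
#include <stdint.h>

/* Hypothetical helper: offset of a faulting address inside a mapped module.
 * Returns UINT64_MAX if ip does not fall inside [base, base + size). */
uint64_t module_offset(uint64_t ip, uint64_t base, uint64_t size)
{
    if (ip < base || ip - base >= size)
        return UINT64_MAX;
    return ip - base;
}
```

With the values from the log line above, module_offset(0x7fcb84f87bd0, 0x7fcb84c48000, 0x7ed000) evaluates to 0x33fbd0.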

For ease of reproduction, this web page will automatically run this WebGL shader:
http://hevrard.org/unlinked/graphicsfuzz/benchmark/android-v1/shader06.html

Setup:

CPU: i3-6100U
GPU: Intel HD 520
Mesa: 17.3.3 (as shipped in Debian testing)
Linux 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14)

This shader is part of the GraphicsFuzz demo, a small selection of shaders that trigger issues on most Android platforms. It did not use to cause much trouble on Mesa, but updating to Mesa 17.3.3 led to this crash. You can easily try the 15 tests of the online demo here:

http://www.graphicsfuzz.com/#demo
Comment 1 Mark Janes 2018-02-27 18:01:31 UTC
bisected to:

40e9f2f13847ddd94e1216088aa00456d7b02d2b is the first bad commit
commit 40e9f2f13847ddd94e1216088aa00456d7b02d2b
Author: Timothy Arceri <timothy.arceri@collabora.com>
Date:   Tue Dec 13 11:37:25 2016 +1100
    i965: disable loop unrolling in GLSL IR
    
    There is a single regression in loop unrolling which is:
    
    loops HURT:   shaders/orbital_explorer.shader_test GS SIMD8:    0 -> 1
    
    However the loop is huge so it seems reasonable not to unroll it. It's
    surprising that GLSL IR does unroll it.
    
    shader-db results BDW:
    
    total instructions in shared programs: 13037455 -> 13036947 (-0.00%)
    instructions in affected programs: 17982 -> 17474 (-2.83%)
    helped: 63
    HURT: 25
    
    total cycles in shared programs: 262217870 -> 262227990 (0.00%)
    cycles in affected programs: 2287046 -> 2297166 (0.44%)
    helped: 969
    HURT: 844
    
    total loops in shared programs: 2951 -> 2952 (0.03%)
    loops in affected programs: 0 -> 1
    helped: 0
    HURT: 1
    
    LOST:   0
    GAINED: 1
    
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Comment 2 Timothy Arceri 2018-02-28 02:07:00 UTC
This looks like a buggy shader to me.

Inside one of the loops it contains:

((v687 / (((true ? (0 + ((v687) + 0)) : (- 63176))

Where for the first iteration of the loop v687 == 0 so we end up with:

0 / 0

My guess is that GLSL IR just happened to optimise away this expression before constant folding tried to evaluate the constant expression.
Comment 3 Mark Janes 2018-02-28 02:41:41 UTC
Yes, the shader is intentionally buggy, to crash drivers that don't suitably check the input.  Mesa has taken a few patches that have come out of these fuzz tests in the past.
Comment 4 Timothy Arceri 2018-02-28 02:48:58 UTC
Ok.

From the GLSL 4.60 spec Section 5.9 (Expressions):

   "Dividing by zero does not cause an exception but does result in an unspecified value."

So it seems to be an existing bug with constant evaluation in NIR. Will see if I can figure out a fix.
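
To illustrate the point about constant evaluation: when a compiler folds an integer division on the host CPU, a zero divisor raises a hardware trap on x86 (the "divide error" seen in dmesg), even though GLSL only requires an unspecified result. The following is a minimal sketch of a guarded fold, not the actual Mesa patch; the name `fold_idiv` is hypothetical, and returning 0 is just one value the spec wording permits.

```c
#include <stdint.h>

/* Hypothetical constant-folding helper for signed 32-bit integer division.
 * Assumption: any result is acceptable for x / 0, so 0 is chosen. */
int32_t fold_idiv(int32_t a, int32_t b)
{
    /* Division by zero traps on the host; return an arbitrary value instead. */
    if (b == 0)
        return 0;
    /* INT32_MIN / -1 overflows and also traps on x86; avoid it explicitly. */
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;
    return a / b;
}
```

With such a guard, folding the 0 / 0 from comment 2 yields 0 instead of crashing the compiler.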
Comment 5 Timothy Arceri 2018-02-28 03:08:13 UTC
This should fix the crash:

https://patchwork.freedesktop.org/patch/207227/
Comment 6 Timothy Arceri 2018-02-28 03:55:01 UTC
V2 of fix:

https://patchwork.freedesktop.org/patch/207228/
Comment 7 Jason Ekstrand 2018-02-28 04:14:03 UTC
(In reply to Timothy Arceri from comment #6)
> 
> https://patchwork.freedesktop.org/patch/207228/

For some reason, that's not showing up in my e-mail.  R-B nonetheless.
Comment 8 Timothy Arceri 2018-02-28 04:57:32 UTC
Should be fixed by:

commit 0c1f37cc2d8555223ade73b244a3ee374be8d9cd
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Wed Feb 28 14:33:55 2018 +1100

    nir: fix interger divide by zero crash during constant folding
    
    From the GLSL 4.60 spec Section 5.9 (Expressions):
    
       "Dividing by zero does not cause an exception but does result in
        an unspecified value."
    
    Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"
    
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
Comment 9 Mark Janes 2018-02-28 05:20:24 UTC
Does anyone have an opinion on the ease and relative utility of covering this case with a piglit test?
Comment 10 Jason Ekstrand 2018-02-28 05:31:47 UTC
Getting a division by zero in constant propagation is trivial.  All we would have to do is write a simple shader_runner test.  Getting it to only trigger in NIR would be a bit trickier, but I think it could be done.  Now that NIR is the only thing doing loop unrolling, we could do something like this:

int j = 0;
for (int i = 0; i < 4; i++)
   j += 42/i;

gl_FragColor.x = 255.0f / float(i);

That should do the trick.
Comment 11 Jason Ekstrand 2018-02-28 05:32:20 UTC
(In reply to Jason Ekstrand from comment #10)
> int j = 0;
> for (int i = 0; i < 4; i++)
>    j += 42/i;
> 
> gl_FragColor.x = 255.0f / float(i);

That should be float(j)
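
As a rough host-side illustration of why this loop exercises the constant folder: once the loop is unrolled, each `42/i` becomes a division of two constants, and the first one is `42/0`. The sketch below mimics that in C under the assumption that the folder returns 0 for a zero divisor; the spec allows any value, so the final sum is only what this particular choice produces. The names `safe_idiv` and `unrolled_sum` are hypothetical.

```c
#include <stdint.h>

/* Assumed folding rule: x / 0 evaluates to 0 rather than trapping. */
static int32_t safe_idiv(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;
    return a / b;
}

/* Mirrors the shader loop: j accumulates 42/i for i = 0..3. */
int32_t unrolled_sum(void)
{
    int32_t j = 0;
    for (int32_t i = 0; i < 4; i++)
        j += safe_idiv(42, i);
    return j;
}
```

Under that assumption the terms are 0, 42, 21 and 14, so unrolled_sum() returns 77; a piglit test should not depend on this value, only on the compile not crashing.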
Comment 12 vadym 2018-02-28 14:03:01 UTC
*** Bug 104703 has been marked as a duplicate of this bug. ***
