Bug 110507

Summary: [Regression] [Bisected] assert in fragment shader compilation when SIMD32 is enabled
Product: Mesa Reporter: Iago Toral <itoral>
Component: Drivers/DRI/i965Assignee: Rafael Antognolli <rafael.antognolli>
Status: VERIFIED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: currojerez, eero.t.tamminen, itoral, jason, mattst88, rafael.antognolli
Version: gitKeywords: bisected, regression
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Minimal SDL2 program that reproduces the problem

Description Iago Toral 2019-04-24 11:25:27 UTC
Created attachment 144082 [details]
Minimal SDL2 program that reproduces the problem

I filed this for i965, but the issue is in the shader compiler, so it also affects the Vulkan driver and, presumably, Iris.

The problem only happens when INTEL_DEBUG=do32 is used, in which case SDL2 generates a shader that hits the following assertion:

../src/intel/compiler/brw_fs.cpp:1745: void fs_visitor::assign_urb_setup(): Assertion `inst->src[i].offset < REG_SIZE / 2' failed.

I bisected the regression to this commit:

commit c0504569eac5e5c305e9f0c240e248aca9d8891f (HEAD -> master)
Author: Rafael Antognolli <rafael.antognolli@intel.com>
Date:   Fri Oct 19 15:44:15 2018 -0700

    intel/fs: Move the scalar-region conversion to the generator.
    
    Move the scalar-region conversion from the IR to the generator, so it
    doesn't affect the Gen11 path. We need the non-scalar regioning
    for a later lowering pass that we are adding.
    
    v2: Better commit message (Matt)
    
    Reviewed-by: Matt Turner <mattst88@gmail.com>

I am attaching a minimal SDL2 sample program that reproduces the problem.
Comment 1 Eero Tamminen 2019-06-12 13:33:19 UTC
*** Bug 110905 has been marked as a duplicate of this bug. ***
Comment 2 Eero Tamminen 2019-06-12 13:39:15 UTC
(In reply to Iago Toral from comment #0)
> I am attaching a minimal SDL2 sample program that reproduces the problem.

This is enough to reproduce: INTEL_DEBUG=do32 glxgears
Comment 3 Eero Tamminen 2019-06-12 13:45:16 UTC
Valgrind doesn't give any additional info, just abort backtrace (same as I got with systemd-coredump from Weston when enabling SIMD32 heuristics):
==5950== Process terminating with default action of signal 6 (SIGABRT): dumping core
==5950==    at 0x48CE6A5: raise (in /usr/lib64/libc-2.29.so)
==5950==    by 0x48AF860: abort (in /usr/lib64/libc-2.29.so)
==5950==    by 0x48AF736: ??? (in /usr/lib64/libc-2.29.so)
==5950==    by 0x48BF545: __assert_fail (in /usr/lib64/libc-2.29.so)
==5950==    by 0x6BEBC7E: fs_visitor::assign_urb_setup() (brw_fs.cpp:1745)
==5950==    by 0x6BFB847: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:7664)
==5950==    by 0x6BFD659: brw_compile_fs (brw_fs.cpp:8039)
==5950==    by 0x66ED3FB: brw_codegen_wm_prog (brw_wm.c:123)
==5950==    by 0x66EE44F: brw_fs_precompile (brw_wm.c:590)
==5950==    by 0x66DADD3: brw_shader_precompile (brw_link.cpp:56)
==5950==    by 0x66DADD3: brw_link_shader (brw_link.cpp:374)
==5950==    by 0x68E9798: _mesa_glsl_link_shader (ir_to_mesa.cpp:3170)
==5950==    by 0x6910023: create_new_program (ff_fragment_shader.cpp:1131)
==5950==    by 0x6910023: _mesa_get_fixed_func_fragment_program (ff_fragment_shader.cpp:1161)
Comment 4 Matt Turner 2019-09-01 22:27:45 UTC
Curro and/or Jason had a fix for this.
Comment 5 Jason Ekstrand 2019-09-02 03:13:53 UTC
Typed this up and tested it w/ glxgears:

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1839
Comment 6 Paul 2019-09-02 09:30:33 UTC
Hi guys
Just tested the issue with attached test and glxgears - both tests looks good, no core dumps with applied patch.
Comment 7 Francisco Jerez 2019-09-03 18:55:52 UTC
(In reply to Jason Ekstrand from comment #5)
> Typed this up and tested it w/ glxgears:
> 
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1839

Yes, I had a more complete branch with SIMD32 fixes, but go ahead and duplicate the rest of my work if you don't have anything else to do.
Comment 8 Mark Janes 2019-09-05 15:53:06 UTC
This will not affect users unless they are overriding the simd32 configuration for fragment shaders.  During our engineering meeting, we decided that this issue should not block the release.
Comment 9 Eero Tamminen 2019-09-06 07:37:16 UTC
(In reply to Mark Janes from comment #8)
> This will not affect users unless they are overriding the simd32
> configuration for fragment shaders.  During our engineering meeting, we
> decided that this issue should not block the release.

Should there be e.g. mesa-19.3 tracker where regressions that don't block current release would get moved?  (to make sure they don't fall between the cracks later on)
Comment 10 Jason Ekstrand 2019-09-06 19:33:12 UTC
Fixed by the following commit in master:

commit d15fe8ca8262d502435c4f83985ac414f950bc5f
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Sun Sep 1 22:12:07 2019 -0500

    Revert "intel/fs: Move the scalar-region conversion to the generator."
    
    This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f.  Now that
    we're doing interpolation lowering in NIR, we can continue to stride the
    FS input registers directly in the brw_fs_nir code like we did before.
    This fixes SIMD32 fragment shaders which broke because lower_simd_width
    depended on the 0 stride to split PLN instructions correctly.
    
    Reviewed-by: Francisco Jerez <currojerez@riseup.net>

I didn't CC stable because nothing currently requires SIMD32 FS there.
Comment 11 Eero Tamminen 2019-09-09 14:13:37 UTC
Tested with Iris.  Works fine without any asserts -> VERIFIED

Btw. With a simple patch to enable SIMD32 in most cases where it could make sense:
------------------------------------------------
-   /* Currently, the compiler only supports SIMD32 on SNB+ */
-   if (v8.max_dispatch_width >= 32 && !use_rep_send &&
+   /* Currently, the compiler only supports SIMD32 on SNB+
+    *
+    * Compile SIMD32 only if:
+    * - SIMD16 didn't fail (there are enough regs)
+    * - GEN6+ & 32-wide dispatch supported
+    * - there's only single RT (with MRT, perf is lower due to RCC write trashing)
+    * - or it's force enabled
+    */
+   if (!simd16_failed &&
+       v8.max_dispatch_width >= 32 &&
        compiler->devinfo->gen >= 6 &&
-       unlikely(INTEL_DEBUG & DEBUG_DO32)) {
+       !use_rep_send &&
+       (shader->info.outputs_written <= BITFIELD64_BIT(FRAG_RESULT_DATA0) ||
+	unlikely(INTEL_DEBUG & DEBUG_DO32))) {
------------------------------------------------

SIMD32 has approximately following perf impact with Iris...

On BXT J4205 (with 2-channel memory):
* +30-35% GfxBench ALU2
* +25% SynMark PSPom
* +5-10% GfxBench Manhattan 3.0, SynMark Deferred & ShMapVsm, GpuTest Julia FP32
* +5% GfxBench CarChase & T-Rex
* +4% SynMark PSPhong
* +2-3% GfxBench Manhattan 3.1, GpuTest FurMark
...
* -5-10% SynMark TexFilterTri [1]
* -20% SynMark DrvShComp

On SKL GT2:
* +20% GfxBench ALU2
* +7-8% SynMark PSPom
* +3-4% GfxBench CarChase & Manhattan 3.0
* +2% GfxBench T-Rex, SynMark PSPhong & Deferred
...
* -3% SynMark PSBump2
* -5% SynMark DrvRes
* -10-15% SynMark TexFilterTri [1], DrvShComp

[1] These TexFilterTri drops are clear regressions compared to earlier (1-2 years ago) measured impact of SIMD32 vs. SIMD16.

(And I'm a bit surprised that shader compilation speed test didn't regress more from compiling additional SIMD32 variant.)
Comment 12 Mark Janes 2019-09-09 20:23:44 UTC
Creating a 19.3 tracker before the branch point would generate a lot of bug spam.  With that mechanism, every new regression would be tracked, even when the vast majority would soon get fixed.

We use the following mechanisms to fill out the tracker after the branch point:

 - unfixed bugs with "bisected" or "regression" keywords
 - automated tests that changed status since the last release
 - developers and end users reviewing bugs

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.