Bug 92196

Summary:	[BDW] ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.9 Fails
Product:	Mesa	Reporter:	Marta Löfstedt <marta.lofstedt>
Component:	Drivers/DRI/i965	Assignee:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Status:	RESOLVED MOVED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	normal
Priority:	medium
Version:	git
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:

Description Marta Löfstedt 2015-09-30 14:56:36 UTC

The OpenGL ES 3.0 CTS: ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.9 fails for BDW with Mesa 11.0.2 (git-51e0b06).

I have bisected the first problematic commit to:

commit 18e73bf7f8b12022e02db3230ee109657581900b
Author: Connor Abbott <cwabbott0@gmail.com>
Date:   Tue Jun 30 13:38:20 2015 -0700

    i965/fs: remove special case in setup_payload_interference()
    
    regs_read() will handle LINTERP for us since the previous commit. In
    addition, we were being too conservative, since it will only read 2
    registers on SIMD8.
    
    instructions in affected programs:     9061 -> 8893 (-1.85%)
    helped:                                10
    HURT:                                  0
    GAINED:                                0
    LOST:                                  0
    
    All of the changes were due to spills being eliminated, mostly in KSP
    shaders.
    
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>

With a revert of above commit the test pass. However, the revert cause regressions for other CTS tests. Also, the commit appear to be part of a cooperative patch-set from Jordan and Connor.

Potential relation to https://bugs.freedesktop.org/show_bug.cgi?id=92113 has not yet been proven.

Comment 1 Marta Löfstedt 2015-10-05 07:53:42 UTC

Please note that on newer version of the Khronos CTS tests this test has been renamed:
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.8

The Fail is reproduced on Mesa: git@102f6

Comment 2 Jason Ekstrand 2015-10-08 00:52:41 UTC

I did a little digging on this this afternoon.  I'm going to write down what I found here before I head home.  I don't know when I'll have time to poke at this again, so I want to leave notes for the next person.

I believe Connor's patch is a complete red herring; it only changes things because it causes register allocation to be subtly different on SIMD8.  Running the test, I noticed that if you run it with INTEL_DEBUG=no16, it fails and, before the window closes, it flashes yellow.  Looking at the shader it generates we see

> dEQP_FragColor = vec4(1.0, v_vtxResult, result, 1.0);

So a color of yellow means that v_vtxResult is fine, but result is 0.  This means that the failure is in one of the comparisons in the fragment shader.  In particular, the fault is not the vertex shader or nor is it in the interpolation of the vertex result.  I see two potential ways forward:

 1) Play with the dEQP test case and see if we can get a smaller failing case

 2) Step through the fragment shader in the simulator and see if we can see where the false comparison is coming from.

My first attempt would probably be number 2 but, as I said above, I don't know when I'll have time to look at it again so if someone else wants to give it a try, feel free.

Comment 3 Jason Ekstrand 2015-10-08 17:51:34 UTC

So, I did a little more investigation last night and Matt and I looked at it a bit this morning.  Here's what we found:

1) If you step through it in the simulator, the simulator produces "correct" (would pass the test) results.  I double and tripple-checked this and I can't get the simulator to give me yellow.

2) If you hack up could_coissue() in brw_fs_combine_constants.cpp, it generates the exact same code for HSW as it does for BDW but HSW passes.

3) We generate the exact same code for BSW as we do for BDW and BSW passes.

The above seems to rule out any sort of register allocation or scheduling errors.  Instead, it's almost certain that we're hitting some undocumented and unsimulated hardware bug.  Unfortunately, the shader (and hitting the bug) is very sensitive to register allocation and anything you do that changes register allocation is liable to make the shader not hit the bug.

This means that there are a pile of different things that one could do to make the shader go from pass to fail but none of them actually fix anything!  We need to be very careful when we claim that a patch fixes this bug to make sure that the patch actually does fix something and doesn't just hide it again.

Comment 4 Mark Janes 2015-11-02 17:37:33 UTC

This bug has been fixed by:

Author:     Connor Abbott <cwabbott0@gmail.com>
AuthorDate: Sat Jun 6 13:32:21 2015 -0400
Commit:     Connor Abbott <cwabbott0@gmail.com>
CommitDate: Fri Oct 30 02:19:00 2015 -0400

    i965: always run the post-RA scheduler
    
    Before, we would only do scheduling after register allocation if we
    spilled, despite the fact that the pre-RA scheduler was only supposed to
    be for register pressure and set the latencies of every instruction to
    1. This meant that unless we spilled, which we rarely do, then we never
    considered instruction latencies at all, and we usually never bothered
    to try and hide texture fetch latency. Although a later commit removes
    the setting the latency to 1 part, we still want to always run the
    post-RA scheduler since it's able to take the false dependencies that
    the register allocator creates into account, and it can be more
    aggressive than the pre-RA scheduler since it doesn't have to worry
    about register pressure at all.
    
    Test                   master      post-ra-sched     diff       %diff
    bench_OglPSBump2       396.730     402.386           5.656      +1.400%
    bench_OglPSBump8       244.370     247.591           3.221      +1.300%
    bench_OglPSPhong       241.117     242.002           0.885      +0.300%
    bench_OglPSPom         59.555      59.725            0.170      +0.200%
    bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
    bench_OglVSTangent     388.849     395.489           6.640      +1.700%
    bench_trex             65.471      65.862            0.390      +0.500%
    bench_trexoff          69.562      70.150            0.588      +0.800%
    bench_heaven           25.179      25.254            0.074      +0.200%
    
    Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>

Comment 5 Jason Ekstrand 2015-11-02 18:24:16 UTC

(In reply to Mark Janes from comment #4)
> This bug has been fixed by:
> 
> Author:     Connor Abbott <cwabbott0@gmail.com>
> AuthorDate: Sat Jun 6 13:32:21 2015 -0400
> Commit:     Connor Abbott <cwabbott0@gmail.com>
> CommitDate: Fri Oct 30 02:19:00 2015 -0400
> 
>     i965: always run the post-RA scheduler

Not fixed, just hidden.  It's a very obscure hardware bug and any sort of shuffling around of instructions is liable to hide it.  However, it's interesting that running post-RA scheduling changes things because that means it has nothing to do with register allocation.  I think this is more evidence of it being some sort of dependency-tracking bug in the hardware.

I'm re-opening this so we don't forget it's here.

Comment 6 GitLab Migration User 2019-09-25 18:54:46 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1496.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.