Bug 89457

Summary: [BSW Bisected]ogles3conform ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails
Product: Mesa Reporter: lu hua <huax.lu>
Component: Drivers/DRI/i965Assignee: Kristian Høgsberg <krh>
Status: VERIFIED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: highest CC: idr, jani.saarinen
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: output

Description lu hua 2015-03-06 07:32:21 UTC
Created attachment 114071 [details]
output

System Environment:
--------------------------
Platform: BSW
Mesa:           (10.5)c32d83528189be988275c9ccbd3bbb2e48ee4362
Xserver:                (server-1.16-branch)xorg-server-1.16.4
Xf86_video_intel:               (master)2.99.917-165-gf6ba71ac194a19c80aa64f4769f76a47ebb7bd16
Cairo:          (master)70cc8f250b5669e757b4f044571ba0f71e3dea9e
Libva:          (master)f9741725839ea144e9a6a1827f74503ee39946c3
Libva_intel_driver:             (master)e8fde1cdaafb93c2b54d6092a728d099ad7cdd11

Bug detailed description:
-------------------------
It fails master branch long ago. It also fails on Mesa 10.5, but works well on 10.4 branch.

good commit: fecedb6c430e4dc2c8417989d873cc77d97fc1d6
bad commit: c32d83528189be988275c9ccbd3bbb2e48ee4362

Reproduce steps:
----------------
1. start X
2. ./glcts --deqp-case=ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert
Comment 1 lu hua 2015-03-12 06:53:12 UTC
Bisect shows: ee5fb8d1ba7f50ed94e1a34fa0f6e15a0588145e is the first bad commit
commit ee5fb8d1ba7f50ed94e1a34fa0f6e15a0588145e
Author:     Kristian Høgsberg <krh@bitplanet.net>
AuthorDate: Mon Oct 20 23:29:41 2014 -0700
Commit:     Kristian Høgsberg <krh@bitplanet.net>
CommitDate: Wed Dec 10 12:29:27 2014 -0800

    i965: Generate vs code using scalar backend for BDW+

    With everything in place, we can now use the scalar backend compiler for
    vertex shaders on BDW+.  We make scalar vertex shaders the default on
    BDW+ but add a new vec4vs debug option to force the vec4 backend.

    No piglit regressions.

    Performance impact is minimal, I see a ~1.5 improvement on the T-Rex
    GLBenchmark case, but in general it's in the noise.  Some of our
    internal synthetic, vs bounded benchmarks show great improvement, 20%-40%
    in some cases, but real-world cases are mostly unaffected.

    Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 2 lu hua 2015-03-12 06:53:28 UTC

*** This bug has been marked as a duplicate of bug 89438 ***
Comment 3 Tapani Pälli 2015-03-12 07:32:14 UTC
I don't see how this can be duplicate for bug #89438 , that bug is about performance regression, this bug is about failing functionality, therefore reopening this bug.
Comment 4 Marta Löfstedt 2015-03-24 11:49:47 UTC
The https://bugs.freedesktop.org/show_bug.cgi?id=89438 is for both BDW and BSW. 

Can it be confirmed that this bug is only for BSW?

This bug can't be reproduced on BDW; Mesa 10.6 git@2fd21d8
Comment 5 lu hua 2015-03-25 06:17:59 UTC
(In reply to Marta Löfstedt from comment #4)
> The https://bugs.freedesktop.org/show_bug.cgi?id=89438 is for both BDW and
> BSW. 
> 
> Can it be confirmed that this bug is only for BSW?
> 
> This bug can't be reproduced on BDW; Mesa 10.6 git@2fd21d8

It works well on BDW. Only happens on BSW.
Test run totals:
  Passed:        1/1 (100.00%)
  Failed:        0/1 (0.00%)
  Not supported: 0/1 (0.00%)
  Warnings:      0/1 (0.00%)
Comment 6 Ian Romanick 2015-04-09 18:40:39 UTC
There are several subtests listed in that .test file (2D, 2DArray, and Cube).  Do they all fail, or does just a subset fail?
Comment 7 Ian Romanick 2015-04-09 18:54:26 UTC
(In reply to Ian Romanick from comment #6)
> There are several subtests listed in that .test file (2D, 2DArray, and
> Cube).  Do they all fail, or does just a subset fail?

I guess I could have just read the output. :(

The really weird thing is that the values differ only in the alpha channel.  However, every path through shadow_execution_2D.vert writes 1.0 to alpha.

Someone will have to sift through the generated assembly... which should be fun since that shader has a giant switch statement.

Lu: Could you re-test with current master with INTEL_USE_NIR=yes and with INTEL_USE_NIR=no?  I'm curious whether that will change anything.
Comment 8 Ian Romanick 2015-04-09 19:03:47 UTC
Additional data point:  I cobbled together a .shader_test and generated VS assembly for PCI ID 0x162E (Broadwell) and PCI ID 0x22B0 (Cherryview), and the output was identical.

Lu: What is the PCI ID on your test system?
Comment 9 lu hua 2015-04-10 08:49:30 UTC
(In reply to Ian Romanick from comment #7)

> 
> Lu: Could you re-test with current master with INTEL_USE_NIR=yes and with
> INTEL_USE_NIR=no?  I'm curious whether that will change anything.

Ian, do you mean when build mesa with INTEL_USE_NIR=yes/no?
I build the latest master add --with-INTEL_USE_NIR=yes and --with-INTEL_USE_NIR=no, both they fail.


(In reply to Ian Romanick from comment #8)
> Additional data point:  I cobbled together a .shader_test and generated VS
> assembly for PCI ID 0x162E (Broadwell) and PCI ID 0x22B0 (Cherryview), and
> the output was identical.
> 
> Lu: What is the PCI ID on your test system?

PCI id: 0x2280 (rev 21)
Comment 10 Eero Tamminen 2015-04-10 12:31:49 UTC
(In reply to lu hua from comment #9)
> Ian, do you mean when build mesa with INTEL_USE_NIR=yes/no?
> I build the latest master add --with-INTEL_USE_NIR=yes and
> --with-INTEL_USE_NIR=no, both they fail.

You don't need to rebuild Mesa to use NIR, just set the indicated environment variable.
Comment 11 lu hua 2015-04-13 08:35:26 UTC
export INTEL_USE_NIR=yes or no, both they fail.
Comment 12 Marta Löfstedt 2015-04-13 10:08:37 UTC
I can confirm that the test fail on a BSW PCI_ID: 0x22B0 (Rev 15), both with and without INTEL_USE_NIR. 

(kernel 4.0.rc7 Mesa git: 5ddeab8a0 (Nightly build 174))

Looking at Kristians patch I noticed that setting INTEL_DEBUG=vec4vs will basically unset the new functionality and then the test will then pass.
Comment 13 Kristian Høgsberg 2015-04-14 22:11:03 UTC
It looks like we have a fix. The SIMD8 sampler_c message tries to compute the LOD, but that doesn't make sense for vertex shaders. Running this patch through piglit now:

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 06337c9..155010d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1839,6 +1839,15 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst,
       offset_value.file != BAD_FILE && offset_value.file != IMM;
    bool coordinate_done = false;
 
+   /* The sampler can only meaningfully compute LOD for fragment shader
+    * messages. For all other stages, we change the opcode to ir_txl and
+    * hardcode the LOD to 0.
+    */
+   if (stage != MESA_SHADER_FRAGMENT && op == ir_tex) {
+      op = ir_txl;
+      lod = fs_reg(0.0f);
+   }
+
    /* Set up the LOD info */
    switch (op) {
    case ir_tex:
Comment 14 Kenneth Graunke 2015-04-14 23:20:51 UTC
Hey Kristian,

Can you make the equivalent change in fs_visitor::nir_emit_texture() (brw_fs_nir.cpp) as well?  That way, we'll keep this fix with INTEL_USE_NIR=1.

Thanks :)
Comment 15 Kristian Høgsberg 2015-04-15 04:08:38 UTC
(In reply to Kenneth Graunke from comment #14)
> Hey Kristian,
> 
> Can you make the equivalent change in fs_visitor::nir_emit_texture()
> (brw_fs_nir.cpp) as well?  That way, we'll keep this fix with
> INTEL_USE_NIR=1.
> 
> Thanks :)

Oh yeah, good point. I took a look and nir uses fs_visitor::emit_texture() for emitting the texture operation so the fix is in the nir path as well.
Comment 16 Kristian Høgsberg 2015-04-16 16:28:13 UTC
Pushed to master:

commit 993a6288f72fa98932df7cdb6f64d9dd645e670d
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Tue Apr 14 15:02:18 2015 +0000

    i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders
    
    The ir_tex opcode turns into a sample or sample_c message, which will try to
    compute derivatives to determine the lod. This produces garbage for
    non-fragment shaders where the sample coordinates don't correspond to
    subspans.
    
    We fix this by rewriting the opcode from ir_tex to ir_txl and setting the
    lod to 0.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89457
    Cc: "10.5" <mesa-stable@lists.freedesktop.org>
    Signed-off-by: Kristian Høgsberg <kristian.h.kristensen@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Comment 17 ye.tian 2015-04-17 03:26:16 UTC
Tested on the latest mase (8638e3ae1), this issue does not exists.
Verified it.

output:
-------------

  Pass (Pass)

DONE!

Test run totals:
  Passed:        1/1 (100.00%)
  Failed:        0/1 (0.00%)
  Not supported: 0/1 (0.00%)
  Warnings:      0/1 (0.00%)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.