Bug 92696 - [g965] Implement dependency workaround in vec4 backend
Summary: [g965] Implement dependency workaround in vec4 backend
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-27 19:26 UTC by Mark Janes
Modified: 2016-11-02 07:27 UTC (History)
4 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Before (14.51 KB, text/plain)
2015-10-27 22:49 UTC, Ian Romanick
Details
After (14.43 KB, text/plain)
2015-10-27 22:50 UTC, Ian Romanick
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Janes 2015-10-27 19:26:25 UTC
commit 2458ea95c5676807a064f24ec720f12506975402 regressed 8 GL conformance tests on g965:

Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Wed Sep 9 14:40:06 2015 -0700
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Tue Sep 15 12:38:07 2015 -0700

    nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
    
    The old pass blindly inserted a bunch of moves into the shader with no
    concern for whether or not it was really needed.  This adds code to try and
    coalesce into the destination of the instruction providing the value.
    
    Shader-db results for vec4 shaders on Haswell:
    
       total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
       instructions in affected programs:     231230 -> 224563 (-2.88%)
       helped:                                1017
       HURT:                                  2
    
    This approach is heavily based on a different patch by Eduardo Lima Mitev
    <elima@igalia.com>.  Eduardo's patch did this in a separate pass as opposed
    to integrating it into nir_lower_vec_to_movs.
    
    Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

Intel never submitted conformance results for g965, so I'm not sure if this bug is a blocker.

Regressions:
es2-cts.gtf.gl.cross.cross_vec3_vert_xvaryyconst
es2-cts.gtf.gl.exp.exp_float_vert_xvaryneg
es2-cts.gtf.gl.exp.exp_vec2_vert_xvaryneg
es2-cts.gtf.gl.exp2.exp2_float_vert_xvaryneg
es2-cts.gtf.gl.exp2.exp2_vec2_vert_xvaryneg
es2-cts.gtf.gl.faceforward.faceforward_vec3_vert_nvaryiconst
es2-cts.gtf.gl.reflect.reflect_vec3_vert_ivarynconst
es2-cts.gtf.gl.refract.refract_vec3_vert_ivarynconst


Sample Standard Output:

/tmp/build_root/m64/bin/cts/glcts --deqp-case=ES2-CTS.gtf.GL.exp.exp_float_vert_xvaryneg
dEQP Core GL-CTS-2.0 (0x0052484b) starting..
  target implementation = 'intel-gbm'

Test case 'ES2-CTS.gtf.GL.exp.exp_float_vert_xvaryneg'..
#+ GTF/GL/exp/exp_float_vert_xvaryneg.shader1.ppm and GTF/GL/exp/exp_float_vert_xvaryneg.shader2.ppm are different
  Fail (Fail)

DONE!

Test run totals:
  Passed:        0/1 (0.00%)
  Failed:        1/1 (100.00%)
  Not supported: 0/1 (0.00%)
  Warnings:      0/1 (0.00%)
Comment 1 Ian Romanick 2015-10-27 22:10:43 UTC
Did this only affect g965?  Since these are all vertex shader tests, it seems like it should have affected other vec4 vertex shader GPUs.  g965 and G45 did have additional limitations about the use of source modifiers with certain kinds of instructions, so it could be related to that.
Comment 2 Ian Romanick 2015-10-27 22:49:29 UTC
Created attachment 119235 [details]
Before

INTEL_NO_HW=true INTEL_DEVID_OVERRIDE=0x29A2 INTEL_DEBUG=vs mesa master ./glcts --deqp-case=ES2-CTS.gtf.GL.exp.exp_float_vert_xvaryneg > /tmp/exp_float_vert_xvaryneg-before.txt 2>&1
Comment 3 Ian Romanick 2015-10-27 22:50:00 UTC
Created attachment 119236 [details]
After

INTEL_NO_HW=true INTEL_DEVID_OVERRIDE=0x29A2 INTEL_DEBUG=vs mesa master ./glcts --deqp-case=ES2-CTS.gtf.GL.exp.exp_float_vert_xvaryneg > /tmp/exp_float_vert_xvaryneg-after.txt 2>&1
Comment 4 Mark Janes 2015-10-27 22:54:46 UTC
(In reply to Ian Romanick from comment #1)
> Did this only affect g965?  Since these are all vertex shader tests, it
> seems like it should have affected other vec4 vertex shader GPUs.  g965 and
> G45 did have additional limitations about the use of source modifiers with
> certain kinds of instructions, so it could be related to that.

The following platforms were tested:
bdw bsw byt g33 g45 g965 hsw ilk ivb skl snb

Only g965 fails these tests.

For example, g45:
http://otc-mesa-ci.jf.intel.com/job/Leeroy/229701/testReport/piglit.es2-cts.gtf.gl/exp/exp_float_vert_xvaryneg_g45m64/

The tests are reported daily, eg:
http://otc-mesa-ci.jf.intel.com/job/mesa_master_daily/1569/
Comment 5 Ian Romanick 2015-10-27 23:14:26 UTC
Each of those output files contains two shaders.  The first is the actual test, and the second generates the reference image.  That is why the tests say they pass... since it's not running on hardware, both just generate black.  Since they're the same, the test passes.

Looking at the diff (I like diff --side-by-side -W 220 exp_float_vert_xvaryneg-before.txt exp_float_vert_xvaryneg-after.txt), the only significant change I can see is the before assembly doesn't depend on any of the old values in the other components of the math result register:

mov(8)          m5<1>.yzUD      0x00000000UD                    { align16 NoDDClr };
mul(8)          g5<1>.xF        -g3<4,4,1>.xF   4.32808F        { align16 };
mov(8)          g7<1>UD         0x00000000UD                    { align16 };
mov(8)          m5<1>.wD        1065353216D                     { align16 NoDDChk };
send(8) 1       g5<1>.xF        g5<4,4,1>.xF
                            math exp mlen 1 rlen 1                          { align16 };
mov(8)          m5<1>.xD        g5<4,4,1>.xD                    { align16 };

And no other part of g5 is write before being written again.  I don't know of that actually matters.

Mark: There is a method to have the gles2conform tests dump result image files, but you have to actually have hardware to do that.  Would it be possible for you to collect just the "after" images for us?
Comment 6 Mark Janes 2015-10-27 23:24:45 UTC
I'll take a look.
Comment 7 Ian Romanick 2015-10-27 23:27:16 UTC
(In reply to Ian Romanick from comment #5)
> Mark: There is a method to have the gles2conform tests dump result image
> files, but you have to actually have hardware to do that.  Would it be
> possible for you to collect just the "after" images for us?

Never mind.

This is a known issue on g965 and 965gm.  There is no destination write hazard detection for send messages.  As a result, a sequence like

mov(8)          g5<1>.wD        1065353216D                     { align16 };
send(8) 1       g5<1>.xF        g6<4,4,1>.xF
                            math exp mlen 1 rlen 1                          { align16 };
mov(8)          m5<1>F          g5<4,4,1>F                      { align16 };

will cause the final mov to execute as soon as the first mov is complete... even though the send is not done.  The generator will just need to account for this.  This leads me to believe that this is an existing bug in the NIR vec4 backend, and Jason's change just caused these tests to tickle it.
Comment 8 Jason Ekstrand 2015-10-28 00:05:11 UTC
(In reply to Ian Romanick from comment #7)
> will cause the final mov to execute as soon as the first mov is complete...
> even though the send is not done.  The generator will just need to account
> for this.  This leads me to believe that this is an existing bug in the NIR
> vec4 backend, and Jason's change just caused these tests to tickle it.

It's actually a known bug in vec4 in general.  We have a piglit test that tickles it with the old compiler as well.  As matt said in the e-mail, we need send dependency work-arounds.  We already have them in the FS, it just needs to be ported over.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.