Bug 79029

Summary: INTEL_DEBUG=shader_time is full of lies
Product: Mesa Reporter: Kenneth Graunke <kenneth>
Component: Drivers/DRI/i965Assignee: Kenneth Graunke <kenneth>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
i915 platform: i915 features:

Description Kenneth Graunke 2014-05-21 17:44:40 UTC
INTEL_DEBUG=shader_time reports that almost 0 time is spent in the vertex shader in a microbenchmark (OglTerrainFlyInst).  However, the FS is really simple, and the VS is pretty large.  I managed to cut roughly 50% of the VS instructions, and that roughly doubled the performance.  So clearly the VS is important.

Using performance monitoring, it looks like the VS actually takes up ~70% of the time.  So, shader_time is just full of lies.  It appears that it's undercounting the VS time in Unigine Valley as well.

I believe the timestamp register is getting reset in almost every VS, so we just count 0 most of the time.

Perhaps it can be improved?  Perhaps we need to develop something better?
Comment 1 Kenneth Graunke 2014-06-14 11:09:57 UTC

Comment 2 Kenneth Graunke 2014-06-15 23:53:26 UTC
This should pretty much be fixed by:

commit d0575d98fc595dcc17706dc73d1eb461027ca17a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Jun 14 03:53:07 2014 -0700

    i965/vec4: Fix dead code elimination for VGRFs of size > 1.
    When faced with code such as:
        mov vgrf31.0:UD, 960D
        mov vgrf31.1:UD, vgrf30.xxxx:UD
    The dead code eliminator didn't consider reg_offsets, so it decided that
    the second instruction was writing was writing to the same register as
    the first one, and eliminated the first one.  But they're actually
    different registers.
    This fixes INTEL_DEBUG=shader_time for vertex shaders.  In the above
    code, vgrf31.0 represents the offset into the shader_time buffer where
    the data should be written, and vgrf31.1 represents the actual time
    data.  With a completely undefined offset, results were...unexpected.
    I think this is probably one of the few cases (maybe only case) where we
    generate multiple MOVs to a large VGRF.  Normally, we just use them as
    texturing results; the other SEND-from-GRF uses a size 1 VGRF.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Matt Turner <mattst88@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org

It's at least now giving me data in the same ballpark as the other methods, though I haven't checked if it's entirely the same.  Thankfully it was something simple.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.