Bug 79029 - INTEL_DEBUG=shader_time is full of lies
Summary: INTEL_DEBUG=shader_time is full of lies
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Reported: 2014-05-21 17:44 UTC by Kenneth Graunke
Modified: 2014-06-15 23:53 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Note You need to log in before you can comment on or make changes to this bug.
Description Kenneth Graunke 2014-05-21 17:44:40 UTC
INTEL_DEBUG=shader_time reports that almost 0 time is spent in the vertex shader in a microbenchmark (OglTerrainFlyInst).  However, the FS is really simple, and the VS is pretty large.  I managed to cut roughly 50% of the VS instructions, and that roughly doubled the performance.  So clearly the VS is important.

Using performance monitoring, it looks like the VS actually takes up ~70% of the time.  So, shader_time is just full of lies.  It appears that it's undercounting the VS time in Unigine Valley as well.

I believe the timestamp register is getting reset in almost every VS, so we just count 0 most of the time.

Perhaps it can be improved?  Perhaps we need to develop something better?
Comment 1 Kenneth Graunke 2014-06-14 11:09:57 UTC

Comment 2 Kenneth Graunke 2014-06-15 23:53:26 UTC
This should pretty much be fixed by:

commit d0575d98fc595dcc17706dc73d1eb461027ca17a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Jun 14 03:53:07 2014 -0700

    i965/vec4: Fix dead code elimination for VGRFs of size > 1.
    When faced with code such as:
        mov vgrf31.0:UD, 960D
        mov vgrf31.1:UD, vgrf30.xxxx:UD
    The dead code eliminator didn't consider reg_offsets, so it decided that
    the second instruction was writing was writing to the same register as
    the first one, and eliminated the first one.  But they're actually
    different registers.
    This fixes INTEL_DEBUG=shader_time for vertex shaders.  In the above
    code, vgrf31.0 represents the offset into the shader_time buffer where
    the data should be written, and vgrf31.1 represents the actual time
    data.  With a completely undefined offset, results were...unexpected.
    I think this is probably one of the few cases (maybe only case) where we
    generate multiple MOVs to a large VGRF.  Normally, we just use them as
    texturing results; the other SEND-from-GRF uses a size 1 VGRF.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Matt Turner <mattst88@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org

It's at least now giving me data in the same ballpark as the other methods, though I haven't checked if it's entirely the same.  Thankfully it was something simple.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.