|Summary:||INTEL_DEBUG=shader_time is full of lies|
|Product:||Mesa||Reporter:||Kenneth Graunke <kenneth>|
|Component:||Drivers/DRI/i965||Assignee:||Kenneth Graunke <kenneth>|
|Status:||RESOLVED FIXED||QA Contact:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|i915 platform:||i915 features:|
Description Kenneth Graunke 2014-05-21 17:44:40 UTC
INTEL_DEBUG=shader_time reports that almost 0 time is spent in the vertex shader in a microbenchmark (OglTerrainFlyInst). However, the FS is really simple, and the VS is pretty large. I managed to cut roughly 50% of the VS instructions, and that roughly doubled the performance. So clearly the VS is important. Using performance monitoring, it looks like the VS actually takes up ~70% of the time. So, shader_time is just full of lies. It appears that it's undercounting the VS time in Unigine Valley as well. I believe the timestamp register is getting reset in almost every VS, so we just count 0 most of the time. Perhaps it can be improved? Perhaps we need to develop something better?
Comment 1 Kenneth Graunke 2014-06-14 11:09:57 UTC
Comment 2 Kenneth Graunke 2014-06-15 23:53:26 UTC
This should pretty much be fixed by: commit d0575d98fc595dcc17706dc73d1eb461027ca17a Author: Kenneth Graunke <email@example.com> Date: Sat Jun 14 03:53:07 2014 -0700 i965/vec4: Fix dead code elimination for VGRFs of size > 1. When faced with code such as: mov vgrf31.0:UD, 960D mov vgrf31.1:UD, vgrf30.xxxx:UD The dead code eliminator didn't consider reg_offsets, so it decided that the second instruction was writing was writing to the same register as the first one, and eliminated the first one. But they're actually different registers. This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above code, vgrf31.0 represents the offset into the shader_time buffer where the data should be written, and vgrf31.1 represents the actual time data. With a completely undefined offset, results were...unexpected. I think this is probably one of the few cases (maybe only case) where we generate multiple MOVs to a large VGRF. Normally, we just use them as texturing results; the other SEND-from-GRF uses a size 1 VGRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029 Signed-off-by: Kenneth Graunke <firstname.lastname@example.org> Reviewed-by: Matt Turner <email@example.com> Cc: firstname.lastname@example.org It's at least now giving me data in the same ballpark as the other methods, though I haven't checked if it's entirely the same. Thankfully it was something simple.