92744 – [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis

Bug 92744 - [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis

Summary: [g965 Regression bisected] Performance regression and piglit assertions due t...

Status:	CLOSED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965 (show other bugs)
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	highest blocker
Assignee:	Connor Abbott
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-10-30 19:20 UTC by Mark Janes
Modified:	2015-11-11 16:12 UTC (History)
CC List:	3 users (show)

See Also:
i915 platform:
i915 features:

Attachments

Description Mark Janes 2015-10-30 19:20:06 UTC

Many g965 assertions were generated due to the following commit:

Author:     Connor Abbott <cwabbott0@gmail.com>
AuthorDate: Tue Jun 9 10:26:53 2015 -0700
Commit:     Connor Abbott <cwabbott0@gmail.com>
CommitDate: Fri Oct 30 02:19:43 2015 -0400

    i965/sched: use liveness analysis for computing register pressure
    
    Previously, we were using some heuristics to try and detect when a write
    was about to begin a live range, or when a read was about to end a live
    range. We never used the liveness analysis information used by the
    register allocator, though, which meant that the scheduler's and the
    allocator's ideas of when a live range began and ended were different.
    Not only did this make our estimate of the register pressure benefit of
    scheduling an instruction wrong in some cases, but it was preventing us
    from knowing the actual register pressure when scheduling each
    instruction, which we want to have in order to switch to register
    pressure scheduling only when the register pressure is too high.
    
    This commit rewrites the register pressure tracking code to use the same
    model as our register allocator currently uses. We use the results of
    liveness analysis, as well as the compute_payload_ranges() function that
    we split out in the last commit. This means that we compute live ranges
    twice on each round through the register allocator, although we could
    speed it up by only recomputing the ranges and not the live in/live out
    sets after scheduling, since we only shuffle around instructions within
    a single basic block when we schedule.
    
    Shader-db results on bdw:
    
    total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
    instructions in affected programs: 1744 -> 1437 (-17.60%)
    helped: 1
    HURT: 1
    
    total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
    cycles in affected programs: 11338636 -> 11276736 (-0.55%)
    helped: 876
    HURT: 873
    
    LOST:   8
    GAINED: 0
    
    v2: use regs_read() in more places.
    
    Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>


The test regressions:

shaders.zero-tex-coord bias
shaders.zero-tex-coord texture2d
spec.!opengl 2_0.gl-2.0-active-sampler-conflict
spec.arb_framebuffer_object.fbo-drawbuffers-none glblitframebuffer
spec.arb_sampler_objects.sampler-incomplete
spec.arb_shader_texture_lod.compiler.tex_grad-texture2d-2d-vec2.frag
spec.arb_shader_texture_lod.compiler.tex_grad-texture2dproj-2d-vec4.frag
spec.arb_shader_texture_lod.compiler.tex_lod-texture1d-1d-float.frag
spec.arb_shader_texture_lod.compiler.tex_lod-texture1dproj-1d-vec2.frag
spec.arb_shader_texture_lod.compiler.tex_lod-texture1dproj-1d-vec4.frag
spec.arb_shader_texture_lod.compiler.tex_lod-texture2d-2d-vec2.frag
spec.arb_shader_texture_lod.compiler.tex_lod-texture3d-3d-vec3.frag
spec.ext_texture_array.maxlayers
spec.ext_texture_swizzle.depth_texture_mode_and_swizzle
spec.glsl-1_10.compiler.constant-expressions.sampler-array-index-01.frag
spec.glsl-es-1_00.compiler.structure-and-array-operations.sampler-array-index.frag


Sample output:

arb_sampler_objects-sampler-incomplete: /mnt/space/jenkins/jobs/Leeroy/workspace/repos/mesa/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp:112: void brw::fs_live_variables::setup_one_write(brw::block_data*, fs_inst*, int, const fs_reg&): Assertion `var < num_vars' failed.

glslparsertest: /mnt/space/jenkins/jobs/Leeroy/workspace@3/repos/mesa/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp:60: void brw::fs_live_variables::setup_one_read(brw::block_data*, fs_inst*, int, const fs_reg&): Assertion `var < num_vars' failed.

Additionally, this commit generates a large performance regression in g965.  Even when running glslparsertest on a non-g965 machine (INTEL_DEVID_OVERRIDE=0x29A2), cpu tests take significantly longer.

Comment 1 Connor Abbott 2015-10-30 22:23:22 UTC

I sent a patch that should fix the assertion failures. I'm not so sure about the performance regression, though -- to be clear, do you get it on other platforms without the INTEL_DEVID_OVERRIDE? There are some things that could be improved about the performance, since like the commit message says we're calculating the livein/liveout sets unnecessarily, but if that's the reason then I'd rather we not revert this patch just because of that.

Comment 2 Mark Janes 2015-10-31 00:20:56 UTC

The big performance regression was for g965.  I noted it when I looked at one of the slower builds, where platforms other than g965 finished in 1-2 minutes but g965 finished in 15 minutes.

This behavior is not consistent.  It may depend on the machine executing the tests (seemed like the slow runs were on BDW, but I wouldn't spend time on the low quality of this information).

I'll test your patch and see if it affects the failures and the perf.

Comment 3 Ian Romanick 2015-10-31 00:45:31 UTC

I suspect it hurts the g965 machines more because they're just slower CPUs.  Since the overhead /should/ just be added in app (or test) start up time (as opposed to during rendering), I wouldn't worry about it too much.

Just to be sure, someone ought to try a couple of the CPU-bound benchmarks that we have.

Comment 4 Matt Turner 2015-11-02 00:14:59 UTC

(In reply to Ian Romanick from comment #3)
> I suspect it hurts the g965 machines more because they're just slower CPUs. 
> Since the overhead /should/ just be added in app (or test) start up time (as
> opposed to during rendering), I wouldn't worry about it too much.

It affects (affected) piglit. Piglit runtime on i965 on Jenkins went from 9 minutes to 33, noticeably slowing down overall Jenkins runtimes.

It seems to be fixed now, presumably by Connor's patch that fixed the assertion failure. Would be interesting to understand more what was going on.

I'll let Mark confirm and close the bug.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.