The following commit triggers many assertion failures on g965:
Author: Connor Abbott <firstname.lastname@example.org>
AuthorDate: Tue Jun 9 10:26:53 2015 -0700
Commit: Connor Abbott <email@example.com>
CommitDate: Fri Oct 30 02:19:43 2015 -0400
i965/sched: use liveness analysis for computing register pressure
Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.
This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.
Shader-db results on bdw:
total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
v2: use regs_read() in more places.
Reviewed-by: Jason Ekstrand <firstname.lastname@example.org>
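As a rough illustration of the model the commit message describes (not the actual i965 code), register pressure at each instruction can be derived from per-variable live ranges. The LiveRange struct and the register_pressure() function below are hypothetical names invented for this sketch. The real scheduler works from the results of liveness analysis and compute_payload_ranges(), whose data structures are not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one virtual register's live range.
struct LiveRange {
    int start;  // first instruction index at which the value is live
    int end;    // last instruction index at which the value is live (inclusive)
    int size;   // number of hardware registers the value occupies
};

// Returns the number of registers live at each instruction index.
std::vector<int> register_pressure(const std::vector<LiveRange> &ranges,
                                   int num_insts)
{
    // Difference array: +size where a range begins, -size just past its end.
    std::vector<int> delta(num_insts + 1, 0);
    for (const LiveRange &r : ranges) {
        // Bounds check in the same spirit as the `var < num_vars' assertion.
        assert(r.start >= 0 && r.start <= r.end && r.end < num_insts);
        delta[r.start] += r.size;
        delta[r.end + 1] -= r.size;
    }

    // A running sum over the difference array gives the pressure per instruction.
    std::vector<int> pressure(num_insts, 0);
    int live = 0;
    for (int ip = 0; ip < num_insts; ip++) {
        live += delta[ip];
        pressure[ip] = live;
    }
    return pressure;
}
```

Because scheduling only reorders instructions within a basic block, a sketch like this would only need the start and end points refreshed after scheduling, which is the speed-up the commit message mentions.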
The test regressions:
arb_sampler_objects-sampler-incomplete: /mnt/space/jenkins/jobs/Leeroy/workspace/repos/mesa/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp:112: void brw::fs_live_variables::setup_one_write(brw::block_data*, fs_inst*, int, const fs_reg&): Assertion `var < num_vars' failed.
glslparsertest: /mnt/space/jenkins/jobs/Leeroy/workspace@3/repos/mesa/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp:60: void brw::fs_live_variables::setup_one_read(brw::block_data*, fs_inst*, int, const fs_reg&): Assertion `var < num_vars' failed.
Additionally, this commit causes a large performance regression on g965. Even when running glslparsertest on a non-g965 machine (with INTEL_DEVID_OVERRIDE=0x29A2), the CPU tests take significantly longer.
I sent a patch that should fix the assertion failures. I'm not so sure about the performance regression, though -- to be clear, do you get it on other platforms without the INTEL_DEVID_OVERRIDE? There are some things that could be improved about the performance, since, as the commit message says, we're calculating the livein/liveout sets unnecessarily, but if that's the reason then I'd rather we not revert this patch just because of that.
The big performance regression was for g965. I noted it when I looked at one of the slower builds, where platforms other than g965 finished in 1-2 minutes but g965 finished in 15 minutes.
This behavior is not consistent. It may depend on the machine executing the tests (the slow runs seemed to be on BDW, but that observation is too unreliable to be worth spending time on).
I'll test your patch and see if it affects the failures and the perf.
I suspect it hurts the g965 machines more because they're just slower CPUs. Since the overhead /should/ just be added in app (or test) start up time (as opposed to during rendering), I wouldn't worry about it too much.
Just to be sure, someone ought to try a couple of the CPU-bound benchmarks that we have.
(In reply to Ian Romanick from comment #3)
> I suspect it hurts the g965 machines more because they're just slower CPUs.
> Since the overhead /should/ just be added in app (or test) start up time (as
> opposed to during rendering), I wouldn't worry about it too much.
It affects (affected) piglit. Piglit runtime on i965 on Jenkins went from 9 minutes to 33, noticeably slowing down overall Jenkins runtimes.
It seems to be fixed now, presumably by Connor's patch that fixed the assertion failure. It would be interesting to understand better what was going on.
I'll let Mark confirm and close the bug.