Bug 77740 - i965: Relax accumulator dependency scheduling on Gen < 6
Summary: i965: Relax accumulator dependency scheduling on Gen < 6
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Iago Toral
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Blocks: i965-perf
  Show dependency treegraph
Reported: 2014-04-21 20:30 UTC by Matt Turner
Modified: 2014-05-14 05:34 UTC (History)
0 users

See Also:
i915 platform:
i915 features:

Patch (9.59 KB, patch)
2014-04-28 09:30 UTC, Iago Toral
Details | Splinter Review
Scheduling plan (before patch) (64.72 KB, text/plain)
2014-04-28 09:32 UTC, Iago Toral
Scheduling plan (after patch) (64.72 KB, text/plain)
2014-04-28 09:33 UTC, Iago Toral

Description Matt Turner 2014-04-21 20:30:45 UTC
Many instructions implicitly update the accumulator on Gen < 6. The instruction scheduling code just calls add_barrier_deps() for each accumulator access on these platforms, but a large class of operations don't actually update the accumulator -- mostly move and logical instructions.

Search the i965/GM45/ILK docs for "does not implicit" (they misspelled "implicitly" in about half of the cases...)

Teaching the scheduling code about this would allow more flexibility to schedule instructions.
Comment 1 Iago Toral 2014-04-24 08:24:55 UTC
There seems to be a 'writes_accumulator' flag that is set to TRUE for instructions that are known to write to the accumulator. The scheduling code currently checks this flag on Gen >= 6 to compute dependencies among instruction that read/write the accumulator.

For Gen < 6, there seems that for some very specific cases where this flag is set to TRUE even if the scheduling code is not going to use it. As far as I can see this is because the 'writes_accumulator' flag can also be used in the optimization passes to remove dead code (and we don't want to eliminate code that writes to the accumulator values that we intend to use).

So, I think for this task the 'writes_accumulator' flag alone is not enough: in the scheduling code we need to know if the instruction can write to the accumulator independently of whether we are expecting to use that value or not and in Gen < 6, as far as I can see, there is a bunch of operations that can write implicitly to the accumulator (mostly opcodes > 63).

I think the way to go about this would be to have:

bool instruction_backend::writes_accumulator_implicitly(int gen);

that we can call from the scheduling code to know if a particular instruction can write implicitly  to the accumulator. In Gen >= 6, it would simply return the value of 'writes_accumulator' but for Gen < 6 it would check also the opcodes.

I think with that operation in place we could unify the scheduling code paths for all Gens.

I'll write a patch for this but I will need help with the testing since I am running on Gen 7.
Comment 2 Iago Toral 2014-04-28 09:30:39 UTC
Created attachment 98117 [details] [review]

I managed to get an ironlake laptop so I could test the patch myself. I tested with a small program that implemented a simple lightning model demo and seems to work fine.

According to the debug traces in instruction_scheduler::schedule_instructions() I can see that the patch is indeed influencing instruction scheduling (I'll attach the traces).

Might need more thorough testing though, please let me know if there are specific tests that you would like me to run to verify the patch before sending it for review to the mailing list.
Comment 3 Iago Toral 2014-04-28 09:32:34 UTC
Created attachment 98122 [details]
Scheduling plan (before patch)

Scheduling plan before the patch for sample program
Comment 4 Iago Toral 2014-04-28 09:33:30 UTC
Created attachment 98123 [details]
Scheduling plan (after patch)

Scheduling plan after the patch for sample program
Comment 5 Iago Toral 2014-04-29 12:07:03 UTC
A piglit test run raises problems though so this still needs more work.
Comment 6 Iago Toral 2014-05-06 08:21:40 UTC
Fixed patch and sent it to the mailing list for review. Passes piglit tests on  ironlake and ivybridge but I could not test on other gens:


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.