Bug 90887 - PhiMovesPass in register allocator broken
Summary: PhiMovesPass in register allocator broken
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau (show other bugs)
Version: git
Hardware: All All
: medium normal
Assignee: Nouveau Project
QA Contact: Nouveau Project
URL:
Whiteboard:
Keywords:
: 75776 86356 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-06-07 19:32 UTC by jr
Modified: 2015-12-07 19:56 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Candidate Fix for nv50 PhiMovesPass (4.64 KB, text/plain)
2015-06-07 19:32 UTC, jr
Details
another attempted fix (2.92 KB, patch)
2015-06-17 23:32 UTC, Ilia Mirkin
Details | Splinter Review
another attempt (3.28 KB, patch)
2015-06-28 07:51 UTC, Ilia Mirkin
Details | Splinter Review
Prototype of a more highlevel graph modification api (5.24 KB, patch)
2015-07-12 18:41 UTC, jr
Details | Splinter Review
fs-critical-edge.shader_test (386 bytes, text/plain)
2015-08-20 22:23 UTC, Ilia Mirkin
Details
Edge type fix for prototype patch (3.32 KB, patch)
2015-08-22 15:08 UTC, jr
Details | Splinter Review
Patch to simplify test case creation (1.61 KB, patch)
2015-12-07 19:50 UTC, jr
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description jr 2015-06-07 19:32:04 UTC
Created attachment 116339 [details]
Candidate Fix for nv50 PhiMovesPass

The PhiMovesPass depends on strict correspondence between phi argument and incoming flow edge order. Unfortunately it destroys this correspondence itself in the needNewElseBlock case resulting in the wrong (potentially uninitialized) registers being copied.

The attached patch fixes problems in the following games on NVA5 (Debian Jessie, wine git): Costume Quest (with FXAA option), Lifeless Planet, and Eidolon. These are not present when using the LLVMPIPE driver.

I have analyzed the situation starting with Lifeless Planet (but simplifying the shader until I could handle the problem): the postprocessing shader modifies part of the rendered image with a shader like this

 1. sample the vincinity of the current position
 2. sample the current position
 2. compute something based on all of these pixels
 3. if some condition is reached
 4.     do some more computations (including TEXL)
 5.     replace current position value sampled above by computed value
 6. return current position value or computed value (see if)

The PhiMovesPass detects needNewElseBlock conditions and splits the 'else' edge incoming from the if thereby switching the order of incoming edges leading to adding copies from the sampled value to the if branch (discarding the computed values) and copies from the (unitialized) computed values to the new else block corrupting the result for all pixels except the ones where the condition is true (where it produces the originally sampled value).

The fix slightly modifies the needNewElseBlock logic to not detach/attach the incoming edge. Costume Quest (with FXAA) and Eidolon look correct with this fix. Lifeless Planet is much better but still somewhat bad, but that may be intended (it looks similarly crappy with LLVMPIPE). The latter may also be a problem with wine).

NB: This might be related to bug #90347. The shader code given there looks like it could trigger this case, but I haven't tried it.
Comment 1 jr 2015-06-07 19:36:13 UTC
This is the minified shader I did the debugging with.

translating program of type 4
FRAG
DCL IN[0], GENERIC[9], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0..2], LOCAL
IMM[0] FLT32 {    0.0000,     1.9632,     0.8750,     0.0000}
  0: MOV TEMP[0].xy, IN[0].xyyy
  1: MOV TEMP[0].w, IMM[0].xxxx
  2: TXL TEMP[0].xyz, TEMP[0], SAMP[0], 2D
  3: MOV TEMP[1].xyz, TEMP[0].xyzx
  4: MAD TEMP[0].x, TEMP[0].yyyy, IMM[0].yyyy, TEMP[0].xxxx
  5: FSLT TEMP[0].x, TEMP[0].xxxx, IMM[0].zzzz
  6: UIF TEMP[0].xxxx :0
  7:   MOV TEMP[0].xy, TEMP[2].xyyy
  8:   MOV TEMP[0].w, TEMP[2].zzzz
  9:   TXL TEMP[0].xyz, TEMP[0], SAMP[0], 2D
 10:   MOV TEMP[1].xyz, TEMP[0].zyxz
 11: ENDIF
 12: MOV TEMP[0].xyz, TEMP[1].xyzx
 13: MOV TEMP[0].w, IMM[0].xxxx
 14: MOV OUT[0], TEMP[0]
 15: END
Comment 2 jr 2015-06-07 19:42:52 UTC
Forgot to add: Interestingly without the additional blocks inserted by TXL in the if block the bug is not triggered (i.e. other tex operations work fine).
Comment 3 Ilia Mirkin 2015-06-07 20:33:35 UTC
Gah! The inbound edge ordering having to be the same as in phi nodes has bit me before as well. I wonder if it wouldn't be simpler to just fix that, i.e. make phi nodes attach a bb reference to each argument.

In any case, I'll be able to take a closer look in a week or so. Great detective work btw.

FTR, nva5 should be able to allow manual reclocking on kernel 3.19+ (if you add nouveau.pstate=1), which should help decrease some of the perf difference wrt blob.

Also, bug 90347 is completely different... that findFirstUse stuff is pretty dodgy in the first place, and it happens post-RA. (And only on kepler+)
Comment 4 jr 2015-06-16 11:28:57 UTC
Thanks for reviewing and fixing the other patch! Meant to send it to the list, but was too busy elsewhere last week.

I've been thinking a little about a better way to fix this inbound edge ordering dependency. Unfortunately it seems that unless there is a way to efficiently recompute the information there always will be an invariant that has to be respected by graph transformations (and which attach and detach simply cannot guarantee on their own).

And because the PhiMoves pass actually needs the corresponding inbound edge instead of the BB the current encoding makes sense.

The conclusion being that neither the encoding of the information nor graph transformations during SSA per se are the real problem. But attach and detach are not the right tools to do graph transformations (at least during SSA). It may be an improvement to add some more high level helpers, e.g. an Edge::split for this particular problem. Would there be interest in a patch in that direction? (Seems easy enough, but I cannot promise I'm up to the task, though:-)
Comment 5 Ilia Mirkin 2015-06-17 04:44:08 UTC
(In reply to jr from comment #4)
> I've been thinking a little about a better way to fix this inbound edge
> ordering dependency. Unfortunately it seems that unless there is a way to
> efficiently recompute the information there always will be an invariant that
> has to be respected by graph transformations (and which attach and detach
> simply cannot guarantee on their own).

The way that LLVM deals with this is that the source actually contains a reference to the inbound bb as well. That way you can do whatever you want with the edges, each src of a phi node knows which bb it's coming from.

It'd be a bit of a pain to introduce here... my strategy so far has been to hold my nose and just fix things up as they come, which I guess is what you've done as well. I need to look at this logic very carefully though.

The fact that we were previously adding a FORWARD edge and aren't anymore is a bit worrying -- the meaning of those things is explained somewhere btw -- TREE == part of the MST; FORWARD/etc are various other edges.
Comment 6 Ilia Mirkin 2015-06-17 14:06:27 UTC
So... I looked at this closer and from the *looks* of it, the current code is correct. It fixes up all the phi nodes in bb, which is the block whose inbound edges got all messed up.

I will analyze your sample shader in more detail, hopefully it will become apparent to me why the current logic fails, despite *looking* right :)
Comment 7 Ilia Mirkin 2015-06-17 15:40:27 UTC
OK, so... as I suspected, the phi nodes are actually fine. I added this to the nv50 printer:

   for (Graph::EdgeIterator ei = bb->cfg.incident(); !ei.end(); ei.next())
      INFO(" <- BB:%i (%s)\n",
           BasicBlock::get(ei.getNode())->getId(),
           ei.getEdge()->typeStr());

Which makes it much easier to look at things. So right before RA we have:

 15: texlod 2D $r0 $s0 f32 { %r52 %r53 %r54 } %r45 %r47 %r49 (0)
BB:5 (7 instructions) - idom = BB:4, df = { }
 <- BB:4 (tree)
 -> BB:3 (forward)
 -> BB:2 (tree)
 16: join (0)
 17: mov u32 %r61 0x3ffb4a23 (0)
 18: mad f32 %r62 %r53 %r61 %r52 (0)
 19: mov u32 %r64 0x3f600000 (0)
 20: joinat BB:3 (0)
 21: set u8 %c68 lt f32 %r62 %r64 (0)
 22: eq %c68 bra BB:3 (0)
...
BB:9 (1 instructions) - idom = BB:2, df = { BB:3 }
 <- BB:13 (forward)
 <- BB:12 (forward)
 <- BB:11 (forward)
 <- BB:2 (forward)
 -> BB:10 (tree)
 32: texlod 2D $r0 $s0 f32 { %r76 %r77 %r78 } %r73 %r73 %r73 (0)
BB:10 (2 instructions) - idom = BB:9, df = { BB:3 }
 <- BB:9 (tree)
 -> BB:3 (forward)
 33: join ind BB:0 (0)
 34: bra BB:3 (0)
BB:3 (5 instructions) - idom = BB:5, df = { }
 <- BB:10 (forward)
 <- BB:5 (forward)
 -> BB:1 (tree)
 35: phi u32 %r88 %r78 %r52 (0)
 36: phi u32 %r89 %r77 %r53 (0)
 37: phi u32 %r90 %r76 %r54 (0)
 38: join (0)

Which is perfectly correct. Although it *is* a bit odd that BB:3 doesn't have any incoming TREE edges. Definitely a bit worrying. But let's hope that the thing doesn't ACTUALLY care about that, and that TREE is merely an optimization and it'd be just as happy with s/tree/forward everywhere.

And then after the various RA passes run:

 20: texlod 2D $r0 $s0 f32 %r107t %r108t (0)
 21: split b96 { %r52 %r53 %r54 } %r107t (0)
 22: bra BB:5 (0)
BB:5 (7 instructions) - idom = BB:4, df = { }
 <- BB:4 (tree)
 -> BB:17 (tree)
 -> BB:2 (tree)
 23: join ind BB:0 (0)
 24: mov u32 %r61 0x3ffb4a23 (0)
 25: mad f32 %r62 %r53 %r61 %r52 (0)
 26: mov u32 %r64 0x3f600000 (0)
 27: joinat BB:3 (0)
 28: set u8 %c68 lt f32 %r62 %r64 (0)
 29: eq %c68 bra BB:17 (0)
...
BB:9 (7 instructions) - idom = BB:2, df = { BB:3 }
 <- BB:20 (forward)
 <- BB:19 (forward)
 <- BB:18 (forward)
 <- BB:13 (forward)
 -> BB:10 (tree)
 42: mov u32 %r112 %r73 (0)
 43: mov u32 %r113 %r73 (0)
 44: mov u32 %r114 %r73 (0)
 45: merge b96 %r110t %r112 %r113 %r114 (0)
 46: texlod 2D $r0 $s0 f32 %r109t %r110t (0)
 47: split b96 { %r76 %r77 %r78 } %r109t (0)
 48: bra BB:10 (0)
BB:10 (5 instructions) - idom = BB:9, df = { BB:3 }
 <- BB:9 (tree)
 -> BB:3 (forward)
 49: join ind BB:0 (0)
 50: mov u32 %r118 %r52 (0)
 51: mov u32 %r119 %r53 (0)
 52: mov u32 %r120 %r54 (0)
 53: bra BB:3 (0)
BB:17 (4 instructions) - df = { }
 <- BB:5 (tree)
 -> BB:3 (forward)
 54: mov u32 %r115 %r78 (0)
 55: mov u32 %r116 %r77 (0)
 56: mov u32 %r117 %r76 (0)
 57: bra BB:3 (0)
BB:3 (6 instructions) - idom = BB:5, df = { }
 <- BB:17 (forward)
 <- BB:10 (forward)
 -> BB:1 (tree)
 58: phi u32 %r88 %r115 %r118 (0)
 59: phi u32 %r89 %r116 %r119 (0)
 60: phi u32 %r90 %r117 %r120 (0)
 61: join (0)

So..... the phi nodes are fine :) %r115 does indeed come from BB:17. HOWEVER! Note that the BB10/BB17 stuff is reversed! %r78 comes from BB:9, but the incident edge into BB:17 is BB:5. Same for BB:10. So the issue is somewhere in there.

Perhaps that's what you were saying all along and I was too dense to understand it? It picks the wrong mov source because the incident edge order is different now?

[As an aside, this whole quadop thing is *soooo* unnecessary. It's only needed for non-uniform lod/bias values, and in this case, they're clearly uniform. However this uniformity check is done pre-ssa, since doing the bb splitting post-ssa would be a huge pain. Ugh!]
Comment 8 jr 2015-06-17 18:59:38 UTC
Note that I really do have no clue of the whole system and only a very rough understanding of SSA, but I think your result is the same as mine. The targets of the newly inserted mov instructions are fine, because the PhiMoves pass inserts them corresponding to mthe new phi sources, but the sources are reversed, because the edges have been reordered (ironically by the PhiMoves pass itself).

An edge newly attached using attach will always be the first in iteration order. Therefore the problem happens as follows: when - as is the case here - the second edge fulfills the needsNewElseBlock() condition it will be split to insert the new BB for the copies. The old code accomplishes this using detach() and attach() resulting in the new BB being attached - BB17 here - by the first incoming edge and the previous first incoming edge - from BB10 in this case - is now the second, but without a corresponding reorder of the phi node sources.

AFAICT the problem is that attach() or detach() don't preserve the incoming edge/phi source correspondence. They shouldn't be used during SSA.
Comment 9 Ilia Mirkin 2015-06-17 23:32:16 UTC
Created attachment 116566 [details] [review]
another attempted fix

ok, this is my attempt at fixing this. upon a very quick perusal, it seems like it works for the sample program.

i like this a little more not only because i wrote it, but also because it avoids touching code that i barely understand and seems very fragile (including the graph code).

could you give this a shot and see if it also fixes your issues?
Comment 10 jr 2015-06-18 18:13:35 UTC
I can confirm that your patch fixes all occurences of the problem I've found so far (Lifeless Planet, Eidolon, Costume Quest with FXAA option, Two Worlds 2 sky texture).

I slightly prefer keeping the edge order intact, as this feels 'more correct' to me (which admittedly doesn't mean much) and allows to collapse the separate loops into one (which is only a micro optimization preventing a little bit of alloc/free). But since I cannot promise to have time to debug and fix problems with my patch (though I expect to be reachable by this email address for the forseeable future), it is best to choose the approach you feel most comfortable with.

I'm happy either way and very grateful that you could spend time on this issue. Thanks!
Comment 11 Ilia Mirkin 2015-06-18 19:19:59 UTC
Out of curiousity, could I trouble you to test out the trace in

https://bugs.freedesktop.org/show_bug.cgi?id=75776#c2

And see if it was this same problem (I don't have a nv50-family card plugged in and suck at rebooting)? By the way, it's really impressive that you were able to figure all of this out on your own; nouveau's a huge code-base, compilers are complicated, and the error was pretty subtle. If you're interested in further contributing to nouveau, join us at #nouveau on irc.freenode.net .
Comment 12 jr 2015-06-18 20:52:23 UTC
Thanks for pointing at the bug. I looked at it before filing this one. Cannot say now why I discarded it as a potential duplicate without testing the trace.

You'd be less impressed if I told you how long it took:-) But I'm probably better at debugging old than writing new code. And it did help that I do have some experience with compilers even if that was decades ago (when SSA was a new thing). Unfortunately after I changed jobs 7 years ago I had to drop pretty much all my free software activities due to severe lack of energy. I do miss that time, though. Feel free to mail me, if you want me to look at something.
Comment 13 Béla Gyebrószki 2015-06-22 12:24:02 UTC
I tried the patch from comment #9 and it indeed fixed the rendering issues in the games like Two Worlds 2, Trine: Enchanted Edition, Trine 2, XCOM:Enemy Unknown, Stacking (running in Wine).
The patch greatly improves image quality but doesn't fix the rendering issues completely in Sonic Generations or in The Raven: Legacy of a Master Thief, see the screenshot for comparison between unmodified and patched nouveau:
https://drive.google.com/open?id=0B-tTbLKBl-tOdldDb1pRNENnYUk

VGA compatible controller: NVIDIA Corporation G92 [GeForce GTS 250] (rev a2)
Mesa 3D 10.6-branchpoint-541-g104bff0
Comment 14 Ilia Mirkin 2015-06-28 06:17:58 UTC
OK, so among other changes, commit e43a3a66 had the following diff:

@@ -312,23 +337,26 @@ RegAlloc::PhiMovesPass::visit(BasicBlock *bb)
    Instruction *phi, *mov;
    BasicBlock *pb, *pn;
 
+   std::stack<BasicBlock *> stack;
+
    for (Graph::EdgeIterator ei = bb->cfg.incident(); !ei.end(); ei.next()) {
-      pb = pn = BasicBlock::get(ei.getNode());
+      pb = BasicBlock::get(ei.getNode());
       assert(pb);
-
-      if (needNewElseBlock(bb, pb)) {
-         pn = new BasicBlock(func);
-
-         // deletes an edge, iterator is invalid after this:
-         pb->cfg.detach(&bb->cfg);
-         pb->cfg.attach(&pn->cfg, Graph::Edge::TREE);
-         pn->cfg.attach(&bb->cfg, Graph::Edge::FORWARD); // XXX: check order !
-
-         assert(pb->getExit()->op != OP_CALL);
-         if (pb->getExit()->asFlow()->target.bb == bb)
-            pb->getExit()->asFlow()->target.bb = pn;
-         break;
-      }
+      if (needNewElseBlock(bb, pb))
+         stack.push(pb);
+   }
+   while (!stack.empty()) {
+      pb = stack.top();
+      pn = new BasicBlock(func);
+      stack.pop();
+
+      pb->cfg.detach(&bb->cfg);
+      pb->cfg.attach(&pn->cfg, Graph::Edge::TREE);
+      pn->cfg.attach(&bb->cfg, Graph::Edge::FORWARD);
+
+      assert(pb->getExit()->op != OP_CALL);
+      if (pb->getExit()->asFlow()->target.bb == bb)
+         pb->getExit()->asFlow()->target.bb = pn;
    }
 
    // insert MOVs (phi->src(j) should stem from j-th in-BB)

which the patch in attachment 116339 [details] effectively undoes. [Why was that in there? No idea. But presumably futzing with bb's affects needNewElseBlock's decisions.] So I'm still tending towards my solution. But I'm not particularly happy that it has to build up and use that hash table every time even if nothing's changed.
Comment 15 Ilia Mirkin 2015-06-28 06:23:41 UTC
*** Bug 86356 has been marked as a duplicate of this bug. ***
Comment 16 Ilia Mirkin 2015-06-28 06:25:33 UTC
*** Bug 75776 has been marked as a duplicate of this bug. ***
Comment 17 Ilia Mirkin 2015-06-28 07:51:09 UTC
Created attachment 116761 [details] [review]
another attempt

OK, I like this a bit better since it doesn't use any weird maps, and incurs minimal costs in the common case. It unfortunately encodes the graph's insertion strategy, but... meh.
Comment 18 Ilia Mirkin 2015-06-28 08:37:02 UTC
(In reply to Ilia Mirkin from comment #17)
> Created attachment 116761 [details] [review] [review]
> another attempt
> 
> OK, I like this a bit better since it doesn't use any weird maps, and incurs
> minimal costs in the common case. It unfortunately encodes the graph's
> insertion strategy, but... meh.

Ugh, actually this patch won't work. It needs to be

         phi->setSrc(j, tmp);

but then I lose the "old" phi source. So perhaps the map approach is ~required :(
Comment 19 jr 2015-06-29 21:43:49 UTC
I do prefer the map based approach over this last attempt. Keeping track of the relevant BB is quite obviously correct, but trying to keep track of edge reordering instead looks like some kind of thimbleriggery to me:-)

AFAICT the separate loop added by commit e43a3a66 was an attempt to fix the problem. Attach() and detach() influenced the iterator ei of the loop, possibly creating crashes or endless loops. These symptoms are corrected by the patch, but the real problem - reordering of the edge linked list - is still not accounted for. Your patch shows how to do that correctly.

Since my attempt doesn't change the incoming edge list, the separate loop becomes obsolete as a bonus feature. I do like that property, but I don't think it makes much of a difference in practice (since the lists in question are very short).
Comment 20 Ilia Mirkin 2015-06-29 23:06:35 UTC
(In reply to jr from comment #19)
> I do prefer the map based approach over this last attempt. Keeping track of
> the relevant BB is quite obviously correct, but trying to keep track of edge
> reordering instead looks like some kind of thimbleriggery to me:-)
> 
> AFAICT the separate loop added by commit e43a3a66 was an attempt to fix the
> problem. Attach() and detach() influenced the iterator ei of the loop,
> possibly creating crashes or endless loops. These symptoms are corrected by
> the patch, but the real problem - reordering of the edge linked list - is
> still not accounted for. Your patch shows how to do that correctly.
> 
> Since my attempt doesn't change the incoming edge list, the separate loop
> becomes obsolete as a bonus feature. I do like that property, but I don't
> think it makes much of a difference in practice (since the lists in question
> are very short).

Yeah, I'm less keen on that last approach. Esp since it basically can't work :)

There are a bunch of other situations where I'd *like* to be able to do BB rejiggers, so doing something generic is nice. I'm warming up to your patch. The only problem is that I've never been able to wrap my head around the way that the graph is stored, with the next/prev arrays which point to... who knows. Seems like a hand-rolled doubly-linked list. But just very confusing. But in the end the graph reparenting of edges may be the way to go here.
Comment 21 jr 2015-07-12 18:41:02 UTC
Created attachment 117077 [details] [review]
Prototype of a more highlevel graph modification api

After I realized that the prev/next arrays actually represent *two* linked lists (one for the source and one for the target side) and that these are circular it became easier:-)

I have attached a prototype of what I was hinting at in my first post. This is only lightly tested (but seems to work for Lifeless Planet at least) and I'm not sure that Pass is the correct place for the new method to live. It would probably also be nice if edge splitting kept both the outgoing and the incoming list of source and target resp. intact.

One improvement of this patch above my original proposal is that it produces the same edge types as the current code. My earlier patch would keep the edge type instead of setting it to FORWARD. While fixing that I was wondering whether TREE is always correct for the edge into the new BB.
Comment 22 Ilia Mirkin 2015-08-20 18:35:02 UTC
(In reply to jr from comment #21)
> Created attachment 117077 [details] [review] [review]
> Prototype of a more highlevel graph modification api
> 
> After I realized that the prev/next arrays actually represent *two* linked
> lists (one for the source and one for the target side) and that these are
> circular it became easier:-)
> 
> I have attached a prototype of what I was hinting at in my first post. This
> is only lightly tested (but seems to work for Lifeless Planet at least) and
> I'm not sure that Pass is the correct place for the new method to live. It
> would probably also be nice if edge splitting kept both the outgoing and the
> incoming list of source and target resp. intact.
> 
> One improvement of this patch above my original proposal is that it produces
> the same edge types as the current code. My earlier patch would keep the
> edge type instead of setting it to FORWARD. While fixing that I was
> wondering whether TREE is always correct for the edge into the new BB.

Hrm, I was about to (try to) push this out, but it doesn't seem to work. I made a few local adjustments, like

RegAlloc::PhiMovesPass::isCriticalEdge(BasicBlock *b, BasicBlock *p)
{
   return b->cfg.incidentCount() > 1 && p->cfg.outgoingCount() > 1;
}

instead of that needNewElseBlock thing, which seems like it's potentially wrong or at least I don't understand wtf it's counting. And more generally the logic is that you should avoid critical edges because it's unclear where to put the computation. Anyways, this makes it easy to trigger the code for me without TXL, and I get this before the RA passes:

MAIN:-1 ()
BB:0 (6 instructions) - df = { }
 -> BB:5 (tree)
 -> BB:2 (tree)
  0: ld u64 { %r32 %r34 } c0[0x10] (0)
  1: ld u64 { %r36 %r38 } c0[0x18] (0)
  2: mov u32 %r41 0x00000001 (0)
  3: joinat BB:6 (0)
  4: set u8 %p43 ne u32 %r41 c0[0x0] (0)
  5: not %p43 bra BB:5 (0)
BB:2 (4 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:4 (forward)
 -> BB:3 (tree)
  6: mov u32 %r45 0x00000002 (0)
  7: joinat BB:4 (0)
  8: set u8 %p47 eq u32 %r45 c0[0x0] (0)
  9: not %p47 bra BB:4 (0)
BB:3 (2 instructions) - idom = BB:2, df = { BB:4 }
 -> BB:4 (forward)
 10: ld u64 { %r48 %r50 } c0[0x20] (0)
 11: bra BB:4 (0)
BB:4 (4 instructions) - idom = BB:2, df = { BB:6 }
 -> BB:6 (forward)
 12: phi u32 %r52 %r32 %r48 (0)
 13: phi u32 %r53 %r34 %r50 (0)
 14: join (0)
 15: bra BB:6 (0)
BB:5 (3 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:6 (forward)
 16: ld u64 { %r66 %r68 } c0[0x20] (0)
 17: ld u64 { %r70 %r72 } c0[0x28] (0)
 18: bra BB:6 (0)
BB:6 (5 instructions) - idom = BB:0, df = { }
 -> BB:1 (tree)
 19: phi u32 %r54 %r52 %r66 (0)
 20: phi u32 %r55 %r53 %r68 (0)
 21: phi u32 %r56 %r36 %r70 (0)
 22: phi u32 %r57 %r38 %r72 (0)
 23: join (0)
BB:1 (5 instructions) - idom = BB:6, df = { }
 24: mov (SUBOP:1) f32 $r0 %r54 (0)
 25: mov (SUBOP:1) f32 $r1 %r55 (0)
 26: mov (SUBOP:1) f32 $r2 %r56 (0)
 27: mov (SUBOP:1) f32 $r3 %r57 (0)
 28: exit - # (0)


and then this after the RA passes:

MAIN:-1 ()
BB:0 (8 instructions) - df = { }
 -> BB:5 (tree)
 -> BB:2 (tree)
  0: ld u64 %r74d c0[0x10] (0)
  1: split u64 { %r32 %r34 } %r74d (0)
  2: ld u64 %r75d c0[0x18] (0)
  3: split u64 { %r36 %r38 } %r75d (0)
  4: mov u32 %r41 0x00000001 (0)
  5: joinat BB:6 (0)
  6: set u8 %p43 ne u32 %r41 c0[0x0] (0)
  7: not %p43 bra BB:5 (0)
BB:2 (4 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:7 (tree)
 -> BB:3 (tree)
  8: mov u32 %r45 0x00000002 (0)
  9: joinat BB:4 (0)
 10: set u8 %p47 eq u32 %r45 c0[0x0] (0)
 11: not %p47 bra BB:7 (0)
BB:3 (5 instructions) - idom = BB:2, df = { BB:4 }
 -> BB:4 (forward)
 12: ld u64 %r76d c0[0x20] (0)
 13: split u64 { %r48 %r50 } %r76d (0)
 14: mov u32 %r89 %r48 (0)
 15: mov u32 %r90 %r50 (0)
 16: bra BB:4 (0)
BB:7 (3 instructions) - df = { }
 -> BB:4 (cross)
 17: mov u32 %r87 %r32 (0)
 18: mov u32 %r88 %r34 (0)
 19: bra BB:4 (0)
BB:5 (9 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:6 (forward)
 20: ld u64 %r77d c0[0x20] (0)
 21: split u64 { %r66 %r68 } %r77d (0)
 22: ld u64 %r78d c0[0x28] (0)
 23: split u64 { %r70 %r72 } %r78d (0)
 24: mov u32 %r83 %r66 (0)
 25: mov u32 %r84 %r68 (0)
 26: mov u32 %r85 %r70 (0)
 27: mov u32 %r86 %r72 (0)
 28: bra BB:6 (0)

where did BB:4 go? poof. not great :( this is with piglit's tests/shaders/ssa/fs-critical-edge.shader_test

I guess I'll go with either your or my first patches. Want to get this fixed for Mesa 11, which is going to get branched off some time tomorrow.
Comment 23 jr 2015-08-20 20:20:19 UTC
(In reply to Ilia Mirkin from comment #22)
> (In reply to jr from comment #21)
> > Created attachment 117077 [details] [review] [review] [review]
> > Prototype of a more highlevel graph modification api
> > 
> > ...
> 
> Hrm, I was about to (try to) push this out, but it doesn't seem to work. I
> made a few local adjustments, like
> 
> RegAlloc::PhiMovesPass::isCriticalEdge(BasicBlock *b, BasicBlock *p)
> {
>    return b->cfg.incidentCount() > 1 && p->cfg.outgoingCount() > 1;
> }
> 
> ...
> 
> 
> where did BB:4 go? poof. not great :( this is with piglit's
> tests/shaders/ssa/fs-critical-edge.shader_test
> 
> I guess I'll go with either your or my first patches. Want to get this fixed
> for Mesa 11, which is going to get branched off some time tomorrow.

Sorry, most probably my prototype code is crap. I'll have an hour now to look at it. I'll let you know if I find something. Otherwise I'm fine with your first patch.
Comment 24 Ilia Mirkin 2015-08-20 20:21:57 UTC
(In reply to jr from comment #23)
> (In reply to Ilia Mirkin from comment #22)
> > (In reply to jr from comment #21)
> > > Created attachment 117077 [details] [review] [review] [review] [review]
> > > Prototype of a more highlevel graph modification api
> > > 
> > > ...
> > 
> > Hrm, I was about to (try to) push this out, but it doesn't seem to work. I
> > made a few local adjustments, like
> > 
> > RegAlloc::PhiMovesPass::isCriticalEdge(BasicBlock *b, BasicBlock *p)
> > {
> >    return b->cfg.incidentCount() > 1 && p->cfg.outgoingCount() > 1;
> > }
> > 
> > ...
> > 
> > 
> > where did BB:4 go? poof. not great :( this is with piglit's
> > tests/shaders/ssa/fs-critical-edge.shader_test
> > 
> > I guess I'll go with either your or my first patches. Want to get this fixed
> > for Mesa 11, which is going to get branched off some time tomorrow.
> 
> Sorry, most probably my prototype code is crap. I'll have an hour now to
> look at it. I'll let you know if I find something. Otherwise I'm fine with
> your first patch.

Awesome! While I don't know whether the isCriticalEdge thing is the right thing to do instead of the current needNewElseBlock logic, I do think that splitEdge should be able to work properly even if needNewElseBlock always returns true.
Comment 25 jr 2015-08-20 22:16:24 UTC
(In reply to Ilia Mirkin from comment #24)
> (In reply to jr from comment #23)
> > (In reply to Ilia Mirkin from comment #22)
> > > (In reply to jr from comment #21)
> > > > Created attachment 117077 [details] [review] [review] [review] [review] [review]
> > > > Prototype of a more highlevel graph modification api
> > > > 
> > > > ...
> > > 
> > > Hrm, I was about to (try to) push this out, but it doesn't seem to work. I
> > > made a few local adjustments, like
> > > 
> > > RegAlloc::PhiMovesPass::isCriticalEdge(BasicBlock *b, BasicBlock *p)
> > > {
> > >    return b->cfg.incidentCount() > 1 && p->cfg.outgoingCount() > 1;
> > > }
> > > 
> > > ...
> > > 
> > > 
> > > where did BB:4 go? poof. not great :( this is with piglit's
> > > tests/shaders/ssa/fs-critical-edge.shader_test
> > > 
> > > I guess I'll go with either your or my first patches. Want to get this fixed
> > > for Mesa 11, which is going to get branched off some time tomorrow.
> > 
> > Sorry, most probably my prototype code is crap. I'll have an hour now to
> > look at it. I'll let you know if I find something. Otherwise I'm fine with
> > your first patch.
> 
> Awesome! While I don't know whether the isCriticalEdge thing is the right
> thing to do instead of the current needNewElseBlock logic, I do think that
> splitEdge should be able to work properly even if needNewElseBlock always
> returns true.

Yes, it should. My usual test cases plus the patch below look fine. My piglit (from git://anongit.freedesktop.org/piglit) doesn't have a fs-critical-edge test, though. Where do I find it?

That prototype patch was more of a RFC anyway to show how graph manipulation could be made less intimidating. If you like the direction I will work some more on it. But I think it is safe to go with your patch for now.


--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -335,15 +335,7 @@ RegAlloc::BuildIntervalsPass::addLiveRange(Value *val,
 bool
 RegAlloc::PhiMovesPass::needNewElseBlock(BasicBlock *b, BasicBlock *p)
 {
-   if (b->cfg.incidentCount() <= 1)
-      return false;
-
-   int n = 0;
-   for (Graph::EdgeIterator ei = p->cfg.outgoing(); !ei.end(); ei.next())
-      if (ei.getType() == Graph::Edge::TREE ||
-          ei.getType() == Graph::Edge::FORWARD)
-         ++n;
-   return (n == 2);
+   return (b->cfg.incidentCount() > 1) && (p->cfg.outgoingCount() > 1);
 }

 // For each operand of each PHI in b, generate a new value by inserting a MOV
Comment 26 Ilia Mirkin 2015-08-20 22:23:16 UTC
Created attachment 117832 [details]
fs-critical-edge.shader_test

My bad, apparently I wrote it and forgot all about it.

I don't actually know if it produces a critical edge, but I think it does.
Comment 27 Ilia Mirkin 2015-08-20 22:45:57 UTC
If you're still having trouble, try building for a fermi or kepler target using nouveau_compiler, i.e.

src/gallium/drivers/nouveau/nouveau_compiler -a c0 -
<paste the tgsi shader, ^D>

and enjoy. [It also can take it from a file.]
Comment 28 jr 2015-08-20 23:34:59 UTC
(In reply to Ilia Mirkin from comment #27)
> If you're still having trouble, try building for a fermi or kepler target
> using nouveau_compiler, i.e.
> 
> src/gallium/drivers/nouveau/nouveau_compiler -a c0 -
> <paste the tgsi shader, ^D>
> 
> and enjoy. [It also can take it from a file.]

Nice, thanks for the hint. But it produces different code to your log (without any ld u64) and BB4 doesn't vanish. It seems to trigger edge splitting once on the 2 to 4 edge:

Compiling for NVC0
translating program of type 4
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0..2]
DCL TEMP[0..1], LOCAL
IMM[0] INT32 {1, 2, 0, 0}
  0: MOV TEMP[0], CONST[1]
  1: USNE TEMP[1].x, CONST[0].xxxx, IMM[0].xxxx
  2: UIF TEMP[1].xxxx :0
  3:   USEQ TEMP[1].x, CONST[0].xxxx, IMM[0].yyyy
  4:   UIF TEMP[1].xxxx :0
  5:     MOV TEMP[0].xy, CONST[2].xyxx
  6:   ENDIF
  7: ELSE :0
  8:   MOV TEMP[0], CONST[2]
  9: ENDIF
 10: MOV OUT[0], TEMP[0]
 11: END

MAIN:-1 ()
---
BB:0 (15 instructions) - df = { }
 -> BB:5 (tree)
 -> BB:2 (tree)
  0: rdsv f32 %r0 sv[POSITION:3] (0)
  1: rcp f32 %r0 %r0 (0)
  2: ld u32 %r5 c0[0x10] (0)
  3: mov f32 %r1 %r5 (0)
  4: ld u32 %r6 c0[0x14] (0)
  5: mov f32 %r2 %r6 (0)
  6: ld u32 %r7 c0[0x18] (0)
  7: mov f32 %r3 %r7 (0)
  8: ld u32 %r8 c0[0x1c] (0)
  9: mov f32 %r4 %r8 (0)
 10: ld u32 %r10 c0[0x0] (0)
 11: mov u32 %r11 0x00000001 (0)
 12: set u32 %r9 ne %r10 %r11 (0)
 13: joinat BB:6 (0)
 14: eq %r9 bra BB:5 (0)
---
 <- BB:0 (tree)
BB:2 (5 instructions) - df = { }
 -> BB:4 (forward)
 -> BB:3 (tree)
 15: ld u32 %r12 c0[0x0] (0)
 16: mov u32 %r13 0x00000002 (0)
 17: set u32 %r9 eq %r12 %r13 (0)
 18: joinat BB:4 (0)
 19: eq %r9 bra BB:4 (0)
---
 <- BB:2 (tree)
BB:3 (5 instructions) - df = { }
 -> BB:4 (forward)
 20: ld u32 %r14 c0[0x20] (0)
 21: mov f32 %r1 %r14 (0)
 22: ld u32 %r15 c0[0x24] (0)
 23: mov f32 %r2 %r15 (0)
 24: bra BB:4 (0)
---
 <- BB:2 (forward)
 <- BB:3 (forward)
BB:4 (2 instructions) - df = { }
 -> BB:6 (forward)
 25: join (0)
 26: bra BB:6 (0)
---
 <- BB:0 (tree)
BB:5 (9 instructions) - df = { }
 -> BB:6 (forward)
 27: ld u32 %r16 c0[0x20] (0)
 28: mov f32 %r1 %r16 (0)
 29: ld u32 %r17 c0[0x24] (0)
 30: mov f32 %r2 %r17 (0)
 31: ld u32 %r18 c0[0x28] (0)
 32: mov f32 %r3 %r18 (0)
 33: ld u32 %r19 c0[0x2c] (0)
 34: mov f32 %r4 %r19 (0)
 35: bra BB:6 (0)
---
 <- BB:4 (forward)
 <- BB:5 (forward)
BB:6 (5 instructions) - df = { }
 -> BB:1 (tree)
 36: join (0)
 37: mov f32 %r20 %r1 (0)
 38: mov f32 %r21 %r2 (0)
 39: mov f32 %r22 %r3 (0)
 40: mov f32 %r23 %r4 (0)
---
 <- BB:6 (tree)
BB:1 (5 instructions) - df = { }
 41: export f32 # o[0x0] %r20 (0)
 42: export f32 # o[0x4] %r21 (0)
 43: export f32 # o[0x8] %r22 (0)
 44: export f32 # o[0xc] %r23 (0)
 45: exit - # (0)

MAIN:-1 ()
---
BB:0 (16 instructions) - df = { }
 -> BB:5 (tree)
 -> BB:2 (tree)
  0: linterp pass f32 %r30 a[0x7c] (0)
  1: rcp f32 %r31 %r30 (0)
  2: ld u32 %r32 c0[0x10] (0)
  3: mov f32 %r33 %r32 (0)
  4: ld u32 %r34 c0[0x14] (0)
  5: mov f32 %r35 %r34 (0)
  6: ld u32 %r36 c0[0x18] (0)
  7: mov f32 %r37 %r36 (0)
  8: ld u32 %r38 c0[0x1c] (0)
  9: mov f32 %r39 %r38 (0)
 10: ld u32 %r40 c0[0x0] (0)
 11: mov u32 %r41 0x00000001 (0)
 12: set u32 %r42 ne %r40 %r41 (0)
 13: joinat BB:6 (0)
 14: set u8 %p43 neu u32 0x00000000 %r42 (0)
 15: not %p43 bra BB:5 (0)
---
 <- BB:0 (tree)
BB:2 (6 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:4 (forward)
 -> BB:3 (tree)
 16: ld u32 %r44 c0[0x0] (0)
 17: mov u32 %r45 0x00000002 (0)
 18: set u32 %r46 eq %r44 %r45 (0)
 19: joinat BB:4 (0)
 20: set u8 %p47 neu u32 0x00000000 %r46 (0)
 21: not %p47 bra BB:4 (0)
---
 <- BB:2 (tree)
BB:3 (5 instructions) - idom = BB:2, df = { BB:4 }
 -> BB:4 (forward)
 22: ld u32 %r48 c0[0x20] (0)
 23: mov f32 %r49 %r48 (0)
 24: ld u32 %r50 c0[0x24] (0)
 25: mov f32 %r51 %r50 (0)
 26: bra BB:4 (0)
---
 <- BB:2 (forward)
 <- BB:3 (forward)
BB:4 (4 instructions) - idom = BB:2, df = { BB:6 }
 -> BB:6 (forward)
 27: phi u32 %r52 %r33 %r49 (0)
 28: phi u32 %r53 %r35 %r51 (0)
 29: join (0)
 30: bra BB:6 (0)
---
 <- BB:0 (tree)
BB:5 (9 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:6 (forward)
 31: ld u32 %r66 c0[0x20] (0)
 32: mov f32 %r67 %r66 (0)
 33: ld u32 %r68 c0[0x24] (0)
 34: mov f32 %r69 %r68 (0)
 35: ld u32 %r70 c0[0x28] (0)
 36: mov f32 %r71 %r70 (0)
 37: ld u32 %r72 c0[0x2c] (0)
 38: mov f32 %r73 %r72 (0)
 39: bra BB:6 (0)
---
 <- BB:4 (forward)
 <- BB:5 (forward)
BB:6 (9 instructions) - idom = BB:0, df = { }
 -> BB:1 (tree)
 40: phi u32 %r54 %r52 %r67 (0)
 41: phi u32 %r55 %r53 %r69 (0)
 42: phi u32 %r56 %r37 %r71 (0)
 43: phi u32 %r57 %r39 %r73 (0)
 44: join (0)
 45: mov f32 %r58 %r54 (0)
 46: mov f32 %r59 %r55 (0)
 47: mov f32 %r60 %r56 (0)
 48: mov f32 %r61 %r57 (0)
---
 <- BB:6 (tree)
BB:1 (5 instructions) - idom = BB:6, df = { }
 49: mov (SUBOP:1) f32 $r0 %r58 (0)
 50: mov (SUBOP:1) f32 $r1 %r59 (0)
 51: mov (SUBOP:1) f32 $r2 %r60 (0)
 52: mov (SUBOP:1) f32 $r3 %r61 (0)
 53: exit - # (0)
PEEPHOLE: DeadCodeElim

MAIN:-1 ()
---
BB:0 (14 instructions) - df = { }
 -> BB:5 (tree)
 -> BB:2 (tree)
  0: ld u32 %r32 c0[0x10] (0)
  1: mov f32 %r33 %r32 (0)
  2: ld u32 %r34 c0[0x14] (0)
  3: mov f32 %r35 %r34 (0)
  4: ld u32 %r36 c0[0x18] (0)
  5: mov f32 %r37 %r36 (0)
  6: ld u32 %r38 c0[0x1c] (0)
  7: mov f32 %r39 %r38 (0)
  8: ld u32 %r40 c0[0x0] (0)
  9: mov u32 %r41 0x00000001 (0)
 10: set u32 %r42 ne %r40 %r41 (0)
 11: joinat BB:6 (0)
 12: set u8 %p43 neu u32 0x00000000 %r42 (0)
 13: not %p43 bra BB:5 (0)
---
 <- BB:0 (tree)
BB:2 (6 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:4 (forward)
 -> BB:3 (tree)
 14: ld u32 %r44 c0[0x0] (0)
 15: mov u32 %r45 0x00000002 (0)
 16: set u32 %r46 eq %r44 %r45 (0)
 17: joinat BB:4 (0)
 18: set u8 %p47 neu u32 0x00000000 %r46 (0)
 19: not %p47 bra BB:4 (0)
---
 <- BB:2 (tree)
BB:3 (5 instructions) - idom = BB:2, df = { BB:4 }
 -> BB:4 (forward)
 20: ld u32 %r48 c0[0x20] (0)
 21: mov f32 %r49 %r48 (0)
 22: ld u32 %r50 c0[0x24] (0)
 23: mov f32 %r51 %r50 (0)
 24: bra BB:4 (0)
---
 <- BB:2 (forward)
 <- BB:3 (forward)
BB:4 (4 instructions) - idom = BB:2, df = { BB:6 }
 -> BB:6 (forward)
 25: phi u32 %r52 %r33 %r49 (0)
 26: phi u32 %r53 %r35 %r51 (0)
 27: join (0)
 28: bra BB:6 (0)
---
 <- BB:0 (tree)
BB:5 (9 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:6 (forward)
 29: ld u32 %r66 c0[0x20] (0)
 30: mov f32 %r67 %r66 (0)
 31: ld u32 %r68 c0[0x24] (0)
 32: mov f32 %r69 %r68 (0)
 33: ld u32 %r70 c0[0x28] (0)
 34: mov f32 %r71 %r70 (0)
 35: ld u32 %r72 c0[0x2c] (0)
 36: mov f32 %r73 %r72 (0)
 37: bra BB:6 (0)
---
 <- BB:4 (forward)
 <- BB:5 (forward)
BB:6 (9 instructions) - idom = BB:0, df = { }
 -> BB:1 (tree)
 38: phi u32 %r54 %r52 %r67 (0)
 39: phi u32 %r55 %r53 %r69 (0)
 40: phi u32 %r56 %r37 %r71 (0)
 41: phi u32 %r57 %r39 %r73 (0)
 42: join (0)
 43: mov f32 %r58 %r54 (0)
 44: mov f32 %r59 %r55 (0)
 45: mov f32 %r60 %r56 (0)
 46: mov f32 %r61 %r57 (0)
---
 <- BB:6 (tree)
BB:1 (5 instructions) - idom = BB:6, df = { }
 47: mov (SUBOP:1) f32 $r0 %r58 (0)
 48: mov (SUBOP:1) f32 $r1 %r59 (0)
 49: mov (SUBOP:1) f32 $r2 %r60 (0)
 50: mov (SUBOP:1) f32 $r3 %r61 (0)
 51: exit - # (0)
registerAllocation

MAIN:-1 ()
---
BB:0 (10 instructions) - df = { }
 -> BB:5 (tree)
 -> BB:2 (tree)
  0: ld u32 $r0 c0[0x10] (8)
  1: ld u32 $r1 c0[0x14] (8)
  2: ld u32 $r2 c0[0x18] (8)
  3: ld u32 $r3 c0[0x1c] (8)
  4: ld u32 $r4 c0[0x0] (8)
  5: mov u32 $r5 0x00000001 (8)
  6: set u32 $r4 ne $r4 $r5 (8)
  7: joinat BB:6 (8)
  8: set u8 $p0 neu u32 $r63 $r4 (8)
  9: not $p0 bra BB:5 (8)
---
 <- BB:0 (tree)
BB:2 (6 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:7 (tree)
 -> BB:3 (tree)
 10: ld u32 $r4 c0[0x0] (8)
 11: mov u32 $r5 0x00000002 (8)
 12: set u32 $r4 eq $r4 $r5 (8)
 13: joinat BB:4 (8)
 14: set u8 $p0 neu u32 $r63 $r4 (8)
 15: not $p0 bra BB:7 (8)
---
 <- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:4 }
 -> BB:4 (forward)
 16: ld u32 $r0 c0[0x20] (8)
 17: ld u32 $r1 c0[0x24] (8)
 18: join BB:4 (8)
---
 <- BB:2 (tree)
BB:7 (1 instructions) - df = { }
 -> BB:4 (forward)
 19: join BB:4 (8)
---
 <- BB:7 (forward)
 <- BB:3 (forward)
BB:4 (1 instructions) - idom = BB:2, df = { BB:6 }
 -> BB:6 (forward)
 20: join BB:6 (8)
---
 <- BB:0 (tree)
BB:5 (5 instructions) - idom = BB:0, df = { BB:6 }
 -> BB:6 (forward)
 21: ld u32 $r0 c0[0x20] (8)
 22: ld u32 $r1 c0[0x24] (8)
 23: ld u32 $r2 c0[0x28] (8)
 24: ld u32 $r3 c0[0x2c] (8)
 25: join BB:6 (8)
---
 <- BB:4 (forward)
 <- BB:5 (forward)
BB:6 (0 instructions) - idom = BB:0, df = { }
 -> BB:1 (tree)
---
 <- BB:6 (tree)
BB:1 (1 instructions) - idom = BB:6, df = { }
 26: exit - # (8)
nv50_ir_generate_code: ret = 0
program binary (216 bytes)
40001de4 28004000 50005de4 28004000 60009de4 28004000 7000dde4 28004000 
00011de4 28004000 04015de2 18000000 14411c03 128e0000 40000007 60000002 
13f1dc03 1e8e0000 600021e7 40000001 00011de4 28004000 08015de2 18000000 
14411c03 110e0000 c0000007 60000000 13f1dc03 1e8e0000 600021e7 40000000 
80001de4 28004000 90005de4 28004000 00001df4 40000000 00001df4 40000000 
00001df4 40000000 80001de4 28004000 90005de4 28004000 a0009de4 28004000 
b000dde4 28004000 00001df4 40000000 00001de7 80000000 


Unfortunately I have to go now. Am available tomorrow at the around the same time.
Comment 29 Ilia Mirkin 2015-08-21 00:03:07 UTC
(In reply to jr from comment #28)
> (In reply to Ilia Mirkin from comment #27)
> > If you're still having trouble, try building for a fermi or kepler target
> > using nouveau_compiler, i.e.
> > 
> > src/gallium/drivers/nouveau/nouveau_compiler -a c0 -
> > <paste the tgsi shader, ^D>
> > 
> > and enjoy. [It also can take it from a file.]
> 
> Nice, thanks for the hint. But it produces different code to your log
> (without any ld u64) and BB4 doesn't vanish. It seems to trigger edge
> splitting once on the 2 to 4 edge:

Hmmmm... maybe it was one of my local patches? I'll try to figure out what went wrong later tonight.

I added logic to be clever about edge types... when splitting an edge, the type should remain except a forward edge becomes a cross edge. (You can do it out on paper...) Perhaps that upset things? It doesn't seem like codegen is using those terms in the usual MST meanings :(
Comment 30 jr 2015-08-21 20:34:46 UTC
(In reply to Ilia Mirkin from comment #29)
> Hmmmm... maybe it was one of my local patches? I'll try to figure out what
> went wrong later tonight.
> 
> I added logic to be clever about edge types... when splitting an edge, the
> type should remain except a forward edge becomes a cross edge. (You can do
> it out on paper...) Perhaps that upset things? It doesn't seem like codegen
> is using those terms in the usual MST meanings :(

I did take a look at the edge classification. Cannot say I fully understand the implications yet, but I'm wondering whether the logic in Graph::classifyDFS is correct. Shouldn't the condition for FORWARD edge whenn looping over incoming edges (the second loop) be reversed?
Comment 31 Ilia Mirkin 2015-08-21 20:39:01 UTC
(In reply to jr from comment #30)
> (In reply to Ilia Mirkin from comment #29)
> > Hmmmm... maybe it was one of my local patches? I'll try to figure out what
> > went wrong later tonight.
> > 
> > I added logic to be clever about edge types... when splitting an edge, the
> > type should remain except a forward edge becomes a cross edge. (You can do
> > it out on paper...) Perhaps that upset things? It doesn't seem like codegen
> > is using those terms in the usual MST meanings :(
> 
> I did take a look at the edge classification. Cannot say I fully understand
> the implications yet, but I'm wondering whether the logic in
> Graph::classifyDFS is correct. Shouldn't the condition for FORWARD edge
> whenn looping over incoming edges (the second loop) be reversed?

It's definitely *not* correct. On the bright side, it's never used.

I tried using it when I was implementing a SPIR -> NV50 IR converter, and it was a pile of fail.

Unfortunately I didn't have time last night to investigate this. But I'll definitely get *some* fix into mesa 11. Sorry this has dragged on for so long.

Really if I could understand wtf the needNewElseBlock logic was trying to do, and could construct a test shader to hit this in *regular* scenarios, not just the lowered output of TXL, that would make me a lot more comfortable with any approach that we pick.

But really I'm just going to make it return true always, and play around with stuff.
Comment 32 jr 2015-08-21 21:53:14 UTC
(In reply to Ilia Mirkin from comment #31)
> Really if I could understand wtf the needNewElseBlock logic was trying to
> do, and could construct a test shader to hit this in *regular* scenarios,
> not just the lowered output of TXL, that would make me a lot more
> comfortable with any approach that we pick.

Looking at the TGSI->IR translation I'd guess that needNewElseBlock is trying to detect the edge from the 'bare' unconditional jump in an if without else, seemingly because adding the new moves is not allowed (though I'm not sure why). At least it seems to be the only construct creating a graph satisfying the condition, AFAICT.
Comment 33 Ilia Mirkin 2015-08-21 21:56:33 UTC
(In reply to jr from comment #32)
> (In reply to Ilia Mirkin from comment #31)
> > Really if I could understand wtf the needNewElseBlock logic was trying to
> > do, and could construct a test shader to hit this in *regular* scenarios,
> > not just the lowered output of TXL, that would make me a lot more
> > comfortable with any approach that we pick.
> 
> Looking at the TGSI->IR translation I'd guess that needNewElseBlock is
> trying to detect the edge from the 'bare' unconditional jump in an if
> without else, seemingly because adding the new moves is not allowed (though
> I'm not sure why). At least it seems to be the only construct creating a
> graph satisfying the condition, AFAICT.

Well, in general you can't add work if you have a critical edge -- if an block has multiple outgoing edges, then you can't put the work into that block. If a block has multiple incoming edges, then you can't leave the work in that block, so you have to create a separate block to put work that you want done on that specific block transition.

However here it's looking for something much more specific.
Comment 34 jr 2015-08-21 22:24:01 UTC
(In reply to Ilia Mirkin from comment #33)
> (In reply to jr from comment #32)
> > (In reply to Ilia Mirkin from comment #31)
> > > Really if I could understand wtf the needNewElseBlock logic was trying to
> > > do, and could construct a test shader to hit this in *regular* scenarios,
> > > not just the lowered output of TXL, that would make me a lot more
> > > comfortable with any approach that we pick.
> > 
> > Looking at the TGSI->IR translation I'd guess that needNewElseBlock is
> > trying to detect the edge from the 'bare' unconditional jump in an if
> > without else, seemingly because adding the new moves is not allowed (though
> > I'm not sure why). At least it seems to be the only construct creating a
> > graph satisfying the condition, AFAICT.
> 
> Well, in general you can't add work if you have a critical edge -- if an
> block has multiple outgoing edges, then you can't put the work into that
> block. If a block has multiple incoming edges, then you can't leave the work
> in that block, so you have to create a separate block to put work that you
> want done on that specific block transition.
> 
> However here it's looking for something much more specific.

Ok, that makes sense. It might be that the original author wasn't thinking about the problem in this terms and just wanted to fix the single problematic case at hand, i.e. if without else. Then splitting any critical edge would be the right solution, though in practice I don't think it'd trigger more often.
Comment 35 jr 2015-08-22 14:16:33 UTC
(In reply to jr from comment #34)
> (In reply to Ilia Mirkin from comment #33)
> > (In reply to jr from comment #32)
> > > (In reply to Ilia Mirkin from comment #31)
> > > > Really if I could understand wtf the needNewElseBlock logic was trying to
> > > > do, and could construct a test shader to hit this in *regular* scenarios,
> > > > not just the lowered output of TXL, that would make me a lot more
> > > > comfortable with any approach that we pick.
> > > 
> > > Looking at the TGSI->IR translation I'd guess that needNewElseBlock is
> > > trying to detect the edge from the 'bare' unconditional jump in an if
> > > without else, seemingly because adding the new moves is not allowed (though
> > > I'm not sure why). At least it seems to be the only construct creating a
> > > graph satisfying the condition, AFAICT.
> > 
> > Well, in general you can't add work if you have a critical edge -- if an
> > block has multiple outgoing edges, then you can't put the work into that
> > block. If a block has multiple incoming edges, then you can't leave the work
> > in that block, so you have to create a separate block to put work that you
> > want done on that specific block transition.
> > 
> > However here it's looking for something much more specific.
> 
> Ok, that makes sense. It might be that the original author wasn't thinking
> about the problem in this terms and just wanted to fix the single
> problematic case at hand, i.e. if without else. Then splitting any critical
> edge would be the right solution, though in practice I don't think it'd
> trigger more often.

After looking at it some more I'm convinced that this is what the original intent was. It also explains why the tree types of the new edges were hardcoded. Because the (critical) edge being split was always a forward edge.

I may also have found your problem with the missing base blocks. If you look at the log of the critical-edge test (it's easier to see in my version because it has dumping of incoming edges enabled) you'll see that BB 4 and 6 have only forward incoming edges and have been pruned (together with following blocks). This is not directly related to the PhiMovesProblem, but is a result of the TGSI->IR translation. One of the two edges inserted by TGSI_OPCODE_ENDIF should be TREE. I don't know which one to choose. If we choose the condBB one than the assumption of the PhiMovesPass that the split esge is always type FORWARD will be wrong and the resulting graph will be wrong. I'll modify my prototype patch in a second to take this into account correctly (and remove the type hardcoding, but differently than I thought, it is the second edge that has to keep the type), so that we can choose either way.
Comment 36 jr 2015-08-22 15:08:00 UTC
Created attachment 117865 [details] [review]
Edge type fix for prototype patch

With this patch on top of the prototype patch edge splitting during PhiMovesPass should no longer change non forward edges to forward. I think with current master this isn't a problem, because the split edge will always be a forward edge, but this still isn't a valid assumption.

After adding the following patch the fs-critical-edge test shows that edge splitting now preserves the edge type. This patch should also solve the 'all incoming edges are forward edges' problem that may be the cause of the missing blocks.

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index f153674..753bcbf 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2838,7 +2838,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       }

       if (prevBB->getExit()->op == OP_BRA) {
-         prevBB->cfg.attach(&convBB->cfg, Graph::Edge::FORWARD);
+         prevBB->cfg.attach(&convBB->cfg, Graph::Edge::TREE);
          prevBB->getExit()->asFlow()->target.bb = convBB;
       }
       setPosition(convBB, true);
Comment 37 Ilia Mirkin 2015-08-22 20:15:06 UTC
(In reply to jr from comment #36)
> Created attachment 117865 [details] [review] [review]
> Edge type fix for prototype patch
> 
> With this patch on top of the prototype patch edge splitting during
> PhiMovesPass should no longer change non forward edges to forward. I think
> with current master this isn't a problem, because the split edge will always
> be a forward edge, but this still isn't a valid assumption.
> 
> After adding the following patch the fs-critical-edge test shows that edge
> splitting now preserves the edge type. This patch should also solve the 'all
> incoming edges are forward edges' problem that may be the cause of the
> missing blocks.
> 
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index f153674..753bcbf 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -2838,7 +2838,7 @@ Converter::handleInstruction(const struct
> tgsi_full_instruction *insn)
>        }
> 
>        if (prevBB->getExit()->op == OP_BRA) {
> -         prevBB->cfg.attach(&convBB->cfg, Graph::Edge::FORWARD);
> +         prevBB->cfg.attach(&convBB->cfg, Graph::Edge::TREE);
>           prevBB->getExit()->asFlow()->target.bb = convBB;
>        }
>        setPosition(convBB, true);

Actually the edge types are much much more wrong, unfortunately. I don't know how much it matters. For example in an if/else situation, i.e.

    *
   / \
  /   \
 *     *
  \   /
   \ /
    *

You should end up with 3 TREE edges and 1 CROSS edge. However nouveau happily makes both of the "bottom" edges FORWARD. I guess you fix one of them to be TREE, but the other is still going to be FORWARD instead of CROSS.

I think this is too big of a problem to fix in a fixup, esp since we don't really know what these are used for.

I'm thinking the following:

(a) Push my fixup patch with the hash map, cc'd to stable. It's a little inefficient, but it *works*, and is (in my mind) quite simple. I'll double-check that it doesn't affect perf too much.
(b) Figure out this edge type insanity, including fixing things up, and maybe adding a validator that makes sure that the edge types are correct.
(c) Revert my fixup, and implement actual edge splitting, (i.e. what you did) and based on critical edges rather than the current logic.
(d) Lots and lots of testing.

Does that sound reasonable? One of the considerations here is that neither you nor I have a *ton* of time to play around with this, and Mesa 11.0.0 will be released in mid-September, and I'd like to have *some* fix for nv50 in there.

As for your patch, the edge is preserved in all cases except forward, which becomes cross. (draw it out, should make sense, unless I messed up)
Comment 38 jr 2015-08-22 22:41:56 UTC
(In reply to Ilia Mirkin from comment #37)
> I think this is too big of a problem to fix in a fixup, esp since we don't
> really know what these are used for.
> 
> I'm thinking the following:
> 
> (a) Push my fixup patch with the hash map, cc'd to stable. It's a little
> inefficient, but it *works*, and is (in my mind) quite simple. I'll
> double-check that it doesn't affect perf too much.
> (b) Figure out this edge type insanity, including fixing things up, and
> maybe adding a validator that makes sure that the edge types are correct.
> (c) Revert my fixup, and implement actual edge splitting, (i.e. what you
> did) and based on critical edges rather than the current logic.
> (d) Lots and lots of testing.
> 
> Does that sound reasonable? One of the considerations here is that neither
> you nor I have a *ton* of time to play around with this, and Mesa 11.0.0
> will be released in mid-September, and I'd like to have *some* fix for nv50
> in there.

Looks like a good plan. Your patch shouldn't make anything worse and has been tested to fix the reported problems, so this and the other bug can be closed.
 
> As for your patch, the edge is preserved in all cases except forward, which
> becomes cross. (draw it out, should make sense, unless I messed up)

I'm afk for a week, but will take a longer look. Actually it shouldn't be too difficult, since there are not that many places creating edges.
Comment 39 Ilia Mirkin 2015-09-10 07:17:26 UTC
I pushed the following after some light testing:

commit a072ef8748a65d286e9b542bb9ea6e020fdcc7f8
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Thu Sep 10 01:54:30 2015 -0400

    nv50/ir: make edge splitting fix up phi node sources

But it's not the final word on this stuff... as Béla mentioned in comment 13, some games aren't completely fixed by this. I have some Hearthstone traces (from bug #75776) which still have minor visual artifacts. Further, they are fixed by splitting more edges. [I made the condition only look at p->cfg.outgoingCount() > 1, which includes a whole lot more than critical edges. This is clearly wrong, but... gives some ideas.]
Comment 40 jr 2015-12-07 19:50:33 UTC
Created attachment 120393 [details] [review]
Patch to simplify test case creation

Thanks. I finally had some time to look at the problem again. With the attached patch it is easier to recreate the problem, because it causes every if without an else clause to create a critical edge with the right property (i.e. not being the first incoming edge).
Comment 41 jr 2015-12-07 19:56:27 UTC
After doing some tests with this change I'm sure that the reported problem is fully fixed by Ilja's patch. That splitting more edges seems to help in some cases most probably has a different cause. I'll try to take another look at the hearthstone shaders (may not be soon).


bug/show.html.tmpl processed on Jan 16, 2017 at 17:19:52.
(provided by the Example extension).