Created attachment 124145 [details]
compute shader dump
This happened on GK208, but I assume it'll happen everywhere. This is with the trace from bug 94858.
[259564.842264] nouveau 0000:02:00.0: fifo: read fault at 0000000000 engine 00 [GR] client 03 [GPC0/L1_1] reason 02 [PTE] on channel 6 [007f940000 X]
[259564.842268] nouveau 0000:02:00.0: fifo: gr engine fault on channel 6, recovering...
[259595.772036] nouveau 0000:02:00.0: X: failed to idle channel 6 [X]
[259610.772211] nouveau 0000:02:00.0: X: failed to idle channel 6 [X]
Created attachment 124146 [details]
compute shader dump - NV50_PROG_OPTIMIZE=1 (fail)
Created attachment 124147 [details]
compute shader dump - NV50_PROG_OPTIMIZE=0 (success)
Looks like one of the "level 1" optimizations causes the failure. Now to figure out which one...
Interesting, let's see if I can reproduce that issue on GF119. Hopefully I will be able to.
I don't get this read fault on my GF119, but I get a lot of:
[ 8473.891952] nouveau 0000:01:00.0: gr: DATA_ERROR 00000028 [CP_NO_REG_SPACE_STRIPED] ch 7 [007f9ed000 glretrace] subc 1 class 90c0 mthd 0368 data 00001000
[ 8473.906657] nouveau 0000:01:00.0: gr: DATA_ERROR 00000028 [CP_NO_REG_SPACE_STRIPED] ch 7 [007f9ed000 glretrace] subc 1 class 90c0 mthd 0368 data 00001000
Disabling compiler optimizations doesn't change anything.
OK, I've pushed a fix for the GK208 issue (an issue in unspilling predicates):
Author: Ilia Mirkin <firstname.lastname@example.org>
Date: Sat May 28 13:07:12 2016 -0400
gk110/ir: fix unspilling of predicates from registers
Signed-off-by: Ilia Mirkin <email@example.com>
Cc: "11.2 11.1" <firstname.lastname@example.org>
However the issue around thread sizes remains for all except the GK10x keplers. On fermi we have 32K registers, on kepler+ we have 64K (not counting the mythical GK210). So we have to tell the RA to restrict the number of registers used based on thread size (or use 1024 as the number of threads when that information is not provided).
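To make the arithmetic concrete, here is a rough sketch of the limit the RA would need to respect. It is not the actual Mesa code: `maxGprsPerThread`, its parameters, and the per-thread ISA cap passed as `isaLimit` are all hypothetical names introduced for illustration. The only facts taken from the comment above are the register file sizes (32K on fermi, 64K on kepler+) and the fallback of 1024 threads when the thread count is unknown.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not from Mesa): how many GPRs one thread may use
// so that a full block of `threads` threads still fits in the SM's
// register file.
//   smRegs   - total registers per SM (32768 on fermi, 65536 on kepler+)
//   threads  - threads per block, or 0 if not known at compile time
//   isaLimit - assumed per-thread cap imposed by the ISA encoding
static uint32_t maxGprsPerThread(uint32_t smRegs, uint32_t threads,
                                 uint32_t isaLimit)
{
    // No thread count provided: assume the worst case of 1024 threads.
    if (threads == 0)
        threads = 1024;

    // Each thread gets an equal share of the register file, but never
    // more than the ISA allows.
    return std::min(smRegs / threads, isaLimit);
}
```

With 32K registers and 1024 threads this yields 32 registers per thread, and with 64K registers it yields 64, so any chip whose per-thread cap is above that share needs the extra restriction.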
Also an observation - there's a bit of tearing in the sheet as it falls. Could be a synchronization failure, or something else. I see it with NV50_PROG_OPTIMIZE=0 as well.
And the fermi issue is fixed now too by (a) fixing the GPR file size to take the thread count and # of SM regs into account and (b) fixing BitSet to work with multiple-of-32 numbers of registers (had been all 32n-1 up until now).
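For readers unfamiliar with why an exact multiple of 32 can be a special case, here is a minimal sketch of a word-based bit set. It is not the nouveau `BitSet` class, and it does not claim to reproduce the actual bug: `SimpleBitSet` and its methods are hypothetical. It only illustrates one common pitfall, where the mask for the last word is computed from `size % 32` and comes out as zero when the size is exactly 32n.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the nouveau codegen BitSet.
struct SimpleBitSet {
    explicit SimpleBitSet(unsigned nBits)
        : size(nBits), words((nBits + 31) / 32, 0) {} // round up to whole words

    void set(unsigned i) { words[i / 32] |= 1u << (i % 32); }
    bool test(unsigned i) const { return (words[i / 32] >> (i % 32)) & 1u; }

    // Mask of the valid bits in the last word. A naive (1u << rem) - 1
    // gives 0 when size is a multiple of 32, so that case is handled
    // explicitly: the whole last word is valid.
    uint32_t lastWordMask() const {
        unsigned rem = size % 32;
        return rem ? ((1u << rem) - 1) : 0xffffffffu;
    }

    // Set every valid bit, leaving padding bits in the last word clear.
    void setAll() {
        for (auto &w : words)
            w = 0xffffffffu;
        if (!words.empty())
            words.back() &= lastWordMask();
    }

    // Count set bits by clearing the lowest set bit until none remain.
    unsigned popCount() const {
        unsigned n = 0;
        for (uint32_t w : words)
            for (; w; w &= w - 1)
                ++n;
        return n;
    }

    unsigned size;
    std::vector<uint32_t> words;
};
```

A set sized 63 (32n-1) and a set sized 64 (32n) both use two words here; only the second one exercises the `rem == 0` branch, which is the kind of path that goes untested if every size in use has been 32n-1.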