Created attachment 105745
Async DMA linear-to-tiled copies are causing GPU hangs in some cases. On Cape Verde, I can easily trigger this as described in . The game Brutal Legend also triggers similar hangs when it streams assets during gameplay.
Disabling this function and using the resource_copy_region fallback instead fixes all the hangs. The attached patch does that.
Thank you very much for tracking this down.
Created attachment 105755
This is a possibly better fix that only disables DMA if 1D tiling is involved. Please give it a try.
Maybe we need to determine the other tiling parameters differently for 1D tiling? IIRC Marek fixed things like that before.
Anyway, in the command stream dump you provided earlier, the tiling parameters looked totally bogus, mostly all zero. I suspect this needs more investigation.
(In reply to comment #3)
> Anyway, in the command stream dump you provided before, it looked like the
> tiling parameters were totally bogus, mostly all 0. I suspect this needs
> more investigation.
Yeah, I noticed that immediately as well. Maybe attach gdb to X, set a breakpoint on si_dma_copy_tile and check what those parameters usually look like.
If they usually aren't all zero (which is likely), we should figure out why they are zero in this particular case.
You can just make the breakpoint conditional, e.g.:
b si_dma.c:228 if array_mode == 0
The tiling parameters don't look bogus, and they certainly aren't zero. In the dumped IB, DW 7 has mt = 1, num_banks = 3 and tile_split = 3. DW 3 has array_mode = 2, bankw = 0, bankh = 0 and mtilea = 0. Those values might well be completely wrong for the given surface, but they aren't bogus in the sense of being invalid.
At first I thought bankw/bankh/mtilea being all set to zero was strange, but this seems to match how libdrm sets up 1D tiled surfaces.
For reference, this is the IB we're talking about:
Yeah, sorry, I misread DW 3 as having array_mode == 0 when it's actually 2.
Oh yeah, you're right, I got that wrong as well.
But what I also hoped you would check: it sounded like this copy command works correctly in 90% of cases and only locks up in a minority of them.
Is that correct? In other words, what makes this particular case lock up? Does it work at other resolutions? Etc.
Do the fixes from http://lists.freedesktop.org/archives/mesa-dev/2014-September/068738.html help?
Author: Michel Dänzer <firstname.lastname@example.org>
Date: Tue Nov 11 16:10:20 2014 +0900
radeonsi: Disable asynchronous DMA except for PIPE_BUFFER