Bug 83500 - si_dma_copy_tile causes GPU hangs
Summary: si_dma_copy_tile causes GPU hangs
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2014-09-04 13:49 UTC by Grigori Goronzy
Modified: 2015-02-09 12:01 UTC

Workaround (930 bytes, text/plain)
2014-09-04 13:49 UTC, Grigori Goronzy
Better fix (1.01 KB, patch)
2014-09-04 15:50 UTC, Grigori Goronzy

Description Grigori Goronzy 2014-09-04 13:49:41 UTC
Created attachment 105745

Async DMA linear-to-tiled copies cause GPU hangs in some cases. On Cape Verde, I can easily trigger this as described in [1]. The game Brutal Legend also triggers similar hangs when it streams assets during gameplay.

Disabling usage of this function and using the resource_copy_region fallback instead fixes all hangs. The attached patch does that.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=79980#c124
Comment 1 Marek Olšák 2014-09-04 13:59:20 UTC
Thank you very much for tracking this down.
Comment 2 Grigori Goronzy 2014-09-04 15:50:58 UTC
Created attachment 105755
Better fix

This is a possibly better fix that only disables DMA if 1D tiling is involved. Please give it a try.
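
The idea behind this fix can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the enum, the struct, and the function name `can_use_async_dma` are hypothetical stand-ins, not the actual radeonsi definitions, and the sketch assumes the hang only occurs when 1D tiling is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's tiling modes (not the real radeonsi enums). */
enum tile_mode {
    TILE_LINEAR,
    TILE_1D,
    TILE_2D
};

/* Hypothetical, trimmed-down surface description. */
struct surface_info {
    enum tile_mode mode;
};

/*
 * Return true if the async DMA copy path may be used, false if the caller
 * should take the resource_copy_region fallback instead.
 * Assumption: only copies that involve a 1D-tiled surface are affected.
 */
static bool can_use_async_dma(const struct surface_info *src,
                              const struct surface_info *dst)
{
    assert(src != NULL && dst != NULL);

    if (src->mode == TILE_1D || dst->mode == TILE_1D)
        return false;

    return true;
}
```

With a check of this shape, linear-to-2D-tiled copies would still go through DMA, while anything touching a 1D-tiled surface would take the slower but safe fallback.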
Comment 3 Michel Dänzer 2014-09-05 01:50:18 UTC
Maybe we need to determine the other tiling parameters differently for 1D tiling? IIRC Marek fixed things like that before.

Anyway, in the command stream dump you provided before, it looked like the tiling parameters were totally bogus, mostly all 0. I suspect this needs more investigation.
Comment 4 Christian König 2014-09-05 08:25:23 UTC
(In reply to comment #3)
> Anyway, in the command stream dump you provided before, it looked like the
> tiling parameters were totally bogus, mostly all 0. I suspect this needs
> more investigation.

Yeah, that's what I noticed immediately as well. Maybe attach gdb to X, set a breakpoint on si_dma_copy_tile, and check what those parameters usually look like.

If they aren't usually all zero (which is likely) we should figure out why they are zero in this special case.
Comment 5 Michel Dänzer 2014-09-05 08:45:19 UTC
You can just make the breakpoint conditional, e.g.:

 b si_dma.c:228 if array_mode == 0
Comment 6 Grigori Goronzy 2014-09-05 09:43:36 UTC
The tiling parameters don't look bogus and they certainly aren't zero. In the dumped IB there's mt = 1, num_banks = 3, tile_split = 3 in DW 7. Looking at DW 3, there is array_mode = 2, bankw = 0, bankh = 0, mtilea = 0. That might well be completely wrong for the given surface, but it's not bogus in the sense that the values are invalid.

At first I thought bankw/bankh/mtilea being all set to zero was strange, but this seems to match how libdrm sets up 1D tiled surfaces.

For reference, this is the IB we're talking about:
Comment 7 Michel Dänzer 2014-09-05 10:06:23 UTC
Yeah, sorry, I misread DW 3 as having array_mode == 0 when it's actually 2.
Comment 8 Christian König 2014-09-05 11:19:57 UTC
Oh, yeah, you're right. I got that wrong as well.

What I had also hoped to have checked: it sounded like this copy command works correctly in about 90% of cases and only locks up in a minority of them.

Is that correct? In other words, what makes this particular case lock up? Does it work at other resolutions, for example?
Comment 9 Michel Dänzer 2014-09-30 06:37:31 UTC
Do the fixes from http://lists.freedesktop.org/archives/mesa-dev/2014-September/068738.html help?
Comment 10 Michel Dänzer 2014-11-17 07:25:32 UTC
Module: Mesa
Branch: master
Commit: ae4536b4f71cbe76230ea7edc7eb4d6041e651b4
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=ae4536b4f71cbe76230ea7edc7eb4d6041e651b4

Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Tue Nov 11 16:10:20 2014 +0900

radeonsi: Disable asynchronous DMA except for PIPE_BUFFER
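
As a rough illustration of what the commit subject describes, the restriction could look something like the sketch below. It is not the code from the commit: `resource_kind`, `struct resource`, and `async_dma_allowed` are hypothetical names, and requiring both source and destination to be buffers is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical resource kinds. Gallium's real enum (pipe_texture_target)
 * has PIPE_BUFFER plus many texture targets; two are enough here.
 */
enum resource_kind {
    RES_BUFFER,
    RES_TEXTURE_2D
};

/* Hypothetical, trimmed-down resource description. */
struct resource {
    enum resource_kind kind;
};

/*
 * Return true only for buffer-to-buffer copies; every copy that involves
 * a texture is expected to use the resource_copy_region fallback.
 * Assumption: both ends of the copy must be buffers.
 */
static bool async_dma_allowed(const struct resource *src,
                              const struct resource *dst)
{
    assert(src != NULL && dst != NULL);

    return src->kind == RES_BUFFER && dst->kind == RES_BUFFER;
}
```

Compared with the 1D-tiling check proposed earlier in this report, this is the more conservative option: it keeps DMA for plain buffers and gives it up for all textures, tiled or not.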
