The GPU hangs when running any Vulkan program. I tested with vulkan-smoketest, but it seems to happen with anything.
gmc_v6_0_process_interrupt: 28 callbacks suppressed
amdgpu 0000:01:00.0: GPU fault detected: 147 0x0f2a7001
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0F47FFF9
amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A070001
amdgpu 0000:01:00.0: VM fault (0x01, vmid 5) at page 256376825, read from '' (0x00000000) (112)
Kernel 4.15.11 (current in Debian testing), LLVM 6.0.0, Pitcairn.
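For anyone reading the fault lines above, the values printed in the "VM fault" line can be recovered from the VM_CONTEXT1_PROTECTION_FAULT_STATUS register value. The following is a minimal sketch, not the driver's code: the function name decode_fault_status is hypothetical, and the bit positions are an assumption inferred from the log lines in this report (they reproduce the printed protections, vmid and client id values).

```python
def decode_fault_status(status):
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    The bit layout below is an assumption inferred from the log lines
    in this report, not taken from the register documentation.
    """
    return {
        "protections": status & 0xFF,         # assumed bits 0-7
        "mc_id": (status >> 12) & 0xFF,       # assumed bits 12-19
        "write": bool((status >> 24) & 0x1),  # assumed bit 24 (0 = read)
        "vmid": (status >> 25) & 0xF,         # assumed bits 25-28
    }


if __name__ == "__main__":
    fields = decode_fault_status(0x0A070001)
    # Compare with: VM fault (0x01, vmid 5) ... read from '' ... (112)
    print(fields)
```

The FAULT_ADDR register value needs no decoding: 0x0F47FFF9 is simply 256376825 in decimal, the page number printed in the "VM fault" line.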
4ad7595f350462c704fbe5b2bd2ca406c904e78e is the first bad commit
Author: Samuel Pitoiset <email@example.com>
Date: Wed Apr 4 12:12:03 2018 +0200
radv: rename radv_emit_prefetch() to radv_emit_prefetch_L2()
Signed-off-by: Samuel Pitoiset <firstname.lastname@example.org>
Reviewed-by: Bas Nieuwenhuizen <email@example.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Despite the commit message, it seems to contain functional changes; in particular, it seems to enable some DMA transfers on all chips. It looks like that doesn't work on SI.
Created attachment 138699
Created attachment 138700
I think this should be fixed by
As Bas said, this should already be fixed. Sorry for the breakage.
Can you update your repo and confirm, please?
Still present in latest (a055f5108dfb26522266095d9beb72857d2051f4)
[ 2862.614147] gmc_v6_0_process_interrupt: 28 callbacks suppressed
[ 2862.614150] amdgpu 0000:01:00.0: GPU fault detected: 147 0x0f227001
[ 2862.614155] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0F47FFF9
[ 2862.614157] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02070001
[ 2862.614159] amdgpu 0000:01:00.0: VM fault (0x01, vmid 1) at page 256376825, read from '' (0x00000000) (112)
Also I don't see how that commit would fix it since it refers to compute shaders and none of my test programs use those. Unless the commit message is misleading again.
Well, I made too many mistakes, sorry.
The following patch should fix the issue:
Well the error message changed but it still hangs...
[ 110.666337] gmc_v6_0_process_interrupt: 28 callbacks suppressed
[ 110.666340] amdgpu 0000:01:00.0: GPU fault detected: 146 0x028a8804
[ 110.666344] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00100014
[ 110.666346] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088004
[ 110.666348] amdgpu 0000:01:00.0: VM fault (0x04, vmid 5) at page 1048596, read from '' (0x00000000) (136)
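To compare this fault with the earlier ones, it can help to turn the reported page numbers into byte addresses. This is a rough sketch that assumes 4 KiB GPU pages (a shift of 12); the helper name page_to_byte_address is hypothetical.

```python
GPU_PAGE_SHIFT = 12  # assumption: 4 KiB GPU pages


def page_to_byte_address(page, page_shift=GPU_PAGE_SHIFT):
    """Convert a page number from a 'VM fault ... at page N' line to a byte address."""
    return page << page_shift


if __name__ == "__main__":
    # Page numbers taken from the VM fault lines in this report.
    for page in (256376825, 1048596):
        print(page, hex(page_to_byte_address(page)))
```

Under that assumption, the earlier faults and this one land at very different GPU virtual addresses, which fits the observation that the error changed while the hang remained.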
Created attachment 138703
radv trace from the hang
Still happens in 4381be4648b9ebb15b0a06885489998d5daac482
I did a little experiment: I rebased locally and removed the broken commit (4ad7595f350462c704fbe5b2bd2ca406c904e78e) along with the follow-ups (942fdfe357, f1d7c16e85, 04e609f1f8), because they no longer applied cleanly. The resulting Mesa works and does not exhibit this bug.
So there are no other confounding issues, and there is still some case in there that you've missed on SI.