Created attachment 142435
trace that crashes gpu on replay
Running some OpenGL applications results in random GPU crashes.
I made a trace with apitrace and can reproduce the crash when replaying it. Not every replay results in a crash, so the behaviour is random.
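Since only some replays crash, it can help to loop the replay until one fails. This is a minimal sketch, not a tested reproduction procedure. It assumes apitrace's `glretrace` tool is on PATH and that a crash or hang shows up as a non-zero exit status or a timeout. The function name, attempt count and timeout are illustrative and do not come from this report.

```python
import subprocess


def replay_until_failure(cmd, max_attempts=50, timeout_s=300):
    """Run cmd repeatedly; return the 1-based attempt that failed, or None."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Assumption: a replay that never finishes is treated as a hang.
            return attempt
        if result.returncode != 0:
            # Assumption: a crashed replay exits with a non-zero status.
            return attempt
    return None
```

For example, `replay_until_failure(["glretrace", "trace.trace"])` would replay a trace up to 50 times, where `trace.trace` is a placeholder file name. If the GPU hang takes the whole session down, the script will not get to report anything, so the kernel log remains the main evidence.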
This is a Ryzen 2400G APU.
OpenGL renderer string: AMD RAVEN (DRM 3.27.0, 4.18.0-rc7+, LLVM 8.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.0-devel (git-552642066f)
llvm git 89fcd8b878977c9c467cb5d6e33a3404d2996822
Similar crashes occurred with Mesa 17.x, 18.0, 18.2, 18.3 and the amdgpu-pro drivers on 4.16 vanilla and 4.18 amdgpu-drm-next kernels, but I was not able to reproduce them quickly (it took about 5-10 minutes of playing a game to crash).
[839408.306382] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32778, for process xash64 pid 6233 thread xash64:cs0 pid 6273
[839408.306384] amdgpu 0000:08:00.0: at address 0x0000000000000000 from 27
[839408.306385] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010153C
[839418.327341] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=119603881, emitted seq=119603883
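When comparing several such logs, the fault fields can be extracted from `dmesg` output mechanically. This is a minimal sketch, assuming the messages are laid out exactly like the lines quoted above; other kernel versions may format them differently. The function name and dictionary keys are illustrative and not part of any amdgpu tooling.

```python
import re

# Patterns written against the log lines quoted in this report (assumption:
# other kernels print the same layout).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] VMC page fault \(src_id:(?P<src_id>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)
# "from_id" is simply the number printed after "from".
ADDR_RE = re.compile(r"at address (?P<address>0x[0-9a-fA-F]+) from (?P<from_id>\d+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(?P<status>0x[0-9a-fA-F]+)")
TIMEOUT_RE = re.compile(
    r"ring (?P<ring_name>\w+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_amdgpu_faults(log_text):
    """Collect page-fault and ring-timeout fields from kernel log text.

    Values are returned as strings; if a field occurs more than once,
    the last occurrence wins.
    """
    info = {}
    for line in log_text.splitlines():
        for pattern in (FAULT_RE, ADDR_RE, STATUS_RE, TIMEOUT_RE):
            match = pattern.search(line)
            if match:
                info.update(match.groupdict())
    return info
```

Feeding it the four lines above yields, among others, the process name `xash64`, the fault address `0x0000000000000000` and the two sequence numbers from the ring timeout.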
The attached trace was created with llvmpipe, but it crashes on some replays.
Second trace, which crashed during recording (it is broken at the end):
Re-uploaded traces here
(In reply to mittorn from comment #1)
> Re-uploaded traces here
I can't get these to crash. Tested with Mesa 19.1-devel on Radeon RX Vega (VEGA10, DRM 3.27.0, 4.20.8-200.fc29.x86_64, LLVM 9.0.0-devel).
Is this still an issue for you? Are you able to get a stack trace from the crash?
OK, I was able to produce a hang when testing on my RAVEN laptop, so this seems to be a RAVEN-specific issue. I could only test with the 4.19 kernel, as my laptop would not boot with 4.20.
I can still reproduce this on the open-source drivers, but there is no crash with amdgpu-pro-18.50.
I also managed to recover the system after a GPU crash by using the "modesetting" Xorg driver and ignoring the recovery errors. The GPU seems to ignore all commands after the "successful" recovery, but it is still possible to suspend the system. After entering suspend, the GPU restarts completely and becomes usable again (except that all userspace that was using the GPU has to be restarted). Maybe GPU recovery can only be implemented by putting the whole system into suspend (because it is not possible to shut down power on an APU).
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1338.