Bug 108711

Summary: [apitrace] GPU hangs in some opengl 1.x applications on RAVEN
Product: Mesa Reporter: mittorn <mittorn>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED QA Contact: Default DRI bug account <dri-devel>
Severity: major    
Priority: medium CC: mittorn, t_arceri
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments: trace that crashes gpu on replay

Description mittorn 2018-11-11 18:40:16 UTC
Created attachment 142435
trace that crashes gpu on replay

Running some OpenGL applications results in random GPU crashes.
I made a trace with apitrace and can reproduce the crash when replaying it. Not every replay crashes, so the behaviour is intermittent.
This is a Ryzen 2400G APU.
OpenGL renderer string: AMD RAVEN (DRM 3.27.0, 4.18.0-rc7+, LLVM 8.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.0-devel (git-552642066f)
llvm git 89fcd8b878977c9c467cb5d6e33a3404d2996822
Similar crashes occurred with Mesa 17.x, 18.0, 18.2, 18.3 and the amdgpu-pro drivers, on vanilla 4.16 and amdgpu-drm-next 4.18 kernels, but I was not able to reproduce them quickly (it took about 5-10 minutes of playing the game to crash).
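
Because only some replays trigger the hang, it can help to repeat the replay until one fails. The following is a minimal sketch, not a tested reproduction recipe. It assumes the `apitrace replay` command is on the PATH; the function name, the run limit, and the timeout value are hypothetical choices made for illustration. A hung GPU may leave the replay process unkillable, so the timeout is only a best-effort guard.

```python
import subprocess


def replay_until_failure(trace_path, max_runs=20, timeout_s=300,
                         command=("apitrace", "replay")):
    """Replay a trace repeatedly.

    Returns the 1-based number of the first run that exits with a non-zero
    status or exceeds the timeout, or None if every run completes cleanly.
    """
    for run in range(1, max_runs + 1):
        try:
            # Assumption: a hang shows up as a stuck or failing replay process.
            result = subprocess.run([*command, trace_path], timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return run
        if result.returncode != 0:
            return run
    return None
```

For example, `replay_until_failure("xash64.trace")` on a locally decompressed copy of a trace would report which attempt first failed, if any did.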

[839408.306382] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32778, for process xash64 pid 6233 thread xash64:cs0 pid 6273)
[839408.306384] amdgpu 0000:08:00.0: at address 0x0000000000000000 from 27
[839408.306385] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010153C
[839418.327341] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=119603881, emitted seq=119603883
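
The two kinds of kernel message quoted above (the VMC page fault and the ring timeout) can be picked out of a saved kernel log mechanically. This is a rough sketch: the regular expressions are derived only from the lines in this report, so other kernel versions may word the messages differently, and the function name and the "pending" field are hypothetical labels for illustration.

```python
import re

# Patterns modelled on the amdgpu messages quoted in this report.
PAGE_FAULT = re.compile(r"amdgpu .*VMC page fault")
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\w+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def scan_kernel_log(text):
    """Count page faults and collect ring timeouts found in kernel log text."""
    page_faults = 0
    timeouts = []
    for line in text.splitlines():
        if PAGE_FAULT.search(line):
            page_faults += 1
        match = RING_TIMEOUT.search(line)
        if match:
            signaled = int(match["signaled"])
            emitted = int(match["emitted"])
            timeouts.append({
                "ring": match["ring"],
                "signaled": signaled,
                "emitted": emitted,
                # Difference between the emitted and signaled sequence numbers.
                "pending": emitted - signaled,
            })
    return {"page_faults": page_faults, "timeouts": timeouts}
```

Feeding it the text of a saved log (for instance, the output of `dmesg` written to a file) returns a small summary that can be compared between a clean replay and one that hung.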
The attached trace was created with llvmpipe, but it crashes in some replays.
A second trace, which resulted in a crash while recording (so it is broken at the end):
http://rgho.st/private/8HvcFZtwT/04b55f00128bc9e81f4cf3c4a205015a
Comment 2 Timothy Arceri 2019-02-18 23:40:24 UTC
(In reply to mittorn from comment #1)
> Re-uploaded traces here
> http://mittorn.the-swank.pp.ua/xash64.1.trace.xz
> http://mittorn.the-swank.pp.ua/xash64.trace.xz

I can't get these to crash. Tested with Mesa 19.1-devel on Radeon RX Vega (VEGA10, DRM 3.27.0, 4.20.8-200.fc29.x86_64, LLVM 9.0.0-devel).

Is this still an issue for you? Are you able to get a stack trace from the crash?
Comment 3 Timothy Arceri 2019-02-19 04:06:15 UTC
OK, I was able to produce a hang when testing on my RAVEN laptop, so this seems to be a RAVEN-specific issue. I could only test with the 4.19 kernel, as my laptop would not boot with 4.20.
Comment 4 mittorn 2019-02-19 06:25:27 UTC
I can still reproduce this with the open-source drivers, but there is no crash with amdgpu-pro-18.50.
I also managed to recover the system after a GPU crash by using the "modesetting" Xorg driver and ignoring the recovery errors. The GPU seems to ignore all commands after the "successful" recovery, but it is still possible to suspend the system. After suspend, the GPU restarts completely and becomes usable again (except that all userspace that was using the GPU has to be restarted). Maybe GPU recovery can only be implemented by suspending the whole system (because it is not possible to shut down power on an APU).
Comment 5 GitLab Migration User 2019-09-25 18:29:16 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1338.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.