Created attachment 137000
GALLIUM_DDEBUG: folder ddebug_dumps with multiple dumps
OpenGL renderer string: AMD RAVEN (DRM 3.23.0 / 4.16.0-2.fc27.x86_64, LLVM 6.0.0)
My system is an Acer SF315-41 (Ryzen Mobile 5 2500U) with Fedora 27, Kernel 4.16-drm-next (based on 4.15-rc8), LLVM 6.0.0-rc1, Mesa 18.0.0-rc2.
I can reproduce these crashes with everything from kernel-4.15-rcX/mesa-17.3/llvm5 to kernel-4.16-drm-next/mesa-18-rc2/llvm6-rc1, and with the versions in between. They mostly appear while watching videos (firefox/totem), switching tabs in firefox, resizing windows (gnome-shell) or gaming.
With amdgpu.lockup_timeout=2000 on the kernel command line and GALLIUM_DDEBUG=2000 set in the environment, I was able to gather lots of dumps within a few minutes (see attachment). As the dumps show, the GPU lockup sometimes results in a CPU lockup (a kernel Bluetooth deadlock) as a result of gnome-shell freezing completely. I can also reproduce the amdgpu crashes using a USB mouse with Bluetooth disabled.
Only occasionally do I find kernel errors in the log files after a crash. I'll attach the few I found in the last two weeks.
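Note that the two settings live in different places: amdgpu.lockup_timeout is a kernel module parameter passed at boot, while GALLIUM_DDEBUG is a Mesa environment variable read by the application being debugged (to my understanding, the number is the hang-detection timeout in milliseconds, and dumps end up in a ddebug_dumps folder like the one attached). The following is a minimal sketch, assuming Python 3, of launching a reproducer such as totem with GALLIUM_DDEBUG set; the helper names ddebug_env and run_with_ddebug are hypothetical and not part of Mesa.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def ddebug_env(timeout_ms: int = 2000,
               base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with GALLIUM_DDEBUG set.

    Hypothetical helper: the base mapping (default: os.environ) is copied,
    never modified in place.
    """
    env = dict(os.environ if base is None else base)
    # Mesa's Gallium ddebug wrapper reads this variable at driver load time.
    env["GALLIUM_DDEBUG"] = str(timeout_ms)
    return env


def run_with_ddebug(cmd: Sequence[str], timeout_ms: int = 2000) -> int:
    """Run cmd with GALLIUM_DDEBUG enabled and return its exit code."""
    return subprocess.run(list(cmd), env=ddebug_env(timeout_ms)).returncode
```

For example, `run_with_ddebug(["totem"])` starts totem with hang detection enabled. The lockup timeout still has to be set separately on the kernel command line.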
Created attachment 137001
kernel: [drm:amdgpu_job_timedout [amdgpu]]
Created attachment 137002
kernel: amdgpu [gfxhub] VMC page fault (1)
Created attachment 137003
kernel: amdgpu [gfxhub] VMC page fault (2)
Same here with an AMD 2500U on an HP Envy x360, details at:
I am also having this problem: Ryzen 2500U on kernel 4.16-drm-next. Many hangs that require a reboot to fix.
That said, this also seems very likely to be a kernel driver issue.
The OP also filed a kernel bug about this, but that report lacked the crucial information about how he was able to debug it. Glad I found this one.
It seems to me that this is in fact a CPU-related problem. Since July 25 I haven't had any problems, and my system is pretty stable. What helped was adding idle=nomwait to my GRUB command line; this fixed the problems for me.
Please try to add idle=nomwait to your GRUB command line. I think this bug can be closed.
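On Fedora, `grubby --update-kernel=ALL --args="idle=nomwait"` is one way to do this. For editing /etc/default/grub by hand, the following is a minimal sketch, assuming the usual `GRUB_CMDLINE_LINUX="..."` line with double quotes; the function name add_kernel_arg is hypothetical. It appends the argument only if it is not already present.

```python
import re


def add_kernel_arg(grub_defaults: str, arg: str = "idle=nomwait") -> str:
    """Return /etc/default/grub text with arg appended to GRUB_CMDLINE_LINUX.

    Idempotent: if arg is already on the line, the text is returned unchanged.
    """
    # Matches only GRUB_CMDLINE_LINUX, not GRUB_CMDLINE_LINUX_DEFAULT.
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)

    def _append(m: re.Match) -> str:
        args = m.group(2).split()
        if arg in args:
            return m.group(0)
        args.append(arg)
        return m.group(1) + " ".join(args) + m.group(3)

    new_text, count = pattern.subn(_append, grub_defaults, count=1)
    if count == 0:
        # No GRUB_CMDLINE_LINUX line yet: add one at the end of the file.
        sep = "" if new_text.endswith("\n") or not new_text else "\n"
        new_text += f'{sep}GRUB_CMDLINE_LINUX="{arg}"\n'
    return new_text
```

After writing the file, the GRUB config still has to be regenerated with grub2-mkconfig (the output path differs between BIOS and UEFI installs). After a reboot, `cat /proc/cmdline` should show idle=nomwait.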
I added idle=nomwait recently and that has fixed it for me too. I thought I had already tried this, but I'm not sure; perhaps there were two issues and the other one has since been fixed.
See comment #8. The kernel parameter idle=nomwait fixed this bug for me. It seems to be a CPU-related problem.