Summary: | [Raven][GALLIUM_DDEBUG] system crashes/freezes randomly every few minutes/hours | ||
---|---|---|---|
Product: | Mesa | Reporter: | Marcus Husar <marcus.husar> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED WORKSFORME | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | critical | ||
Priority: | medium | CC: | chewi, david.cap, marcus.husar |
Version: | git | ||
Hardware: | All | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
GALLIUM_DDEBUG: folder ddebug_dumps with multiple dumps
kernel: [drm:amdgpu_job_timedout [amdgpu]]
kernel: amdgpu [gfxhub] VMC page fault (1)
kernel: amdgpu [gfxhub] VMC page fault (2) |
Created attachment 137001 [details]
kernel: [drm:amdgpu_job_timedout [amdgpu]]
Created attachment 137002 [details]
kernel: amdgpu [gfxhub] VMC page fault (1)
Created attachment 137003 [details]
kernel: amdgpu [gfxhub] VMC page fault (2)
Same here with AMD 2500U on a HP Envy x360, details at:
- https://bugzilla.redhat.com/show_bug.cgi?id=1562530
- https://lists.freedesktop.org/archives/amd-gfx/2018-March/020580.html

I am also having this problem. Ryzen 2500u on kernel 4.16-DRM-next. Many hangs that require a reboot to fix. Although it also seems very likely that this is a Kernel driver issue.

OP also filed a kernel bug about this. It missed the crucial information about how he was able to debug it! Glad I found this one. https://bugzilla.kernel.org/show_bug.cgi?id=199653

It seems to me that this is in fact a CPU related problem. Since July 25 I don’t have any problems. My system is pretty stable. What helped was to add idle=nomwait to my GRUB command line. This has fixed those problems for me. Please try to add idle=nomwait to your GRUB command line. I think this bug can be closed.

I added idle=nomwait recently and that has fixed it for me too. I thought I had already tried this, not sure, but perhaps there were two issues and the other has since been fixed.

See comment #8. Kernel parameter idle=nomwait fixed this bug for me. It seems to be a CPU related problem.
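
For anyone trying the workaround above: after adding idle=nomwait to the GRUB command line and rebooting, it is worth confirming that the running kernel actually received the parameter. The following is only a minimal sketch, assuming a Linux system that exposes the boot command line at /proc/cmdline; the helper names `has_kernel_param` and `running_cmdline` are made up for illustration and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token in a kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings (avoids matching e.g. "idle=nomwaitx").
    return param in cmdline.split()


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    return Path(path).read_text().strip()
```

Calling `has_kernel_param(running_cmdline(), "idle=nomwait")` after the reboot should return True if the change to the GRUB configuration took effect.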
Created attachment 137000 [details]
GALLIUM_DDEBUG: folder ddebug_dumps with multiple dumps

OpenGL renderer string: AMD RAVEN (DRM 3.23.0 / 4.16.0-2.fc27.x86_64, LLVM 6.0.0)

My system is an Acer SF315-41 (Ryzen Mobile 5 2500U) with Fedora 27, Kernel 4.16-drm-next (based on 4.15-rc8), LLVM 6.0.0-rc1, Mesa 18.0.0-rc2.

I can reproduce these crashes with everything from kernel-4.15-rcX/mesa-17.3/llvm5 to kernel-4.16-drm-next/mesa-18-rc2/llvm6-rc1 and the versions in between. They mostly appear while watching videos (firefox/totem), switching tabs in firefox, resizing windows (gnome-shell) or gaming.

With the kernel parameter amdgpu.lockup_timeout=2000 and the environment variable GALLIUM_DDEBUG=2000 I was able to gather lots of dumps within a few minutes (see attachment). As the dumps show, the GPU lockup sometimes results in a CPU lockup (kernel bluetooth deadlock) as a result of gnome-shell freezing completely. I can also reproduce the amdgpu crashes with a USB mouse and bluetooth disabled.

Only rarely do I find kernel errors in the logfiles that result from a crash. I’ll attach the few I found in the last two weeks.
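
To gather similar dumps for a single application rather than the whole session, the GALLIUM_DDEBUG value can be set only in that program's environment. This is a rough sketch, assuming Mesa reads GALLIUM_DDEBUG from the process environment at startup and that the value 2000 is the hang-detection timeout used above; `ddebug_env` and `run_with_ddebug` are hypothetical helper names, not Mesa APIs.

```python
import os
import subprocess


def ddebug_env(timeout_ms, base=None):
    """Return a copy of an environment with GALLIUM_DDEBUG set to timeout_ms."""
    # Copy so the caller's (or the current process's) environment is untouched.
    env = dict(os.environ if base is None else base)
    env["GALLIUM_DDEBUG"] = str(timeout_ms)
    return env


def run_with_ddebug(command, timeout_ms=2000):
    """Run `command` (a list of arguments) with GALLIUM_DDEBUG set for it only."""
    return subprocess.run(command, env=ddebug_env(timeout_ms), check=False)
```

For example, `run_with_ddebug(["glxgears"])` would start that one program with the variable set, leaving the rest of the desktop unaffected. The amdgpu.lockup_timeout setting is a kernel module parameter and still has to be given on the kernel command line.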