Summary: | Regression: Lenovo e585 (ryzen 2500u) freezes during boot with 4.20-rc5/rc6, amdgpu error |
---|---|
Product: | DRI |
Component: | DRM/AMDgpu |
Reporter: | chris <christian.frank.uwb> |
Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED |
Severity: | normal |
Priority: | medium |
CC: | ckoenig.leichtzumerken, Dagobertstaler, ikidd3123, jv356, tones111, vicluo96 |
Version: | unspecified |
Hardware: | x86-64 (AMD64) |
OS: | Linux (All) |
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=109200 https://bugzilla.kernel.org/show_bug.cgi?id=201727 |

Hi, tested with the newly released rc6, same issue. Many thanks! Christian

I have the same issue with a 2700U in a Dell Inspiron 7375. All of the 4.20 RC versions that I have tried show the same problem. The system is able to boot with a 4.19 kernel.

The issue is still present in kernel 4.20.0-rc6-next-20181213.

Can you boot the system without amdgpu loaded (e.g., append modprobe.blacklist=amdgpu)? Or is this a general platform problem?

> Can you boot the system without amdgpu loaded (e.g., append modprobe.blacklist=amdgpu)

Doing this, I am able to boot my system.

To clarify, the system can boot with the amdgpu module, but it will lock up when LightDM/X starts. Booting with the amdgpu module blacklisted works.

Yes, same here. The system boots until GDM wants to start, then it freezes with the mentioned amdgpu error. Disabling amdgpu lets the system start up completely, including gdm.

Can you bisect?

020aa2ec15fc4a5ffdfcab7dc0db648a137abc41 lets me log in before the system freezes. 770af5859d6903049b7f39ed4f4e6612b63fd82d locks up before LightDM can start. I'll do a bit more testing.

Ignore that previous comment. I'm getting some strange results here and may have marked a commit with an intermittent crash as "good" while bisecting.

"bc537a9cc47eec7f4e32b8164c494ddc35dca8ac is the first bad commit" Well, that's kind of useless. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=bc537a9cc47eec7f4e32b8164c494ddc35dca8ac Any suggestions on how to get a better idea of where the break was?

Make sure you've tested a commit plenty before declaring it "good".

FYI, as a workaround, you can use the kernel opt: iommu=pt. At least on 4.20 rc7, which is the only one I've tried that on, but it should work with others.

I can confirm that the iommu=pt workaround works; iommu=soft also works to get gdm started and use the laptop. Sadly I have no idea what impact those workarounds have on gpu/cpu performance or battery lifetime.
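
The bisecting advice above (test a commit plenty before declaring it "good") can be sketched in Python. This is only an illustration, not something anyone in the thread ran: `boot_once` is a hypothetical callback standing in for "boot the kernel under test, start the display manager, and report whether the system stayed responsive", and the per-boot freeze probability passed to `false_good_probability` is an assumed number, not a measurement.

```python
from typing import Callable


def classify_commit(boot_once: Callable[[], bool], trials: int = 5) -> str:
    """Return "bad" on the first failed trial, "good" only if every trial passes.

    boot_once is a hypothetical callback: it should return True when the
    system booted and stayed responsive, False when it froze.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    for _ in range(trials):
        if not boot_once():
            # A single observed freeze is enough to mark the commit bad.
            return "bad"
    # No freeze in any trial: evidence for "good", not proof.
    return "good"


def false_good_probability(freeze_probability: float, trials: int) -> float:
    """Chance that a truly bad commit survives `trials` boots without freezing,
    assuming each boot freezes independently with `freeze_probability`."""
    return (1.0 - freeze_probability) ** trials
```

For example, if a bad commit froze on only half of its boots, five clean boots in a row would still happen about 3% of the time, which is how an intermittent crash can end up marked "good" during a bisect.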
(In reply to chris from comment #14)
> I can confirm that the iommu=pt workaround works, also iommu=soft works to
> get gdm started and use the laptop. Sadly i have no idea what impact those
> workarounds have when it comes to performance of the gpu/cpu or battery
> lifetime ?

Sadly I had a freeze during desktop usage shortly after boot using iommu=pt. The driver situation for Raven Ridge is really sad atm :(

I've tested a next-20181221 kernel with IOMMU_DEFAULT_PASSTHROUGH set, and I'm able to get the system to start properly. Still seeing some system lockups when playing games, but it's better than crashing on the login screen.

Hi, the laptop is still freezing when trying to start with kernel 4.20 (release version) using the latest amdgpu firmware from the kernel firmware git. Using iommu=soft still solves that issue. I also tested with a kernel daily build from 26.12, which should include the latest drm changes, and it also shows the same issue. Is there anything we can provide to help find the root cause? Many thanks! Christian

*** Bug 109200 has been marked as a duplicate of this bug. ***

Created attachment 142928 [details] full kernel log
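
Since several reports above depend on whether iommu=pt or iommu=soft was actually in effect, here is a minimal Python sketch for listing the iommu= parameters on a kernel command line. The function name `iommu_options` is made up for this example. It returns every occurrence rather than picking one, because this sketch makes no assumption about how the kernel combines repeated options.

```python
from typing import List


def iommu_options(cmdline: str) -> List[str]:
    """Return the values of all iommu= parameters found on a kernel command line.

    The command line is split on whitespace; quoting is not handled, which is
    sufficient for simple parameters such as iommu=pt or iommu=soft.
    """
    prefix = "iommu="
    return [token[len(prefix):] for token in cmdline.split() if token.startswith(prefix)]
```

On a running Linux system the function can be applied to the contents of /proc/cmdline, for instance `iommu_options(open("/proc/cmdline").read())`. An empty list means no iommu= workaround was passed at boot.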
Seeing the same issue with Dell 5575 (AMD 2500u, Vega mobile) on the 4.20 release. iommu=soft seems to allow boot. Kernel log: https://gist.github.com/ikidd/692dea4c63cc7656247071322d066405

With iommu=soft I still occasionally experience a frozen screen with the following logs:

Jan 02 16:11:18 lzThinkpad gnome-shell[1647]: Failed to flip: Cannot allocate memory
Jan 02 16:11:18 lzThinkpad kernel: amdgpu 0000:05:00.0: 00000000a2e0b642 pin failed
Jan 02 16:11:18 lzThinkpad kernel: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

I would like to add that on my Lenovo E585 iommu=pt works reliably, even for hours of games and web videos. But a few minutes in Wayland produce a frozen screen (without iommu=pt it does not even start).

Can anyone else try and bisect?

No problem here with amdgpu and iommu enabled, running kernel 4.20.0 on Dell Latitude 5495 (2700U). So BIOS issue maybe? iommu=pt is however still needed for kfd (bug 107898).

Created attachment 142974 [details] journalctl -b of lockup from bisected commit

E585 owner here. Please let me know if I can provide any additional information that would be helpful. Thanks in advance for your help. This problem was very consistently reproduced during the bisect. I've attached a journalctl -b from the first bad commit. I was able to bisect the problem to...

284dec4317c8e76f45d3ce922f673c80331812f1 is the first bad commit
commit 284dec4317c8e76f45d3ce922f673c80331812f1
Author: Christian König <christian.koenig@amd.com>
Date: Wed Aug 22 16:44:56 2018 +0200

drm/amdgpu: enable GTT PD/PT for raven v3

Should work on Vega10 as well, but with an obvious performance hit.

Older APUs can be enabled as well, but will probably be more work.

v2: fix error checking
v3: use more general check

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Hi, many thanks for that bisect. I googled the commit and also found the following, which seems to be the same issue: https://bugzilla.kernel.org/show_bug.cgi?id=201727 Hope that helps. Many thanks! Christian

Still the same issue with kernel 5.0-rc1. Any plan on when to tackle that issue?

Should be fixed with this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c1eba86339c8517814863bc7dd21e2661a84e77

I'm able to boot when building from that commit (1c1eba8), and it looks like it will land in 4.20.4. Thanks!

Very nice. Just tried 5.0-rc2 and booting works fine now without the iommu workaround!

-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/633.
Created attachment 142765 [details] amdgpu error message

Hi, I upgraded from mainline kernel 4.19.7 to 4.20-rc5. Sadly, with that kernel the system freezes when it tries to show gdm.

OS: Ubuntu 18.04.1
Kernel: Linux version 4.20.0-042000rc5-generic (kernel@gloin) (gcc version 8.2.0 (Ubuntu 8.2.0-10ubuntu1)) #201812030721 SMP Mon Dec 3 12:23:24 UTC 2018
Command line: BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000rc5-generic root=UUID=1381a98d-77fd-481f-9cdb-115b30829bd8 ro ivrs_ioapic[32]=00:14.0 ivrs_ioapic[33]=00:00.1 vt.handoff=1
Mesa is at version 18.2.2 (X-Swat ppa)

Firmware files:
ll /lib/firmware/amdgpu/rav*
-rw-r--r-- 1 root root 33280 Nov 6 21:32 /lib/firmware/amdgpu/raven_asd.bin
-rw-r--r-- 1 root root 9344 Nov 6 21:32 /lib/firmware/amdgpu/raven_ce.bin
-rw-r--r-- 1 root root 316 Apr 25 2018 /lib/firmware/amdgpu/raven_gpu_info.bin
-rw-r--r-- 1 root root 17536 Nov 6 21:32 /lib/firmware/amdgpu/raven_me.bin
-rw-r--r-- 1 root root 263808 Nov 6 21:32 /lib/firmware/amdgpu/raven_mec2.bin
-rw-r--r-- 1 root root 263808 Nov 6 21:32 /lib/firmware/amdgpu/raven_mec.bin
-rw-r--r-- 1 root root 21632 Nov 6 21:32 /lib/firmware/amdgpu/raven_pfp.bin
-rw-r--r-- 1 root root 26948 Nov 6 21:32 /lib/firmware/amdgpu/raven_rlc.bin
-rw-r--r-- 1 root root 17408 Nov 6 21:32 /lib/firmware/amdgpu/raven_sdma.bin
-rw-r--r-- 1 root root 341728 Apr 25 2018 /lib/firmware/amdgpu/raven_vcn.bin

christian@christian-ThinkPad-E585:~$ apt-cache show linux-firmware
Package: linux-firmware
Architecture: all
Version: 1.173.2

Error log from journalctl:
Dez 09 16:26:20 christian-ThinkPad-E585 set-cpufreq[874]: Setting ondemand scheduler for all CPUs
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: gmc_v9_0_process_interrupt: 28 callbacks suppressed
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process gnome-shell pid 1102 thread g
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process gnome-shell pid 1102 thread g
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: Deferred error, no action required.
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: CPU:0 (17:11:0) MC20_STATUS[-|-|MiscV|-|AddrV|Deferred|-|SyndV
Dez 09 16:26:20 christian-ThinkPad-E585 systemd-journald[378]: Missed 68239 kernel messages
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: Deferred error, no action required.
Dez 09 16:26:20 christian-ThinkPad-E585 systemd-journald[378]: Missed 6630 kernel messages
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: Coherent Slave Extended Error Code: 1
Dez 09 16:26:20 christian-ThinkPad-E585 systemd-journald[378]: Missed 7875 kernel messages

I attached a .txt file showing more of the error messages. I have also seen freezes with 4.19.7 with a similar error message, but this happens very rarely. With 4.20-rc5 the issue happens every time gdm tries to start, which makes the system unusable. If you need any other info, please ping me. Many thanks! Christian
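
For anyone comparing logs across machines, the amdgpu fault lines quoted above can be picked apart with a few regular expressions. The following Python sketch is an illustration only: the patterns are written against the exact line shapes shown in this report and are an assumption about the format, not a stable kernel interface, and `parse_fault_line` is a made-up helper name. It extracts the raw numbers and does not try to decode the status register bits.

```python
import re
from typing import Dict, Union

# Patterns modelled on the journal lines in this report (assumed format).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] VMC page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (?P<addr>0x[0-9a-fA-F]+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(?P<status>0x[0-9a-fA-F]+)")


def parse_fault_line(line: str) -> Dict[str, Union[str, int]]:
    """Extract whichever amdgpu fault fields appear on one journal line.

    Returns an empty dict for lines that match none of the patterns.
    """
    fields: Dict[str, Union[str, int]] = {}
    match = FAULT_RE.search(line)
    if match:
        fields["hub"] = match.group("hub")
        for key in ("src_id", "ring", "vmid", "pasid"):
            fields[key] = int(match.group(key))
    match = ADDR_RE.search(line)
    if match:
        fields["address"] = int(match.group("addr"), 16)
    match = STATUS_RE.search(line)
    if match:
        fields["status"] = int(match.group("status"), 16)
    return fields
```

Feeding the output of `journalctl -b -k` through this line by line gives the faulting address, vmid, and pasid per fault, which makes it easier to see whether two reports show the same fault.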