Summary: | 2500U: Graphics corruption on kernel 5.2 | |
---|---|---|---
Product: | DRI | Reporter: | andreaskem
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED MOVED | QA Contact: |
Severity: | normal | |
Priority: | medium | CC: | briancschott, chewi, chithanh, maraeo, pierre-eric.pelloux-prayer, riku, rush, syniurge
Version: | XOrg git | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=111244 | |
Description
andreaskem
2019-07-13 06:56:39 UTC
Created attachment 144772 [details]
Xorg log
Reintroducing iommu=pt does, indeed, seem to fix these graphical issues. Why is this flag suddenly required for proper operation again? Is every laptop with a Raven Ridge APU different here, or why can the kernel not just figure out how to properly configure the IOMMU so that everything works?

I think that I'm seeing something related with my 2700u Inspiron 7375. If I have compositing enabled in XFWM4, the system will immediately stop responding after logging in with LightDM. If the window manager compositing is disabled, I'm able to log in, but then there is graphical corruption. With git bisect I traced the problem back to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=df8368be1382&id=df8368be1382b442384507a5147c89978cd60702 I can edit the source file, and by only changing the KMS_DRIVER_MINOR definition from 32 to 30, get the system working correctly with 5.2.0.

Same symptoms here after upgrading from Linux 5.1 to 5.2 on Dell Latitude 5495, Ryzen Pro 2700U. Graphical corruption, and/or the GUI will stop responding. Magic SysRq is needed to reboot the computer. The kernel log contains the following bits when this happens:

[ 44.921571] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=1112, emitted seq=1115
[ 44.921574] [drm:amdgpu_job_timedout] *ERROR* Process information: process X pid 4477 thread X:cs0 pid 4587
[ 44.921575] [drm] GPU recovery disabled.

Another confirmation of the same issue on HP Envy x360 15-bq181no with Ryzen 5 2500U with Manjaro. Kernel option iommu=pt fixes it, didn't try other workarounds.

(In reply to Brian Schott from comment #3)
> I think that I'm seeing something related with my 2700u Inspiron 7375.
>
> If I have compositing enabled in XFWM4, the system will immediately stop
> responding after logging in with LightDM. If the window manager compositing
> is disabled, I'm able to log in, but then there is graphical corruption.
>
> With git bisect I traced the problem back to
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=df8368be1382&id=df8368be1382b442384507a5147c89978cd60702
>
> I can edit the source file, and by only changing the KMS_DRIVER_MINOR
> definition from 32 to 30, get the system working correctly with 5.2.0.

Your issue in particular is likely unrelated - it's an issue in userspace. The bisected commit is the one that allowed xf86-video-amdgpu to start scanning out DCC compressed buffers that mesa produces, with the caveat that mesa needs a hook on present for the re-tile. My guess is that hook isn't running when you aren't using "compositing". I'm not sure if mesa or xf86-video-amdgpu have options yet to disable DCC or not, but for that particular setup you'd probably want it disabled.

I have the same problem using the Lenovo 530S-14ARR with Ryzen 5 2500U. The XFCE4 compositing makes the system immediately freeze as soon as I log in on Manjaro with the 5.2.4-1 kernel. On the 5.3rc2 kernel, I can enable it and it doesn't immediately hang without recovery. That said, the compositing doesn't work and we have a lot of corruption as soon as anything updates on screen. The "iommu=pt" option didn't do anything for me in regards to reducing the corruption. I didn't know which logs to include so I didn't. Please reply with the names and, hopefully, the approximate paths of any logs to include.

(In reply to Nicholas Kazlauskas from comment #6)
> The bisected commit is the one that allowed xf86-video-amdgpu to start
> scanning out DCC compressed buffers that mesa produces, with the caveat that
> mesa needs a hook on present for the re-tile. My guess is that hook isn't
> running when you aren't using "compositing".

Yeah, could be that radeonsi is missing cases where this hook needs to run (or DCC needs to be disabled altogether). Pierre or Marek, can you look into this?
(In reply to Brian Schott from comment #3)
> I think that I'm seeing something related with my 2700u Inspiron 7375.
>
> If I have compositing enabled in XFWM4, the system will immediately stop
> responding after logging in with LightDM. If the window manager compositing
> is disabled, I'm able to log in, but then there is graphical corruption.
>
> With git bisect I traced the problem back to
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=df8368be1382&id=df8368be1382b442384507a5147c89978cd60702
>
> I can edit the source file, and by only changing the KMS_DRIVER_MINOR
> definition from 32 to 30, get the system working correctly with 5.2.0.

I couldn't reproduce the problem (Ryzen 7 PRO 2700U laptop). Could you list the version numbers of the various components involved (kernel, mesa, xf86-video-amdgpu and libdrm) please? Also, can you reproduce the problem with another desktop environment?

Just rebuilt mesa, libdrm, and xf86-video-amdgpu from git this evening. The kernel is the gentoo patched version of 5.2.6. The problem is not limited to XFCE's window manager. This is what it looks like in Dolphin: https://i.imgur.com/b8VLVP6.png

(In reply to Pierre-Eric Pelloux-Prayer from comment #9)
> Could you list the version numbers of the various components involved (kernel,
> mesa, xf86-video-amdgpu and libdrm) please?

kernel 5.2.7
mesa 19.0.8
libdrm 2.4.97
xf86-video-amdgpu 19.0.1
llvm 7.1.0

(In reply to Brian Schott from comment #10)
> Just rebuilt mesa, libdrm, and xf86-video-amdgpu from git this evening. The
> kernel is the gentoo patched version of 5.2.6. The problem is not limited to
> XFCE's window manager. This is what it looks like in Dolphin:
> https://i.imgur.com/b8VLVP6.png

Does using the "AMD_DEBUG=nodcc" Mesa environment variable help? Could you capture an apitrace of the issue in Dolphin so I can reproduce more easily?
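For anyone else reporting against this bug, the component versions requested above can be gathered roughly as follows. This is only a sketch: the function names `mesa_version` and `report_versions` are made up for illustration, it assumes `glxinfo` and `pkg-config` may or may not be installed, and the xf86-video-amdgpu version is left to the distribution's package manager because the query command differs per distribution.

```shell
# Extract the Mesa version from `glxinfo` output read on stdin.
# Assumes the usual "OpenGL version string: ... Mesa X.Y.Z" line.
mesa_version() {
    sed -n 's/^OpenGL version string:.* Mesa \([^ ]*\).*$/\1/p'
}

# Print whichever versions can be determined on this machine.
report_versions() {
    echo "kernel: $(uname -r)"
    if command -v glxinfo >/dev/null 2>&1; then
        echo "mesa: $(glxinfo 2>/dev/null | mesa_version)"
    fi
    if command -v pkg-config >/dev/null 2>&1; then
        # Only works if the libdrm development files are installed.
        echo "libdrm: $(pkg-config --modversion libdrm 2>/dev/null)"
    fi
}
```

Running `report_versions` in a terminal and pasting the result into a comment should cover most of what was asked for here.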
(In reply to Pierre-Eric Pelloux-Prayer from comment #12)
> Does using "AMD_DEBUG=nodcc" Mesa environment variable help?

It does. Exporting that in my ~/.profile makes the desktop usable.

> Could you capture an apitrace of the issue in Dolphin so I can reproduce
> more easily?

It seems that the Dolphin issue is not related to the desktop graphics corruption. I tested it again using a kernel that had my hack to under-report the version number and saw the same rendering issues in Dolphin, but that kernel doesn't require the nodcc environment variable.

Created attachment 145035 [details]
screenshot

Setting AMD_DEBUG="nodcc" system-wide via /etc/profile will help against GUI freeze, but not against graphical corruption (screenshot attached).
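For reference, setting the variable from a login profile as described in this thread could look something like the sketch below. The helper name `add_nodcc` is hypothetical, and the snippet assumes AMD_DEBUG takes a comma-separated list of flags; it appends `nodcc` rather than overwriting the variable so that any flags already set (for example `info`) are kept.

```shell
# Sketch of a profile snippet (e.g. for ~/.profile or /etc/profile);
# the function name is an assumption made for this example.
add_nodcc() {
    case ",${AMD_DEBUG:-}," in
        *,nodcc,*) ;;                                 # nodcc already present
        ,,)        AMD_DEBUG=nodcc ;;                 # variable empty or unset
        *)         AMD_DEBUG="${AMD_DEBUG},nodcc" ;;  # keep existing flags
    esac
    export AMD_DEBUG
}
add_nodcc
```

A new login session is needed before the desktop picks the variable up.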
It seems that the graphics corruption on my Dell Latitude 5495 with kernel 5.2 is a different issue. Kai-Heng Feng bisected and identified this in the Ubuntu Launchpad tracker already. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837688#yui_3_10_3_1_1565621082983_410

(In reply to Brian Schott from comment #13)
> (In reply to Pierre-Eric Pelloux-Prayer from comment #12)
> > Does using "AMD_DEBUG=nodcc" Mesa environment variable help?
>
> It does. Exporting that in my ~/.profile makes the desktop usable.
>

Let's focus on this issue first. Can you paste the output of: "AMD_DEBUG=info glxgears" please? And would you be able to test other versions of Mesa to see if your issue could be bisected (if it happens to be a Mesa problem)?

Created attachment 145069 [details]
glxgears output

> Can you paste the output of: "AMD_DEBUG=info glxgears" please?

Done. I'll try to bisect Mesa as time allows.

So, I've installed Compton as an alternative compositor on XFCE4 (I disabled the internal one), and it works rather well. The only problem I could find so far is that the graphics corruption persists when moving windows (the white blocky stuff that appears around all moving elements), although it does clear up as soon as I stop moving the window. Alternatively, I was thinking of moving to Wayland; however, XFCE4 doesn't seem to support it, so that's not an option for me. Perhaps someone can test that on their own system. Anyhow, I'm happy with the functionality of my current, albeit partial, solution. I will continue to check in for a full solution though.

I haven't had a lot of time to work on this, but I do have one more data point: Mesa 19.1.4 has no corruption in Dolphin but still requires the nodcc workaround with kernel 5.2.6.
The following applies to the graphics corruption seen in Dolphin:

ea5b7de138bb7e9a4e7e4f0c39c4ceb16acae923 is the first bad commit
commit ea5b7de138bb7e9a4e7e4f0c39c4ceb16acae923
Author: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Date: Wed Jul 3 19:27:12 2019 +0200

radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled

gl_SampleMaskIn is 1 when R_028BE0_PA_SC_AA_CONFIG is 0, so this commit rework the conditions controlling this register.
Before it was set if the sctx->framebuffer had a sample count > 1.
Now we still require this condition, but we also need either:
- GL_MULTISAMPLE to be enabled
- to be executing an operation that doesn't depends on GL state using u_blitter.

This fixes the arb_sample_shading/sample_mask piglit tests on radeonsi.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>

src/gallium/drivers/radeonsi/si_state.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

As far as the issue about desktop corruption and lockup on login requiring the AMD_DEBUG=nodcc workaround:

b563460b494e9228cf5bb1aa4a70ac2499ad81fe is the first bad commit
commit b563460b494e9228cf5bb1aa4a70ac2499ad81fe
Author: Marek Olšák <marek.olsak@amd.com>
Date: Tue Jan 8 20:08:08 2019 -0500

radeonsi: enable displayable DCC on Ravens

src/amd/common/ac_gpu_info.c | 8 ++++++++
src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 4 ++++
2 files changed, 12 insertions(+)

How do I reproduce the Xfce hang and corruption? It's not reproducible with Ubuntu 16.04.

(In reply to Marek Olšák from comment #22)
> How do I reproduce the Xfce hang and corruption? It's not reproducible with
> Ubuntu 16.04.

How old is XFCE in 16.04? I'm using 4.14, which was released this month.
Here's a quote from the release notes: "The window manager received a slew of updates and features, including support for VSync (using either Present or OpenGL as backend) to reduce or remove display flickering, HiDPI support, improved GLX support with NVIDIA proprietary/closed source drivers, support for XInput2, various compositor improvements and a new default theme." Maybe older versions of the window manager don't trigger the issue. Either way, I have an extra SSD on the way. I should be able to swap that into the machine and figure out some directions for reproducing the bug from a clean install.

It's hard to tell who's talking about which issue, as there seem to be two or even three different ones going on here. I believe I'm seeing the originally reported issue, as iommu=pt helps on my 2700U. Without it, I get heavy corruption throughout KDE. I'm running OpenSUSE Leap 15.1 with kernel-default 5.2.10-1.g5878ee6. Going back to 5.1 also works.

Apparently, Lenovo released a new BIOS update for my laptop (Thinkpad E485) today. The changelog mentions, "Sync IOAPICID in IVRS and APIC ACPI tables (Linux)." I installed the update and removed both the ivrs_ioapic[32]=00:14.0 flag and the iommu=pt flag from my kernel command line for testing. To my surprise, the laptop booted just fine and, from what I can tell, the graphical corruption seems to be gone. In the meantime, the kernel and mesa were updated a few times and I cannot pinpoint what, exactly, fixed my issues.

-> mesa 19.1.5
-> kernel 5.2.10.arch1-1
-> llvm-libs 8.0.1-3

Created attachment 145172 [details]
amd_fix.sh

I've been having a problem with display corruption on Fedora using recent updates. Here is my report on their bug tracker: https://bugzilla.redhat.com/show_bug.cgi?id=1745380 I can confirm that when compiz is enabled, the system will lock up when it's started. Adding "iommu=pt" to the kernel flags in grub2 will allow the system to boot without freezing but the display is still corrupt.
Adding "AMD_DEBUG=nodcc" to the environment provides a complete workaround (without the need for the kernel flag). I've attached a file "amd_fix.sh" which, if placed in /etc/profile.d/, will provide a workaround until this can be fixed in the kernel or wherever the problem is. I am using an HP Envy x360 13m with a Ryzen 7 2700U with the latest BIOS update.

(In reply to Brian Schott from comment #21)
> As far as the issue about desktop corruption and lockup on login requiring
> the AMD_DEBUG=nodcc workaround:
>
> b563460b494e9228cf5bb1aa4a70ac2499ad81fe is the first bad commit
> commit b563460b494e9228cf5bb1aa4a70ac2499ad81fe
> Author: Marek Olšák <marek.olsak@amd.com>
> Date: Tue Jan 8 20:08:08 2019 -0500
>
> radeonsi: enable displayable DCC on Ravens
>
> src/amd/common/ac_gpu_info.c | 8 ++++++++
> src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 4 ++++
> 2 files changed, 12 insertions(+)

Could you test this commit https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2016/diffs?commit_id=4829f697ab2ceb2fc2772cc1220acc4185e6013d and let us know if it fixes this issue?

(In reply to Pierre-Eric Pelloux-Prayer from comment #28)
>
> Could you test this commit
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2016/diffs?commit_id=4829f697ab2ceb2fc2772cc1220acc4185e6013d
> and let us know if it fixes this issue?

Using kernel 5.3.0 and mesa git with that patch:
* Graphics corruption in Dolphin is gone (without nodcc env variable)
* System lockup is gone (without nodcc env variable)
* Desktop corruption is still present (without nodcc env variable)
* Desktop corruption is gone (nodcc env variable set)

Fixed issue with a BIOS update on Lenovo E485 (v1.54 with AMD 2500U) with Fedora 30 KDE. I had an issue on the previous BIOS that required a kernel option to boot on all kernels (ivrs_ioapic[32]=00:14.0). After the BIOS update, this option is not needed and there is no longer graphic corruption on Kernel 5.2+.
This corruption was impacting at least Firefox, Dolphin and the Plasma Desktop.

KDE plasma 5.15.5
mesa 19.1.6
xorg-x11-drv-amdgpu 19.0.1
libdrm 2.4.99

As far as I can observe, the graphic corruption was introduced in the amdgpu driver inside Kernel 5.2 and occurs only when forcing boot with the ivrs_ioapic override.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/842.