Bug 111122

Summary: 2500U: Graphics corruption on kernel 5.2
Product: DRI
Reporter: andreaskem
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
Severity: normal
Priority: medium
CC: briancschott, chewi, chithanh, maraeo, pierre-eric.pelloux-prayer, riku, rush, syniurge
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=111244
Attachments: Kernel log, Xorg log, screenshot, glxgears output, amd_fix.sh

Description andreaskem 2019-07-13 06:56:39 UTC
Created attachment 144771 [details]
Kernel log

Arch Linux
Lenovo E485 (16 GiB RAM)
AMD Ryzen 2500U

xorg-server 1.20.5-2
mesa 19.1.2-1
xf86-video-amdgpu 19.0.1-1
libdrm 2.4.99-1

After upgrading to Linux kernel 5.2 from the Arch Linux repositories, my laptop started to show graphical corruption in Firefox and Konsole. It is much worse if something is moving on the screen, e.g., when a video is playing. Sometimes Firefox is almost unusable as a result. A downgrade to 5.1.16 immediately fixes the issues.

Somebody mentioned similar corruption for bug 109206:
https://bugs.freedesktop.org/show_bug.cgi?id=109206#c57

My kernel command line is:

initrd=\amd-ucode.img initrd=\initramfs-linux.img root=PARTUUID=34098e4c-f1bf-4a43-a0a8-2ba3ed3c71a6 idle=nomwait psmouse.synaptics_intertouch=1 acpi_osi=Linux amdgpu.gpu_recovery=1 ivrs_ioapic[32]=00:14.0

I used to have iommu=pt or iommu=off on the command line to get this laptop to boot properly but I have not needed that switch for a while. I might try to reintroduce it with 5.2 just to see what happens. In any case, my setup worked before, so something does not seem right.
Comment 1 andreaskem 2019-07-13 06:57:17 UTC
Created attachment 144772 [details]
Xorg log
Comment 2 andreaskem 2019-07-13 07:03:02 UTC
Reintroducing iommu=pt does, indeed, seem to fix these graphical issues. Why is this flag suddenly required for proper operation again?

Is every laptop with a Raven Ridge APU different here, or why can the kernel not just figure out how to configure the IOMMU properly so that everything works?
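
For anyone comparing setups in this thread, here is a minimal Python sketch (not part of the original report) for checking whether a kernel command line carries an iommu= setting. The helper names parse_cmdline and iommu_mode are made up for illustration, and the parser assumes simple whitespace-separated tokens with no quoting.

```python
# Command line as posted in the description (raw string keeps the backslashes).
REPORTED_CMDLINE = (
    r"initrd=\amd-ucode.img initrd=\initramfs-linux.img "
    r"root=PARTUUID=34098e4c-f1bf-4a43-a0a8-2ba3ed3c71a6 idle=nomwait "
    r"psmouse.synaptics_intertouch=1 acpi_osi=Linux amdgpu.gpu_recovery=1 "
    r"ivrs_ioapic[32]=00:14.0"
)


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags map to None. If a name repeats (e.g. initrd=), the last
    occurrence wins. Quoted values containing spaces are NOT handled.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so root=PARTUUID=... keeps its value intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_mode(cmdline):
    """Return the value of iommu= (e.g. 'pt' or 'off'), or None if it is absent."""
    return parse_cmdline(cmdline).get("iommu")
```

On a running system the input would come from /proc/cmdline, for example iommu_mode(open("/proc/cmdline").read()).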
Comment 3 Brian Schott 2019-07-14 04:59:06 UTC
I think that I'm seeing something related with my 2700u Inspiron 7375.

If I have compositing enabled in XFWM4, the system will immediately stop responding after logging in with LightDM. If the window manager compositing is disabled, I'm able to log in, but then there is graphical corruption.

With git bisect I traced the problem back to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=df8368be1382&id=df8368be1382b442384507a5147c89978cd60702

I can edit the source file, and by only changing the KMS_DRIVER_MINOR definition from 32 to 30, get the system working correctly with 5.2.0.
Comment 4 Chí-Thanh Christopher Nguyễn 2019-07-15 16:42:45 UTC
Same symptoms here after upgrading from Linux 5.1 to 5.2 on Dell Latitude 5495, Ryzen Pro 2700U. Graphical corruption, and/or the GUI will stop responding. Magic SysRq is needed to reboot the computer.

The kernel log contains the following when this happens:

[   44.921571] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=1112, emitted seq=1115
[   44.921574] [drm:amdgpu_job_timedout] *ERROR* Process information: process X pid 4477 thread X:cs0 pid 4587
[   44.921575] [drm] GPU recovery disabled.
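
A small Python sketch for pulling these timeout lines out of a saved kernel log. The regular expression is written against the exact message format quoted above, so it is an assumption that other kernel versions print the same text; find_ring_timeouts is a hypothetical helper name. Reading "signaled" as the last completed fence and "emitted" as the last submitted one is my understanding and is not stated in this thread.

```python
import re

# Matches e.g. "*ERROR* ring gfx timeout, signaled seq=1112, emitted seq=1115"
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

SAMPLE_LOG = """\
[   44.921571] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=1112, emitted seq=1115
[   44.921574] [drm:amdgpu_job_timedout] *ERROR* Process information: process X pid 4477 thread X:cs0 pid 4587
[   44.921575] [drm] GPU recovery disabled.
"""


def find_ring_timeouts(log_text):
    """Return a list of (ring, signaled_seq, emitted_seq) tuples found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match["ring"], int(match["signaled"]), int(match["emitted"]))
            )
    return hits
```

For the excerpt above this yields one entry for the gfx ring, with emitted minus signaled equal to 3.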
Comment 5 riku 2019-07-22 07:45:18 UTC
Another confirmation of the same issue, on an HP Envy x360 15-bq181no with a Ryzen 5 2500U running Manjaro. The kernel option iommu=pt fixes it; I didn't try other workarounds.
Comment 6 Nicholas Kazlauskas 2019-07-22 12:30:23 UTC
(In reply to Brian Schott from comment #3)
> I think that I'm seeing something related with my 2700u Inspiron 7375.
> 
> If I have compositing enabled in XFWM4, the system will immediately stop
> responding after logging in with LightDM. If the window manager compositing
> is disabled, I'm able to log in, but then there is graphical corruption.
> 
> With git bisect I traced the problem back to
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?h=df8368be1382&id=df8368be1382b442384507a5147c89978cd60702
> 
> I can edit the source file, and by only changing the KMS_DRIVER_MINOR
> definition from 32 to 30, get the system working correctly with 5.2.0.

Your issue in particular is likely unrelated - it's an issue in userspace.

The bisected commit is the one that allowed xf86-video-amdgpu to start scanning out DCC compressed buffers that mesa produces, with the caveat that mesa needs a hook on present for the re-tile. My guess is that hook isn't running when you aren't using "compositing".

I'm not sure whether mesa or xf86-video-amdgpu have options to disable DCC yet, but for that particular setup you'd probably want it disabled.
Comment 7 Wiktor Kaczor 2019-08-08 08:43:58 UTC
I have the same problem using the Lenovo 530S-14ARR with Ryzen 5 2500U. 

The XFCE4 compositing makes the system immediately freeze as soon as I log in on Manjaro with the 5.2.4-1 kernel.

On the 5.3rc2 kernel, I can enable it and it doesn't immediately hang without recovery. That said, the compositing doesn't work and we have a lot of corruption as soon as anything updates on screen.

The "iommu=pt" option didn't do anything for me in regards to reducing the corruption.

I didn't know which logs to include, so I haven't attached any. Please reply with the names and, if possible, the approximate paths of any logs you need.
Comment 8 Michel Dänzer 2019-08-08 09:30:09 UTC
(In reply to Nicholas Kazlauskas from comment #6)
> The bisected commit is the one that allowed xf86-video-amdgpu to start
> scanning out DCC compressed buffers that mesa produces, with the caveat that
> mesa needs a hook on present for the re-tile. My guess is that hook isn't
> running when you aren't using "compositing".

Yeah, could be that radeonsi is missing cases where this hook needs to run (or DCC needs to be disabled altogether).

Pierre or Marek, can you look into this?
Comment 9 Pierre-Eric Pelloux-Prayer 2019-08-08 13:47:01 UTC
(In reply to Brian Schott from comment #3)
> I think that I'm seeing something related with my 2700u Inspiron 7375.
> 
> If I have compositing enabled in XFWM4, the system will immediately stop
> responding after logging in with LightDM. If the window manager compositing
> is disabled, I'm able to log in, but then there is graphical corruption.
> 
> With git bisect I traced the problem back to
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?h=df8368be1382&id=df8368be1382b442384507a5147c89978cd60702
> 
> I can edit the source file, and by only changing the KMS_DRIVER_MINOR
> definition from 32 to 30, get the system working correctly with 5.2.0.

I couldn't reproduce the problem (Ryzen 7 PRO 2700U laptop).

Could you list the version numbers of the various components involved (kernel, mesa, xf86-video-amdgpu and libdrm) please?

Also can you reproduce the problem with another desktop environment?
Comment 10 Brian Schott 2019-08-09 03:30:36 UTC
Just rebuilt mesa, libdrm, and xf86-video-amdgpu from git this evening. The kernel is the gentoo patched version of 5.2.6. The problem is not limited to XFCE's window manager. This is what it looks like in Dolphin: https://i.imgur.com/b8VLVP6.png
Comment 11 Chí-Thanh Christopher Nguyễn 2019-08-09 07:21:31 UTC
(In reply to Pierre-Eric Pelloux-Prayer from comment #9)
> Could you list the version numbers of the various components involved
> (kernel, mesa, xf86-video-amdgpu and libdrm) please?

kernel 5.2.7
mesa 19.0.8
libdrm 2.4.97
xf86-video-amdgpu 19.0.1
llvm 7.1.0
Comment 12 Pierre-Eric Pelloux-Prayer 2019-08-09 10:06:59 UTC
(In reply to Brian Schott from comment #10)
> Just rebuilt mesa, libdrm, and xf86-video-amdgpu from git this evening. The
> kernel is the gentoo patched version of 5.2.6. The problem is not limited to
> XFCE's window manager. This is what it looks like in Dolphin:
> https://i.imgur.com/b8VLVP6.png

Does using "AMD_DEBUG=nodcc" Mesa environment variable help?

Could you capture an apitrace of the issue in Dolphin so I can reproduce more easily?
Comment 13 Brian Schott 2019-08-10 17:46:51 UTC
(In reply to Pierre-Eric Pelloux-Prayer from comment #12)
> Does using "AMD_DEBUG=nodcc" Mesa environment variable help?

It does. Exporting that in my ~/.profile makes the desktop usable.

> Could you capture an apitrace of the issue in Dolphin so I can reproduce
> more easily?

It seems that the Dolphin issue is not related to the desktop graphics corruption. I tested it again using a kernel that had my hack to under-report the version number and saw the same rendering issues in Dolphin, but that kernel doesn't require the nodcc environment variable.
Comment 14 Chí-Thanh Christopher Nguyễn 2019-08-12 08:19:35 UTC
Created attachment 145035 [details]
screenshot

Setting AMD_DEBUG="nodcc" system-wide via /etc/profile helps against the GUI freeze, but not against the graphical corruption (screenshot attached).
Comment 15 Chí-Thanh Christopher Nguyễn 2019-08-12 14:49:26 UTC
It seems that the graphics corruption on my Dell Latitude 5495 with kernel 5.2 is a different issue. Kai-Heng Feng bisected and identified this in the Ubuntu Launchpad tracker already.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837688#yui_3_10_3_1_1565621082983_410
Comment 16 Pierre-Eric Pelloux-Prayer 2019-08-13 08:15:34 UTC
(In reply to Brian Schott from comment #13)
> (In reply to Pierre-Eric Pelloux-Prayer from comment #12)
> > Does using "AMD_DEBUG=nodcc" Mesa environment variable help?
> 
> It does. Exporting that in my ~/.profile makes the desktop usable.
> 

Let's focus on this issue first.

Can you paste the output of: "AMD_DEBUG=info glxgears" please?

And would you be able to test other versions of Mesa to see if your issue could be bisected (if it happens to be a Mesa problem)?
Comment 17 Brian Schott 2019-08-15 09:26:57 UTC
Created attachment 145069 [details]
glxgears output

> Can you paste the output of: "AMD_DEBUG=info glxgears" please?

Done.

I'll try to bisect Mesa as time allows.
Comment 18 Wiktor Kaczor 2019-08-15 19:15:20 UTC
So, I've installed Compton as an alternative compositor on XFCE4 (I disabled the internal one), and it works rather well. The only problem I could find so far is that the graphics corruption persists when moving windows (the white blocky stuff that appears around all moving elements), although it does clear up as soon as I stop moving the window.

I was also thinking of moving to Wayland; however, XFCE4 doesn't seem to support it, so that's not an option for me. Perhaps someone can test that on their own system. Anyhow, I'm happy with my current, albeit partial, solution. I will continue to check in for a full solution, though.
Comment 19 Brian Schott 2019-08-16 22:15:13 UTC
I haven't had a lot of time to work on this, but I do have one more data point: Mesa 19.1.4 has no corruption in Dolphin but still requires the nodcc workaround with kernel 5.2.6.
Comment 20 Brian Schott 2019-08-18 02:18:23 UTC
The following applies to the graphics corruption seen in Dolphin:

ea5b7de138bb7e9a4e7e4f0c39c4ceb16acae923 is the first bad commit
commit ea5b7de138bb7e9a4e7e4f0c39c4ceb16acae923
Author: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Date:   Wed Jul 3 19:27:12 2019 +0200

    radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled
    
    gl_SampleMaskIn is 1 when R_028BE0_PA_SC_AA_CONFIG is 0, so this commit rework the conditions
    controlling this register.
    
    Before it was set if the sctx->framebuffer had a sample count > 1.
    
    Now we still require this condition, but we also need either:
      - GL_MULTISAMPLE to be enabled
      - to be executing an operation that doesn't depends on GL state using u_blitter.
    
    This fixes the arb_sample_shading/sample_mask piglit tests on radeonsi.
    
    Signed-off-by: Marek Olšák <marek.olsak@amd.com>

 src/gallium/drivers/radeonsi/si_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 21 Brian Schott 2019-08-18 10:22:58 UTC
As far as the issue about desktop corruption and lockup on login requiring the AMD_DEBUG=nodcc workaround:

b563460b494e9228cf5bb1aa4a70ac2499ad81fe is the first bad commit
commit b563460b494e9228cf5bb1aa4a70ac2499ad81fe
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Tue Jan 8 20:08:08 2019 -0500

    radeonsi: enable displayable DCC on Ravens

 src/amd/common/ac_gpu_info.c                      | 8 ++++++++
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 4 ++++
 2 files changed, 12 insertions(+)
Comment 22 Marek Olšák 2019-08-19 23:31:02 UTC
How do I reproduce the Xfce hang and corruption? It's not reproducible with Ubuntu 16.04.
Comment 23 Brian Schott 2019-08-21 09:21:41 UTC
(In reply to Marek Olšák from comment #22)
> How do I reproduce the Xfce hang and corruption? It's not reproducible with
> Ubuntu 16.04.

How old is XFCE in 16.04? I'm using 4.14, which was released this month. 

Here's a quote from the release notes: "The window manager received a slew of updates and features, including support for VSync (using either Present or OpenGL as backend) to reduce or remove display flickering, HiDPI support, improved GLX support with NVIDIA proprietary/closed source drivers, support for XInput2, various compositor improvements and a new default theme." Maybe older versions of the window manager don't trigger the issue.

Either way, I have an extra SSD on the way. I should be able to swap that in to the machine and figure out some directions for reproducing the bug from a clean install.
Comment 24 James Le Cuirot 2019-08-26 10:14:14 UTC
It's hard to tell who's talking about which issue, as there seem to be two or even three different ones going on here.

I believe I'm seeing the originally reported issue as iommu=pt helps on my 2700U. Without it, I get heavy corruption throughout KDE. I'm running OpenSUSE Leap 15.1 with kernel-default 5.2.10-1.g5878ee6. Going back to 5.1 also works.
Comment 25 andreaskem 2019-08-26 15:53:24 UTC
Apparently, Lenovo released a new BIOS update for my laptop (Thinkpad E485) today. The changelog mentions, "Sync IOAPICID in IVRS and APIC ACPI tables (Linux)." I installed the update and removed both the ivrs_ioapic[32]=00:14.0 flag and the iommu=pt flag from my kernel command line for testing. To my surprise, the laptop booted just fine and, from what I can tell, the graphical corruption seems to be gone. In the meantime, the kernel and mesa were updated a few times, so I cannot pinpoint what, exactly, fixed my issues.

-> mesa 19.1.5
-> kernel 5.2.10.arch1-1
-> llvm-libs 8.0.1-3
Comment 26 Matt D. 2019-08-26 18:27:02 UTC
Created attachment 145172 [details]
amd_fix.sh

I've been having a problem with display corruption on Fedora using recent updates. Here is my report on their bug tracker:

https://bugzilla.redhat.com/show_bug.cgi?id=1745380

I can confirm that when compiz is enabled, the system will lock up when it's started. Adding "iommu=pt" to the kernel flags in grub2 will allow the system to boot without freezing but the display is still corrupt.

Adding "AMD_DEBUG=nodcc" to the environment provides a complete workaround (without the need for the kernel flag).

I've attached a file "amd_fix.sh" which if placed in /etc/profile.d/ will provide a workaround until this can be fixed in the kernel or wherever the problem is.
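
The contents of amd_fix.sh are not reproduced in this thread, so the following is only a rough Python sketch of the same idea for a single program: launch it with "nodcc" added to AMD_DEBUG instead of setting the variable system-wide. The helper names with_amd_debug and run_without_dcc are hypothetical, and the code assumes AMD_DEBUG accepts a comma-separated list of flags.

```python
import os
import subprocess


def with_amd_debug(flag, env=None):
    """Return a copy of env (default: os.environ) with flag added to AMD_DEBUG.

    Existing AMD_DEBUG flags are kept, and the flag is not duplicated.
    Assumption: AMD_DEBUG takes a comma-separated list of options.
    """
    new_env = dict(os.environ if env is None else env)
    flags = [f for f in new_env.get("AMD_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    new_env["AMD_DEBUG"] = ",".join(flags)
    return new_env


def run_without_dcc(argv):
    """Run argv with AMD_DEBUG=nodcc in its environment and return its exit code."""
    return subprocess.run(argv, env=with_amd_debug("nodcc")).returncode
```

For example, run_without_dcc(["glxgears"]) would start glxgears with DCC disabled for that process only, which can help confirm whether the workaround applies before making it permanent.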
Comment 27 Matt D. 2019-08-26 18:28:25 UTC
I am using an HP Envy x360 13m with a Ryzen 7 2700U with the latest BIOS update.
Comment 28 Pierre-Eric Pelloux-Prayer 2019-09-19 09:32:02 UTC
(In reply to Brian Schott from comment #21)
> As far as the issue about desktop corruption and lockup on login requiring
> the AMD_DEBUG=nodcc workaround:
> 
> b563460b494e9228cf5bb1aa4a70ac2499ad81fe is the first bad commit
> commit b563460b494e9228cf5bb1aa4a70ac2499ad81fe
> Author: Marek Olšák <marek.olsak@amd.com>
> Date:   Tue Jan 8 20:08:08 2019 -0500
> 
>     radeonsi: enable displayable DCC on Ravens
> 
>  src/amd/common/ac_gpu_info.c                      | 8 ++++++++
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 4 ++++
>  2 files changed, 12 insertions(+)


Could you test this commit https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2016/diffs?commit_id=4829f697ab2ceb2fc2772cc1220acc4185e6013d and let us know if it fixes this issue?
Comment 29 Brian Schott 2019-09-20 01:39:00 UTC
(In reply to Pierre-Eric Pelloux-Prayer from comment #28)
> 
> Could you test this commit
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2016/
> diffs?commit_id=4829f697ab2ceb2fc2772cc1220acc4185e6013d and let us know if
> it fixes this issue?

Using kernel 5.3.0 and mesa git with that patch:
* Graphics corruption in Dolphin is gone (without nodcc env variable)
* System lockup is gone (without nodcc env variable)
* Desktop corruption is still present (without nodcc env variable)
* Desktop corruption is gone (nodcc env variable set)
Comment 30 William Bonnaventure 2019-09-23 17:31:59 UTC
A BIOS update fixed the issue on my Lenovo E485 (v1.54 with AMD 2500U) running Fedora 30 KDE.

With the previous BIOS, a kernel option (ivrs_ioapic[32]=00:14.0) was required to boot on all kernels. After the BIOS update, this option is no longer needed and there is no more graphical corruption on kernel 5.2+. The corruption affected at least Firefox, Dolphin and the Plasma desktop.

KDE plasma 5.15.5
mesa 19.1.6
xorg-x11-drv-amdgpu 19.0.1
libdrm 2.4.99

From what I observed, the graphical corruption was introduced in the amdgpu driver in kernel 5.2 and occurs only when forcing boot with the ivrs_ioapic override.
Comment 31 Martin Peres 2019-11-19 09:32:24 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/842.
