Bug 97504

Summary: Enabling SDMA on CIK (0241d8300f66ee2c6c2c55fe64ac88d76440c591) causes corruption on a mobile Bonaire with AMDGPU DDX / video desktop recording
Product: Mesa Reporter: Shawn Starr <shawn.starr>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: VERIFIED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium CC: kai, vedran
Version: git   
Hardware: Other   
OS: All   
Whiteboard:

Description Shawn Starr 2016-08-27 00:29:36 UTC
With the latest agd5f drm-fixes-4.8 / drm-next-4.9-wip on top of Linus' master kernel, and the latest Mesa git master including the "radeonsi: enable SDMA on CIK" patch:

1) With the AMDGPU DDX, the screen shows square-shaped corruption and X locks up
2) Desktop recording with vlc or vokoscreen produces corrupted video

The kernel logs GPUVM faults:

[ 6612.359198] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0248770c
[ 6612.359199] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010C836
[ 6612.359199] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0807700C
[ 6612.359200] VM fault (0x0c, vmid 4) at page 1099830, read from 'SDM0' (0x53444d30) (119)

  Revert "radeonsi: enable SDMA on CIK"
    
    This reverts commit 0241d8300f66ee2c6c2c55fe64ac88d76440c591.


With this revert applied, the problems go away.
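
For anyone reading the VM fault lines in the description, the printed values are consistent with each other and can be cross-checked by hand. Below is a minimal Python sketch of that check. The bit layout used for VM_CONTEXT1_PROTECTION_FAULT_STATUS is an assumption, chosen only because it reproduces the values the kernel printed in this log; the helper names are hypothetical and are not part of any kernel or Mesa API.

```python
# Assumed field layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS. It is not taken
# from this report; it is used here because it matches the decoded values the
# kernel printed ("0x0c, vmid 4 ... read ... (119)").
def decode_fault_status(status):
    """Split the raw fault status register value into its fields."""
    return {
        "protections": status & 0xFF,         # bits 0-7
        "client_id": (status >> 12) & 0xFF,   # bits 12-19
        "write": bool((status >> 24) & 0x1),  # bit 24: 0 = read, 1 = write
        "vmid": (status >> 25) & 0xF,         # bits 25-28
    }


def decode_client_tag(tag):
    """Turn the 32-bit memory client tag into its four ASCII characters."""
    return tag.to_bytes(4, "big").decode("ascii")


fields = decode_fault_status(0x0807700C)
print(fields)                          # protections 12 (0x0c), client 119, read, vmid 4
print(decode_client_tag(0x53444D30))   # SDM0
print(0x0010C836)                      # 1099830, the faulting page number
```

Under that assumed layout, the status word 0x0807700C yields protection bits 0x0c, VMID 4, a read access, and client ID 119, and the tag 0x53444d30 is simply the ASCII string "SDM0". The fault address 0x0010C836 equals the decimal page number 1099830 from the same log line. All of this points at a read issued by the SDMA engine.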
Comment 1 Vedran Miletić 2016-08-29 13:51:24 UTC
It's a long shot, but does https://lists.freedesktop.org/archives/mesa-dev/2016-August/127318.html fix it?
Comment 2 Luke A. Guest 2016-08-30 19:44:32 UTC
(In reply to Vedran Miletić from comment #1)
> It's a long shot, but does
> https://lists.freedesktop.org/archives/mesa-dev/2016-August/127318.html fix
> it?

I can confirm that on an R9 390 this patch stops the GPU page faults in the dmesg log, but I'm still getting major corruption when recording the screen in obs or ffmpeg: https://youtu.be/pFqhIGYLbDM
Comment 3 Shawn Starr 2016-08-31 02:51:04 UTC
Revert this patch to fix VDPAU corruption:

"radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI"
5ee3cac1380fec6971e9d25267589a586da0ecd8.
Comment 4 Luke A. Guest 2016-08-31 16:37:20 UTC
(In reply to Shawn Starr from comment #3)
> Revert this patch to fix VDPAU corruption:
> 
> "radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK
> or VI"
> 5ee3cac1380fec6971e9d25267589a586da0ecd8.

I reverted this, rebuilt mesa, ran the ffmpeg command, and played the result back: the first 4 seconds are corrupt in the same way as before, then it's fine.

I then rebuilt obs-studio, and the window now shows an uncorrupted screen! After rebuilding ffmpeg, there is no corruption either.
Comment 5 Luke A. Guest 2016-08-31 17:13:58 UTC
I can also confirm that the page faults above are back with this patch reverted.
Comment 6 Michel Dänzer 2016-09-09 03:02:36 UTC
*** Bug 97610 has been marked as a duplicate of this bug. ***
Comment 7 Michel Dänzer 2016-09-09 03:07:16 UTC
This issue wasn't limited to mobile Bonaire. One interesting hint from bug 97610 is that it works fine with the radeon kernel driver, so maybe it's related to addrlib or the amdgpu winsys.

Anyway, fixed for now in Git master. Note that you need to start X with the fixed Mesa, because it affects glamor as well.

Module: Mesa
Branch: master
Commit: 93f3d8e10d712336b86ebe17dafaee0aac7ec429
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=93f3d8e10d712336b86ebe17dafaee0aac7ec429

Author: Marek Olšák <marek.olsak@amd.com>
Date:   Thu Sep  8 18:21:04 2016 +0200

Revert "radeonsi: enable SDMA on CIK"
Comment 8 Kai 2016-09-09 17:56:16 UTC
(In reply to Michel Dänzer from comment #7)
> This issue wasn't limited to mobile Bonaire. One interesting hint from bug
> 97610 is that it works fine with the radeon kernel driver, so maybe it's
> related to addrlib or the amdgpu winsys.
> 
> Anyway, fixed for now in Git master. Note that you need to start X with the
> fixed Mesa, because it affects glamor as well.
> 
> Module: Mesa
> Branch: master
> Commit: 93f3d8e10d712336b86ebe17dafaee0aac7ec429
> URL: <http://cgit.freedesktop.org/mesa/mesa/commit/?id=93f3d8e10d712336b86ebe17dafaee0aac7ec429>
> 
> Author: Marek Olšák <marek.olsak@amd.com>
> Date:   Thu Sep  8 18:21:04 2016 +0200
> 
> Revert "radeonsi: enable SDMA on CIK"

I can confirm this fixes the corruption I've reported in bug 97610.

Since Marek mentioned in the commit message of the revert that this may be a tile configuration issue, I was wondering whether Tom was correct in bug 97610, comment #1. Should I run the test (with the revert undone, obviously) using the tiling configurations copied from radeon (bug 97610, comment #2)? Or is this something else?
