Bug 111948

Summary: [Vega10][bisected] Vega56 VM_L2_PROTECTION_FAULT when logging into KDE with kernel 5.3.0-rc1 and newer
Product: DRI
Component: DRM/AMDgpu
Status: RESOLVED MOVED
Severity: critical
Priority: high
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Andreas <freedesktop>
Assignee: Default DRI bug account <dri-devel>
Attachments: dmesg for 5.3.0-rc1 (flags: none)

Description Andreas 2019-10-09 20:10:35 UTC
Created attachment 145692
dmesg for 5.3.0-rc1

Hi,

I'm getting this problem every time I log into KDE (5.16.5) with kernel 5.3.0-rc1 or newer, including 5.4.0-rc2. The splash screen shows for a few seconds, then the screen goes black and comes back garbled.

I've done a bisect between 5.2 and 5.3.0-rc1:

52d2d44eee8091e740d0d275df1311fb8373c9a9 is the first bad commit

This is a merge commit though...

[   19.840654] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:144 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840656] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a021000 from 27
[   19.840657] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101121
[   19.840661] amdgpu 0000:0a:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840662] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a036000 from 27
[   19.840662] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840665] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840666] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a037000 from 27
[   19.840666] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840670] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840670] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a035000 from 27
[   19.840671] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840674] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840675] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a035000 from 27
[   19.840675] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840679] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840679] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a035000 from 27
[   19.840680] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840683] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:128 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840683] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a025000 from 27
[   19.840684] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840690] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840690] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a05b000 from 27
[   19.840691] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840694] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840695] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a05b000 from 27
[   19.840695] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   19.840698] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:157 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840699] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a058000 from 27
[   19.840699] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   25.066367] [drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out or interrupted!
[   25.066369] [drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out or interrupted!
[   30.194884] [drm:amdgpu_job_timedout] *ERROR* ring sdma0 timeout, signaled seq=491, emitted seq=493
[   30.194886] [drm:amdgpu_job_timedout] *ERROR* Process information: process ksplashqml pid 4333 thread ksplashqml:cs0 pid 4336
[   30.194889] amdgpu 0000:0a:00.0: GPU reset begin!
[   30.195169] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered

I've attached the dmesg log.
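
For anyone comparing the attached dmesg against logs from other kernels, the fault records above can be summarized with a small script. This is only a rough sketch that assumes the three-line record layout shown in this report (fault header, page address, status register); the names `parse_faults` and `summarize` are made up for illustration and are not part of any amdgpu tooling. It does not try to decode the status register bits.

```python
import re
from collections import Counter

# Patterns follow the three-line fault records quoted in this report.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] (?P<kind>no-retry|retry) page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")

# Two records copied from the log above, used as sample input.
SAMPLE = """\
[   19.840654] amdgpu 0000:0a:00.0: [gfxhub] no-retry page fault (src_id:0 ring:144 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840656] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a021000 from 27
[   19.840657] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101121
[   19.840661] amdgpu 0000:0a:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 4118 thread X:cs0 pid 4169)
[   19.840662] amdgpu 0000:0a:00.0:   in page starting at address 0x000080010a036000 from 27
[   19.840662] amdgpu 0000:0a:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
"""


def parse_faults(lines):
    """Group fault header, address and status lines into one dict per fault."""
    faults = []
    current = None
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            # A header line starts a new record.
            current = {
                "hub": m.group("hub"),
                "kind": m.group("kind"),
                "ring": int(m.group("ring")),
                "vmid": int(m.group("vmid")),
                "pasid": int(m.group("pasid")),
            }
            faults.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first fault header
        m = ADDR_RE.search(line)
        if m:
            current["address"] = int(m.group(1), 16)
            continue
        m = STATUS_RE.search(line)
        if m:
            current["status"] = int(m.group(1), 16)
    return faults


def summarize(faults):
    """Count how often each page address faulted."""
    return Counter(hex(f["address"]) for f in faults if "address" in f)


if __name__ == "__main__":
    for page, count in summarize(parse_faults(SAMPLE.splitlines())).items():
        print(page, count)
```

To run it on a full log, pass the lines of the saved dmesg file to `parse_faults` instead of `SAMPLE`.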

Comment 1 Andreas 2019-10-13 12:17:17 UTC
Kernel 5.3.6 seems to have fixed the issue.

Comment 2 Andreas 2019-10-14 18:37:47 UTC
I spoke too soon: I'm getting it on 5.3.6 now, as well as on 5.4.0-rc3.

Comment 3 Martin Peres 2019-11-19 09:57:25 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/932.
