Bug 104192

Summary: [amdgpu][VEGA10] regular lockups with VM_L2_PROTECTION_FAULT_STATUS
Product: DRI Reporter: Tom Englund <tomenglund26>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: medium CC: 9parsonsb, sarnex
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Tom Englund 2017-12-10 13:13:56 UTC
I have been having somewhat unreproducible lockups with my RX Vega 56, all the way from 4.12 amd-staging up to 4.15-rc2, where it is mainlined. For some reason I can now reproduce it simply by opening pavucontrol, and weirdly enough only that way.

I can play games (Dirt Rally, CS:GO), run benchmarks, or just leave the computer on for days without problems. But as soon as I open pavucontrol it locks up and the following shows up in dmesg. I don't seem to be alone with this either; here is a user with an RX Vega 64 on Arch Linux having the same issue/errors:
https://bbs.archlinux.org/viewtopic.php?id=232519


amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:03:00.0:   at page 0x0000000000000000 from 27
amdgpu 0000:03:00.0: [gfxhub] VMC page fault (src_id:0 ring:171 vm_id:4 pas_id:0)
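
For anyone comparing their own dmesg output against the lines above, here is a minimal sketch that pulls the fields out of the "VMC page fault" line. The regular expression is only an assumption based on the format of this one excerpt, and `parse_fault` is a hypothetical helper name, not part of any amdgpu tooling.

```python
import re

# Pattern derived from the single dmesg line quoted in this report;
# other kernel versions may print the fault differently.
FAULT_RE = re.compile(
    r"amdgpu (?P<pci>[0-9a-f:.]+): \[(?P<hub>\w+)\] VMC page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vm_id:(?P<vm_id>\d+) pas_id:(?P<pas_id>\d+)\)"
)

INT_FIELDS = ("src_id", "ring", "vm_id", "pas_id")


def parse_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields


sample = (
    "amdgpu 0000:03:00.0: [gfxhub] VMC page fault "
    "(src_id:0 ring:171 vm_id:4 pas_id:0)"
)
fault = parse_fault(sample)
print(fault)
```

`search` is used rather than `match` so that a leading dmesg timestamp does not prevent the line from being recognised.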

System info:
Radeon RX Vega (VEGA10 / DRM 3.23.0 / 4.15.0-rc2-mainline, LLVM 6.0.0)
Mesa 17.4.0-devel (git-4c7af87fb9)
Comment 1 Z G 2017-12-10 19:17:17 UTC
I've run into this bug too. Can you confirm that this fixes it for you and that it is the same issue?

I downgraded mesa-git to 109de3049d. I believe anything before Oct. 26th works for me. 3ba973fe37 breaks but that's all the way at Oct. 30th. I haven't binary searched yet to find the offending commit.
Comment 2 Z G 2017-12-10 22:58:14 UTC
Alright, I figured it out for me: there was a regression in llvm/clang which was giving me protection faults. I rolled my llvm back a week and built mesa with it, and it works fine now.

Hope that helps people. I'll continue looking into what caused it
Comment 3 Tom Englund 2017-12-10 23:01:33 UTC
(In reply to ejr.phone from comment #2)
> Alright I figured it out for me, there was a regression in llvm/clang which
> was giving me protection faults. I rolled back my llvm a week and built mesa
> with it and it works fine now.
> 
> Hope that helps people. I'll continue looking into what caused it

Yeah, I was just about to say it still froze with the same errors on that mesa commit. Time to try rolling back llvm then.
Comment 4 Tom Englund 2017-12-10 23:16:29 UTC
(In reply to Tom Englund from comment #3)
> (In reply to ejr.phone from comment #2)
> > Alright I figured it out for me, there was a regression in llvm/clang which
> > was giving me protection faults. I rolled back my llvm a week and built mesa
> > with it and it works fine now.
> > 
> > Hope that helps people. I'll continue looking into what caused it
> 
> yeah was just about to say it still froze with same errors on that mesa
> commit. time to try rollback llvm then.

llvm-svn revision 320250 gives me the freezes, while 317901 doesn't.
So the regression is somewhere in between. I'm not sure what tooling svn has for bisecting, like git bisect, but perhaps that helps someone.
Comment 5 Ben Parsons 2017-12-11 06:10:10 UTC
I am the user from the Arch Forums mentioned in comment #0. Is there anything I can do to help this along?
Comment 6 Z G 2017-12-11 07:16:54 UTC
looks like #104001 and #104159 are duplicates
Comment 7 Felix Schwarz 2017-12-11 07:39:51 UTC
(In reply to Tom Englund from comment #4)
> so its somewhere in between that, not sure what tooling svn have for
> bisecting, like git bisect. but perhaps that helps someone.

You can use git-svn for bisecting. :-)
Comment 8 ojab 2017-12-11 07:54:12 UTC
JFYI: llvm has official git mirror for all repos https://llvm.org/docs/GettingStarted.html#git-mirror
Comment 9 Michel Dänzer 2017-12-11 10:48:07 UTC
FWIW, I bisected some piglit regressions to LLVM SVN r319894.
Comment 10 Tom Englund 2017-12-11 12:52:28 UTC
(In reply to Michel Dänzer from comment #9)
> FWIW, I bisected some piglit regressions to LLVM SVN r319894.

Thanks, that gave me a starting point for bisecting. I have now narrowed it down: r319882 works while r319894 freezes. It is starting to look like the same regression you are facing.
Comment 11 Tom Englund 2017-12-11 13:22:14 UTC
(In reply to Tom Englund from comment #10)
> (In reply to Michel Dänzer from comment #9)
> > FWIW, I bisected some piglit regressions to LLVM SVN r319894.
> 
> thanks, that gave me some starting points to bisecting, narrowed things down
> now to where r319882 works while r319894 freezes. starting to look like the
> same regression you are facing.

After yet another compilation, r319893 works, so it would mean r319894, the commit below, is the cause of the freezes/errors.
 
https://github.com/llvm-mirror/llvm/commit/3b06fccc7749b974d2905fe852b389b4697485b7
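
The narrowing done in this thread (317901 good, 320250 bad, ending at r319893 good and r319894 bad) is a plain binary search over revision numbers. Below is a minimal sketch of that search, assuming revisions stay bad once the regression lands. `first_bad_revision` and the `is_bad` callback are hypothetical names; in practice the callback would mean "build LLVM at this revision, rebuild Mesa, open pavucontrol and check dmesg", which is simulated here using the result reported above.

```python
from typing import Callable, List


def first_bad_revision(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the lowest revision in (good, bad] for which is_bad() is true.

    Assumes `good` is known good, `bad` is known bad, and every revision
    from the first bad one up to `bad` is also bad.
    """
    if good >= bad:
        raise ValueError("good must be lower than bad")
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


tested: List[int] = []


def simulated_is_bad(rev: int) -> bool:
    # Stand-in for a real build-and-test cycle, using the outcome
    # reported in this thread (r319893 works, r319894 freezes).
    tested.append(rev)
    return rev >= 319894


result = first_bad_revision(317901, 320250, simulated_is_bad)
print(f"first bad revision: r{result}, builds needed: {len(tested)}")
```

With 2349 revisions between the two endpoints, the search needs at most 12 build-and-test cycles, which is why bisecting is practical even when every step means recompiling LLVM and Mesa.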
Comment 12 Tom Englund 2017-12-12 16:57:37 UTC
The commit causing this has been reverted upstream in r320466 (see https://reviews.llvm.org/rL320466 ), so from r320466 on, things work again.
