MESA: git master, Feb 19-20th builds
LLVM: trunk 3.9, Feb 19-20th builds
For some reason, X crashes with radeonsi, triggering a VM fault, when running in GNOME or KDE environments:
If you open up a shell console (gnome-terminal, konsole etc), run mock as non-root, as soon as an attempt to prompt for root happens (with gtksu) it locks up system.
[ 38.111551] radeon 0000:01:00.0: GPU fault detected: 146 0x0008480c
[ 38.111861] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 38.112219] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
[ 38.112576] VM fault (0x0c, vmid 4) at page 0, read from 'TC2' (0x54433200) (72)
[ 38.112931] radeon 0000:01:00.0: GPU fault detected: 146 0x0008440c
[ 38.113229] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08000000
[ 38.113587] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
[ 38.113945] VM fault (0x01, vmid 4) at page 134217728, read from 'TC3' (0x54433300) (68)
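For anyone comparing fault logs, here is a minimal Python sketch that pulls the fields out of a "VM fault" line. It relies only on what is visible in the log above: the hex value after the client name looks like the name's ASCII bytes (0x54433300 spells 'TC3'), and the reported page matches the VM_CONTEXT1_PROTECTION_FAULT_ADDR value (0x08000000 is 134217728). The function names are hypothetical and the pattern is fitted to these lines only, not to every kernel version's output.

```python
import re

# Pattern fitted to the "VM fault" lines in this report; other kernels may differ.
FAULT_RE = re.compile(
    r"VM fault \(0x(?P<status>[0-9a-fA-F]+), vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \(0x(?P<client_id>[0-9a-fA-F]+)\)"
)


def decode_client_id(value):
    """Interpret a 32-bit client ID as big-endian ASCII, dropping NUL padding."""
    raw = value.to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def parse_vm_fault(line):
    """Return the decoded fields of a VM fault line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "status": int(m.group("status"), 16),
        "vmid": int(m.group("vmid")),
        "page": int(m.group("page")),
        "access": m.group("access"),
        "client": m.group("client"),
        "client_from_id": decode_client_id(int(m.group("client_id"), 16)),
    }


if __name__ == "__main__":
    sample = ("[ 38.113945] VM fault (0x01, vmid 4) at page 134217728, "
              "read from 'TC3' (0x54433300) (68)")
    print(parse_vm_fault(sample))
```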
I've attached a VM fault debug capture of the crash.
Created attachment 121873
R600_DEBUG=check_vm capture of VM fault
(In reply to Shawn Starr from comment #0)
> If you open up a shell console (gnome-terminal, konsole etc), run mock as
> non-root, as soon as an attempt to prompt for root happens (with gtksu) it
> locks up system.
Since you can retrieve the GPUVM fault messages, it's hard to believe that the system locks up completely. Can you try logging in via ssh after the problem occurs and getting a gdb backtrace of the Xorg process?
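One possible way to grab such a backtrace non-interactively from an ssh session, sketched in Python. It assumes gdb and pgrep are installed on the affected machine, that the server process is named Xorg, and that you are allowed to attach to it (typically as root). The function names are hypothetical; the gdb options used (-p, -batch, -ex) are standard.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches, dumps all thread backtraces, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def xorg_backtrace(process_name="Xorg", timeout=60):
    """Return the gdb backtrace text for the first process matching process_name.

    pgrep exits non-zero when nothing matches, so check=True raises
    CalledProcessError in that case. The timeout guards against gdb
    hanging on a wedged process.
    """
    pids = subprocess.run(
        ["pgrep", "-x", process_name],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    result = subprocess.run(
        gdb_backtrace_cmd(int(pids[0])),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__":
    print(xorg_backtrace())
```

Running it in batch mode means there is no interactive gdb prompt to get stuck in; if the attach itself hangs, the timeout raises subprocess.TimeoutExpired.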
This isn't limited to gtksu. I can reproduce it fairly quickly by playing around with a MATE desktop session (which seems to use GTK2). OTOH I haven't run into it with this GNOME3 session, which mostly uses GTK3, though with some GTK2 apps as well.
I bisected it to 9aaf28da ("radeonsi: enable compiling one variant per shader"). I also confirmed that it happens with Marek's current si-one-variant branch as well as an older snapshot of that branch.
Now the "fun" part will be tracking down which glamor shaders are broken by this and why. Meanwhile, it might be better to disable the single shader variant by default, especially on the 11.2 branch.
After attaching gdb to X, I am unable to break out of gdb.
X.Org X Server 1.18.0
Release Date: 2015-11-09
X Protocol Version 11, Revision 0
Created attachment 121931
apitrace reproducing the problem
This apitrace reproduces the problem for me on Kaveri and Tonga.
Sadly, I can't reproduce this on Verde, Bonaire, or Tonga using the apitrace.
Could you please get a new check_vm report with this branch?
Created attachment 121979
check_vm dump from ddebug-shader-dump branch
The bad news is the check_vm report probably doesn't contain the problematic shaders. The good news is I can reproduce this after updating LLVM, thus this is an LLVM bug. I'm bisecting.
The first bad commit:
Author: Tom Stellard <email@example.com>
Date: Fri Feb 12 23:45:29 2016 +0000
AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16603
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260765 91177308-0d34-0410-b5e6-96231b3b80d8
Created attachment 121988
The problematic shader is attached. It has "s_branch" at the end and "ret" somewhere in the middle. My initial theory is that the shader fails to jump to the epilog, which is outside of the binary, and jumps somewhere else. It may even be stuck in an infinite loop due to an incorrect jump.
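To look for other shaders with the same shape in a dump, a rough Python sketch follows. It only checks whether the last instruction in a block of disassembly text is an "s_branch". It assumes one instruction per line with the mnemonic as the first token, ";" or "//" starting a comment, and labels ending in ":"; the real dump format may differ, and the sample text is made up for illustration, not taken from the attachment.

```python
def instruction_mnemonics(disasm):
    """Return the mnemonic of each instruction line, skipping labels and comments."""
    mnemonics = []
    for line in disasm.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        code = line.split(";", 1)[0].split("//", 1)[0].strip()
        if not code or code.endswith(":"):
            continue  # blank line, comment-only line, or label
        mnemonics.append(code.split()[0])
    return mnemonics


def ends_with_branch(disasm):
    """True if the final instruction is an unconditional s_branch."""
    mnemonics = instruction_mnemonics(disasm)
    return bool(mnemonics) and mnemonics[-1] == "s_branch"


if __name__ == "__main__":
    # Hypothetical disassembly fragment, for illustration only.
    sample = """
    BB0_1:
      s_cbranch_scc1 BB0_3
      v_mov_b32 v0, v1 ; copy
      s_branch BB0_4
    """
    print(ends_with_branch(sample))
```

A shader that ends this way is only a candidate: the branch target still has to be checked by hand to see whether it lands inside the binary.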
I got the same issue.
Arch Linux x64
mesa-git - from mesa-git repo (http://pkgbuild.com/~lcarlier/mesa-git/)
kernel - linux-mainline 4.5.0-rc7-mainline
radeon 0000:01:00.0: GPU fault detected: 147 0x0c024801
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FFFF860
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048001
The last working version was on commit 89d25a8 (mesa-11.2).
Problems started after commit ff360a5 (mesa-11.3).
When I launch 'plank' or 'mate-system-monitor', everything freezes (on some versions/commits my mouse still works and on others it doesn't) and my Xorg server crashes. Sometimes the session restarts, but after login OpenGL is not available (glxinfo shows some errors).
The fix is under review: http://reviews.llvm.org/D17964
Fixed in LLVM SVN r263441.