Bug 94242 - [radeonsi] Crash while running Fedora mock tool for prompting root (gtksu)
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: unspecified
Hardware: Other
OS: All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Depends on:
Reported: 2016-02-22 02:49 UTC by Shawn Starr
Modified: 2016-03-15 07:17 UTC
CC List: 2 users

Attachments:
R600_DEBUG=check_vm capture of VM fault (67.37 KB, text/plain)
2016-02-22 02:50 UTC, Shawn Starr
apitrace reproducing the problem (4.08 MB, application/octet-stream)
2016-02-24 01:43 UTC, Michel Dänzer
check_vm dump from ddebug-shader-dump branch (226.17 KB, text/plain)
2016-02-26 09:08 UTC, Michel Dänzer
problematic shader (43.05 KB, text/plain)
2016-02-26 16:21 UTC, Marek Olšák

Description Shawn Starr 2016-02-22 02:49:30 UTC
Kernel: 4.5.0-0.rc4.git2.2.fc24.x86_64
MESA: git master, Feb 19-20th builds
LLVM: trunk 3.9, Feb 19-20th builds

For some reason, X crashes with radeonsi, triggering a VM fault, in GNOME or KDE environments:

If you open a terminal (gnome-terminal, konsole, etc.) and run mock as a non-root user, the system locks up as soon as mock tries to prompt for root (with gtksu).

[   38.111551] radeon 0000:01:00.0: GPU fault detected: 146 0x0008480c
[   38.111861] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[   38.112219] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
[   38.112576] VM fault (0x0c, vmid 4) at page 0, read from 'TC2' (0x54433200) (72)
[   38.112931] radeon 0000:01:00.0: GPU fault detected: 146 0x0008440c
[   38.113229] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08000000
[   38.113587] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
[   38.113945] VM fault (0x01, vmid 4) at page 134217728, read from 'TC3' (0x54433300) (68)
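For reference, the "VM fault (...)" lines are the kernel's decoding of the raw VM_CONTEXT1_PROTECTION_FAULT_STATUS value, the fault address, and the memory-client tag. The Python sketch below reproduces that decoding. The bit layout is an assumption taken from the radeon driver's CIK fault decoder (cik_vm_decode_fault) and has only been checked against the two lines in this log; decode_vm_fault is a hypothetical helper, not part of any existing tool.

```python
def decode_vm_fault(status: int, addr: int, mc_client: int) -> str:
    """Rebuild a 'VM fault (...)' kernel line from raw register values.

    Assumed STATUS bit layout (from the radeon CIK decoder, checked only
    against this log):
      bits 0-3   protection fault flags
      bits 12-19 memory client id
      bit  24    1 = write, 0 = read
      bits 25-28 VMID
    """
    protections = status & 0xF
    mc_id = (status >> 12) & 0xFF
    is_write = (status >> 24) & 0x1
    vmid = (status >> 25) & 0xF
    # The client tag is four ASCII characters packed big-endian; drop NUL padding.
    block = mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    access = "write" if is_write else "read"
    return (f"VM fault (0x{protections:02x}, vmid {vmid}) at page {addr}, "
            f"{access} from '{block}' (0x{mc_client:08x}) ({mc_id})")


print(decode_vm_fault(0x0804800C, 0x00000000, 0x54433200))
print(decode_vm_fault(0x08044001, 0x08000000, 0x54433300))
```

Run on the STATUS/ADDR values from the log, this prints the same two "VM fault" lines as the kernel: both faults are reads (bit 24 clear) by texture-cache clients ('TC2', 'TC3') in VMID 4, the first one at page 0.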

I've attached a VM fault debug capture of the crash.
Comment 1 Shawn Starr 2016-02-22 02:50:15 UTC
Created attachment 121873 [details]
R600_DEBUG=check_vm capture of VM fault
Comment 2 Michel Dänzer 2016-02-22 07:04:21 UTC
(In reply to Shawn Starr from comment #0)
> If you open up a shell console (gnome-terminal, konsole etc), run mock as
> non-root, as soon as an attempt to prompt for root happens (with gtksu) it
> locks up system.

Since you can retrieve the GPUVM fault messages, it's hard to believe that the system locks up completely. Can you try logging in via ssh after the problem occurs and getting a gdb backtrace of the Xorg process?
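A minimal sketch of collecting that backtrace non-interactively from the ssh session, assuming gdb is installed and the session is allowed to ptrace the X server (typically root). gdb's -p option attaches to a running process, -batch exits after running the -ex commands, and "thread apply all bt" prints a backtrace for every thread. The helper names are hypothetical.

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    # -p PID: attach to the already running process
    # -batch: run the -ex commands, then exit instead of showing a prompt
    # "thread apply all bt": print a backtrace of every thread
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def xorg_backtrace(pid: int) -> str:
    # Needs permission to ptrace the X server, typically root.
    result = subprocess.run(gdb_backtrace_cmd(pid),
                            capture_output=True, text=True)
    return result.stdout
```

For example, xorg_backtrace(int(subprocess.check_output(["pidof", "Xorg"]).split()[0])); on some systems the server process is named X rather than Xorg.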
Comment 3 Michel Dänzer 2016-02-23 07:19:24 UTC
This isn't limited to gtksu. I can reproduce it fairly quickly by playing around with a MATE desktop session (which seems to use GTK2). OTOH I haven't run into it with this GNOME3 session, which mostly uses GTK3, though with some GTK2 apps as well.

I bisected it to 9aaf28da ("radeonsi: enable compiling one variant per shader"). I also confirmed that it happens with Marek's current si-one-variant branch as well as an older snapshot of that branch.
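Bisecting, as done here for Mesa (and later for LLVM), is a binary search over the commit history for the first commit where the bug reproduces. The sketch below shows the idea under simplifying assumptions: history is a linear list (git bisect actually walks the commit graph), the first commit is known good, the last is known bad, and the bug never disappears once introduced. first_bad and is_bad are hypothetical names.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are oldest-first, commits[0] is good, commits[-1] is bad,
    and every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):   # reproduce the bug at this commit
            bad = mid
        else:
            good = mid
    return commits[bad]


history = ["a", "b", "c", "d", "e", "f"]
print(first_bad(history, lambda c: c >= "d"))  # d
```

Each test halves the remaining range, so roughly log2(N) builds are needed to search N commits.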

Now the "fun" part will be tracking down which glamor shaders are broken by this and why. Meanwhile, it might be better to disable the single shader variant by default, especially on the 11.2 branch.
Comment 4 Shawn Starr 2016-02-23 15:39:34 UTC
When I attach gdb to X, I am unable to break out of gdb.

X info:

X.Org X Server 1.18.0
Release Date: 2015-11-09
X Protocol Version 11, Revision 0
Comment 5 Michel Dänzer 2016-02-24 01:43:49 UTC
Created attachment 121931 [details]
apitrace reproducing the problem

This apitrace reproduces the problem for me on Kaveri and Tonga.
Comment 6 Marek Olšák 2016-02-25 16:55:51 UTC
Sadly, I can't reproduce this on Verde, Bonaire or Tonga using the apitrace.

Could you please get a new check_vm report with this branch?

Comment 7 Michel Dänzer 2016-02-26 09:08:13 UTC
Created attachment 121979 [details]
check_vm dump from ddebug-shader-dump branch
Comment 8 Marek Olšák 2016-02-26 11:20:17 UTC
The bad news is that the check_vm report probably doesn't contain the problematic shaders. The good news is that I can reproduce this after updating LLVM, so this is an LLVM bug. I'm bisecting.
Comment 9 Marek Olšák 2016-02-26 15:12:09 UTC
The first bad commit:

commit 98ef4478258fda9028cd1786841eca952c136319
Author: Tom Stellard <thomas.stellard@amd.com>
Date:   Fri Feb 12 23:45:29 2016 +0000

    AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
    Reviewers: arsenm
    Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D16603
    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260765 91177308-0d34-0410-b5e6-96231b3b80d8
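Background on the commit title: on GCN all lanes of a wavefront share one program counter, so a branch whose condition can differ between lanes is normally handled by masking lanes through the EXEC register, while a branch whose condition is provably the same for every lane (uniform) can use a scalar s_cbranch instruction. Telling the two apart is a divergence analysis. The Python sketch below is a deliberately simplified version that only follows data dependences; real analyses, including LLVM's, must also handle values that become divergent through control flow. All names are hypothetical.

```python
from typing import Dict, List, Set


def divergent_values(defs: Dict[str, List[str]], sources: Set[str]) -> Set[str]:
    """Return every value that may differ between lanes of a wavefront.

    defs maps each computed value to the operands it is derived from;
    sources are values known to vary per lane (e.g. pixel coordinates).
    Simplification: only data dependences are followed, so values that
    become divergent because they are defined under a divergent branch
    are missed.
    """
    divergent = set(sources)
    changed = True
    while changed:  # iterate to a fixed point so operand order does not matter
        changed = False
        for value, operands in defs.items():
            if value not in divergent and any(op in divergent for op in operands):
                divergent.add(value)
                changed = True
    return divergent


# c1 depends only on a constant and a scalar shader input; c2 uses pixel_x.
defs = {"c1": ["const", "user_sgpr"], "c2": ["pixel_x", "c1"]}
div = divergent_values(defs, {"pixel_x"})
print(sorted(div))  # ['c2', 'pixel_x']
```

In this example a branch on c1 could be emitted as a scalar branch, while a branch on c2 still needs EXEC masking.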
Comment 10 Marek Olšák 2016-02-26 16:21:43 UTC
Created attachment 121988 [details]
problematic shader
Comment 11 Marek Olšák 2016-02-26 16:29:30 UTC
The problematic shader is attached. It has an "s_branch" at the end and a "ret" somewhere in the middle. My initial theory is that the shader fails to jump to the epilog, which is outside of the binary, and jumps somewhere else instead. It may even get stuck in an infinite loop due to an incorrect jump.
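The theory above can be illustrated with a toy model, assuming that the epilog is placed directly after the main shader binary and is reached only by falling through the end of it. The instruction names and semantics below are invented for illustration and are not real GCN behaviour; execute and link are hypothetical helpers.

```python
from typing import List, Tuple

Instr = Tuple[str, int]  # (opcode, operand); operand is a target for "s_branch"


def execute(code: List[Instr], max_steps: int = 50) -> List[str]:
    """Run a toy instruction list and return the opcodes executed, in order.

    Toy semantics (assumptions, not real hardware behaviour):
      - "s_branch" jumps to the absolute index given as operand
      - "ret" is a no-op marker: the return is assumed to be lowered to
        nothing, so control continues with the next instruction
      - "s_endpgm" stops the program
    Execution also stops after max_steps to catch infinite loops.
    """
    executed = []
    pc = 0
    while pc < len(code) and len(executed) < max_steps:
        op, arg = code[pc]
        executed.append(op)
        if op == "s_endpgm":
            break
        pc = arg if op == "s_branch" else pc + 1
    return executed


def link(main: List[Instr], epilog: List[Instr]) -> List[Instr]:
    # The epilog is appended right after the main part, so it only runs
    # if execution falls through the main part's last instruction.
    return main + epilog


EPILOG = [("export", 0), ("s_endpgm", 0)]
# "ret" at the very end: falling through reaches the epilog.
good = link([("v_mul", 0), ("ret", 0)], EPILOG)
# "ret" in the middle, trailing s_branch back to index 0: never falls through.
bad = link([("v_mul", 0), ("ret", 0), ("v_add", 0), ("s_branch", 0)], EPILOG)

print(execute(good))     # ['v_mul', 'ret', 'export', 's_endpgm']
print(len(execute(bad)))  # 50: hit the step limit without reaching the epilog
```

With "ret" at the end, control falls into the epilog; with "ret" in the middle and a trailing s_branch, the epilog is never reached and the toy program loops until the step limit, consistent with both failure modes suggested above.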
Comment 12 Oleg Suchilov 2016-03-09 18:37:36 UTC
I have the same issue.
Gigabyte HD7870
Arch Linux x64
mesa-git - from mesa-git repo (http://pkgbuild.com/~lcarlier/mesa-git/)
kernel - linux-mainline 4.5.0-rc7-mainline

radeon 0000:01:00.0: GPU fault detected: 147 0x0c024801
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FFFF860
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048001

Last working version was commit 89d25a8 (mesa-11.2).
Problems started after commit ff360a5 (mesa-11.3).

When I launch 'plank' or 'mate-system-monitor', everything freezes (with some versions/commits the mouse still works, with others it doesn't) and the Xorg server crashes. Sometimes the session restarts, but after logging in again OpenGL is not available (glxinfo shows errors).
Comment 13 Marek Olšák 2016-03-10 17:02:15 UTC
The fix is under review: http://reviews.llvm.org/D17964
Comment 14 Michel Dänzer 2016-03-15 07:17:32 UTC
Fixed in LLVM SVN r263441.
