Bug 94900 - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: 11.2
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2016-04-11 21:41 UTC by Daniel T.
Modified: 2017-01-12 10:53 UTC

Attachments
dmesg (92.53 KB, text/plain)
2016-04-11 21:41 UTC, Daniel T.
Grid Autosport crash GLSL dump SB (688.36 KB, application/x-xz)
2016-06-06 11:57 UTC, Adam Lyall
Octodad call 30070 fragment shader dump (16.27 KB, text/plain)
2016-07-16 16:20 UTC, Daniel T.
Octodad call 30070 vertex shader dump (6.02 KB, text/plain)
2016-07-16 16:20 UTC, Daniel T.
Octodad crashing pixel shader - TGSI, disassembly of normal and SB optimised versions (147.94 KB, text/plain)
2016-07-17 21:36 UTC, i.kalvachev
Octodad crashing pixel shader - TGSI, disassembly, sbdump of IR code and crash backtrace (1.47 MB, text/plain)
2016-07-18 09:04 UTC, i.kalvachev
Simple workaround (526 bytes, patch)
2016-07-23 08:43 UTC, Heiko
Possible fix for the lockups (8.27 KB, patch)
2016-09-28 17:26 UTC, Heiko
Cleaned up version of the proposed fix (6.45 KB, patch)
2016-10-30 09:11 UTC, Heiko

Description Daniel T. 2016-04-11 21:41:59 UTC
Created attachment 122873
dmesg

I am seeing GPU lockups when trying to launch Octodad: Dadliest Catch on Steam with a Radeon HD6950. It was working before, but it's been a few months since I last tried to launch it. Both Mesa and the kernel have been updated since it last worked:
kernel 4.4 -> 4.5, Mesa 11.1 -> 11.2. Possibly libdrm updates too.

[  267.335354] radeon 0000:02:00.0: ring 0 stalled for more than 10416msec
[  267.335364] radeon 0000:02:00.0: GPU lockup (current fence id 0x000000000000821e last fence id 0x0000000000008229 on ring 0)
[  267.444949] radeon 0000:02:00.0: Saved 335 dwords of commands on ring 0.
[  267.445029] radeon 0000:02:00.0: GPU softreset: 0x00000009
[  267.445031] radeon 0000:02:00.0:   GRBM_STATUS               = 0xE77E4828
[  267.445032] radeon 0000:02:00.0:   GRBM_STATUS_SE0           = 0xFF800001
[  267.445033] radeon 0000:02:00.0:   GRBM_STATUS_SE1           = 0xFF800001
[  267.445034] radeon 0000:02:00.0:   SRBM_STATUS               = 0x200000C0
[  267.445101] radeon 0000:02:00.0:   SRBM_STATUS2              = 0x00000000
[  267.445102] radeon 0000:02:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  267.445103] radeon 0000:02:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[  267.445104] radeon 0000:02:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008006
[  267.445106] radeon 0000:02:00.0:   R_008680_CP_STAT          = 0x80038647
[  267.445107] radeon 0000:02:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  267.445108] radeon 0000:02:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  267.445109] radeon 0000:02:00.0:   VM_CONTEXT0_PROTECTION_FAULT_ADDR   0x00000000
[  267.445111] radeon 0000:02:00.0:   VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[  267.445112] radeon 0000:02:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  267.445113] radeon 0000:02:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  267.461656] radeon 0000:02:00.0: GRBM_SOFT_RESET=0x0000DF7B
[  267.461709] radeon 0000:02:00.0: SRBM_SOFT_RESET=0x00000100
[  267.462863] radeon 0000:02:00.0:   GRBM_STATUS               = 0x00003828
[  267.462864] radeon 0000:02:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  267.462866] radeon 0000:02:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  267.462867] radeon 0000:02:00.0:   SRBM_STATUS               = 0x200000C0
[  267.462933] radeon 0000:02:00.0:   SRBM_STATUS2              = 0x00000000
[  267.462934] radeon 0000:02:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  267.462935] radeon 0000:02:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  267.462937] radeon 0000:02:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  267.462938] radeon 0000:02:00.0:   R_008680_CP_STAT          = 0x00000000
[  267.462939] radeon 0000:02:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  267.462940] radeon 0000:02:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  267.463016] radeon 0000:02:00.0: GPU reset succeeded, trying to resume
Comment 1 Daniel T. 2016-04-11 22:53:13 UTC
I don't think this is actually a regression. It's likely I was using the onboard Intel graphics the last time I ran this game.
Comment 2 Daniel T. 2016-04-15 03:49:33 UTC
The system is running an up-to-date Arch Linux with:
Mesa 11.2
LLVM 3.7.1
libdrm 2.4.67

glxinfo
---
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CAYMAN (DRM 2.43.0, LLVM 3.7.1)
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.0
OpenGL core profile shading language version string: 4.10
...
OpenGL version string: 3.0 Mesa 11.2.0
OpenGL shading language version string: 1.30
---

This is what the game reports when launched from a terminal:
---
Irrlicht Engine version 1.8.0
Creating LogManager.
Creating Irrlicht Device.
Creating SteamManager.
[S_API FAIL] SteamAPI_Init() failed; no appID found.
Either launch the game from Steam, or put the file steam_appid.txt containing the correct appID in your game folder.
Warning: Could not connect to steam client.
Irrlicht Engine version 1.8.0
Using renderer: OpenGL 3.0
Gallium 0.4 on AMD CAYMAN (DRM 2.43.0, LLVM 3.7.1): X.Org
OpenGL driver version is 1.2 or better.
GLSL version: 1.3
---
Comment 3 Daniel T. 2016-05-03 11:06:32 UTC
Saints Row 4 causes this lockup:

[118144.509627] radeon 0000:01:00.0: ring 0 stalled for more than 10060msec
[118144.509638] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000001dccef9 last fence id 0x0000000001dccefe on ring 0)
[118144.582926] radeon 0000:01:00.0: ring 4 stalled for more than 10133msec
[118144.582941] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000a861a1 last fence id 0x0000000000a861a2 on ring 4)
[118145.009595] radeon 0000:01:00.0: ring 0 stalled for more than 10560msec
[118145.009605] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000001dccef9 last fence id 0x0000000001dccefe on ring 0)
[118145.082921] radeon 0000:01:00.0: ring 4 stalled for more than 10633msec
[118145.082930] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000a861a1 last fence id 0x0000000000a861a2 on ring 4)
[118145.509637] radeon 0000:01:00.0: ring 0 stalled for more than 11060msec
[118145.509647] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000001dccef9 last fence id 0x0000000001dccefe on ring 0)
[118145.579594] radeon 0000:01:00.0: ring 4 stalled for more than 11130msec
[118145.579603] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000a861a1 last fence id 0x0000000000a861a2 on ring 4)
[118146.009631] radeon 0000:01:00.0: ring 0 stalled for more than 11560msec
[118146.009641] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000001dccef9 last fence id 0x0000000001dccefe on ring 0)
[118146.082911] radeon 0000:01:00.0: ring 4 stalled for more than 11633msec
[118146.082921] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000a861a1 last fence id 0x0000000000a861a2 on ring 4)
[118146.379880] radeon 0000:01:00.0: Saved 31 dwords of commands on ring 0.
[118146.379961] radeon 0000:01:00.0: GPU softreset: 0x00000029
[118146.379963] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5703828
[118146.379964] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xFC000007
[118146.379965] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[118146.379966] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[118146.380033] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[118146.380034] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[118146.380035] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[118146.380036] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008004
[118146.380037] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
[118146.380039] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[118146.380040] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83146
[118146.380041] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_ADDR   0x00000000
[118146.380043] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[118146.380044] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[118146.380045] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[118146.390413] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
[118146.390465] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000140
[118146.391620] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[118146.391621] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[118146.391622] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[118146.391623] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[118146.391690] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[118146.391691] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[118146.391692] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[118146.391693] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[118146.391694] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[118146.391696] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[118146.391697] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[118146.391772] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Comment 4 Marcin Juszkiewicz 2016-05-04 21:23:21 UTC
I have an R7 R240 and run Fedora Rawhide. Neither Torchlight II under Steam/Linux nor Diablo III under Wine can be played for long:

maj 04 22:54:33 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10020msec
maj 04 22:54:33 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:34 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10521msec
maj 04 22:54:34 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:34 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 11022msec
maj 04 22:54:34 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:35 puchatek.local kernel: sysrq: SysRq : Emergency Sync
maj 04 22:54:35 puchatek.local kernel: Emergency Sync complete
maj 04 22:54:35 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 11523msec
maj 04 22:54:35 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:35 puchatek.local kernel: sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sy
maj 04 22:54:35 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 12024msec
maj 04 22:54:35 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:36 puchatek.local kernel: sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sy
maj 04 22:54:36 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 12525msec
maj 04 22:54:36 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:36 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 13026msec
maj 04 22:54:36 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa22 on ring 0)
maj 04 22:54:37 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 13527msec
maj 04 22:54:37 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa28 on ring 0)
maj 04 22:54:37 puchatek.local kernel: show_signal_msg: 195 callbacks suppressed
maj 04 22:54:37 puchatek.local kernel: QDBusConnection[1764]: segfault at 7f3c1c387790 ip 00007f3c314d49ff sp 00007f3c17ffe970 error 4 in libQt5Core.so.5.6.0[7f3c3121f000+4c3000]
maj 04 22:54:37 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 14028msec
maj 04 22:54:37 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa2d on ring 0)
maj 04 22:54:38 puchatek.local kernel: radeon 0000:01:00.0: ring 0 stalled for more than 14529msec
maj 04 22:54:38 puchatek.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000001aa12 last fence id 0x000000000001aa2d on ring 0)
maj 04 22:54:39 puchatek.local kernel: BUG: unable to handle kernel paging request at ffffc90403688ffc
maj 04 22:54:39 puchatek.local kernel: IP: [<ffffffffc016eb71>] radeon_ring_backup+0xd1/0x160 [radeon]
maj 04 22:54:39 puchatek.local kernel: PGD 606098067 PUD 0 
maj 04 22:54:39 puchatek.local kernel: Oops: 0000 [#1] SMP 
maj 04 22:54:39 puchatek.local kernel: Modules linked in: cmac rfcomm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables arc4 md4 nls_utf8 cifs dns_
maj 04 22:54:39 puchatek.local kernel:  snd_pcm crc32_pclmul snd_timer parport_serial snd ghash_clmulni_intel mei_me parport_pc i2c_i801 mei parport soundcore lpc_ich shpchp wmi video tpm_tis nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc amdkfd amd_iommu_v2 radeon hid_microsoft i2c_algo_bit drm_kms_helper ttm c
maj 04 22:54:39 puchatek.local kernel: CPU: 1 PID: 1301 Comm: Xorg Not tainted 4.6.0-0.rc6.git0.1.fc25.x86_64 #1
maj 04 22:54:39 puchatek.local kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./P67X-UD3-B3, BIOS U1b 03/13/2013
maj 04 22:54:39 puchatek.local kernel: task: ffff880600c19e80 ti: ffff8805f1264000 task.ti: ffff8805f1264000
maj 04 22:54:39 puchatek.local kernel: RIP: 0010:[<ffffffffc016eb71>]  [<ffffffffc016eb71>] radeon_ring_backup+0xd1/0x160 [radeon]
maj 04 22:54:39 puchatek.local kernel: RSP: 0018:ffff8805f1267c30  EFLAGS: 00010202
maj 04 22:54:39 puchatek.local kernel: RAX: ffffc90003ab9000 RBX: 00000000ffffffff RCX: 0000000000000000
maj 04 22:54:39 puchatek.local kernel: RDX: 0000000000000000 RSI: ffffc90403688ffc RDI: 00000000000bec80
maj 04 22:54:39 puchatek.local kernel: RBP: ffff8805f1267c58 R08: ffff8805ce164f00 R09: 8000000000000163
maj 04 22:54:39 puchatek.local kernel: R10: ffffffff81a3ab0b R11: ffffea0017622d80 R12: ffff8805fefc14e0
maj 04 22:54:39 puchatek.local kernel: R13: 000000000002fb21 R14: ffff8805fefc14b8 R15: ffff8805f1267ca0
maj 04 22:54:39 puchatek.local kernel: FS:  00007f64ad245a40(0000) GS:ffff88061ec40000(0000) knlGS:0000000000000000
maj 04 22:54:39 puchatek.local kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
maj 04 22:54:39 puchatek.local kernel: CR2: ffffc90403688ffc CR3: 00000000dd6a3000 CR4: 00000000000406e0
maj 04 22:54:39 puchatek.local kernel: Stack:
maj 04 22:54:39 puchatek.local kernel:  ffff8805fefc0000 ffff8805fefc14e0 ffff8805f1267ca0 ffff8805fefc14e0
maj 04 22:54:39 puchatek.local kernel:  0000000000000000 ffff8805f1267d10 ffffffffc013cd1d ffff8805fefc0740
maj 04 22:54:39 puchatek.local kernel:  00ff880600000001 ffff8805fefc0018 ffff8806000493c0 ffff8805589e6df0
maj 04 22:54:39 puchatek.local kernel: Call Trace:
maj 04 22:54:39 puchatek.local kernel:  [<ffffffffc013cd1d>] radeon_gpu_reset+0xcd/0x330 [radeon]
maj 04 22:54:39 puchatek.local kernel:  [<ffffffff8154eb7d>] ? fence_wait_timeout+0x7d/0x150
maj 04 22:54:39 puchatek.local kernel:  [<ffffffffc016c6fe>] radeon_gem_handle_lockup.part.3+0xe/0x20 [radeon]
maj 04 22:54:39 puchatek.local kernel:  [<ffffffffc016d721>] radeon_gem_wait_idle_ioctl+0xf1/0x150 [radeon]
maj 04 22:54:39 puchatek.local kernel:  [<ffffffffc007c922>] drm_ioctl+0x152/0x540 [drm]
maj 04 22:54:39 puchatek.local kernel:  [<ffffffffc016d630>] ? radeon_gem_busy_ioctl+0x100/0x100 [radeon]
maj 04 22:54:39 puchatek.local kernel:  [<ffffffff81129b24>] ? do_futex+0x2c4/0xb10
maj 04 22:54:39 puchatek.local kernel:  [<ffffffffc013a04f>] radeon_drm_ioctl+0x4f/0x90 [radeon]
maj 04 22:54:39 puchatek.local kernel:  [<ffffffff8125da03>] do_vfs_ioctl+0xa3/0x5d0
maj 04 22:54:39 puchatek.local kernel:  [<ffffffff8125dfa9>] SyS_ioctl+0x79/0x90
maj 04 22:54:39 puchatek.local kernel:  [<ffffffff817dc472>] entry_SYSCALL_64_fastpath+0x1a/0xa4
maj 04 22:54:39 puchatek.local kernel: Code: 0b c1 48 85 c0 49 89 07 74 7d 41 8d 7d ff 31 d2 48 c1 e7 02 eb 07 49 8b 07 48 83 c2 04 49 8b 74 24 08 8d 4b 01 89 db 48 8d 34 9e <8b> 36 89 34 10 41 23 4c 24 54 48 39 d7 89 cb 75 da 4c 89 f7 e8 
maj 04 22:54:39 puchatek.local kernel: RIP  [<ffffffffc016eb71>] radeon_ring_backup+0xd1/0x160 [radeon]
maj 04 22:54:39 puchatek.local kernel:  RSP <ffff8805f1267c30>
maj 04 22:54:39 puchatek.local kernel: CR2: ffffc90403688ffc
maj 04 22:54:39 puchatek.local kernel: ---[ end trace 73d74f5579095394 ]---
maj 04 22:54:44 puchatek.local kernel: sysrq: SysRq : Emergency Sync
Comment 5 Marcin Juszkiewicz 2016-05-06 19:30:28 UTC
Going back through kernel versions: same issue with the 4.1 kernel.
Comment 6 Marcin Juszkiewicz 2016-05-06 20:32:15 UTC
Probably not only a kernel issue: kernel 3.17 with an up-to-date Fedora Rawhide still crashes sooner or later.
Comment 7 Daniel T. 2016-05-16 09:05:27 UTC
Grid Autosport also causes an endless GPU lockup loop on the HD6950 with Mesa 11.2.2:

[124378.314517] radeon 0000:01:00.0: ring 0 stalled for more than 10473msec
[124378.314519] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000176171b last fence id 0x0000000001761722 on ring 0)
[124378.430375] radeon 0000:01:00.0: Saved 202 dwords of commands on ring 0.
[124378.430456] radeon 0000:01:00.0: GPU softreset: 0x00000009
[124378.430457] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5705828
[124378.430458] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000003
[124378.430459] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xFC000003
[124378.430460] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
[124378.430527] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[124378.430529] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[124378.430530] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[124378.430531] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008006
[124378.430532] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
[124378.430534] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[124378.430536] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[124378.430537] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_ADDR   0x00000000
[124378.430539] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[124378.430540] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[124378.430541] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[124378.445104] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
[124378.445157] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[124378.446312] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[124378.446313] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[124378.446314] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[124378.446315] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[124378.446382] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[124378.446383] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[124378.446384] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[124378.446385] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[124378.446386] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[124378.446388] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[124378.446389] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[124378.446464] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Comment 8 Marcin Juszkiewicz 2016-05-16 10:00:12 UTC
Can you check the graphics card temperature when it crashes? I have a different GPU in my desktop now, so I cannot check.
Comment 9 Adam Lyall 2016-06-06 11:57:35 UTC
Created attachment 124359
Grid Autosport crash GLSL dump SB
Comment 10 Adam Lyall 2016-06-06 11:58:08 UTC
Hi, I've been getting this issue with my Radeon HD5850 as well. Grid Autosport always crashes at the same place in its benchmark (just before the big hill), but I found that if I disable the SB shader optimizer (R600_DEBUG=nosb) the game runs perfectly throughout.

My GPU temperature is fine (it tops out at 62 °C on a hot day), so that does not seem to be the issue.

I've dumped the shaders both for when the game crashes with SB and for when it runs fine with nosb, in the hope they might help (I followed what Jürgen Scholz did at issue 93352#c7). I won't be near this computer for a few weeks, but if there is more info you would like me to gather in the next 24 hours, let me know and I'll see what I can do.

FYI my system is:
Ubuntu 16.04
Mesa from padoka PPA (12.1~git160603151200.a64c7cd~x~padoka0)
SAPPHIRE VAPOR-X Radeon HD 5870
AMD Phenom II 955

For now I'm just attaching the shader dump from the crashed SB version. If you want, I can upload the nosb variant as well. I've also attached the script I made for debugging the game.
Comment 11 i.kalvachev 2016-07-12 22:21:25 UTC
Please try to narrow down the issue.

Get the program `apitrace`. Set `R600_DEBUG=nosb` and use apitrace to record one of the games that causes a crash. This should produce a trace file that reproduces the lockup when replayed (retraced) without "nosb".
(Smaller traces help with the next steps. Also use any trick to limit the fps and keep scenes simple, e.g. lock the DPM in low power mode, enable vblank/vsync, etc.)

Then try to narrow down what causes the lockup. You can use the `qapitrace` GUI program for that. Load the trace and use "lookup state" (right mouse button on a frame, or just double click) to replay up to a specific frame. Find the frame that causes the lockup.

(E.g. look up every 100th frame. When you find a crash, try every 10th frame starting from the last known working frame. Then look up frames one by one.
This might be less optimal than a binary search, but rebooting and reloading probably takes more time.)
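
The coarse-to-fine search described above can be sketched as follows. This is only an illustration, not part of apitrace: `locks_up` is a hypothetical stand-in for "replay up to this frame in qapitrace and see whether the GPU hangs", and the sketch assumes that once a frame hangs, every later lookup hangs too (each lookup replays the trace from the start).

```python
def find_first_bad(total, locks_up, steps=(100, 10, 1)):
    """Return the first index in [0, total) where locks_up(i) is True, or None.

    locks_up is a hypothetical predicate; in practice each call is a manual
    replay (and possibly a reboot). The last step must be 1 for an exact answer.
    """
    last_good = -1      # highest index known to replay fine
    first_bad = None    # lowest index known to hang
    for step in steps:
        upper = total if first_bad is None else first_bad
        probe = last_good + step
        while probe < upper:
            if locks_up(probe):
                first_bad = probe   # narrow the window, then refine
                break
            last_good = probe
            probe += step
    return first_bad


if __name__ == "__main__":
    # Simulated trace: every lookup at or after frame 457 hangs.
    print(find_first_bad(1000, lambda frame: frame >= 457))
```

The same function works for calls inside a frame if the indices are call numbers instead of frame numbers.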

After you find the frame, try to find the exact call inside that frame that causes the lockup. You can "look up" individual calls, just like frames, so use the same process. For the final narrowing down, you can rely on the fact that only draw calls can cause a lockup, because they are the ones that execute the shaders (usually 1 vertex and 1 pixel/fragment shader).

At this point you can upload the archived trace file somewhere (traces are usually quite big) and tell us the location and the number of the call that causes the lockup.

Doing this saves the developers a lot of time, because each lookup replays the trace file from the beginning to the selected call. As you can see, the process is tedious and time consuming.

(It might be a good idea to trim the trace file a few frames after the first frame that crashes, just to make it smaller. Make sure you don't cut too much and that you report the correct call number for the file you upload.)
---

Optionally, you can also start `qapitrace` with `R600_DEBUG=nosb` and look up the crashing draw call. That lets you see the exact vertex and fragment (pixel) shaders that cause the problem (in GLSL form).

If you have Mesa3D compiled with debug support, you can try to get the disassembled output of the shaders.

This is done by setting `R600_DEBUG=sbdry,ps,vs` and then looking up the draw call that crashes.

The "sbdry" option allows the "shader backend" to run, but prevents the use of its result. This lets us print the crashing code without using it and crashing the system.

The "ps,vs" options dump the disassembly of the pixel and vertex shaders, before and after the "shader backend" (the one disabled by nosb).

Since the binary shader code is compiled and optimized by SB at first use, the last vertex and fragment (pixel) shaders are the ones that cause the hang.

These steps are optional, because developers can easily do them on their own and they already build Mesa with debug enabled.
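
For anyone who prefers to script this, here is a minimal sketch of launching a program with `R600_DEBUG` set. The flag names are the ones mentioned above; the helper name `r600_debug_env` and the script name in the usage comment are hypothetical.

```python
import os
import subprocess
import sys


def r600_debug_env(flags, base=None):
    """Return a copy of the environment with R600_DEBUG set to the given flags."""
    env = dict(os.environ if base is None else base)
    env["R600_DEBUG"] = ",".join(flags)
    return env


# Flag sets taken from the comment above.
DRY_RUN_DUMP = ["sbdry", "ps", "vs"]  # run SB, dump shaders, discard SB's result
NO_SB = ["nosb"]                      # skip the shader backend entirely

if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python r600_debug.py glretrace some.trace
    if len(sys.argv) > 1:
        subprocess.run(sys.argv[1:], env=r600_debug_env(DRY_RUN_DUMP))
```

Passing an explicit `env` leaves the parent shell untouched, so a later run without the flags does not inherit them by accident.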

---

There is a faster way to get the disassembly of the crashing shaders.
Set `R600_DEBUG=ps,vs` and run the game. The last shaders that are output should be the ones that cause the crash. You just have to make sure that the relevant output is not lost at the lockup. (A partial log could be quite misleading.)
---
By the way, I'm not a developer, just an advanced user.
Comment 12 Daniel T. 2016-07-13 12:06:11 UTC
Cheers for the thorough reply. Before I try your debugging tips I will wait for 12.1 (first bugfix release) to filter down to Arch Linux, in the hope that Mesa 12 already fixes some of these, so I'm not wasting my time chasing bugs that are already fixed. I'll update in a few weeks with new info. Thanks.
Comment 13 eydee 2016-07-16 01:50:12 UTC
Adding some info: I have very similar GPU crashes (including the kernel log) on a 6850 (BARTS) with any recent Mesa version (11.x, 12.x) up to the latest git.

The GPU lockup and crash happen in Bioshock Infinite, right after the intro logos and before reaching the title screen. It is not random: there is no way to start the game at all without touching the Mesa configuration. However, launching with R600_DEBUG=nosb makes the game start up perfectly. Everything works and there are no visual glitches; only performance is lower than expected, possibly because of the lack of shader optimization.

I can do an apitrace if needed, but I can't replay and analyze it, as apitrace can't replay with a forced OpenGL context. (BARTS only exposes 3.3, so overriding is required to start the game.)
Comment 14 i.kalvachev 2016-07-16 10:51:13 UTC
@Daniel T.
FYI, 12.1 would be the next release. The first bugfix release, 12.0.1, is already out (it fixes compilation with debugging).
The shader backend is very specific to r600 cards and very few people even touch it, so there is no way this bug gets fixed by accident.

@eydee,
Are you using environment variables for the override, or the drirc file?
`
export MESA_GL_VERSION_OVERRIDE=4.1
export MESA_GLSL_VERSION_OVERRIDE=410
`
should set override for any programs started from this bash session onward.

If you prefer the drirc override, keep in mind that `glretrace` is the program that does the actual playback/replay.

If you upload a trace I could try to narrow it down, but my card also needs the override.
Comment 15 Daniel T. 2016-07-16 13:53:55 UTC
Okay, I've updated to 12.0.1 and managed to get a trace of Octodad: Dadliest Catch. (It was a good starter because it consistently crashes before the main menu.)

Here's the trace
https://drive.google.com/open?id=0Bzmfxv--_ou4NHIwU1ZNeURrRnc

In qapitrace the first call that causes a lockup is Frame 90; Call 30070

Various good calls:
26916
27306
28899
29045
29249
29980
30013
30055
30065
30067
30069 <-- last good

Various bad calls:
34530
31749
30200
30089
30075
30070 <-- first bad
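
As a quick sanity check on the lists above, the boundary can be computed mechanically. This is just a sketch over the call numbers reported in this comment; the function name `boundary` is hypothetical.

```python
# Call numbers copied from the lists above.
good_calls = [26916, 27306, 28899, 29045, 29249, 29980,
              30013, 30055, 30065, 30067, 30069]
bad_calls = [34530, 31749, 30200, 30089, 30075, 30070]


def boundary(good, bad):
    """Return (last_good, first_bad, is_adjacent) for two lists of call numbers."""
    last_good, first_bad = max(good), min(bad)
    if last_good >= first_bad:
        # A good call after a bad one would contradict "replay hangs from here on".
        raise ValueError("good and bad calls overlap; recheck the lookups")
    return last_good, first_bad, first_bad - last_good == 1


if __name__ == "__main__":
    print(boundary(good_calls, bad_calls))
```

When the two numbers are adjacent, as here, no further lookups are needed to pin down the first failing call.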
Comment 16 Daniel T. 2016-07-16 16:20:22 UTC
Created attachment 125108
Octodad call 30070 fragment shader dump
Comment 17 Daniel T. 2016-07-16 16:20:41 UTC
Created attachment 125109
Octodad call 30070 vertex shader dump
Comment 18 i.kalvachev 2016-07-17 21:36:00 UTC
Created attachment 125121
Octodad crashing pixel shader - TGSI, disassembly of normal and SB optimised versions

I used SB_DSKIP_* variables to narrow down the exact shader that causes the crash.

The shader is big and complicated. It does contain loops, breaks, if/then/else statements, jumps. 

Just a note. `glretrace` outputs 2-3 more shaders after the dump point, so it won't always be the last shader. Still the correct 2 shaders are output right before the dump.
Comment 19 i.kalvachev 2016-07-18 09:04:40 UTC
Created attachment 125130
Octodad crashing pixel shader - TGSI, disassembly, sbdump of IR code and crash backtrace

I've discovered a few more things.

1. When retracing with Mesa3D debug enabled, the shader backend's internal checker runs and does detect errors (and crashes at a failed assert). It seems to error out at shader 70 (a draw command before the one that hangs). I've attached a log that includes the output of `sbdump` too.

2. `R600_DEBUG=sbsafemath` can also be used to work around the above crash and the original lockup. (Please test whether this works with the other titles too.)

Looking at the source, the 'safe math' option skips a few optimizations. The one that seems to cause the problem is the call to `fold_assoc()` in `expr_handler::fold_alu_op2()`, somewhere around `sb_expr.cpp:740`.
Comment 20 Daniel T. 2016-07-18 10:26:11 UTC
I can confirm that setting R600_DEBUG=sbsafemath allows at least Octodad to be launched and played without problems.

I currently do not have the other games installed, but I will start the downloads and test ASAP.
Comment 21 Heiko 2016-07-23 08:43:33 UTC
Created attachment 125271
Simple workaround

'Fixes' the Octodad trace for me. It stops sb from using fold_assoc to fold ADD_INT(ADD_INT(x, 1), 2) for scalar registers.

The problem is that sb currently optimizes away the loop counter node: it optimizes the counter's reference out of another node, marks the counter as unused/dead, and finally removes the counter increment as well. The loop break condition is therefore never met, the GPU hangs in an endless loop, and it finally gets reset.
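
To make the failure mode above concrete, here is a toy model of it. It is not Mesa's sb code: the tuple-based IR, the function names, and the way the loop-carried value is tracked are all invented for illustration. It only shows how folding ADD(ADD(i, 1), 2) into ADD(i, 3) can leave the counter increment without an explicit reader, so that a dead-code pass which ignores the loop-carried use removes it.

```python
def fold_assoc(body):
    """Fold add(add(x, a), b) into add(x, a + b). Instructions: (dest, op, src, const)."""
    defs = {dest: (op, src, const) for dest, op, src, const in body}
    out = []
    for dest, op, src, const in body:
        if op == "add" and src in defs and defs[src][0] == "add":
            _, inner_src, inner_const = defs[src]
            out.append((dest, "add", inner_src, inner_const + const))
        else:
            out.append((dest, op, src, const))
    return out


def dce(body, roots):
    """Keep only instructions whose result is (transitively) needed by roots."""
    live, kept = set(roots), []
    for dest, op, src, const in reversed(body):
        if dest in live:
            kept.append((dest, op, src, const))
            live.add(src)
    return kept[::-1]


def run_loop(body, limit, max_iters=1000):
    """Run `while i < limit`, with i taking the value of i_next after each pass.

    Returns the iteration count, or None if the cap is hit (an 'endless' loop).
    """
    env = {"i": 0}
    for n in range(max_iters):
        if env["i"] >= limit:
            return n
        for dest, _op, src, const in body:
            env[dest] = env[src] + const
        env["i"] = env.get("i_next", env["i"])  # counter stalls if i_next is gone
    return None


body = [("i_next", "add", "i", 1),   # loop counter increment
        ("t", "add", "i_next", 2)]   # the only explicit reader of i_next
folded = fold_assoc(body)            # t now reads i directly: add(i, 3)

buggy = dce(folded, roots={"t"})             # ignores the loop-carried use
fixed = dce(folded, roots={"t", "i_next"})   # treats the counter as live

if __name__ == "__main__":
    print(run_loop(buggy, 4), run_loop(fixed, 4))
```

In this toy, the fold itself is harmless; the hang only appears once liveness is computed without the loop-carried use.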

If Mesa is compiled in debug mode, sb checks the shader it would emit and fails assertions about unset registers, which are exactly the loop counters of the three loops in the Octodad case (shader 70).

On a side note, this scenario might be already known for phi nodes, as the comment in sb_expr.cpp:expr_handler::fold() mentions similar issues...
Comment 22 Alex 2016-09-02 08:11:51 UTC
It seems I'm experiencing the same kind of crash, but with an HD7950 (usually in the game 'Cities: Skylines').

I'm using Kubuntu 16.04 with the padoka PPA.
> uname -a
Linux AciD 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

> lspci -v
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti PRO [Radeon HD 7950/8950 OEM / R9 280] (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited / Sapphire Technology Tahiti PRO [Radeon HD 7950/8950 OEM / R9 280]
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f4300000 (64-bit, non-prefetchable) [size=256K]
        I/O ports at a000 [size=256]
        Expansion ROM at f4340000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: radeon
        Kernel modules: radeon


The crash info:
> dmesg
[87732.677658] radeon 0000:09:00.0: ring 3 stalled for more than 10000msec                                                                    
[87732.677664] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028462e on ring 3)               
[87732.993670] radeon 0000:09:00.0: ring 0 stalled for more than 10000msec                                                                    
[87732.993676] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc701 on ring 0)               
[87733.177659] radeon 0000:09:00.0: ring 3 stalled for more than 10500msec                                                                    
[87733.177663] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284634 on ring 3)               
[87733.493683] radeon 0000:09:00.0: ring 0 stalled for more than 10500msec
[87733.493689] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc701 on ring 0)               
[87733.677671] radeon 0000:09:00.0: ring 3 stalled for more than 11000msec                                                                    
[87733.677678] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284634 on ring 3)               
[87733.993674] radeon 0000:09:00.0: ring 0 stalled for more than 11000msec                                                                    
[87733.993680] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc702 on ring 0)               
[87734.177683] radeon 0000:09:00.0: ring 3 stalled for more than 11500msec                                                                    
[87734.177689] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028463a on ring 3)               
[87734.493685] radeon 0000:09:00.0: ring 0 stalled for more than 11500msec                                                                    
[87734.493691] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc702 on ring 0)               
[87734.677675] radeon 0000:09:00.0: ring 3 stalled for more than 12000msec                                                                    
[87734.677681] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028463a on ring 3)               
[87734.993694] radeon 0000:09:00.0: ring 0 stalled for more than 12000msec                                                                    
[87734.993698] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc703 on ring 0)               
[87735.177730] radeon 0000:09:00.0: ring 3 stalled for more than 12500msec                                                                    
[87735.177736] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284640 on ring 3)               
[87735.493705] radeon 0000:09:00.0: ring 0 stalled for more than 12500msec                                                                    
[87735.493711] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc703 on ring 0)               
[87735.677711] radeon 0000:09:00.0: ring 3 stalled for more than 13000msec                                                                    
[87735.677718] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284640 on ring 3)               
[87735.993724] radeon 0000:09:00.0: ring 0 stalled for more than 13000msec                                                                    
[87735.993731] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc704 on ring 0)               
[87736.177724] radeon 0000:09:00.0: ring 3 stalled for more than 13500msec                                                                    
[87736.177730] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284648 on ring 3)               
[87736.493725] radeon 0000:09:00.0: ring 0 stalled for more than 13500msec                                                                    
[87736.493732] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc705 on ring 0)               
[87736.677765] radeon 0000:09:00.0: ring 3 stalled for more than 14000msec                                                                    
[87736.677771] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284648 on ring 3)               
[87736.993758] radeon 0000:09:00.0: ring 0 stalled for more than 14000msec                                                                    
[87736.993765] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc706 on ring 0)               
[87737.177766] radeon 0000:09:00.0: ring 3 stalled for more than 14500msec                                                                    
[87737.177772] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284650 on ring 3)               
[87737.493744] radeon 0000:09:00.0: ring 0 stalled for more than 14500msec                                                                    
[87737.493750] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc706 on ring 0)               
[87737.677764] radeon 0000:09:00.0: ring 3 stalled for more than 15000msec                                                                    
[87737.677769] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284650 on ring 3)               
[87737.993777] radeon 0000:09:00.0: ring 0 stalled for more than 15000msec                                                                    
[87737.993783] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc708 on ring 0)               
[87738.177787] radeon 0000:09:00.0: ring 3 stalled for more than 15500msec                                                                    
[87738.177793] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284658 on ring 3)               
[87738.493795] radeon 0000:09:00.0: ring 0 stalled for more than 15500msec                                                                    
[87738.493802] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc708 on ring 0)               
[87738.677795] radeon 0000:09:00.0: ring 3 stalled for more than 16000msec                                                                    
[87738.677801] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284658 on ring 3)               
[87738.993763] radeon 0000:09:00.0: ring 0 stalled for more than 16000msec                                                                    
[87738.993768] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc709 on ring 0)               
[87739.177765] radeon 0000:09:00.0: ring 3 stalled for more than 16500msec                                                                    
[87739.177769] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284674 on ring 3)               
[87739.493788] radeon 0000:09:00.0: ring 0 stalled for more than 16500msec                                                                    
[87739.493791] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc70a on ring 0)               
[87739.677773] radeon 0000:09:00.0: ring 3 stalled for more than 17000msec                                                                    
[87739.677778] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028468f on ring 3)               
[87739.993801] radeon 0000:09:00.0: ring 0 stalled for more than 17000msec                                                                    
[87739.993807] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc711 on ring 0)               
[87740.177811] radeon 0000:09:00.0: ring 3 stalled for more than 17500msec                                                                    
[87740.177817] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028469a on ring 3)               
[87740.493825] radeon 0000:09:00.0: ring 0 stalled for more than 17500msec                                                                    
[87740.493831] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc711 on ring 0)               
[87740.677822] radeon 0000:09:00.0: ring 3 stalled for more than 18000msec                                                                    
[87740.677827] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028469a on ring 3)               
[87740.993842] radeon 0000:09:00.0: ring 0 stalled for more than 18000msec                                                                    
[87740.993848] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc715 on ring 0)               
[87741.177814] radeon 0000:09:00.0: ring 3 stalled for more than 18500msec                                                                    
[87741.177820] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846af on ring 3)               
[87741.493861] radeon 0000:09:00.0: ring 0 stalled for more than 18500msec                                                                    
[87741.493867] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc717 on ring 0)               
[87741.677838] radeon 0000:09:00.0: ring 3 stalled for more than 19000msec                                                                    
[87741.677844] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846b0 on ring 3)               
[87741.993860] radeon 0000:09:00.0: ring 0 stalled for more than 19000msec 
[87741.993867] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc718 on ring 0)               
[87742.177870] radeon 0000:09:00.0: ring 3 stalled for more than 19500msec                                                                    
[87742.177876] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846d7 on ring 3)               
[87742.493871] radeon 0000:09:00.0: ring 0 stalled for more than 19500msec                                                                    
[87742.493877] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc718 on ring 0)               
[87742.681866] radeon 0000:09:00.0: ring 3 stalled for more than 20004msec                                                                    
[87742.681872] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846d7 on ring 3)               
[87742.993882] radeon 0000:09:00.0: ring 0 stalled for more than 20000msec                                                                    
[87742.993889] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc719 on ring 0)               
[87743.181878] radeon 0000:09:00.0: ring 3 stalled for more than 20504msec                                                                    
[87743.181884] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846dd on ring 3)               
[87743.493894] radeon 0000:09:00.0: ring 0 stalled for more than 20500msec                                                                    
[87743.493900] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc719 on ring 0)               
[87743.681905] radeon 0000:09:00.0: ring 3 stalled for more than 21004msec                                                                    
[87743.681911] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846e3 on ring 3)               
[87743.993914] radeon 0000:09:00.0: ring 0 stalled for more than 21000msec                                                                    
[87743.993920] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc71c on ring 0)               
[87744.181912] radeon 0000:09:00.0: ring 3 stalled for more than 21504msec                                                                    
[87744.181919] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846e6 on ring 3)               
[87744.493911] radeon 0000:09:00.0: ring 0 stalled for more than 21500msec                                                                    
[87744.493917] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc71c on ring 0)               
[87744.681920] radeon 0000:09:00.0: ring 3 stalled for more than 22004msec                                                                    
[87744.681926] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846e6 on ring 3)               
[87744.993931] radeon 0000:09:00.0: ring 0 stalled for more than 22000msec                                                                    
[87744.993938] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc71d on ring 0)               
[87745.181931] radeon 0000:09:00.0: ring 3 stalled for more than 22504msec                                                                    
[87745.181937] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846ef on ring 3)               
[87745.493937] radeon 0000:09:00.0: ring 0 stalled for more than 22500msec                                                                    
[87745.493944] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc71d on ring 0)               
[87745.681932] radeon 0000:09:00.0: ring 3 stalled for more than 23004msec                                                                    
[87745.681938] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846ef on ring 3)               
[87745.993924] radeon 0000:09:00.0: ring 0 stalled for more than 23000msec                                                                    
[87745.993930] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc71e on ring 0)               
[87746.181963] radeon 0000:09:00.0: ring 3 stalled for more than 23504msec                                                                    
[87746.181968] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846f7 on ring 3)               
[87746.493955] radeon 0000:09:00.0: ring 0 stalled for more than 23500msec                                                                    
[87746.493960] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc71f on ring 0)               
[87746.681961] radeon 0000:09:00.0: ring 3 stalled for more than 24004msec                                                                    
[87746.681966] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846f7 on ring 3)               
[87746.993979] radeon 0000:09:00.0: ring 0 stalled for more than 24000msec                                                                    
[87746.993985] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc720 on ring 0)               
[87747.181960] radeon 0000:09:00.0: ring 3 stalled for more than 24504msec                                                                    
[87747.181966] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846ff on ring 3)               
[87747.493975] radeon 0000:09:00.0: ring 0 stalled for more than 24500msec                                                                    
[87747.493980] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc720 on ring 0)               
[87747.681981] radeon 0000:09:00.0: ring 3 stalled for more than 25004msec                                                                    
[87747.681987] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x00000000002846ff on ring 3)               
[87747.994010] radeon 0000:09:00.0: ring 0 stalled for more than 25000msec                                                                    
[87747.994016] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc722 on ring 0)               
[87748.181999] radeon 0000:09:00.0: ring 3 stalled for more than 25504msec                                                                    
[87748.182004] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284707 on ring 3)               
[87748.494031] radeon 0000:09:00.0: ring 0 stalled for more than 25500msec                                                                    
[87748.494038] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc722 on ring 0)               
[87748.682026] radeon 0000:09:00.0: ring 3 stalled for more than 26004msec                                                                    
[87748.682032] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284707 on ring 3)               
[87748.994006] radeon 0000:09:00.0: ring 0 stalled for more than 26000msec                                                                    
[87748.994013] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc723 on ring 0)               
[87749.182040] radeon 0000:09:00.0: ring 3 stalled for more than 26504msec                                                                    
[87749.182047] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028470f on ring 3)               
[87749.494026] radeon 0000:09:00.0: ring 0 stalled for more than 26500msec                                                                    
[87749.494033] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc723 on ring 0)               
[87749.682021] radeon 0000:09:00.0: ring 3 stalled for more than 27004msec                                                                    
[87749.682027] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028470f on ring 3)               
[87749.994035] radeon 0000:09:00.0: ring 0 stalled for more than 27000msec                                                                    
[87749.994041] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc725 on ring 0)               
[87750.182041] radeon 0000:09:00.0: ring 3 stalled for more than 27504msec                                                                    
[87750.182047] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284718 on ring 3)               
[87750.494030] radeon 0000:09:00.0: ring 0 stalled for more than 27500msec
[87750.494036] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc725 on ring 0)               
[87750.682064] radeon 0000:09:00.0: ring 3 stalled for more than 28004msec                                                                    
[87750.682071] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284718 on ring 3)               
[87750.994068] radeon 0000:09:00.0: ring 0 stalled for more than 28000msec                                                                    
[87750.994075] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc726 on ring 0)               
[87751.182063] radeon 0000:09:00.0: ring 3 stalled for more than 28504msec                                                                    
[87751.182070] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284723 on ring 3)               
[87751.494088] radeon 0000:09:00.0: ring 0 stalled for more than 28500msec                                                                    
[87751.494094] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc727 on ring 0)               
[87751.682091] radeon 0000:09:00.0: ring 3 stalled for more than 29004msec                                                                    
[87751.682098] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284723 on ring 3)               
[87751.994105] radeon 0000:09:00.0: ring 0 stalled for more than 29000msec                                                                    
[87751.994111] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc728 on ring 0)               
[87752.182082] radeon 0000:09:00.0: ring 3 stalled for more than 29504msec                                                                    
[87752.182088] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028472b on ring 3)               
[87752.494097] radeon 0000:09:00.0: ring 0 stalled for more than 29500msec                                                                    
[87752.494103] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc728 on ring 0)               
[87753.445531] BUG: unable to handle kernel paging request at ffffc90401cc0ffc                                                                
[87753.445561] IP: [<ffffffffc021279a>] radeon_ring_backup+0xda/0x190 [radeon]                                                                
[87753.445601] PGD 40e099067 PUD 0                                                                                                            
[87753.445614] Oops: 0000 [#1] SMP                                                                                                            
[87753.445626] Modules linked in: nls_utf8 btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) snd_hrtimer vboxdrv(OE) binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw joydev input_leds snd_usb_audio snd_usbmidi_lib arc4 ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 snd_ctxfi lpc_ich snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mei_me mei snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd tpm_infineon soundcore shpchp mac_hid it87 hwmon_vid coretemp parport_pc ppdev lp parport
[87753.445903]  autofs4 uas usb_storage wacom hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid mxm_wmi amdkfd amd_iommu_v2 psmouse atl1c radeon firewire_ohci firewire_core crc_itu_t i2c_algo_bit ttm drm_kms_helper ahci libahci syscopyarea sysfillrect e1000e sysimgblt fb_sys_fops drm ptp pps_core video wmi fjes
[87753.446016] CPU: 1 PID: 8913 Comm: Cities.x64 Tainted: G        W  OE   4.4.0-36-generic #55-Ubuntu                                        
[87753.446041] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-UD5H, BIOS F4 03/09/2012                              
[87753.446069] task: ffff8803910aee00 ti: ffff8800234c0000 task.ti: ffff8800234c0000                                                          
[87753.446090] RIP: 0010:[<ffffffffc021279a>]  [<ffffffffc021279a>] radeon_ring_backup+0xda/0x190 [radeon]                                    
[87753.446128] RSP: 0018:ffff8800234c3c48  EFLAGS: 00010202                                                                                   
[87753.446143] RAX: ffffc9000576f000 RBX: 00000000ffffffff RCX: 0000000000000000                                                              
[87753.446163] RDX: 0000000000000000 RSI: ffffc90401cc0ffc RDI: 00000000000c2e40                                                              
[87753.446183] RBP: ffff8800234c3c78 R08: ffff8801520284c0 R09: 8000000000000163                                                              
[87753.446202] R10: 0000000000000000 R11: ffffffff81ccf5ea R12: ffff88040637d4e0                                                              
[87753.446222] R13: ffff88040637d4b8 R14: 0000000000030b91 R15: ffff8800234c3cc0                                                              
[87753.446242] FS:  00007fc374400740(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000                                                   
[87753.446264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                              
[87753.446280] CR2: ffffc90401cc0ffc CR3: 00000000d95ba000 CR4: 00000000000406e0                                                              
[87753.446300] Stack:                                                                                                                         
[87753.446306]  ffff88040637c000 ffff88040637c000 ffff88040637d4e0 ffff8800234c3cc0                                                           
[87753.446330]  ffff88040637d4e0 0000000000000000 ffff8800234c3d30 ffffffffc01e0cdd                                                           
[87753.446354]  ffff88040637c740 00ff880300000001 ffff88040637c018 ffff88033e03f6c0                                                           
[87753.446379] Call Trace:                                                                                                                    
[87753.446395]  [<ffffffffc01e0cdd>] radeon_gpu_reset+0xcd/0x330 [radeon]                                                                     
[87753.446416]  [<ffffffff815ac18d>] ? fence_wait_timeout+0x7d/0x160                                                                          
[87753.446443]  [<ffffffffc021051e>] radeon_gem_handle_lockup.part.3+0xe/0x20 [radeon]                                                        
[87753.446475]  [<ffffffffc02114bf>] radeon_gem_wait_idle_ioctl+0xdf/0x130 [radeon]                                                           
[87753.446504]  [<ffffffffc003b742>] drm_ioctl+0x152/0x540 [drm]                                                                              
[87753.446531]  [<ffffffffc02113e0>] ? radeon_gem_busy_ioctl+0xe0/0xe0 [radeon]                                                               
[87753.446558]  [<ffffffffc01de04c>] radeon_drm_ioctl+0x4c/0x80 [radeon]                                                                      
[87753.446577]  [<ffffffff81220c1f>] do_vfs_ioctl+0x29f/0x490                                                                                 
[87753.446594]  [<ffffffff81707b40>] ? __sys_recvmsg+0x80/0x90                                                                                
[87753.446610]  [<ffffffff81220e89>] SyS_ioctl+0x79/0x90                                                                                      
[87753.446625]  [<ffffffff8182dfb2>] entry_SYSCALL_64_fastpath+0x16/0x71                                                                      
[87753.446643] Code: fd c0 48 85 c0 49 89 07 74 6c 41 8d 7e ff 31 d2 48 c1 e7 02 eb 07 49 8b 07 48 83 c2 04 49 8b 74 24 08 8d 4b 01 89 db 48 8d 34 9e <8b> 36 89 34 10 41 23 4c 24 54 48 39 d7 89 cb 75 da 4c 89 ef e8
[87753.446763] RIP  [<ffffffffc021279a>] radeon_ring_backup+0xda/0x190 [radeon]                                                               
[87753.446794]  RSP <ffff8800234c3c48>                                                                                                        
[87753.446805] CR2: ffffc90401cc0ffc                                                                                                          
[87753.459544] ---[ end trace ded18fbae638a95c ]---
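
For anyone comparing logs like the one above, here is a small hedged Python sketch that groups the "GPU lockup" lines by ring. The function name summarize_lockups and the sample string are made up for illustration; the regular expression only assumes the message format visible in this dmesg output. My reading, which is an assumption about the radeon messages rather than something stated in this report, is that a "current fence id" that stays fixed while the "last fence id" keeps growing means the ring has stopped completing work while new work is still being submitted.

```python
import re

# Matches the lockup lines as they appear in the dmesg output above.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-f]+) "
    r"last fence id (0x[0-9a-f]+) on ring (\d+)\)"
)


def summarize_lockups(log_text):
    """Group fence ids by ring: {ring: {"current": set, "last": list}}."""
    summary = {}
    for match in LOCKUP_RE.finditer(log_text):
        current = int(match[1], 16)
        last = int(match[2], 16)
        ring = int(match[3])
        entry = summary.setdefault(ring, {"current": set(), "last": []})
        entry["current"].add(current)
        entry["last"].append(last)
    return summary


# Three lines copied from the log above, used as sample input.
sample = """\
[87732.677664] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x000000000028462e on ring 3)
[87732.993676] radeon 0000:09:00.0: GPU lockup (current fence id 0x00000000005cc6d9 last fence id 0x00000000005cc701 on ring 0)
[87733.177663] radeon 0000:09:00.0: GPU lockup (current fence id 0x000000000028455c last fence id 0x0000000000284634 on ring 3)
"""

summary = summarize_lockups(sample)
for ring, entry in sorted(summary.items()):
    stuck = len(entry["current"]) == 1  # the same fence is reported every time
    print(f"ring {ring}: current fence fixed={stuck}, "
          f"last fence {min(entry['last']):#x}..{max(entry['last']):#x}")
```

Feeding it the full `dmesg` text instead of the sample should give one line per ring.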
Comment 23 Alex 2016-09-09 04:52:41 UTC
This also happens 100% of the time under Faeria and Rocket League.

The kernel BUG that appeared during a Rocket League session:
[36447.794626] BUG: unable to handle kernel paging request at ffffc90401d40ffc
[36447.794656] IP: [<ffffffffc023c79a>] radeon_ring_backup+0xda/0x190 [radeon]
[36447.794697] PGD 40e099067 PUD 0
[36447.794710] Oops: 0000 [#1] SMP
[36447.794723] Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) snd_hrtimer vboxdrv(OE) binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw hid_logitech_hidpp lpc_ich joydev input_leds snd_hda_codec_realtek arc4 snd_hda_codec_hdmi snd_hda_codec_generic ath9k ath9k_common snd_usb_audio ath9k_hw snd_usbmidi_lib ath mac80211 cfg80211 snd_hda_intel snd_hda_codec snd_ctxfi snd_hda_core snd_hwdep snd_seq_midi snd_seq_midi_event snd_pcm snd_rawmidi snd_seq snd_seq_device snd_timer snd mei_me mei soundcore tpm_infineon shpchp mac_hid it87 hwmon_vid coretemp parport_pc ppdev lp parport autofs4 hid_logitech_dj uas usb_storage wacom hid_generic usbhid
[36447.794999]  hid mxm_wmi amdkfd amd_iommu_v2 radeon psmouse firewire_ohci firewire_core crc_itu_t atl1c i2c_algo_bit ttm ahci drm_kms_helper libahci syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm ptp pps_core video fjes wmi
[36447.795083] CPU: 2 PID: 3667 Comm: Xorg Tainted: G           OE   4.4.0-36-generic #55-Ubuntu
[36447.795109] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-UD5H, BIOS F4 03/09/2012
[36447.795137] task: ffff8804072c5280 ti: ffff8800d9ed0000 task.ti: ffff8800d9ed0000
[36447.795158] RIP: 0010:[<ffffffffc023c79a>]  [<ffffffffc023c79a>] radeon_ring_backup+0xda/0x190 [radeon]
[36447.795197] RSP: 0000:ffff8800d9ed3c48  EFLAGS: 00010202
[36447.795213] RAX: ffffc90002313000 RBX: 00000000ffffffff RCX: 0000000000000000
[36447.795233] RDX: 0000000000000000 RSI: ffffc90401d40ffc RDI: 0000000000096440
[36447.795253] RBP: ffff8800d9ed3c78 R08: ffff88010c011e00 R09: 8000000000000163
[36447.795274] R10: 0000000000000000 R11: ffffffff81ccf5ea R12: ffff8800358494e0
[36447.795294] R13: ffff8800358494b8 R14: 0000000000025911 R15: ffff8800d9ed3cc0
[36447.795314] FS:  00007fc9397a1a00(0000) GS:ffff88041ed00000(0000) knlGS:0000000000000000
[36447.795337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[36447.795354] CR2: ffffc90401d40ffc CR3: 000000040a177000 CR4: 00000000000406e0
[36447.795374] Stack:
[36447.795380]  ffff880035848000 ffff880035848000 ffff8800358494e0 ffff8800d9ed3cc0
[36447.795406]  ffff8800358494e0 0000000000000000 ffff8800d9ed3d30 ffffffffc020acdd
[36447.795430]  ffff880035848740 00ff880400000001 ffff880035848018 ffff8803216e3000
[36447.795455] Call Trace:
[36447.795472]  [<ffffffffc020acdd>] radeon_gpu_reset+0xcd/0x330 [radeon]
[36447.795492]  [<ffffffff815ac18d>] ? fence_wait_timeout+0x7d/0x160
[36447.795521]  [<ffffffffc023a51e>] radeon_gem_handle_lockup.part.3+0xe/0x20 [radeon]
[36447.795553]  [<ffffffffc023b4bf>] radeon_gem_wait_idle_ioctl+0xdf/0x130 [radeon]
[36447.795584]  [<ffffffffc003b742>] drm_ioctl+0x152/0x540 [drm]
[36447.795611]  [<ffffffffc023b3e0>] ? radeon_gem_busy_ioctl+0xe0/0xe0 [radeon]
[36447.795639]  [<ffffffffc020804c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[36447.795659]  [<ffffffff81220c1f>] do_vfs_ioctl+0x29f/0x490
[36447.795677]  [<ffffffff81103471>] ? SyS_futex+0x81/0x180
[36447.795693]  [<ffffffff81220e89>] SyS_ioctl+0x79/0x90
[36447.795709]  [<ffffffff8182dfb2>] entry_SYSCALL_64_fastpath+0x16/0x71
[36447.795727] Code: fb c0 48 85 c0 49 89 07 74 6c 41 8d 7e ff 31 d2 48 c1 e7 02 eb 07 49 8b 07 48 83 c2 04 49 8b 74 24 08 8d 4b 01 89 db 48 8d 34 9e <8b> 36 89 34 10 41 23 4c 24 54 48 39 d7 89 cb 75 da 4c 89 ef e8
[36447.795850] RIP  [<ffffffffc023c79a>] radeon_ring_backup+0xda/0x190 [radeon]
[36447.795882]  RSP <ffff8800d9ed3c48>
[36447.795893] CR2: ffffc90401d40ffc
[36447.809449] ---[ end trace ffefc2413ee46759 ]---
Comment 24 Marek Olšák 2016-09-17 20:23:41 UTC
How about we make R600_DEBUG=sbsafemath be the default?
Comment 25 russianneuromancer 2016-09-18 15:37:01 UTC
But how will this affect performance in other games that don't freeze? Is there no other way to fix this?
Comment 26 Gregor Münch 2016-09-22 14:59:08 UTC
(In reply to Marek Olšák from comment #24)
> How about we make R600_DEBUG=sbsafemath be the default?

If there are no resources inside AMD to fix those bugs, this will be the best option. Those stalls are way more annoying than losing some fps.

At least this should be the default in stable releases.
Comment 27 Heiko 2016-09-23 07:35:08 UTC
Well, tiling and hyz were disabled by default as well, until the bugs were ironed out. So I'd go with overall stability and use sbsafemath as the default for now (or just disable the calls to fold_assoc()).

And you're still able to revert with R600_DEBUG settings if you need that particular performance boost. Are there benchmark results that show the gains for sb[no]safemath?
Comment 28 Heiko 2016-09-28 17:26:26 UTC
Created attachment 126832 [details] [review]
Possible fix for the lockups

The more I look at the sb code the more I dislike it :/ Anyhow, it looks like the GCM (global code motion) pass is b0rked and doesn't like unused ops at all.

The problem with that octodad trace is that after a pass through fold_assoc() an ADD_INT op becomes unused, but isn't removed prior to GCM. GCM then moves it up to the front of the shader (because it has no users), where the op's src values aren't defined yet (in this particular case, the loop counter variable). GCM also moves ops up if the usage count isn't fulfilled yet. That's when things get really broken, since it seems to move the loop counter -- or at least its initializer -- to fulfill the usage count. The GPU then finally locks up on the shader (or, if mesa is compiled in debug mode, sb reports unset registers), probably because it loops endlessly.

I tried to fix GCM, but every time I thought I'd done the right thing, I got either unscheduled ops or wrong levels for the basic blocks. Also, the DONT_HOIST stuff doesn't really seem to work that well either.

So I decided to fix the input fed into the GCM pass instead, by iteratively removing all unused ops in dce_cleanup. This may also reduce the instruction count, since some unused instructions weren't actually removed before. I also optimized valtable's use_count(), which gets called 1500+ times for the octodad trace and iterated over the whole use_info list every time...
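
As a rough illustration of the idea (not the actual sb code, which is C++), the iterative cleanup can be sketched in Python. Everything here is hypothetical: ops are modelled as a dict from destination name to source names, and `remove_unused_ops` and `live_outputs` are made-up names. The sketch counts users once up front rather than rescanning a use list per query, and removing one op may make its sources unused in turn, so it keeps going until nothing is left to remove.

```python
from collections import Counter

def remove_unused_ops(ops, live_outputs):
    """Drop ops whose result has no users, repeating until a fixpoint.

    ops: dict mapping a destination name to the list of its source names
         (hypothetical stand-in for shader IR).
    live_outputs: names that must survive even without users.
    """
    ops = dict(ops)  # work on a copy
    # Count the users of every value once, instead of per lookup.
    use_count = Counter(src for srcs in ops.values() for src in srcs)
    for name in live_outputs:
        use_count[name] += 1

    worklist = [dest for dest in ops if use_count[dest] == 0]
    while worklist:
        dead = worklist.pop()
        for src in ops.pop(dead):
            use_count[src] -= 1
            # The source just lost its last user, so it is dead as well.
            if use_count[src] == 0 and src in ops:
                worklist.append(src)
    return ops

# t1 is unused; once it is gone, t0 has no users left either.
shader = {"i": [], "t0": ["i", "c1"], "t1": ["t0", "c2"], "out": ["i"]}
print(sorted(remove_unused_ops(shader, ["out"])))
```

A single non-iterative pass over this input would only drop t1 and leave t0 behind as a new unused op, which is the kind of leftover described above.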

The (untidied) patch fixes octodad for me. It would be nice if someone could test the other problematic games (be sure to test with a debug build, to get an exception rather than a lockup). If it works there as well, I'll clean things up.

@Marek, what's the best/usual way to test for performance/instruction count in mesa changes? I noticed those 'helped'/'hurt'/'+-%' infos and some runtime numbers in commit messages, but I don't know how they are produced :/
Comment 29 Marek Olšák 2016-09-28 17:40:46 UTC
(In reply to Heiko from comment #28)
> @Marek, what's the best/usual way to test for performance/instruction count
> in mesa changes. I noticed those 'helped'/'hurt'/'+-%' infos and some
> runtime numbers in commit messages, but I don't know how they are produced :/

There is no way to get stats for r600g easily. Intel and radeonsi developers use their private shader-db repositories and generate their own stats. Intel have "helped/hurt +-%" stats, while we have much more detailed stats for radeonsi.
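
For readers unfamiliar with that kind of summary, here is a minimal Python sketch of how a "helped/hurt +-%" line could be derived from per-shader instruction counts taken before and after a change. This is an assumption about the general idea only, not the actual shader-db tooling; `helped_hurt` and the sample shader names are hypothetical.

```python
def helped_hurt(before, after):
    """Summarise instruction-count changes between two compiler runs.

    before/after: dict mapping a shader name to its instruction count
    (hypothetical input format). Only shaders present in both are compared.
    Returns (helped, hurt, percent_change_of_total).
    """
    common = before.keys() & after.keys()
    helped = sum(1 for name in common if after[name] < before[name])
    hurt = sum(1 for name in common if after[name] > before[name])
    total_before = sum(before[name] for name in common)
    total_after = sum(after[name] for name in common)
    if total_before == 0:
        return helped, hurt, 0.0
    change = 100.0 * (total_after - total_before) / total_before
    return helped, hurt, change

before = {"fs_a": 100, "fs_b": 50, "vs_c": 50}
after = {"fs_a": 90, "fs_b": 55, "vs_c": 50}
helped, hurt, change = helped_hurt(before, after)
print(f"helped: {helped}, hurt: {hurt}, total: {change:+.1f}%")
```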
Comment 30 Frederic Romagne 2016-10-02 10:07:20 UTC
I can confirm it removes the lockup in Octodad, however it breaks the rendering in Rocket League... Recompiling mesa without the patch and Rocket League renders properly again...
Comment 31 Frederic Romagne 2016-10-02 10:13:22 UTC
(In reply to Frederic Romagne from comment #30)
> I can confirm it removes the lockup in Octodad, however it breaks the
> rendering in Rocket League... Recompiling mesa without the patch and Rocket
> League renders properly again...

It was a reply to comment 28,

I had a bunch of 

oct. 02 10:06:22 kiss-desktop kernel: [drm:evergreen_packet3_check.isra.14 [radeon]] *ERROR* bad EVENT_WRITE
oct. 02 10:06:22 kiss-desktop kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
oct. 02 10:06:22 kiss-desktop steam.desktop[1150]: radeon: The kernel rejected CS, see dmesg for more information (-22).
oct. 02 10:06:22 kiss-desktop kernel: [drm:radeon_cs_packet_next_reloc [radeon]] *ERROR* No packet3 for relocation for packet at 6094.
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6094]=0xC0044700
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6095]=0x00000528
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6096]=0x00000080
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6097]=0x20000000
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6098]=0x80000000
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6099]=0x00000000
oct. 02 10:06:22 kiss-desktop kernel: [drm:evergreen_packet3_check.isra.14 [radeon]] *ERROR* bad EVENT_WRITE
oct. 02 10:06:22 kiss-desktop kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
oct. 02 10:06:22 kiss-desktop steam.desktop[1150]: radeon: The kernel rejected CS, see dmesg for more information (-22).
oct. 02 10:06:22 kiss-desktop kernel: [drm:radeon_cs_packet_next_reloc [radeon]] *ERROR* No packet3 for relocation for packet at 6094.
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6094]=0xC0044700
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6095]=0x00000528
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6096]=0x00000080
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6097]=0x20000000
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6098]=0x80000000
oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6099]=0x00000000


while trying Rocket League with this patch.
Comment 32 Heiko 2016-10-02 10:29:22 UTC
(In reply to Frederic Romagne from comment #31)
> I had a bunch of 
> 
> oct. 02 10:06:22 kiss-desktop kernel: [drm:evergreen_packet3_check.isra.14
> [radeon]] *ERROR* bad EVENT_WRITE
> oct. 02 10:06:22 kiss-desktop kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR*
> Invalid command stream !
> oct. 02 10:06:22 kiss-desktop steam.desktop[1150]: radeon: The kernel
> rejected CS, see dmesg for more information (-22).
> oct. 02 10:06:22 kiss-desktop kernel: [drm:radeon_cs_packet_next_reloc
> [radeon]] *ERROR* No packet3 for relocation for packet at 6094.
> oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6094]=0xC0044700
> oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6095]=0x00000528
> oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6096]=0x00000080
> oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6097]=0x20000000
> oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6098]=0x80000000
> oct. 02 10:06:22 kiss-desktop kernel: [drm] ib[6099]=0x00000000
> 
> while trying Rocket League with this patch.

That's strange... the patch doesn't touch any command stream packets. Just a shot in the dark: did the GPU hang/reset usually happen before the point where the rendering breaks? And why is the command steam.desktop? I'd expect the RL executable there...

So I just bought RL and I'll take a look.

Also, at least for the games that always lock up, an apitrace would be awesome :)
Comment 33 Marek Olšák 2016-10-04 15:47:38 UTC
The EVENT_WRITE errors are from a different issue that should be fixed in master now.
Comment 34 Heiko 2016-10-30 09:11:14 UTC
Created attachment 127616 [details] [review]
Cleaned up version of the proposed fix

Ok, so I've stripped the debugging stuff from the patch. I tested it against the octodad trace, saints row 4, grid autosport and rocket league; none of them hung the GPU. I've also tested it as my system's mesa and didn't see any unwanted fallout.

Can it probably be posted to the mailing list as is, or should I split it up?
Comment 35 Adam Lyall 2016-12-16 13:32:06 UTC
I've tested Heiko's patch from comment 34 and it fixed Grid Autosport crashing for me on the latest Mesa from git.

Note that, at least for my Radeon HD5850, without the patch, running with sbsafemath still leads to the game crashing. Only nosb was working.

Grid Autosport's benchmark FPS increases by a few frames with the patch vs. vanilla with "nosb".
Comment 36 Heiko 2016-12-22 08:37:32 UTC
Btw, I've submitted the patch to the mesa list some time ago: https://patchwork.freedesktop.org/patch/122534
Comment 37 Andreas Boll 2017-01-12 10:53:11 UTC
Thanks to Heiko for fixing this issue!

Fixed by

commit e933246013eef376804662f3fcf4646c143c6c88
Author: Heiko Przybyl <lil_tux@web.de>
Date:   Sun Nov 20 14:42:28 2016 +0100

    r600/sb: Fix loop optimization related hangs on eg

