Bug 95308 - [radeonsi] Hangs after some minutes on Team Fortress 2
Status: RESOLVED DUPLICATE of bug 93649
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Blocks: 77449
Reported: 2016-05-06 20:55 UTC by Matías Locatti
Modified: 2018-04-03 04:08 UTC (History)

Attachments
glxinfo (99.62 KB, text/plain)
2016-05-18 13:18 UTC, Kamil Anonim
lspci (2.00 KB, text/plain)
2016-05-18 13:18 UTC, Kamil Anonim
dmesg log with gpu reset disabled (63.73 KB, text/plain)
2016-05-25 05:05 UTC, Winston Weinert
dmesg log (kernel 4.6) (186.74 KB, text/plain)
2016-05-25 05:34 UTC, Kamil Anonim
dmesg log (kernel 4.7) (66.64 KB, text/x-log)
2016-08-15 18:14 UTC, Fornax

Description Matías Locatti 2016-05-06 20:55:18 UTC

    
Comment 1 Matías Locatti 2016-05-06 20:59:05 UTC
This is what I get:

radeon 0000:01:00.0: ring 0 stalled for more than 10386msec
radeon 0000:01:00.0: VCE init error (-22).
[drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[drm:radeon_ib_ring_test [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
radeon 0000:01:00.0: VCE init error (-22).

After that, the kernel hangs.
Comment 2 jhuber72 2016-05-16 14:52:54 UTC
I'm having this problem too, across multiple distros, multiple mesa versions, and multiple radeonsi cards.

I have tried:

Debian 8 with Mesa 10.3, kernel 3.16
Debian 8 with Mesa 11.1, kernel 4.4, kernel 4.5
Fedora 23 with Mesa 11.1, kernel 4.4
Gentoo with Mesa 10.5, 11.1, 11.2, kernel 4.4

Radeon HD 7790, Radeon R9 290, FirePro M6100

This has been an issue since TF2's mid-December 2015 update.
Comment 3 Matthew Dawson 2016-05-16 15:11:58 UTC
This looks to be a duplicate of bug 93649

Matías, if you upgrade to v4.6, you shouldn't have system lockups anymore.  Unfortunately, after the game crashes you will still need to reboot your system.

jhuber72, what version of LLVM were you using when you tested Mesa 10.*?  And can you give any other details about your system (in the other bug)?  I have some suspicions, but your data contradicts my current thinking, so I want to see if something else may be up.

*** This bug has been marked as a duplicate of bug 93649 ***
Comment 4 Kamil Anonim 2016-05-18 13:18:02 UTC
Created attachment 123880
glxinfo

Full output of glxinfo
Comment 5 Kamil Anonim 2016-05-18 13:18:54 UTC
Created attachment 123882
lspci

Full output of lspci
Comment 6 Kamil Anonim 2016-05-18 13:19:38 UTC
The problem still persists on Ubuntu 16.04, Radeon R9 280X, kernel 4.6.0-xanmod1. Please let me know which logs I should provide to help track down the issue.
The video showing the glitch should be available in ~30-40 minutes here: https://youtu.be/1iBkh6SYSZU
Comment 7 Marek Olšák 2016-05-24 22:25:40 UTC
Can you please do this:
- Disable GPU reset by adding this kernel parameter: radeon.lockup_timeout=0
- Reproduce the GPU hang.
- CTRL+F1. This should switch to text mode successfully if you've disabled GPU reset. You can't go back to X now.
- Save the contents of dmesg.
- Reboot

Attach dmesg here.

If dmesg doesn't contain a VM fault, you don't have to do anything else for now.

If dmesg contains a VM fault, set this environment variable and start the game:
R600_DEBUG=check_vm

(If it's a Steam game, you must get the correct Steam run command, which can be obtained from the desktop shortcut. For me, it looks like this: steam steam://rungameid/440. Make sure Steam isn't running, then run "R600_DEBUG=check_vm steam steam://rungameid/$number", where $number is the game number.)

After you reproduce the hang again, reboot and attach the new files located in ~/ddebug_dumps/. Those should be records of VM faults created by R600_DEBUG=check_vm.

I can't promise I will be able to fix this. The issue is kinda random and it may be fixed by a later kernel or Mesa release.
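
The dmesg check described above can be sketched as a small script. This is only an illustration, not part of the original instructions: the thread does not say what a VM fault line looks like, so the two patterns below are assumptions based on wording commonly printed by the radeon kernel driver, and the helper names `find_vm_fault_lines` and `next_step` are hypothetical.

```python
import re

# ASSUMPTION: the thread does not show a VM fault line. These patterns match
# wording commonly printed by the radeon kernel driver; adjust them if your
# log uses different text.
VM_FAULT_PATTERNS = [
    re.compile(r"GPU fault detected", re.IGNORECASE),
    re.compile(r"VM_CONTEXT\d+_PROTECTION_FAULT", re.IGNORECASE),
]


def find_vm_fault_lines(dmesg_text):
    """Return the lines of a saved dmesg log that look like VM fault reports."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(pattern.search(line) for pattern in VM_FAULT_PATTERNS)
    ]


def next_step(dmesg_text):
    """Map a saved log onto the two branches described in the instructions."""
    if find_vm_fault_lines(dmesg_text):
        return ("VM fault found: rerun with R600_DEBUG=check_vm "
                "and attach the files from ~/ddebug_dumps/")
    return "no VM fault found: attach dmesg only"
```

With the log saved to a file (the name dmesg.log is just an example), `print(next_step(open("dmesg.log").read()))` reports which branch applies.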
Comment 8 Winston Weinert 2016-05-25 05:04:17 UTC
I have experienced issues with TF2 and my Radeon HD 7770 hanging, with the same error messages.

I did follow the instructions to check for a VM error, but I found nothing in the dmesg output. I've attached it in case you still would like to look at it.
Comment 9 Winston Weinert 2016-05-25 05:05:12 UTC
Created attachment 124067
dmesg log with gpu reset disabled
Comment 10 Kamil Anonim 2016-05-25 05:32:51 UTC
I'm also attaching my dmesg log. I tried to run the game with the R600_DEBUG option first, but the game ran extremely slowly, making it hard to even click buttons in the menu. Steam would immediately crash after I closed the game. Additionally, the logs were spammed with segfault messages. So I reproduced the hang without the variable in place.

Thank you Marek for providing instructions and Winston for providing even more logs!
Comment 11 Kamil Anonim 2016-05-25 05:34:15 UTC
Created attachment 124068
dmesg log (kernel 4.6)
Comment 12 Fornax 2016-08-15 18:14:42 UTC
Created attachment 125800
dmesg log (kernel 4.7)

I have the same problem with TF2. I attached dmesg.log without gpu reset disabled. I didn't see any errors with gpu reset enabled?
Comment 13 Timothy Arceri 2018-04-03 04:08:21 UTC

*** This bug has been marked as a duplicate of bug 93649 ***

