Bug 90217 - Counter Strike Global Offensive: GPU fault after a while
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2015-04-28 16:34 UTC by Christoph Haag
Modified: 2019-09-25 17:52 UTC

Attachments
dmesg (229.11 KB, text/plain)
2015-04-28 16:34 UTC, Christoph Haag
dmesg with graphical output hang (1.04 MB, text/plain)
2015-04-29 21:58 UTC, Christoph Haag
R600_DEBUG=ps,vs,gs csgo 2> csgoerr.txt (334.11 KB, application/octet-stream)
2015-04-29 23:43 UTC, Christoph Haag
Mass Effect 3 with nine (353.92 KB, text/plain)
2015-05-22 21:35 UTC, Christoph Haag
dmesg csgo tahiti (8.65 KB, text/x-log)
2015-08-04 01:58 UTC, Paul

Description Christoph Haag 2015-04-28 16:34:09 UTC
Created attachment 115406
dmesg

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M] (rev ff)

Mesa git, a recent LLVM revision, with PRIME.
Linux 4.1-rc1, but it was the same with 4.0.

Counter Strike GO seems to be problematic anyway. Even at 1366x768 with settings on low I get bad performance drops, see video: https://www.youtube.com/watch?v=HVDZowUGlSc. There is flickering all over the place. At the end of the video there is one of the GPU hangs:


perf interrupt took too long (4996 > 4960), lowering kernel.perf_event_max_sample_rate to 25200
radeon 0000:01:00.0: GPU fault detected: 146 0x08e2c804
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000EBC7
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C8004
VM fault (0x04, vmid 1) at page 60359, read from TC (200)
radeon 0000:01:00.0: ring 0 stalled for more than 10490msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000008a0d3c last fence id 0x00000000008a0d57 on ring 0)
radeon 0000:01:00.0: ring 0 stalled for more than 10990msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000008a0d3c last fence id 0x00000000008a0d57 on ring 0)
radeon 0000:01:00.0: ring 0 stalled for more than 11490msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000008a0d3c last fence id 0x00000000008a0d57 on ring 0)

After that, only further "stalled" messages follow.

It happens after roughly 15-60 minutes.
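
For anyone reading the fault lines above: the raw FAULT_STATUS and FAULT_ADDR registers appear to carry the same information as the decoded "VM fault" line. The following is a minimal sketch of that decoding. The bit positions are an assumption, not something stated in this report; they were picked because they reproduce the decoded line in this log. The name decode_vm_fault is hypothetical.

```python
from typing import NamedTuple


class VmFault(NamedTuple):
    protections: int  # protection fault bits
    vmid: int         # VM context that faulted
    client_id: int    # memory client ID (200 is printed as "TC" in the log)
    access: str       # "read" or "write"
    page: int         # faulting page number


def decode_vm_fault(status: int, addr: int) -> VmFault:
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    ASSUMPTION: the shifts and masks below are inferred from the decoded
    line in this report, not taken from documentation quoted here.
    """
    return VmFault(
        protections=status & 0xF,
        vmid=(status >> 25) & 0xF,
        client_id=(status >> 12) & 0xFF,
        access="write" if (status >> 24) & 0x1 else "read",
        page=addr,
    )


# Values copied from the dmesg excerpt above.
fault = decode_vm_fault(0x020C8004, 0x0000EBC7)
print(fault)
```

With the values from this log, the fields come out as protections 0x04, vmid 1, client 200, a read access, and page 60359, which matches the "VM fault (0x04, vmid 1) at page 60359, read from TC (200)" line.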
Comment 1 Tom Stellard 2015-04-29 14:56:36 UTC
Can you run the program with the environment variable R600_DEBUG=ps,vs,gs and post the output?
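
One possible way to capture that output is sketched below, assuming the shader dumps are written to stderr. The helper name run_with_shader_dump, the launcher command, and the log file name are placeholders, not anything prescribed here.

```python
import os
import subprocess


def run_with_shader_dump(cmd, log_path, debug_flags="ps,vs,gs"):
    """Run cmd with R600_DEBUG set and save its stderr to log_path.

    ASSUMPTION: the debug output goes to stderr. cmd is whatever starts
    the game on your system; it is not specified in this report.
    """
    env = dict(os.environ, R600_DEBUG=debug_flags)
    with open(log_path, "wb") as log:
        return subprocess.run(cmd, env=env, stderr=log).returncode


# Example call (hypothetical command and file name):
# run_with_shader_dump(["csgo"], "csgoerr.txt")
```

The log can grow large in a long session, so it may be worth compressing it before attaching.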
Comment 2 Christoph Haag 2015-04-29 21:58:34 UTC
Created attachment 115456
dmesg with graphical output hang

It can also have more severe consequences and require a reboot, see dmesg.

I'll use R600_DEBUG=ps,vs,gs next time. It can take quite a while to trigger, so I hope it won't produce too much data.
Comment 3 Christoph Haag 2015-04-29 23:43:40 UTC
Created attachment 115459
R600_DEBUG=ps,vs,gs csgo 2> csgoerr.txt

This time with a 3.19 kernel with ck patches (I wanted to see whether gameplay is smoother with it).

It didn't show a GPU fault in dmesg this time; it just started outputting messages like:

[ 6556.091663] radeon 0000:01:00.0: ring 0 stalled for more than 10035msec
[ 6556.091668] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000003eaad7 last fence id 0x00000000003eaaf4 on ring 0)
[ 6556.593178] radeon 0000:01:00.0: ring 0 stalled for more than 10536msec
[ 6556.593187] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000003eaad7 last fence id 0x00000000003eaaf4 on ring 0)
[ 6557.094722] radeon 0000:01:00.0: ring 0 stalled for more than 11037msec
[ 6557.094728] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000003eaad7 last fence id 0x00000000003eaaf4 on ring 0)
[ 6557.596237] radeon 0000:01:00.0: ring 0 stalled for more than 11538msec
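
For long dmesg attachments, the repeated lockup lines can be collapsed with a small script. This is only a sketch: the regular expression matches the message format seen in this report, and reading "current" as the last completed fence and "last" as the most recently submitted one is an assumption. The name pending_fences is hypothetical.

```python
import re

# Matches the "GPU lockup" lines as they appear in this report.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def pending_fences(dmesg_text):
    """Return sorted unique (ring, current, last, last - current) tuples."""
    seen = set()
    for match in LOCKUP_RE.finditer(dmesg_text):
        current = int(match.group(1), 16)
        last = int(match.group(2), 16)
        seen.add((int(match.group(3)), current, last, last - current))
    return sorted(seen)


# Two lines copied from the log above.
sample = """\
[ 6556.091668] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000003eaad7 last fence id 0x00000000003eaaf4 on ring 0)
[ 6556.593187] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000003eaad7 last fence id 0x00000000003eaaf4 on ring 0)
"""
for ring, current, last, gap in pending_fences(sample):
    print(f"ring {ring}: stuck at {current:#x}, {gap} fences behind {last:#x}")
```

If the set contains a single entry per ring, as it does here, the fence counter never advanced between messages, which is consistent with a hard hang rather than slow progress.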
Comment 4 Christoph Haag 2015-05-22 21:35:59 UTC
Created attachment 115981
Mass Effect 3 with nine

Hm, I tried Mass Effect 3 with nine and got a GPU hang too. Not sure if it's the same. No further log right now; I'll first check whether it's always at the same point (it happened in the intro at the exact moment the reaper shoots, so I think there is something that actually triggers it).
Comment 5 Christoph Haag 2015-08-01 09:27:39 UTC
Hasn't happened for a while, will reopen in case it happens again.
Comment 6 Paul 2015-08-04 01:58:09 UTC
Unfortunately I'm hit by this as well, although I can't tell whether it's fixed in the latest git.
It happened after a few minutes of gameplay (first try 2 minutes, second try ~30 minutes) on my HD 7970 - Tahiti XT.
Mesa 10.6.3
Kernel 4.1.2
Arch, up-to-date

Game Settings:
Global Shadow Quality - High
Model / Texture Detail - High
Effect Detail - High
Shader Detail - Very High
Multicore Rendering - Enabled
MSAA - 8x
FSAA - Enabled
Texture Filter - Anisotropic 16x
Vsync - Disabled
Motion Blur - Enabled
Comment 7 Paul 2015-08-04 01:58:59 UTC
Created attachment 117500
dmesg csgo tahiti
Comment 8 Timothy Arceri 2018-04-03 03:36:02 UTC
Hi guys,

There have been many improvements since this was reported, and I don't see any other reports like this. Is this still happening or can we close this bug?
Comment 9 Matthew Dawson 2018-04-03 13:54:56 UTC
(In reply to Timothy Arceri from comment #8)
> Hi guys,
> 
> There have been many improvements since this was reported, and I don't see
> any other reports like this. Is this still happening or can we close this
> bug?

Hi Timothy,

I've been playing CSGO frequently on my 7970 with a recent graphics stack (Linux 4.14.10, Mesa 17.3.1 most recently) and have had no issues. I think this issue can be closed now.
Comment 10 GitLab Migration User 2019-09-25 17:52:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1217.