105207 – The Talos Principle freezes system using radv

Summary: The Talos Principle freezes system using radv

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Vulkan/radeon
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	mesa-dev
QA Contact:	mesa-dev

Reported:	2018-02-22 11:02 UTC by pritzl3452
Modified:	2018-12-18 20:27 UTC
CC List:	0 users

Attachments
dmesg (76.07 KB, text/plain) 2018-02-22 11:02 UTC, pritzl3452
tracefile (10.03 KB, text/plain) 2018-02-22 23:11 UTC, pritzl3452
steam output (50.03 KB, text/plain) 2018-02-25 13:03 UTC, pritzl3452
tracefile (10.03 KB, text/plain) 2018-02-26 22:47 UTC, pritzl3452
backtrace (13.54 KB, text/plain) 2018-02-26 22:48 UTC, pritzl3452
tracefile (10.03 KB, text/plain) 2018-02-26 23:43 UTC, pritzl3452
backtrace (15.47 KB, text/plain) 2018-02-26 23:43 UTC, pritzl3452

Description pritzl3452 2018-02-22 11:02:22 UTC

Created attachment 137529 [details]
dmesg

When playing The Talos Principle at 3840x2160 with "Max 3D Rendering MPIX" set to 8.4 (4K 2160), the game frequently hangs the system. If I lower the setting to 1.8 (1680x1050), the game does not seem to hang.
This has happened for as long as I can remember; I just have not filed a bug until now.

I can ssh into the system after this happens, but I cannot reboot it.

I am using a Vega 64, Linux 4.16-rc2 and Mesa 18.0.0-rc4. Please let me know if I can provide any logs that could help.

Attaching dmesg after the hang has happened.

Comment 1 Bas Nieuwenhuizen 2018-02-22 21:17:03 UTC

Curious, I haven't noticed that, and your setup sounds really similar to my benchmarking setup.

Not being able to reboot is a generic amdgpu problem after a hang.

You say it has been happening for as long as you can remember; how long are we talking about? Which kernels and Mesa versions have you used where you had the issue?

What could be useful is starting Talos with

RADV_TRACE_FILE=/some/file/name RADV_DEBUG=allbos,syncshaders

You can do this, for example, by right-clicking on the game name in the list -> Properties -> Set Launch Options and entering:

RADV_TRACE_FILE=/some/file/name RADV_DEBUG=allbos,syncshaders %command%

Given the nature of the issue, it is possible that we crash after the hang while trying to produce the trace file. In that case, a stack trace would be useful to narrow down which packet we define illegally (though it may need debug symbols to be useful).
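
When the game is started outside Steam, the same two variables can be set from a small wrapper. The following is a minimal sketch in Python, assuming the game can be launched directly from a command line. The helper names `radv_debug_env` and `launch_with_trace` are hypothetical; only `RADV_TRACE_FILE` and `RADV_DEBUG=allbos,syncshaders` come from the instructions above.

```python
import os
import subprocess


def radv_debug_env(trace_file, debug_flags=("allbos", "syncshaders"), base_env=None):
    """Return a copy of the environment with the RADV debug variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RADV_TRACE_FILE"] = trace_file
    # RADV_DEBUG takes a comma-separated list of flags.
    env["RADV_DEBUG"] = ",".join(debug_flags)
    return env


def launch_with_trace(command, trace_file):
    """Run `command` (a list of arguments) with the RADV debug variables set."""
    return subprocess.run(command, env=radv_debug_env(trace_file), check=False)
```

A call would look like `launch_with_trace([game_binary], "/some/file/name")`, where `game_binary` is a placeholder for the actual executable path. Inside Steam, the launch-option line with `%command%` achieves the same thing without a wrapper.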

Comment 2 pritzl3452 2018-02-22 23:10:44 UTC

I'm not sure how long this has been happening. Definitely for as long as I have had my Vega, so at least since the middle of October.
I almost always run the latest rc kernels and used to use Mesa from git, but lately I have been using 17.3 and now the 18.0 rcs.

When I add RADV_TRACE_FILE=/home/erik/temp/tracefile RADV_DEBUG=allbos,syncshaders %command% to the launch options, starting the game freezes the desktop. The screen turns black as the game starts, but nothing more happens.

I can ssh in and see this in dmesg:

[  142.816155] [drm:amdgpu_job_timedout] *ERROR* ring comp_1.1.0 timeout, last signaled seq=2, last emitted seq=3
[  142.816162] [drm] No hardware hang detected. Did some blocks stall?

I don't know if the tracefile is useful in this case, but I'm attaching it to the bug.
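
For comparing several hangs, the timeout line can be pulled out of a saved dmesg log automatically. This is a rough sketch in Python that assumes the line always has the exact shape quoted above; the function name `parse_ring_timeouts` and the `pending` field are made up for illustration, and reading `pending` as the number of unfinished jobs on the ring is an interpretation of the two sequence numbers, not something stated by the kernel message.

```python
import re

# Pattern for the amdgpu job-timeout line as it appears in the dmesg excerpt above.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout\]\s+\*ERROR\*\s+"
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(dmesg_text):
    """Return one dict per ring-timeout line found in dmesg_text."""
    results = []
    for match in TIMEOUT_RE.finditer(dmesg_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "time": float(match.group("time")),
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Difference between the last emitted and last signaled sequence numbers.
            "pending": emitted - signaled,
        })
    return results
```

For the log above, this reports ring `comp_1.1.0` with one sequence number emitted but not signaled.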

Comment 3 pritzl3452 2018-02-22 23:11:31 UTC

Created attachment 137546 [details]
tracefile

Comment 4 pritzl3452 2018-02-23 08:41:58 UTC

I should note that I have not played Talos very often (since it hangs my computer), so it might have worked fine at times. I just tried it now and then, and when I got a hang I gave up for a month or two before trying again with a new kernel or a more recent Mesa. I do feel this was happening with my earlier card, a Fury X, but I'm not sure.

Comment 5 pritzl3452 2018-02-25 13:03:13 UTC

I don't seem to be able to get a backtrace. When I try to attach to the Talos process after a hang, gdb just sits there trying to attach. I'm not a developer, so I don't really know what else to try.

I'm attaching the output from steam as there seems to be some more information in there.
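
One possible explanation, which is an assumption and not something confirmed in this report, is that the game process is stuck in uninterruptible sleep (state `D`) inside the kernel after the GPU hang; a debugger generally cannot finish attaching to such a process until it leaves that state. A small Python sketch to check this over ssh by reading the `State:` line of `/proc/<pid>/status` (the function names are hypothetical):

```python
def process_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # Example line: "State:\tD (disk sleep)"
            return line.split(":", 1)[1].strip()[:1]
    return None


def read_process_state(pid):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return process_state(handle.read())
```

If `read_process_state(pid)` returns `"D"` for the game's pid, that would be consistent with gdb waiting indefinitely, although it does not by itself identify the cause of the hang.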

Comment 6 pritzl3452 2018-02-25 13:03:58 UTC

Created attachment 137591 [details]
steam output

Comment 7 pritzl3452 2018-02-26 22:47:15 UTC

Managed to get a core dump finally. I am now on LLVM 6.0.0-rc3 and Mesa 18.0.0-rc4 and the hang still happens.

Attaching a new tracefile and a backtrace from the same crash. I compiled LLVM and Mesa with -ggdb. Please let me know if something more needs to have debugging symbols and I'll get a new backtrace.

Comment 8 pritzl3452 2018-02-26 22:47:51 UTC

Created attachment 137624 [details]
tracefile

Comment 9 pritzl3452 2018-02-26 22:48:34 UTC

Created attachment 137625 [details]
backtrace

Comment 10 pritzl3452 2018-02-26 23:43:34 UTC

Created attachment 137626 [details]
tracefile

I saw that a lot of the values were optimised out, so I compiled Mesa with -O0 and got a new set of files.

Sorry for all the noise.

Comment 11 pritzl3452 2018-02-26 23:43:57 UTC

Created attachment 137627 [details]
backtrace

Comment 12 Samuel Pitoiset 2018-12-18 14:47:15 UTC

Are you still able to reproduce this? FWIW, I launch Talos with different settings almost every week and I never got any GPU hangs.

Comment 13 pritzl3452 2018-12-18 20:27:37 UTC

I just played a little bit. There was a spot where it always hung for me, but now it works fine.

I don't think I had tried the game between my last update to this bug and today.
