Created attachment 142050: gpu error dump
I just ran Mad Max on integrated graphics and hit a segfault. dmesg said to file a bug here, so here we are :)
[ 2125.632326] [drm] GPU HANG: ecode 9:0:0x84d77efc, reason: no progress on rcs0, action: reset
[ 2125.632329] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2125.632330] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2125.632330] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2125.632330] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2125.632331] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2125.632354] i915 0000:00:02.0: Resetting rcs0 for no progress on rcs0
Hi Vova, was this a one-time segfault, or can you reproduce it easily?
I'm not sure I got the same crash, but I got something. All of the output is below:
MadMax: dumped to "/home/den/.local/share/feral-interactive/Mad Max/crashes/3b8053c2-376a-1c03-55c46659-1ad90210.dmp"
MadMax: crash reporter "/home/den/.steam/steam/steamapps/common/Mad Max/bin/feral_linux_crash_reporter" launching
Game crashed with signal 6
Vulkan call failed: -4
If possible, launch Steam from command line to check the output when the game is run.
Then, contact email@example.com with the details of the output, your Steam System Info, as well as the dump file:
[ 6334.854969] [drm] GPU HANG: ecode 9:0:0x85d7fcfb, in WinMain , reason: No progress on rcs0, action: reset
[ 6334.855042] i915 0000:00:02.0: Resetting rcs0 after gpu hang
Continuing the investigation (by the way, it looks like there is no crash when using OpenGL).
OK, here is a new piece of information:
Checked 3 Mesa versions:
>mesa-vulkan-drivers from the repository (18.0.5-0ubuntu0~16.04.1)
>Mesa built from git as of 21.09
>latest Mesa built from git (as of 19.10): works fine again
So, to summarize: with the latest git Mesa, the game should work fine. Vova, could you please tell us which Mesa version you have?
On my side, to double-check, I will wait for the next Mesa release and test it again.
It reproduces reliably: every Vulkan game hangs the GPU. I think it is more of a kernel-space problem, because userspace should not be able to hang the GPU.
Mesa version: 18.2.2-1
I can confirm that Mesa from git does not crash the GPU.
My best guess is that it was fixed by this commit: https://gitlab.freedesktop.org/mesa/mesa/commit/0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5
I tried Mesa without that commit, but it still worked fine.
Are we interested in bisecting this? I think it is not straightforward because of the different branches:
The merge base 8d3ccdbb9ba480dfe435023b747714cd517e5028 is bad.
This means the bug has been fixed between 8d3ccdbb9ba480dfe435023b747714cd517e5028 and [2bb05d70afe82fdc5e6d1d7c7bcbd8dc28df4b82].
If it's easy to reproduce, a bisect would be good. I don't like magically fixed bugs. :(
I think this is the commit that fixed the game (since the bisect searched for a fix, "bad" here means "fixed"):
f5bab06428fc7ca6116cf0daf1c237eb86202e7a is the first bad commit
Author: Jason Ekstrand <firstname.lastname@example.org>
Date: Tue Oct 2 17:19:32 2018 -0500
anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START
Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
normal. If we are near enough to the end, this can cause us to start a
new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We
always reserve enough space at the end for an MI_BATCH_BUFFER_START so
we can just increment cmd_buffer->batch.end prior to emitting the
Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
Tested-by: Alex Smith <email@example.com>
Reviewed-by: Lionel Landwerlin <firstname.lastname@example.org>
:040000 040000 37d291419a86e6fca5d872b7b53974d72167c57b 5dba5eacf4dcb36ccc35224107fa5d0a2806a937 M src
That also makes sense. Thanks for bisecting!