Summary: | [kbl] GPU hang on Mad Max vulkan | ||
---|---|---|---|
Product: | Mesa | Reporter: | Vova <vova7890> |
Component: | Drivers/Vulkan/intel | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs, jason |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | gpu error dump |
hi Vova, was it a one-time segfault, or can you easily reproduce it?

Not sure that I got the same crash, but I got something. All outputs below.

Game log:

    MadMax: dumped to "/home/den/.local/share/feral-interactive/Mad Max/crashes/3b8053c2-376a-1c03-55c46659-1ad90210.dmp"
    MadMax: crash reporter "/home/den/.steam/steam/steamapps/common/Mad Max/bin/feral_linux_crash_reporter" launching
    Game crashed with signal 6
    Vulkan call failed: -4
    If possible, launch Steam from command line to check the output when the game is run. Then, contact support@feralinteractive.com with the details of the output, your Steam System Info, as well as the dump file: /home/den/.local/share/feral-interactive/Mad Max/crashes/3b8053c2-376a-1c03-55c46659-1ad90210.dmp

dmesg output:

    [ 6334.854969] [drm] GPU HANG: ecode 9:0:0x85d7fcfb, in WinMain [6644], reason: No progress on rcs0, action: reset
    [ 6334.855042] i915 0000:00:02.0: Resetting rcs0 after gpu hang

Continuing the investigation (btw, it looks like there is no crash when using OpenGL).

OK, here is a new piece of information. I checked 3 mesa versions:

- mesa-vulkan-drivers from the repository (18.0.5-0ubuntu0~16.04.1): works fine
- mesa built from git (from 21.09): hangs occur
- latest mesa built from git (from 19.10): works fine again

So, to summarize: with the latest git mesa the game should work fine. Vova, could you please clarify which mesa version you have? From my side, to double-check, I will wait for the new mesa release and check on it again.

It reproduces reliably; all Vulkan games hang the GPU. I think it is more of a kernel-space problem, because userspace should not be able to hang the GPU. mesa version: 18.2.2-1

I can confirm that mesa from git does not crash the GPU.

Best guess, it was fixed by this: https://gitlab.freedesktop.org/mesa/mesa/commit/0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5

I tried mesa without that commit, but it worked fine. Are we interested in bisecting it? It is not straightforward because of the different branches, I think:

    The merge base 8d3ccdbb9ba480dfe435023b747714cd517e5028 is bad.
    This means the bug has been fixed between 8d3ccdbb9ba480dfe435023b747714cd517e5028 and [2bb05d70afe82fdc5e6d1d7c7bcbd8dc28df4b82].

If it's easy to reproduce, that'd be good. I don't like magically fixed bugs. :(

Here it is, I think, the commit which provided the fix for the game:

    f5bab06428fc7ca6116cf0daf1c237eb86202e7a is the first bad commit
    commit f5bab06428fc7ca6116cf0daf1c237eb86202e7a
    Author: Jason Ekstrand <jason.ekstrand@intel.com>
    Date:   Tue Oct 2 17:19:32 2018 -0500

        anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START

        Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
        normal. If we are near enough to the end, this can cause us to start
        a new BO just for the MI_BATCH_BUFFER_START which messes up chaining.
        We always reserve enough space at the end for an MI_BATCH_BUFFER_START
        so we can just increment cmd_buffer->batch.end prior to emitting the
        command.

        Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
        Tested-by: Alex Smith <asmith@feralinteractive.com>
        Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

    :040000 040000 37d291419a86e6fca5d872b7b53974d72167c57b 5dba5eacf4dcb36ccc35224107fa5d0a2806a937 M src

That also makes sense. Thanks for bisecting!
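The commit message quoted above describes reserving space at the end of each batch so that the final MI_BATCH_BUFFER_START always fits. The following is only a toy Python model of that idea, not Mesa's anv code: the class names, the 8-byte command size, and the 64-byte BO size are all made up for illustration. It shows how emitting the final jump "as normal" can allocate a new BO that holds nothing but that jump, while growing `end` into the reserved tail first avoids it. It does not model the GPU hang itself.

```python
# Toy model of batch-buffer chaining. All names and sizes here are
# hypothetical; this is not Mesa's anv code.

START_SIZE = 8  # assumed size of one MI_BATCH_BUFFER_START, in bytes


class BatchBO:
    """One buffer object (BO) holding a run of commands."""

    def __init__(self, capacity):
        self.commands = []
        self.used = 0
        # The usable end stops short of the real capacity, so there is
        # always room left for one trailing MI_BATCH_BUFFER_START.
        self.end = capacity - START_SIZE


class CommandBuffer:
    def __init__(self, bo_size=64):
        self.bo_size = bo_size
        self.bos = [BatchBO(bo_size)]

    def _append(self, bo, name, size):
        bo.commands.append((name, size))
        bo.used += size

    def emit(self, name, size):
        """Emit a command, chaining to a fresh BO if it does not fit."""
        bo = self.bos[-1]
        if bo.used + size > bo.end:
            # Use the reserved tail to jump to a new BO.
            bo.end += START_SIZE
            self._append(bo, "MI_BATCH_BUFFER_START", START_SIZE)
            bo = BatchBO(self.bo_size)
            self.bos.append(bo)
        self._append(bo, name, size)

    def finish_old(self):
        """Old behaviour: emit the final jump like any other command."""
        self.emit("MI_BATCH_BUFFER_START", START_SIZE)

    def finish_fixed(self):
        """Fixed behaviour: grow `end` into the reserved tail first."""
        bo = self.bos[-1]
        bo.end += START_SIZE
        self._append(bo, "MI_BATCH_BUFFER_START", START_SIZE)


def filled_buffer():
    """Return a command buffer whose single BO is full up to `end`."""
    cb = CommandBuffer(bo_size=64)
    for _ in range(7):  # 7 * 8 = 56 bytes == usable end
        cb.emit("DRAW", 8)
    return cb
```

Calling `finish_old()` on `filled_buffer()` leaves two BOs, the second containing only the jump; calling `finish_fixed()` keeps everything in one BO.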
Created attachment 142050: gpu error dump

I just ran Mad Max on integrated video and caught a segfault. dmesg says to file this bug here, and here we are :)

    [ 2125.632326] [drm] GPU HANG: ecode 9:0:0x84d77efc, reason: no progress on rcs0, action: reset
    [ 2125.632329] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    [ 2125.632330] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    [ 2125.632330] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    [ 2125.632330] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    [ 2125.632331] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    [ 2125.632354] i915 0000:00:02.0: Resetting rcs0 for no progress on rcs0
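When comparing hang reports like the two "GPU HANG" dmesg lines in this bug, it can help to pull the fields out mechanically. Below is a minimal Python sketch that does so. The regular expression is written only against the two line shapes quoted in this report and may not match other kernel versions; the group names `gen`, `engine`, and `code` are my labels for the three `ecode` fields and are an assumption, not something the log states.

```python
import re

# Pattern derived from the two "GPU HANG" lines quoted in this report.
# The group names gen/engine/code are assumed labels for the ecode fields.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>[0-9a-fA-F]+):(?P<code>0x[0-9a-fA-F]+)"
    r"(?:, in (?P<process>\S+) \[(?P<pid>\d+)\])?"  # optional "in NAME [PID]"
    r", reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang(line):
    """Return a dict of fields from a GPU HANG line, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["gen"] = int(info["gen"])
    if info["pid"] is not None:
        info["pid"] = int(info["pid"])
    return info


if __name__ == "__main__":
    sample = (
        "[ 6334.854969] [drm] GPU HANG: ecode 9:0:0x85d7fcfb, in WinMain [6644], "
        "reason: No progress on rcs0, action: reset"
    )
    print(parse_hang(sample))
```

For lines without the `in NAME [PID]` part, such as the one in the original description, `process` and `pid` come back as `None`.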