Bug 104899 - GPU HANG after entering a match in Team Fortress 2. fedora 27
Summary: GPU HANG after entering a match in Team Fortress 2. fedora 27
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other Linux (All)
Importance: medium blocker
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-01 09:43 UTC by avinoash
Modified: 2018-02-05 09:18 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
GPU crash dump (355.22 KB, application/gzip)
2018-02-01 09:48 UTC, avinoash
dmesg from boot to crash (19.58 KB, application/gzip)
2018-02-01 09:50 UTC, avinoash

Description avinoash 2018-02-01 09:43:31 UTC
Steps to reproduce the issue:
Boot the PC => launch Steam => start Team Fortress 2 => enter a match => the GPU hangs.
These steps trigger the issue every time.
The game is unplayable: it freezes the PC until it crashes to the desktop after a minute.


the journal that brought me here:
kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in hl2_linux [3347], reason: Hang on rcs0, action: reset
kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
kernel: [drm] GPU crash dump saved to /sys/class/drm/card1/error
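For anyone hitting the same message: the error state lives in kernel memory, so it is worth copying out of sysfs before rebooting. The following is a minimal sketch and not part of the original report. The helper names gpu_dump_path and archive_gpu_dump are hypothetical, and compressing with gzip is an assumption.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the sysfs
# path named in the most recent "GPU crash dump saved to ..." message.
gpu_dump_path() {
    sed -n 's|.*GPU crash dump saved to \(/[^ ]*\).*|\1|p' | tail -n 1
}

# Hypothetical helper: compress the error state at path $1 into file $2.
# Reading the real sysfs file normally requires root.
archive_gpu_dump() {
    gzip -c < "$1" > "$2"
}
```

Run as root, something like `archive_gpu_dump "$(journalctl -k | gpu_dump_path)" gpu-error.gz` should then produce a file that can be attached to a bug report.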


i915 platform:
~]$ sudo lshw -c video
  *-display                 
       description: VGA compatible controller
       product: Skylake GT2 [HD Graphics 520]
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 07
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:130 memory:e1000000-e1ffffff memory:c0000000-cfffffff ioport:f000(size=64) memory:c0000-dffff
  *-display
       description: Display controller
       product: Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 81
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:129 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:e0200000-e023ffff memory:e0240000-e025ffff

~]$ modinfo i915 #excluding aliases and signature from the output
filename:       /lib/modules/4.14.14-300.fc27.x86_64/kernel/drivers/gpu/drm/i915/i915.ko.xz
license:        GPL and additional rights
description:    Intel Graphics
author:         Intel Corporation
author:         Tungsten Graphics, Inc.
firmware:       i915/bxt_dmc_ver1_07.bin
firmware:       i915/skl_dmc_ver1_26.bin
firmware:       i915/kbl_dmc_ver1_01.bin
firmware:       i915/kbl_guc_ver9_14.bin
firmware:       i915/bxt_guc_ver8_7.bin
firmware:       i915/skl_guc_ver6_1.bin
firmware:       i915/kbl_huc_ver02_00_1810.bin
firmware:       i915/bxt_huc_ver01_07_1398.bin
firmware:       i915/skl_huc_ver01_07_1398.bin
depends:        drm_kms_helper,drm,video,i2c-algo-bit
intree:         Y
name:           i915
vermagic:       4.14.14-300.fc27.x86_64 SMP mod_unload 
sig_id:         PKCS#7
signer:         
sig_key:        
sig_hashalgo:   md4
parm:           modeset:Use kernel modesetting [KMS] (0=disable, 1=on, -1=force vga console preference [default]) (int)
parm:           panel_ignore_lid:Override lid status (0=autodetect, 1=autodetect disabled [default], -1=force lid closed, -2=force lid open) (int)
parm:           semaphores:Use semaphores for inter-ring sync (default: -1 (use per-chip defaults)) (int)
parm:           enable_rc6:Enable power-saving render C-state 6. Different stages can be selected via bitmask values (0 = disable; 1 = enable rc6; 2 = enable deep rc6; 4 = enable deepest rc6). For example, 3 would enable rc6 and deep rc6, and 7 would enable everything. default: -1 (use per-chip default) (int)
parm:           enable_dc:Enable power-saving display C-states. (-1=auto [default]; 0=disable; 1=up to DC5; 2=up to DC6) (int)
parm:           enable_fbc:Enable frame buffer compression for power savings (default: -1 (use per-chip default)) (int)
parm:           lvds_channel_mode:Specify LVDS channel mode (0=probe BIOS [default], 1=single-channel, 2=dual-channel) (int)
parm:           lvds_use_ssc:Use Spread Spectrum Clock with panels [LVDS/eDP] (default: auto from VBT) (int)
parm:           vbt_sdvo_panel_type:Override/Ignore selection of SDVO panel mode in the VBT (-2=ignore, -1=auto [default], index in VBT BIOS table) (int)
parm:           reset:Attempt GPU resets (0=disabled, 1=full gpu reset, 2=engine reset [default]) (int)
parm:           vbt_firmware:Load VBT from specified file under /lib/firmware (charp)
parm:           error_capture:Record the GPU state following a hang. This information in /sys/class/drm/card<N>/error is vital for triaging and debugging hangs. (bool)
parm:           enable_hangcheck:Periodically check GPU activity for detecting hangs. WARNING: Disabling this can cause system wide hangs. (default: true) (bool)
parm:           enable_ppgtt:Override PPGTT usage. (-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full with extended address space) (int)
parm:           enable_execlists:Override execlists usage. (-1=auto [default], 0=disabled, 1=enabled) (int)
parm:           enable_psr:Enable PSR (0=disabled, 1=enabled - link mode chosen per-platform, 2=force link-standby mode, 3=force link-off mode) Default: -1 (use per-chip default) (int)
parm:           alpha_support:Enable alpha quality driver support for latest hardware. See also CONFIG_DRM_I915_ALPHA_SUPPORT. (bool)
parm:           disable_power_well:Disable display power wells when possible (-1=auto [default], 0=power wells always on, 1=power wells disabled when possible) (int)
parm:           enable_ips:Enable IPS (default: true) (int)
parm:           fastboot:Try to skip unnecessary mode sets at boot time (default: false) (bool)
parm:           prefault_disable:Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only. (bool)
parm:           load_detect_test:Force-enable the VGA load detect code for testing (default:false). For developers only. (bool)
parm:           force_reset_modeset_test:Force a modeset during gpu reset for testing (default:false). For developers only. (bool)
parm:           invert_brightness:Invert backlight brightness (-1 force normal, 0 machine defaults, 1 force inversion), please report PCI device ID, subsystem vendor and subsystem device ID to dri-devel@lists.freedesktop.org, if your machine needs it. It will then be included in an upcoming module version. (int)
parm:           disable_display:Disable display (default: false) (bool)
parm:           enable_cmd_parser:Enable command parsing (true=enabled [default], false=disabled) (bool)
parm:           use_mmio_flip:use MMIO flips (-1=never, 0=driver discretion [default], 1=always) (int)
parm:           mmio_debug:Enable the MMIO debug code for the first N failures (default: off). This may negatively affect performance. (int)
parm:           verbose_state_checks:Enable verbose logs (ie. WARN_ON()) in case of unexpected hw state conditions. (bool)
parm:           nuclear_pageflip:Force enable atomic functionality on platforms that don't have full support yet. (bool)
parm:           edp_vswing:Ignore/Override vswing pre-emph table selection from VBT (0=use value from vbt [default], 1=low power swing(200mV),2=default swing(400mV)) (int)
parm:           enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
parm:           guc_firmware_path:GuC firmware path to use instead of the default one (charp)
parm:           huc_firmware_path:HuC firmware path to use instead of the default one (charp)
parm:           enable_dp_mst:Enable multi-stream transport (MST) for new DisplayPort sinks. (default: true) (bool)
parm:           inject_load_failure:Force an error after a number of failure check points (0:disabled (default), N:force failure at the Nth failure check point) (uint)
parm:           enable_dpcd_backlight:Enable support for DPCD backlight control (default:false) (bool)
parm:           enable_gvt:Enable support for Intel GVT-g graphics virtualization host support(default:false) (bool)


system architecture:
~]$ uname -m
x86_64


kernel version:
~]$ uname -r
4.14.14-300.fc27.x86_64


Linux distribution:
~]$ cat /etc/fedora-release 
Fedora release 27 (Twenty Seven)


Machine or mother board model:
~]$ dmidecode -t 2
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Dell Inc.
	Product Name: 0DVKGM
	Version: A00
	Serial Number: /F89MFC2/CN129636620011/
	Asset Tag: Not Specified
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: Not Specified
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0


Display connector:
I have three monitors, each connected to my docking station via a different connector: VGA, DVI, DP.
But the GPU hang also happens if I use only the built-in display, without the docking station.


A full dmesg with debug information:
attached.
(I don't think it has the debug information you are looking for...
I was not sure how to produce it. What is the "kernel command line"?)
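On the question above: the kernel command line is the list of boot parameters the bootloader passes to the kernel, and the running kernel exposes it in /proc/cmdline. The sketch below is not from the original report. It checks whether a given parameter is present. The helper name cmdline_has_param is hypothetical, and drm.debug=0xe is only an assumed example value, so check the current i915 bug-filing guide for the recommended setting.

```shell
# Hypothetical helper: succeed if boot parameter $1 appears as a whole
# word in the kernel command line. $2 is the file to read and defaults
# to /proc/cmdline.
cmdline_has_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fqx -- "$1"
}

# On Fedora, a parameter can be added to all installed kernels with grubby
# (takes effect after a reboot). The value here is an assumed example:
#   sudo grubby --update-kernel=ALL --args="drm.debug=0xe"
```

After rebooting, `cmdline_has_param drm.debug=0xe && echo "debug logging requested"` would confirm the parameter took effect.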

GPU crash dump:
attached.
(sorry for not bz2'ing the file...)
Comment 1 avinoash 2018-02-01 09:48:32 UTC
Created attachment 137107 [details]
GPU crash dump
Comment 2 avinoash 2018-02-01 09:50:30 UTC
Created attachment 137108 [details]
dmesg from boot to crash
Comment 3 Chris Wilson 2018-02-01 09:53:10 UTC
This should be fixed in mesa-17.3 (possibly even mesa-17.2 iirc).
Comment 4 avinoash 2018-02-01 12:14:15 UTC
(In reply to Chris Wilson from comment #3)
> This should be fixed in mesa-17.3 (possibly even mesa-17.2 iirc).

Thanks for the reply Chris.

I've downloaded mesa-17.3.3 and followed the install guide as described at https://www.mesa3d.org/install.html.

However, the system is still using the 17.2.4-3.fc27 version.
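One way to see which Mesa build the GL stack actually loads is to read the version that glxinfo reports, rather than the installed package version. This is a minimal sketch under assumptions: the helper names mesa_version and version_at_least are hypothetical, glxinfo must be installed, and its "OpenGL version string" line is assumed to contain the word "Mesa" followed by the version.

```shell
# Hypothetical helper: read `glxinfo` output on stdin and print the Mesa
# version taken from the first "OpenGL version string" line.
mesa_version() {
    sed -n 's/^OpenGL version string:.* Mesa \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 is at least version $2,
# using the version ordering of `sort -V`.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least "$(glxinfo | mesa_version)" 17.3 && echo "running Mesa 17.3 or newer"` prints the message only when the new build is the one in use.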


Can you point me to how to replace the existing "built-in" version with the new one?


thanks
Comment 5 Elizabeth 2018-02-01 17:58:20 UTC
Hello Avinoash,
If you only ran ./configure, Mesa will be installed in /usr/local. Make sure to install to the right path; for 64-bit Fedora it should be something like prefix /usr and libdir /usr/lib64, but I'm not sure.
Check https://www.mesa3d.org/autoconf.html for more info. 

You should use something like this:
./autogen.sh --prefix=/usr --libdir=/usr/lib64 --with-dri-driverdir=/usr/lib64/lib/dri
Comment 6 avinoash 2018-02-04 11:37:19 UTC
(In reply to Elizabeth from comment #5)
> Hello Avinoash,
> If you did only ./configure, mesa will be installed in /usr/local, make sure
> to install in the right path, for fedora 64 should be something like prefix
> /usr and liddir /usr/lib64, but not sure at all.
> Check https://www.mesa3d.org/autoconf.html for more info. 
> 
> You should use something like this:
> ./autogen.sh --prefix=/usr --libdir=/usr/lib64
> --with-dri-driverdir=/usr/lib64/lib/dri

Tried it, didn't help.
I'll search the web for more Mesa upgrade guides.

I'll update if I have any luck.

thanks
Comment 7 avinoash 2018-02-05 09:18:23 UTC
It's working!

I installed both repos from here: https://copr.fedorainfracloud.org/coprs/che/mesa/
then ran:
dnf update --best --allowerasing

Make sure you do NOT remove Steam in the process...

and make sure the right Mesa packages are being updated.

good luck and thanks for the help.

