Bug 104712 - [APL GLK SKL] Source engine games GPU hang
Summary: [APL GLK SKL] Source engine games GPU hang
Status: CLOSED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Hector Velazquez
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-20 16:09 UTC by Alejandro Lorenzo
Modified: 2018-02-09 15:49 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
/sys/class/drm/card0/error (45.14 KB, text/plain)
2018-01-20 16:09 UTC, Alejandro Lorenzo
GLK error file (/sys/class/drm/card0/error) (201.21 KB, text/plain)
2018-02-02 21:32 UTC, Hector Velazquez
GLK kernel log (289.56 KB, text/plain)
2018-02-02 21:33 UTC, Hector Velazquez
GLK Xorg log (6.03 KB, text/plain)
2018-02-02 21:33 UTC, Hector Velazquez

Description Alejandro Lorenzo 2018-01-20 16:09:43 UTC
Created attachment 136869
/sys/class/drm/card0/error

While working in Eclipse (Version: Kepler Service Release 2), the GPU hangs. After 30 seconds the session is reset.

The crash occurs somewhere between 1 and 15 minutes in, usually on the early side.

It occurs with kernels 4.14.14 and 4.14.4.

Mesa versions in which it crashes: 17.3.2, 17.3.3 and 17.2.5
Comment 1 Alejandro Lorenzo 2018-01-21 17:12:48 UTC
Also happens with kernel 4.13.12
Comment 2 Hector Velazquez 2018-02-02 21:31:22 UTC
These Steam games have a GPU hang on GLK QA:

Games:

Half Life 2
Left 4 Dead 2
Team Fortress
Counter strike

Note: the kernel log, error file (/sys/class/drm/card0/error), Xorg log, and dmesg-H have been attached.

======================================
        Steps To reproduce:
======================================

1) Install Ubuntu 17.10 (Artful)
2) Set up/install all graphics components listed below
3) Install Steam with the reported games
4) Configure modesetting as the graphics driver option
5) Start any of the listed Steam games
6) Play the game
7) After a short time the platform stops responding (hang)

======================================
        Kernel log sample
======================================
. . .
Feb  1 14:21:43 gfx-desktop kernel: [  381.660139] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
Feb  1 14:21:43 gfx-desktop kernel: [  381.660172] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
Feb  1 14:21:43 gfx-desktop kernel: [  381.660201] [drm:intel_power_well_disable [i915]] disabling AUX A
Feb  1 14:21:43 gfx-desktop kernel: [  381.660228] [drm:intel_power_well_disable [i915]] disabling DC off
Feb  1 14:21:43 gfx-desktop kernel: [  381.660261] [drm:gen9_enable_dc5 [i915]] Enabling DC5
Feb  1 14:21:43 gfx-desktop kernel: [  381.660291] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
Feb  1 14:25:07 gfx-desktop kernel: [  585.788344] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915], irq posted? yes, current seqno=b4e1, last=b4ea
Feb  1 14:25:13 gfx-desktop kernel: [  591.120229] [drm] GPU HANG: ecode 9:0:0x85dffffb, in hl2_linux [2556], reason: Hang on rcs0, action: reset
Feb  1 14:25:13 gfx-desktop kernel: [  591.120232] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb  1 14:25:13 gfx-desktop kernel: [  591.120232] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb  1 14:25:13 gfx-desktop kernel: [  591.120233] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb  1 14:25:13 gfx-desktop kernel: [  591.120233] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb  1 14:25:13 gfx-desktop kernel: [  591.120234] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb  1 14:25:13 gfx-desktop kernel: [  591.120404] [drm:i915_reset_device [i915]] resetting chip
Feb  1 14:25:13 gfx-desktop kernel: [  591.120437] i915 0000:00:02.0: Resetting chip after gpu hang
Feb  1 14:25:13 gfx-desktop kernel: [  591.122378] [drm:i915_gem_reset_engine [i915]] context hl2_linux[2556]/1 marked guilty (score 10) banned? no
Feb  1 14:25:13 gfx-desktop kernel: [  591.122433] [drm:i915_gem_reset_engine [i915]] resetting rcs0 to restart from tail of request 0xb4e2
Feb  1 14:25:13 gfx-desktop kernel: [  591.122480] [drm] RC6 on
Feb  1 14:25:13 gfx-desktop kernel: [  591.122593] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
Feb  1 14:25:13 gfx-desktop kernel: [  591.122622] [drm:gen8_init_common_ring [i915]] Restarting rcs0:0 from 0xb4ea
Feb  1 14:25:13 gfx-desktop kernel: [  591.122651] [drm:gen8_init_common_ring [i915]] Restarting rcs0:1 from 0xb4e6
Feb  1 14:25:13 gfx-desktop kernel: [  591.122690] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 13
Feb  1 14:25:13 gfx-desktop kernel: [  591.122775] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
Feb  1 14:25:13 gfx-desktop kernel: [  591.122860] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
Feb  1 14:25:13 gfx-desktop kernel: [  591.122945] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
Feb  1 14:25:13 gfx-desktop kernel: [  591.123013] [drm:intel_huc_init_hw [i915]] i915/glk_huc_ver02_00_1748.bin fw status: fetch SUCCESS, load SUCCESS
Feb  1 14:25:13 gfx-desktop kernel: [  591.124167] [drm:intel_huc_init_hw [i915]] HuC DMA transfer wait over with ret 0
Feb  1 14:25:13 gfx-desktop kernel: [  591.124215] [drm:intel_huc_init_hw [i915]] i915/glk_huc_ver02_00_1748.bin fw status: fetch SUCCESS, load SUCCESS
Feb  1 14:25:13 gfx-desktop kernel: [  591.124281] [drm:intel_guc_init_hw [i915]] GuC fw status: path i915/glk_guc_ver10_56.bin, fetch SUCCESS, load SUCCESS
Feb  1 14:25:13 gfx-desktop kernel: [  591.124318] [drm:intel_guc_init_hw [i915]] GuC fw status: fetch SUCCESS, load PENDING
Feb  1 14:25:13 gfx-desktop kernel: [  591.126647] [drm:guc_ucode_xfer_dma [i915]] DMA status 0x10, GuC status 0x8002f0ec
Feb  1 14:25:13 gfx-desktop kernel: [  591.126678] [drm:guc_ucode_xfer_dma [i915]] returning 0
Feb  1 14:25:13 gfx-desktop kernel: [  591.126679] [drm] GuC submission enabled (firmware i915/glk_guc_ver10_56.bin [version 10.56])
Feb  1 14:25:13 gfx-desktop kernel: [  591.128334] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:76:eDP-1]
. . .
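The "GPU HANG" line in the sample above carries the details most useful for triage. A minimal Python sketch for pulling them out of a log line follows. The regular expression is modelled only on the line shown here, and the group names gen, engine and ecode are my own labels for the three colon-separated numbers, not something the log states.

```python
import re

# Pattern modelled on the "GPU HANG" line quoted in this comment; other
# kernel versions may word the message differently (assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info


sample = (
    "Feb  1 14:25:13 gfx-desktop kernel: [  591.120229] [drm] GPU HANG: "
    "ecode 9:0:0x85dffffb, in hl2_linux [2556], reason: Hang on rcs0, "
    "action: reset"
)
hang = parse_gpu_hang(sample)
print(hang["process"], hang["pid"], hang["ecode"], hang["reason"])
```

Comparing the ecode and process fields across machines is a quick way to check whether two reports describe the same hang.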

======================================
	     Software
======================================
kernel version              : 4.14.144-14-14-stable-kernel-from-kernel-org
hostname                    : gfx-desktop
architecture                : x86_64
os version                  : Ubuntu 17.10
os codename                 : artful
kernel driver               : i915
bios revision               : 77.50
bios release date           : 12/07/2017
ksc                         : 1.41
hardware acceleration       : disabled
swap partition              : enabled on (/dev/sda2)

======================================
	Graphic drivers
======================================
libdrm                      : 2.4.89
vaapi (intel-driver)        : Intel i965 driver for Intel(R) Gemini Lake - 2.0.0
cairo                       : 1.14.10

======================================
	     Firmware
======================================
dmc fw loaded             : yes
dmc version               : 1.4
guc fw loaded             : SUCCESS
guc version wanted        : 10.56
guc version found         : 10.56
huc fw loaded             : yes
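The "key : value" layout of these report sections is easy to check mechanically. The sketch below (Python, with a hypothetical helper name parse_report) loads the firmware section into a dict and confirms that the GuC version found matches the version wanted. It assumes every field sits on one line and that the first colon separates key from value.

```python
def parse_report(text):
    """Parse 'key : value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon (rulers, blank lines)
            fields[key.strip()] = value.strip()
    return fields


firmware = parse_report("""\
dmc fw loaded             : yes
dmc version               : 1.4
guc fw loaded             : SUCCESS
guc version wanted        : 10.56
guc version found         : 10.56
huc fw loaded             : yes
""")

guc_matches = firmware["guc version wanted"] == firmware["guc version found"]
print("GuC version matches:", guc_matches)
```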

======================================
	Graphic Stack Recipe
======================================
Component: drm
    tag: libdrm-2.4.89
    commit: 27b7d2d998e8ca3b5dffeb00a60c1a09a92c388d

Component: mesa
    tag: mesa-17.3.2
    commit: 0f27052e325c3617e437912d0a3acaf3e3afd786

Component: macros
    tag: util-macros-1.19.1-3-g6694c97
    commit: 6694c973c8c2b5fae5934a49578f69d2817ab49c

Component: xproto
    tag: xproto-7.0.31-6-gab86666
    commit: ab8666661fc68f075b8d6ffabe22c6b577c30ac1

Component: glproto
    tag: glproto-1.4.17-8-g3f6d569
    commit: 3f6d569b583e3df9ca130f00548837e560d185c3

Component: dri2proto
    tag: dri2proto-2.8-4-gb118dfb
    commit: b118dfbf91dcec6d82dfddc3f41031e23ea3c039

Component: xserver
    tag: xorg-server-1.19.6
    commit: ebfb06b11955a6c32500b7086be912ab96b753a7

Component: libXfont
    tag: libXfont2-2.0.3
    commit: cdb2f990348c3bd1407022f7e0e5fcba552d539f

Component: xf86-input-evdev
    tag: xf86-input-evdev-2.10.5-4-g192fdb0
    commit: 192fdb06905f0f190e3a0e258919676934e6633c

Component: xf86-input-libinput
    tag: xf86-input-libinput-0.26.0
    commit: 2be6487de417473aac85ebd800392cdd8604c4a6

Component: xf86-video-fbdev
    tag: xf86-video-fbdev-0.4.4-11-g3cf9923
    commit: 3cf99231199bd5bd9e681e85d9da1f9eb736e3e7

Component: xf86-video-vesa
    tag: xf86-video-vesa-2.3.4-5-gb9f9c95
    commit: b9f9c95ca2383460aa283adeeee6e0a66eed722b

Component: xf86-video-vmware
    tag: xf86-video-vmware-13.0.2-57-gc0a2f40
    commit: c0a2f40d978e77287d0cac95254fb6f26b2449a8

Component: xf86-video-qxl
    tag: xf86-video-qxl-0.1.5-9-gee8f904
    commit: ee8f904ab0d590c741e640e9548c472e6a58b3cc

Component: xf86-video-chips
    tag: xf86-video-chips-1.2.7-5-gc2711ee
    commit: c2711eedaac20af973721111a909a6f575078410

Component: x11proto
    tag: xproto-7.0.31-6-gab86666
    commit: ab8666661fc68f075b8d6ffabe22c6b577c30ac1

Component: libxtrans
    tag: xtrans-1.3.5-12-g2836667
    commit: 28366676effaa512e43bfd2276a317389a992511

Component: libX11
    tag: libX11-1.6.5-21-ge835a9d
    commit: e835a9dcc3362b5e92893be756dd7ae361e64ced

Component: libXext
    tag: libXext-1.3.3-7-ga07b4bb
    commit: a07b4bb8290d0c1bba7bcecd5bb6896fbe1b169c

Component: xrdb
    tag: xrdb-1.1.0-15-gae86081
    commit: ae86081a92522653ff1523c92524ff892f75d496

Component: xf86-video-intel
    tag: 2.99.917-805-g26f5406
    commit: 26f5406841f3924f23f29df61b5ea53d2816b665

Component: xkbcomp
    tag: xkbcomp-1.4.0-1-g3e2a6ad
    commit: 3e2a6ad4edfbf21c3f76f8319f0039b7f589944f

Component: xf86-input-wacom
    tag: xf86-input-wacom-0.34.2-34-g4cc67d1
    commit: 4cc67d161123774f79d5830cd87d7adddc31bf4c

Component: pixman
    tag: pixman-0.33.6-25-g8b95e0e
    commit: 8b95e0e460baa499e54c19d29bf761d34c25badc

Component: libpciaccess
    tag: libpciaccess-0.14
    commit: 13854f603f720c45caf51d785a874d3c7e8c5f58

Component: libinput
    tag: 1.8.1
    commit: cc9a4debd3889a3b3a5139576ea873eebcf7dde7

Component: xkeyboard-config
    tag: xkeyboard-config-2.22-4-gf734a19
    commit: f734a19420ab6d37dee51c6ace7d6b6aeb85f967

Component: xf86-input-mouse
    tag: xf86-input-mouse-1.9.2-4-g3c8f243
    commit: 3c8f243b750a92d5837a449d344ff884dbd02b57

Component: xf86-input-keyboard
    tag: xf86-input-keyboard-1.9.0-3-g940f441
    commit: 940f44149d1037cfc14bbb3628044a2bd002c33e

Component: xf86-input-synaptics
    tag: xf86-input-synaptics-1.9.0-4-g59eb0c3
    commit: 59eb0c372b615fce5039e69b5067adc0efe5b64b

Component: libva
    tag: libva-2.0.0
    commit: 24cbd05c9477ab59cc5665989026b4ad17e06eb6

Component: libva-utils
    tag: 2.0.0
    commit: 505216d0e0d30273f665c888d4c60aa694060729

Component: intel-vaapi-driver
    tag: 2.0.0
    commit: caca038ab7c348d1bbadfd411587735112da9aa4

Component: cairo
    tag: 1.15.8-76-g6b05938
    commit: 95c464d5feaae58b6cc0990434ce2498cc315dc6

Component: intel-gpu-tools
    tag: intel-gpu-tools-1.20-286-g7b685d5
    commit: 7b685d5790c1770eeac43c17d6b207a6df602985

Component: piglit
    tag: piglit-v1
    commit: 938ec48e2575b78defd06d169f704ed8d4f11bce
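To compare this recipe against another machine's, the Component/tag/commit blocks can be read into a list of dicts. This is a rough Python sketch under the assumption that every block starts with a "Component:" line; parse_recipe is a name chosen for illustration, and only the first two entries of the recipe are used as sample input.

```python
def parse_recipe(text):
    """Turn 'Component:' blocks with indented 'tag:'/'commit:' lines into dicts."""
    components = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Component:"):
            current = {"name": line.split(":", 1)[1].strip()}
            components.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return components


recipe = parse_recipe("""\
Component: drm
    tag: libdrm-2.4.89
    commit: 27b7d2d998e8ca3b5dffeb00a60c1a09a92c388d

Component: mesa
    tag: mesa-17.3.2
    commit: 0f27052e325c3617e437912d0a3acaf3e3afd786
""")

for component in recipe:
    print(component["name"], component["tag"], component["commit"][:12])
```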

======================================
	     Hardware
======================================
platform                   : Geminilake
motherboard model          : Geminilake
motherboard id             : GLKRVP1DDR4(05)
form factor                : Hand Held
manufacturer               : IntelCorp.
cpu family                 : Pentium
cpu family id              : 6
cpu information            : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
gpu card                   : Intel Corporation Device 3184 (rev 03) (prog-if 00 [VGA controller])
memory ram                 : 7.63 GB
max memory ram             : 16 GB
cpu thread                 : 4
cpu core                   : 4
cpu model                  : 122
cpu stepping               : 1
socket                     : Other
signature                  : Type 0, Family 6, Model 122, Stepping 1
hard drive                 : 223GiB (240GB)
current cd clock frequency : 79200 kHz
maximum cd clock frequency : 316800 kHz
displays connected         : eDP-1

======================================
	     kernel parameters
======================================
quiet splash drm.debug=0x1e fsck.repair=yes i915.alpha_support=1 i915.enable_guc_loading=2 i915.enable_guc_submission=2 resume=/dev/sda2
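Because the i915 options on this command line matter for reproducing the hang, it can help to list them explicitly. The following Python sketch splits the parameter string above into a dict and picks out the GuC-related entries. The helper name parse_cmdline is hypothetical, and the sketch assumes no parameter value contains spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


cmdline = (
    "quiet splash drm.debug=0x1e fsck.repair=yes i915.alpha_support=1 "
    "i915.enable_guc_loading=2 i915.enable_guc_submission=2 resume=/dev/sda2"
)
params = parse_cmdline(cmdline)

# Collect the options whose names start with i915.enable_guc.
guc_params = {k: v for k, v in params.items() if k.startswith("i915.enable_guc")}
print(guc_params)
```

On a live system the same string could be read from /proc/cmdline instead of being pasted in.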
Comment 3 Hector Velazquez 2018-02-02 21:32:30 UTC
Created attachment 137139
GLK error file (/sys/class/drm/card0/error)
Comment 4 Hector Velazquez 2018-02-02 21:33:10 UTC
Created attachment 137140
GLK kernel log
Comment 5 Hector Velazquez 2018-02-02 21:33:42 UTC
Created attachment 137141
GLK Xorg log
Comment 6 Octavio 2018-02-02 22:06:53 UTC
These Steam games have a GPU hang on SKL QA:

Games:

Half Life 2
Left 4 Dead 2
Team Fortress
Counter strike

Same full configuration as in comment 2.

kernel log

[ 1577.600783] [drm:skl_enable_dc6 [i915]] Enabling DC6
[ 1577.600795] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
[ 1577.842856] [drm] GPU HANG: ecode 9:0:0x85dffffb, in hl2_linux [4036], reason: Hang on rcs0, action: reset
[ 1577.842858] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Comment 7 Armando Antonio 2018-02-06 15:06:28 UTC
The following Steam games have a GPU hang on APL:

Games:

Half Life 2
Left 4 Dead 2
Team Fortress

Same full configuration as in comment 2.

kernel log

[ 1516.783918] MatQueue0[2503]: segfault at fffffffc ip 00000000e87daf88 sp 00000000c9e92a10 error 4 in client.so[e7887000+2055000]
[ 2015.669006] perf: interrupt took too long (7917 > 7905), lowering kernel.perf_event_max_sample_rate to 25250
[ 2363.669288] hl2_linux[2768]: segfault at 0 ip           (null) sp 00000000fff58c8c error 14 in hl2_linux[8048000+1000]
[ 2685.824565] i915 0000:00:02.0: Resetting chip after gpu hang
[ 2685.829284] [drm] RC6 on
[ 2685.833411] [drm] GuC submission enabled (firmware i915/bxt_guc_ver8_7.bin [version 8.7])
[ 2693.840908] i915 0000:00:02.0: Resetting chip after gpu hang
[ 2693.859022] [drm:i915_reset [i915]] *ERROR* GPU recovery failed
[ 2694.392773] rfkill: input handler enabled
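In this log the second reset follows the first within seconds and ends in a failed recovery. A small Python sketch for spotting that pattern in a dmesg-style log is given below. The two message strings it searches for are copied from the lines above; the function name summarize_resets is made up for illustration.

```python
import re

# Matches the "[ seconds.micros] message" prefix used in the log above.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_resets(log_text):
    """Return (reset timestamps, whether a failed GPU recovery was logged)."""
    resets = []
    recovery_failed = False
    for line in log_text.splitlines():
        match = TIMESTAMP_RE.match(line.strip())
        if match is None:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if "Resetting chip after gpu hang" in message:
            resets.append(timestamp)
        if "GPU recovery failed" in message:
            recovery_failed = True
    return resets, recovery_failed


log = """\
[ 2685.824565] i915 0000:00:02.0: Resetting chip after gpu hang
[ 2685.829284] [drm] RC6 on
[ 2693.840908] i915 0000:00:02.0: Resetting chip after gpu hang
[ 2693.859022] [drm:i915_reset [i915]] *ERROR* GPU recovery failed
"""
resets, recovery_failed = summarize_resets(log)
print(len(resets), "resets, recovery failed:", recovery_failed)
```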
Comment 8 Armando Antonio 2018-02-06 15:23:59 UTC
Counter-Strike is failing on APL with a GPU hang too.
Comment 9 Mark Janes 2018-02-06 18:35:40 UTC
Why are you testing with the GuC enabled?  Retest all of this with the guc disabled.

Provide an apitrace for at least one of the workloads.  Test with Linux 4.15 and Mesa 18.0rc3.
Comment 10 Alejandro Lorenzo 2018-02-06 19:20:00 UTC
Are you speaking about my original report or this other hang with Steam games?

Are they the same hang? Should they be in different reports?
Comment 11 Kenneth Graunke 2018-02-06 19:47:05 UTC
(In reply to Alejandro Lorenzo from comment #10)
> Are you speaking about my original report or this other hang with Steam
> games?
> 
> Are they the same hang? Should they be in different reports?

Sorry, Alejandro - looks like people took over your report for some unrelated issue by mistake.  I've split your original report out into a new bug,

https://bugs.freedesktop.org/show_bug.cgi?id=104974

We'll turn this one into the Valve games hang bug, since it contains so much about that now.  (Unfortunately, splitting bugs is hard to do in Bugzilla...)
Comment 12 Kenneth Graunke 2018-02-06 19:53:00 UTC
About the Valve hangs - 17.3.2 has known GPU hangs in this regard.  Please do try Mesa master or 18.0-rc3.
Comment 13 Hector Velazquez 2018-02-08 22:48:35 UTC
These Steam games worked well on GLK QA with the modesetting and sna modes:

Games:

Half Life 2
Left 4 Dead 2
Team Fortress
Counter strike

I tested with the same settings as in comment 2, except for the Mesa version:

(10:14 AM) [gfx@gfx-desktop] [~]$ : glxinfo | grep -i version
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
    Version: 18.1.0
    Max core profile version: 3.3
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL version string: 3.0 Mesa 18.1.0-devel
OpenGL shading language version string: 1.30
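A check like the one above can be scripted. The Python sketch below extracts the Mesa version from the "OpenGL version string" line of glxinfo output and compares it with 17.3.2, the Mesa tag listed in comment 2. The regular expression assumes the "Mesa X.Y.Z" form shown here, and mesa_version is a hypothetical helper name.

```python
import re


def mesa_version(glxinfo_output):
    """Return the Mesa version as an (major, minor, patch) tuple, or None."""
    match = re.search(r"Mesa (\d+)\.(\d+)\.(\d+)", glxinfo_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


sample = "OpenGL version string: 3.0 Mesa 18.1.0-devel"
version = mesa_version(sample)

# Tuples compare element by element, so this is a proper version comparison.
newer_than_comment_2 = version > (17, 3, 2)
print(version, newer_than_comment_2)
```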

======================================
       configuration
======================================
. . .
root@gfx-desktop:/home/gfx# cat /home/gfx/.local/share/xorg/Xorg.0.log | grep -i modesetting
[   130.341] (==) Matched modesetting as autoconfigured driver 2
[   130.343] (II) LoadModule: "modesetting"
[   130.343] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[   130.344] (II) Module modesetting: vendor="X.Org Foundation"
[   130.347] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
root@gfx-desktop:/home/gfx# lspci -vnn | grep -i "kernel driver in use"
        Kernel driver in use: i915
        Kernel driver in use: snd_hda_intel
        Kernel driver in use: mei_me
        Kernel driver in use: ahci
        Kernel driver in use: pcieport
        Kernel driver in use: xhci_hcd
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: intel-lpss
        Kernel driver in use: lpc_ich
        Kernel driver in use: r8169
root@gfx-desktop:/home/gfx# cat /sys/kernel/debug/dri/0/i915_dmc_info
fw loaded: yes
path: i915/glk_dmc_ver1_04.bin
version: 1.4
program base: 0x0a004040
ssp base: 0x00003fc0
htp: 0x00e00088
root@gfx-desktop:/home/gfx# cat /sys/kernel/debug/dri/0/i915_guc_load_status
GuC firmware status:
        path: i915/glk_guc_ver10_56.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 10.56
        version found: 10.56
        header: offset is 0; size = 128
        uCode: offset is 128; size = 145472
        RSA: offset is 145600; size = 256

GuC status 0x800330ec:
        Bootrom status = 0x76
        uKernel status = 0x30
        MIA Core status = 0x3

Scratch registers:
         0:     0xf0000000
         1:     0x0
         2:     0x0
         3:     0x5f5e100
         4:     0x600
         5:     0xd5fd3
         6:     0x0
         7:     0x8
         8:     0x3
         9:     0x74240
        10:     0x0
        11:     0x0
        12:     0x0
        13:     0x0
        14:     0x0
        15:     0x0
root@gfx-desktop:/home/gfx# cat /sys/kernel/debug/dri/0/i915_huc_load_status
HuC firmware status:
        path: i915/glk_huc_ver02_00_1748.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 2.0
        version found: 2.0
        header: offset is 0; size = 128
        uCode: offset is 128; size = 218304
        RSA: offset is 218432; size = 256

HuC status 0x00006080:
root@gfx-desktop:/home/gfx# cat /home/gfx/.local/share/xorg/Xorg.0.log | grep -i sna
root@gfx-desktop:/home/gfx#. . .
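The three sub-fields printed under "GuC status 0x800330ec" can be recomputed from the raw register value. The Python sketch below does this with bit positions that I inferred from the values in the output above (bootrom in bits 1-7, uKernel in bits 8-15, MIA core in bits 16-18). Treat that layout as an assumption that happens to reproduce this output, not as a statement of the register definition.

```python
def decode_guc_status(status):
    """Split a GuC status word into sub-fields (bit layout is an assumption)."""
    return {
        "bootrom": (status >> 1) & 0x7F,    # bits 1-7
        "ukernel": (status >> 8) & 0xFF,    # bits 8-15
        "mia_core": (status >> 16) & 0x07,  # bits 16-18
    }


fields = decode_guc_status(0x800330EC)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

Applied to the value 0x8002f0ec from the kernel log in comment 2, the same function gives a different uKernel and MIA core field, which fits that line having been logged while the firmware load was still in progress.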
Comment 14 Mark Janes 2018-02-09 00:19:07 UTC
Hector: do not test with the guc installed.
Comment 15 Hector Velazquez 2018-02-09 14:59:56 UTC
Hi Mark, I tested with the same settings as in comment 13 but without GuC and HuC; the games work as expected.
Comment 16 Elizabeth 2018-02-09 15:25:25 UTC
According to comment 13 and comment 15, the issue is resolved with the latest Mesa tested, in this case 18.1.0-devel, and the games work correctly with and without GuC. So I believe this case can be closed.

