Created attachment 126840 [details]
Output of /sys/class/drm/card0/error

To reproduce:
1. Start up Steam
2. Play Team Fortress 2
3. Start a practice run with some bots
4. Game stutters and eventually crashes (within the first 2 seconds).

$ uname -a
Linux desktop 4.7.5-040705-generic #201609240533 SMP Sat Sep 24 09:35:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

Machine: NUC Skull Canyon, Model nuc6i7kyk
Connector: DisplayPort to DVI adapter.

$ dmesg
....
[ 195.680575] [drm] stuck on render ring
[ 195.688034] [drm] GPU HANG: ecode 9:0:0xfffffffe, in MatQueue0 [4918], reason: Engine(s) hung, action: reset
[ 195.688036] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 195.688037] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 195.688038] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 195.688039] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 195.688040] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 195.689701] drm/i915: Resetting chip after gpu hang
[ 197.680299] [drm] RC6 on
[ 205.632284] [drm] stuck on render ring
[ 205.638688] [drm] GPU HANG: ecode 9:0:0x85dffffb, in MatQueue0 [4918], reason: Engine(s) hung, action: reset
[ 205.640245] drm/i915: Resetting chip after gpu hang
[ 206.652422] [drm] RC6 on

The output of /sys/class/drm/card0/error and the full dmesg output are in the attachment since I can't upload more than one file.
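For anyone triaging similar reports, the GPU HANG lines in the dmesg output above can be pulled out mechanically. The following is only a minimal sketch: the regular expression is written against the exact line format shown in this report, and the names `HANG_RE`, `parse_gpu_hangs` and `SAMPLE` are hypothetical helpers, not part of any i915 tooling.

```python
import re

# Pattern written against the "GPU HANG" line format seen in this report.
# Other kernel versions may word the line differently (assumption).
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[0-9a-fx:]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in dmesg_text."""
    hangs = []
    for match in HANG_RE.finditer(dmesg_text):
        info = match.groupdict()
        info["time"] = float(info["time"])  # seconds since boot
        info["pid"] = int(info["pid"])
        hangs.append(info)
    return hangs


# Lines copied from the dmesg output in this report.
SAMPLE = """\
[ 195.680575] [drm] stuck on render ring
[ 195.688034] [drm] GPU HANG: ecode 9:0:0xfffffffe, in MatQueue0 [4918], reason: Engine(s) hung, action: reset
[ 205.638688] [drm] GPU HANG: ecode 9:0:0x85dffffb, in MatQueue0 [4918], reason: Engine(s) hung, action: reset
"""

if __name__ == "__main__":
    for hang in parse_gpu_hangs(SAMPLE):
        print(hang["time"], hang["ecode"], hang["process"], hang["pid"])
```

Comparing the `ecode` values across hangs makes it easier to see whether repeated hangs in one session report the same signature.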
if you add i915.enable_rc6=0 to your command line, is gpu hang still happening?
(In reply to yann from comment #1)
> if you add i915.enable_rc6=0 to your command line, is gpu hang still
> happening?

If so, this should be fixed with:

commit d528a6a0f3fd346bd7cc2de611a4149b6ebaab41
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 5 15:56:16 2016 +0300

    drm/i915/skl: Fix rc6 based gpu/system hang
Yes, adding that to my command line still causes the hang to happen.

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.7.5-040705-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash i915.enable_rc6=0 vt.handoff=7
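A quick way to confirm that a parameter really reached the kernel is to split the command line into name/value pairs. This is a rough sketch only: `kernel_params` and `CMDLINE` are hypothetical names, and the naive whitespace split ignores quoted values, which do not occur in the command line shown here.

```python
def kernel_params(cmdline):
    """Split a /proc/cmdline string into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Command line copied from this comment.
CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-4.7.5-040705-generic "
    "root=/dev/mapper/ubuntu--vg-root ro quiet splash "
    "i915.enable_rc6=0 vt.handoff=7"
)

if __name__ == "__main__":
    # On a live system, read the text from /proc/cmdline instead.
    print(kernel_params(CMDLINE).get("i915.enable_rc6"))
```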
And the dmesg / debug output (and uploading the new crash dump):

$ dmesg
....
[ 180.764441] [drm] GPU HANG: ecode 9:0:0x85dffffb, in MatQueue0 [4771], reason: Engine(s) hung, action: reset
[ 180.764443] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 180.764444] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 180.764445] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 180.764445] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 180.764446] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 180.765925] drm/i915: Resetting chip after gpu hang
[ 181.793871] [drm] RC6 off
[ 190.793791] [drm] stuck on render ring
[ 190.799477] [drm] GPU HANG: ecode 9:0:0xfefffffe, in MatQueue0 [4771], reason: Engine(s) hung, action: reset
[ 190.801365] drm/i915: Resetting chip after gpu hang
[ 192.793828] [drm] RC6 off
Created attachment 126873 [details] Second crash dump
(In reply to snrub from comment #5)
> Created attachment 126873 [details]
> Second crash dump

Thanks for your quick feedback. So it looks like we have a different issue from the original one.

For the first one, either disable rc6 or make sure you update to a kernel that has the commit.

You may also try updating your Mesa version if you have not already done so, then collect and attach logs captured with apitrace: http://apitrace.github.io/.

Regarding the last one, reassigning to Mesa (please let me know if I am mistaken about this GPU hang).

Kernel: 4.7.5-040705-generic
Platform: Skylake NUC Skull Canyon, Model nuc6i7kyk (pci id: 0x193b)
Mesa: [Please confirm your mesa version]

From this error dump, the hang is happening in the render ring batch with active head at 0xd4bb4594, with 0x7a000004 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0xd4bb4594):

0xd4bb4548: 0x7b000005: 3DPRIMITIVE: fail sequential
0xd4bb454c: 0x00000104: vertex count
0xd4bb4550: 0x0000000c: start vertex
0xd4bb4554: 0x00001d3a: instance count
0xd4bb4558: 0x00000001: start instance
0xd4bb455c: 0x00000000: index bias
0xd4bb4560: 0x00000000: MI_NOOP
Bad count in PIPE_CONTROL
0xd4bb4564: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xd4bb4568: 0x0000a000: destination address
0xd4bb456c: 0xddc6a008: immediate dword low
0xd4bb4570: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xd4bb457c: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xd4bb4580: 0x00101001: destination address
0xd4bb4584: 0x00000000: immediate dword low
0xd4bb4588: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xd4bb4594: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xd4bb4598: 0x00000408: destination address
0xd4bb459c: 0x00000000: immediate dword low
0xd4bb45a0: 0x00000000: immediate dword high
0xd4bb45ac: 0x78210000: 3D UNKNOWN: 3d_965 opcode = 0x7821
0xd4bb45b0: 0x00006680: MI_NOOP
0xd4bb45b4: 0x78240000: 3D UNKNOWN: 3d_965 opcode = 0x7824
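To read the extract above, it can help to decode the command header dwords by hand. The sketch below assumes the usual Intel layout for 3D pipeline command headers (type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 where the total size is the field plus 2 dwords). That layout is an assumption taken from Intel's public programming manuals rather than from this report, and `decode_header` and `next_command_address` are hypothetical helper names.

```python
def decode_header(dword):
    """Split a 3D pipeline command header dword into its bit fields.

    Assumed layout: bits 31:29 type, 28:27 subtype, 26:24 opcode,
    23:16 sub-opcode, 7:0 length field (total dwords = field + 2).
    """
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "subopcode": (dword >> 16) & 0xFF,
        "total_dwords": (dword & 0xFF) + 2,
    }


def next_command_address(address, dword):
    """Address of the command after the one whose header sits at `address`."""
    return address + 4 * decode_header(dword)["total_dwords"]


if __name__ == "__main__":
    # Header values and addresses copied from the batch extract.
    for address, dword in [(0xD4BB4548, 0x7B000005), (0xD4BB4564, 0x7A000004)]:
        fields = decode_header(dword)
        print(hex(address), fields, hex(next_command_address(address, dword)))
```

Under that assumption, each 0x7a000004 header describes a 6-dword PIPE_CONTROL, which matches the 0x18-byte spacing between the PIPE_CONTROL addresses in the extract. The "Bad count in PIPE_CONTROL" lines may therefore just reflect the decoding tool expecting a shorter command, though that is a guess.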
Thanks for the pointer, yann. I updated Mesa from 11.2.2 to 12.1.0-devel (using ppa:oibaf/graphics-drivers). Now it works fine!
May I know how Mesa causes a GPU hang?