Summary: | [i915] GPU HANG: ecode 9:0:0xfffffffe (Team Fortress 2) | ||
---|---|---|---|
Product: | Mesa | Reporter: | snrub |
Component: | Drivers/DRI/i915 | Assignee: | Ian Romanick <idr> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | SKL | i915 features: | GPU hang |
Attachments: |
Output of /sys/class/drm/card0/error
Second crash dump |
If you add i915.enable_rc6=0 to your kernel command line, is the GPU hang still happening?

(In reply to yann from comment #1)
> If you add i915.enable_rc6=0 to your kernel command line, is the GPU hang
> still happening?

If so, this should be fixed with:

commit d528a6a0f3fd346bd7cc2de611a4149b6ebaab41
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Apr 5 15:56:16 2016 +0300

    drm/i915/skl: Fix rc6 based gpu/system hang

Yes, the hang still happens with that added to my command line.

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.7.5-040705-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash i915.enable_rc6=0 vt.handoff=7

And here is the dmesg / debug output (I am uploading the new crash dump as well):

$ dmesg
....
[ 180.764441] [drm] GPU HANG: ecode 9:0:0x85dffffb, in MatQueue0 [4771], reason: Engine(s) hung, action: reset
[ 180.764443] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 180.764444] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 180.764445] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 180.764445] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 180.764446] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 180.765925] drm/i915: Resetting chip after gpu hang
[ 181.793871] [drm] RC6 off
[ 190.793791] [drm] stuck on render ring
[ 190.799477] [drm] GPU HANG: ecode 9:0:0xfefffffe, in MatQueue0 [4771], reason: Engine(s) hung, action: reset
[ 190.801365] drm/i915: Resetting chip after gpu hang
[ 192.793828] [drm] RC6 off

Created attachment 126873 [details]
Second crash dump
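The first question in this thread is whether i915.enable_rc6=0 actually reached the kernel. A minimal Python sketch for checking a kernel command line string (such as the contents of /proc/cmdline) for that parameter follows; the helper names parse_cmdline and rc6_disabled are hypothetical, and quoted values containing spaces are deliberately not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: value}.

    Bare flags such as "ro" or "quiet" map to "". Quoted values that contain
    spaces are not handled; this is a simplifying assumption of the sketch.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")  # split on the first "=" only
        params[key] = value                   # a repeated key keeps the last value
    return params


def rc6_disabled(cmdline: str) -> bool:
    """True if the command line sets i915.enable_rc6=0."""
    return parse_cmdline(cmdline).get("i915.enable_rc6") == "0"


# The command line from this report; on a live system, read /proc/cmdline instead.
reported = (
    "BOOT_IMAGE=/vmlinuz-4.7.5-040705-generic root=/dev/mapper/ubuntu--vg-root "
    "ro quiet splash i915.enable_rc6=0 vt.handoff=7"
)
print(rc6_disabled(reported))  # True
```

In this report the parameter was present (dmesg also prints "RC6 off") and the hang still occurred, which is why the thread moves on from the RC6 fix.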
(In reply to snrub from comment #5)
> Created attachment 126873 [details]
> Second crash dump

Thanks for your quick feedback. So it looks like we have a different issue from the original one.

For the first one, either disable RC6 or make sure you update to a kernel that includes the commit mentioned earlier.

You may also try updating your Mesa version if you have not already, and collect and attach logs captured with apitrace: http://apitrace.github.io/.

Regarding the last one, I am reassigning this to Mesa (please let me know if I am mistaken about this GPU hang).

Kernel: 4.7.5-040705-generic
Platform: Skylake NUC Skull Canyon, Model nuc6i7kyk (pci id: 0x193b)
Mesa: [Please confirm your mesa version]

From this error dump, the hang happens in a render ring batch with the active head at 0xd4bb4594 and 0x7a000004 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0xd4bb4594):
0xd4bb4548: 0x7b000005: 3DPRIMITIVE: fail sequential
0xd4bb454c: 0x00000104: vertex count
0xd4bb4550: 0x0000000c: start vertex
0xd4bb4554: 0x00001d3a: instance count
0xd4bb4558: 0x00000001: start instance
0xd4bb455c: 0x00000000: index bias
0xd4bb4560: 0x00000000: MI_NOOP
Bad count in PIPE_CONTROL
0xd4bb4564: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xd4bb4568: 0x0000a000: destination address
0xd4bb456c: 0xddc6a008: immediate dword low
0xd4bb4570: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xd4bb457c: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xd4bb4580: 0x00101001: destination address
0xd4bb4584: 0x00000000: immediate dword low
0xd4bb4588: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xd4bb4594: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xd4bb4598: 0x00000408: destination address
0xd4bb459c: 0x00000000: immediate dword low
0xd4bb45a0: 0x00000000: immediate dword high
0xd4bb45ac: 0x78210000: 3D UNKNOWN: 3d_965 opcode = 0x7821
0xd4bb45b0: 0x00006680: MI_NOOP
0xd4bb45b4: 0x78240000: 3D UNKNOWN: 3d_965 opcode = 0x7824

Thanks for the pointer, yann. I updated Mesa from 11.2.2 to 12.1.0-devel (using ppa:oibaf/graphics-drivers). Now it works fine! May I ask how Mesa can cause a GPU hang?
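The batch extract in yann's analysis is easier to follow once the command headers are decoded by hand. The Python sketch below splits a 3D-pipeline (GFXPIPE) command header into the usual Intel fields: bits 31:29 command type, 28:27 subtype, 26:24 opcode, 23:16 sub-opcode, and 7:0 DWord length, where the length excludes the first two dwords. The names decode_header, name_of, and the two-entry KNOWN table are illustrative only, not part of any Intel tool.

```python
from typing import NamedTuple


class GfxPipeHeader(NamedTuple):
    cmd_type: int      # bits 31:29; 3 = GFXPIPE
    subtype: int       # bits 28:27; 3 = 3D
    opcode: int        # bits 26:24
    subopcode: int     # bits 23:16
    total_dwords: int  # DWord Length (bits 7:0) + 2


# Only the two commands this sketch needs to name; everything else is "unknown".
KNOWN = {
    (3, 3, 2, 0x00): "PIPE_CONTROL",
    (3, 3, 3, 0x00): "3DPRIMITIVE",
}


def decode_header(dw: int) -> GfxPipeHeader:
    """Split a 32-bit GFXPIPE command header into its fields."""
    return GfxPipeHeader(
        cmd_type=(dw >> 29) & 0x7,
        subtype=(dw >> 27) & 0x3,
        opcode=(dw >> 24) & 0x7,
        subopcode=(dw >> 16) & 0xFF,
        total_dwords=(dw & 0xFF) + 2,  # length field excludes dwords 0 and 1
    )


def name_of(dw: int) -> str:
    h = decode_header(dw)
    return KNOWN.get((h.cmd_type, h.subtype, h.opcode, h.subopcode), "unknown")


# Headers taken from the batch extract.
for dw in (0x7B000005, 0x7A000004, 0x78210000):
    h = decode_header(dw)
    print(f"0x{dw:08x}: {name_of(dw)}, {h.total_dwords} dwords")
# 0x7b000005: 3DPRIMITIVE, 7 dwords
# 0x7a000004: PIPE_CONTROL, 6 dwords
# 0x78210000: unknown, 2 dwords
```

By this arithmetic each PIPE_CONTROL here is 6 dwords, which matches the 0x18-byte spacing of the three PIPE_CONTROLs in the extract (0xd4bb4564, 0xd4bb457c, 0xd4bb4594). Likewise the 7-dword 3DPRIMITIVE would end at 0xd4bb4560, so the zero dword the decoder printed as MI_NOOP there appears to be the primitive's last dword. This suggests, without proving it, that the "Bad count" warnings reflect the decoder's length expectations rather than malformed commands.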
Created attachment 126840 [details]
Output of /sys/class/drm/card0/error

To reproduce:
1. Start up Steam
2. Play Team Fortress 2
3. Start a practice run with some bots
4. The game stutters and eventually crashes (within the first 2 seconds).

$ uname -a
Linux desktop 4.7.5-040705-generic #201609240533 SMP Sat Sep 24 09:35:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

Machine: NUC Skull Canyon, Model nuc6i7kyk
Connector: DisplayPort to DVI adapter.

$ dmesg
....
[ 195.680575] [drm] stuck on render ring
[ 195.688034] [drm] GPU HANG: ecode 9:0:0xfffffffe, in MatQueue0 [4918], reason: Engine(s) hung, action: reset
[ 195.688036] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 195.688037] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 195.688038] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 195.688039] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 195.688040] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 195.689701] drm/i915: Resetting chip after gpu hang
[ 197.680299] [drm] RC6 on
[ 205.632284] [drm] stuck on render ring
[ 205.638688] [drm] GPU HANG: ecode 9:0:0x85dffffb, in MatQueue0 [4918], reason: Engine(s) hung, action: reset
[ 205.640245] drm/i915: Resetting chip after gpu hang
[ 206.652422] [drm] RC6 on

The output of /sys/class/drm/card0/error and the full dmesg output are in the attachment, since I can't upload more than one file.
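Every hang in this report is announced by a "GPU HANG: ecode A:B:0xC, in PROCESS [PID]" line. A minimal Python sketch for pulling those fields out of a dmesg line follows. The regex and the HangReport/parse_hang names are our own, and reading the first two fields as GPU generation and engine index is an assumption based on the i915 driver of this era (it fits gen 9 for Skylake and engine 0 next to "stuck on render ring"); the third field is treated simply as an opaque 32-bit hex code.

```python
import re
from typing import NamedTuple, Optional


class HangReport(NamedTuple):
    gen: int      # assumed: GPU generation (9 on this Skylake NUC)
    engine: int   # assumed: engine index (0 alongside "stuck on render ring")
    code: int     # third field, an opaque 32-bit value printed in hex
    process: str  # process named in the message, e.g. MatQueue0
    pid: int


# Hypothetical pattern for lines such as:
# [drm] GPU HANG: ecode 9:0:0xfffffffe, in MatQueue0 [4918], reason: ...
HANG_RE = re.compile(
    r"GPU HANG: ecode (\d+):(\d+):0x([0-9a-fA-F]{8}), in (\S+) \[(\d+)\]"
)


def parse_hang(line: str) -> Optional[HangReport]:
    """Return the parsed fields, or None if the line is not a hang message."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return HangReport(
        gen=int(m.group(1)),
        engine=int(m.group(2)),
        code=int(m.group(3), 16),
        process=m.group(4),
        pid=int(m.group(5)),
    )


# The three distinct ecodes that appear in this report.
for line in (
    "[ 195.688034] [drm] GPU HANG: ecode 9:0:0xfffffffe, in MatQueue0 [4918], reason: Engine(s) hung, action: reset",
    "[ 205.638688] [drm] GPU HANG: ecode 9:0:0x85dffffb, in MatQueue0 [4918], reason: Engine(s) hung, action: reset",
    "[ 190.799477] [drm] GPU HANG: ecode 9:0:0xfefffffe, in MatQueue0 [4771], reason: Engine(s) hung, action: reset",
):
    report = parse_hang(line)
    print(report.gen, report.engine, hex(report.code), report.process, report.pid)
```

Grouping hangs by these fields is a quick way to see that the same process (MatQueue0) on the same engine hangs repeatedly, while the third field varies from hang to hang.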