Summary: | [SKL] GPU HANG: ecode 9:0:0x85dfbfff, in firefox [881], reason: Ring hung, action: reset | ||
---|---|---|---|
Product: | Mesa | Reporter: | Yuval Adam <yuv.adm> |
Component: | Drivers/DRI/i965 | Assignee: | Ian Romanick <idr> |
Status: | RESOLVED INVALID | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | ben, bugs.freedesktop.org, intel-gfx-bugs, jamundso, john.stultz, nell |
Version: | 11.1 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | SKL | i915 features: | GPU hang |
Attachments: | bzip'd /sys/class/drm/card0/error |
Currently I'm using this workaround, which disables most of the hardware acceleration:

$ cat /etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "DRI" "false"
EndSection

Booting with the kernel parameter i915.enable_rc6=0 seems to work around this bug.

Some relevant details from my current Xorg.0.log (with RC6 DISABLED! - I'm currently booted with the workaround mentioned earlier):

$ cat ~/.local/share/xorg/Xorg.0.log
X.Org X Server 1.18.0
Release Date: 2015-11-09
[ 19.987] X Protocol Version 11, Revision 0
[ 19.987] Build Operating System: Linux 4.2.5-1-ARCH x86_64
[ 19.987] Current Operating System: Linux hostname 4.4.1-2-ARCH #1 SMP PREEMPT Wed Feb 3 13:12:33 UTC 2016 x86_64
[ 19.987] Kernel command line: initrd=xxx root=xxx i915.enable_rc6=0
[ 19.988] Build Date: 08 January 2016 05:56:16PM
[ 19.988]
[ 19.988] Current version of pixman: 0.34.0
[ 19.988] Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.
[ 19.988] Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 19.988] (==) Log file: "~/.local/share/xorg/Xorg.0.log", Time: Mon Feb 8 23:11:38 2016
[ 19.989] (==) Using config directory: "/etc/X11/xorg.conf.d"
[ 19.989] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 19.990] (==) No Layout section. Using the first Screen section.
[ 19.990] (==) No screen section available. Using defaults.
[ 19.991] (**) |-->Screen "Default Screen Section" (0)
[ 19.991] (**) | |-->Monitor "<default monitor>"
[ 19.991] (==) No device specified for screen "Default Screen Section". Using the first device section listed.
[ 19.991] (**) | |-->Device "Intel Graphics"
[ 19.991] (==) No monitor specified for screen "Default Screen Section". Using a default monitor configuration.
[ 19.991] (==) Automatically adding devices
[ 19.991] (==) Automatically enabling devices
[ 19.991] (==) Automatically adding GPU devices
[ 19.991] (==) Max clients allowed: 256, resource mask: 0x1fffff
[ 19.993] (==) FontPath set to: /usr/share/fonts/misc/, /usr/share/fonts/TTF/, /usr/share/fonts/OTF/, /usr/share/fonts/Type1/, /usr/share/fonts/100dpi/, /usr/share/fonts/75dpi/
[ 19.993] (==) ModulePath set to "/usr/lib/xorg/modules"
[ 19.993] (II) The server relies on udev to provide the list of input devices. If no devices become available, reconfigure udev or disable AutoAddDevices.
[ 19.993] (II) Loader magic: 0x819d40
[ 19.993] (II) Module ABI versions:
[ 19.993] X.Org ANSI C Emulation: 0.4
[ 19.993] X.Org Video Driver: 20.0
[ 19.993] X.Org XInput driver : 22.1
[ 19.993] X.Org Server Extension : 9.0
[ 19.994] (++) using VT number 1
[ 19.994] (--) controlling tty is VT number 1, auto-enabling KeepTty
[ 19.995] (II) systemd-logind: took control of session /org/freedesktop/login1/session/c1
[ 19.995] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 19.996] (II) systemd-logind: got fd for /dev/dri/card0 226:0 fd 8 paused 0
[ 20.218] (--) PCI:*(0:0:2:0) 8086:1926:8086:2063 rev 10, Mem @ 0xde000000/16777216, 0xc0000000/268435456, I/O @ 0x0000f000/64
[ 20.218] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
[ 20.218] (II) LoadModule: "glx"
[ 20.220] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[ 20.237] (II) Module glx: vendor="X.Org Foundation"
[ 20.238] compiled for 1.18.0, module version = 1.0.0
[ 20.238] ABI class: X.Org Server Extension, version 9.0
[ 20.238] (==) AIGLX enabled
[ 20.238] (II) LoadModule: "intel"
[ 20.238] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[ 20.242] (II) Module intel: vendor="X.Org Foundation"
[ 20.242] compiled for 1.18.0, module version = 2.99.917
[ 20.242] Module class: X.Org Video Driver
[ 20.242] ABI class: X.Org Video Driver, version 20.0
[ 20.242] (II) intel: Driver for Intel(R) Integrated Graphics Chipsets: i810, i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G, 915G, E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM, Pineview G, 965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33, GM45, 4 Series, G45/G43, Q45/Q43, G41, B43
[ 20.243] (II) intel: Driver for Intel(R) HD Graphics: 2000-6000
[ 20.243] (II) intel: Driver for Intel(R) Iris(TM) Graphics: 5100, 6100
[ 20.243] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics: 5200, 6200, P6300
[ 20.243] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
[ 20.244] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20151010
[ 20.244] (II) intel(0): SNA compiled from 2.99.917-519-g8229390
[ 20.246] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[ 20.247] (--) intel(0): gen9 engineering sample
[ 20.247] (--) intel(0): CPU: x86-64, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2; using a maximum of 2 threads
[ 20.247] (II) intel(0): Creating default Display subsection in Screen section "Default Screen Section" for depth/fbbpp 24/32
[ 20.247] (==) intel(0): Depth 24, (--) framebuffer bpp 32
[ 20.247] (==) intel(0): RGB weight 888
[ 20.248] (==) intel(0): Default visual is TrueColor
[ 20.248] (**) intel(0): Option "TearFree" "true"
[ 20.249] (II) intel(0): Output HDMI1 has no monitor section
[ 20.249] (II) intel(0): Enabled output HDMI1
[ 20.249] (II) intel(0): Output DP1 has no monitor section
[ 20.249] (II) intel(0): Enabled output DP1
[ 20.249] (II) intel(0): Output HDMI2 has no monitor section
[ 20.249] (II) intel(0): Enabled output HDMI2
[ 20.249] (--) intel(0): Using a maximum size of 256x256 for hardware cursors
[ 20.249] (II) intel(0): Output VIRTUAL1 has no monitor section
[ 20.249] (II) intel(0): Enabled output VIRTUAL1
[ 20.249] (--) intel(0): Output HDMI1 using initial mode 2560x1440 on pipe 0
[ 20.249] (**) intel(0): TearFree enabled
[ 20.249] (==) intel(0): DPI set to (96, 96)
[ 20.250] (II) Loading sub module "dri2"
[ 20.250] (II) LoadModule: "dri2"
[ 20.250] (II) Module "dri2" already built-in
[ 20.250] (II) Loading sub module "present"
[ 20.250] (II) LoadModule: "present"
[ 20.250] (II) Module "present" already built-in
[ 20.250] (==) Depth 24 pixmap format is 32 bpp
[ 20.253] (II) intel(0): SNA initialized with generic backend
[ 20.253] (==) intel(0): Backing store enabled
[ 20.253] (==) intel(0): Silken mouse enabled
[ 20.253] (II) intel(0): HW Cursor enabled
[ 20.253] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[ 20.255] (==) intel(0): DPMS enabled
[ 20.255] (==) intel(0): Display hotplug detection enabled
[ 20.255] (II) intel(0): Textured video not supported on this hardware or backend
[ 20.256] (II) intel(0): [DRI2] Setup complete
[ 20.256] (II) intel(0): [DRI2] DRI driver: i965
[ 20.256] (II) intel(0): [DRI2] VDPAU driver: va_gl
[ 20.256] (II) intel(0): direct rendering: DRI2 enabled
[ 20.256] (II) intel(0): hardware support for Present enabled
[ 20.256] (--) RandR disabled
[ 20.330] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[ 20.330] (II) AIGLX: enabled GLX_ARB_create_context
[ 20.330] (II) AIGLX: enabled GLX_ARB_create_context_profile
[ 20.330] (II) AIGLX: enabled GLX_EXT_create_context_es2_profile
[ 20.330] (II) AIGLX: enabled GLX_INTEL_swap_event
[ 20.330] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[ 20.330] (II) AIGLX: enabled GLX_EXT_framebuffer_sRGB
[ 20.330] (II) AIGLX: enabled GLX_ARB_fbconfig_float
[ 20.330] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[ 20.330] (II) AIGLX: enabled GLX_ARB_create_context_robustness
[ 20.331] (II) AIGLX: Loaded and initialized i965
[ 20.331] (II) GLX: Initialized DRI2 GL provider for screen 0
[ 20.338] (II) intel(0): switch to mode 2560x1440@60.0 on HDMI1 using pipe 0, position (0, 0), rotation normal, reflection none
[ 20.338] (II) intel(0): Setting screen physical size to 677 x 381
[ 20.378] (II) config/udev: Adding input device
Power Button (/dev/input/event2)
[ 20.379] (**) Power Button: Applying InputClass "evdev keyboard catchall"
[ 20.379] (**) Power Button: Applying InputClass "libinput keyboard catcha

Also attaching my current mesa version:

$ pacman -Q mesa
mesa 11.1.2-1

INSTDONE says VS is hung, but the EUs are all done. Would be nice if we had that patch I wrote to dump all the INSTDONE bits for multislice parts, as this is a GT3.... Imre?

The batch contents leading up to the hang look okay to me. It's not even doing anything complicated, just a single rectangle (rectlist).

Same issue for me with exactly the same device on Linux kernel 4.4.1. The workaround works well: setting i915.enable_rc6=0 as a kernel parameter does the job. The xorg.conf change is not needed.

I'm seeing the same thing when trying to run Chrome (with "HW acceleration" enabled) and OpenArena:

[265964.933851] [drm] stuck on render ring
[265964.934358] [drm] GPU HANG: ecode 9:0:0x85dfbfff, in ioquake3 [13145], reason: Ring hung, action: reset
[265964.936862] drm/i915: Resetting chip after gpu hang
[265966.933810] [drm] RC6 on
[265978.932434] [drm] stuck on render ring
[265978.933048] [drm] GPU HANG: ecode 9:0:0x85dfbfff, in ioquake3 [13145], reason: Ring hung, action: reset
[265978.935527] drm/i915: Resetting chip after gpu hang
[265980.920236] [drm] RC6 on
[265986.931547] [drm] stuck on render ring
[265986.932296] [drm] GPU HANG: ecode 9:0:0x85dfffff, in ioquake3 [13145], reason: Ring hung, action: reset
[265986.934802] drm/i915: Resetting chip after gpu hang
[265988.919528] [drm] RC6 on

I have tried with RC6 off, but ISTR that also failing (though it could have been a slightly different failure). I will report back with the next update.

Also happens on a stock Ubuntu 16.04 running E17:

kernel: [drm] GPU HANG: ecode 6:0:0xb6c81bfc, in enlightenment [2513], reason: Ring hung, action: reset

Steps to reproduce:
1) Suspend machine
2) Wake up again
3) X11 crashes and takes E17 with it

For further details cf. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1585765

Workarounds for SKL were pushed to the kernel, so please re-test with the latest kernel to see whether they help. In parallel, assigning to the Mesa product (please let me know if I am mistaken about this GPU hang).

From this error dump, the hang is happening in the render ring batch with the active head at 0xfff5d9cc, with 0x7a000004 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0xfff5d9cc):

Bad length 7 in (null), expected 6-6
0xfff5d980: 0x7b000005: 3DPRIMITIVE: fail sequential
0xfff5d984: 0x0000000f: vertex count
0xfff5d988: 0x00000003: start vertex
0xfff5d98c: 0x00000000: instance count
0xfff5d990: 0x00000001: start instance
0xfff5d994: 0x00000000: index bias
0xfff5d998: 0x00000000: MI_NOOP
Bad count in PIPE_CONTROL
0xfff5d99c: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xfff5d9a0: 0x00000000: destination address
0xfff5d9a4: 0x00000000: immediate dword low
0xfff5d9a8: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xfff5d9b4: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xfff5d9b8: 0x00101c11: destination address
0xfff5d9bc: 0x00000000: immediate dword low
0xfff5d9c0: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xfff5d9cc: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xfff5d9d0: 0x00000000: destination address
0xfff5d9d4: 0x00000000: immediate dword low
0xfff5d9d8: 0x00000000: immediate dword high

Please test a new version of Mesa (12 or 13) and mark as REOPENED if you can reproduce and RESOLVED/* if you cannot reproduce.

Dear Reporter, This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested. Thank you. |
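The "Bad length" and "Bad count" warnings in the batch extract can be cross-checked against the addresses printed in the same dump. The following is a minimal Python sketch, assuming (the thread does not state this) that the low 8 bits of each command header hold the command's total dword count minus 2. The function names are hypothetical and chosen only for illustration. Under that assumption, each header's length lands exactly on the address where the dump shows the next command, so the warnings may reflect what the decoding tool expected rather than a malformed batch.

```python
# Sketch only: assumes bits 7:0 of a command header encode (total dwords - 2).
# That layout is an assumption for illustration, not something the report states.

DWORD_BYTES = 4


def command_dwords(header: int) -> int:
    """Total length of a command in dwords, under the assumed encoding."""
    return (header & 0xFF) + 2


def next_command_address(address: int, header: int) -> int:
    """Address at which the following command would start."""
    return address + command_dwords(header) * DWORD_BYTES


# (address, header) pairs copied from the batch extract in the report.
commands = [
    (0xFFF5D980, 0x7B000005),  # 3DPRIMITIVE
    (0xFFF5D99C, 0x7A000004),  # PIPE_CONTROL
    (0xFFF5D9B4, 0x7A000004),  # PIPE_CONTROL
    (0xFFF5D9CC, 0x7A000004),  # PIPE_CONTROL (active head)
]

# Compare each predicted start address with the next address in the dump.
for (address, header), (following, _) in zip(commands, commands[1:]):
    predicted = next_command_address(address, header)
    status = "matches" if predicted == following else "differs from"
    print(f"{address:#010x}: {command_dwords(header)} dwords, "
          f"next at {predicted:#010x} ({status} dump)")
```

With the values above, all three predicted addresses match the ones in the dump: the 3DPRIMITIVE spans 7 dwords and each PIPE_CONTROL spans 6.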
Created attachment 121518: bzip'd /sys/class/drm/card0/error

On a Skylake system (i5-6260U, Intel NUC6i5SYH) I'm getting multiple GPU crashes, up to the point of a complete system freeze. This reproduces easily when opening a browser with WebGL-intensive processes, and happens within 10-20 seconds. Running x86_64 Arch Linux, kernel 4.4.1. Tested with 4.5-rc2 and the same problem occurs there as well. Error log attached.
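The kernel messages quoted in this report all follow one layout ("GPU HANG: ecode A:B:0xHASH, in PROCESS [PID], reason: ..., action: ..."). When comparing hangs across machines, it can help to split those fields apart. Below is a minimal Python sketch. The regular expression is inferred only from the lines quoted in this report, and the field names `first`, `second`, and `ecode` are hypothetical labels, not official i915 terminology.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the hang lines quoted in this report (an assumption,
# not an official description of the message format).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<first>\d+):(?P<second>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


class Hang(NamedTuple):
    first: int    # first number after "ecode" (hypothetical label)
    second: int   # second number after "ecode" (hypothetical label)
    ecode: int    # hexadecimal value after the second colon
    process: str
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[Hang]:
    """Return the parsed fields, or None if the line is not a hang message."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return Hang(int(m["first"]), int(m["second"]), int(m["ecode"], 16),
                m["process"], int(m["pid"]), m["reason"], m["action"])


# One of the dmesg lines quoted in the report.
sample = ("[265964.934358] [drm] GPU HANG: ecode 9:0:0x85dfbfff, "
          "in ioquake3 [13145], reason: Ring hung, action: reset")
print(parse_hang(sample))
```

Grouping parsed lines by `ecode` would, for example, separate the 0x85dfbfff and 0x85dfffff hangs quoted in this report from the 0xb6c81bfc hang reported on Ubuntu.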