Bug 96175

Summary: [SKL GT4e] 3D game nexuiz 1.6.1 causes GPU HANG
Product: DRI
Reporter: binx.wu
Component: DRM/Intel
Assignee: mwa <matthew.auld>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: highest
CC: gordon.jin, intel-gfx-bugs, knikkane, terrence.xu
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: SKL
i915 features: GPU hang

Attachments:

Description	Flags
intel_reg dump	none
/sys/class/drm/card0/error	none
dmesg with drm.debug=0x0e	none
/var/log/kern.log file	none
0713-1.log	none
0713-2.log	none
0713-3.log	none
dmesg-with-guc-0718.log	none

Description binx.wu 2016-05-25 06:07:51 UTC

Created attachment 124069 [details]
intel_reg dump

kernel: 4.6.0+
  source: git://anongit.freedesktop.org/drm-intel
  branch: drm-intel-nightly
  commit: 8621fb5af862648269427c20fff64f0a3a3bc406
    drm-intel-nightly: 2016y-05m-23d-18h-18m-33s UTC integration manifest

Linux distribution: 
  NAME="Ubuntu"
  VERSION="16.04 LTS (Xenial Xerus)"
  ID=ubuntu
  ID_LIKE=debian
  UBUNTU_CODENAME=xenial

Machine information: 
  OpenGL renderer string: Mesa DRI Intel(R) Iris Pro Graphics P580 (Skylake GT4e)
  VGA compatible controller: Intel Corporation Device 193a (rev 09)

Display connector:
  DP2 connected primary 1920x1080+0+0

Reproduce steps:
  Get nexuiz 1.6.1 from Phoronix Test Suite
  using command:
    vblank_mode=0 ./nexuiz-linux-glx.sh +exec effects-high.cfg -nohome -benchmark demos/demo2 +r_glsl 1 +vid_width 1920 +vid_height 1080 +r_hdr
    ./nexuiz-linux-glx.sh +exec effects-high.cfg -nohome -benchmark demos/demo2 +r_glsl 1 +vid_width 1920 +vid_height 1080 +r_hdr

Result:
  Application report: intel_do_flush_locked failed: Input/output error

dmesg:
[  121.703090] [drm] stuck on render ring
[  121.703292] [drm] GPU HANG: ecode 9:0:0x85dffffb, in nexuiz-linux-x8 [3368], reason: Engine(s) hung, action: reset
[  121.703293] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  121.703294] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  121.703295] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  121.703296] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  121.703297] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  121.705090] drm/i915: Resetting chip after gpu hang
[  121.705174] [drm] GuC firmware load failed: -5
[  122.715374] [drm] RC6 on
[  131.703617] [drm] stuck on render ring
[  131.703834] [drm] GPU HANG: ecode 9:0:0xfffffffe, in nexuiz-linux-x8 [3368], reason: Engine(s) hung, action: reset
[  131.705463] drm/i915: Resetting chip after gpu hang
[  131.705585] [drm] GuC firmware load failed: -5
[  132.711726] [drm] RC6 on
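
For triage, the hang lines in the dmesg excerpt above can be extracted mechanically. The following is a minimal Python sketch and not part of the original report: the regular expression is an assumption based only on the message format quoted here, and `parse_gpu_hangs` is a hypothetical helper name.

```python
import re

# Pattern assumed from the "[drm] GPU HANG" lines quoted in this report;
# other kernel versions may format the message differently.
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], reason: (?P<reason>.+?), "
    r"action: (?P<action>\w+)"
)


def parse_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in dmesg_text."""
    hangs = []
    for line in dmesg_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hang = m.groupdict()
            hang["time"] = float(hang["time"])  # seconds since boot
            hang["pid"] = int(hang["pid"])
            hangs.append(hang)
    return hangs


# Sample lines copied from the dmesg excerpt in this report.
sample = """\
[  121.703090] [drm] stuck on render ring
[  121.703292] [drm] GPU HANG: ecode 9:0:0x85dffffb, in nexuiz-linux-x8 [3368], reason: Engine(s) hung, action: reset
[  131.703834] [drm] GPU HANG: ecode 9:0:0xfffffffe, in nexuiz-linux-x8 [3368], reason: Engine(s) hung, action: reset
"""

for hang in parse_gpu_hangs(sample):
    print(hang["time"], hang["ecode"], hang["comm"], hang["pid"])
```

Listing the ecodes side by side makes it easy to see that the two hangs in this run report different ecodes (0x85dffffb, then 0xfffffffe) for the same process.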

/var/log/Xorg.0.log:
[     3.826] 
X.Org X Server 1.18.3
Release Date: 2016-04-04
[     3.826] X Protocol Version 11, Revision 0
[     3.826] Build Operating System: Linux 3.13.0-85-generic x86_64 Ubuntu
[     3.826] Current Operating System: Linux igvtperf-efi 4.6.0+ #1 SMP Wed May 25 09:21:42 CST 2016 x86_64
[     3.826] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.6.0+ root=UUID=53f3b514-695b-493c-a612-e85f3146c1d5 ro quiet splash consoleblank=0 net.ifnames=0 biosdevname=0 vt.handoff=7
[     3.826] Build Date: 07 April 2016  09:18:50AM
[     3.826] xorg-server 2:1.18.3-1ubuntu2 (For technical support please see http://www.ubuntu.com/support) 
[     3.826] Current version of pixman: 0.33.6
[     3.826] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[     3.826] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[     3.826] (==) Log file: "/var/log/Xorg.0.log", Time: Wed May 25 13:17:27 2016
[     3.826] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[     3.827] (==) No Layout section.  Using the first Screen section.
[     3.827] (==) No screen section available. Using defaults.
[     3.827] (**) |-->Screen "Default Screen Section" (0)
[     3.827] (**) |   |-->Monitor "<default monitor>"
[     3.828] (==) No monitor specified for screen "Default Screen Section".
	Using a default monitor configuration.
[     3.828] (==) Automatically adding devices
[     3.828] (==) Automatically enabling devices
[     3.828] (==) Automatically adding GPU devices
[     3.828] (==) Max clients allowed: 256, resource mask: 0x1fffff
[     3.828] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[     3.828] 	Entry deleted from font path.
[     3.828] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[     3.828] 	Entry deleted from font path.
[     3.828] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[     3.828] 	Entry deleted from font path.
[     3.828] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[     3.828] 	Entry deleted from font path.
[     3.828] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[     3.828] 	Entry deleted from font path.
[     3.828] (==) FontPath set to:
	/usr/share/fonts/X11/misc,
	/usr/share/fonts/X11/Type1,
	built-ins
[     3.828] (==) ModulePath set to "/usr/lib/x86_64-linux-gnu/xorg/extra-modules,/usr/lib/xorg/extra-modules,/usr/lib/xorg/modules"
[     3.828] (II) The server relies on udev to provide the list of input devices.
	If no devices become available, reconfigure udev or disable AutoAddDevices.
[     3.828] (II) Loader magic: 0x560202287da0
[     3.828] (II) Module ABI versions:
[     3.828] 	X.Org ANSI C Emulation: 0.4
[     3.828] 	X.Org Video Driver: 20.0
[     3.828] 	X.Org XInput driver : 22.1
[     3.828] 	X.Org Server Extension : 9.0
[     3.828] (++) using VT number 7

[     3.828] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[     3.829] (II) xfree86: Adding drm device (/dev/dri/card0)
[     3.864] (--) PCI:*(0:0:2:0) 8086:193a:8086:2212 rev 9, Mem @ 0xc0000000/16777216, 0xa0000000/536870912, I/O @ 0x00003000/64, BIOS @ 0x????????/131072
[     3.864] (II) LoadModule: "glx"
[     3.864] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[     3.871] (II) Module glx: vendor="X.Org Foundation"
[     3.871] 	compiled for 1.18.3, module version = 1.0.0
[     3.871] 	ABI class: X.Org Server Extension, version 9.0
[     3.871] (==) AIGLX enabled
[     3.871] (==) Matched intel as autoconfigured driver 0
[     3.871] (==) Matched intel as autoconfigured driver 1
[     3.871] (==) Matched modesetting as autoconfigured driver 2
[     3.871] (==) Matched fbdev as autoconfigured driver 3
[     3.871] (==) Matched vesa as autoconfigured driver 4
[     3.871] (==) Assigned the driver to the xf86ConfigLayout
[     3.871] (II) LoadModule: "intel"
[     3.871] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[     3.874] (II) Module intel: vendor="X.Org Foundation"
[     3.874] 	compiled for 1.18.1, module version = 2.99.917
[     3.874] 	Module class: X.Org Video Driver
[     3.874] 	ABI class: X.Org Video Driver, version 20.0
[     3.874] (II) LoadModule: "modesetting"
[     3.874] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[     3.874] (II) Module modesetting: vendor="X.Org Foundation"
[     3.874] 	compiled for 1.18.3, module version = 1.18.3
[     3.874] 	Module class: X.Org Video Driver
[     3.874] 	ABI class: X.Org Video Driver, version 20.0
[     3.874] (II) LoadModule: "fbdev"
[     3.874] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[     3.875] (II) Module fbdev: vendor="X.Org Foundation"
[     3.875] 	compiled for 1.18.1, module version = 0.4.4
[     3.875] 	Module class: X.Org Video Driver
[     3.875] 	ABI class: X.Org Video Driver, version 20.0
[     3.875] (II) LoadModule: "vesa"
[     3.875] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[     3.875] (II) Module vesa: vendor="X.Org Foundation"
[     3.875] 	compiled for 1.18.1, module version = 2.3.4
[     3.875] 	Module class: X.Org Video Driver
[     3.875] 	ABI class: X.Org Video Driver, version 20.0
[     3.875] (II) intel: Driver for Intel(R) Integrated Graphics Chipsets:
	i810, i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G,
	915G, E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM,
	Pineview G, 965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33,
	GM45, 4 Series, G45/G43, Q45/Q43, G41, B43
[     3.875] (II) intel: Driver for Intel(R) HD Graphics: 2000-6000
[     3.875] (II) intel: Driver for Intel(R) Iris(TM) Graphics: 5100, 6100
[     3.875] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics: 5200, 6200, P6300
[     3.875] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[     3.875] (II) FBDEV: driver for framebuffer: fbdev
[     3.875] (II) VESA: driver for VESA chipsets: vesa
[     3.883] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20160522
[     3.883] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1 (Timo Aaltonen <tjaalton@debian.org>)
[     3.883] (II) intel(0): SNA compiled for use with valgrind
[     3.884] (WW) Falling back to old probe method for modesetting
[     3.884] (WW) Falling back to old probe method for fbdev
[     3.884] (II) Loading sub module "fbdevhw"
[     3.884] (II) LoadModule: "fbdevhw"
[     3.884] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[     3.885] (II) Module fbdevhw: vendor="X.Org Foundation"
[     3.885] 	compiled for 1.18.3, module version = 0.0.2
[     3.885] 	ABI class: X.Org Video Driver, version 20.0
[     3.885] (WW) Falling back to old probe method for vesa
[     3.885] (--) intel(0): gen9 engineering sample
[     3.885] (--) intel(0): CPU: x86-64, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2; using a maximum of 4 threads
[     3.885] (II) intel(0): Creating default Display subsection in Screen section
	"Default Screen Section" for depth/fbbpp 24/32
[     3.885] (==) intel(0): Depth 24, (--) framebuffer bpp 32
[     3.885] (==) intel(0): RGB weight 888
[     3.885] (==) intel(0): Default visual is TrueColor
[     3.886] (II) intel(0): Output DP1 has no monitor section
[     3.886] (II) intel(0): Enabled output DP1
[     3.886] (II) intel(0): Output HDMI1 has no monitor section
[     3.886] (II) intel(0): Enabled output HDMI1
[     3.886] (II) intel(0): Output HDMI2 has no monitor section
[     3.886] (II) intel(0): Enabled output HDMI2
[     3.886] (II) intel(0): Output DP2 has no monitor section
[     3.886] (II) intel(0): Enabled output DP2
[     3.886] (II) intel(0): Output HDMI3 has no monitor section
[     3.886] (II) intel(0): Enabled output HDMI3
[     3.886] (--) intel(0): Using a maximum size of 256x256 for hardware cursors
[     3.886] (II) intel(0): Output VIRTUAL1 has no monitor section
[     3.886] (II) intel(0): Enabled output VIRTUAL1
[     3.886] (--) intel(0): Output DP2 using initial mode 1920x1080 on pipe 0
[     3.886] (==) intel(0): TearFree disabled
[     3.886] (==) intel(0): DPI set to (96, 96)
[     3.886] (II) Loading sub module "dri2"
[     3.886] (II) LoadModule: "dri2"
[     3.886] (II) Module "dri2" already built-in
[     3.886] (II) Loading sub module "present"
[     3.886] (II) LoadModule: "present"
[     3.886] (II) Module "present" already built-in
[     3.886] (II) UnloadModule: "modesetting"
[     3.886] (II) Unloading modesetting
[     3.886] (II) UnloadModule: "fbdev"
[     3.886] (II) Unloading fbdev
[     3.886] (II) UnloadSubModule: "fbdevhw"
[     3.886] (II) Unloading fbdevhw
[     3.886] (II) UnloadModule: "vesa"
[     3.886] (II) Unloading vesa
[     3.886] (==) Depth 24 pixmap format is 32 bpp
[     3.888] (II) intel(0): SNA initialized with generic backend
[     3.888] (==) intel(0): Backing store enabled
[     3.888] (==) intel(0): Silken mouse enabled
[     3.888] (II) intel(0): HW Cursor enabled
[     3.888] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[     3.889] (==) intel(0): DPMS enabled
[     3.889] (==) intel(0): Display hotplug detection enabled
[     3.889] (II) intel(0): Textured video not supported on this hardware or backend
[     3.889] (II) intel(0): [DRI2] Setup complete
[     3.889] (II) intel(0): [DRI2]   DRI driver: i965
[     3.889] (II) intel(0): [DRI2]   VDPAU driver: va_gl
[     3.889] (II) intel(0): direct rendering: DRI2 enabled
[     3.889] (II) intel(0): hardware support for Present enabled
[     3.889] (--) RandR disabled
[     3.892] (II) SELinux: Disabled on system
[     3.908] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[     3.908] (II) AIGLX: enabled GLX_ARB_create_context
[     3.908] (II) AIGLX: enabled GLX_ARB_create_context_profile
[     3.908] (II) AIGLX: enabled GLX_EXT_create_context_es{,2}_profile
[     3.908] (II) AIGLX: enabled GLX_INTEL_swap_event
[     3.908] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[     3.908] (II) AIGLX: enabled GLX_EXT_framebuffer_sRGB
[     3.908] (II) AIGLX: enabled GLX_ARB_fbconfig_float
[     3.908] (II) AIGLX: enabled GLX_EXT_fbconfig_packed_float
[     3.908] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[     3.908] (II) AIGLX: enabled GLX_ARB_create_context_robustness
[     3.908] (II) AIGLX: Loaded and initialized i965
[     3.908] (II) GLX: Initialized DRI2 GL provider for screen 0
[     3.910] (II) intel(0): switch to mode 1920x1080@60.0 on DP2 using pipe 0, position (0, 0), rotation normal, reflection none
[     3.910] (II) intel(0): Setting screen physical size to 508 x 285
[     3.931] (II) config/udev: Adding input device Power Button (/dev/input/event3)
[     3.931] (**) Power Button: Applying InputClass "evdev keyboard catchall"
[     3.931] (II) LoadModule: "evdev"
[     3.931] (II) Loading /usr/lib/xorg/modules/input/evdev_drv.so
[     3.933] (II) Module evdev: vendor="X.Org Foundation"
[     3.933] 	compiled for 1.18.1, module version = 2.10.1
[     3.933] 	Module class: X.Org XInput Driver
[     3.933] 	ABI class: X.Org XInput driver, version 22.1
[     3.933] (II) Using input driver 'evdev' for 'Power Button'
[     3.933] (**) Power Button: always reports core events
[     3.933] (**) evdev: Power Button: Device: "/dev/input/event3"
[     3.933] (--) evdev: Power Button: Vendor 0 Product 0x1
[     3.933] (--) evdev: Power Button: Found keys
[     3.933] (II) evdev: Power Button: Configuring as keyboard
[     3.933] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input3/event3"
[     3.933] (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
[     3.933] (**) Option "xkb_rules" "evdev"
[     3.933] (**) Option "xkb_model" "pc105"
[     3.933] (**) Option "xkb_layout" "us"
[     3.933] (II) config/udev: Adding input device Video Bus (/dev/input/event5)
[     3.933] (**) Video Bus: Applying InputClass "evdev keyboard catchall"
[     3.933] (II) Using input driver 'evdev' for 'Video Bus'
[     3.933] (**) Video Bus: always reports core events
[     3.933] (**) evdev: Video Bus: Device: "/dev/input/event5"
[     3.933] (--) evdev: Video Bus: Vendor 0 Product 0x6
[     3.933] (--) evdev: Video Bus: Found keys
[     3.933] (II) evdev: Video Bus: Configuring as keyboard
[     3.933] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input7/event5"
[     3.933] (II) XINPUT: Adding extended input device "Video Bus" (type: KEYBOARD, id 7)
[     3.933] (**) Option "xkb_rules" "evdev"
[     3.933] (**) Option "xkb_model" "pc105"
[     3.933] (**) Option "xkb_layout" "us"
[     3.934] (II) config/udev: Adding input device Lid Switch (/dev/input/event0)
[     3.934] (II) No input driver specified, ignoring this device.
[     3.934] (II) This device may have been added with another device file.
[     3.934] (II) config/udev: Adding input device Power Button (/dev/input/event1)
[     3.934] (**) Power Button: Applying InputClass "evdev keyboard catchall"
[     3.934] (II) Using input driver 'evdev' for 'Power Button'
[     3.934] (**) Power Button: always reports core events
[     3.934] (**) evdev: Power Button: Device: "/dev/input/event1"
[     3.934] (--) evdev: Power Button: Vendor 0 Product 0x1
[     3.934] (--) evdev: Power Button: Found keys
[     3.934] (II) evdev: Power Button: Configuring as keyboard
[     3.934] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1/event1"
[     3.934] (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 8)
[     3.934] (**) Option "xkb_rules" "evdev"
[     3.934] (**) Option "xkb_model" "pc105"
[     3.934] (**) Option "xkb_layout" "us"
[     3.934] (II) config/udev: Adding input device Sleep Button (/dev/input/event2)
[     3.934] (**) Sleep Button: Applying InputClass "evdev keyboard catchall"
[     3.934] (II) Using input driver 'evdev' for 'Sleep Button'
[     3.934] (**) Sleep Button: always reports core events
[     3.934] (**) evdev: Sleep Button: Device: "/dev/input/event2"
[     3.934] (--) evdev: Sleep Button: Vendor 0 Product 0x3
[     3.934] (--) evdev: Sleep Button: Found keys
[     3.934] (II) evdev: Sleep Button: Configuring as keyboard
[     3.934] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input2/event2"
[     3.934] (II) XINPUT: Adding extended input device "Sleep Button" (type: KEYBOARD, id 9)
[     3.934] (**) Option "xkb_rules" "evdev"
[     3.934] (**) Option "xkb_model" "pc105"
[     3.934] (**) Option "xkb_layout" "us"
[     3.935] (II) config/udev: Adding input device Lite-On Technology Corp. HP Basic USB Keyboard (/dev/input/event6)
[     3.935] (**) Lite-On Technology Corp. HP Basic USB Keyboard: Applying InputClass "evdev keyboard catchall"
[     3.935] (II) Using input driver 'evdev' for 'Lite-On Technology Corp. HP Basic USB Keyboard'
[     3.935] (**) Lite-On Technology Corp. HP Basic USB Keyboard: always reports core events
[     3.935] (**) evdev: Lite-On Technology Corp. HP Basic USB Keyboard: Device: "/dev/input/event6"
[     3.935] (--) evdev: Lite-On Technology Corp. HP Basic USB Keyboard: Vendor 0x3f0 Product 0x325
[     3.935] (--) evdev: Lite-On Technology Corp. HP Basic USB Keyboard: Found keys
[     3.935] (II) evdev: Lite-On Technology Corp. HP Basic USB Keyboard: Configuring as keyboard
[     3.935] (**) Option "config_info" "udev:/sys/devices/pci0000:00/0000:00:14.0/usb1/1-13/1-13:1.0/0003:03F0:0325.0001/input/input8/event6"
[     3.935] (II) XINPUT: Adding extended input device "Lite-On Technology Corp. HP Basic USB Keyboard" (type: KEYBOARD, id 10)
[     3.935] (**) Option "xkb_rules" "evdev"
[     3.935] (**) Option "xkb_model" "pc105"
[     3.935] (**) Option "xkb_layout" "us"
[     3.935] (II) config/udev: Adding input device HID 413c:3010 (/dev/input/event7)
[     3.935] (**) HID 413c:3010: Applying InputClass "evdev pointer catchall"
[     3.935] (II) Using input driver 'evdev' for 'HID 413c:3010'
[     3.935] (**) HID 413c:3010: always reports core events
[     3.935] (**) evdev: HID 413c:3010: Device: "/dev/input/event7"
[     3.986] (--) evdev: HID 413c:3010: Vendor 0x413c Product 0x3010
[     3.986] (--) evdev: HID 413c:3010: Found 3 mouse buttons
[     3.986] (--) evdev: HID 413c:3010: Found scroll wheel(s)
[     3.986] (--) evdev: HID 413c:3010: Found relative axes
[     3.986] (--) evdev: HID 413c:3010: Found x and y relative axes
[     3.986] (II) evdev: HID 413c:3010: Configuring as mouse
[     3.986] (II) evdev: HID 413c:3010: Adding scrollwheel support
[     3.986] (**) evdev: HID 413c:3010: YAxisMapping: buttons 4 and 5
[     3.986] (**) evdev: HID 413c:3010: EmulateWheelButton: 4, EmulateWheelInertia: 10, EmulateWheelTimeout: 200
[     3.986] (**) Option "config_info" "udev:/sys/devices/pci0000:00/0000:00:14.0/usb1/1-14/1-14:1.0/0003:413C:3010.0002/input/input9/event7"
[     3.986] (II) XINPUT: Adding extended input device "HID 413c:3010" (type: MOUSE, id 11)
[     3.986] (II) evdev: HID 413c:3010: initialized for relative axes.
[     3.986] (**) HID 413c:3010: (accel) keeping acceleration scheme 1
[     3.986] (**) HID 413c:3010: (accel) acceleration profile 0
[     3.986] (**) HID 413c:3010: (accel) acceleration factor: 2.000
[     3.986] (**) HID 413c:3010: (accel) acceleration threshold: 4
[     3.986] (II) config/udev: Adding input device HID 413c:3010 (/dev/input/mouse0)
[     3.986] (II) No input driver specified, ignoring this device.
[     3.986] (II) This device may have been added with another device file.
[     3.986] (II) config/udev: Adding input device AT Translated Set 2 keyboard (/dev/input/event4)
[     3.986] (**) AT Translated Set 2 keyboard: Applying InputClass "evdev keyboard catchall"
[     3.986] (II) Using input driver 'evdev' for 'AT Translated Set 2 keyboard'
[     3.986] (**) AT Translated Set 2 keyboard: always reports core events
[     3.986] (**) evdev: AT Translated Set 2 keyboard: Device: "/dev/input/event4"
[     3.986] (--) evdev: AT Translated Set 2 keyboard: Vendor 0x1 Product 0x1
[     3.986] (--) evdev: AT Translated Set 2 keyboard: Found keys
[     3.986] (II) evdev: AT Translated Set 2 keyboard: Configuring as keyboard
[     3.986] (**) Option "config_info" "udev:/sys/devices/platform/i8042/serio0/input/input4/event4"
[     3.986] (II) XINPUT: Adding extended input device "AT Translated Set 2 keyboard" (type: KEYBOARD, id 12)
[     3.986] (**) Option "xkb_rules" "evdev"
[     3.986] (**) Option "xkb_model" "pc105"
[     3.986] (**) Option "xkb_layout" "us"

Comment 1 binx.wu 2016-05-25 06:09:33 UTC

Created attachment 124070 [details]
/sys/class/drm/card0/error

file of /sys/class/drm/card0/error

Comment 2 binx.wu 2016-05-26 01:53:55 UTC

Created attachment 124099 [details]
dmesg with drm.debug=0x0e

Sorry, I forgot to upload this log.

Comment 3 Jani Nikula 2016-06-16 07:13:42 UTC

I really have no idea, but at least this workaround still seems to be pending; please try it: http://patchwork.freedesktop.org/patch/msgid/1465816501-25557-1-git-send-email-tim.gore@intel.com

Also, the dmesg is missing the beginning, including the i915 device info parts.

Comment 4 Mika Kuoppala 2016-06-16 08:12:49 UTC

Could you please try with latest drm-intel-nightly from
git://anongit.freedesktop.org/drm-intel

Some skl specific workarounds got their scope extended to more modern
revisions just recently.

Comment 5 binx.wu 2016-06-19 13:11:25 UTC

Created attachment 124603 [details]
/var/log/kern.log file

commit 3aaddcdb189ded5595700cf07e2e0991bfb812c7
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun Jun 19 10:40:16 2016 +0200

    drm-intel-nightly: 2016y-06m-19d-08h-39m-57s UTC integration manifest

The issue still exists:

[  488.731857] [drm] RC6 on
[  496.732046] [drm] stuck on render ring
[  496.732241] [drm] GPU HANG: ecode 9:0:0x85dffffb, in nexuiz-linux-x8 [3325], reason: Engine(s) hung, action: reset
[  496.732290] [drm:i915_reset_and_wakeup] resetting chip
[  496.734046] drm/i915: Resetting chip after gpu hang
[  496.734052] [drm:gen8_init_common_ring] Execlists enabled for render ring
[  496.734070] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[  496.734085] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[  496.734099] [drm:gen8_init_common_ring] Execlists enabled for bsd2 ring
[  496.734114] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[  496.734135] [drm:intel_guc_setup] GuC fw status: path i915/skl_guc_ver6_1.bin, fetch FAIL, load NONE
[  496.734138] [drm] GuC firmware load failed: -5

# glxinfo |grep "renderer string"
OpenGL renderer string: Mesa DRI Intel(R) Iris Pro Graphics P580 (Skylake GT4e)

I have attached the full /var/log/kern.log file.
OS: Ubuntu 16.04 desktop

Comment 6 Terrence Xu 2016-06-21 08:36:06 UTC

Change the status to assigned since we still can reproduce it in the newest code from drm-intel-nightly and provided the newest dmesg.

Kuoppala, do you need any more information?

Comment 7 Jani Nikula 2016-06-21 08:40:42 UTC

(In reply to Terrence Xu from comment #6)
> Change the status to assigned since we still can reproduce it in the newest
> code from drm-intel-nightly and provided the newest dmesg.

And the patch from comment #3?

Comment 8 Terrence Xu 2016-06-23 02:35:43 UTC

(In reply to Jani Nikula from comment #7)
> (In reply to Terrence Xu from comment #6)
> > Change the status to assigned since we still can reproduce it in the newest
> > code from drm-intel-nightly and provided the newest dmesg.
> 
> And the patch from comment #3?

Hi Nikula,
This patch is already in the drm-intel-nightly branch:

commit a8ab5ed5e1bf856eceaab5579236de6f92822b9f
Author: Tim Gore <tim.gore@intel.com>
Date:   Mon Jun 13 12:15:01 2016 +0100

    drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate

    This patch enables a workaround for a mid thread preemption
    issue where a hardware timing problem can prevent the
    context restore from happening, leading to a hang.

    v2: move to gen9_init_workarounds (Arun)
    v3: move to start of gen9_init_workarounds (Arun)

    Signed-off-by: Tim Gore <tim.gore@intel.com>
    Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1465816501-25557-1-git-send-email-tim.gore@intel.com


We still hit the GPU hang:
[  114.965619] [drm] stuck on render ring
[  114.970172] [drm] GPU HANG: ecode 9:0:0xfffffffe, in nexuiz-linux-x8 [3252], reason: Engine(s) hung, action: reset
[  114.981934] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  114.992402] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  115.002481] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  115.013447] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  115.023619] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  115.031127] [drm:i915_reset_and_wakeup] resetting chip
[  115.039093] drm/i915: Resetting chip after gpu hang
[  115.044679] [drm:gen8_init_common_ring] Execlists enabled for render ring
[  115.052422] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[  115.060257] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[  115.067692] [drm:gen8_init_common_ring] Execlists enabled for bsd2 ring
[  115.075227] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[  115.084045] [drm:intel_guc_setup] GuC fw status: path i915/skl_guc_ver6_1.bin, fetch FAIL, load NONE
[  115.094438] [drm] GuC firmware load failed: -5
[  116.977464] [drm] RC6 on
[  124.977716] [drm] stuck on render ring
[  124.982275] [drm] GPU HANG: ecode 9:0:0xfffffffe, in nexuiz-linux-x8 [3252], reason: Engine(s) hung, action: reset
[  124.994066] [drm:i915_reset_and_wakeup] resetting chip
[  125.002035] drm/i915: Resetting chip after gpu hang
[  125.007627] [drm:gen8_init_common_ring] Execlists enabled for render ring
[  125.015376] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[  125.023211] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[  125.030652] [drm:gen8_init_common_ring] Execlists enabled for bsd2 ring
[  125.038195] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[  125.047033] [drm:intel_guc_setup] GuC fw status: path i915/skl_guc_ver6_1.bin, fetch FAIL, load NONE
[  125.057420] [drm] GuC firmware load failed: -5
[  125.126724] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.149208] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.165956] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.182720] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.199386] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.216057] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.232721] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.454492] [drm:skl_wm_flush_pipe] flush pipe A (pass 3)
[  125.466015] DMAR: DRHD: handling fault status reg 3
[  125.471564] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.472089] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.489039] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.505019] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.515840] DMAR: DRHD: handling fault status reg 3
[  125.521408] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.549366] DMAR: DRHD: handling fault status reg 3
[  125.554919] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.582693] DMAR: DRHD: handling fault status reg 3
[  125.588257] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.608688] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.619399] DMAR: DRHD: handling fault status reg 3
[  125.619401] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.637102] DMAR: DRHD: handling fault status reg 3
[  125.637104] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.654799] DMAR: DRHD: handling fault status reg 3
[  125.654800] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.672491] DMAR: DRHD: handling fault status reg 3
[  125.672492] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.690191] DMAR: DRHD: handling fault status reg 3
[  125.690193] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.707890] DMAR: DRHD: handling fault status reg 3
[  125.707891] DMAR: [DMA Read] Request device [00:02.0] fault addr f9827000 [fault reason 06] PTE Read access is not set
[  125.735986] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.753193] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.768880] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.785399] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.802125] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.818760] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.835536] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  125.852006] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  126.297968] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  126.898794] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  126.965872] [drm] RC6 on
[  127.500350] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  128.101283] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  128.702096] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  129.303064] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  129.904004] [drm:skl_update_scaler_plane] Updating scaler for [PLANE:23:plane 1A] scaler_user index 0.0
[  130.482775] dmar_fault: 286 callbacks suppressed
[  130.488056] DMAR: DRHD: handling fault status reg 3

Comment 9 Mika Kuoppala 2016-07-08 10:05:40 UTC

Two dozen or so workarounds affecting skl/kbl went to nightly by the end of June.

Could you please retest with the latest from git://anongit.freedesktop.org/drm-intel?

Comment 10 mwa 2016-07-08 10:25:53 UTC

I tried to see if I could reproduce this on a SKL GT4e running a recent -nightly but with no luck.

Comment 11 Mika Kuoppala 2016-07-12 14:57:49 UTC

Try booting with intel_iommu=igfx_off.

Then reproduce 3 times and upload all error states here. Thanks
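
One way to keep the three error states apart is to copy the crash dump out of sysfs after each hang. This is a minimal sketch, assuming the dump is exposed at /sys/class/drm/card0/error as the dmesg output in this report says; the `save_error_state` helper and the output file names are made up for illustration.

```python
import shutil
from pathlib import Path

# Path reported by the kernel ("GPU crash dump saved to ...").
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(run, src=ERROR_STATE, dest_dir=Path(".")):
    """Copy the current error state to error-state-<run>.log and return its path.

    Hypothetical helper: the naming scheme is an assumption, not an i915 convention.
    """
    dest = Path(dest_dir) / f"error-state-{run}.log"
    # Stream until EOF instead of trusting the size that sysfs reports.
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Calling `save_error_state(1)`, `save_error_state(2)` and `save_error_state(3)` as root, once after each reproduced hang, would leave three separate files to attach.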

Comment 12 Terrence Xu 2016-07-13 08:02:54 UTC

The bad news is that we can still reproduce it with the latest drm-intel-nightly code and intel_iommu=igfx_off, at the commit below:

commit 9561f5c5e1918cfaeb2a39f90eed046730ae7399
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jul 12 17:15:07 2016 +0200

    drm-intel-nightly: 2016y-07m-12d-15h-14m-43s UTC integration manifest

The corresponding attachments are 0713-1.log, 0713-2.log, and 0713-3.log.

Comment 13 Terrence Xu 2016-07-13 08:05:53 UTC

Created attachment 125046 [details]
0713-1.log

Comment 14 Terrence Xu 2016-07-13 08:06:25 UTC

Created attachment 125047 [details]
0713-2.log

Comment 15 Terrence Xu 2016-07-13 08:07:05 UTC

Created attachment 125048 [details]
0713-3.log

Comment 16 mwa 2016-07-15 13:18:49 UTC

Hmm, so I managed to reproduce this, but only when I disable GuC submission. Judging by your logs, the same thing is happening on your side, although there it is because the firmware can't be found rather than because it was intentionally disabled. Is there any particular reason why you are not using the GuC? Would you be able to test with the GuC loaded? Nevertheless, there does seem to be a bug when falling back to execlist mode on the SKL GT4e...

Comment 17 Terrence Xu 2016-07-18 05:43:50 UTC

Hello mwa,
The bad news is that I also reproduced this issue after I downloaded the GuC firmware and enabled it in i915.

The error log is below:
[  252.697245] [drm] GPU HANG: ecode 9:0:0xfffffffe, in nexuiz-linux-x8 [2970], reason: Hang on render ring, action: reset
[  252.697247] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  252.697248] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  252.697249] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  252.697250] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  252.697251] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  252.697272] [drm:i915_reset_and_wakeup] resetting chip
[  252.697282] drm/i915: Resetting chip after gpu hang
[  252.697316] [drm:gen8_init_common_ring] Execlists enabled for render ring
[  252.697334] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[  252.697349] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[  252.697363] [drm:gen8_init_common_ring] Execlists enabled for bsd2 ring
[  252.697378] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[  252.697408] [drm:intel_guc_setup] GuC fw status: path i915/skl_guc_ver6_1.bin, fetch SUCCESS, load SUCCESS
[  252.697411] [drm:intel_guc_setup] GuC fw status: fetch SUCCESS, load PENDING
[  252.698532] [drm:guc_ucode_xfer_dma] DMA status 0x10, GuC status 0x8002f0ec
[  252.698534] [drm:guc_ucode_xfer_dma] returning 0
[  252.698536] [drm:intel_guc_setup] GuC fw status: fetch SUCCESS, load SUCCESS
[  252.698550] [drm:select_doorbell_register] assigned normal priority doorbell id 0x0
[  252.698551] [drm:select_doorbell_cacheline] selected doorbell cacheline 0x40, next 0x80, linesize 64
[  252.698559] [drm:guc_client_alloc] new priority 2 client ffff8804898d9280: ctx_index 0
[  252.698560] [drm:guc_client_alloc] doorbell id 0, cacheline offset 0x40
[  254.696694] [drm] RC6 on
[  262.695933] [drm:i915_reset_and_wakeup] resetting chip
[  262.695944] drm/i915: Resetting chip after gpu hang
[  262.697760] [drm:gen8_init_common_ring] Execlists enabled for render ring
[  262.697787] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[  262.697808] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[  262.697828] [drm:gen8_init_common_ring] Execlists enabled for bsd2 ring
[  262.697847] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[  262.697882] [drm:intel_guc_setup] GuC fw status: path i915/skl_guc_ver6_1.bin, fetch SUCCESS, load SUCCESS
[  262.697886] [drm:intel_guc_setup] GuC fw status: fetch SUCCESS, load PENDING
[  262.701710] [drm:guc_ucode_xfer_dma] DMA status 0x10, GuC status 0x8002f0ec
[  262.701713] [drm:guc_ucode_xfer_dma] returning 0
[  262.701715] [drm:intel_guc_setup] GuC fw status: fetch SUCCESS, load SUCCESS
[  262.701729] [drm:select_doorbell_register] assigned normal priority doorbell id 0x0
[  262.701730] [drm:select_doorbell_cacheline] selected doorbell cacheline 0x80, next 0xc0, linesize 64
[  262.703708] [drm:guc_client_alloc] new priority 2 client ffff8804898d9280: ctx_index 0
[  262.703710] [drm:guc_client_alloc] doorbell id 0, cacheline offset 0x80

The GuC status is below:
root@igvt-1604:/sys/kernel/debug/dri/0# cat i915_guc_load_status
GuC firmware status:
        path: i915/skl_guc_ver6_1.bin
        fetch: SUCCESS
        load: SUCCESS
        version wanted: 6.1
        version found: 6.1
        header: offset is 0; size = 128
        uCode: offset is 128; size = 128640
        RSA: offset is 128768; size = 256

GuC status 0x800300ec:
        Bootrom status = 0x76
        uKernel status = 0x0
        MIA Core status = 0x3

Scratch registers:
         0:     0xf0000000
         1:     0x0
         2:     0x0
         3:     0x5f5e100
         4:     0x600
         5:     0x0
         6:     0x0
         7:     0x8
         8:     0x3
         9:     0xd4a00
        10:     0x0
        11:     0x0
        12:     0x0
        13:     0x0
        14:     0x0
        15:     0x0
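
The three sub-status values printed above can be recovered from the raw GuC status word with shifts and masks. This is a small sketch; the field layout is an assumption inferred from the debugfs output in this comment (0x800300ec giving 0x76 / 0x0 / 0x3), not copied from the i915 source, and `decode_guc_status` is a hypothetical helper.

```python
# Bit fields of the GuC status word. Assumption: layout inferred from the
# debugfs output above, not copied from the i915 source.
BOOTROM_SHIFT, BOOTROM_MASK = 1, 0x7F
UKERNEL_SHIFT, UKERNEL_MASK = 8, 0xFF
MIA_SHIFT, MIA_MASK = 16, 0x07


def decode_guc_status(status):
    """Split a raw GuC status word into its three sub-status fields."""
    return {
        "bootrom": (status >> BOOTROM_SHIFT) & BOOTROM_MASK,
        "ukernel": (status >> UKERNEL_SHIFT) & UKERNEL_MASK,
        "mia_core": (status >> MIA_SHIFT) & MIA_MASK,
    }


# First value: the debugfs dump above. Second value: the status logged by
# guc_ucode_xfer_dma in the error log.
for raw in (0x800300EC, 0x8002F0EC):
    fields = decode_guc_status(raw)
    print(f"{raw:#010x}:", {name: hex(value) for name, value in fields.items()})
```

With this layout the debugfs value decodes to the same three numbers shown above, which makes it easier to compare against the status words that appear in the dmesg lines.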

Comment 18 Terrence Xu 2016-07-18 05:50:40 UTC

Created attachment 125127 [details]
dmesg-with-guc-0718.log

Attached the error dmesg with GuC enabled.

BTW, the GuC version is 6.1; download address: https://01.org/zh/linuxgraphics/downloads/skylake-guc-6.1

Comment 19 mwa 2016-07-19 19:12:10 UTC

Okay, so after *lots* more investigation, this does not look like a kernel issue. It would seem the user-space component which is the root cause of the hang is in fact Mesa. The good news is that it seems to have been fixed, I tested on the latest master(9c63224) and the hang doesn't seem to present itself. Would you be able to also confirm this and report back?

Comment 20 Terrence Xu 2016-07-20 03:09:55 UTC

(In reply to mwa from comment #19)
> Okay, so after *lots* more investigation, this does not look like a kernel
> issue. It would seem the user-space component which is the root cause of the
> hang is in fact Mesa. The good news is that it seems to have been fixed, I
> tested on the latest master(9c63224) and the hang doesn't seem to present
> itself. Would you be able to also confirm this and report back?

You are right. :)
The default Mesa version on Ubuntu 16.04 is 11.2.0. After I upgraded Mesa to master (9c63224, 12.1.0-devel), this issue disappeared, the same result as in Bug 96177.

Comment 21 yann 2016-07-20 07:12:40 UTC

Resolving this issue, since it is fixed by upgrading to the latest Mesa.

Comment 22 Timo Aaltonen 2016-07-21 05:49:12 UTC

Any idea what fixed it? If so, will it be backported to 11.2.x/12.0.x?

Comment 23 Terrence Xu 2016-08-04 08:32:23 UTC

We have confirmed that this issue disappeared on both Ubuntu and CentOS after upgrading Mesa.

Comment 24 mwa 2016-08-11 11:34:47 UTC

For reference the fix is:

commit ddcfc35f62ed3ad83b100beacb5b30394dcd9960
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Thu May 26 11:04:07 2016 -0700

    i965/sklgt4: Implement depth/timestamp write w/a
    
    The stated bug describes a scenario in which a post sync write operation for
    depth or timestamp can be ignored. There are two workarounds suggested, the
    first and easier is to simply do a cs stall when we do these type of writes.
    The second option is to do a PIPE_CONTROL flush after the post sync but before
    the data is required.
    
    Generally, I believe the data written out is consumed by the application on the
    CPU side and so doing the easier of the two is ideal. Furthermore, these queries
    aren't tremendously common in the perf sensitive apps I have looked at. However,
    there could be cases where a shader stage might directly consume the data, and
    as a result option 2 may be desirable.
    
    This patch goes with the easier solution for now.
    
    gen9lp bug_de_id=2137196
    
    By itself, this does *not* fix any of the GT4 hangs we're currently
    experiencing.
    
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

There are plans to backport the fix to the next Mesa stable release (should be 11.2.3).
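
The first workaround described in the commit message (stall the command streamer whenever a post-sync depth or timestamp write is emitted) can be modeled in a few lines. This is an illustrative Python model, not the Mesa code: the flag names, the `apply_post_sync_workaround` helper, and the `is_skl_gt4` parameter are hypothetical, and the flag values are arbitrary rather than hardware encodings. Restricting it to SKL GT4 is an assumption based on the commit title.

```python
from enum import Flag, auto


class PipeControl(Flag):
    """Hypothetical PIPE_CONTROL flags; values are arbitrary, not hardware bits."""
    NONE = 0
    WRITE_DEPTH_COUNT = auto()
    WRITE_TIMESTAMP = auto()
    CS_STALL = auto()


# Post-sync operations named in the commit message.
POST_SYNC_WRITES = PipeControl.WRITE_DEPTH_COUNT | PipeControl.WRITE_TIMESTAMP


def apply_post_sync_workaround(flags, is_skl_gt4):
    """Add a CS stall to depth/timestamp post-sync writes on the affected part."""
    if is_skl_gt4 and (flags & POST_SYNC_WRITES):
        flags |= PipeControl.CS_STALL
    return flags
```

In this model, `apply_post_sync_workaround(PipeControl.WRITE_TIMESTAMP, True)` returns the timestamp write with `CS_STALL` added, while any other flag combination, or any other part, passes through unchanged. The second option from the commit message (a later PIPE_CONTROL flush before the data is consumed) is not modeled here.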
