Bug 93049 - Skylake machines randomly hang
Summary: Skylake machines randomly hang
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Olivier Berthier
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-20 19:57 UTC by mike
Modified: 2016-02-22 16:56 UTC
CC List: 3 users

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
dmesg trace (13.43 KB, text/plain)
2015-11-26 11:01 UTC, mike

Description mike 2015-11-20 19:57:46 UTC
Intel Skylake graphics hangs randomly, but the hang seems to occur sooner while running Minecraft, presumably because the GPU is under heavy load. The whole system becomes unresponsive. This happens on all three of my Skylake systems (two laptops and one desktop).

Nov 09 03:02:05 laptop kernel: [drm] stuck on render ring
Nov 09 03:02:05 laptop kernel: [drm] GPU HANG: ecode 9:0:0x84df9ffc, in java [2403], reason: Ring hung, action: reset
Nov 09 03:02:05 laptop kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 09 03:02:05 laptop kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 09 03:02:05 laptop kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 09 03:02:05 laptop kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 09 03:02:05 laptop kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 09 03:02:05 laptop kernel: drm/i915: Resetting chip after gpu hang
Nov 09 03:02:07 laptop kernel: [drm] RC6 on
Nov 09 03:02:23 laptop kernel: [drm] stuck on render ring
Nov 09 03:02:23 laptop kernel: [drm] GPU HANG: ecode 9:0:0x84df7cfc, in java [2403], reason: Ring hung, action: reset
Nov 09 03:02:23 laptop kernel: drm/i915: Resetting chip after gpu hang
Nov 09 03:02:25 laptop kernel: [drm] RC6 on


This happens on both Ubuntu and Arch Linux (xf86-video-intel 1:2.99.917+478+gdf72bc5-2).
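When the crash dump at /sys/class/drm/card0/error is not available, the hang summary lines in the kernel log above still record the error code (ecode), the offending process and its PID. Below is a minimal Python sketch, assuming the exact line format quoted in this report (other kernel versions may word it differently), that extracts those fields from a saved log. The names `HANG_RE`, `GpuHang` and `parse_hangs` are hypothetical helpers for illustration, not part of any i915 or freedesktop.org tool.

```python
import re
from dataclasses import dataclass

# Assumed pattern, based only on the hang lines quoted in this report, e.g.
# "[drm] GPU HANG: ecode 9:0:0x84df9ffc, in java [2403], reason: Ring hung, action: reset"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,]+), in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


@dataclass
class GpuHang:
    timestamp: str
    ecode: str
    comm: str
    pid: int
    reason: str
    action: str


def parse_hangs(log_text: str) -> list[GpuHang]:
    """Return one GpuHang per 'GPU HANG' line found in a kernel log."""
    hangs = []
    for line in log_text.splitlines():
        m = HANG_RE.search(line)
        if m is None:
            continue  # not a hang summary line (e.g. "stuck on render ring")
        hangs.append(GpuHang(
            # Journal-style lines start with "Mon DD HH:MM:SS" (15 characters).
            timestamp=line[:15],
            ecode=m["ecode"],
            comm=m["comm"],
            pid=int(m["pid"]),
            reason=m["reason"],
            action=m["action"],
        ))
    return hangs


if __name__ == "__main__":
    sample = """\
Nov 09 03:02:05 laptop kernel: [drm] stuck on render ring
Nov 09 03:02:05 laptop kernel: [drm] GPU HANG: ecode 9:0:0x84df9ffc, in java [2403], reason: Ring hung, action: reset
Nov 09 03:02:23 laptop kernel: [drm] GPU HANG: ecode 9:0:0x84df7cfc, in java [2403], reason: Ring hung, action: reset
"""
    for hang in parse_hangs(sample):
        print(hang.timestamp, hang.comm, hang.pid, hang.ecode, hang.reason, hang.action)
```

Applied to the excerpt above, it yields two hangs, both in java [2403], with ecodes 9:0:0x84df9ffc and 9:0:0x84df7cfc.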
Comment 1 Jani Nikula 2015-11-23 08:49:22 UTC
(In reply to mike from comment #0)
> Nov 09 03:02:05 laptop kernel: [drm] The gpu crash dump is required to
> analyze gpu hangs, so please always attach it.
> Nov 09 03:02:05 laptop kernel: [drm] GPU crash dump saved to
> /sys/class/drm/card0/error
Comment 2 mike 2015-11-24 02:55:29 UTC
(In reply to Jani Nikula from comment #1)
> (In reply to mike from comment #0)
> > Nov 09 03:02:05 laptop kernel: [drm] The gpu crash dump is required to
> > analyze gpu hangs, so please always attach it.
> > Nov 09 03:02:05 laptop kernel: [drm] GPU crash dump saved to
> > /sys/class/drm/card0/error

That file is not created, so I cannot attach it. Is there a reason it would not be created, or could it be somewhere else?
Comment 3 cprigent 2015-11-24 17:57:57 UTC
Bug scrub:
Hi Olivier,
Could you try to reproduce it?
Thanks
Comment 4 mike 2015-11-26 11:01:08 UTC
Created attachment 120138 [details]
dmesg trace
Comment 5 iDanoo 2015-12-08 05:46:00 UTC
I'm also having this issue and can provide any other information required.

It happens completely intermittently, with no trigger that I can identify, usually within 15-60 minutes of boot.

I'm currently passing these kernel params on boot: i915.preliminary_hw_support=1 drm.debug=0 drm.vblankoffdelay=1 i915.semaphores=0 i915.modeset=1 i915.use_mmio_flip=1 i915.powersave=1 i915.enable_ips=1 i915.disable_power_well=1 i915.enable_hangcheck=1 i915.enable_cmd_parser=1 i915.fastboot=0 i915.enable_ppgtt=1 i915.reset=0 i915.lvds_use_ssc=0 i915.enable_psr=0 

I am now testing the UXA acceleration method in Xorg to see whether it improves anything, since someone on another site said it may help.
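The boot parameters above override many i915 and drm module options at once, which makes it hard to tell at a glance which options were changed for which module. Below is a minimal Python sketch, assuming simple unquoted `module.param=value` tokens (quoted values containing spaces are not handled), that groups such a command line by module; on a running system the same string can be read from /proc/cmdline. `parse_module_params` is a hypothetical helper, not an existing tool.

```python
from collections import defaultdict


def parse_module_params(cmdline: str) -> dict[str, dict[str, str]]:
    """Group 'module.param=value' tokens from a kernel command line by module.

    Tokens without '=' (e.g. 'quiet') or without a '.' before the '='
    (e.g. 'root=/dev/sda1') are skipped, since they are not module parameters.
    Simplification: any other dotted option would also be grouped here.
    """
    params: dict[str, dict[str, str]] = defaultdict(dict)
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue
        module, dot, name = key.partition(".")
        if not dot:
            continue
        params[module][name] = value
    return dict(params)


if __name__ == "__main__":
    # A subset of the parameters quoted in this comment, plus two non-module tokens.
    cmdline = ("quiet root=/dev/sda1 i915.preliminary_hw_support=1 drm.debug=0 "
               "i915.semaphores=0 i915.enable_hangcheck=1 i915.reset=0")
    for module, opts in sorted(parse_module_params(cmdline).items()):
        print(module, opts)
```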
Comment 6 iDanoo 2015-12-08 06:08:55 UTC
Confirmed: setting the UXA acceleration method avoids the hang; however, as expected, performance drops to an almost unusable level.

------------------------------------------------------------
~ » sudo cat /usr/share/X11/xorg.conf.d/20-intel.conf
Section "Device"
   Identifier  "Intel Graphics"
   Driver      "intel"
   Option      "AccelMethod"  "uxa"
EndSection
------------------------------------------------------------
Comment 7 iDanoo 2015-12-11 06:16:59 UTC
Appears fixed with the 4.4.0-rc4 kernel. Currently up to 4 hours without a lock-up.
Comment 8 iDanoo 2015-12-13 01:13:23 UTC
Issue resolved with kernel 4.4.0-rc4.
Comment 9 cprigent 2016-02-22 16:56:15 UTC
Closing, since the issue is reported resolved with kernel 4.4.0-rc4.

