Created attachment 114884 [details]
GPU crash dump
I had Ubuntu 14.04 with kernel 3.16 on an Asus UX303LN-R4281H (Optimus Intel + Nvidia; the nvidia drivers are currently not installed). I updated to 4.0.0rc6 to solve a few issues unrelated to graphics (e.g. the touchpad). Since the update, resuming from suspend has problems: the system resumes and then freezes for a few seconds. It then unfreezes, and in the kernel log I find:
[ 71.845023] [drm] stuck on render ring
[ 71.845672] [drm] GPU HANG: ecode 8:0:0xfffffffe, in Xorg , reason: Ring hung, action: reset
[ 71.845674] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 71.845674] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 71.845675] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 71.845676] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 71.845676] [drm] GPU crash dump saved to /sys/class/drm/card0/error
After a couple of minutes, the kernel log is full of:
[drm:hsw_unclaimed_reg_detect.isra.10 [i915]] *ERROR* Unclaimed register detected. Please use the i915.mmio_debug=1 to debug this problem.
Note: I was using Chrome in the meantime, and two tabs were probably using hardware acceleration (e.g. 3D Google Maps). I don't know if it is related, but some minutes later the system froze completely and did not even respond to SysRq. This never happened before the kernel upgrade.
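For anyone triaging similar logs, the messages quoted above can be summarized with a small script. This is only a minimal sketch: the function name `summarize_i915_log` and the regular expression are illustrative assumptions modelled on the log lines in this report, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Pattern modelled on the "GPU HANG" line quoted in this report (assumption:
# other kernel versions may format the message differently).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>.*?)\s*, "
    r"reason: (?P<reason>.*?), action: (?P<action>\w+)"
)


def summarize_i915_log(text):
    """Return (list of hang dicts, Counter of other notable i915 messages)."""
    hangs = []
    counts = Counter()
    for line in text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append(match.groupdict())
        if "Unclaimed register detected" in line:
            counts["unclaimed_register"] += 1
        if "Resetting chip after gpu hang" in line:
            counts["chip_reset"] += 1
    return hangs, counts
```

Pass the text of the kernel log (for example the saved output of dmesg) to `summarize_i915_log`; it returns the parsed hang entries and the number of unclaimed-register and chip-reset messages.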
Now I rebooted with i915.mmio_debug=1, suspended and resumed: it froze again for a few seconds. Part of dmesg | grep i915:
[ 39.884700] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 71.852974] drm/i915: Resetting chip after gpu hang
[ 77.848497] [drm:i915_set_reset_status.part.38 [i915]] *ERROR* gpu hanging too fast, banning!
[ 77.854831] drm/i915: Resetting chip after gpu hang
Attaching GPU crash dump and dmesg after reboot.
Created attachment 114885 [details]
Thanks for the suggestion. The error is not easily reproducible; it shows up quite randomly after intense use of Chrome. However, I ran the laptop for 2 days with "i915.enable_execlists=0" and it did not show up. I then rebooted once without the option, and after one suspend I got the error again. So the option likely solved it.
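When comparing boots with and without the option, it helps to confirm what the running kernel was actually booted with. The following is a rough sketch, assuming the option was passed on the kernel command line (an option set through a modprobe configuration file would not show up here); `kernel_param` and `read_cmdline` are hypothetical helper names.

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None.

    Simplification: values containing quoted spaces are not handled.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # later occurrences override earlier ones
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the affected machine, `kernel_param(read_cmdline(), "i915.enable_execlists")` should return "0" for a boot with the workaround and None for a boot without it.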
After 2 days with "i915.enable_execlists=0" I found only this instead:
[ 8171.802098] PM: Entering mem sleep
[ 8171.802110] Suspending console(s) (use no_console_suspend to debug)
[ 8171.803545] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 8171.807718] sd 0:0:0:0: [sda] Stopping disk
[ 8172.835579] [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring
[ 8173.267949] PM: suspend of devices complete after 1464.521 msecs
which seems to be related to the start of suspend and did not cause any freeze or crash.
Also, I now have thousands of "Unclaimed register detected" messages. They are neither fatal nor problematic, just annoying because they fill the log. I am not sure whether they are related.
Looking at the reported date, isn't this a duplicate of bug 89600?
Possibly, but bug 89600 was only confirmed for BSW as the reporters there had working BDW.
*** Bug 91252 has been marked as a duplicate of this bug. ***
I still think it's the same problem fixed by Peter (http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=364aece01a2dd748fc36a1e8bf52ef639b0857bd).
The issue was a race between enabling the interrupts and completing
the first batchbuffer, that's probably why we only saw it in chv, but
it's the same code bdw uses.
v4.0.6 didn't get the fix.
Now, Michel's comment 4 says it looks like bug 89600 (which at the time was negatively indicated for bdw), but it should be easy enough for everyone to test whether this is now fixed in 4.0.7.
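Before retesting, it may be worth checking that the running kernel is at least 4.0.7. This is a hedged sketch: the helper names are made up for illustration, and the comparison is only a hint, since distribution kernels can carry backported fixes under an older version number, and whether any particular later release contains the fix is an assumption to verify separately.

```python
import platform
import re


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a release string such as '4.0.7-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_at_least(release, minimum=(4, 0, 7)):
    """True if the release string names a version >= minimum."""
    return parse_kernel_version(release) >= minimum
```

Calling `is_at_least(platform.release())` on the test machine reports whether the running kernel is new enough for the retest to be meaningful.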
Assuming this is fixed then? Please re-open if not...
I clicked reopen, but I think I created a new bug instead: 95019
*** This bug has been marked as a duplicate of bug 95019 ***