Bug 99286 - [SNB] GPU HANG: ecode 6:0:0x2a8d8d94, in kscreenlocker_g [20793], reason: Hang on render ring, action: reset
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-05 17:19 UTC by Alexandre
Modified: 2017-07-11 13:48 UTC

See Also:
i915 platform: SNB
i915 features: GPU hang


Attachments
dump from /sys/class/drm/card0/error (124.13 KB, application/gzip)
2017-01-05 17:19 UTC, Alexandre
dmesg (28.17 KB, application/gzip)
2017-01-05 17:25 UTC, Alexandre
Xorg log (7.20 KB, application/gzip)
2017-01-05 17:25 UTC, Alexandre

Description Alexandre 2017-01-05 17:19:25 UTC
Created attachment 128776
dump from /sys/class/drm/card0/error

Attached dmesg, Xorg.log & /sys/class/drm/card0/error
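
For anyone collecting the same data, here is a minimal sketch of saving the error state as a gzip file, like the attachment above. The path /sys/class/drm/card0/error is the one named in this report; the helper name `save_error_state` and the output file name are hypothetical, and reading the file may require root.

```python
import gzip
import shutil


def save_error_state(src="/sys/class/drm/card0/error", dst="i915_error_state.gz"):
    """Copy the error state at `src` into a gzip-compressed file at `dst`."""
    # Stream in chunks: the error state can be large, and sysfs files do
    # not report a meaningful size up front.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `save_error_state()` right after a hang, before rebooting, should produce a file suitable for attaching.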
Comment 1 Alexandre 2017-01-05 17:25:11 UTC
Created attachment 128777
dmesg
Comment 2 Alexandre 2017-01-05 17:25:57 UTC
Created attachment 128778
Xorg log
Comment 3 Chris Wilson 2017-01-05 21:46:04 UTC
This is strange. According to the error state, we retired requests before their seqnos were complete.
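
To illustrate what "retired before complete" means: a request is normally treated as complete once the seqno reported by the GPU has reached the request's own seqno. Below is a minimal sketch of a wraparound-safe comparison using the signed-difference idiom, which as far as I recall is what the kernel's `i915_seqno_passed` helper does. The Python function names and any values passed to them are illustrative and not taken from the attached error state.

```python
MASK32 = 0xFFFFFFFF


def seqno_passed(seq1, seq2):
    """Return True if 32-bit seqno `seq1` is at or after `seq2`, allowing wraparound."""
    # Equivalent to the C idiom (s32)(seq1 - seq2) >= 0 on 32-bit values.
    return ((seq1 - seq2) & MASK32) < 0x80000000


def retired_too_early(hw_seqno, request_seqno):
    """Hypothetical check: the request was retired but the hardware seqno had not reached it."""
    return not seqno_passed(hw_seqno, request_seqno)
```

Under this model, a retired request for which `retired_too_early` is true is the inconsistency described above.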
Comment 4 Alexandre 2017-01-06 19:03:41 UTC
I forgot to mention that the bug manifests very frequently after suspend/resume. I resume about 5 to 6 times a day, and the hang happens roughly once every two or three days. With kernel 4.8.14 and xf86-video-intel git @169c74fa6c2cd9c28dd7bfacd9639cd245b8c8a8, if I kill the offending KDE screen locker program, the system comes back to work (although it is a bit slow and I get frequent over-temperature notifications in spite of very low CPU usage; the CPU fan also runs at high speed, so I suspect the GPU could be wasting power instead of the CPU).

On older kernels I had to either reboot or press Ctrl+Alt+Backspace to kill X to restore functionality. The low performance and over-temperature notifications were present there too.
Comment 5 Elizabeth 2017-06-29 17:30:23 UTC
(In reply to Alexandre from comment #4)
Hello Alexandre,
Sorry for the delay. Are you still able to reproduce the problem on the latest kernel? Have you made any changes to your hardware or software configuration? Thank you.
Comment 6 Alexandre 2017-06-30 13:57:32 UTC
I had to disable DRI 3 and stick with DRI 2. This made the hang go away.

BTW, I installed a vanilla Ubuntu 17.04 and it has the same problem with every kernel I tested (4.9 to 4.11, both vanilla and Ubuntu builds), with both the xserver-xorg-video-intel driver and the built-in modesetting driver. I settled on xserver-xorg-video-intel with SNA, TearFree enabled, and DRI 2. I haven't tried the modesetting driver since then because "it works for me as is".

It's actually worse on Ubuntu because I can reproduce it on almost every resume: all I have to do is use DRI 3 and suspend. The system comes back from resume, but when I enter my password to log back in, it freezes. I can't log back in to see if there's a crash log or something.
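
For reference, the configuration described above (intel driver with SNA, TearFree and DRI 2) would look roughly like the fragment below. The Identifier string is an arbitrary placeholder and the file location (for example a file under /etc/X11/xorg.conf.d/) is an assumption; the option names are the ones documented in the intel(4) man page.

```
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "AccelMethod" "sna"
    Option     "TearFree"    "true"
    Option     "DRI"         "2"
EndSection
```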
Comment 7 Alexandre 2017-07-01 18:54:44 UTC
Well, I spoke too soon. The same problem happened once. It seems to be far more likely with DRI 3, but it still happens with DRI 2.
Comment 8 Elizabeth 2017-07-03 20:21:09 UTC
(In reply to Alexandre from comment #7)
Thank you for your answer.
If possible, could you try to reproduce with kernel 4.12 (https://www.kernel.org/)?
Also, please attach new logs from your latest software configuration, including a dmesg captured with the kernel parameter "drm.debug=0xe" set in GRUB. Thank you.
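
Before capturing dmesg, it can help to confirm that the parameter actually reached the kernel. Below is a minimal sketch that parses a kernel command line string such as the contents of /proc/cmdline. The helper names are hypothetical, the bit-to-category mapping is my understanding of the DRM debug categories and should be treated as an assumption, and quoted parameters containing spaces are not handled.

```python
def kernel_params(cmdline):
    """Parse a kernel command line string into a {name: value} dict."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so values like root=UUID=... stay intact.
        name, _, value = token.partition("=")
        params[name] = value
    return params


# Assumed meaning of the low drm.debug bits (DRM debug categories).
DRM_DEBUG_BITS = {0x1: "core", 0x2: "driver", 0x4: "kms", 0x8: "prime"}


def drm_debug_categories(value):
    """Return the category names enabled by a drm.debug value such as '0xe'."""
    mask = int(value, 0)
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

For example, `kernel_params(open("/proc/cmdline").read()).get("drm.debug")` returns the value in effect for the running kernel, or None if the parameter is missing.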
Comment 9 Alexandre 2017-07-06 16:35:57 UTC
I've been using 4.12 for a few days. I haven't suspended many times yet, but since there has been no hang on resume so far, it's a clear improvement even if it eventually fails. The last time the system was this stable was on 4.9/Debian.

I'll collect the info you requested in a few days or after a freeze, whichever happens first.
Comment 10 Ricardo 2017-07-11 13:47:35 UTC
Alexandre, several days have passed, so I'm going to close this bug as Fixed. If the hang reproduces, please open a new bug with logs included.