Bug 92998 - gpu hang after resume on ivybridge and dmesg errors
Summary: gpu hang after resume on ivybridge and dmesg errors
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 94124
Depends on:
Blocks:
 
Reported: 2015-11-18 19:17 UTC by Tasev
Modified: 2016-11-14 15:07 UTC (History)
CC List: 5 users

See Also:
i915 platform: HSW
i915 features: GPU hang


Attachments
dmesg after boot (67.20 KB, text/plain)
2015-11-18 19:17 UTC, Tasev
dmesg after suspend resume with gpu hang (80.43 KB, text/plain)
2015-11-18 19:18 UTC, Tasev
GPU crash dump (2.04 MB, text/plain)
2015-11-18 19:19 UTC, Tasev
GPU crash dump kernel 4.4-rc5 (2.13 MB, text/plain)
2015-12-14 09:16 UTC, Tasev
gpu dump (3.09 MB, text/plain)
2016-01-21 14:56 UTC, Alexey Kharlamov
dmesg with another hang (270.41 KB, text/plain)
2016-01-21 21:18 UTC, Alexey Kharlamov
/sys/class/drm/card0/error with that hang (3.08 MB, text/plain)
2016-01-21 21:18 UTC, Alexey Kharlamov

Description Tasev 2015-11-18 19:17:30 UTC
Created attachment 119914 [details]
dmesg after boot

Hi

After boot I get an *ERROR* "mismatch in has_drrs" message in dmesg (at 2.40).
After suspend/resume the GPU hangs for 20 seconds with:

[  109.349028] [drm] stuck on render ring
[  109.349397] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kwin [1667], reason: Ring hung, action: reset

The hang only happens with the new 4.4-rc1 kernel, never with 4.3 stable,
but even with the 4.3 kernel the *ERROR* mismatch in has_drrs appears in dmesg.

The GPU crash dump and the dmesg output after boot and after suspend/resume, all from the 4.4-rc1 kernel, are attached.

lspci | grep VGA 
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)

glxinfo | grep OpenGL :

OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ivybridge Mobile 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel (git-e06238c 2015-11-08 trusty-oibaf-ppa)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 11.1.0-devel (git-e06238c 2015-11-08 trusty-oibaf-ppa)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.1.0-devel (git-e06238c 2015-11-08 trusty-oibaf-ppa)

I use Kubuntu 14.04 Trusty.
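
Editor's note: the "GPU HANG" line quoted in the description has a regular shape, so it can be pulled out of a saved dmesg log mechanically when comparing kernels. The following is only a minimal sketch; the regular expression is inferred from the single line quoted in this report, the function name `parse_gpu_hang` is made up for illustration, and other kernel versions may format the message differently.

```python
import re

# Pattern inferred from the one hang line quoted in this report (assumption:
# other kernels may word or order the fields differently).
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>[^,]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], reason: (?P<reason>[^,]+), "
    r"action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' dmesg line as a dict, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["ts"] = float(info["ts"])   # seconds since boot, as printed by dmesg
    info["pid"] = int(info["pid"])   # PID of the process named in the message
    return info


# The line from the description, used here as sample input.
SAMPLE = ("[  109.349397] [drm] GPU HANG: ecode 7:0:0x85fffff8, "
          "in kwin [1667], reason: Ring hung, action: reset")

if __name__ == "__main__":
    print(parse_gpu_hang(SAMPLE))
```

Running it over every line of a saved dmesg file and keeping the non-None results gives a quick list of hangs per boot.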
Comment 1 Tasev 2015-11-18 19:18:20 UTC
Created attachment 119915 [details]
dmesg after suspend resume with gpu hang
Comment 2 Tasev 2015-11-18 19:19:01 UTC
Created attachment 119916 [details]
GPU crash dump
Comment 3 Chris Wilson 2015-11-18 20:12:09 UTC
About the only big nearby change was PIN_HIGH. You can try reverting

commit 101b506a7fc7be3f0d0a337ade270eb5eb5a2857
Author: Michel Thierry <michel.thierry@intel.com>
Date:   Thu Oct 1 13:33:57 2015 +0100

    drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset

and see if that makes the hang go away, otherwise I'm out of good guesses and a bisection would be most invaluable.
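
Editor's note: for readers who have not done this before, the two experiments suggested above (revert the suspect commit, or bisect) could be set up roughly as follows. This is a sketch under assumptions: it presumes a checkout of the kernel git tree, and the tag names `v4.3` and `v4.4-rc1` are assumed from the usual kernel tagging scheme rather than stated in this report. The helper names are made up for illustration, and the script only prints the commands instead of running them.

```python
import shlex

# Commit hash quoted in comment 3.
SUSPECT = "101b506a7fc7be3f0d0a337ade270eb5eb5a2857"


def revert_plan(commit=SUSPECT):
    """Command to revert the suspect commit on top of the current tree."""
    return [["git", "revert", "--no-edit", commit]]


def bisect_plan(bad="v4.4-rc1", good="v4.3"):
    """Command to start a bisection between a bad and a good kernel.

    The tag names are assumptions. After building and booting each
    candidate and testing suspend/resume, the tester marks it with
    'git bisect good' (no hang) or 'git bisect bad' (hang).
    """
    return [["git", "bisect", "start", bad, good]]


def render(plan):
    """Render a list of argv lists as shell command lines, one per line."""
    return "\n".join(shlex.join(cmd) for cmd in plan)


if __name__ == "__main__":
    print(render(revert_plan()))
    print(render(bisect_plan()))
```

Printing rather than executing keeps the sketch harmless; the commands can then be pasted into a shell inside the kernel tree.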
Comment 4 Tasev 2015-11-19 08:03:31 UTC
(In reply to Chris Wilson from comment #3)
> About the only big nearby change was PIN_HIGH. You can try reverting
> 
> commit 101b506a7fc7be3f0d0a337ade270eb5eb5a2857
> Author: Michel Thierry <michel.thierry@intel.com>
> Date:   Thu Oct 1 13:33:57 2015 +0100
> 
>     drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
> 
> and see if that makes the hang go away, otherwise I'm out of good guesses
> and a bisection would be most invaluable.

Yes, reverting that commit fixes the hang.

Thank you
Comment 5 Tasev 2015-11-24 14:28:19 UTC
Hi

The bug is still present in 4.4-rc2.
Comment 6 Tasev 2015-11-30 17:49:13 UTC
Hi 

Just tested 4.4-rc3; the bug is still present.
Comment 7 Tasev 2015-12-08 09:00:47 UTC
In 4.4-rc4 the dmesg error (has_drrs) is fixed,
but the GPU hang is still there.
Comment 8 Tasev 2015-12-14 09:15:16 UTC
Hi

The bug is still present in the 4.4-rc5 kernel.

Attached is a new gpu crash dump from this kernel.
Comment 9 Tasev 2015-12-14 09:16:21 UTC
Created attachment 120492 [details]
GPU crash dump kernel 4.4-rc5
Comment 10 Tasev 2015-12-22 10:00:33 UTC
Hi

Just tested 4.4-rc6; no change.

To check that the problem is not in Mesa, I purged the oibaf PPA and
reverted to Mesa 10.1 stable, with no luck.

But reverting drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
still fixes the hang.

Once, after a GPU hang, I noticed this in dmesg (I don't know if it is important):

[drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe C

I also found that switching VT with Ctrl+Alt+F1 has not worked since
4.4-rc1: I just get a black screen with nothing displayed on it, although typing
blind (e.g. sudo reboot) still works. I suppose this is a separate issue.
Comment 11 Tasev 2016-01-04 16:38:17 UTC
Hi

Happy new year and my best wishes to the whole Intel team.

The bug is no longer present as of the 4.4-rc7 kernel.
I haven't experienced a single crash in the past 7 days.
I have just started using the 4.4-rc8 kernel; no crash so far.
Comment 12 Julian Andres Klode 2016-01-18 08:26:08 UTC
I have the same issue in both 4.4-rc7 and 4.4-rc8
Comment 13 Alexey Kharlamov 2016-01-21 14:55:30 UTC
Same problem here, on the 4.4 release kernel.

Graphics:  Card: Intel Haswell-ULT Integrated Graphics Controller
Display Server: X.org 1.17.4 driver: intel tty size: 142x35 Advanced Data: N/A for root

When a GPU hang occurs, X comes up slower than usual and the ttys break.
Comment 14 Alexey Kharlamov 2016-01-21 14:56:37 UTC
Created attachment 121182 [details]
gpu dump
Comment 15 Tasev 2016-01-21 18:53:40 UTC
Hi

Just to say that for me the problem has been fixed since the 4.4-rc7 kernel.
I have not had a GPU crash for more than 3 weeks now (Ivybridge graphics).

I opened a separate bug report for the tty problem here:
https://bugs.freedesktop.org/show_bug.cgi?id=93483
Comment 16 Alexey Kharlamov 2016-01-21 21:18:22 UTC
Created attachment 121196 [details]
dmesg with another hang
Comment 17 Alexey Kharlamov 2016-01-21 21:18:50 UTC
Created attachment 121197 [details]
/sys/class/drm/card0/error with that hang
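
Editor's note: the error state attached above comes from the sysfs path named in the attachment title. A copy for attaching to a report could be taken along these lines. This is a minimal sketch: the function name `save_error_state` is made up for illustration, the card index `card0` is taken from the attachment title and may differ on other systems, and reading the file typically requires root.

```python
from pathlib import Path

# Path quoted in the attachment title of comment 17; the card index is
# system-specific (assumption: card0 is the Intel GPU).
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(dst, src=ERROR_STATE):
    """Copy the GPU error state to dst and return the number of bytes copied.

    Returns 0 without writing anything if the source file does not exist.
    """
    src = Path(src)
    if not src.exists():
        return 0
    data = src.read_bytes()
    Path(dst).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    copied = save_error_state("gpu-error-state.txt")
    print(f"copied {copied} bytes")
```

The `src` parameter exists mainly so the function can be pointed at a different card or exercised against an ordinary file.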
Comment 18 yann 2016-05-20 09:41:07 UTC
*** Bug 94124 has been marked as a duplicate of this bug. ***
Comment 19 yann 2016-09-20 15:13:23 UTC
(In reply to Alexey Kharlamov from comment #17)
> Created attachment 121197 [details]
> /sys/class/drm/card0/error with that hang

Alexey, improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel & Mesa to see if this issue still occurs.
Comment 20 yann 2016-11-14 15:07:05 UTC
(In reply to yann from comment #19)
> (In reply to Alexey Kharlamov from comment #17)
> > Created attachment 121197 [details]
> > /sys/class/drm/card0/error with that hang
> 
> Alexey, improvements have been pushed to the kernel and Mesa that should
> benefit your system, so please re-test with the latest kernel & Mesa to see
> if this issue still occurs.

Timeout. Assuming that it is fixed by now. If that is not the case, please re-test with the latest kernel & Mesa (12-13) to see if this issue still occurs, since improvements have been pushed to both that should benefit your system.

