Bug 89220 - [ilk] GPU hang on pageflip
Summary: [ilk] GPU hang on pageflip
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2015-02-19 09:11 UTC by Phil Armstrong
Modified: 2016-11-18 13:45 UTC
CC List: 1 user

See Also:
i915 platform: ILK
i915 features: GPU hang


Attachments
Contents of /sys/class/drm/card0/error (229.21 KB, application/gzip)
2015-02-19 09:11 UTC, Phil Armstrong
no flags
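The attachment above is a gzip-compressed copy of /sys/class/drm/card0/error. A minimal Python sketch for producing such a file follows; the output filename is a hypothetical default, and reading the error file typically requires root.

```python
import gzip
import shutil


def save_error_state(src: str = "/sys/class/drm/card0/error",
                     dst: str = "i915_error_state.gz") -> str:
    """Copy the i915 GPU error state into a gzip file suitable for attaching.

    The default destination name is a hypothetical choice; any name works.
    """
    # Stream the data instead of reading it all at once: the dump can be large
    # (the one attached to this report is about 229 KB compressed).
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `save_error_state()` as root right after a hang writes `i915_error_state.gz` to the current directory, ready to attach to a new bug.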

Description Phil Armstrong 2015-02-19 09:11:36 UTC
Created attachment 113655
Contents of /sys/class/drm/card0/error

The GPU hangs at gnome-shell startup; the kernel DRM driver resets the chip and then everything seems to carry on fine after that.

$ dmesg
...
[   31.021308] [drm] stuck on render ring
[   31.025961] [drm] GPU HANG: ecode 5:0:0xfdffffdd, in gnome-shell [2833], reason: Ring hung, action: reset
[   31.025965] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   31.025966] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   31.025967] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   31.025968] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   31.025969] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   31.041341] drm/i915: Resetting chip after gpu hang
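When comparing hangs across kernels or reports, it can help to pull the key fields out of the "GPU HANG" line. The Python sketch below is written against the exact line format shown in the log above and may not match other kernel versions. Reading the first two ecode fields as GPU generation and ring index is an assumption here; it happens to fit ILK (gen 5) hanging on the render ring.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Regex for the "GPU HANG" line as printed in this report's dmesg output.
# Assumption: ecode is "<gen>:<ring>:0x<hex code>".
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):0x(?P<code>[0-9a-fA-F]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


@dataclass
class GpuHang:
    gen: int        # assumed GPU generation
    ring: int       # assumed ring index (0 = render)
    ecode: int      # raw hex error code, left uninterpreted
    process: str
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed fields, or None if the line is not a GPU HANG line."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        gen=int(m["gen"]),
        ring=int(m["ring"]),
        ecode=int(m["code"], 16),
        process=m["comm"],
        pid=int(m["pid"]),
        reason=m["reason"],
        action=m["action"],
    )
```

Lines that do not match (such as "stuck on render ring") return None, so the function can be run over a whole dmesg dump.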

I see this bug both with the stock Debian kernel (a patched 3.16) and with 3.19.

$ uname -a
Linux bill 3.19.0-trunk-amd64 #1 SMP Debian 3.19-1~exp1 (2015-02-12) x86_64 GNU/Linux

Upgrading to the latest userspace driver doesn't seem to have helped, although I haven't tried compiling from git yet:

$ dpkg -s xserver-xorg-video-intel
Version: 2:2.99.917-1~exp1
Comment 1 Chris Wilson 2015-02-19 20:47:37 UTC
The actual cause of the hang is a little dubious, as there is no command barrier between the last batch and the dying MI_FLUSH, i.e. either the batch or the flip itself may be causing the hang. However, since the batch itself is not recorded (its seqno convinced hangcheck that it was not at fault; be suspicious!), all we have to ponder is whether the flip could have died.
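Comment 1 hinges on how hangcheck decides what is at fault: a request whose seqno the GPU has already written is treated as complete, so it is neither blamed nor recorded. The Python sketch below is a simplified, hypothetical model of that rule, not the i915 implementation. Only the wraparound-safe comparison is modeled on the kernel's `i915_seqno_passed()` helper; `Request`, `first_incomplete()`, and the seqno values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


def seqno_passed(seq1: int, seq2: int) -> bool:
    """True if seq1 is at or after seq2, tolerating 32-bit wraparound.

    Same idea as the kernel's (s32)(seq1 - seq2) >= 0.
    """
    diff = (seq1 - seq2) & 0xFFFFFFFF
    return diff < 0x80000000


@dataclass
class Request:
    name: str   # hypothetical label, e.g. "batch" or "flip"
    seqno: int


def first_incomplete(requests: List[Request], hw_seqno: int) -> Optional[Request]:
    """Return the oldest request whose seqno the GPU has not yet written.

    In this model that request is the one hangcheck would blame; anything
    whose seqno has already passed is treated as finished and innocent.
    """
    for req in requests:
        if not seqno_passed(hw_seqno, req.seqno):
            return req
    return None
```

With the GPU's last written seqno at 100, `first_incomplete([Request("batch", 100), Request("flip", 101)], 100)` returns the flip. That mirrors the situation described above: the batch looks finished, so without a barrier separating it from the flip, its innocence cannot actually be confirmed.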
Comment 2 yann 2016-09-29 12:54:33 UTC
Improvements have been pushed to the kernel and Mesa that will benefit your system, so please re-test with the latest kernel and Mesa to see if this issue still occurs.
Comment 3 yann 2016-11-18 13:45:08 UTC
(In reply to yann from comment #2)
> Improvements have been pushed to the kernel and Mesa that will benefit your
> system, so please re-test with the latest kernel and Mesa to see if this
> issue still occurs.

Timeout. Assuming this is fixed by now. If it is not, please re-test with the latest kernel and Mesa (12 or 13), since improvements have been pushed to both that will benefit your system, and file a new bug if the issue still occurs.

