Bug 93842 - [SNB] GPU HANG: ecode 6:0:0x91beffb8, in zandronum [3531], reason: Ring hung, action: reset
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-24 19:35 UTC by main.haarp
Modified: 2019-09-25 18:55 UTC

See Also:
i915 platform: SNB
i915 features: GPU hang


Attachments
xz'ed crash dump (48.22 KB, text/plain)
2016-01-24 19:35 UTC, main.haarp
gpu crash dump (xz) (87.98 KB, application/x-xz)
2016-06-10 19:07 UTC, main.haarp

Description main.haarp 2016-01-24 19:35:10 UTC
Created attachment 121246
xz'ed crash dump

On a Thinkpad W520 using the Intel GPU, playing Zandronum (gzdoom) with the (graphically intensive) Brutal Doom mod, every 5-15 minutes, a GPU hang occurs.

Usually, the GPU gets reset properly. Sometimes, however, it freezes the entire system.

Linux kernel 4.4.0 (64-bit), mesa-11.0.6, xorg-server-1.17.4, xf86-video-intel-2.99.917. No i915 cmdline parameters are in use. TearFree is enabled.


The hang was also observed on kernel 4.1.12.


[drm] stuck on render ring
[drm] GPU HANG: ecode 6:0:0x91beffb8, in zandronum [3531], reason: Ring hung, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
drm/i915: Resetting chip after gpu hang

...

[drm] stuck on render ring
[drm] GPU HANG: ecode 6:0:0x85fffffc, in zandronum [3531], reason: Ring hung, action: reset
drm/i915: Resetting chip after gpu hang


I am capable of testing kernel and userspace patches.
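
For anyone reproducing this, a minimal sketch of capturing the error state as an xz file like the one attached here. The path is the one printed in the kernel log above; the function name and the destination filename are illustrative assumptions, and reading the file usually requires root.

```python
import lzma
from pathlib import Path

# Path as printed in the kernel log; the card index may differ on other systems.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_crash_dump(dest, source=ERROR_STATE):
    """Copy the i915 error state to `dest` as an xz-compressed file.

    Returns the number of uncompressed bytes written.
    """
    total = 0
    # Stream in chunks: sysfs files do not report a reliable size up front.
    with open(Path(source), "rb") as src, lzma.open(Path(dest), "wb") as out:
        while True:
            chunk = src.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

Calling `save_crash_dump("gpu-crash-dump.xz")` right after a hang should produce a file suitable for attaching. The dump should be captured before the next hang or reboot.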
Comment 1 main.haarp 2016-01-31 11:58:37 UTC
I can confirm that TearFree is not the culprit.
Comment 2 main.haarp 2016-06-10 19:07:59 UTC
Created attachment 124455
gpu crash dump (xz)

Still happening on kernel 4.5.7
Comment 3 main.haarp 2016-07-10 18:35:12 UTC
4.6.3, no change.
Comment 4 main.haarp 2016-07-23 17:24:22 UTC
For testing purposes, I tried the modesetting driver instead of xf86-video-intel. The issue still appears!

This leads me to conclude that this is indeed a kernel bug.
Comment 5 Matt Turner 2016-07-23 18:47:08 UTC
We suspect a Mesa problem is the culprit. Do you know of a version that didn't exhibit this problem, and if so can you try to bisect the commit?
Comment 6 main.haarp 2016-07-31 17:08:44 UTC
(In reply to Matt Turner from comment #5)
> We suspect a Mesa problem is the culprit. Do you know of a version that
> didn't exhibit this problem, and if so can you try to bisect the commit?

Thanks, this seems correct! I can (mostly) confirm that mesa-10.3.7 is good and 11.0.6 bad.

I've been bisecting for a week now, but I'm not making much progress. It's hard to trigger the bug deliberately, as it's a statistical event. All I can do is play a game and hope it shows up (or play for a few hours when it doesn't, to make sure it's *really* not present).
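
A rough way to estimate how long a hang-free session needs to be before marking a commit good. This is only a sketch: it assumes hangs on a bad build arrive as a Poisson process, which is an assumption and not something measured here, and it uses 15 minutes (the slow end of the 5-15 minute range from the description) as the mean interval. Function names are illustrative.

```python
import math


def miss_probability(play_minutes, mean_interval_minutes=15.0):
    """Probability that a bad build shows no hang during `play_minutes`.

    Assumes a Poisson process for hangs (an assumption, not a measurement).
    """
    if play_minutes < 0 or mean_interval_minutes <= 0:
        raise ValueError("play time must be >= 0 and mean interval > 0")
    return math.exp(-play_minutes / mean_interval_minutes)


def minutes_needed(max_miss_probability, mean_interval_minutes=15.0):
    """Hang-free play time after which a bad build is mislabelled as good
    with at most `max_miss_probability`, under the same assumption."""
    if not 0 < max_miss_probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return -mean_interval_minutes * math.log(max_miss_probability)
```

Under these assumptions, `minutes_needed(0.01)` comes out at roughly 69 minutes per bisect step. If the hang rate depends on the level or scene being played, the real figure could be considerably higher.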
Comment 7 main.haarp 2016-07-31 17:13:00 UTC
35a77a148f8b7ef03fe3b31d63719e0bfdf4b783 may be a possible culprit, but this may be incorrect.

Due to the difficulty in triggering the bug, I may have inadvertently marked a bad bisect as good.
Comment 8 Matt Turner 2016-07-31 22:57:20 UTC
(In reply to main.haarp from comment #7)
> 35a77a148f8b7ef03fe3b31d63719e0bfdf4b783 may be a possible culprit, but this
> may be incorrect.
> 
> Due to the difficulty in triggering the bug, I may have inadvertently marked
> a bad bisect as good.

If that bisect is correct (I have some doubts), setting INTEL_DEBUG=nocompact should avoid the issue. If it does, then we should be able to debug it quite easily from there.
Comment 9 Matt Turner 2016-07-31 22:59:51 UTC
(In reply to Matt Turner from comment #8)
> If that bisect is correct (I have some doubts), setting
> INTEL_DEBUG=nocompact should avoid the issue. If it does, then we should be
> able to debug it quite easily from there.

Hm. I see I only added that environment variable in time for 11.1.

Try putting an unconditional return at the top of the brw_compact_instructions() function in src/mesa/drivers/dri/i965/brw_eu_compact.c. That will have the same effect (disabling instruction compaction) as the environment variable would have.
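
For Mesa 11.1 or later, where the variable exists, a small sketch of launching the game with compaction disabled. The helper names are hypothetical; the `zandronum` binary name is taken from the kernel log, and the sketch assumes INTEL_DEBUG takes a comma-separated list of flags. On 11.0.x this has no effect and the source edit described above is needed instead.

```python
import os
import subprocess


def env_with_intel_debug(flag, base_env=None):
    """Return a copy of the environment with `flag` added to INTEL_DEBUG.

    Flags already present in INTEL_DEBUG are kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("INTEL_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    env["INTEL_DEBUG"] = ",".join(flags)
    return env


def run_without_compaction(command):
    """Run `command` (a list of arguments) with INTEL_DEBUG=nocompact.

    Returns the exit code of the process.
    """
    return subprocess.call(command, env=env_with_intel_debug("nocompact"))
```

For example, `run_without_compaction(["zandronum"])` would start the game with instruction compaction turned off, assuming the binary is on PATH.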
Comment 10 Matt Turner 2016-11-03 22:11:06 UTC
Presumably this is still a problem with mesa-13.0.0?
Comment 11 main.haarp 2017-11-24 19:36:50 UTC
Sorry for the radio silence. I do not have any SNB hardware anymore, so I'm unable to test this further.

I cannot find 'abandoned' as an option for this bug report, so please close at your discretion. Thank you for your time!
Comment 12 Danylo 2018-06-15 09:05:15 UTC
Was not able to reproduce with:

Zandronum 3.0
Brutal Doom v21
Mesa 18.1.1, 11.0.6
Kernel 4.14
Sandy Bridge CPU

Played several levels of Doom 2 and deathmatch. 18.1.1 works perfectly; in 11.0.6 there is terrible flickering but no hangs.
Comment 13 GitLab Migration User 2019-09-25 18:55:57 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1510.

