Bug 91730 - [ivb blorp?] GPU hang in witcher.EXE
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-23 08:20 UTC by Tim Allen
Modified: 2019-09-25 18:54 UTC

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
Contents of /sys/class/drm/card0/error as requested (2.05 MB, text/plain)
2015-08-23 08:20 UTC, Tim Allen

Description Tim Allen 2015-08-23 08:20:51 UTC
Created attachment 117869
Contents of /sys/class/drm/card0/error as requested

I wish I had some easy way to reproduce this with some widely-available software, but I'm not sure what would count.

The setup:

- I'm trying to play The Witcher Enhanced Edition
- Running under Wine 1.7.49
- Mesa 10.6.3
- Xorg 2.99.917
- Linux 4.1.0
- Debian Testing
- Intel Ivy Bridge i5-3570K CPU, with HD4000 graphics

Steps to reproduce (on my machine):

- Start the game
- Load my usual saved game
- Sometimes, once the initial loading screen's progress bar reaches the 99% mark, the hard-drive light stops flickering, the progress bar pauses for 10 seconds or so, and then the game crashes to the desktop.
- Sometimes the game loads properly, but later I'll be moving from one zone to another and *that* loading screen will pause and crash.
- Generally, I can't play the game for more than 5-10 minutes before a crash occurs.
- This same problem also occurs with another game called Path of Exile, in a similar fashion: every loading screen has some percentage chance to pause and crash.

Effects:
- When I start the game from a terminal, after a crash it displays the following message:

    intel_do_flush_locked failed: Input/output error

- /var/log/messages receives this message:

[  807.367059] [drm] stuck on render ring
[  807.367363] [drm] GPU HANG: ecode 7:0:0x85fffff8, in witcher.EXE [4865], reason: Ring hung, action: reset
[  807.367364] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  807.367364] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  807.367365] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  807.367365] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  807.367366] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  807.369412] drm/i915: Resetting chip after gpu hang
[  813.363894] [drm] stuck on render ring
[  813.364243] [drm] GPU HANG: ecode 7:0:0x85ffbff8, in witcher.EXE [4865], reason: Ring hung, action: reset
[  813.366337] drm/i915: Resetting chip after gpu hang
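
For anyone comparing several hangs, the "GPU HANG" lines above can be pulled out of a saved kernel log with a short script. This is only a minimal sketch: the regular expression assumes the exact message layout quoted above (other kernel versions may word it differently), and the names `HANG_RE`, `parse_gpu_hangs` and `SAMPLE` are made up for illustration.

```python
import re

# Assumption: the i915 message layout is exactly as quoted in this report.
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>\S+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hangs(log_text):
    """Return one dict per 'GPU HANG' line found in log_text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            record = match.groupdict()
            record["time"] = float(record["time"])  # seconds since boot
            record["pid"] = int(record["pid"])
            hangs.append(record)
    return hangs


# Three lines copied from the log above; only two of them are hang records.
SAMPLE = """\
[  807.367059] [drm] stuck on render ring
[  807.367363] [drm] GPU HANG: ecode 7:0:0x85fffff8, in witcher.EXE [4865], reason: Ring hung, action: reset
[  813.364243] [drm] GPU HANG: ecode 7:0:0x85ffbff8, in witcher.EXE [4865], reason: Ring hung, action: reset
"""

if __name__ == "__main__":
    for hang in parse_gpu_hangs(SAMPLE):
        print(hang["time"], hang["ecode"], hang["process"], hang["reason"])
```

Listing the ecode values side by side makes it easy to see that the two hangs above report different codes (0x85fffff8 and 0x85ffbff8); the script does not attempt to interpret them.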

- If I restart the game and try again, it crashes again (usually sooner). After a few iterations the console stops responding to anything but the Alt-SysRq keys, and I have to reboot, although I can still SSH in from another computer and everything there seems to be running fine.

Because I'm using Debian Testing, it's a bit difficult to say when this started. My recollection is that everything worked fine under Mesa 10.4, but these errors started appearing after Debian upgraded to Mesa 10.5. Of course, Debian was changing all kinds of stuff around that time, so it might have been anything.
Comment 1 Pavel Ondračka 2015-09-29 14:19:04 UTC
I can reproduce similar behavior in Total War: Shogun2.
The GPU hangs and "intel_do_flush_locked failed: Input/output error" is printed in the terminal. Hopefully it is the same bug; for me, however, it is 100% reproducible.

I've managed to bisect it to:
commit f5cf74d8ba8ce30b9d53b2198e5122ed72f1dcff
Author: Matt Turner <mattst88@gmail.com>
Date:   Tue May 5 20:25:07 2015 -0700

    nir: Recognize (a < c || b < c) as min(a, b) < c.
    
    ... and (a >= c) || (b >= c) as max(a, b) >= c.
    
    Similar to commit 97e6c1b9.
    
    total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
    instructions in affected programs:     6400 -> 6304 (-1.50%)
    helped:                                68
    HURT:                                  4
    
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
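
For readers unfamiliar with the rewrite named in the commit subject, here is a minimal sketch of the two algebraic identities, checked by brute force in plain Python. It is not Mesa's NIR code, the function names are invented for illustration, and it only covers ordinary finite values (NaN handling is deliberately left out). It says nothing about why the hang occurs.

```python
import itertools


def or_of_less_than(a, b, c):
    """Original form: (a < c) || (b < c)."""
    return a < c or b < c


def min_less_than(a, b, c):
    """Rewritten form: min(a, b) < c."""
    return min(a, b) < c


def or_of_greater_equal(a, b, c):
    """Original form: (a >= c) || (b >= c)."""
    return a >= c or b >= c


def max_greater_equal(a, b, c):
    """Rewritten form: max(a, b) >= c."""
    return max(a, b) >= c


def forms_agree(values):
    """Check both identities for every (a, b, c) triple drawn from values."""
    for a, b, c in itertools.product(values, repeat=3):
        if or_of_less_than(a, b, c) != min_less_than(a, b, c):
            return False
        if or_of_greater_equal(a, b, c) != max_greater_equal(a, b, c):
            return False
    return True


# Assumption: a handful of finite floats is enough to illustrate the identity.
FINITE_VALUES = [-2.0, -1.0, 0.0, 0.5, 1.0, 3.0]

if __name__ == "__main__":
    print(forms_agree(FINITE_VALUES))
```

Each rewritten form replaces two comparisons and a logical OR with one min or max and a single comparison.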

Tim, can you confirm this with your test cases?

It can be reproduced with this trace: http://pavel.ondracka.cz/Shogun2.trace

My system info:
GPU: Mesa DRI Intel(R) HD Graphics 5500 (Broadwell GT2)
Mesa: c0722be9f58ef89dae98d8c459ec4f9589f97748
kernel: 4.1.7-200.fc22.x86_64
libdrm: 94ecdcb8b11dd3eb6b047ad72030d775014aadee
xf86-video-intel: 679ee12079a7d2682d41506b81973c7c7d4fa1d8 (sna+dri3)
Comment 2 Matt Turner 2015-10-02 00:29:55 UTC
(In reply to Pavel Ondračka from comment #1)
> I can reproduce a similar behavior in Total War: Shogun2.
> GPU hangs and "intel_do_flush_locked failed: Input/output error" is printed
> in the terminal. Hopefully it is the same bug, however for me this is 100%
> reproducible.

Why do you believe it's the same bug? Are Shogun2 and Witcher based on the same engine or something?

> I've managed to bisect it to:
> commit f5cf74d8ba8ce30b9d53b2198e5122ed72f1dcff
> Author: Matt Turner <mattst88@gmail.com>
> Date:   Tue May 5 20:25:07 2015 -0700
> 
>     nir: Recognize (a < c || b < c) as min(a, b) < c.
>     
>     ... and (a >= c) || (b >= c) as max(a, b) >= c.
>     
>     Similar to commit 97e6c1b9.
>     
>     total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
>     instructions in affected programs:     6400 -> 6304 (-1.50%)
>     helped:                                68
>     HURT:                                  4
>     
>     Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>     Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
> 
> Tim can you confirm this with your testcases?
> 
> It can be reproduced with this trace: http://pavel.ondracka.cz/Shogun2.trace

I couldn't get a hang by running this trace on Haswell, but I did confirm that 3 of the shaders were affected (hurt, actually...) by the patch you bisected the problem to.

I'll see if I can get someone to run the trace on BDW, and I'll look more closely at the differences in the shaders to see if we're doing something suspicious.
Comment 3 Pavel Ondračka 2015-10-02 06:26:40 UTC
(In reply to Matt Turner from comment #2)
> (In reply to Pavel Ondračka from comment #1)
> > I can reproduce a similar behavior in Total War: Shogun2.
> > GPU hangs and "intel_do_flush_locked failed: Input/output error" is printed
> > in the terminal. Hopefully it is the same bug, however for me this is 100%
> > reproducible.
> 
> Why do you believe it's the same bug? Are Shogun2 and Witcher based on the
> same engine or something?
> 
 
Well, I found this bug when I searched for "intel_do_flush_locked failed: Input/output error" in Bugzilla, and the symptoms looked very similar to me: same error, a GPU hang, also a regression, and both running under Wine.
If this is not enough, I'll open a new bug.
Comment 4 Matt Turner 2015-10-02 07:41:53 UTC
(In reply to Pavel Ondračka from comment #3)
> (In reply to Matt Turner from comment #2)
> > (In reply to Pavel Ondračka from comment #1)
> > > I can reproduce a similar behavior in Total War: Shogun2.
> > > GPU hangs and "intel_do_flush_locked failed: Input/output error" is printed
> > > in the terminal. Hopefully it is the same bug, however for me this is 100%
> > > reproducible.
> > 
> > Why do you believe it's the same bug? Are Shogun2 and Witcher based on the
> > same engine or something?
> > 
>  
> Well, I found this bug when I searched for "intel_do_flush_locked failed:
> Input/output error" in bugzilla, and the symptoms looked very similar to me.
> Same error, GPU hang, also regression, both running under wine...
> If this is not enough, I'll open a new bug.

That's the error /whenever/ there's a GPU hang. It's not more specific than that. You should open a new bug. :)
Comment 5 Pavel Ondračka 2015-10-02 07:57:28 UTC
(In reply to Matt Turner from comment #4)
> That's the error /whenever/ there's a GPU hang. It's not more specific than
> that. You should open a new bug. :)

OK, didn't know that. I've opened a new bug 92234, and added you to CC.

Tim, I'm sorry about spamming your bug.
Comment 6 GitLab Migration User 2019-09-25 18:54:31 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1491.

