Bug 111204 - [drm] GPU HANG: ecode 9:0:0x85dffffb, in game [27732], reason: Hang on rcs0, action: reset
Summary: [drm] GPU HANG: ecode 9:0:0x85dffffb, in game [27732], reason: Hang on rcs0, ...
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-23 19:53 UTC by Bill Grupp
Modified: 2019-09-25 20:33 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Kernel Log (103.43 KB, text/plain)
2019-07-23 19:53 UTC, Bill Grupp
card0/error log (29.51 KB, text/plain)
2019-07-23 19:54 UTC, Bill Grupp
Another card0/error (31.48 KB, text/plain)
2019-07-25 22:12 UTC, Bill Grupp

Description Bill Grupp 2019-07-23 19:53:48 UTC
Created attachment 144862
Kernel Log

[330214.862248] [drm] GPU HANG: ecode 9:0:0x85dffffb, in game [27732], reason: Hang on rcs0, action: reset
[330214.862260] i915 0000:00:02.0: Resetting rcs0 after gpu hang
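
For anyone collecting these events from many machines, the hang line above can be split into fields with a small parser. This is a minimal sketch, not part of the original report: the regular expression is written against the single line quoted here, the field names (`ts`, `ecode`, `comm`, `pid`, `reason`, `action`) are hypothetical labels, and other kernel versions may format the message differently.

```python
import re

# Pattern written against the one log line quoted in this report;
# other kernel versions may word the message differently (assumption).
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = float(fields["ts"])   # seconds since boot
    fields["pid"] = int(fields["pid"])   # PID of the submitting process
    return fields


line = ("[330214.862248] [drm] GPU HANG: ecode 9:0:0x85dffffb, in game [27732], "
        "reason: Hang on rcs0, action: reset")
info = parse_hang(line)
```

Comparing the `ecode` value across captures is one way to check whether separate hangs look like the same failure.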


We have seen this a few times but have only been able to get the log once. 

Logs attached.
thanks.
Comment 1 Bill Grupp 2019-07-23 19:54:40 UTC
Created attachment 144863
card0/error log
Comment 2 Lakshmi 2019-07-25 07:44:47 UTC
Considering this a Mesa issue; changing the product to Mesa.

batch (rcs0 (submitted by game [27732], ctx 1 [4], score 0)) at 0x00000000_015de000
Bad count in PIPE_CONTROL
0x015de000:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x015de004:      0x00105021:    destination address
0x015de008:      0x00005000:    immediate dword low
0x015de00c:      0x00000000:    immediate dword high
Bad length 19 in STATE_BASE_ADDRESS, expected 6-10
0x015de018:      0x61010011: STATE_BASE_ADDRESS
Bad count in STATE_BASE_ADDRESS
0x015de01c:      0x00000041:    general state base address 0x00000040
0x015de020:      0x00000000:    surface state base not updated
0x015de024:      0x00040000:    indirect state base not updated
0x015de028:      0x00165041:    general state upper bound 0x00165040
0x015de02c:      0x00000000:    indirect state upper bound not updated
Bad count in PIPE_CONTROL
0x015de064:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x015de068:      0x00000c04:    destination address
0x015de06c:      0x00000000:    immediate dword low
0x015de070:      0x00000000:    immediate dword high
0x015de07c:      0x79000002: 3DSTATE_DRAWING_RECTANGLE
0x015de080:      0x00000000:    top left: 0,0
0x015de084:      0x0437077f:    bottom right: 1919,1079
0x015de088:      0x00000000:    origin: 0,0
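
The header dwords in the dump above can be checked by hand. The sketch below is illustrative and not from the report: it assumes the usual Intel render command header layout (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 holding the total dword count minus two). The helper names are hypothetical.

```python
def decode_header(dword):
    """Split a command header dword into fields (layout is an assumption)."""
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "subopcode": (dword >> 16) & 0xFF,
        # The length field is assumed to store (total dwords - 2).
        "total_dwords": (dword & 0xFF) + 2,
    }


def next_offset(offset, dword):
    """Address of the command that follows the one starting at `offset`."""
    return offset + 4 * decode_header(dword)["total_dwords"]


def decode_xy(dword):
    """Unpack a packed coordinate: x in the low 16 bits, y in the high 16."""
    return dword & 0xFFFF, dword >> 16


# Values copied from the dump in this comment.
pipe_control = decode_header(0x7A000004)
state_base = decode_header(0x61010011)
bottom_right = decode_xy(0x0437077F)
```

Under that assumption, each decoded length lands exactly on the address of the next command in the dump (0x015de000 to 0x015de018 to 0x015de064 to 0x015de07c), and the 19 dwords computed for STATE_BASE_ADDRESS match the "Bad length 19" message. This suggests the "Bad count" and "Bad length" lines may reflect the decoder expecting an older command layout rather than a malformed batch, but that is an inference and is not confirmed anywhere in this report.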
Comment 3 Lionel Landwerlin 2019-07-25 07:49:57 UTC
Could you add the mesa version you're running?
Thanks!
Comment 4 Bill Grupp 2019-07-25 14:42:19 UTC
I believe it's 18.2.8


Bill.
Comment 5 Lionel Landwerlin 2019-07-25 15:54:26 UTC
(In reply to Bill Grupp from comment #4)
> I believe it's 18.2.8
> 
> 
> Bill.

I would really recommend switching to 19.1.1.
For Coffeelake I would upgrade the kernel too, 5.0 maybe?
Comment 6 Bill Grupp 2019-07-25 22:12:50 UTC
Created attachment 144870
Another card0/error

Captured the gpu hang a second time.
Comment 7 Bill Grupp 2019-07-25 22:17:40 UTC
(In reply to Lionel Landwerlin from comment #5)
> (In reply to Bill Grupp from comment #4)
> > I believe it's 18.2.8
> > 
> > 
> > Bill.
> 
> I would really recommend switching to 19.1.1.
> For Coffeelake I would upgrade the kernel too, 5.0 maybe?

We can work on trying to upgrade. Is there any specific issue that can be associated with the output from the error log?

This would help since we don't have a known way of reproducing the issue (other than waiting for it to happen again).

thanks,
Bill
Comment 8 Denis 2019-07-29 09:10:26 UTC
The name of the game and approximate steps to reproduce (game settings) would also be helpful. I see you are running Ubuntu, so you can easily update the kernel version with the ukuu app.
Comment 9 Bill Grupp 2019-07-29 19:32:48 UTC
Hi Denis, 
This is an embedded system for an arcade-style game. We don't know the exact steps to reproduce it. We have only seen it happen a few times in the past 5 weeks, and we have only been able to capture the log twice. We think it might be related to displaying a large font glyph, but only because that is what was on the screen at the time of the error. The error does not happen every time that screen is shown; the system has displayed that same screen for several weeks without hitting the error.
We were hoping the decode of the error log would indicate something that would lead us to a better way to reproduce it.

Upgrading the OS is not quite as simple as running ukuu for our system. 

thanks,
Bill
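
Since the hang is rare and the error state has only been captured twice, one option is to save it automatically whenever it appears. The following is a rough sketch under stated assumptions, not something proposed in this thread: it assumes the error state is readable at /sys/class/drm/card0/error (the file the attachments here are named after), that an empty state is reported with the text "no error state collected" (compared case-insensitively), and that the script has permission to read the file. The function name and output file naming are hypothetical.

```python
import time
from pathlib import Path

# Assumed marker text for "nothing captured"; compared case-insensitively.
NO_ERROR_MARKER = "no error state collected"


def save_error_state(src, dest_dir, now=None):
    """Copy the error state at `src` into `dest_dir` if one is present.

    Returns the path written, or None when there is nothing to save.
    `now` is an optional epoch timestamp used for the file name.
    """
    text = Path(src).read_text(errors="replace")
    if not text.strip() or NO_ERROR_MARKER in text.lower():
        return None
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(text)
    return dest


# Example polling loop (not run here); path and interval are assumptions:
#
# while True:
#     save_error_state("/sys/class/drm/card0/error", "/var/log/gpu-hangs")
#     time.sleep(60)
```

As written, the loop would save the same error state again on every poll until it is cleared, so a real deployment would want to skip duplicates.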
Comment 10 GitLab Migration User 2019-09-25 20:33:57 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1821.

