Bug 106022 - [snb] GPU HANG: ecode 6:0:0x85fffffc, in neverball
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 18.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-13 05:24 UTC by Alexei
Modified: 2018-09-20 12:22 UTC

See Also:
i915 platform:
i915 features:


Attachments
/sys/class/drm/card0/error (85.34 KB, text/plain)
2018-04-13 05:24 UTC, Alexei
/sys/class/drm/card0/error with mesa 17.3.7 (84.99 KB, text/plain)
2018-04-14 19:28 UTC, Alexei
/sys/class/drm/card0/error, mesa 18.0.0-3, openarena.x86_64 (59.64 KB, text/plain)
2018-04-15 16:09 UTC, Alexei

Description Alexei 2018-04-13 05:24:46 UTC
Created attachment 138817
/sys/class/drm/card0/error

I have been getting this exact lockup in at least a couple of games on two different Intel graphics based machines. Both are fairly generic Arch installations.
The lockup lasted for a minute, and then the machine became responsive again.

uname -r: 4.15.15-1-ARCH

dmidecode:
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Shuttle Inc.
	Product Name: FH61V
	Version: 1.0
	Serial Number: 4015
	Asset Tag: To be filled by O.E.M.
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: To be filled by O.E.M.
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0

My mesa version seems to be 18.0.0-2 (for the Arch package), and output of glxinfo shows:
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.0.0
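
For anyone triaging similar reports, the hang signature from the bug title can be split into its fields. This is only a minimal sketch, assuming the i915 message layout `ecode <gen>:<engine>:<code>`; the helper `parse_ecode` and the field names are hypothetical, and the exact meaning of the middle field may differ between kernel versions.

```python
import re
from typing import NamedTuple


class Ecode(NamedTuple):
    gen: int     # GPU generation (6 would match Sandy Bridge, "snb")
    engine: int  # engine identifier as reported by the kernel (assumed meaning)
    code: int    # 32-bit signature value


# Matches e.g. "ecode 6:0:0x85fffffc" anywhere in a log line or bug title.
_ECODE_RE = re.compile(r"ecode (\d+):(\d+):0x([0-9a-fA-F]{8})")


def parse_ecode(message: str) -> Ecode:
    """Extract the three ecode fields from a GPU HANG message."""
    match = _ECODE_RE.search(message)
    if match is None:
        raise ValueError(f"no ecode found in: {message!r}")
    gen, engine, code = match.groups()
    return Ecode(int(gen), int(engine), int(code, 16))


print(parse_ecode("[snb] GPU HANG: ecode 6:0:0x85fffffc, in neverball"))
```

Comparing the parsed fields across reports is one way to check whether two hangs share a signature, although a matching code alone does not prove the same root cause.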
Comment 1 Elizabeth 2018-04-13 14:18:14 UTC
Hello Alexei, what desktop/WM are you using? Did this game work properly with a previous mesa version?
Comment 2 Alexei 2018-04-13 16:15:20 UTC
(In reply to Elizabeth from comment #1)
> Hello Alexei, what desktop/WM are you using? Did this game work properly
> with a previous mesa version?

Hi Elizabeth.
I run xfce4 on the machine the crash report originated from. This is a quite recent Arch install, so the oldest mesa version I had on this machine was 17.3.5. I can try to test with that; I'm not sure how far back I need to go.
Comment 3 Alexei 2018-04-14 19:28:05 UTC
Created attachment 138839
/sys/class/drm/card0/error with mesa 17.3.7

I've restored mesa 17.3.7 (Arch package: mesa-17.3.7-1), which I had before the last system upgrade, and the lockup still happened.
Comment 4 Lionel Landwerlin 2018-04-14 21:07:47 UTC
There are unknown instructions in the contents of one of the BCS batchbuffers (in both error states):

--- ring buffer (bcs0) at 0x00000000 001f6000
0x001f6000:  0x11000001:  MI_LOAD_REGISTER_IMM                                                            
0x001f6000:  0x11000001 : Dword 0
    DWord Length: 1
    Byte Write Disables: 0
0x001f6004:  0x00002044 : Dword 1
    Register Offset: 0x00002044
0x001f6008:  0x0057fe00 : Dword 2
    Data DWord: 5766656
0x001f600c:  0x11000001:  MI_LOAD_REGISTER_IMM                                                            
0x001f600c:  0x11000001 : Dword 0
    DWord Length: 1
    Byte Write Disables: 0
0x001f6010:  0x00012040 : Dword 1
    Register Offset: 0x00012040
0x001f6014:  0x0057fe00 : Dword 2
    Data DWord: 5766656
0x001f6018:  0x10800001:  MI_STORE_DATA_INDEX                                                             
0x001f6018:  0x10800001 : Dword 0
    DWord Length: 1
0x001f601c:  0x000000c0 : Dword 1
    Offset: 48
0x001f6020:  0x0057fe00 : Dword 2
    Data DWord 0: 5766656
0x001f6024:  0x01000000:  MI_USER_INTERRUPT                                                               
0x001f6024:  0x01000000 : Dword 0
0x001f6028: unknown instruction 13244001
0x001f6034:  0x00000000:  MI_NOOP                                                                         
0x001f6034:  0x00000000 : Dword 0
    Identification Number: 0
    Identification Number Register Write Enable: false
0x001f6038:  0x18800100:  MI_BATCH_BUFFER_START                                                           
0x001f6038:  0x18800100 : Dword 0
    DWord Length: 0
    Address Space Indicator: 1 (PPGTT)
    Clear Command Buffer Enable: false
0x001f603c:  0x16a2a000 : Dword 1
    Batch Buffer Start Address: 0x16a2a000

The other has two:

--- ring buffer (bcs0) at 0x00000000 001f6000
0x001f6000: unknown instruction 13204001
0x001f600c:  0x00000000:  MI_NOOP                                                                         
0x001f600c:  0x00000000 : Dword 0
    Identification Number: 0
    Identification Number Register Write Enable: false
0x001f6010:  0x11000001:  MI_LOAD_REGISTER_IMM                                                            
0x001f6010:  0x11000001 : Dword 0
    DWord Length: 1
    Byte Write Disables: 0
0x001f6014:  0x00002044 : Dword 1
    Register Offset: 0x00002044
0x001f6018:  0x0001dd78 : Dword 2
    Data DWord: 122232
0x001f601c:  0x11000001:  MI_LOAD_REGISTER_IMM                                                            
0x001f601c:  0x11000001 : Dword 0
    DWord Length: 1
    Byte Write Disables: 0
0x001f6020:  0x00012040 : Dword 1
    Register Offset: 0x00012040
0x001f6024:  0x0001dd78 : Dword 2
    Data DWord: 122232
0x001f6028:  0x10800001:  MI_STORE_DATA_INDEX                                                             
0x001f6028:  0x10800001 : Dword 0
    DWord Length: 1
0x001f602c:  0x000000c0 : Dword 1
    Offset: 48
0x001f6030:  0x0001dd78 : Dword 2
    Data DWord 0: 122232
0x001f6034:  0x01000000:  MI_USER_INTERRUPT                                                               
0x001f6034:  0x01000000 : Dword 0
0x001f6038:  0x0b140001:  MI_SEMAPHORE_MBOX                                                               
0x001f6038:  0x0b140001 : Dword 0
    DWord Length: 1
    Register Select: 0 (RVSYNC)
0x001f603c:  0x0011cfd6 : Dword 1
    Semaphore Data Dword: 1167318
0x001f6044:  0x00000000:  MI_NOOP                                                                         
0x001f6044:  0x00000000 : Dword 0
    Identification Number: 0
    Identification Number Register Write Enable: false
0x001f6048: unknown instruction 13244001
0x001f6054:  0x00000000:  MI_NOOP                                                                         
0x001f6054:  0x00000000 : Dword 0
    Identification Number: 0
    Identification Number Register Write Enable: false
0x001f6058:  0x18800100:  MI_BATCH_BUFFER_START                                                           
0x001f6058:  0x18800100 : Dword 0
    DWord Length: 0
    Address Space Indicator: 1 (PPGTT)
    Clear Command Buffer Enable: false
0x001f605c:  0x00040000 : Dword 1
    Batch Buffer Start Address: 0x00040000

I can't figure out what instruction that is... It seems to be a synchronization batch emitted by the kernel.
Could you tell us what version of the kernel you were running before/after your upgrade?

Thanks!
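
One way to narrow down an undecoded header such as `13244001` is to extract its opcode field by hand. The following is a minimal sketch, assuming the gen6 MI command layout (bits 31:29 are the command type, bits 28:23 the opcode); `decode_mi_header` and the `MI_OPCODES` table are hypothetical helpers, and the opcode names come from memory of the i915 kernel headers rather than from this report.

```python
# Opcode names as recalled from the i915 kernel headers. Treat this table
# as an assumption and verify it against your kernel source.
MI_OPCODES = {
    0x00: "MI_NOOP",
    0x02: "MI_USER_INTERRUPT",
    0x16: "MI_SEMAPHORE_MBOX",
    0x21: "MI_STORE_DATA_INDEX",
    0x22: "MI_LOAD_REGISTER_IMM",
    0x26: "MI_FLUSH_DW",
    0x31: "MI_BATCH_BUFFER_START",
}


def decode_mi_header(dword: int) -> str:
    """Name the MI command encoded in the first dword of an instruction."""
    command_type = (dword >> 29) & 0x7  # bits 31:29, 0 means an MI command
    opcode = (dword >> 23) & 0x3F       # bits 28:23
    if command_type != 0:
        return f"non-MI command (type {command_type})"
    name = MI_OPCODES.get(opcode, "unknown")
    return f"{name} (opcode 0x{opcode:02x})"


# Headers taken from the two ring buffer dumps above.
for header in (0x11000001, 0x10800001, 0x01000000,
               0x0B140001, 0x18800100, 0x13244001, 0x13204001):
    print(f"0x{header:08x}: {decode_mi_header(header)}")
```

Under those assumptions both undecoded headers carry opcode 0x26, which the kernel headers name MI_FLUSH_DW. That would be consistent with a flush emitted by the kernel, but it is not confirmed anywhere in this thread.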
Comment 5 Alexei 2018-04-15 08:27:53 UTC
Right now I'm running 4.15.15 (Arch package linux 4.15.15-1, uname -r gives "4.15.15-1-ARCH").

My previous kernel was 4.15.12-1 (was installed the same day mesa 17.3.7 was), and before that, it was 4.15.5-1.
Comment 6 Alexei 2018-04-15 12:41:07 UTC
I apologize, I seem to have glossed over quite a few details in the initial description of my problem that are most likely important.
First of all, my hardware is:
* Shuttle XH61V PC
* Intel(R) Celeron(R) CPU G540 @ 2.50GHz
* Two 1920x1080 displays connected through DVI and HDMI

I have these package versions installed right now:
* linux 4.15.15-1
* mesa 18.0.0-3
* xf86-video-intel 1:2.99.917+823+gd9bf46e4-1
* chromium 65.0.3325.181-7
* neverball 1.6.0-3

I usually play games with something playing in the browser (Chromium) in the background. If I run Neverball while Chromium is playing a YouTube video, I get periodic lockups. I can't recall exactly whether I experienced any lockups when Chromium wasn't displaying video or wasn't running at all (I *think* I did, rarely).

Neverball has V-Sync enabled. If I disable V-Sync in the game, it still locks up, although much less frequently: I encountered only a single lockup within a 30-minute period.

Unsurprisingly, both the game and the video run much smoother without each other; it's only the lockup that I think is abnormal.
Comment 7 Alexei 2018-04-15 16:09:16 UTC
Created attachment 138849
/sys/class/drm/card0/error, mesa 18.0.0-3, openarena.x86_64

Yet another lockup, this time in OpenArena, installed through the AUR package 'openarena'.
Mostly under the same conditions, except that I removed the xf86-video-intel package while I was testing, and Chromium had only one blank page open, so there were no videos playing.
Comment 8 Marina Chernish 2018-07-24 10:21:12 UTC
A GPU hang was caught with the mentioned mesa 18.0.0 in OpenArena. Two monitors were connected, each at 1920x1080, and Chromium was playing clips from youtube.com.
Many lockups were also observed in both OpenArena and Neverball.
Apitrace wasn't helpful in this case, since the issue did not reproduce during Apitrace playback.

The latest version of mesa, 18.2.0, doesn't show these issues: no GPU hangs were observed in OpenArena. Gameplay is also much smoother, and no heavy lockups were observed. Neverball shows only rare slowdowns due to level complexity.

The following configuration was used:
Openarena v 0.8.8-15
Neverball v 1.6.0-3
OS: Linux 4.13.0-45-generic #50~16.04.1-Ubuntu x86_64
CPU: Intel(R) i5-2520M CPU@ 2.50GHz
Device: Mesa DRI Intel(R) Sandybridge Mobile
Mesa 18.0.0; 18.2.0

Alexei, to me the issue looks fixed. Could you please check it on your side as well?
Comment 9 Alexei 2018-07-24 14:07:12 UTC
Hello.
Hopefully, I'll get to test it on my Sandy Bridge i5 machine sometime around the weekend.
My Shuttle XH61V, from which I reported most of the glitches, has since been upgraded with an Ivy Bridge i5 and has been working fine ever since.

(If anything, in the near future I might actually get another, most likely identically configured, Sandy Bridge Celeron-based Shuttle machine to test on, to be sure.)
Comment 10 Marina Chernish 2018-08-06 10:01:54 UTC
Hi Alexei,

Are there any updates regarding this issue? Have you had a chance to try the newest version of mesa on the mentioned configuration?
Comment 11 Alexei 2018-08-06 15:02:05 UTC
Hi Marina,
Just yesterday I got hold of a similar Shuttle machine with a Celeron G540 CPU and a bit more RAM, and confirmed that the problem still persists on mesa 18.1.5-1.

I'll get back to you with the result after I build and try the latest 18.3.0 version, which is currently available in the AUR through the mesa-git package.
Comment 12 Alexei 2018-08-07 15:29:38 UTC
Update.
I installed the 18.3.0 version of mesa from the mesa-git repository, and I was no longer able to make the Celeron G540 lock up under the same conditions.
Works for me, it seems.
Comment 13 Denis 2018-09-20 12:22:46 UTC
As the reporter confirmed that the issue can no longer be reproduced, closing it. Please reopen if you find that the issue still occurs.

