Bug 101967 - Hang on render ring (since updating to openSUSE 42.2 -> 42.3)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2017-07-29 05:36 UTC by kolAflash
Modified: 2017-08-18 19:27 UTC

Attachments
/sys/class/drm/card0/error from 2017-07-29T05:57:59 (1.62 MB, text/plain)
2017-07-29 05:36 UTC, kolAflash
/sys/class/drm/card0/error from 2017-07-29T06:55:44 (936.73 KB, text/plain)
2017-07-29 05:36 UTC, kolAflash
/sys/class/drm/card0/error from 2017-07-29T08:20:17 (936.73 KB, text/plain)
2017-07-29 07:42 UTC, kolAflash
/sys/class/drm/card0/error from 2017-07-29T08:20:17 (944.08 KB, text/plain)
2017-07-29 15:30 UTC, kolAflash

Description kolAflash 2017-07-29 05:36:06 UTC
Created attachment 133116 [details]
/sys/class/drm/card0/error from 2017-07-29T05:57:59

Hi!

Since updating from openSUSE-Leap 42.2 to openSUSE-Leap 42.3 I get GPU hangs.

These are logs from two crashes; I attached the corresponding /sys/class/drm/card0/error files. The running applications were plasmashell (KDE 5 desktop process, using 3D acceleration) and SC2_x64.exe (the StarCraft game by Blizzard, running with minimal graphics settings on wine-staging-2.13 in Windows 7 mode; you can use the free "Starter Edition" for testing). With SC2_x64.exe in particular, the bug appears after at least 10 minutes. Both programs ran fine on openSUSE 42.2. I can probably trigger the bug in other 3D applications too, but I haven't tested that yet.

If needed, I'll happily provide more samples or other kinds of information.

= 2017-07-29T05:57:59 =
2017-07-29T06:07:06.836214+02:00 gaston kernel: [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.4.76-1-default root=/dev/mapper/system-root resume=/dev/system/swap splash=silent quiet showopts drm.vblankoffdelay=1 i915.enable_rc6=7 i915.lvds_downclock=1
[...]
2017-07-29T05:57:59.215290+02:00 gaston kernel: [ 7100.480380] [drm] GPU HANG: ecode 6:0:0x4bc4cf65, in plasmashell [6944], reason: Hang on render ring, action: reset
[...]
2017-07-29T05:57:59.215326+02:00 gaston kernel: [ 7100.480384] [drm] GPU crash dump saved to /sys/class/drm/card0/error
2017-07-29T05:57:59.215327+02:00 gaston kernel: [ 7100.480416] drm/i915: Resetting chip after gpu hang
2017-07-29T05:58:09.247282+02:00 gaston kernel: [ 7110.510054] drm/i915: Resetting chip after gpu hang

= 2017-07-29T06:55:44 =
2017-07-29T06:38:37.063964+02:00 gaston kernel: [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.4.76-1-default root=/dev/mapper/system-root resume=/dev/system/swap splash=silent quiet showopts
[...]
2017-07-29T06:55:44.098258+02:00 gaston kernel: [ 1063.824977] [drm] GPU HANG: ecode 6:0:0x55555555, in SC2_x64.exe [6643], reason: Hang on render ring, action: reset
[...]
2017-07-29T06:55:44.098281+02:00 gaston kernel: [ 1063.824982] [drm] GPU crash dump saved to /sys/class/drm/card0/error
2017-07-29T06:55:44.098282+02:00 gaston kernel: [ 1063.825023] drm/i915: Resetting chip after gpu hang
2017-07-29T06:55:56.122251+02:00 gaston kernel: [ 1075.845444] drm/i915: Resetting chip after gpu hang
Comment 1 kolAflash 2017-07-29 05:36:53 UTC
Created attachment 133118 [details]
/sys/class/drm/card0/error from 2017-07-29T06:55:44
Comment 2 kolAflash 2017-07-29 05:51:40 UTC
Supplement: Everything on the X server becomes really slow after the bug appears. The SC2_x64.exe application even crashed completely. On the other hand, I can still access the notebook via SSH from another PC, and in the SSH shell everything runs smoothly.

Hardware:
Lenovo ThinkPad X220
  TYPE: 4291-36G
  PRODUCT ID: 429136G
CPU+GPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
RAM: 8 GB
OS: openSUSE-Leap 42.3 (x86_64)
X-Server: 7.6 by openSUSE 42.3
Mesa3D: 17.0.5 by openSUSE 42.3

Maybe related:
- https://bugs.freedesktop.org/show_bug.cgi?id=101959
- https://bugzilla.opensuse.org/show_bug.cgi?id=1051060
- https://bugzilla.opensuse.org/show_bug.cgi?id=1050256
Comment 3 kolAflash 2017-07-29 06:53:37 UTC
Looks like the openSUSE people already found out how to fix this.
Deleting (just to be sure the correct module is being used)
/lib/modules/4.4.76-1-default/kernel/drivers/gpu/drm/
and installing drm-kmp-default-4.9.33_k4.4.76_1-5.1.x86_64.rpm seems to fix the problem for me.
http://download.opensuse.org/repositories/openSUSE:/Maintenance:/7039/openSUSE_Leap_42.3_Update/x86_64/drm-kmp-default-4.9.33_k4.4.76_1-5.1.x86_64.rpm

https://bugzilla.opensuse.org/show_bug.cgi?id=1051060#c15

https://build.opensuse.org/project/show/openSUSE:Maintenance:7039

http://download.opensuse.org/repositories/openSUSE:/Maintenance:/7039/openSUSE_Leap_42.3_Update/src/
Comment 4 kolAflash 2017-07-29 07:42:01 UTC
Created attachment 133120 [details]
/sys/class/drm/card0/error from 2017-07-29T08:20:17

Looks like drm-kmp-default-4.9.33_k4.4.76_1-5.1.x86_64.rpm fixed only half of the bug.

Now the system just hangs for a moment (about 10 sec.), but then it continues running normally. And SC2_x64.exe no longer crashes completely.

2017-07-29T08:20:17.993239+02:00 gaston kernel: [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.4.76-1-default root=/dev/mapper/system-root resume=/dev/system/swap splash=silent quiet showopts drm.vblankoffdelay=1 i915.enable_rc6=7 i915.lvds_downclock=1
[...]
2017-07-29T09:24:34.803016+02:00 gaston kernel: [ 3891.933914] [drm] GPU HANG: ecode 6:0:0x85fffffc, in SC2_x64.exe [7167], reason: Hang on render ring, action: reset
[...]
2017-07-29T09:24:34.803057+02:00 gaston kernel: [ 3891.933917] [drm] GPU crash dump saved to /sys/class/drm/card0/error
2017-07-29T09:24:34.803059+02:00 gaston kernel: [ 3891.933951] drm/i915: Resetting chip after gpu hang

Interestingly, the last line ("drm/i915: Resetting chip after gpu hang") appears only once. Before installing drm-kmp-default-4.9.33_k4.4.76_1-5.1.x86_64.rpm, that line repeated endlessly about every 10 seconds.

A guess:
The drm-kmp-default-4.9.33_k4.4.76_1-5.1.x86_64.rpm didn't fix the bug itself, but it makes the GPU recover correctly when resetting.
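
One way to check this guess against a log is to measure the spacing between the "Resetting chip after gpu hang" messages. The following is only a minimal Python sketch, assuming the syslog line format shown in this report; the helper name `reset_gaps` is made up for illustration, and the sample lines are copied from the first log in the description.

```python
import re

# Captures the kernel uptime stamp in front of the reset message,
# in the format seen in the logs of this report.
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+drm/i915: Resetting chip after gpu hang")

def reset_gaps(lines):
    """Return the seconds between consecutive reset messages (hypothetical helper)."""
    stamps = [float(m.group(1)) for m in map(RESET_RE.search, lines) if m]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]

sample = [
    "2017-07-29T05:57:59.215327+02:00 gaston kernel: [ 7100.480416] drm/i915: Resetting chip after gpu hang",
    "2017-07-29T05:58:09.247282+02:00 gaston kernel: [ 7110.510054] drm/i915: Resetting chip after gpu hang",
]
print(reset_gaps(sample))
```

A log with a single reset message yields an empty list, which would match the behaviour seen after installing the drm-kmp package. Gaps of roughly 10 seconds would match the behaviour before it.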
Comment 5 Chris Wilson 2017-07-29 09:46:00 UTC
(In reply to kolAflash from comment #4)
> [...]
> A guess:
> The drm-kmp-default-4.9.33_k4.4.76_1-5.1.x86_64.rpm didn't fix the bug
> itself, but it makes the GPU recover correctly when resetting.

You've confused two different issues. The first GPU hang was the loss of context state under memory pressure or on suspend. Your second GPU hang is from StarCraft running on Mesa. Update Mesa and, if that doesn't resolve the issue, file a bug against mesa:drivers/i965.
Comment 6 kolAflash 2017-07-29 15:30:11 UTC
Created attachment 133126 [details]
/sys/class/drm/card0/error from 2017-07-29T08:20:17

(In reply to Chris Wilson from comment #5)
> [...]
> You've confused two different issues. The first GPU hang was the loss of
> context state on mempressure or suspend. Your second GPU hang is from
> StarCraft running on mesa.

What tells you that?

dmesg always said "reason: Hang on render ring, action: reset", so I thought it was the same bug!?
Comment 7 Elizabeth 2017-08-18 19:27:30 UTC
(In reply to kolAflash from comment #6)
> Created attachment 133126 [details]
> /sys/class/drm/card0/error from 2017-07-29T08:20:17
> 
> (In reply to Chris Wilson from comment #5)
> > [...]
> > You've confused two different issues. The first GPU hang was the loss of
> > context state on mempressure or suspend. Your second GPU hang is from
> > StarCraft running on mesa.
> 
> What tells you that?
> 
> dmesg always said "reason: Hang on render ring, action: reset", so I thought
> it's the same bug!?
GPU HANG: ecode 6:0:0x85fffffc, in SC2_x64.exe [7167]
GPU HANG: ecode 6:0:0x55555555, in SC2_x64.exe [6643]
GPU HANG: ecode 6:0:0x4bc4cf65, in plasmashell [6944]
Even though every hang is on the render ring, the causes are all different, as the ecodes show.
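
The comparison above can be repeated on a longer dmesg log by grouping the "GPU HANG" lines by their ecode. This is only a rough Python sketch, assuming the message format quoted in this report; the helper name `group_by_ecode` is hypothetical, and the ecode is used purely as a label for telling hangs apart, following the reasoning in this comment.

```python
import re
from collections import defaultdict

# Matches e.g. "GPU HANG: ecode 6:0:0x85fffffc, in SC2_x64.exe [7167]"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:0x[0-9a-fA-F]+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)

def group_by_ecode(lines):
    """Map each ecode to the (process, pid) pairs that reported it (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            groups[m.group("ecode")].append((m.group("proc"), int(m.group("pid"))))
    return dict(groups)

# The three hang lines from this report.
hangs = [
    "GPU HANG: ecode 6:0:0x85fffffc, in SC2_x64.exe [7167]",
    "GPU HANG: ecode 6:0:0x55555555, in SC2_x64.exe [6643]",
    "GPU HANG: ecode 6:0:0x4bc4cf65, in plasmashell [6944]",
]
for ecode, procs in group_by_ecode(hangs).items():
    print(ecode, procs)
```

For the three hangs in this report, each ecode ends up in its own group, so none of them repeats.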

