Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck semaphore on render ring
Status: CLOSED DUPLICATE of bug 54226
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2015-06-03 21:44 UTC by Martin Steigerwald
Modified: 2017-07-24 22:46 UTC

Attachments
4.1-rc6 (58.95 KB, text/plain)
2015-06-03 21:44 UTC, Martin Steigerwald

Description Martin Steigerwald 2015-06-03 21:44:24 UTC
Created attachment 116272
4.1-rc6

With 4.1-rc6 I got this after playing PlaneShift for a while:

[ 7168.882024] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 7168.882051] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 7168.882052] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 7168.882054] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 7168.882055] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 7168.882057] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 7201.882566] mce: [Hardware Error]: Machine check events logged
[ 7204.891713] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 7676.967832] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 7708.969124] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 7919.003315] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 8111.034712] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 8193.048111] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 8329.070245] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
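
To see how often the hang recurs, the timestamps in a log excerpt like the one above can be extracted with a short script. This is a minimal sketch: `HANG_RE`, `parse_hangs`, and `intervals` are hypothetical names, and the regular expression assumes the exact message layout shown in these lines.

```python
import re

# Assumed layout, taken from the log lines in this report:
# [ 7168.882024] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: ..., action: continue
HANG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\]\s+GPU HANG: ecode (?P<ecode>[^,]+), "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hangs(log_text):
    """Return (timestamp_seconds, ecode, reason) for every GPU HANG line."""
    hangs = []
    for line in log_text.splitlines():
        m = HANG_RE.match(line.strip())
        if m:
            hangs.append((float(m.group("ts")), m.group("ecode"), m.group("reason")))
    return hangs


def intervals(hangs):
    """Seconds elapsed between consecutive hangs, rounded to milliseconds."""
    times = [t for t, _, _ in hangs]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# First lines of the excerpt above; the mce line is skipped by the parser.
SAMPLE = """\
[ 7168.882024] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 7201.882566] mce: [Hardware Error]: Machine check events logged
[ 7204.891713] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
[ 7676.967832] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck semaphore on render ring, action: continue
"""

if __name__ == "__main__":
    hangs = parse_hangs(SAMPLE)
    print(len(hangs), "hangs; gaps in seconds:", intervals(hangs))
```

Feeding it the full output of `dmesg` instead of `SAMPLE` would give the gaps for the whole session.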


The machine, a ThinkPad T520, was overheating while I was playing PlaneShift, but the hangs happened afterwards.

I will attach the dump file.
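
The kernel message gives /sys/class/drm/card0/error as the location of the dump. A copy can be taken before a reboot with something like the sketch below. `save_error_state` and `ERROR_STATE` are hypothetical names, the card index may differ on other machines, and reading the file usually requires root.

```python
import shutil
from pathlib import Path

# Path taken from the kernel message in this report; "card0" is an assumption
# that holds for this machine but may differ elsewhere.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(dest, src=ERROR_STATE):
    """Copy the GPU error state from src to dest; return the bytes written."""
    src, dest = Path(src), Path(dest)
    # sysfs files may not report a meaningful size, so stream the content
    # instead of relying on stat() of the source.
    with src.open("rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest.stat().st_size
```

For example, `save_error_state("/tmp/i915-error.txt")` would write the dump to a file (the destination path here is only an illustration) that can then be attached to the report.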

GPU hangs now happen from time to time after I stopped playing PlaneShift.

I thought I had never seen this before, but I can find it in the logs going back to the 24th of May.

So I think 4.1-rc4, which I installed on the 19th of May, is also affected.

Bug #89524 GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue

may be related, but I don't think I saw it before kernel 4.1, so this may have a different cause. Feel free to merge the bug reports if you think it's the same issue.
Comment 1 Martin Steigerwald 2015-06-03 21:46:36 UTC
As for the overheating issue, I think it is caused partly by an aged cooling system and partly by the inability to throttle CPU usage while a GPU-intensive application is running:

[Bug 97261] New: Intel P-State driver does not honor no_turbo  
https://bugzilla.kernel.org/show_bug.cgi?id=97261
Comment 2 Chris Wilson 2015-06-04 05:46:09 UTC

*** This bug has been marked as a duplicate of bug 54226 ***
Comment 3 Martin Steigerwald 2015-06-04 09:10:24 UTC
Chris, for me 4.1-rc6 really makes things much *worse*. I am typing this after a clean reboot and have already got the GPU hang again. It happens every few minutes. Are you really sure this is the same GPU hang as in bug 54226? I didn't have this before the 4.1 kernel.

Also I note that I still use

        Option          "AccelMethod"   "uxa"

and I have

martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf 
options i915 modeset=1 i915_enable_rc6=7

thus maximum energy saving. But according to powertop it never enters the highest sleep state anyway.

I will remove the AccelMethod setting now and see whether it helps. If not, I will downgrade to 4.1-rc4 for now, as the hangs were at least much less frequent with it.