Bug 90561 - [SKL-Y] GPU hangs ecode 9:0:0x87cafff2/ecode 9:0:0x85dffffb while running benchmarks
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest critical
Assignee: Ben Widawsky
QA Contact: Intel 3D Bugs Mailing List
Reported: 2015-05-21 13:28 UTC by valtteri.rantala
Modified: 2015-06-23 01:24 UTC

Attachments
dmesg log (96.75 KB, text/plain)
2015-05-21 13:28 UTC, valtteri.rantala
Error_state (2.88 MB, text/plain)
2015-05-21 13:29 UTC, valtteri.rantala
error state and dmesg of gpu hang with halign_fix branch of mesa (407.97 KB, text/plain)
2015-05-28 13:56 UTC, valtteri.rantala
dmesg with patch 51132 (1.24 MB, text/plain)
2015-06-17 13:27 UTC, valtteri.rantala

Description valtteri.rantala 2015-05-21 13:28:37 UTC
Created attachment 115948 [details]
dmesg log

The GPU hangs while running benchmarks.
The dmesg log and error_state are attached.

Kernel hash that was used.
Comment 1 valtteri.rantala 2015-05-21 13:29:41 UTC
Created attachment 115949 [details]
Error_state
Comment 2 valtteri.rantala 2015-05-21 13:31:14 UTC
Kernel version that was used: drm-intel git://anongit.freedesktop.org/drm-intel origin/drm-intel-nightly 5ea91de4ff45adb60031853d64314c3405378fbd 2015-04-14_18-00-06 drm-intel-nightly: 2015y-04m-14d-17h-59m-22s UTC integration manifest
Comment 3 Mika Kuoppala 2015-05-21 13:36:01 UTC
It looks like 3DSTATE_CONSTANT_VS needs some special handling for SKL
(or that we need chicken bits set for legacy behaviour).
Comment 4 Ben Widawsky 2015-05-23 21:08:01 UTC
Can you try this branch? Something in there seems to fix terrain for me, but I don't know what.

http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=halign_fast
Comment 5 valtteri.rantala 2015-05-25 11:35:43 UTC
Marking this bug as invalid since updating SKL-Y firmware fixed the issue.
Comment 6 valtteri.rantala 2015-05-26 11:48:14 UTC
Reopened; the issue still exists.

I also tested Ben's branch:

http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=halign_fast

That seems to help a little:
in the Synmark multithread case, the GPU hangs in 1 run out of 3.

Later, an additional test caused a system hang.
Here is the new dmesg output:
[  167.712499] [drm] stuck on render ring
[  167.714151] [drm] GPU HANG: ecode 9:0:0x85dffffb, in synmark2 [3031], reason: Ring hung, action: reset
[  167.714157] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  167.714159] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  167.714161] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  167.714163] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  167.714165] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  167.716498] drm/i915: Resetting chip after gpu hang
[  168.182886] [drm] RC6 on
[  175.719839] [drm] stuck on render ring
[  175.721031] [drm] GPU HANG: ecode 9:0:0x87cafff2, in synmark2 [3032], reason: Ring hung, action: reset
[  175.723310] drm/i915: Resetting chip after gpu hang
[  176.189345] [drm] RC6 on
[  261.507979] [drm:fw_domains_get [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.
[  262.510760] [drm:fw_domains_get [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.
[  262.732373] [drm:fw_domains_get [i915]] *ERROR* media: timed out waiting for forcewake ack request.
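
When comparing several runs, it can help to pull the "GPU HANG" entries out of a dmesg log automatically. The following is a minimal sketch in Python, assuming the line format shown in the log above. The names `HANG_RE`, `GpuHang` and `find_gpu_hangs` are hypothetical helpers, not part of any i915 or Mesa tooling, and the three colon-separated ecode fields are kept as an opaque string because their meaning is not stated in this report.

```python
import re
from typing import List, NamedTuple

# Matches entries such as:
# [  175.721031] [drm] GPU HANG: ecode 9:0:0x87cafff2, in synmark2 [3032], reason: Ring hung, action: reset
# \s also matches newlines, so an entry wrapped after "reason:" is still found.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\]\s+GPU HANG:\s+ecode\s+(?P<ecode>[0-9a-fx:]+),"
    r"\s+in\s+(?P<comm>\S+)\s+\[(?P<pid>\d+)\],"
    r"\s+reason:\s*(?P<reason>[^,]+),\s+action:\s*(?P<action>\S+)"
)


class GpuHang(NamedTuple):
    timestamp: float  # seconds since boot, as printed by dmesg
    ecode: str        # opaque error code string, e.g. "9:0:0x87cafff2"
    process: str      # name of the process reported in the hang message
    pid: int
    reason: str
    action: str


def find_gpu_hangs(dmesg_text: str) -> List[GpuHang]:
    """Return every GPU HANG entry found in the given dmesg text."""
    return [
        GpuHang(
            float(m["ts"]), m["ecode"], m["comm"], int(m["pid"]),
            m["reason"].strip(), m["action"],
        )
        for m in HANG_RE.finditer(dmesg_text)
    ]


# Sample taken from the log above; the first entry is deliberately wrapped
# across two lines to show that the pattern tolerates line wrapping.
SAMPLE = """\
[  167.712499] [drm] stuck on render ring
[  167.714151] [drm] GPU HANG: ecode 9:0:0x85dffffb, in synmark2 [3031], reason:
 Ring hung, action: reset
[  175.721031] [drm] GPU HANG: ecode 9:0:0x87cafff2, in synmark2 [3032], reason: Ring hung, action: reset
"""

if __name__ == "__main__":
    for hang in find_gpu_hangs(SAMPLE):
        print(hang.timestamp, hang.ecode, hang.process, hang.pid)
```

Grouping the results by `ecode` is one way to check whether two runs hit the same hang signature.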
Comment 7 Ander Conselvan de Oliveira 2015-05-26 12:43:45 UTC
(In reply to valtteri.rantala from comment #6)
> Later with the additional test causes a system hang.
> Here are new dmesg.

This system hang looks like bug 89959.
Comment 8 valtteri.rantala 2015-05-26 14:42:21 UTC
Yes, the system hang part looks like bug 89959, but the first GPU hangs are from different test cases.

Two test cases were executed in that run.
The GPU hangs were triggered by the Synmark multithread test case.

The system hang was triggered by the 3dplot test case of the GPUtest suite, and that looks like the other bug.

But yes, both the GPU hangs and the system hang seem to show the same kind of errors.
Comment 9 Ben Widawsky 2015-05-27 17:06:39 UTC
Valtteri, it was hanging 100% of the time, and now it's hanging 33% of the time? Could you provide the error state with my branch so we can check if it's the same?
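
For reference, the kernel log quoted in this report says the crash dump is saved to /sys/class/drm/card0/error. A minimal sketch of copying it to a regular file so it can be attached is given here. The function name `save_error_state` and the destination file name are hypothetical, and reading the sysfs file may require root privileges on a given system.

```python
from pathlib import Path

# Location reported by the "[drm] GPU crash dump saved to ..." message in dmesg.
ERROR_STATE_PATH = Path("/sys/class/drm/card0/error")


def save_error_state(dest, source=ERROR_STATE_PATH) -> int:
    """Copy the error state from `source` to `dest`; return the number of bytes copied."""
    data = Path(source).read_bytes()
    Path(dest).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # Hypothetical output file name; run after a hang has been reported in dmesg.
    size = save_error_state("error_state.txt")
    print(f"saved {size} bytes")
```

Saving the file right after each hang keeps the dumps from different branches separate, which makes it easier to compare them.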
Comment 10 Ben Widawsky 2015-05-27 17:14:57 UTC
Also, can you try my halign-fix branch?
Comment 11 valtteri.rantala 2015-05-28 13:56:51 UTC
Created attachment 116116 [details]
error state and dmesg of gpu hang with halign_fix branch of mesa

Attached the dmesg and error state from the halign_fix test runs. The GPU still hangs with a probability of about 1/3.
Comment 12 Ben Widawsky 2015-06-12 17:06:04 UTC
I'll have the halign fixes upstream later today. Also there have been some other fixes for SKL in master. In addition, I posted patches for hangs (though I never saw them fix anything).

So if someone can try master (after about 8 hours from now) with these patches, I'd be very appreciative.
http://patchwork.freedesktop.org/patch/51132/
Comment 13 valtteri.rantala 2015-06-17 13:27:12 UTC
Created attachment 116558 [details]
dmesg with patch 51132

Tested master with the http://patchwork.freedesktop.org/patch/51132/ patches and it got rid of the GPU hangs. There are still rendering issues and system hangs, but the GPU hangs are gone. I ran the test 10 times with no hangs, whereas usually 3 runs have been enough to produce a hang.
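
As a rough sanity check on the 10 clean runs: if the hang rate were still about 1 in 3 per run, as reported earlier in this bug, and if runs were independent (an assumption, not something measured here), the chance of seeing 10 runs in a row without a hang would be (2/3)^10, or about 1.7%. A minimal sketch of that arithmetic, with the hypothetical helper `prob_all_clean`:

```python
from fractions import Fraction


def prob_all_clean(hang_rate: Fraction, runs: int) -> Fraction:
    """Probability that `runs` independent runs all complete without a hang."""
    return (1 - hang_rate) ** runs


# Assumed per-run hang rate of 1/3, taken from the earlier comments in this bug.
p = prob_all_clean(Fraction(1, 3), 10)

if __name__ == "__main__":
    print(p, float(p))
```

So 10 clean runs are unlikely to be luck under those assumptions, although more runs would tighten the estimate.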
Comment 14 valtteri.rantala 2015-06-22 13:23:26 UTC
Ran some more tests with the patch and it did not produce any GPU hangs. It seems to have fixed the issue. Marking as resolved.
Comment 15 Jani Nikula 2015-06-22 13:28:11 UTC
(In reply to valtteri.rantala from comment #14)
> Ran some more tests for the patch and it did not produce GPU hangs. Seems it
> Fixed the issue. Marking as resolved.

Side note: I don't know what the Mesa folks prefer, but for the kernel we always keep bugs open until the fix has landed in the repository.
Comment 16 valtteri.rantala 2015-06-22 13:36:55 UTC
True, keeping it open until the patch is upstreamed.

