Summary: | [SKL-Y] GPU hangs ecode 9:0:0x87cafff2/ecode 9:0:0x85dffffb while running benchmarks | ||
---|---|---|---|
Product: | Mesa | Reporter: | valtteri.rantala |
Component: | Drivers/DRI/i965 | Assignee: | Ben Widawsky <ben> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | critical | ||
Priority: | highest | CC: | ben, eero.t.tamminen, intel-gfx-bugs, joe.konno |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | dmesg log (115948); Error_state (115949); error state and dmesg of gpu hang with halign_fix branch of mesa (116116); dmesg with patch 51132 (116558) | ||
Created attachment 115949 [details]
Error_state
Kernel version that was used:
drm-intel git://anongit.freedesktop.org/drm-intel origin/drm-intel-nightly 5ea91de4ff45adb60031853d64314c3405378fbd
2015-04-14_18-00-06 drm-intel-nightly: 2015y-04m-14d-17h-59m-22s UTC integration manifest

It looks like 3DSTATE_CONSTANT_VS needs some special handling for SKL (or we need chicken bits set for legacy behaviour).

Can you try this branch? Something in there seems to fix terrain for me, but I don't know what.
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=halign_fast

Marking this bug as invalid since updating the SKL-Y firmware fixed the issue.

Reopened; the issue still exists. I also tested Ben's branch:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=halign_fast
That seems to help a little: in the Synmark multithread case the GPU hangs in 1 run out of 3.

Later, an additional test caused a system hang. Here is the new dmesg.
[ 167.712499] [drm] stuck on render ring
[ 167.714151] [drm] GPU HANG: ecode 9:0:0x85dffffb, in synmark2 [3031], reason: Ring hung, action: reset
[ 167.714157] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 167.714159] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 167.714161] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 167.714163] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 167.714165] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 167.716498] drm/i915: Resetting chip after gpu hang
[ 168.182886] [drm] RC6 on
[ 175.719839] [drm] stuck on render ring
[ 175.721031] [drm] GPU HANG: ecode 9:0:0x87cafff2, in synmark2 [3032], reason: Ring hung, action: reset
[ 175.723310] drm/i915: Resetting chip after gpu hang
[ 176.189345] [drm] RC6 on
[ 261.507979] [drm:fw_domains_get [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.
[ 262.510760] [drm:fw_domains_get [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.
[ 262.732373] [drm:fw_domains_get [i915]] *ERROR* media: timed out waiting for forcewake ack request.

(In reply to valtteri.rantala from comment #6)
> Later with the additional test causes a system hang.
> Here are new dmesg.
> [ 167.712499] [drm] stuck on render ring
> [ 167.714151] [drm] GPU HANG: ecode 9:0:0x85dffffb, in synmark2 [3031], reason: Ring hung, action: reset
> [ 167.714157] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
> [ 167.714159] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
> [ 167.714161] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
> [ 167.714163] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
> [ 167.714165] [drm] GPU crash dump saved to /sys/class/drm/card0/error
> [ 167.716498] drm/i915: Resetting chip after gpu hang
> [ 168.182886] [drm] RC6 on
> [ 175.719839] [drm] stuck on render ring
> [ 175.721031] [drm] GPU HANG: ecode 9:0:0x87cafff2, in synmark2 [3032], reason: Ring hung, action: reset
> [ 175.723310] drm/i915: Resetting chip after gpu hang
> [ 176.189345] [drm] RC6 on
> [ 261.507979] [drm:fw_domains_get [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.
> [ 262.510760] [drm:fw_domains_get [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.
> [ 262.732373] [drm:fw_domains_get [i915]] *ERROR* media: timed out waiting for forcewake ack request.

This system hang looks like bug 89959.

Yes, the system hang part looks like bug 89959, but the first GPU hangs are from different test cases. Two test cases were executed in that run: the GPU hangs were introduced by the Synmark multithread test case, and the system hang was introduced by the GPUtest suite's 3dplot test case, which looks like the other bug.
But yes, both the GPU hangs and the system hang seem to show the same kind of errors.

Valtteri, it was hanging 100% of the time, and now it's hanging 33% of the time? Could you provide the error state with my branch so we can check if it's the same? Also, can you try my halign-fix branch?

Created attachment 116116 [details]
error state and dmesg of gpu hang with halign_fix branch of mesa
Attached dmesg and error state of the halign_fix test runs. The GPU still hangs with a probability of about 1/3.
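When comparing hang signatures between runs (for example 0x85dffffb versus 0x87cafff2), it can help to pull the "GPU HANG" lines out of a dmesg log automatically. The following is only a minimal sketch: the function name `find_gpu_hangs` and the regular expression are illustrative assumptions based on the log lines quoted in this report, not part of any i915 or Mesa tooling, and other kernel versions may format the message differently.

```python
import re

# Assumed line format, taken from the dmesg excerpts in this report:
# [  167.714151] [drm] GPU HANG: ecode 9:0:0x85dffffb, in synmark2 [3031], reason: ...
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>[0-9a-fx:]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in the given dmesg text."""
    return [
        {
            "time": float(m["time"]),    # kernel timestamp in seconds
            "ecode": m["ecode"],         # e.g. "9:0:0x85dffffb"
            "process": m["process"],     # process named in the hang report
            "pid": int(m["pid"]),
        }
        for m in HANG_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    sample = (
        "[ 167.712499] [drm] stuck on render ring\n"
        "[ 167.714151] [drm] GPU HANG: ecode 9:0:0x85dffffb, in synmark2 [3031], "
        "reason: Ring hung, action: reset\n"
    )
    for hang in find_gpu_hangs(sample):
        print(hang["time"], hang["ecode"], hang["process"], hang["pid"])
```

Running the same function over the dmesg from each branch gives a quick list of ecodes to compare before opening the full error state.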
I'll have the halign fixes upstream later today. There have also been some other fixes for SKL in master. In addition, I posted patches for hangs (though I never saw them fix anything). So if someone can try master (after about 8 hours from now) with these patches, I'd be very appreciative.
http://patchwork.freedesktop.org/patch/51132/

Created attachment 116558 [details]
dmesg with patch 51132

Tested master with the http://patchwork.freedesktop.org/patch/51132/ patches, and it got rid of the GPU hangs. There are still rendering issues and system hangs, but the GPU hangs are gone. I ran the test 10 times with no hangs, when usually 3 runs have been enough to produce a hang.

Ran some more tests with the patch and it did not produce GPU hangs. It seems to have fixed the issue. Marking as resolved.

(In reply to valtteri.rantala from comment #14)
> Ran some more tests for the patch and it did not produce GPU hangs. Seems it
> Fixed the issue. Marking as resolved.

Side note: I don't know what the Mesa folks prefer, but for the kernel we always keep bugs open until the fix has landed in the repository.

True, keeping it open until the patch is upstreamed.
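As a rough sanity check on the "10 runs with no hangs" result: if each run hung independently with the roughly 1/3 probability reported earlier (an assumption, since the thread does not establish that runs are independent), ten clean runs in a row would be unlikely without a fix. A minimal sketch of that arithmetic, with `prob_no_hang` as a hypothetical helper name:

```python
def prob_no_hang(p_hang, runs):
    """Probability of seeing zero hangs in `runs` independent runs,
    assuming each run hangs with probability `p_hang`."""
    return (1.0 - p_hang) ** runs


if __name__ == "__main__":
    # (2/3) ** 10 = 1024 / 59049, i.e. roughly 0.017
    print(prob_no_hang(1.0 / 3.0, 10))
```

Under that assumption, ten consecutive clean runs would happen by chance less than 2% of the time, which supports the conclusion that the patches changed the behaviour.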
Created attachment 115948 [details]
dmesg log

GPU hangs while running benchmark. dmesg and error_state are attached.

Kernel hash that was used.