Bug 97083

Summary: [IVB,BYT] GPU hang on deqp-gles31.functional.separate.shader.random
Product: Mesa    Reporter: Mark Janes <mark.a.janes>
Component: Drivers/DRI/i965    Assignee: Kenneth Graunke <kenneth>
Status: RESOLVED FIXED    QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:    i915 features:

Description Mark Janes 2016-07-25 23:21:41 UTC
Bisected to:
1eef0b73aa323d94d5a080cd1efa81ccacdbd0d2
Author:     Kenneth Graunke <kenneth@whitecape.org>
i965: Rewrite FS input handling to use the new NIR intrinsics.

To reproduce, I ran deqp gles31 through piglit:

MESA_GLES_VERSION_OVERRIDE=3.1 \
PIGLIT_DEQP_GLES31_EXTRA_ARGS="--deqp-log-images=disable --deqp-gl-config-name=rgba8888d24s8 --deqp-surface-width=400 --deqp-surface-height=300 --deqp-visibility=hidden" \
PIGLIT_DEQP_GLES31_BIN=/tmp/build_root/m64/opt/deqp/modules/gles31/deqp-gles31 \
/tmp/build_root/m64/bin/piglit run -p gbm -c \
    --include-tests deqp-gles31.functional.separate.shader.random \
    deqp_gles31 /tmp/build_root/m64/test/ivb

GPU hang should reproduce in the first few test cases run.

Comment 1 Kenneth Graunke 2016-07-29 23:09:33 UTC
This should be fixed by:

commit ebdc82d06532f992aea592265c29a11330e698fa
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Tue Jul 26 13:19:46 2016 -0700

    i965: Fix move_interpolation_to_top() pass.
    
    ...

    Papers over GPU hangs on Ivybridge and Baytrail caused by the
    recent NIR FS input rework by restoring the old behavior.
    (I'm not honestly sure why they hang with PLN not at the top.)

However, it's probably still worth trying to understand why they hang with move_interpolation_to_top disabled.

Comment 2 Kenneth Graunke 2016-08-02 04:42:52 UTC
Curro figured this one out.  The problem was the NoDDClear/NoDDCheck flags on the PLN instructions.  Normally, we do a (-f0.0) PLN with NoDDClear (for helper pixels) and a (+f0.0) PLN with NoDDCheck.  This works fine at the top level, because at least one pixel won't be a helper invocation, so the (-f0.0) PLN with NoDDClear will actually execute.

But with my (broken) rework, we started emitting PLNs inside control flow. If only helper pixels take a certain control-flow path, the (-f0.0) PLN with NoDDClear might never execute. That leaves the register dependency set, so the next instruction that tries to touch that register would hang.
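To make the condition concrete, here is a small C sketch that only encodes what the comment above describes: the (-f0.0) PLN with NoDDClear actually executes when at least one active channel is a non-helper pixel, and if it never executes, the dependency stays set. It is a hypothetical toy model, not the real Gen register scoreboard; the 8-bit channel masks and the names `noddclear_pln_runs` and `would_hang` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: one bit per channel of a SIMD8 fragment dispatch.
 *   exec_mask:   channels active on the current control-flow path
 *   helper_mask: channels that are helper invocations
 * It encodes only the condition described in the bug comment and does not
 * model the actual Gen register scoreboard. */

/* Per the comment, the (-f0.0) PLN with NoDDClear actually executes only if
 * at least one active channel is a non-helper pixel. */
bool noddclear_pln_runs(uint8_t exec_mask, uint8_t helper_mask)
{
    return (exec_mask & (uint8_t)~helper_mask) != 0;
}

/* If that PLN never executes, the register dependency stays set and the next
 * instruction touching the register waits forever (the GPU hang). */
bool would_hang(uint8_t exec_mask, uint8_t helper_mask)
{
    return !noddclear_pln_runs(exec_mask, helper_mask);
}
```

For example, with helper channels 0xF0, `would_hang(0xFF, 0xF0)` (every channel active, as at the top level) is false, while `would_hang(0x30, 0xF0)` (a branch taken only by helper channels 4 and 5) is true. This is why keeping the PLNs at the top of the program avoids the hang.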

So moving interpolation to the top is a real solution to this problem, not just papering over it.
I've also sent a patch to the mailing list that eliminates the PLN dependency hints, which will allow us to move PLNs into control flow safely (if we want to):
https://lists.freedesktop.org/archives/mesa-dev/2016-August/124946.html
