Bug 84557 - [HSW] Investigate why compacted control flow hangs the GPU
Summary: [HSW] Investigate why compacted control flow hangs the GPU
Status: RESOLVED WONTFIX
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Matt Turner
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-01 14:02 UTC by Eero Tamminen
Modified: 2016-08-30 19:14 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
screenshot of broken Heaven (213.33 KB, image/jpeg)
2014-10-01 14:04 UTC, Eero Tamminen
screenshot of broken Heaven (104.94 KB, image/jpeg)
2014-10-02 07:18 UTC, Eero Tamminen

Description Eero Tamminen 2014-10-01 14:02:48 UTC
TEST SETUP:
- HSW GT3e
- Ubuntu 14.04 Mesa from git

TEST CASE:
- Run Unigine Heaven 4.0 with the following options:
        -project_name Heaven \
        -data_path ../ \
        -engine_config ../data/heaven_4.0.cfg \
        -system_script heaven/unigine.cpp \
        -sound_app null \
        -video_app opengl \
        -video_mode -1 \
        -video_width 1920 \
        -video_height 1080 \
        -video_fullscreen 1 \
        -video_multisample 0 \
        -extern_define ,BENCHMARK,RELEASE,LANGUAGE_EN,QUALITY_HIGH,TESSELLATION_DISABLED 


EXPECTED RESULTS:
- Normal-looking screen, no hangs, test finishes successfully


ACTUAL RESULT:
- When the demo has loaded, the screen is blue/green instead of the normal colors (see attachment)
- Then, after a few seconds, Heaven fails with the following error:
  intel_do_flush_locked failed: Input/output error
- dmesg contains "render ring hung inside bo" and "context hanging too fast" errors


I bisected this to the following commit:

commit 54e30dbf4db437748509d1319c3f6e4185f76c69
Author:     Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Aug 27 18:40:46 2014 -0700
Commit:     Matt Turner <mattst88@gmail.com>
CommitDate: Thu Sep 25 11:02:36 2014 -0700

    i965: Emit ELSE/ENDIF JIP with type D on Gen 7.
    
    The spec says the type must be W (JIP is 16-bits after all), but we've
    been emitting it with a UD type all along and have experienced no
    adverse effects. Changing the type to D allows ELSE and ENDIF
    instructions to be compacted.
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

Reverting that commit fixes the issue.
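The commit message turns on how the 16-bit JIP field is typed. As a hedged illustration only (the helper names below are hypothetical, not Mesa API, and this is plain C arithmetic rather than a model of how the EU actually decodes the field), the same 16 bits read back differently depending on whether they are treated as a signed word (W) or zero-extended as an unsigned value, as a UD-typed read of an otherwise-zero immediate would be:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a signed jump offset into a 16-bit field.
 * The offset must fit in a signed word, which is what the spec's W type implies. */
static inline uint16_t jip_encode(int32_t offset)
{
    assert(offset >= INT16_MIN && offset <= INT16_MAX);
    return (uint16_t)offset;          /* two's complement, low 16 bits */
}

/* Hypothetical helper: read the field as W, i.e. bit 15 is the sign bit. */
static inline int32_t jip_read_signed_word(uint16_t raw)
{
    return (int16_t)raw;              /* sign-extends to 32 bits */
}

/* Hypothetical helper: read the same bits as an unsigned value,
 * zero-extended to 32 bits (bit 15 is just another value bit). */
static inline uint32_t jip_read_zero_extended(uint16_t raw)
{
    return (uint32_t)raw;
}
```

For a small forward jump both readings agree; they only diverge once bit 15 is set (a backward jump such as -4 becomes 65532 when zero-extended). This is a sketch of the type semantics, not an explanation of the hang, which this bug never root-caused.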
Comment 1 Eero Tamminen 2014-10-01 14:04:02 UTC
Created attachment 107192
screenshot of broken Heaven
Comment 2 Eero Tamminen 2014-10-02 07:18:14 UTC
Created attachment 107216
screenshot of broken Heaven
Comment 3 Eero Tamminen 2014-10-03 08:32:57 UTC
With the latest (yesterday's) version of Mesa, Heaven fails on BDW too. Failures are more random: sometimes there's the same error as on HSW, sometimes Heaven segfaults. I haven't tried whether reverting the patch also fixes the issue on BDW.
Comment 4 Tapani Pälli 2014-10-03 10:26:31 UTC
Reproduced on IVB with ultra and high quality levels (medium and low render fine). The rendering result looks different, but it's probably the same issue.
Comment 5 Tapani Pälli 2014-10-03 11:06:29 UTC
The patch looks correct to me; ELSE and ENDIF are signed word integers per the documentation. This change probably just triggers something else. I noticed that if I disable opt_copy_propagate() I get almost correct rendering results and no hangs; there's still something wrong, though.
Comment 6 Tapani Pälli 2014-10-03 11:34:57 UTC
(In reply to Tapani Pälli from comment #5)
> Patch looks correct to me , else and endif are signed word integer per
> documentation. This change probably just causes something else. I noticed
> that if I disable opt_copy_propagate() I get almost correct rendering
> results and no hangs, there's still something wrong though.

Oops, only now I noticed that it is not using W after all :) I don't know enough of the backend code, but it seems a bit strange to me (shouldn't it use W?)
Comment 7 Matt Turner 2014-10-03 16:57:05 UTC
Thanks for finding this! I'm not going to have time to investigate before (or probably during XDC) so I'm just going to revert this commit for now. I'll leave the bug open so I remember to investigate when I get back.
Comment 8 Matt Turner 2014-10-03 17:42:39 UTC
Reverted with 0d5c9bf1e46b2d4.
Comment 9 Eero Tamminen 2014-12-30 14:27:22 UTC
(In reply to Matt Turner from comment #7)
> Thanks for finding this! I'm not going to have time to investigate before
> (or probably during XDC) so I'm just going to revert this commit for now.
> I'll leave the bug open so I remember to investigate when I get back.

Have you had time to check this yet? Or should the bug just be closed, since the revert fixed the issue?
Comment 10 Matt Turner 2014-12-30 17:25:35 UTC
I reproduced the problem when the bug was reported.

I'm planning to leave the bug open until I investigate further. I've retitled the bug.
Comment 11 Antia Puentes 2015-03-13 11:12:50 UTC
Until this is investigated further, and to be safe, I sent a patch to Mesa's mailing list that uses W when emitting the JIP for instructions that were previously using UD:

http://lists.freedesktop.org/archives/mesa-dev/2015-March/079328.html

As far as I understand, we get no benefit from using UD, and if we go against the HW specs, nothing guarantees that we won't see undesired effects, even if none have been observed so far.

However, the patch does not change the type of JIP in instructions that were already emitting it with D, since, as I understand from the comments in this bug, that does give us a benefit: it allows those instructions to be compacted.
Comment 12 Matt Turner 2016-08-30 19:14:55 UTC
I played with this a bit more, and initially thought I found the problem, but no such luck.

