- HSW GT3e
- Ubuntu 14.04 Mesa from git
- Run Unigine Heaven 4.0 with the following options:
-project_name Heaven \
-data_path ../ \
-engine_config ../data/heaven_4.0.cfg \
-system_script heaven/unigine.cpp \
-sound_app null \
-video_app opengl \
-video_mode -1 \
-video_width 1920 \
-video_height 1080 \
-video_fullscreen 1 \
-video_multisample 0 \
- Expected: normal-looking screen, no hangs, test finishes successfully
- Actual: when the demo has loaded, the screen is blue/green instead of the normal colors (see attachment)
- Then, after a few seconds, Heaven fails with the following error:
intel_do_flush_locked failed: Input/output error
- dmesg contains "render ring hung inside bo" and "context hanging too fast" errors
I bisected this to the following commit:
Author: Matt Turner <email@example.com>
AuthorDate: Wed Aug 27 18:40:46 2014 -0700
Commit: Matt Turner <firstname.lastname@example.org>
CommitDate: Thu Sep 25 11:02:36 2014 -0700
i965: Emit ELSE/ENDIF JIP with type D on Gen 7.
The spec says the type must be W (JIP is 16-bits after all), but we've
been emitting it with a UD type all along and have experienced no
adverse effects. Changing the type to D allows ELSE and ENDIF
instructions to be compacted.
Reviewed-by: Kenneth Graunke <email@example.com>
Reviewed-by: Jason Ekstrand <firstname.lastname@example.org>
Reverting that commit fixes the issue.
Created attachment 107192
screenshot of broken Heaven
Created attachment 107216
screenshot of broken Heaven
With the latest Mesa (as of yesterday), Heaven fails on BDW too. The failures are more random: sometimes there's the same error as on HSW, sometimes Heaven segfaults. I haven't tried whether reverting the patch also fixes the issue on BDW.
Reproduced on IVB with the Ultra and High quality levels (Medium and Low render fine). The rendering result looks different, but it is probably the same issue.
The patch looks correct to me: per the documentation, the ELSE and ENDIF JIP is a signed word integer. This change probably just exposes something else. I noticed that if I disable opt_copy_propagate(), I get almost correct rendering results and no hangs, but something is still wrong.
(In reply to Tapani Pälli from comment #5)
> Patch looks correct to me , else and endif are signed word integer per
> documentation. This change probably just causes something else. I noticed
> that if I disable opt_copy_propagate() I get almost correct rendering
> results and no hangs, there's still something wrong though.
Oops, only now did I notice that it is not using W after all :) I don't know the backend code well enough, but it seems a bit strange to me (shouldn't it use W?).
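On the W vs. UD question, a minimal C sketch may help. It is only arithmetic: the helper names are hypothetical (not Mesa code), and it does not model how the EU actually decodes the JIP field. It shows that the same 16-bit pattern reads identically as a sign-extended W and as a zero-extended UD (assuming the upper 16 bits of the immediate are zero) as long as bit 15 is clear, and that the two readings diverge only for negative offsets. That may be why emitting UD caused no visible problems, but it says nothing about the D change or the hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack a signed jump offset into a 16-bit JIP field. */
static uint16_t jip_encode(int32_t offset)
{
    assert(offset >= INT16_MIN && offset <= INT16_MAX); /* JIP is 16 bits */
    return (uint16_t)offset; /* well-defined: reduces modulo 2^16 */
}

/* Read the 16 bits as W: a signed word, sign-extended.
 * Written portably to avoid an implementation-defined narrowing cast. */
static int32_t jip_as_w(uint16_t bits)
{
    return (bits & 0x8000) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Read the 16 bits as UD, assuming the upper 16 bits are zero. */
static uint32_t jip_as_ud(uint16_t bits)
{
    return (uint32_t)bits;
}

/* The two readings agree exactly when bit 15 (the W sign bit) is clear. */
static bool jip_readings_agree(uint16_t bits)
{
    int32_t w = jip_as_w(bits);
    return w >= 0 && (uint32_t)w == jip_as_ud(bits);
}
```

For example, a forward offset of 16 encodes as 0x0010 and reads as 16 either way, while a hypothetical offset of -16 encodes as 0xFFF0, which reads as -16 under W but 65520 under a zero-extended UD.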
Thanks for finding this! I'm not going to have time to investigate before (or probably during) XDC, so I'm just going to revert this commit for now. I'll leave the bug open so I remember to investigate when I get back.
Reverted with 0d5c9bf1e46b2d4.
(In reply to Matt Turner from comment #7)
> Thanks for finding this! I'm not going to have time to investigate before
> (or probably during XDC) so I'm just going to revert this commit for now.
> I'll leave the bug open so I remember to investigate when I get back.
Have you had time to check this yet? Or should the bug just be closed, since the revert fixed the issue?
I reproduced the problem when the bug was reported.
I'm planning to leave the bug open until I investigate further. I've retitled the bug.
Until we investigate this further, and to be safe, I sent a patch to Mesa's mailing list that uses W when emitting the JIP for instructions that were previously using UD:
As far as I understand, we get no benefit from using UD, and if we go against the HW specs, nothing guarantees that we won't hit undesired effects, even if none have been observed so far.
However, the patch does not change the JIP type in instructions that were already emitting it with D, since I understand from the comments in this bug that we do get a benefit there: those instructions can be compacted.
I played with this a bit more, and initially thought I found the problem, but no such luck.