Bug 97122 - list of 12 dEQP-GLES2 tests causing systematic GPU lockups
Summary: list of 12 dEQP-GLES2 tests causing systematic GPU lockups
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: unspecified
Hardware: Other, OS: All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2016-07-28 22:54 UTC by Mauro Rossi
Modified: 2016-08-02 21:51 UTC



Attachments
List of tests inducing systematic GPU lockups (3.05 KB, text/plain)
2016-07-28 22:54 UTC, Mauro Rossi
dmesg for uniform_iterations#conditional_continue_fragment FAIL (69.01 KB, text/plain)
2016-07-28 22:57 UTC, Mauro Rossi
dmesg for uniform_iterations#double_continue_vertex FAIL (70.07 KB, text/plain)
2016-07-28 23:00 UTC, Mauro Rossi
dmesg for uniform_iterations#double_continue_fragment FAIL (67.08 KB, text/plain)
2016-07-28 23:01 UTC, Mauro Rossi

Description Mauro Rossi 2016-07-28 22:54:48 UTC
Created attachment 125386
List of tests inducing systematic GPU lockups

Hi,

While performing an Android Compatibility Test Suite run on marshmallow-x86 with Mesa 12.0.1, I encountered systematic GPU lockups preceded by full-screen artifacts, sometimes resembling "noise" and other times "vertical stripes".

The issue was observed on HD7750 and HD7950.

The 12 dEQP-GLES2 tests causing systematic GPU lockups follow this pattern:

dEQP-GLES2.functional.shaders.loops.while_{constant,uniform,dynamic}_iterations.{conditional,double}_continue_{vertex,fragment}
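The shorthand pattern above expands to 3 × 2 × 2 = 12 test names. A minimal bash sketch of that expansion, writing the names into a case-list file so that only these tests are run. It assumes the full dEQP group names end in `_iterations` (as the attachment names such as `uniform_iterations#double_continue_vertex` suggest); the file name `lockup-caselist.txt` is hypothetical.

```bash
#!/usr/bin/env bash
# Expand the shorthand pattern into the 12 individual test names.
# Assumptions: bash (brace expansion is not POSIX sh) and group names of the
# form while_<count>_iterations; lockup-caselist.txt is a hypothetical name.
set -euo pipefail

tests=( dEQP-GLES2.functional.shaders.loops.while_{constant,uniform,dynamic}_iterations.{conditional,double}_continue_{vertex,fragment} )

# One test name per line, a plain case-list layout.
printf '%s\n' "${tests[@]}" > lockup-caselist.txt

echo "${#tests[@]} tests written to lockup-caselist.txt"
```

The resulting file could then be handed to the test binary with dEQP's `--deqp-caselist-file=lockup-caselist.txt` option, so that only the lockup-inducing cases are exercised instead of the whole loops group.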

The attachments include the dEQP-GLES2 test logs and some dmesg logs captured during these GPU lockups.

I am available to collect further logs if you can provide instructions,
or to test patches and report back to help resolve the issue.

NOTE: I have already tried setting R600_DEBUG=nosb
(on android-x86 this has to be done in init.x86.rc before building the ISO image),
but running a test plan with the failing tests listed above
still produces systematic GPU lockups.

Mauro Rossi
Comment 1 Mauro Rossi 2016-07-28 22:57:53 UTC
Created attachment 125387
dmesg for uniform_iterations#conditional_continue_fragment FAIL
Comment 2 Mauro Rossi 2016-07-28 23:00:26 UTC
Created attachment 125388
dmesg for uniform_iterations#double_continue_vertex FAIL
Comment 3 Mauro Rossi 2016-07-28 23:01:14 UTC
Created attachment 125389
dmesg for uniform_iterations#double_continue_fragment FAIL
Comment 4 Nicolai Hähnle 2016-08-01 12:08:20 UTC
Hi Mauro, thanks for the report. This sounds like a control flow lowering bug.

Note that the 'nosb' option has no effect here; it applies to r600 only.

Which version of LLVM are you using? glxinfo shows this. Does the lockup happen with LLVM trunk?
Comment 5 Mauro Rossi 2016-08-02 07:24:02 UTC
Hi,

The LLVM version used in Android marshmallow-x86 is 3.7.0 (prior to the renaming of the R600 target to AMDGPU).
On Android my options are currently limited to backporting the latest patches on top of LLVM 3.7.0, or trying to upgrade to the latest LLVM 3.7.x, which requires some porting of the Android makefiles for the AMDGPU target.

>'nosb' option will have no effect, it applies to r600 only

Are there other options that could help work around the issue,
or help with debugging?

I'll check whether a control flow lowering bug has been reported,
in order to apply the relevant changes; do you know

To try the latest LLVM, I will switch to a Linux test run
using chadversary's fork, and report back on the GLES2 results.

Mauro
Comment 6 Mauro Rossi 2016-08-02 07:35:13 UTC
The missing part of the sentence in my previous comment:

> do you know

if similar GPU lockup bugs might be related, and how?

Mauro
Comment 7 Nicolai Hähnle 2016-08-02 12:30:25 UTC
I'm sorry, but LLVM 3.7 is extremely old. I'd say you're mostly out of luck.

There have been huge changes to how control flow lowering works, and I'd say trying to cherry-pick individual fixes is basically a hopeless endeavour.
Comment 8 Nicolai Hähnle 2016-08-02 17:41:14 UTC
I have verified that the tests pass on current Mesa + LLVM master.

I would seriously recommend that you somehow upgrade the version of LLVM you use with the driver.
Comment 9 Mauro Rossi 2016-08-02 21:51:48 UTC
Hi Nicolai,

thanks a lot for your commitment and help.

I have now also learned how to run dEQP on Linux.

I can confirm that with LLVM 3.8.1 and later I see no GPU lockups.

Since the oibaf PPA (LLVM 3.8.1) is not affected while stock Ubuntu 16.04 (LLVM 3.8.0) is,
the fix must have landed between 3.8.0 and 3.8.1 and be related to SI lowering. That is very useful information for my problem.

Cheers!
Mauro


Test Sessions:
--------------

cmake . -DDEQP_TARGET=x11_egl
make
cd ./modules/gles2
./deqp-gles2 -n 'dEQP-GLES2.functional.shaders.loops.*'


Test Session 1: padoka PPA (Mesa 12.1 and LLVM 4.0.0)
---------------------------------------------------

utente@utente-desktop:~$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.2 (Core Profile) Mesa 12.1.0-devel - padoka PPA
OpenGL version string: 3.0 Mesa 12.1.0-devel - padoka PPA
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 12.1.0-devel - padoka PPA

utente@utente-desktop:~$ glxinfo | grep LLVM
    Device: AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 4.0.0) (0x679a)
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 4.0.0)


Result: dEQP-GLES2.functional.shaders.loops.* 100% pass, no GPU lockups

DONE!

Test run totals:
  Passed:        624/624 (100.0%)
  Failed:        0/624 (0.0%)
  Not supported: 0/624 (0.0%)
  Warnings:      0/624 (0.0%)


Test Session 2: oibaf PPA (Mesa 12.1 and LLVM 3.8.1)
--------------------------------------------------

utente@utente-desktop:~$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.1 (Core Profile) Mesa 12.1.0-devel
OpenGL version string: 3.0 Mesa 12.1.0-devel
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 12.1.0-devel
utente@utente-desktop:~$ glxinfo | grep LLVM
    Device: AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 3.8.1) (0x679a)
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 3.8.1)

Result: dEQP-GLES2.functional.shaders.loops.* 100% pass, no GPU lockups

DONE!

Test run totals:
  Passed:        624/624 (100.0%)
  Failed:        0/624 (0.0%)
  Not supported: 0/624 (0.0%)
  Warnings:      0/624 (0.0%)


Test Session 3: default Ubuntu 16.04 (Mesa 11.2 and LLVM 3.8.0)
-------------------------------------------------------------

utente@utente-desktop:~$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.0
OpenGL version string: 3.0 Mesa 11.2.0
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.2.0

utente@utente-desktop:~$ glxinfo | grep LLVM
    Device: AMD TAHITI (DRM 2.43.0, LLVM 3.8.0) (0x679a)
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)


Result: full-screen artifacts and GPU lockup, clearly related to the LLVM version
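In these sessions the lockup appears with LLVM 3.8.0 but not with 3.8.1 or 4.0.0, and the LLVM version is visible in the `glxinfo` renderer string. A minimal bash sketch that extracts that version and compares it with 3.8.1. It assumes GNU coreutils `sort -V` for version ordering; the function names `llvm_version` and `llvm_at_least` are hypothetical, and the 3.8.1 threshold only reflects what this report observed on Tahiti.

```bash
#!/usr/bin/env bash
# Sketch: pull the LLVM version out of a glxinfo renderer string and check
# whether it is at least 3.8.1, the oldest version this report saw pass.
# Assumes GNU coreutils `sort -V`; the function names are hypothetical.
set -euo pipefail

# Print the version that follows "LLVM " in a renderer string, e.g. "3.8.0".
llvm_version() {
    printf '%s\n' "$1" | sed -n 's/.*LLVM \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 >= version $2 (version-aware ordering via sort -V).
llvm_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Renderer strings copied from the test sessions above.
for renderer in \
    'OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)' \
    'OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 3.8.1)' \
    'OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 4.0.0)'
do
    v=$(llvm_version "$renderer")
    if llvm_at_least "$v" 3.8.1; then
        echo "LLVM $v: at or above 3.8.1"
    else
        echo "LLVM $v: older than 3.8.1"
    fi
done
```

On a live system the same check could be fed from `glxinfo | grep 'OpenGL renderer string'` instead of the hard-coded strings.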

