Bug 97122

Summary: list of 12 dEQP-GLES2 tests causing systematic GPU lockups
Product: Mesa Reporter: Mauro Rossi <issor.oruam>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED WORKSFORME QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: List of tests inducing systematic GPU lockups
dmesg for uniform_iterations#conditional_continue_fragment FAIL
dmesg for uniform_iterations#double_continue_vertex FAIL
dmesg for uniform_iterations#double_continue_fragment FAIL

Description Mauro Rossi 2016-07-28 22:54:48 UTC
Created attachment 125386 [details]
List of tests inducing systematic GPU lockups

Hi,

while performing an Android Compatibility Test Suite run on marshmallow-x86 with mesa 12.0.1, I encountered systematic GPU lockups preceded by full-screen artifacts, sometimes resembling screen noise and other times vertical stripes.

The issue was observed on HD7750 and HD7950.

The 12 dEQP-GLES2 tests causing systematic GPU lockups follow this pattern:

dEQP-GLES2.functional.shaders.loops.while_{constant,uniform,dynamic}.{conditional,double}_continue_{vertex,fragment}
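For reference, the brace pattern above can be expanded into the 12 individual test names. This is a minimal sketch; the `_iterations` suffix on each loop group is an assumption taken from the attachment titles (for example `uniform_iterations#double_continue_vertex`), since the pattern as written omits it. The function name is hypothetical.

```python
from itertools import product

PREFIX = "dEQP-GLES2.functional.shaders.loops"

# The three brace groups from the pattern in the report.
LOOP_KINDS = ["constant", "uniform", "dynamic"]
CONTINUE_KINDS = ["conditional", "double"]
STAGES = ["vertex", "fragment"]


def expand_test_names():
    """Return every combination of loop kind, continue kind and shader stage."""
    return [
        # Assumption: group names end in "_iterations", as in the attachment titles.
        f"{PREFIX}.while_{loop}_iterations.{cont}_continue_{stage}"
        for loop, cont, stage in product(LOOP_KINDS, CONTINUE_KINDS, STAGES)
    ]


if __name__ == "__main__":
    for name in expand_test_names():
        print(name)
```

The three groups give 3 x 2 x 2 = 12 names, matching the count in the summary.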

The attachments contain the dEQP GLES2 test logs and some dmesg logs captured during these GPU lockups.

I am available to collect further logs, if you could provide me instructions,
or to test patches and report back to help resolve the issue.

NOTE: I've already tried setting R600_DEBUG=nosb,
which in android-x86 has to be done in init.x86.rc before building the iso image,
but when I run a test plan with the failing tests listed above,
I still get systematic GPU lockups.

Mauro Rossi
Comment 1 Mauro Rossi 2016-07-28 22:57:53 UTC
Created attachment 125387 [details]
dmesg for uniform_iterations#conditional_continue_fragment FAIL
Comment 2 Mauro Rossi 2016-07-28 23:00:26 UTC
Created attachment 125388 [details]
dmesg for uniform_iterations#double_continue_vertex FAIL
Comment 3 Mauro Rossi 2016-07-28 23:01:14 UTC
Created attachment 125389 [details]
dmesg for uniform_iterations#double_continue_fragment FAIL
Comment 4 Nicolai Hähnle 2016-08-01 12:08:20 UTC
Hi Mauro, thanks for the report - sounds like a control flow lowering bug.

Note that the 'nosb' option will have no effect here; it applies to r600 only.

Which version of LLVM are you using? glxinfo shows this. Does the lockup happen with LLVM trunk?
Comment 5 Mauro Rossi 2016-08-02 07:24:02 UTC
Hi,

The llvm version used in android marshmallow-x86 is 3.7.0 (which predates the R600 to AMDGPU target renaming).
On android my options are currently limited to backporting the latest patches on top of llvm 3.7.0, or trying to upgrade to the latest llvm 3.7.x, which would involve some AMDGPU target porting of the android makefiles.

>'nosb' option will have no effect, it applies to r600 only

Are there other options that could help to work around the issue
or that would be useful for debugging?

I'll check if some control flow lowering bug was reported,
in order to apply the relevant changes, do you know

To try the latest llvm, I will move to a linux test run
using the chadversary fork and report the GLES2 results.

Mauro
Comment 6 Mauro Rossi 2016-08-02 07:35:13 UTC
the missing part of a sentence:

> do you know

if similar GPU lockup bugs may be related and how?

Mauro
Comment 7 Nicolai Hähnle 2016-08-02 12:30:25 UTC
I'm sorry, but LLVM 3.7 is extremely old. I'd say you're mostly out of luck.

There have been huge changes to how control flow lowering works, and I'd say trying to cherry-pick individual fixes is basically a hopeless endeavour.
Comment 8 Nicolai Hähnle 2016-08-02 17:41:14 UTC
I have verified that the tests pass on current Mesa + LLVM master.

I would seriously recommend that you somehow upgrade the version of LLVM you use with the driver.
Comment 9 Mauro Rossi 2016-08-02 21:51:48 UTC
Hi Nicolai,

thanks a lot for your commitment and help.

Now I have also learned how to run deqp on linux.

I confirm that with llvm 3.8.1 and later I see no GPU lockups.

Since the oibaf ppa is not affected, the fix landed between llvm 3.8.0 and 3.8.1
and has to be related to SI lowering. That's a lot of useful info for my problem.

Cheers!
Mauro


Test Sessions:
--------------

cmake . -DDEQP_TARGET=x11_egl
make
cd ./modules/gles2
./deqp-gles2 -n dEQP-GLES2.functional.shaders.loops.*


Test Session 1: padoka ppa (mesa 12.1 and llvm 4.0.0)
---------------------------------------------------

utente@utente-desktop:~$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.2 (Core Profile) Mesa 12.1.0-devel - padoka PPA
OpenGL version string: 3.0 Mesa 12.1.0-devel - padoka PPA
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 12.1.0-devel - padoka PPA

utente@utente-desktop:~$ glxinfo | grep LLVM
    Device: AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 4.0.0) (0x679a)
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 4.0.0)
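The LLVM version can also be pulled out of the renderer string programmatically. The sketch below assumes the version always appears as `LLVM x.y.z` inside the string, as it does in the output above; the helper name is hypothetical.

```python
import re

# Matches the "LLVM x.y.z" token in a glxinfo renderer line.
LLVM_RE = re.compile(r"LLVM (\d+)\.(\d+)\.(\d+)")


def llvm_version(renderer_line):
    """Return the LLVM version as a tuple of ints, or None if absent."""
    match = LLVM_RE.search(renderer_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


line = ("OpenGL renderer string: Gallium 0.4 on AMD TAHITI "
        "(DRM 2.43.0 / 4.4.0-31-generic, LLVM 4.0.0)")
print(llvm_version(line))  # (4, 0, 0)
```

Returning a tuple of integers makes versions directly comparable, for example `(3, 8, 0) < (3, 8, 1)`.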


Result: dEQP-GLES2.functional.shaders.loops.* 100% ok and no GPU lockups

DONE!

Test run totals:
  Passed:        624/624 (100.0%)
  Failed:        0/624 (0.0%)
  Not supported: 0/624 (0.0%)
  Warnings:      0/624 (0.0%)
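When comparing several runs, the totals block can be turned into numbers with a small parser. This is only a sketch that assumes the four-line layout shown above; the function name is hypothetical.

```python
import re

# One totals line looks like "  Passed:        624/624 (100.0%)".
TOTAL_RE = re.compile(
    r"^\s*(Passed|Failed|Not supported|Warnings):\s+(\d+)/(\d+)",
    re.MULTILINE,
)


def parse_totals(text):
    """Map each category to a (count, total) pair of ints."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in TOTAL_RE.finditer(text)
    }


SAMPLE = """Test run totals:
  Passed:        624/624 (100.0%)
  Failed:        0/624 (0.0%)
  Not supported: 0/624 (0.0%)
  Warnings:      0/624 (0.0%)
"""

totals = parse_totals(SAMPLE)
print(totals["Passed"], totals["Failed"])
```

A run counts as clean here when `totals["Passed"]` has equal count and total, and `totals["Failed"][0]` is zero.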


Test Session 2: oibaf ppa (mesa 12.1 and llvm 3.8.1)
--------------------------------------------------

utente@utente-desktop:~$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.1 (Core Profile) Mesa 12.1.0-devel
OpenGL version string: 3.0 Mesa 12.1.0-devel
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 12.1.0-devel
utente@utente-desktop:~$ glxinfo | grep LLVM
    Device: AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 3.8.1) (0x679a)
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-31-generic, LLVM 3.8.1)

Result: dEQP-GLES2.functional.shaders.loops.* are 100% ok and no GPU lockups

DONE!

Test run totals:
  Passed:        624/624 (100.0%)
  Failed:        0/624 (0.0%)
  Not supported: 0/624 (0.0%)
  Warnings:      0/624 (0.0%)


Test Session 3: default Ubuntu 16.04 (mesa 11.2 and llvm 3.8.0)
-------------------------------------------------------------

utente@utente-desktop:~$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.0
OpenGL version string: 3.0 Mesa 11.2.0
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.2.0

utente@utente-desktop:~$ glxinfo | grep LLVM
    Device: AMD TAHITI (DRM 2.43.0, LLVM 3.8.0) (0x679a)
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)


Result: Full screen artifacts and GPU lockup, but clearly llvm version related
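The reasoning behind "the fix landed between 3.8.0 and 3.8.1" can be sketched as a comparison over the observed versions. The data below comes from this report (the android run plus the three sessions). The sketch assumes LLVM is the only relevant variable, although the sessions also differ in Mesa version; the function name is hypothetical.

```python
# (LLVM version, GPU lockup observed) for each configuration in this report.
OBSERVATIONS = [
    ((3, 7, 0), True),   # android marshmallow-x86, mesa 12.0.1
    ((3, 8, 0), True),   # default Ubuntu 16.04, mesa 11.2
    ((3, 8, 1), False),  # oibaf ppa, mesa 12.1
    ((4, 0, 0), False),  # padoka ppa, mesa 12.1
]


def fix_window(observations):
    """Return (last bad version, first good version after it)."""
    last_bad = max((v for v, locked in observations if locked), default=None)
    if last_bad is None:
        # No lockups observed at all: only the earliest good version is known.
        first_good = min((v for v, locked in observations if not locked), default=None)
        return None, first_good
    first_good = min(
        (v for v, locked in observations if not locked and v > last_bad),
        default=None,
    )
    return last_bad, first_good


print(fix_window(OBSERVATIONS))  # ((3, 8, 0), (3, 8, 1))
```

Any additional test run on another LLVM version can be appended to `OBSERVATIONS` to narrow or confirm the window.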
