Bug 97806

Summary: GPU lockup with mesa-git and llvm-svn with rx 470 on Unigine Heaven and TombRaider 2013
Product: Mesa
Reporter: Laurent carlier <lordheavym>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: blocker
Priority: medium
CC: 0xe2.0x9a.0x9b, vedran
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: output of R600_DEBUG=ps,gs,vs,cs,tcs,tes ./heaven with tesselation extreme -> lockup
output of R600_DEBUG=ps,gs,vs,cs,tcs,tes ./heaven with tesselation extreme and llvm-3.9.0 -> good
GALLIUM_DDEBUG="pipelined 1000" ./heaven

Description Laurent carlier 2016-09-14 14:52:14 UTC
Just switched from an R9 380 to an RX 470. In my first tests I get a GPU lockup within the first frames with Unigine Heaven and the Tomb Raider 2013 benchmark.

* mesa-git a1e49b / llvm-svn trunk 281305
* AMD RX 470 Nitro+ OC 8GB - AMD POLARIS10 (DRM 3.2.0 / 4.7.3-2-ARCH, LLVM 4.0.0) (0x67df)
* latest linux-firmware

It doesn't lock up with mesa-12.0.2 and llvm-3.8.1.
I've also tried mesa-git with llvm-3.8.1, and it doesn't lock up either.
Comment 1 Laurent carlier 2016-09-14 14:54:58 UTC
Oops, it's with kernel 4.7.3.
Comment 2 Nicolai Hähnle 2016-09-14 16:23:17 UTC
LLVM 3.8 is already fairly old for graphics stack standards. Can you try with current trunk?
Comment 3 Laurent carlier 2016-09-14 16:39:05 UTC
(In reply to Nicolai Hähnle from comment #2)
> LLVM 3.8 is already fairly old for graphics stack standards. Can you try
> with current trunk?

Already done :) I get a GPU lockup with mesa-git trunk and llvm-svn trunk. It works with mesa-git trunk and llvm-3.8.1.
Comment 4 Laurent carlier 2016-09-14 20:34:11 UTC
(In reply to Nicolai Hähnle from comment #2)
> LLVM 3.8 is already fairly old for graphics stack standards. Can you try
> with current trunk?

No GPU lockup with llvm-3.9.0/mesa-git
Comment 5 Michel Dänzer 2016-09-15 01:22:17 UTC
The obvious next step would be bisecting LLVM, but be warned that it can be a little painful, as you may have to find different Mesa Git snapshots to build and work with different LLVM Git snapshots.
Comment 6 Laurent carlier 2016-09-19 10:07:58 UTC
Bisecting gives me:
c220fde748d8b296c46498b37753c494a57e2ee9 is the first bad commit
commit c220fde748d8b296c46498b37753c494a57e2ee9
Author: Valery Pykhtin <Valery.Pykhtin@amd.com>
Date:   Sat Sep 10 13:09:16 2016 +0000

    [AMDGPU] Refactor MUBUF/MTBUF instructions
    
    Differential revision: https://reviews.llvm.org/D24295
    
    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281137 91177308-0d34-0410-b5e6-96231b3b80d8

:040000 040000 836a392130fd7b35b72ad5a681278a6b5f271258 07e868b807f3eed07c1cc7dd7c766d630f0f9480 M      lib

Bisected from llvm-git mirror

$ git bisect log
git bisect start
# good: [5337a14483af37562e99cd0bb35f1aa06b9ab7ec] Bump the trunk version to 4.0.0svn.
git bisect good 5337a14483af37562e99cd0bb35f1aa06b9ab7ec
# bad: [f31e66357099bc37f69d251b016fe70ac0b595d3] [PM] Port CFGViewer and CFGPrinter to the new Pass Manager Differential Revision: https://reviews.llvm.org/D24592
git bisect bad f31e66357099bc37f69d251b016fe70ac0b595d3
# good: [12dc55cb60365c7a75a868e3ce468396ceaf4986] [Hexagon] Improve test to check for @PCREL, only run llc, not opt -> llc.
git bisect good 12dc55cb60365c7a75a868e3ce468396ceaf4986
# good: [605a81a85c4c0426b4a59a3a4afeda020eb92316] AMDGPU: Relax SGPR asm constraint register class
git bisect good 605a81a85c4c0426b4a59a3a4afeda020eb92316
# good: [6d2157da81ba33f283ec7fa6c428bf63594a5715] [AA] Fix typo in comment (s/hase/has).
git bisect good 6d2157da81ba33f283ec7fa6c428bf63594a5715
# bad: [0aa0c7d910cfa845790645e9d124e4162f9fc44b] Revert "[ARM] Promote small global constants to constant pools"
git bisect bad 0aa0c7d910cfa845790645e9d124e4162f9fc44b
# good: [1f4e68f079c8fa2a39912c50e3fa984133d7a770] [cmake] Export gtest/gtest_main and its dependencies via a special build tree only cmake exports file.
git bisect good 1f4e68f079c8fa2a39912c50e3fa984133d7a770
# bad: [d3eebe7daf2b913d5b1b79ce523c090c64bd4212] [AVX-512] Add test cases to demonstrate opportunities for commuting vpternlog. Commuting will be added in a future commit.
git bisect bad d3eebe7daf2b913d5b1b79ce523c090c64bd4212
# good: [da28e63a7469154c6888e0c70b1cad54cd7668bf] AMDGPU: Fix scheduling info for spill pseudos
git bisect good da28e63a7469154c6888e0c70b1cad54cd7668bf
# bad: [df708a504e06b5769348b09240abcfa8b7aa3a28] Add an isSwiftError predicate to Value
git bisect bad df708a504e06b5769348b09240abcfa8b7aa3a28
# bad: [c220fde748d8b296c46498b37753c494a57e2ee9] [AMDGPU] Refactor MUBUF/MTBUF instructions
git bisect bad c220fde748d8b296c46498b37753c494a57e2ee9
# good: [b7ef2005d10c48e878a1b19b504e1386f6e9e7cd] [WebAssembly] Fix typos in comments
git bisect good b7ef2005d10c48e878a1b19b504e1386f6e9e7cd
# good: [68d7a57aba91c4ebb4f83caf1bf539d16b317c90] [gold/LTO] Add test case for r281134
git bisect good 68d7a57aba91c4ebb4f83caf1bf539d16b317c90
# first bad commit: [c220fde748d8b296c46498b37753c494a57e2ee9] [AMDGPU] Refactor MUBUF/MTBUF instructions
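
For anyone who wants to double-check a log like the one above, here is a minimal sketch (not part of the original report) that pulls the good/bad decisions and the final culprit out of `git bisect log` output. The function name `summarize_bisect_log` and the abbreviated `SAMPLE` text are made up for illustration; the only assumption about git is the line format visible in the log itself.

```python
import re

# Decision lines as they appear in the log, e.g.
# "git bisect good 5337a14483af37562e99cd0bb35f1aa06b9ab7ec".
DECISION = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})\s*$")
# The closing comment line that names the culprit.
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\] (.*\S)\s*$")


def summarize_bisect_log(text):
    """Return (decisions, first_bad) parsed from `git bisect log` output.

    decisions is a list of (verdict, sha) tuples in log order;
    first_bad is (sha, subject), or None if the bisect did not finish.
    """
    decisions = []
    first_bad = None
    for line in text.splitlines():
        m = DECISION.match(line)
        if m:
            decisions.append((m.group(1), m.group(2)))
            continue
        m = FIRST_BAD.match(line)
        if m:
            first_bad = (m.group(1), m.group(2))
    return decisions, first_bad


# Abbreviated, illustrative input in the same format (not a complete bisect).
SAMPLE = """\
git bisect start
# good: [5337a14483af37562e99cd0bb35f1aa06b9ab7ec] Bump the trunk version to 4.0.0svn.
git bisect good 5337a14483af37562e99cd0bb35f1aa06b9ab7ec
# bad: [c220fde748d8b296c46498b37753c494a57e2ee9] [AMDGPU] Refactor MUBUF/MTBUF instructions
git bisect bad c220fde748d8b296c46498b37753c494a57e2ee9
# first bad commit: [c220fde748d8b296c46498b37753c494a57e2ee9] [AMDGPU] Refactor MUBUF/MTBUF instructions
"""

decisions, first_bad = summarize_bisect_log(SAMPLE)
print(len(decisions), "decisions; first bad:", first_bad)
```

Feeding it the full log should make it easy to see that every commit marked bad is at or after the reported culprit.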
Comment 7 Laurent carlier 2016-09-20 15:51:21 UTC
I get the GPU lockup with Unigine Heaven only when tessellation is enabled, regardless of the level.
Comment 8 Tom Stellard 2016-09-21 14:22:03 UTC
Can you post shader logs for the good and the bad case, captured with R600_DEBUG=ps,gs,vs,cs,tcs,tes?
Comment 9 Laurent carlier 2016-09-21 16:33:26 UTC
Created attachment 126709 [details]
output of R600_DEBUG=ps,gs,vs,cs,tcs,tes ./heaven with tesselation extreme -> lockup
Comment 10 Laurent carlier 2016-09-21 16:58:52 UTC
Created attachment 126712 [details]
output of R600_DEBUG=ps,gs,vs,cs,tcs,tes ./heaven with tesselation extreme and llvm-3.9.0 -> good
Comment 11 Nicolai Hähnle 2016-09-21 17:25:08 UTC
Could you please also run the hanging setup with GALLIUM_DDEBUG="pipelined 1000"? That should produce a file in ~/ddebug_dumps/ which will allow isolating the problematic shader.
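
After a hang it can be tedious to find the dump that was just written. The following is a rough sketch (not from the original comment) that lists the most recently modified files under the dump directory. The helper name `newest_dumps` is hypothetical, and nothing is assumed about how the dump files are named; only the `~/ddebug_dumps/` location comes from the comment above.

```python
from pathlib import Path


def newest_dumps(dump_dir, count=3):
    """Return up to `count` regular files under dump_dir, newest first."""
    root = Path(dump_dir).expanduser()
    if not root.is_dir():
        # No dumps directory yet: nothing to report.
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by modification time so the dump from the latest hang comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


if __name__ == "__main__":
    for path in newest_dumps("~/ddebug_dumps"):
        print(path)
```

The first path printed should be the file worth attaching.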
Comment 12 Laurent carlier 2016-09-21 18:18:52 UTC
Created attachment 126713 [details]
GALLIUM_DDEBUG="pipelined 1000" ./heaven
Comment 13 Laurent carlier 2016-09-24 11:35:32 UTC
Fixed with llvm commit:
Author: vpykhtin
Date: Fri Sep 23 16:21:21 2016
New Revision: 282296

URL: http://llvm.org/viewvc/llvm-project?rev=282296&view=rev
Log:
[AMDGPU] Fix for bz30427: wrong MTBUF encoding on VI

Differential revision: https://reviews.llvm.org/D24875

--
llvm bug report https://llvm.org/bugs/show_bug.cgi?id=30427
