Bug 83416

Summary: [radeonsi] Serious Sam 3 lockup during its start
Product: Mesa Reporter: Laurent carlier <lordheavym>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: kernel.log file with kernel 3.17rc3
output of 'R600_DEBUG=ps,vs glretrace Sam3.trace'
segfault
Fix suggested by Vadim

Description Laurent carlier 2014-09-02 21:15:14 UTC
Created attachment 105638 [details]
kernel.log file with kernel 3.17rc3

* Tested with both kernel 3.16.1 and kernel 3.17rc3, with and without hyperz
* OpenGL renderer string: Gallium 0.4 on AMD PITCAIRN
* OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.4.0-devel (git-021e84f)

I can reproduce the lockup with the trace:
http://pkgbuild.com/~lcarlier/trace/Sam3.tar.xz
Comment 1 smoki 2014-09-02 23:45:41 UTC
 Cannot reproduce it on Kabini with the same git version, 021e84f.

 With Mesa built against current llvm-3.6 svn the trace just passes fine, and when I build Mesa against 3.5 this apitrace just segfaults... in both cases, no lockup.

 Debian.
Comment 2 Michel Dänzer 2014-09-03 09:05:41 UTC
I get no lockup either, but I do see the same GPUVM protection faults:

 radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF00819

The FF bits make me suspect bits 32-4x of the GPUVM address are getting clobbered, maybe because of the LLVM backend generating invalid shader code.
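
To illustrate the reasoning about the FF bits, here is a minimal sketch. It assumes the value logged in VM_CONTEXT1_PROTECTION_FAULT_ADDR is a 4 KiB page number rather than a byte address; that assumption and the helper names are illustrative and are not taken from the kernel log itself.

```python
PAGE_SHIFT = 12  # assumption: the register holds a 4 KiB page number


def fault_page_to_address(page):
    """Convert the logged page number into a byte address."""
    return page << PAGE_SHIFT


def set_bit_positions(value):
    """Return the positions of all bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


addr = fault_page_to_address(0x0FF00819)
high_part = addr >> 32  # everything above the low 32 bits
high_bits = [32 + i for i in set_bit_positions(high_part)]

print(hex(addr))
print(hex(high_part))
print(high_bits)
```

Under that assumption the 0xFF lands entirely in the upper half of the address, in bits 32 and up, which is the pattern expected if only the high 32-bit word of a 64-bit address were computed incorrectly.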
Comment 3 Laurent carlier 2014-09-03 10:09:26 UTC
Created attachment 105674 [details]
output of 'R600_DEBUG=ps,vs glretrace Sam3.trace'

LLVM is 3.6svn r216889
Comment 4 Laurent carlier 2014-09-03 10:12:04 UTC
Link to the trace in google drive:
https://drive.google.com/file/d/0B1WCo3k21FK3dTZmaFFmU2wwQzQ/edit?usp=sharing
Comment 5 smoki 2014-09-03 10:25:05 UTC
(In reply to comment #2)
> I get no lockup either, but I do see the same GPUVM protection faults:
> 
>  radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF00819
> 
> The FF bits make me suspect bits 32-4x of the GPUVM address are getting
> clobbered, maybe because of the LLVM backend generating invalid shader code.

 Nothing new in dmesg for me, but something very interesting happens here. When radeonsi.so is stripped this trace segfaults for me; if it is not stripped it passes fine with no segfault. What can that be? Hmm...
Comment 6 Laurent carlier 2014-09-03 10:28:24 UTC
Just to note that this trace is produced with apitrace 5.0 and with the following commandline:
GALLIUM_HUD=num-bytes-moved apitrace32 trace %command%
Comment 7 smoki 2014-09-03 10:53:03 UTC
Created attachment 105676 [details]
segfault

(In reply to comment #5)
>  Nothing new in dmesg for me, but something very interesting happens here.
> When radeonsi.so is stripped this trace segfaults for me; if it is not
> stripped it passes fine with no segfault. What can that be? Hmm...

 After a restart it works, but then it segfaults again, weird... This one was tried on a pure 32-bit OS.
Comment 8 smoki 2014-09-03 11:35:44 UTC
 @Laurent carlier

 Is this a new issue, or maybe a regression?

 I don't have the SSAM3 game, but I remember from earlier versions that Serious Sam has a bunch of different settings. Maybe you can try different settings, starting with Low or so; maybe only some of the settings trigger the issue.
Comment 9 smoki 2014-09-03 12:21:50 UTC
 Also try some stable Mesa releases if you can, 10.2 or 10.3. I have very strange issues with 32-bit Mesa and apps; in particular, the build system in current git seems very broken for me. Make install, the SSE41 macro compile needs much more CPU time, stripping does not work properly, the default optimization level is not good (-O3 fixes it), etc.
Comment 10 Laurent carlier 2014-09-03 13:24:21 UTC
Just tried with mesa-10.2.6/llvm-3.4.2 and the trace works fine, except for the following error from LLVM:
LLVM ERROR: ran out of registers during register allocation

Here are the flags used:
CPPFLAGS="-D_FORTIFY_SOURCE=2"
CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4"
CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro"
DEBUG_CFLAGS="-g -fvar-tracking-assignments"
DEBUG_CXXFLAGS="-g -fvar-tracking-assignments"
Comment 11 Vadim Girlin 2014-09-03 14:18:18 UTC
(In reply to comment #2)
> I get no lockup either, but I do see the same GPUVM protection faults:
> 
>  radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF00819
> 
> The FF bits make me suspect bits 32-4x of the GPUVM address are getting
> clobbered, maybe because of the LLVM backend generating invalid shader code.

I've found a similar bug with an incorrect high part of the address; the problem was that the LLVM backend uses S_ADD/SUB_I32 for lowering 64-bit integer add/sub, but it should use the _U32 versions instead. I was going to send a patch, but the fix is trivial: basically just replace all uses of S_ADD/SUB_I32 with S_ADD/SUB_U32. I'm not sure if you are hitting the same issue, though.
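
For illustration, a minimal sketch of why the choice of instruction for the low half can corrupt the high half of a 64-bit address. It assumes the usual reading of the Southern Islands ISA: S_ADD_U32 sets SCC to the unsigned carry-out, S_ADD_I32 sets SCC on signed overflow, and S_ADDC_U32 adds SCC into the high word. The Python functions only model those semantics; they are not the LLVM backend code, and `add64` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def s_add_u32(a, b):
    """Model of S_ADD_U32: SCC is the unsigned carry-out (assumed semantics)."""
    total = a + b
    return total & MASK32, total >> 32


def s_add_i32(a, b):
    """Model of S_ADD_I32: SCC is the signed-overflow flag (assumed semantics)."""
    signed_sum = to_signed32(a) + to_signed32(b)
    overflow = int(not (-(1 << 31) <= signed_sum < (1 << 31)))
    return (a + b) & MASK32, overflow


def s_addc_u32(a, b, scc):
    """Model of S_ADDC_U32: add with carry-in taken from SCC."""
    total = a + b + scc
    return total & MASK32, total >> 32


def add64(a, b, low_add):
    """Lower a 64-bit add into a low add plus a high add-with-carry."""
    lo, scc = low_add(a & MASK32, b & MASK32)
    hi, _ = s_addc_u32((a >> 32) & MASK32, (b >> 32) & MASK32, scc)
    return (hi << 32) | lo


# The carry out of the low word is lost when SCC means "signed overflow".
print(hex(add64(0xFFFFF000, 0x2000, s_add_u32)))
print(hex(add64(0xFFFFF000, 0x2000, s_add_i32)))

# A spurious carry is injected when the low add overflows only as signed.
print(hex(add64(0x7FFFFFFF, 0x1, s_add_u32)))
print(hex(add64(0x7FFFFFFF, 0x1, s_add_i32)))
```

In both failing cases the low 32 bits are still correct and only the high word is off by one, which would be consistent with addresses whose upper bits look clobbered.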
Comment 12 Tom Stellard 2014-09-03 22:54:12 UTC
Created attachment 105709 [details] [review]
Fix suggested by Vadim

Can you try this patch?
Comment 13 Michel Dänzer 2014-09-04 01:50:51 UTC
(In reply to comment #12)
> Can you try this patch?

The patch fixes the GPUVM faults for me while replaying the apitrace.
Comment 14 Laurent carlier 2014-09-04 04:31:01 UTC
(In reply to comment #12)
> Created attachment 105709 [details] [review]
> Fix suggested by Vadim
> 
> Can you try this patch?

It doesn't fix the lockup for me. I've tested mesa-git with llvm 3.4.3, both the trace and the game, and both failed with the following error:

LLVM ERROR: Cannot select: 0x1671def0: i32 = truncate 0x16716ff4 [ORD=21] [ID=121]
  0x16716ff4: i128 = srl 0x1671cb14, 0x16717198 [ORD=21] [ID=102]
    0x1671cb14: i128,ch = load 0x166a9484, 0x167123bc, 0x16712e20<LD16[%32](tbaa=!"const")> [ORD=21] [ID=90]
      0x167123bc: i64,ch = CopyFromReg 0x166a9484, 0x16712330 [ID=81]
        0x16712330: i64 = Register %vreg66 [ID=2]
      0x16712e20: i64 = undef [ID=8]
    0x16717198: i32 = Constant<96> [ID=76]
In function: main
Comment 15 Laurent carlier 2014-09-04 15:21:53 UTC
I can confirm that 8bd67231797e5d79d72a4e91b37ea81da30c6df3 fixes the hang.

Thanks Marek, closing!
Comment 16 Laurent carlier 2014-09-04 15:42:24 UTC
Bad luck, it's hanging again! -> reopened
Comment 17 Grigori Goronzy 2014-09-04 16:06:55 UTC
Does this Mesa patch help?

https://bugs.freedesktop.org/attachment.cgi?id=105755
Comment 18 Laurent carlier 2014-09-04 16:26:58 UTC
(In reply to comment #17)
> Does this Mesa patch help?
> 
> https://bugs.freedesktop.org/attachment.cgi?id=105755

No, it doesn't help
Comment 19 Laurent carlier 2014-11-18 15:44:20 UTC
Fixed with current mesa trunk, so closing
