Bug 88183

Summary: radeonsi: R9 280X hangs with SuperTuxKart
Product: DRI
Component: DRM/Radeon
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Version: DRI git
Hardware: Other
OS: All
Reporter: Alexandre Demers <alexandre.f.demers>
Assignee: Default DRI bug account <dri-devel>
CC: lee295012

Description Alexandre Demers 2015-01-07 23:55:18 UTC
In SuperTuxKart, upon loading the first track of the story mode, the display freezes. The GPU resets but when it comes back everything is messed up and it keeps resetting continuously.

I'm using latest kernel 3.19-rc3 with the "drm/radeon: fix VM flush..." patches (also tested without it), latest mesa from git, latest drm from git.

I'll see if a journald dump outputs something interesting.
Comment 1 Alexandre Demers 2015-01-08 01:46:02 UTC
Nothing interesting in there.
Comment 2 Michel Dänzer 2015-01-08 02:37:07 UTC
Can you create an apitrace which reproduces the problem?
Comment 3 Alexandre Demers 2015-01-08 04:54:08 UTC
(In reply to Michel Dänzer from comment #2)
> Can you create an apitrace which reproduces the problem?

I tried, but it was not conclusive. I'll give it another try tomorrow.
Comment 4 Alexandre Demers 2015-01-09 01:51:39 UTC
I have a trace available, but it's 170MB. Do you have a suggestion on where I should upload it?
Comment 5 Alexandre Demers 2015-01-10 00:14:07 UTC
Crashing trace: https://drive.google.com/file/d/0Bw_tZdWsNa4BeDN2c3VRZ014aW8/view?usp=sharing
Comment 6 Michel Dänzer 2015-01-15 08:19:27 UTC
I can reproduce the hang with current Mesa Git master, but not with the Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?

Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing past the end of DXT5 SRGBA textures:

VM start=0x262F0000  end=0x26346800 | Texture 512x512x1, 10 levels, 1 samples, dxt5_srgba
[...]
Jan 15 16:52:47 kaveri kernel: [  208.982506] radeon 0000:00:01.0: GPU fault detected: 146 0x09450014
Jan 15 16:52:47 kaveri kernel: [  208.982511] radeon 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0002634A
Jan 15 16:52:47 kaveri kernel: [  208.982513] radeon 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
Jan 15 16:52:47 kaveri kernel: [  208.982514] VM fault (0x04, vmid 2) at page 156490, write from 'CB0' (0x43423000) (0)
Jan 15 16:52:47 kaveri kernel: [  208.982519] radeon 0000:00:01.0: GPU fault detected: 146 0x09050014
Jan 15 16:52:47 kaveri kernel: [  208.982520] radeon 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00026347
Jan 15 16:52:47 kaveri kernel: [  208.982521] radeon 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
Jan 15 16:52:47 kaveri kernel: [  208.982522] VM fault (0x04, vmid 2) at page 156487, write from 'CB0' (0x43423000) (0)
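
The "past the end" reading can be checked with a little arithmetic. The following is a minimal Python sketch, not driver code. It assumes that the reported fault address is a page number in units of 4 KiB (an assumption, though it is consistent with the log, where 0x0002634A is page 156490) and that `end` is the first byte after the texture mapping. The helper names are made up for illustration.

```python
GPU_PAGE_SIZE = 4096  # assumption: fault addresses are reported in 4 KiB pages

# Texture mapping from the "VM start=... end=..." line above.
TEX_START = 0x262F0000
TEX_END = 0x26346800


def fault_byte_address(fault_page):
    """Convert a faulting page number into the byte address where that page starts."""
    return fault_page * GPU_PAGE_SIZE


def bytes_past_end(fault_page, vm_end):
    """Distance from vm_end to the start of the faulting page (positive = past the end)."""
    return fault_byte_address(fault_page) - vm_end


# The two VM_CONTEXT1_PROTECTION_FAULT_ADDR values from the log.
for page in (0x0002634A, 0x00026347):
    addr = fault_byte_address(page)
    delta = bytes_past_end(page, TEX_END)
    print(f"page {page} starts at 0x{addr:08X}, {delta} bytes after end")
```

Under those assumptions both faulting pages start after `end` (by 0x3800 and 0x800 bytes respectively), which matches the CB writing beyond the texture rather than somewhere unrelated.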
Comment 7 smoki 2015-01-15 23:03:23 UTC
Game works fine for me if I disable texture compression in options.

BTW, they have disabled compression for any Intel driver, which might mean this is not only a radeonsi driver issue:

 https://github.com/supertuxkart/stk-code/blob/master/data/graphical_restrictions.xml
Comment 8 smoki 2015-01-16 00:04:07 UTC
Compiled the latest SuperTuxKart git, no improvement.

Blah, I even tried it on Windows now, and there too it locks up the driver randomly... they really need to fix their new alpha engine.
Comment 9 Alexandre Demers 2015-01-16 00:32:20 UTC
(In reply to smoki from comment #7)
> Game works fine for me if I disable texture compression in options.
> 
> BTW, they have disabled compression for any Intel driver, which might mean
> this is not only a radeonsi driver issue:
> 
> https://github.com/supertuxkart/stk-code/blob/master/data/graphical_restrictions.xml

Which GPU are you using?

Disabling texture compression doesn't solve the bug.
Comment 10 Alexandre Demers 2015-01-16 00:33:43 UTC
(In reply to Michel Dänzer from comment #6)
> I can reproduce the hang with current Mesa Git master, but not with the
> Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
> 
> Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing
> past the end of DXT5 SRGBA textures:
> 
> VM start=0x262F0000  end=0x26346800 | Texture 512x512x1, 10 levels, 1
> samples, dxt5_srgba
> [...]
> Jan 15 16:52:47 kaveri kernel: [  208.982506] radeon 0000:00:01.0: GPU fault
> detected: 146 0x09450014
> Jan 15 16:52:47 kaveri kernel: [  208.982511] radeon 0000:00:01.0:  
> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0002634A
> Jan 15 16:52:47 kaveri kernel: [  208.982513] radeon 0000:00:01.0:  
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
> Jan 15 16:52:47 kaveri kernel: [  208.982514] VM fault (0x04, vmid 2) at
> page 156490, write from 'CB0' (0x43423000) (0)
> Jan 15 16:52:47 kaveri kernel: [  208.982519] radeon 0000:00:01.0: GPU fault
> detected: 146 0x09050014
> Jan 15 16:52:47 kaveri kernel: [  208.982520] radeon 0000:00:01.0:  
> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00026347
> Jan 15 16:52:47 kaveri kernel: [  208.982521] radeon 0000:00:01.0:  
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
> Jan 15 16:52:47 kaveri kernel: [  208.982522] VM fault (0x04, vmid 2) at
> page 156487, write from 'CB0' (0x43423000) (0)

I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly updated llvm back to 3.5 first.
Comment 11 smoki 2015-01-16 00:57:02 UTC
(In reply to Alexandre Demers from comment #9)
> 
> Which GPU are you using?
> 
> Disabling texture compression doesn't solve the bug.

Low-end Kabini. Yeah, disabling texture compression solves it for the 0.8.2-beta release, but not for the current game git I tried just now, so yeah, the bug is there.

I'm not sure it is a driver bug, as the game is really full of bugs and easily locked up the driver even on Windows for me. It locks up there even on the very minimum settings... but somehow randomly, practically on any settings.

Frankly, the game is now real shit... sorry to say that, I don't know better words to describe this :) I only understand that it uses a new forked engine and the game developers need to fix some driver incompatibilities.

I also read their forums and the issues on GitHub; there are plenty of unsolved issues.
Comment 12 Alexandre Demers 2015-01-16 01:01:28 UTC
(In reply to Alexandre Demers from comment #10)
> (In reply to Michel Dänzer from comment #6)
> > I can reproduce the hang with current Mesa Git master, but not with the
> > Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
> > 
> > Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing
> > past the end of DXT5 SRGBA textures:
> > 
> > VM start=0x262F0000  end=0x26346800 | Texture 512x512x1, 10 levels, 1
> > samples, dxt5_srgba
> > [...]
> > Jan 15 16:52:47 kaveri kernel: [  208.982506] radeon 0000:00:01.0: GPU fault
> > detected: 146 0x09450014
> > Jan 15 16:52:47 kaveri kernel: [  208.982511] radeon 0000:00:01.0:  
> > VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0002634A
> > Jan 15 16:52:47 kaveri kernel: [  208.982513] radeon 0000:00:01.0:  
> > VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
> > Jan 15 16:52:47 kaveri kernel: [  208.982514] VM fault (0x04, vmid 2) at
> > page 156490, write from 'CB0' (0x43423000) (0)
> > Jan 15 16:52:47 kaveri kernel: [  208.982519] radeon 0000:00:01.0: GPU fault
> > detected: 146 0x09050014
> > Jan 15 16:52:47 kaveri kernel: [  208.982520] radeon 0000:00:01.0:  
> > VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00026347
> > Jan 15 16:52:47 kaveri kernel: [  208.982521] radeon 0000:00:01.0:  
> > VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
> > Jan 15 16:52:47 kaveri kernel: [  208.982522] VM fault (0x04, vmid 2) at
> > page 156487, write from 'CB0' (0x43423000) (0)
> 
> I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly
> updated llvm back to 3.5 first.

I'm confirming that 10.3.2 works fine.
Comment 13 smoki 2015-01-16 02:47:01 UTC
I tried Mesa 10.2.9, 10.3.0, 10.3.2, 10.3.7, 10.4.2 and 10.5-devel; with the current game git, all of them lock up the GPU.

Not to mention that on Windows 7 32-bit and 64-bit there are also GPU lockups with the 0.8.2-beta release.
Comment 14 Alexandre Demers 2015-01-16 03:18:02 UTC
(In reply to smoki from comment #13)
> I tried Mesa 10.2.9, 10.3.0, 10.3.2, 10.3.7, 10.4.2 and 10.5-devel; with
> the current game git, all of them lock up the GPU.
> 
> Not to mention that on Windows 7 32-bit and 64-bit there are also GPU
> lockups with the 0.8.2-beta release.

IMO, an application should never be able to lock a GPU. But things are as they are.

I haven't updated STK since I did my apitrace. For me, that version is not crashing with 10.3.2 (and I'm bisecting, we will see where this ends).
Comment 15 Alexandre Demers 2015-01-16 03:23:00 UTC
(In reply to Alexandre Demers from comment #12)
> (In reply to Alexandre Demers from comment #10)
> > (In reply to Michel Dänzer from comment #6)
> > > I can reproduce the hang with current Mesa Git master, but not with the
> > > Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
> > > 
> > > Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing
> > > past the end of DXT5 SRGBA textures:
> > > 
> > > VM start=0x262F0000  end=0x26346800 | Texture 512x512x1, 10 levels, 1
> > > samples, dxt5_srgba
> > > [...]
> > > Jan 15 16:52:47 kaveri kernel: [  208.982506] radeon 0000:00:01.0: GPU fault
> > > detected: 146 0x09450014
> > > Jan 15 16:52:47 kaveri kernel: [  208.982511] radeon 0000:00:01.0:  
> > > VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0002634A
> > > Jan 15 16:52:47 kaveri kernel: [  208.982513] radeon 0000:00:01.0:  
> > > VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
> > > Jan 15 16:52:47 kaveri kernel: [  208.982514] VM fault (0x04, vmid 2) at
> > > page 156490, write from 'CB0' (0x43423000) (0)
> > > Jan 15 16:52:47 kaveri kernel: [  208.982519] radeon 0000:00:01.0: GPU fault
> > > detected: 146 0x09050014
> > > Jan 15 16:52:47 kaveri kernel: [  208.982520] radeon 0000:00:01.0:  
> > > VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00026347
> > > Jan 15 16:52:47 kaveri kernel: [  208.982521] radeon 0000:00:01.0:  
> > > VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
> > > Jan 15 16:52:47 kaveri kernel: [  208.982522] VM fault (0x04, vmid 2) at
> > > page 156487, write from 'CB0' (0x43423000) (0)
> > 
> > I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly
> > updated llvm back to 3.5 first.
> 
> I'm confirming that 10.3.2 works fine.

I'm also seeing the VM faults in dmesg.
Comment 16 Alexandre Demers 2015-01-16 04:30:35 UTC
While bisecting, using today's mesa from git, I don't have any crash/hang... But the VM errors are still there.
Comment 17 Michel Dänzer 2015-01-16 08:45:22 UTC
(In reply to Alexandre Demers from comment #16)
> While bisecting, using today's mesa from git, I don't have any crash/hang...

If that's still using LLVM 3.5, maybe it's actually an LLVM regression.
Comment 18 Alexandre Demers 2015-01-16 13:39:36 UTC
(In reply to Michel Dänzer from comment #17)
> (In reply to Alexandre Demers from comment #16)
> > While bisecting, using today's mesa from git, I don't have any crash/hang...
> 
> If that's still using LLVM 3.5, maybe it's actually an LLVM regression.

I doubt it, since I've been using llvm-git only for the last couple of days. But I'm currently rebuilding llvm from git; I'll know for sure later today.
Comment 19 Alexandre Demers 2015-01-16 23:49:44 UTC
(In reply to Michel Dänzer from comment #17)
> (In reply to Alexandre Demers from comment #16)
> > While bisecting, using today's mesa from git, I don't have any crash/hang...
> 
> If that's still using LLVM 3.5, maybe it's actually an LLVM regression.

Tested with mesa recompiled with llvm 3.5+ (r226248) and it still doesn't crash.

Should we keep this bug open and focus on the VM faults?
Comment 20 Marek Olšák 2015-01-23 22:06:05 UTC
Yes. VM faults can cause hangs too. Were you able to bisect the problematic commit?
Comment 21 Alexandre Demers 2015-01-23 23:12:06 UTC
(In reply to Marek Olšák from comment #20)
> Yes. VM faults can cause hangs too. Were you able to bisect the problematic
> commit?

Are you referring to the VM faults? If so, not yet. I've been busy with other things lately, but I could give it a go in the next week.
Comment 22 Michel Dänzer 2015-01-26 08:11:26 UTC
Has anyone found a Mesa commit yet where the VM faults *don't* occur? Otherwise, there's nothing to bisect for them.
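
If a fault-free commit does turn up, the check for each bisection step could be automated. Here is a rough Python sketch, assuming the kernel log lines keep the format shown in comment 6 and that the log text is captured after replaying the trace. The function names are hypothetical, and accepting `read` as well as `write` is an assumption, since only `write` appears in this bug. The exit codes follow the `git bisect run` convention, where 0 means good and 1 means bad.

```python
import re

# Matches lines like:
#   VM fault (0x04, vmid 2) at page 156490, write from 'CB0' (0x43423000) (0)
VM_FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)\) at page (\d+), "
    r"(read|write) from '([^']+)'"
)


def parse_vm_faults(log_text):
    """Return one dict per 'VM fault' line found in the given kernel log text."""
    faults = []
    for match in VM_FAULT_RE.finditer(log_text):
        status, vmid, page, access, client = match.groups()
        faults.append({
            "status": int(status, 16),
            "vmid": int(vmid),
            "page": int(page),
            "access": access,
            "client": client,
        })
    return faults


def bisect_exit_code(log_text):
    """0 (good) if the log contains no VM faults, 1 (bad) otherwise."""
    return 1 if parse_vm_faults(log_text) else 0


# Two lines copied from the log in comment 6.
SAMPLE = """\
[  208.982514] VM fault (0x04, vmid 2) at page 156490, write from 'CB0' (0x43423000) (0)
[  208.982522] VM fault (0x04, vmid 2) at page 156487, write from 'CB0' (0x43423000) (0)
"""

for fault in parse_vm_faults(SAMPLE):
    print(fault["client"], fault["access"], hex(fault["page"]))
```

Only faults logged during the current replay should be fed to the check; otherwise entries left over from a previous step would mark every commit as bad.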
Comment 23 smoki 2015-01-26 13:47:55 UTC
(In reply to Michel Dänzer from comment #22)
> Has anyone found a Mesa commit yet where the VM faults *don't* occur?
> Otherwise, there's nothing to bisect for them.

There is no good commit to bisect from: I went down to Mesa 10.1 (the game needs at least that for its GL3 renderer) and it still faults.

For me the issue is mostly about that texture compression option; I tried 0.8.2-beta and 0.8.2-beta2... only you had better not start a race before the option is disabled and applied, and most importantly you must exit the game after that, because otherwise it does not really get applied :D ... blah, just do:

 MESA_EXTENSION_OVERRIDE=-GL_EXT_texture_compression_s3tc ./supertuxkart

 And it is fine.
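
The same workaround could be wrapped in a small launcher. This is a Python sketch only; it assumes the `./supertuxkart` binary is in the current directory as in the command above, and the helper names are invented for illustration. `MESA_EXTENSION_OVERRIDE` takes a space-separated list of extension names, where a leading `-` disables an extension, so the helper appends to any value already set instead of overwriting it.

```python
import os
import subprocess

# Extension to hide from the game, as in the command above.
S3TC_OFF = "-GL_EXT_texture_compression_s3tc"


def env_without_s3tc(base_env):
    """Return a copy of base_env with S3TC disabled via MESA_EXTENSION_OVERRIDE."""
    env = dict(base_env)
    entries = env.get("MESA_EXTENSION_OVERRIDE", "").split()
    if S3TC_OFF not in entries:
        entries.append(S3TC_OFF)
    env["MESA_EXTENSION_OVERRIDE"] = " ".join(entries)
    return env


def launch(binary="./supertuxkart"):
    """Start the game with the modified environment and return its exit status."""
    return subprocess.call([binary], env=env_without_s3tc(os.environ))


# Show the value the game would see; launch() is deliberately not called here.
print(env_without_s3tc({})["MESA_EXTENSION_OVERRIDE"])
```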
Comment 24 Alexandre Demers 2015-01-27 05:42:10 UTC
(In reply to Michel Dänzer from comment #22)
> Has anyone found a Mesa commit yet where the VM faults *don't* occur?
> Otherwise, there's nothing to bisect for them.

To my knowledge, since I've had this video card (a few months), I've been dealing with VM faults. Here they are triggered in SuperTuxKart, but I've also reported them in another bug about Serious Sam 3 (which hangs the GPU in a few seconds). However, are VM faults only related to mesa or can they come from somewhere else (drm)?
Comment 25 Michel Dänzer 2015-01-27 06:32:35 UTC
(In reply to Alexandre Demers from comment #24)
> To my knowledge, since I've had this video card (a few months), I've been
> dealing with VM faults.

Note that I'm referring specifically to the VM faults triggered by your apitrace from comment 5. A VM fault by itself is a generic symptom which can be caused by many different things, it's more or less the equivalent of a CPU segmentation fault.


> However, are VM faults only related to mesa or can they come from somewhere
> else (drm)?

The Mesa driver is most likely in general, though in this particular case it could also be e.g. libdrm_radeon calculating the surface parameters incorrectly.
Comment 26 smoki 2015-01-27 10:32:48 UTC
(In reply to Michel Dänzer from comment #25)
> Note that I'm referring specifically to the VM faults triggered by your
> apitrace from comment 5.

I can't even start that one, it just throws this:

0 6 glXCreateWindow(dpy = 0x1fd84a0, config = 0x2060a80, win = 58720258, attribList = {}) = 58720259
6: warning: unsupported glXCreateWindow call
X Error of failed request:  GLXBadFBConfig
  Major opcode of failed request:  155 (GLX)
  Minor opcode of failed request:  34 ()
  Serial number of failed request:  22
  Current serial number in output stream:  20

Probably an issue with the shipped gcc libs.
Comment 27 Martin Peres 2019-11-19 09:00:28 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/569.
