In SuperTuxKart, upon loading the first track of story mode, the display freezes. The GPU resets, but when it comes back everything is messed up and it keeps resetting continuously. I'm using the latest kernel, 3.19-rc3, with the "drm/radeon: fix VM flush..." patches (also tested without them), the latest Mesa from git and the latest drm from git. I'll see if a journald dump outputs anything interesting.
Nothing interesting in there.
Can you create an apitrace which reproduces the problem?
(In reply to Michel Dänzer from comment #2) > Can you create an apitrace which reproduces the problem? I tried, but it was not conclusive. I'll give it another try tomorrow.
I have a trace available, but it's 170MB. Do you have a suggestion on where I should upload it?
Crashing trace: https://drive.google.com/file/d/0Bw_tZdWsNa4BeDN2c3VRZ014aW8/view?usp=sharing
I can reproduce the hang with current Mesa Git master, but not with the Debian 10.3.2 packages. Can you confirm that and, if so, can you bisect? Even with 10.3.2, though, there are GPUVM faults; it looks like the CB is writing past the end of DXT5 SRGBA textures:
VM start=0x262F0000 end=0x26346800 | Texture 512x512x1, 10 levels, 1 samples, dxt5_srgba
[...]
Jan 15 16:52:47 kaveri kernel: [ 208.982506] radeon 0000:00:01.0: GPU fault detected: 146 0x09450014
Jan 15 16:52:47 kaveri kernel: [ 208.982511] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0002634A
Jan 15 16:52:47 kaveri kernel: [ 208.982513] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
Jan 15 16:52:47 kaveri kernel: [ 208.982514] VM fault (0x04, vmid 2) at page 156490, write from 'CB0' (0x43423000) (0)
Jan 15 16:52:47 kaveri kernel: [ 208.982519] radeon 0000:00:01.0: GPU fault detected: 146 0x09050014
Jan 15 16:52:47 kaveri kernel: [ 208.982520] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00026347
Jan 15 16:52:47 kaveri kernel: [ 208.982521] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
Jan 15 16:52:47 kaveri kernel: [ 208.982522] VM fault (0x04, vmid 2) at page 156487, write from 'CB0' (0x43423000) (0)
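As a worked check of the log above, the numbers can be reconciled in a few lines of Python. This sketch assumes that the radeon kernel driver reports VM_CONTEXT1_PROTECTION_FAULT_ADDR ("at page N") as a GPU page number in 4 KiB units, and that the "VM start/end" pair printed for the texture is a byte range with an exclusive end. The helper names `dxt5_mip_chain_bytes` and `fault_page_to_address` are hypothetical. The packed mip-chain size ignores tiling alignment and padding, so it only indicates roughly how large the buffer should be.

```python
# Worked check of the GPUVM fault log (sketch; helper names are hypothetical).
# Assumption: "at page N" / VM_CONTEXT1_PROTECTION_FAULT_ADDR is a GPU page
# number in 4 KiB units, and "VM start/end" is a byte range (end exclusive).
GPU_PAGE_SIZE = 4096

VM_START = 0x262F0000           # texture buffer start, from the log
VM_END = 0x26346800             # texture buffer end, from the log
FAULT_PAGES = [156490, 156487]  # from the "VM fault ... at page N" lines


def dxt5_mip_chain_bytes(width: int, height: int, levels: int) -> int:
    """Tightly packed DXT5 mip chain size: 16 bytes per 4x4 block, no tiling padding."""
    total = 0
    for level in range(levels):
        w = max(1, width >> level)
        h = max(1, height >> level)
        total += ((w + 3) // 4) * ((h + 3) // 4) * 16
    return total


def fault_page_to_address(page: int) -> int:
    """Convert a reported GPU page number to a byte address."""
    return page * GPU_PAGE_SIZE


buffer_size = VM_END - VM_START
print(f"buffer size:      {buffer_size} bytes (0x{buffer_size:X})")
print(f"packed mip chain: {dxt5_mip_chain_bytes(512, 512, 10)} bytes")
for page in FAULT_PAGES:
    addr = fault_page_to_address(page)
    where = "past the end" if addr >= VM_END else "inside the buffer"
    print(f"page {page} (0x{page:X}) -> 0x{addr:X}: offset {addr - VM_END} from end, {where}")
```

Under these assumptions the mapped range is 354304 bytes, a little more than the 349552-byte packed chain (plausible once alignment padding is added), and both faulting pages start beyond the end of the buffer (by 2048 and 14336 bytes). That is consistent with the CB0 writes past the end of the texture described above.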
The game works fine for me if I disable texture compression in the options. BTW, they have disabled texture compression for all Intel drivers, which might mean this is not only a radeonsi driver issue: https://github.com/supertuxkart/stk-code/blob/master/data/graphical_restrictions.xml
Compiled the latest SuperTuxKart from git, no luck. Blah, I even tried it on Windows now, and there too it locks up the driver randomly... they really need to fix their new alpha engine.
(In reply to smoki from comment #7) > Game works fine for me if i disable texture compression in options. > > BTW, they have disabled compression for any intel driver, that might mean > this is not only radeonsi driver issue: > > https://github.com/supertuxkart/stk-code/blob/master/data/graphical_restrictions.xml Which GPU are you using? Disabling texture compression doesn't solve the bug.
(In reply to Michel Dänzer from comment #6) > I can reproduce the hang with current Mesa Git master, but not with the > Debian 10.3.2 packages. Can you confirm that and if so, can you bisect? > [...] I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly updated llvm back to 3.5 first.
(In reply to Alexandre Demers from comment #9) > > Which GPU are you using? > > Disabling texture compression doesn't solve the bug. A low-end Kabini. Yeah, disabling texture compression solves it for the 0.8.2-beta release, but not for the current game git I just tried, so yeah, the bug is there. I'm not sure it is a driver bug, as the game is really full of bugs and locked up the driver easily even on Windows for me; it locks up there even at the very minimum settings... but somehow randomly, on practically any settings. Honestly, the game is real shit now... sorry, I don't know better words to describe it :) As I understand it, it uses a new forked engine, and the game developers need to fix some driver incompatibilities. I also read their forums and issues on GitHub; there are plenty of unsolved issues.
(In reply to Alexandre Demers from comment #10) > (In reply to Michel Dänzer from comment #6) > > I can reproduce the hang with current Mesa Git master, but not with the > > Debian 10.3.2 packages. Can you confirm that and if so, can you bisect? > > [...] > > I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly > updated llvm back to 3.5 first. I'm confirming that 10.3.2 works fine.
I tried Mesa 10.2.9, 10.3.0, 10.3.2, 10.3.7, 10.4.2 and 10.5-devel; with the current game git, all of them lock up the GPU. Not to mention Windows 7 32-bit and 64-bit, where there are also GPU lockups with the 0.8.2-beta release.
(In reply to smoki from comment #13) > I tried mesa 10.2.9, 10.3.0, 10.3.2, 10.3.7, 10.4.2 and 10.5-devel. with > current game git all lockup GPU. > > Not to mention Windows 7 32bit and 64bit there are also GPU lockups with > 0.8.2-beta release. IMO, an application should never be able to lock up a GPU. But things are as they are. I haven't updated STK since I made my apitrace. For me, that version does not crash with 10.3.2 (and I'm bisecting; we will see where this ends up).
(In reply to Alexandre Demers from comment #12) > (In reply to Alexandre Demers from comment #10) > > (In reply to Michel Dänzer from comment #6) > > > I can reproduce the hang with current Mesa Git master, but not with the > > > Debian 10.3.2 packages. Can you confirm that and if so, can you bisect? > > > [...] > I'm confirming that 10.3.2 works fine. I'm also seeing the VM faults in dmesg.
While bisecting, using today's mesa from git, I don't have any crash/hang... But the VM errors are still there.
(In reply to Alexandre Demers from comment #16) > While bisecting, using today's mesa from git, I don't have any crash/hang... If that's still using LLVM 3.5, maybe it's actually an LLVM regression.
(In reply to Michel Dänzer from comment #17) > (In reply to Alexandre Demers from comment #16) > > While bisecting, using today's mesa from git, I don't have any crash/hang... > > If that's still using LLVM 3.5, maybe it's actually an LLVM regression. I doubt it, since I've only been using llvm-git for the last couple of days. But I'm currently rebuilding LLVM from git; I'll know for sure later today.
(In reply to Michel Dänzer from comment #17) > (In reply to Alexandre Demers from comment #16) > > While bisecting, using today's mesa from git, I don't have any crash/hang... > > If that's still using LLVM 3.5, maybe it's actually an LLVM regression. Tested with Mesa recompiled against LLVM 3.5+ (r226248), and it still doesn't crash. Should we keep this bug open and focus on the VM faults?
Yes. VM faults can cause hangs too. Were you able to bisect the problematic commit?
(In reply to Marek Olšák from comment #20) > Yes. VM faults can cause hangs too. Were you able to bisect the problematic > commit? Are you referring to the VM faults? If so, not yet. I've been busy with other things lately, but I could give it a go next week.
Has anyone found a Mesa commit yet where the VM faults *don't* occur? Otherwise, there's nothing to bisect for them.
(In reply to Michel Dänzer from comment #22) > Has anyone found a Mesa commit yet where the VM faults *don't* occur? > Otherwise, there's nothing to bisect for them. There is no good commit to bisect from: I went down to Mesa 10.1 (the game needs at least that for the GL3 renderer) and it still faults. For me the issue is mostly about the texture compression option. I tried 0.8.2-beta and 0.8.2-beta2... only, you'd better not start a race before the option is disabled and applied, and, most importantly, exit the game after that, because otherwise it doesn't really get applied :D ... Blah, just run:
MESA_EXTENSION_OVERRIDE=-GL_EXT_texture_compression_s3tc ./supertuxkart
and it is fine.
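The workaround above relies on Mesa's `MESA_EXTENSION_OVERRIDE` environment variable, whose value is a space-separated list of extension names where a `-` prefix disables an extension. For launching the game that way from a script, here is a minimal Python sketch, assuming the binary is `./supertuxkart` in the current directory; `env_with_disabled_extensions` and `launch_without_s3tc` are hypothetical helper names, not part of SuperTuxKart or Mesa.

```python
import os
import subprocess
from typing import Iterable, Mapping, Optional


def env_with_disabled_extensions(
    extensions: Iterable[str], base: Optional[Mapping[str, str]] = None
) -> dict:
    """Copy of the environment with each extension disabled via MESA_EXTENSION_OVERRIDE.

    Existing override tokens are kept, except ones naming the same extension
    (bare, '+' or '-' prefixed), which are replaced by a '-' (disable) token.
    """
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    tokens = env.get("MESA_EXTENSION_OVERRIDE", "").split()
    for ext in extensions:
        tokens = [t for t in tokens if t.lstrip("+-") != ext]
        tokens.append("-" + ext)
    env["MESA_EXTENSION_OVERRIDE"] = " ".join(tokens)
    return env


def launch_without_s3tc(binary: str = "./supertuxkart") -> int:
    """Run the game with S3TC hidden by Mesa; returns the process exit code."""
    env = env_with_disabled_extensions(["GL_EXT_texture_compression_s3tc"])
    return subprocess.call([binary], env=env)
```

Calling `launch_without_s3tc()` should behave like the shell one-liner above. The helper merges with any override already set instead of overwriting it, so other extension tweaks the user exported are preserved.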
(In reply to Michel Dänzer from comment #22) > Has anyone found a Mesa commit yet where the VM faults *don't* occur? > Otherwise, there's nothing to bisect for them. To my knowledge, since I've had this video card (a few months), I've been dealing with VM faults. Here they are triggered in SuperTuxKart, but I've also reported them in another bug about Serious Sam 3 (which hangs the GPU within a few seconds). However, are VM faults only related to Mesa, or can they come from somewhere else (drm)?
(In reply to Alexandre Demers from comment #24) > To my knowledge, since I've had this video card (a few months), I've been > dealing with VM faults. Note that I'm referring specifically to the VM faults triggered by your apitrace from comment 5. A VM fault by itself is a generic symptom which can be caused by many different things; it's more or less the equivalent of a CPU segmentation fault. > However, are VM faults only related to mesa or can they come from somewhere > else (drm)? In general the Mesa driver is the most likely culprit, though in this particular case it could also be, e.g., libdrm_radeon calculating the surface parameters incorrectly.
(In reply to Michel Dänzer from comment #25) > Note that I'm referring specifically to the VM faults triggered by your > apitrace from comment 5. I can't even start that one; it just throws this:
0 6 glXCreateWindow(dpy = 0x1fd84a0, config = 0x2060a80, win = 58720258, attribList = {}) = 58720259
6: warning: unsupported glXCreateWindow call
X Error of failed request: GLXBadFBConfig
  Major opcode of failed request: 155 (GLX)
  Minor opcode of failed request: 34 ()
  Serial number of failed request: 22
  Current serial number in output stream: 20
Probably an issue with the shipped gcc libs.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/569.