Summary: | Packet0 not allowed and GPU fault detected errors with Serious Engine games | ||
---|---|---|---|
Product: | Mesa | Reporter: | Daniel Scharrer <daniel> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | alexandre.f.demers, ashmikuz, haagch, keramidasceid, maraeo |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=84500 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
dmesg output with the GPU fault errors filtered out
standard output from The Talos Principle and Serious Sam 3
possible fix for VM faults
VM faults and Packet0 error when quitting the current game
sorry i should've attached it
dmesg with blizzard's heroes of the storm beta in wine
Output with R600_DEBUG=ps,vs,gs
dmesg to "Output with R600_DEBUG=ps,vs,gs" |
Description
Daniel Scharrer
2014-12-13 10:38:05 UTC
Created attachment 110809 [details]
standard output from The Talos Principle and Serious Sam 3
The standard output has this repeated a few times:
radeon: The kernel rejected CS, see dmesg for more information.
This also happens on a HD 7970M (pitcairn).

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M] (rev ff)

On latest mesa git master and both with linux 3.18 and drm-next-3.19.

Setting the game to lowest settings doesn't seem to help. When closely looking at the ground the vm faults seem to stop, when looking a little bit in the distance, they start happening again.

[47303.471209] radeon 0000:01:00.0: GPU fault detected: 147 0x0fe2c401
[47303.471211] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF03E4E
[47303.471212] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C4001
[47303.471214] VM fault (0x01, vmid 1) at page 267402830, read from TC (196)
[47303.487684] radeon 0000:01:00.0: GPU fault detected: 147 0x09c2c801
[47303.487689] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF03E4E
[47303.487691] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C8001
[47303.487694] VM fault (0x01, vmid 1) at page 267402830, read from TC (200)
[47303.487696] radeon 0000:01:00.0: GPU fault detected: 147 0x09c28401
[47303.487698] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FFFFFFF
[47303.487699] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C4001
[47303.487701] VM fault (0x01, vmid 1) at page 268435455, read from TC (196)
[47303.504293] radeon 0000:01:00.0: GPU fault detected: 147 0x09c24801
[47303.504298] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF03E4E
[47303.504300] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048001
[47303.504302] VM fault (0x01, vmid 1) at page 267402830, read from TC (72)

Does the environment variable R600_DEBUG=nodma avoid this problem?

(In reply to Michel Dänzer from comment #3)
> Does the environment variable R600_DEBUG=nodma avoid this problem?

No, R600_DEBUG=nodma does not help.
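The fault lines quoted above carry the same information twice: once as the raw VM_CONTEXT1_PROTECTION_FAULT_STATUS value and once as the decoded "VM fault (...)" line. A minimal sketch of that decoding follows. The bit positions are an assumption inferred from the log lines in this report (they reproduce the decoded protections, vmid and client id of every pair checked here); the read/write bit in particular cannot be confirmed from these logs, since every fault shown is a read. The names `VMFault` and `decode_fault_status` are made up for illustration.

```python
from typing import NamedTuple


class VMFault(NamedTuple):
    protections: int  # bits 0-7, printed as 0x01 / 0x0c in the logs
    client_id: int    # bits 12-19, the number in parentheses after the block name
    is_write: bool    # bit 24 (assumption: all faults in this report are reads)
    vmid: int         # bits 25-28


def decode_fault_status(status: int) -> VMFault:
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into its fields.

    The field layout is inferred from the dmesg excerpts in this report,
    not taken from hardware documentation.
    """
    return VMFault(
        protections=status & 0xFF,
        client_id=(status >> 12) & 0xFF,
        is_write=bool((status >> 24) & 0x1),
        vmid=(status >> 25) & 0xF,
    )


if __name__ == "__main__":
    # Status values copied from the logs in this report.
    for status in (0x020C4001, 0x12088001, 0x0405200C):
        print(f"0x{status:08X} -> {decode_fault_status(status)}")
```

Decoding the raw value this way is mainly useful when only the register dump is available, or to double-check that a filtered log still pairs each status with the right "VM fault" line.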
(In reply to Daniel Scharrer from comment #0)
> Created attachment 110808 [details]
> dmesg output with the GPU fault errors filtered out
>
> Running Serious Sam 3 or The Talos Principle spams dmesg with thousands of
> these errors:
>
> [ 6001.212237] radeon 0000:01:00.0: GPU fault detected: 147 0x02528801
> [ 6001.212243] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x0FF02192
> [ 6001.212246] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
> 0x12088001
> [ 6001.212249] VM fault (0x01, vmid 9) at page 267395474, read from TC (136)
>
> There are also a few "Packet0 not allowed" errors (followed by a hex dump):
>
> [15446.473341] radeon 0000:01:00.0: Packet0 not allowed!
>
> So far it's only these errors in dmesg - I haven't observed any actual
> rendering issues, crashes, GPU lockups because of this.
>
> I have only attached a filtered kernel log with the GPU fault errors removed
> - the full log is available at http://constexpr.org/tmp/serious-dmesg.log
> (140 MiB).
>
> Both of these games use the Serious Engine 3.5 (Serious Sam 3) or 4 (The
> Talos Principle). This is also reproducible with The Talos Principle Public
> Test which as of now is still available as a free download on Steam.
>
> Kernel: 3.18.0-gentoo
> GPU: Radeon HD 7950
> Driver: radeonsi, Mesa 10.5.0-devel (git-ff96537)
>
> This might be related to bug 84500 - however those spurious Packet0 have
> been gone for a while now with updated Mesa - now I got them again but only
> while running Serious Engine games.

I haven't had a look at the log when launching SS3, but for sure it crashes in no time. It crashes in no time once in a game. It could be related to your bug. However, I think the VM and the Packet 0 are different bugs. I'll have a look in the logs if I get something similar.
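With logs of this size (the full log mentioned in the quoted report is 140 MiB), it can help to condense the fault spam before comparing notes. The following is a rough sketch, not a tool used by anyone in this thread: it counts "VM fault" lines per faulting page and drops the fault-related lines, similar in spirit to the filtered attachment. The function names, the marker list and the regular expression are assumptions modelled on the log lines quoted in this report.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Pattern modelled on lines such as:
#   VM fault (0x01, vmid 9) at page 267395474, read from TC (136)
FAULT_RE = re.compile(
    r"VM fault \(0x[0-9a-fA-F]+, vmid (?P<vmid>\d+)\) at page (?P<page>\d+), "
    r"(?:read|write) (?:from|to) (?P<block>\S+) \((?P<client>\d+)\)"
)

# Substrings that identify the three kinds of fault-related lines seen here.
FAULT_MARKERS = (
    "GPU fault detected",
    "VM_CONTEXT1_PROTECTION_FAULT_",
    "VM fault (",
)


def count_faults_by_page(lines: Iterable[str]) -> Counter:
    """Count decoded VM fault lines, keyed by the faulting page number."""
    counts: Counter = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[int(match["page"])] += 1
    return counts


def drop_fault_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that are not part of the GPU fault spam."""
    for line in lines:
        if not any(marker in line for marker in FAULT_MARKERS):
            yield line


# A few lines copied from this report, used as sample input.
SAMPLE = [
    "[47303.471214] VM fault (0x01, vmid 1) at page 267402830, read from TC (196)",
    "[47303.487694] VM fault (0x01, vmid 1) at page 267402830, read from TC (200)",
    "[47303.487701] VM fault (0x01, vmid 1) at page 268435455, read from TC (196)",
    "[15446.473341] radeon 0000:01:00.0: Packet0 not allowed!",
]

if __name__ == "__main__":
    for page, hits in count_faults_by_page(SAMPLE).most_common():
        print(f"page {page} (0x{page:08X}): {hits} fault(s)")
    print(list(drop_fault_lines(SAMPLE)))
```

Both functions accept any iterable of lines, so an open file object can be passed directly and a large log does not have to be read into memory at once.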
(In reply to Christoph Haag from comment #2)
> [47303.471211] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x0FF03E4E

The fact that bits 32-39 of the faulting addresses are FF indicates incorrect shader code generation resulting in those address bits being clobbered.

Can you generate an apitrace which reproduces the Packet0 error and/or GPUVM faults?

(In reply to Michel Dänzer from comment #3)
> Does the environment variable R600_DEBUG=nodma avoid this problem?

No change here either.

But now that you mention it: IIRC SS3 did sometimes lock up the GPU before http://cgit.freedesktop.org/mesa/mesa/commit/?id=ae4536b4f71cbe76230ea7edc7eb4d6041e651b4

(In reply to Michel Dänzer from comment #6)
> Can you generate an apitrace which reproduces the Packet0 error and/or GPUVM
> faults?

Here is one with VM faults from starting The Talos Principle Public Test, just up to the main menu:

http://constexpr.org/tmp/Talos_Demo.trace (149 MiB)

Sometimes there is also a Packet0 error at the end. Didn't get it while recording, got it 2/3 times while replaying.

(In reply to Daniel Scharrer from comment #7)
> Here is one with VM faults from starting The Talos Principle Public Test,
> just up to the main menu:
>
> http://constexpr.org/tmp/Talos_Demo.trace (149 MiB)

Thanks. The VM faults generated by this apitrace turned out to be a Mesa regression. I bisected it:

5e0fbe1b631d883eb0e033938a534a259c8d95fd is the first bad commit
commit 5e0fbe1b631d883eb0e033938a534a259c8d95fd
Author: Marek Olšák <marek.olsak@amd.com>
Date: Sat Oct 4 20:41:03 2014 +0200

    radeonsi: remove vs.ucps_enabled from the shader key

    Written CLIPDIST outputs are simply disabled in PA_CL_VS_OUT_CNTL.

Note that I'm only getting the VM faults with my Cape Verde card, not with my Kaveri. Seems to be SI specific.

I haven't been able to reproduce the Packet0 error with this apitrace.

Created attachment 111036 [details] [review]
possible fix for VM faults

Does this patch fix the VM faults?
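The remark earlier in this thread about bits 32-39 of the faulting addresses being FF can be checked by hand. The register value is a page number (the kernel prints it as "at page N"), so the byte address is the page number shifted left by the page size. The sketch below assumes 4 KiB GPU pages, which is consistent with that remark but is not stated anywhere in this report; the helper names are made up for illustration.

```python
GPU_PAGE_SHIFT = 12  # assumption: 4 KiB GPU pages


def fault_byte_address(page: int) -> int:
    """Convert a page number from VM_CONTEXT1_PROTECTION_FAULT_ADDR to a byte address."""
    return page << GPU_PAGE_SHIFT


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


if __name__ == "__main__":
    # Page numbers copied from the fault logs in this report.
    for page in (0x0FF03E4E, 0x0FF02192, 0x0FF0081A, 0x06F02080):
        addr = fault_byte_address(page)
        print(
            f"page 0x{page:08X} -> address 0x{addr:010X}, "
            f"bits 32-39 = 0x{bits(addr, 32, 39):02X}"
        )
```

Under this assumption, the top byte of every 0x0FFxxxxx page number ends up in bits 32-39 of the byte address, which is where the FF pattern comes from.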
(In reply to Marek Olšák from comment #10)
> Created attachment 111036 [details] [review] [review]
> possible fix for VM faults
>
> Does this patch fix the VM faults?

I don't think it does for me. I still get

radeon 0000:01:00.0: GPU fault detected: 147 0x0342c401
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF0081A
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C4001
VM fault (0x01, vmid 1) at page 267388954, read from TC (196)
radeon 0000:01:00.0: GPU fault detected: 147 0x03424401
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF0081A
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02044001
VM fault (0x01, vmid 1) at page 267388954, read from TC (68)
radeon 0000:01:00.0: GPU fault detected: 147 0x03428401
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF0081A
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02084001
VM fault (0x01, vmid 1) at page 267388954, read from TC (132)
radeon 0000:01:00.0: GPU fault detected: 147 0x03420401
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF0081A
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02004001
VM fault (0x01, vmid 1) at page 267388954, read from TC (4)
radeon 0000:01:00.0: GPU fault detected: 147 0x03428401
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF0081A
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02084001
VM fault (0x01, vmid 1) at page 267388954, read from TC (132)

Sorry, it has been a couple of weeks, but I confirm I have the same problem with my R9 270X. I'm using latest mesa, drm, ddx from git repositories on a 3.19-rc1 kernel. I'll be attaching my log. I'll also test the patch.

Created attachment 111430 [details]
VM faults and Packet0 error when quitting the current game
on R9 270X with today's latest mesa, drm, ddx from git repositories and kernel 3.19-rc1
(In reply to Alexandre Demers from comment #13)
> Created attachment 111430 [details]
> VM faults and Packet0 error when quitting the current game
>
> on R9 270X with today's latest mesa, drm, ddx from git repositories and
> kernel 3.19-rc1

Patch tested and I get the same error as before, as Christoph.

same issue, dota2 and with no Packet0 message.

kernel 3.18
llvm git
mesa git
r9-270x/radeonsi

it happens fairly frequently (or every time i pick 'morphling') :(
i haven't had this problem with 6570 (r600g)...

...
Jan 05 06:25:08 -- kernel: switching to power state:
Jan 05 06:25:08 -- kernel: ui class: performance
Jan 05 06:25:08 -- kernel: internal class: none
Jan 05 06:25:08 -- kernel: caps:
Jan 05 06:25:08 -- kernel: uvd vclk: 0 dclk: 0
Jan 05 06:25:08 -- kernel: power level 0 sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 1
Jan 05 06:25:08 -- kernel: power level 1 sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 1
Jan 05 06:25:08 -- kernel: power level 2 sclk: 105000 mclk: 140000 vddc: 1163 vddci: 1025 pcie gen: 1
Jan 05 06:25:08 -- kernel: power level 3 sclk: 112000 mclk: 140000 vddc: 1206 vddci: 1025 pcie gen: 1
Jan 05 06:25:08 -- kernel: status: c r
Jan 05 06:25:40 -- kernel: IPVS: Creating netns size=2056 id=21
Jan 05 06:25:40 -- kernel: IPVS: ftp: loaded support on port[0] = 21
Jan 05 06:28:59 -- kernel: radeon 0000:01:00.0: GPU fault detected: 147 0x00044401
Jan 05 06:28:59 -- kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x01000000
Jan 05 06:28:59 -- kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04044001
Jan 05 06:28:59 -- kernel: VM fault (0x01, vmid 2) at page 16777216, read from TC (68)
Jan 05 06:28:59 -- kernel: radeon 0000:01:00.0: GPU fault detected: 147 0x00044401
Jan 05 06:28:59 -- kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x01000000
Jan 05 06:28:59 -- kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04044001
Jan 05 06:28:59 -- kernel: VM fault (0x01, vmid 2) at page 16777216, read from TC (68)
Jan 05 06:29:10 -- kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10293msec
Jan 05 06:29:10 -- kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000002165c7 last fence id 0x00000000002165e9 on ring 0)
Jan 05 06:29:10 -- kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Jan 05 06:29:10 -- kernel: [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Jan 05 06:29:11 -- kernel: [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: Saved 1355 dwords of commands on ring 0.
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GPU softreset: 0x0000004D
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_STATUS = 0xF7D24028
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0xEFC00000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0xEFC00000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x40000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008006
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x80228647
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44483106
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan 05 06:29:11 -- kernel: [drm] probing gen 2 caps for device 8086:2e31 = 2212501/0
Jan 05 06:29:11 -- kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: WB enabled
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000c00 and cpu addr 0xffff8800db08fc00
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000c04 and cpu addr 0xffff8800db08fc04
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000c08 and cpu addr 0xffff8800db08fc08
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000c0c and cpu addr 0xffff8800db08fc0c
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000c10 and cpu addr 0xffff8800db08fc10
Jan 05 06:29:11 -- kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc900048b5a18
Jan 05 06:29:11 -- kernel: [drm] ring test on 0 succeeded in 3 usecs
Jan 05 06:29:11 -- kernel: [drm] ring test on 1 succeeded in 1 usecs
Jan 05 06:29:11 -- kernel: [drm] ring test on 2 succeeded in 1 usecs
Jan 05 06:29:11 -- kernel: [drm] ring test on 3 succeeded in 6 usecs
Jan 05 06:29:11 -- kernel: [drm] ring test on 4 succeeded in 6 usecs
Jan 05 06:29:11 -- kernel: [drm] ring test on 5 succeeded in 2 usecs
Jan 05 06:29:11 -- kernel: [drm] UVD initialized successfully.
Jan 05 06:29:11 -- kernel: switching from power state:
Jan 05 06:29:11 -- kernel: ui class: none
Jan 05 06:29:11 -- kernel: internal class: boot
Jan 05 06:29:11 -- kernel: caps:
Jan 05 06:29:11 -- kernel: uvd vclk: 0 dclk: 0
Jan 05 06:29:11 -- kernel: power level 0 sclk: 15000 mclk: 15000 vddc: 950 vddci: 950 pcie gen: 1
Jan 05 06:29:11 -- kernel: status: c b
...

Created attachment 111751 [details]
sorry i should've attached it
This seems to be fixed with current Mesa git. Can you confirm?

(In reply to Marek Olšák from comment #17)
> This seems to be fixed with current Mesa git. Can you confirm?

I played for a while and the problems were gone for me. The performance is still very bad and sometimes there is some graphics/texture corruption flickering. But this specific issue here seems to be fixed.

(In reply to Marek Olšák from comment #17)
> This seems to be fixed with current Mesa git. Can you confirm?

Indeed, no GPU faults and no Packet 0 observed. SS3 doesn't crash the whole desktop anymore.

Do you have any idea what was pushed that may have fixed the bug we were seeing?

I can confirm that the GPU fault errors are gone, but still get Packet0 errors (both in game and in the apitrace from Comment 7).

Also, there were still GPU fault errors in The Talos Principle and demo (but not the apitrace) until I also updated LLVM.

(In reply to Daniel Scharrer from comment #20)
> I can confirm that the GPU fault errors are gone, but still get Packet0
> errors (both in game and in the apitrace from Comment 7).
>
> Also, there were still GPU fault errors in The Talos Principle and demo (but
> not the apitrace) until I also updated LLVM.

Good point about LLVM, because I'm also using yesterday's svn LLVM code.

(In reply to Alexandre Demers from comment #19)
> (In reply to Marek Olšák from comment #17)
> > This seems to be fixed with current Mesa git. Can you confirm?
>
> Indeed, no GPU faults and no Packet 0 observed. SS3 doesn't crash the
> whole desktop anymore.
>
> Do you have any idea what was pushed that may have fixed the bug we were
> seeing?

Sorry, I have absolutely no idea. It could have been something in LLVM or perhaps something here:
http://cgit.freedesktop.org/mesa/mesa/log/?id=d8185aa9a8e3588fe014faef8afaeae56d45e90b

Thanks for the feedback. I'm closing the bug.
(Some of) the GPU faults are back with Mesa git-8a71fd8 and LLVM r229671:

[11047.892869] radeon 0000:01:00.0: GPU fault detected: 147 0x04088801
[11047.892875] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF00820
[11047.892878] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08088001
[11047.892881] VM fault (0x01, vmid 4) at page 267388960, read from TC (136)
[...]

There are also plenty of Packet0 errors still/again.

This happens after 4db985a5fa9ea985616a726b1770727309502d81 which reverts 0e9cdedd2e3943bdb7f3543a3508b883b167e427 "radeon/llvm: enable unsafe math for graphics shaders" as mentioned in bug 89069 comment 21.

Unlike before this bug was closed, now there are only GPU faults after actually loading a level, which is not covered in the above trace. Here is the new, longer trace from the other bug report - maybe it will also allow others to better reproduce the Packet0 errors:

http://constexpr.org/tmp/TalosDemo-radeonsi.2.trace.xz (83 MiB)

I also see a lot of these messages in The Talos Principle on a R9 270X here:

...
[416091.177464] radeon 0000:01:00.0: GPU fault detected: 147 0x000a0401
[416091.177467] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02080
[416091.177468] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004001
[416091.177469] VM fault (0x01, vmid 5) at page 116400256, read from TC (4)
[416091.195605] radeon 0000:01:00.0: GPU fault detected: 147 0x000a0401
[416091.195608] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02080
[416091.195610] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004001
[416091.195611] VM fault (0x01, vmid 5) at page 116400256, read from TC (4)
[416091.213688] radeon 0000:01:00.0: GPU fault detected: 147 0x000a4801
[416091.213692] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02080
[416091.213693] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048001
[416091.213694] VM fault (0x01, vmid 5) at page 116400256, read from TC (72)
[416091.231852] radeon 0000:01:00.0: GPU fault detected: 147 0x002a4801
[416091.231855] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02081
[416091.231857] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048001
[416091.231858] VM fault (0x01, vmid 5) at page 116400257, read from TC (72)
[416091.250052] radeon 0000:01:00.0: GPU fault detected: 147 0x000a8801
[416091.250056] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02080
[416091.250057] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088001
[416091.250058] VM fault (0x01, vmid 5) at page 116400256, read from TC (136)
[416091.268150] radeon 0000:01:00.0: GPU fault detected: 147 0x000a8801
[416091.268153] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02080
[416091.268154] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088001
[416091.268156] VM fault (0x01, vmid 5) at page 116400256, read from TC (136)
[416091.286178] radeon 0000:01:00.0: GPU fault detected: 147 0x002a4801
[416091.286181] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02081
[416091.286182] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048001
[416091.286183] VM fault (0x01, vmid 5) at page 116400257, read from TC (72)
[416091.304253] radeon 0000:01:00.0: GPU fault detected: 147 0x000a4801
[416091.304256] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x06F02080
[416091.304257] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048001
[416091.304259] VM fault (0x01, vmid 5) at page 116400256, read from TC (72)
...

It looks like the game is stuttering when new textures are loaded or something like that. For example I go to a new area and when I walk straight, everything is smooth. When I start looking around, I get stuttering. This happens only once. After the initial stuttering, the game runs at normal speed again.

I also see some graphics corruptions like in https://bugs.freedesktop.org/show_bug.cgi?id=88978 which I can also see in dota itself.

I'm running mesa 5750595ca97b2f8f18d22af35b431a6c66dd899a and llvm r231783.

lspci says:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X]

Created attachment 114793 [details]
dmesg with blizzard's heroes of the storm beta in wine
When playing Heroes of the Storm (I think you need a beta key to play) with wine (wine-staging with csmt, I admit), I get a lot of GPU problems with radeonsi.
I've seen "radeon 0000:01:00.0: Packet0 not allowed!" and a lot of GPU faults, so perhaps it is related.
It's kinda unplayable with radeonsi because it often hangs and it takes several seconds for it to recover.
recent llvm 3.7 svn, recent mesa git, linux 3.19-ck
Can you run the game with R600_DEBUG=ps,vs,gs and post the output?

Created attachment 114806 [details]
Output with R600_DEBUG=ps,vs,gs
Uhm, good luck with that 7 megabyte file. Not sure what's the binary garbage at the beginning.
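For anyone else asked to collect this kind of output, here is a small sketch of one way to run a program with R600_DEBUG set and capture everything it prints to a file. It is only an illustration: the helper name `run_with_r600_debug` and the command in the usage comment are hypothetical, and only the flags named in this report (ps, vs, gs, nodma) are used. The log is written as raw bytes, so any non-text output ends up in the file unchanged.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def run_with_r600_debug(
    cmd: Sequence[str],
    flags: Sequence[str],
    log_path: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> int:
    """Run cmd with R600_DEBUG set to the given flags and log its output.

    stdout and stderr are both redirected into log_path (binary mode).
    Returns the exit code of the command.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["R600_DEBUG"] = ",".join(flags)
    with open(log_path, "wb") as log:
        result = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Hypothetical usage; "./game" stands in for the real game binary:
#   run_with_r600_debug(["./game"], ["ps", "vs", "gs"], "r600_debug.log")
```

Copying the current environment and overriding only R600_DEBUG keeps the variable scoped to the one process being debugged, so the rest of the session is not affected.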
Created attachment 114807 [details]
dmesg to "Output with R600_DEBUG=ps,vs,gs"
With Mesa git-3bdbc1e, LLVM r236436 and Linux 4.0.1-gentoo my previous Talos traces don't produce any GPU VM faults anymore. However, the game still does. Here is a new trace:

http://constexpr.org/tmp/Talos-radeonsi.3.trace.xz (147 MiB)

This trace still produces VM faults even when re-enabling unsafe-fp-math optimizations (see bug 89069). There is also some junk being rendered at the end of the trace.

I no longer get any GPU faults or Packet0 errors with current LLVM and mesa (2b83133, even with unsafe math disabled again).

(In reply to Marek Olšák from comment #17)
> This seems to be fixed with current Mesa git. Can you confirm?

It looks like it is not fixed in mesa git as I can reproduce it with the apitrace in comment #c29 with Cape Verde, radeon driver, mesa 18+, kernel 4.15.0-15-generic, LLVM 7.0.0, xorg 1.20.99.1, xf86-video-ati 18.0.1. (same result with kernel 4.4, mesa 12.0.6, llvm 3)

radeon 0000:08:00.0: GPU fault detected: 146 0x0d64520c
radeon 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001196EB
radeon 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0405200C
VM fault (0x0c, vmid 2) at page 1152747, read from CB_CMASK (82)

I always get the errors above and sometimes I get the gpu lockup and also sometimes the "Packet0 not allowed!" error. The possible fix in #c10 does not help.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1213. |