Playing Team Fortress 2 under Arch Linux with mainline git mesa + xorg 1.14, I get random GPU lockups and stalls. The driver does recover, but recovery takes around 10 seconds.

dmesg shows the following stall warning on every stall:

[ 2377.378560] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 2377.378568] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000051691 last fence id 0x000000000005168a)
[ 2377.379628] radeon 0000:01:00.0: Saved 279 dwords of commands on ring 0.
[ 2377.379635] radeon 0000:01:00.0: GPU softreset: 0x00000003
[ 2377.389578] radeon 0000:01:00.0: GRBM_STATUS = 0xF5700828
[ 2377.389581] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x88000003
[ 2377.389583] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0xFC000001
[ 2377.389585] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[ 2377.389586] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389588] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 2377.389590] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00048006
[ 2377.389592] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80268647
[ 2377.389593] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[ 2377.389645] radeon 0000:01:00.0: GRBM_STATUS = 0x00003828
[ 2377.389647] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000007
[ 2377.389648] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
[ 2377.389650] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[ 2377.389652] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389653] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[ 2377.389655] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[ 2377.389656] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
[ 2377.407070] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 2377.474159] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[ 2377.474161] [drm] PCIE gen 2 link speeds already enabled
[ 2377.477011] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 2377.477093] radeon 0000:01:00.0: WB enabled
[ 2377.477095] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[ 2377.477097] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[ 2377.493165] [drm] ring test on 0 succeeded in 1 usecs
[ 2377.493240] [drm] ring test on 3 succeeded in 1 usecs
[ 2377.506450] [drm] ib test on ring 0 succeeded in 0 usecs
[ 2377.506483] [drm] ib test on ring 3 succeeded in 1 usecs
[ 3733.413723] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 3733.413731] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000d3f80 last fence id 0x00000000000d3f7d)
[ 3733.414789] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[ 3733.414796] radeon 0000:01:00.0: GPU softreset: 0x00000003
[ 3733.419144] radeon 0000:01:00.0: GRBM_STATUS = 0xF5500828
[ 3733.419148] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x88000003
[ 3733.419152] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0xEC000001
[ 3733.419155] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[ 3733.419159] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3733.419162] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 3733.419166] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00048004
[ 3733.419169] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80268647
[ 3733.419172] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[ 3733.419227] radeon 0000:01:00.0: GRBM_STATUS = 0x00003828
[ 3733.419230] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000007
[ 3733.419234] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
[ 3733.419237] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[ 3733.419241] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3733.419244] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3733.419247] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[ 3733.419251] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
[ 3733.436635] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 3733.503442] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[ 3733.503444] [drm] PCIE gen 2 link speeds already enabled
[ 3733.505831] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 3733.505909] radeon 0000:01:00.0: WB enabled
[ 3733.505911] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[ 3733.505913] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[ 3733.521972] [drm] ring test on 0 succeeded in 1 usecs
[ 3733.522028] [drm] ring test on 3 succeeded in 1 usecs
[ 3733.534027] [drm] ib test on ring 0 succeeded in 0 usecs
[ 3733.534049] [drm] ib test on ring 3 succeeded in 1 usecs

kallisti5@eris ~ :) $ sudo lspci -vv -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cypress LE [Radeon HD 5800 Series] (prog-if 00 [VGA controller])
    Subsystem: XFX Pine Group Inc. Device 3070
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 90
    Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at fea20000 (64-bit, non-prefetchable) [size=128K]
    Region 4: I/O ports at e000 [size=256]
    Expansion ROM at fea00000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000feeff00c Data: 41e3
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Kernel driver in use: radeon
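For anyone comparing several of these stalls, a small script can pull the timestamp and the fence gap out of each "GPU lockup (waiting for ... last fence id ...)" line. This is only a sketch: the regular expression is written against the message format as it appears in this report (other kernel versions may word it differently), and `summarize_lockups` is a made-up helper name, not part of any radeon tooling.

```python
import re

# Matches the radeon "waiting for ... last fence id ..." line as it appears
# in the dmesg output of this report (an assumption about the format).
FENCE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+radeon\s+\S+\s+GPU lockup \(waiting for "
    r"0x(?P<want>[0-9a-fA-F]+) last fence id 0x(?P<last>[0-9a-fA-F]+)\)"
)

def summarize_lockups(dmesg_text):
    """Return (timestamp_seconds, fence_gap) for each lockup line found."""
    results = []
    for m in FENCE_RE.finditer(dmesg_text):
        want = int(m.group("want"), 16)   # fence the driver is waiting for
        last = int(m.group("last"), 16)   # last fence that was signalled
        results.append((float(m.group("ts")), want - last))
    return results

# The two lockup lines from the log in this comment.
sample = """\
[ 2377.378568] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000051691 last fence id 0x000000000005168a)
[ 3733.413731] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000d3f80 last fence id 0x00000000000d3f7d)
"""

for ts, gap in summarize_lockups(sample):
    print(f"lockup at {ts:.3f}s, waiting {gap} fence(s) past the last signalled one")
```

On the two lockups in this comment the gaps are 7 and 3 fences. To run it on a full log, the output of `dmesg` could be passed to the function in place of `sample`.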
Created attachment 78005 [details] xorg.conf
Git rev is:

commit 02b808b08acc73b9b3d31832a7f137a9aae4bdd9
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Sun Apr 7 18:31:06 2013 +0200

    clover: Fix usage of incorrect object as destination in clEnqueueCopyBufferToImage.

    Signed-off-by: Francisco Jerez <currojerez@riseup.net>

Pretty recent.
Also, as this could be CP related:

kallisti5@eris ~ :( $ uname -a
Linux eris 3.8.7-1-ARCH #1 SMP PREEMPT Sat Apr 13 09:01:47 CEST 2013 x86_64 GNU/Linux
Does disabling hyperZ help? Set env var R600_HYPERZ=0 (mesa 9.1), or R600_DEBUG=nohyperz (git master). If so, this is probably a duplicate of bug 61747.
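Since the variable name differs between mesa 9.1 and git master, here is a minimal sketch of building the right environment for a test run. The helper name `hyperz_off_env` is made up for illustration; only the two variable settings come from the comment above.

```python
import os

def hyperz_off_env(git_master, base_env=None):
    """Return a copy of the environment with hyperZ disabled.

    git_master=True  -> R600_DEBUG=nohyperz (git master)
    git_master=False -> R600_HYPERZ=0       (mesa 9.1)
    """
    env = dict(os.environ if base_env is None else base_env)
    if git_master:
        env["R600_DEBUG"] = "nohyperz"
    else:
        env["R600_HYPERZ"] = "0"
    return env

# Show what would be set, starting from an empty environment.
print(hyperz_off_env(git_master=True, base_env={}))
print(hyperz_off_env(git_master=False, base_env={}))

# To launch a program with it (the command is only an example):
#   import subprocess
#   subprocess.run(["steam"], env=hyperz_off_env(git_master=True))
```

Passing the environment to the child process explicitly avoids depending on whether the variable was exported in the calling shell.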
I'll give it a try tonight and let you know. Thanks!
Nope, doesn't help.

export R600_DEBUG=nohyperz
R600_DEBUG=nohyperz steam
.
.
[218403.403352] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[218403.403361] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000006d0011b last fence id 0x0000000006d00117)
[218403.404418] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[218403.404425] radeon 0000:01:00.0: GPU softreset: 0x00000003
[218403.419360] radeon 0000:01:00.0: GRBM_STATUS = 0xF7730828
[218403.419367] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0xFC000001
[218403.419372] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0xFC000001
[218403.419376] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[218403.419379] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[218403.419383] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x400C0000
[218403.419387] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00048004
[218403.419391] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80268647
[218403.419394] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[218403.419449] radeon 0000:01:00.0: GRBM_STATUS = 0x00003828
[218403.419452] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000007
[218403.419456] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
[218403.419459] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[218403.419463] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[218403.419467] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[218403.419470] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[218403.419474] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
[218403.436850] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[218403.503687] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[218403.503689] [drm] PCIE gen 2 link speeds already enabled
[218403.506900] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[218403.506982] radeon 0000:01:00.0: WB enabled
[218403.506984] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[218403.506986] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[218403.523073] [drm] ring test on 0 succeeded in 1 usecs
[218403.523129] [drm] ring test on 3 succeeded in 1 usecs
[218403.530020] [drm] ib test on ring 0 succeeded in 0 usecs
[218403.530053] [drm] ib test on ring 3 succeeded in 1 usecs
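To see at a glance which status bits vary between the three lockups reported so far, the pre-reset register values can be XORed against each other. This is a purely mechanical comparison using the values copied from the dumps in this report; it makes no claim about what the individual bits mean, and `differing_bits` is just an illustrative helper.

```python
from functools import reduce

# Pre-reset register values from the three lockup dumps in this report
# (two without nohyperz, one with R600_DEBUG=nohyperz).
dumps = {
    "GRBM_STATUS":      [0xF5700828, 0xF5500828, 0xF7730828],
    "GRBM_STATUS_SE0":  [0x88000003, 0x88000003, 0xFC000001],
    "GRBM_STATUS_SE1":  [0xFC000001, 0xEC000001, 0xFC000001],
    "CP_STALLED_STAT2": [0x400C0000, 0x400C0000, 0x400C0000],
    "CP_BUSY_STAT":     [0x00048006, 0x00048004, 0x00048004],
    "CP_STAT":          [0x80268647, 0x80268647, 0x80268647],
}

def differing_bits(values):
    """Return a mask of the bits that are not identical across all values."""
    first = values[0]
    return reduce(lambda mask, v: mask | (v ^ first), values, 0)

for name, values in dumps.items():
    print(f"{name:17s} differing bits: 0x{differing_bits(values):08X}")
```

A mask of zero means the register read the same in all three dumps, which is the case here for CP_STALLED_STAT2 and CP_STAT.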
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/432.