Bug 63564 - Radeon HD 5870 CP lockup / Stall with OpenGL load
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: All Linux (All)
Importance: medium major
Assignee: Default DRI bug account
Reported: 2013-04-15 17:25 UTC by Alexander von Gluck
Modified: 2019-09-18 19:02 UTC

Attachments
xorg.conf (471 bytes, text/plain)
2013-04-15 17:26 UTC, Alexander von Gluck

Description Alexander von Gluck 2013-04-15 17:25:59 UTC
Playing Team Fortress 2 on Arch Linux with Mesa from mainline git and xorg-server 1.14.

I get random GPU lockups and stalls. The driver does recover, but recovery takes around 10 seconds. dmesg shows the following warning on every stall:

[ 2377.378560] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 2377.378568] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000051691 last fence id 0x000000000005168a)
[ 2377.379628] radeon 0000:01:00.0: Saved 279 dwords of commands on ring 0.
[ 2377.379635] radeon 0000:01:00.0: GPU softreset: 0x00000003
[ 2377.389578] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5700828
[ 2377.389581] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x88000003
[ 2377.389583] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xFC000001
[ 2377.389585] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 2377.389586] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389588] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 2377.389590] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048006
[ 2377.389592] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[ 2377.389593] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[ 2377.389645] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 2377.389647] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 2377.389648] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 2377.389650] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 2377.389652] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 2377.389653] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 2377.389655] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 2377.389656] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[ 2377.407070] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 2377.474159] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[ 2377.474161] [drm] PCIE gen 2 link speeds already enabled
[ 2377.477011] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 2377.477093] radeon 0000:01:00.0: WB enabled
[ 2377.477095] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[ 2377.477097] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[ 2377.493165] [drm] ring test on 0 succeeded in 1 usecs
[ 2377.493240] [drm] ring test on 3 succeeded in 1 usecs
[ 2377.506450] [drm] ib test on ring 0 succeeded in 0 usecs
[ 2377.506483] [drm] ib test on ring 3 succeeded in 1 usecs

[ 3733.413723] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 3733.413731] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000d3f80 last fence id 0x00000000000d3f7d)
[ 3733.414789] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[ 3733.414796] radeon 0000:01:00.0: GPU softreset: 0x00000003
[ 3733.419144] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5500828
[ 3733.419148] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x88000003
[ 3733.419152] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xEC000001
[ 3733.419155] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 3733.419159] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3733.419162] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 3733.419166] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048004
[ 3733.419169] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[ 3733.419172] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[ 3733.419227] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 3733.419230] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 3733.419234] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 3733.419237] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[ 3733.419241] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3733.419244] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3733.419247] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 3733.419251] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[ 3733.436635] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 3733.503442] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[ 3733.503444] [drm] PCIE gen 2 link speeds already enabled
[ 3733.505831] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 3733.505909] radeon 0000:01:00.0: WB enabled
[ 3733.505911] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[ 3733.505913] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[ 3733.521972] [drm] ring test on 0 succeeded in 1 usecs
[ 3733.522028] [drm] ring test on 3 succeeded in 1 usecs
[ 3733.534027] [drm] ib test on ring 0 succeeded in 0 usecs
[ 3733.534049] [drm] ib test on ring 3 succeeded in 1 usecs
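
For anyone triaging similar logs, here is a minimal Python sketch that pulls the fence numbers out of the "GPU lockup (waiting for ...)" lines and reports how far the awaited fence was ahead of the last one that signaled. The function name and the regular expression are my own, and the sample text is just the two lockup lines copied from the log above. The gap is only the arithmetic difference between the two IDs, not a diagnosis.

```python
import re

# Matches the kernel's "GPU lockup (waiting for 0x... last fence id 0x...)" line.
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)\)"
)


def fence_gaps(dmesg_text):
    """Return a (waiting, last_signaled, gap) tuple for each lockup line."""
    gaps = []
    for match in FENCE_RE.finditer(dmesg_text):
        waiting = int(match.group(1), 16)
        last = int(match.group(2), 16)
        gaps.append((waiting, last, waiting - last))
    return gaps


# The two lockup lines from the dmesg output above.
SAMPLE = """\
[ 2377.378568] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000051691 last fence id 0x000000000005168a)
[ 3733.413731] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000d3f80 last fence id 0x00000000000d3f7d)
"""

for waiting, last, gap in fence_gaps(SAMPLE):
    print(f"waiting={waiting:#x} last={last:#x} gap={gap}")
```

On a live system the same function could be fed the output of `dmesg` instead of the sample string.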


kallisti5@eris ~ :) $ sudo lspci -vv -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cypress LE [Radeon HD 5800 Series] (prog-if 00 [VGA controller])
	Subsystem: XFX Pine Group Inc. Device 3070
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 90
	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at fea20000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at e000 [size=256]
	Expansion ROM at fea00000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000feeff00c  Data: 41e3
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: radeon
Comment 1 Alexander von Gluck 2013-04-15 17:26:39 UTC
Created attachment 78005
xorg.conf
Comment 2 Alexander von Gluck 2013-04-15 17:28:45 UTC
Git rev is:

commit 02b808b08acc73b9b3d31832a7f137a9aae4bdd9
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Sun Apr 7 18:31:06 2013 +0200

    clover: Fix usage of incorrect object as destination in clEnqueueCopyBufferToImage.
    
    Signed-off-by: Francisco Jerez <currojerez@riseup.net>



Pretty recent.
Comment 3 Alexander von Gluck 2013-04-15 17:29:45 UTC
Also, since this could be CP related, here is the kernel version:

kallisti5@eris ~ :( $ uname -a
Linux eris 3.8.7-1-ARCH #1 SMP PREEMPT Sat Apr 13 09:01:47 CEST 2013 x86_64 GNU/Linux
Comment 4 Alex Deucher 2013-04-15 19:05:38 UTC
Does disabling hyperZ help?  Set env var R600_HYPERZ=0 (mesa 9.1), or R600_DEBUG=nohyperz (git master).  If so, this is probably a duplicate of bug 61747.
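
A small Python sketch of the two variants, in case it helps when scripting test runs. The variable names and values are the ones given in this comment; the helper name `hyperz_disabled_env` and its parameters are made up for illustration and are not part of Mesa.

```python
import os


def hyperz_disabled_env(mesa_is_git_master, base_env=None):
    """Return a copy of the environment with HyperZ disabled for r600g.

    R600_DEBUG=nohyperz is used for git master and R600_HYPERZ=0 for
    Mesa 9.1, as stated in the comment above.
    """
    env = dict(os.environ if base_env is None else base_env)
    if mesa_is_git_master:
        # Note: this overwrites any R600_DEBUG value that was already set.
        env["R600_DEBUG"] = "nohyperz"
    else:
        env["R600_HYPERZ"] = "0"
    return env


# Show only the variable that gets added, starting from an empty environment.
print(hyperz_disabled_env(mesa_is_git_master=True, base_env={}))
print(hyperz_disabled_env(mesa_is_git_master=False, base_env={}))
```

To launch a program with the result, call the helper without `base_env` so the current environment is inherited, then pass the returned dict as the `env` argument of `subprocess.run`.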
Comment 5 Alexander von Gluck 2013-04-15 19:59:41 UTC
I'll give it a try tonight and let you know.

Thanks!
Comment 6 Alexander von Gluck 2013-04-16 15:30:01 UTC
Nope, disabling HyperZ doesn't help. I launched Steam like this:

export R600_DEBUG=nohyperz
R600_DEBUG=nohyperz steam

.
.

[218403.403352] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[218403.403361] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000006d0011b last fence id 0x0000000006d00117)
[218403.404418] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[218403.404425] radeon 0000:01:00.0: GPU softreset: 0x00000003
[218403.419360] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF7730828
[218403.419367] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xFC000001
[218403.419372] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xFC000001
[218403.419376] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[218403.419379] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[218403.419383] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[218403.419387] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048004
[218403.419391] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[218403.419394] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[218403.419449] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[218403.419452] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[218403.419456] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[218403.419459] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[218403.419463] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[218403.419467] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[218403.419470] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[218403.419474] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[218403.436850] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[218403.503687] [drm] probing gen 2 caps for device 1002:5a16 = 2/0
[218403.503689] [drm] PCIE gen 2 link speeds already enabled
[218403.506900] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[218403.506982] radeon 0000:01:00.0: WB enabled
[218403.506984] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8804270e7c00
[218403.506986] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8804270e7c0c
[218403.523073] [drm] ring test on 0 succeeded in 1 usecs
[218403.523129] [drm] ring test on 3 succeeded in 1 usecs
[218403.530020] [drm] ib test on ring 0 succeeded in 0 usecs
[218403.530053] [drm] ib test on ring 3 succeeded in 1 usecs
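
To make the three dumps easier to compare, here is a rough Python sketch that XORs the pre-reset register values of one lockup against another and shows which bits differ. The values are copied from the logs in this report; the dictionary labels (dmesg timestamps) and the helper name are my own. I am not assuming any meaning for individual bits here. R_008678_CP_STALLED_STAT2 (0x400C0000) and R_008680_CP_STAT (0x80268647) are identical in all three dumps, so they are left out.

```python
# Pre-reset register values copied from the three dumps in this report,
# keyed by the dmesg timestamp of each lockup.
LOCKUPS = {
    "2377": {
        "GRBM_STATUS": 0xF5700828,
        "GRBM_STATUS_SE0": 0x88000003,
        "GRBM_STATUS_SE1": 0xFC000001,
        "CP_BUSY_STAT": 0x00048006,
    },
    "3733": {
        "GRBM_STATUS": 0xF5500828,
        "GRBM_STATUS_SE0": 0x88000003,
        "GRBM_STATUS_SE1": 0xEC000001,
        "CP_BUSY_STAT": 0x00048004,
    },
    "218403 (nohyperz)": {
        "GRBM_STATUS": 0xF7730828,
        "GRBM_STATUS_SE0": 0xFC000001,
        "GRBM_STATUS_SE1": 0xFC000001,
        "CP_BUSY_STAT": 0x00048004,
    },
}


def differing_bits(reference, other):
    """Map each register whose value changed to the XOR of the two values."""
    return {
        name: reference[name] ^ other[name]
        for name in reference
        if reference[name] != other[name]
    }


reference = LOCKUPS["2377"]
for label, regs in LOCKUPS.items():
    if regs is reference:
        continue
    print(f"2377 vs {label}:")
    for name, mask in differing_bits(reference, regs).items():
        print(f"  {name:<16} differs in bits {mask:#010x}")
```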
Comment 7 GitLab Migration User 2019-09-18 19:02:17 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/432.

