Bug 101575

Summary: Lockup for executing trivial-tess-gs_no-gs-inputs.shader_test
Product: Mesa
Reporter: Hi-Angel <Hi-Angel>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: Vertex, geometric, pixel shaders dump
R600_TRACE lockup at start
R600_TRACE lockup at end

Description Hi-Angel 2017-06-24 10:17:18 UTC
Created attachment 132214
Vertex, geometric, pixel shaders dump

This test always fails:

	$ bin/shader_runner piglit/tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test -auto -fbo
	Probe color at (0,0)
	  Expected: 0 255 0 0
	  Observed: 0 0 0 0
	Test failure on line 60
	PIGLIT: {"result": "fail" }

In addition, the screen often goes black and the kernel reports a GPU lockup:

	[ 4093.695956] radeon 0000:01:00.0: ring 3 stalled for more than 10053msec
	[ 4093.695966] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000002be last fence id 0x00000000000002bf on ring 3)
	[ 4093.696032] radeon 0000:01:00.0: failed to get a new IB (-35)
	[ 4093.696079] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
	[ 4093.703389] radeon 0000:01:00.0: Saved 1154 dwords of commands on ring 0.
	[ 4093.703406] radeon 0000:01:00.0: GPU softreset: 0x0000001D
	[ 4093.703410] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0631CA0
	[ 4093.703413] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x18000003
	[ 4093.703416] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
	[ 4093.703418] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
	[ 4093.703421] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
	[ 4093.703424] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
	[ 4093.703427] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00011000
	[ 4093.703430] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00068402
	[ 4093.703433] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80870243
	[ 4093.703436] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44483106
	[ 4093.708782] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
	[ 4093.708836] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100
	[ 4093.710004] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
	[ 4093.710006] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
	[ 4093.710008] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
	[ 4093.710010] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
	[ 4093.710012] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
	[ 4093.710014] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
	[ 4093.710015] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
	[ 4093.710017] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
	[ 4093.710019] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
	[ 4093.710021] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
	[ 4093.710039] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
	[ 4093.737953] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
	[ 4093.738076] radeon 0000:01:00.0: WB enabled
	[ 4093.738081] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801b0eb0c00
	[ 4093.738084] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8801b0eb0c0c
	[ 4093.738460] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc9000181c418
	[ 4093.755220] [drm] ring test on 0 succeeded in 1 usecs
	[ 4093.755229] [drm] ring test on 3 succeeded in 3 usecs
	[ 4093.932873] [drm] ring test on 5 succeeded in 1 usecs
	[ 4093.932878] [drm] UVD initialized successfully.
	[ 4093.951552] [drm] ib test on ring 0 succeeded in 0 usecs
	[ 4093.951589] [drm] ib test on ring 3 succeeded in 0 usecs
	[ 4095.109237] [drm] ib test on ring 5 succeeded
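
To see at a glance which status registers changed across the soft reset, the dump above can be diffed mechanically. The following is a minimal sketch, not part of any radeon tooling: the helper names `parse_dumps` and `changed` are hypothetical, and it assumes register lines keep the `NAME = 0x...` layout shown in this log. The sample is a shortened excerpt of the log above.

```python
import re

# Matches register lines such as
# "radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0631CA0".
# Lines like "GRBM_SOFT_RESET=0x..." (no spaces around "=") are skipped on purpose.
REG_LINE = re.compile(r"radeon \S+:\s+(\w+)\s+=\s+0x([0-9A-Fa-f]+)")


def parse_dumps(log):
    """Split a dmesg excerpt into consecutive register dumps (name -> value)."""
    dumps, current = [], {}
    for line in log.splitlines():
        m = REG_LINE.search(line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2), 16)
        if name in current:  # a repeated register name starts the next dump
            dumps.append(current)
            current = {}
        current[name] = value
    if current:
        dumps.append(current)
    return dumps


def changed(before, after):
    """Return {name: (old, new)} for registers present in both dumps that differ."""
    return {
        n: (before[n], after[n])
        for n in before
        if n in after and before[n] != after[n]
    }


# Shortened excerpt of the log in this report.
SAMPLE = """\
[ 4093.703410] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0631CA0
[ 4093.703416] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 4093.708782] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[ 4093.710004] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[ 4093.710008] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
"""

before, after = parse_dumps(SAMPLE)
for name, (old, new) in changed(before, after).items():
    print(f"{name}: {old:#010x} -> {new:#010x}")
```

Feeding the full `dmesg` output instead of `SAMPLE` should yield one dictionary per register dump, so the first and last entries of `parse_dumps(...)` can be compared the same way.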

Roughly half of the lockups leave the screen black and require a reboot.

Attaching the VS, GS, and PS assembly, along with R600_TRACE results from two different lockups. The two traces are identical, except that the first one locks up at the start and the other at the end.

I have tried fixing it myself, but I have no idea what to look at.
Comment 1 Hi-Angel 2017-06-24 10:18:06 UTC
Created attachment 132215
R600_TRACE lockup at start
Comment 2 Hi-Angel 2017-06-24 10:18:39 UTC
Created attachment 132216
R600_TRACE lockup at end
Comment 3 Hi-Angel 2017-07-17 13:14:01 UTC
Some more info: on a hard lockup, SysRq and the network stop working. I managed to capture a trace of a hard lockup using netconsole before the network disappeared. An interesting part may be "Wait for MC idle timedout !", which I had never seen before.

	[ 3943.482116] radeon 0000:01:00.0: ring 0 stalled for more than 10030msec
	[ 3943.482135] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000026550 last fence id 0x000000000002657b on ring 0)
	[ 3943.482214] radeon 0000:01:00.0: failed to get a new IB (-35)
	[ 3943.482269] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
	[ 3943.489601] radeon 0000:01:00.0: Saved 1474 dwords of commands on ring 0.
	[ 3943.489623] radeon 0000:01:00.0: GPU softreset: 0x0000009D
	[ 3943.489629] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0631CA0
	[ 3943.489634] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x18000003
	[ 3943.489639] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
	[ 3943.489644] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200046C0
	[ 3943.489649] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
	[ 3943.489654] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
	[ 3943.489660] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00011000
	[ 3943.489665] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00068402
	[ 3943.489670] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80870243
	[ 3943.489675] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x60C83146
	[ 3943.497932] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
	[ 3943.498011] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00108100
	[ 3943.499187] radeon 0000:01:00.0:   GRBM_STATUS               = 0xC0003828
	[ 3943.499196] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x80000007
	[ 3943.499202] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
	[ 3943.499207] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000CC0
	[ 3943.499212] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
	[ 3943.499218] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
	[ 3943.499223] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
	[ 3943.499228] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
	[ 3943.499233] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
	[ 3943.499239] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
	[ 3943.499261] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
	[ 3943.499355] radeon 0000:01:00.0: GPU softreset: 0x00000001
	[ 3943.499361] radeon 0000:01:00.0:   GRBM_STATUS               = 0xC0003828
	[ 3943.499366] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x80000007
	[ 3943.499371] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
	[ 3943.499376] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000CC0
	[ 3943.499381] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
	[ 3943.499386] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
	[ 3943.499392] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
	[ 3943.499397] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
	[ 3943.499402] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
	[ 3943.499407] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
	[ 3943.880423] radeon 0000:01:00.0: Wait for MC idle timedout !
	[ 3943.880432] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6A
	[ 3943.881600] radeon 0000:01:00.0:   GRBM_STATUS               = 0xC0003828
	[ 3943.881603] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x80000007
	[ 3943.881606] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
	[ 3943.881609] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000CC0
	[ 3943.881612] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
	[ 3943.881615] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
	[ 3943.881618] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
	[ 3943.881621] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
	[ 3943.881625] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
	[ 3943.881628] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
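
The two "GPU lockup" lines in this report differ both in the ring and in how far the fence counter lagged behind. The sketch below extracts that gap. It assumes that "current fence id" is the last fence the GPU signalled and "last fence id" is the most recently emitted one; the helper name `pending_fences` is hypothetical.

```python
import re

# Matches the fence part of a radeon "GPU lockup" message.
FENCE_LINE = re.compile(
    r"current fence id 0x([0-9A-Fa-f]+) last fence id 0x([0-9A-Fa-f]+) on ring (\d+)"
)


def pending_fences(line):
    """Return (ring, number of emitted-but-unsignalled fences), or None if no match."""
    m = FENCE_LINE.search(line)
    if m is None:
        return None
    current, last = int(m.group(1), 16), int(m.group(2), 16)
    return int(m.group(3)), last - current


# The two lockup lines quoted in this report.
FIRST = ("radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000002be "
         "last fence id 0x00000000000002bf on ring 3)")
SECOND = ("radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000026550 "
          "last fence id 0x000000000002657b on ring 0)")

for line in (FIRST, SECOND):
    ring, pending = pending_fences(line)
    print(f"ring {ring}: {pending} fence(s) outstanding")
```

Under that assumption, the first lockup had a single fence outstanding on ring 3, while the hard lockup had 43 outstanding on ring 0.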
Comment 4 GitLab Migration User 2019-09-18 19:23:20 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/605.
