Bug 95659 - HD 7450, R9-270, GPU locked when testing glmark2
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: 11.2
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2016-05-24 12:03 UTC by Jack
Modified: 2019-09-25 17:54 UTC

Description Jack 2016-05-24 12:03:53 UTC
[  427.367567] radeon 0000:01:00.0: ring 0 stalled for more than 10070msec
[  427.374151] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000ea28 last fence id 0x000000000000ea2d on ring 0)
[  427.500439] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[  427.500464] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  427.500471] radeon 0000:01:00.0:   GRBM_STATUS               = 0xB2732828
[  427.500476] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x1C000005
[  427.500481] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  427.500486] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
[  427.500491] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  427.500495] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  427.500500] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[  427.500505] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048004
[  427.500509] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[  427.500513] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  427.513320] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[  427.513378] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  427.514540] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[  427.514544] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  427.514549] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  427.514554] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  427.514558] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  427.514563] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  427.514567] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  427.514571] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  427.514576] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  427.514580] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  427.514607] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  427.543125] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  427.612472] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
[  427.612666] radeon 0000:01:00.0: WB enabled
[  427.612673] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffffffc8dc366c00
[  427.612678] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffffffc8dc366c0c
[  427.662677] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffff8018332118
[  427.679867] [drm] ring test on 0 succeeded in 1 usecs
[  427.679884] [drm] ring test on 3 succeeded in 7 usecs
[  427.856598] [drm] ring test on 5 succeeded in 2 usecs
[  427.856613] [drm] UVD initialized successfully.
[  427.977870] [drm] ib test on ring 0 succeeded in 0 usecs
[  427.977919] [drm] ib test on ring 3 succeeded in 0 usecs
[  428.128994] [drm] ib test on ring 5 succeeded
[  792.996810] radeon 0000:01:00.0: ring 0 stalled for more than 10140msec
[  793.003403] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000184b0 last fence id 0x00000000000184b5 on ring 0)
[  793.140947] radeon 0000:01:00.0: Saved 151 dwords of commands on ring 0.
[  793.140974] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  793.140981] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF7730828
[  793.140986] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xFC000001
[  793.140992] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  793.140996] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  793.141001] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  793.141006] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  793.141010] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[  793.141015] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048004
[  793.141019] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[  793.141024] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  793.154522] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[  793.154580] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  793.155741] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[  793.155746] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  793.155751] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  793.155755] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  793.155759] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  793.155764] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  793.155768] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  793.155772] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  793.155777] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  793.155781] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  793.155808] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  793.184357] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  793.253690] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
[  793.253884] radeon 0000:01:00.0: WB enabled
[  793.253891] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffffffc8dc366c00
[  793.253896] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffffffc8dc366c0c
[  793.303888] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffff8018332118
[  793.321047] [drm] ring test on 0 succeeded in 2 usecs
[  793.321064] [drm] ring test on 3 succeeded in 7 usecs
[  793.497767] [drm] ring test on 5 succeeded in 2 usecs
[  793.497781] [drm] UVD initialized successfully.
[  793.619052] [drm] ib test on ring 0 succeeded in 0 usecs
[  793.619102] [drm] ib test on ring 3 succeeded in 0 usecs
[  793.770188] [drm] ib test on ring 5 succeeded



When running the following glmark2 cases, the GPU locks up. First it stops responding for a few seconds and the screen goes black, then it returns to normal.
loop:vertex-step=5, fragment-steps=5, fragment-loop=false;
loop:vertex-step=5, fragment-steps=5, fragment-uniform=false;
loop:vertex-step=5, fragment-steps=5, fragment-uniform=true;

I've already tried several radeon boot flags, such as DPM on and off and radeon_lockup_timeout = 20000, without success. Upgrading the kernel to 4.5.3 and 4.6.0 did not fix it either.

The problem does not occur when I add a delay (such as 1 ms using msleep) in the draw function of the problem scene.

I think this is a synchronization problem between the CPU and the GPU when using fences. Does anyone have any ideas?


lspci: 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM]

libdrm 2.4.67
xserver-xorg-video-ati 7.6.0
xorg-server 1.18.2
mesa 11.2.0
kernel 4.4
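
For anyone comparing lockup logs like the one above, here is a minimal Python sketch that pulls the "GPU lockup" lines out of a dmesg excerpt and reports how far apart the two fence ids are. It assumes that "current fence id" is the last fence the GPU signaled and "last fence id" is the last one the driver emitted, so the difference is the number of fences still pending when the ring stalled. The function name find_lockups and the dictionary keys are made up for illustration; they are not part of any radeon or Mesa tool.

```python
import re

# Matches the radeon "GPU lockup" line in the format shown in the log above.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+radeon\s+(?P<dev>\S+):\s+GPU lockup\s+"
    r"\(current fence id (?P<cur>0x[0-9a-fA-F]+)\s+"
    r"last fence id (?P<last>0x[0-9a-fA-F]+)\s+on ring (?P<ring>\d+)\)"
)


def find_lockups(log_text):
    """Return one dict per 'GPU lockup' line found in log_text."""
    events = []
    for m in LOCKUP_RE.finditer(log_text):
        cur = int(m.group("cur"), 16)
        last = int(m.group("last"), 16)
        events.append({
            "time": float(m.group("time")),   # kernel timestamp in seconds
            "ring": int(m.group("ring")),
            "current_fence": cur,
            "last_fence": last,
            # Assumption: fences emitted but not yet signaled at lockup time.
            "pending": last - cur,
        })
    return events


# The two lockup lines copied from the description of this bug.
SAMPLE = """\
[  427.374151] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000ea28 last fence id 0x000000000000ea2d on ring 0)
[  793.003403] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000184b0 last fence id 0x00000000000184b5 on ring 0)
"""

for ev in find_lockups(SAMPLE):
    print("ring", ev["ring"], "pending fences:", ev["pending"])
```

Under that assumption, both lockups in this report happen on ring 0 with the same small number of fences outstanding, which may be a useful detail when comparing against other reports.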
Comment 1 Michel Dänzer 2016-05-25 01:22:49 UTC
Did this also happen with older versions of Mesa and/or the kernel?
Comment 2 Tiger 2016-05-25 02:35:54 UTC
(In reply to Michel Dänzer from comment #1)
> Did this also happen with older versions of Mesa and/or the kernel?

I've run the test with Mesa 10.3.0 (xorg-server-1.16, libdrm-2.56) and kernel-3.14, and the GPU lockup does not occur.
I've also tried Mesa 10.3.0 with kernel-4.4, and the GPU lockup still does not occur. So it looks like the problem is related to Mesa.
Comment 3 Michel Dänzer 2016-05-25 03:05:34 UTC
Can you bisect Mesa?
Comment 4 Tiger 2016-05-25 06:18:30 UTC
(In reply to Michel Dänzer from comment #3)
> Can you bisect Mesa?

I may give it a try, but not in the next few days, as it will take a lot of time.
Could anything else be causing this problem? Thank you very much for your reply. Could you give me any other advice about it?
Comment 5 Tiger 2016-05-27 01:48:08 UTC
So after two days of testing, I found this bug is still present in mesa-11.1.2.

The GPU lockup did not occur when I tested several times with 11.0.2 on my R9 270/HD 5450/HD 7450, but the E6760 did lock up.

11.1.2 or 11.2.0 + kernel-4.4 = lockup (HD7450,5450,r9-270)
11.0.2 + kernel-4.4 = No problems! 

So it looks like some patch between commit 51e0b06d9916e126060c0d218de1aaa4e5a4ce26 (11.0.2) and commit 7bcd827806b0816d61122ba3d37dd40178d96d98 (11.1.2) introduced this bug, which is triggered by 'glmark2 -b grid'.

I want to use 'format-patch' to pick out the patches from 11.0.2 to 11.1.2. Can anyone provide a classification of the patches, or point me to some critical patches to look at? I don't have an in-depth understanding of Mesa.

Furthermore, I'll try the latest Mesa a few times and report back whether or not I get a GPU lockup.
Comment 6 Michel Dänzer 2016-05-27 06:36:51 UTC
git bisect makes this a lot easier. If you don't know how to use it yet, search for "git bisect howto".
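
To illustrate why bisecting is much cheaper than reviewing every patch, here is a minimal Python sketch of the binary search idea behind it. It is only a model: it assumes a linear history with one known-good end and one known-bad end, and that every commit after the first bad one is also bad. The real git bisect works on the actual commit graph. The function first_bad, the 1000-commit history and the culprit number are all made up for illustration.

```python
def first_bad(commits, is_bad):
    """Binary-search a list of commits ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_builds_tested).
    """
    lo, hi = 0, len(commits) - 1   # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1                # one Mesa build plus one glmark2 run
        if is_bad(commits[mid]):
            hi = mid               # the bug was introduced at or before mid
        else:
            lo = mid               # the bug was introduced after mid
    return commits[hi], tested


# Hypothetical history: 1000 commits, commit number 637 introduces the lockup.
history = list(range(1000))
culprit, builds = first_bad(history, lambda c: c >= 637)
print("first bad commit:", culprit, "builds tested:", builds)
```

The number of builds grows with the logarithm of the number of commits, so even a range of a thousand commits needs only about ten build-and-test rounds. With git itself, the session is started with "git bisect start <bad> <good>", and each tested build is then marked with "git bisect good" or "git bisect bad".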
Comment 7 Tiger 2016-08-04 03:38:18 UTC
If I always use VRAM for PIPE_USAGE_STREAM buffers, the GPU lockup does not occur.
Comment 8 Michel Dänzer 2016-08-04 03:45:59 UTC
Does setting the environment variable R600_DEBUG=nowc for the glmark2 process avoid the problem?
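
One way to set the variable for the glmark2 process only, sketched in Python. The helper names with_r600_debug and run_glmark2 are made up for illustration, and the sketch assumes that R600_DEBUG takes a comma-separated list of flags, so that an existing value is extended rather than overwritten.

```python
import os
import subprocess


def with_r600_debug(flag, base_env=None):
    """Return a copy of the environment with `flag` added to R600_DEBUG.

    Assumption: R600_DEBUG holds a comma-separated list of flags.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("R600_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    env["R600_DEBUG"] = ",".join(flags)
    return env


def run_glmark2(args, env):
    """Run glmark2 with the given arguments and environment; return its exit code."""
    return subprocess.run(["glmark2", *args], env=env).returncode


# Show the value the child process would see, starting from an empty environment.
print(with_r600_debug("nowc", {})["R600_DEBUG"])
```

Calling run_glmark2(["-b", "grid"], with_r600_debug("nowc")) would then start the benchmark with the variable set, without changing the environment of the calling shell.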
Comment 9 Tiger 2016-08-04 06:14:21 UTC
I'll try.
Comment 10 Timothy Arceri 2018-04-12 06:41:15 UTC
Is this still an issue for you on a more recent software stack?
Comment 11 GitLab Migration User 2019-09-25 17:54:31 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1232.

