Bug 59649 - [r600][RV635] GPU lockup CP stall / GPU resets over and over - Kernel 3.7 to 3.12 inclusive
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
 
Reported: 2013-01-21 05:07 UTC by Shawn Starr
Modified: 2014-11-10 05:12 UTC

Attachments
Radeon crash with dynclks enabled (126.27 KB, text/plain)
2013-09-30 00:57 UTC, Shawn Starr

Description Shawn Starr 2013-01-21 05:07:01 UTC
Using Linux kernel 3.7 and up to 3.8-rc3, I am unable to have a stable session with my RV635 GPU:

Jan 19 03:45:26 segfault kernel: [15008.313696] radeon 0000:01:00.0: Saved 185 dwords of commands on ring 0.
Jan 19 03:45:26 segfault kernel: [15008.313704] radeon 0000:01:00.0: GPU softreset
Jan 19 03:45:26 segfault kernel: [15008.313711] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030
Jan 19 03:45:26 segfault kernel: [15008.313717] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000003
Jan 19 03:45:26 segfault kernel: [15008.313723] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
Jan 19 03:45:26 segfault kernel: [15008.313730] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.313736] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.313742] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000006
Jan 19 03:45:26 segfault kernel: [15008.313748] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80000645
Jan 19 03:45:26 segfault kernel: [15008.313761] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Jan 19 03:45:26 segfault kernel: [15008.328772] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jan 19 03:45:26 segfault kernel: [15008.344782] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030
Jan 19 03:45:26 segfault kernel: [15008.344785] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000003
Jan 19 03:45:26 segfault kernel: [15008.344787] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200080C0
Jan 19 03:45:26 segfault kernel: [15008.344789] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.344792] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.344794] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.344797] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
Jan 19 03:45:26 segfault kernel: [15008.345799] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan 19 03:45:26 segfault kernel: [15008.348414] [drm] probing gen 2 caps for device 8086:2a41 = 1/0
Jan 19 03:45:26 segfault kernel: [15008.350360] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jan 19 03:45:26 segfault kernel: [15008.350399] radeon 0000:01:00.0: WB enabled
Jan 19 03:45:26 segfault kernel: [15008.350403] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880229236c00
Jan 19 03:45:26 segfault kernel: [15008.381778] [drm] ring test on 0 succeeded in 1 usecs
Jan 19 03:45:26 segfault kernel: [15008.384549] [drm] ib test on ring 0 succeeded in 0 usecs
Jan 19 03:46:12 segfault kernel: [15053.625108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec

...

Jan 19 03:46:12 segfault kernel: [15053.975428] radeon 0000:01:00.0: Wait for MC idle timedout !
Jan 19 03:46:12 segfault kernel: [15054.123890] radeon 0000:01:00.0: Wait for MC idle timedout !
Jan 19 03:46:12 segfault kernel: [15054.125748] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jan 19 03:46:12 segfault kernel: [15054.125785] radeon 0000:01:00.0: WB enabled
Jan 19 03:46:12 segfault kernel: [15054.125789] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880229236c00
Jan 19 03:46:12 segfault kernel: [15054.157608] [drm] ring test on 0 succeeded in 0 usecs
Jan 19 03:46:23 segfault kernel: [15064.657103] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Jan 19 03:46:23 segfault kernel: [15064.657114] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000441b6 last fence id 0x00000000000441a8)
Jan 19 03:46:23 segfault kernel: [15064.657121] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jan 19 03:46:23 segfault kernel: [15064.657134] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jan 19 03:46:23 segfault kernel: [15064.657140] radeon 0000:01:00.0: ib ring test failed (-35).
Jan 19 03:46:23 segfault kernel: [15064.658211] radeon 0000:01:00.0: GPU softreset
Jan 19 03:46:23 segfault kernel: [15064.658218] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE57C24E0
Jan 19 03:46:23 segfault kernel: [15064.658224] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00113303
Jan 19 03:46:23 segfault kernel: [15064.658230] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200030C0
Jan 19 03:46:23 segfault kernel: [15064.658236] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
Jan 19 03:46:23 segfault kernel: [15064.658242] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00001002
Jan 19 03:46:23 segfault kernel: [15064.658248] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028482
Jan 19 03:46:23 segfault kernel: [15064.658254] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80838645
Jan 19 03:46:23 segfault kernel: [15064.829116] radeon 0000:01:00.0: Wait for MC idle timedout !
Jan 19 03:46:23 segfault kernel: [15064.829123] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Jan 19 03:46:23 segfault kernel: [15064.844133] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jan 19 03:46:23 segfault kernel: [15064.860144] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030
Jan 19 03:46:23 segfault kernel: [15064.860150] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000003
Jan 19 03:46:23 segfault kernel: [15064.860163] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x2000B0C0
Jan 19 03:46:23 segfault kernel: [15064.860169] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 19 03:46:23 segfault kernel: [15064.860175] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 19 03:46:23 segfault kernel: [15064.860181] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jan 19 03:46:23 segfault kernel: [15064.860191] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
Jan 19 03:46:23 segfault kernel: [15064.861197] radeon 0000:01:00.0: GPU reset succeeded, trying to resume

Jan 19 04:39:23 segfault kernel: [ 2791.671107] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).

Jan 19 04:39:23 segfault kernel: [ 2791.671115] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).

The console is then flooded with:

[drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
radeon 0000:01:00.0: couldn't schedule ib (over and over)
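
For anyone comparing the dumps, the register values printed before and after a soft reset can be pulled out of the log and diffed with a few lines of Python. This is only a sketch: the helper names parse_registers and changed_registers are made up for illustration, and the sample lines are copied from the first soft reset in the log above.

```python
import re

# Matches both spellings seen in the log: "R_008010_GRBM_STATUS=0xA0003030"
# and "R_008680_CP_STAT          = 0x80100000".
REG_RE = re.compile(r"(R_[0-9A-F]{6}_\w+)\s*=\s*0x([0-9A-Fa-f]+)")


def parse_registers(lines):
    """Return {register name: integer value} for every register line found."""
    regs = {}
    for line in lines:
        match = REG_RE.search(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def changed_registers(before, after):
    """Return {name: (old, new)} for registers present in both dumps that differ."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Values copied from the first soft reset in this report (hypothetical variable names).
before_reset = parse_registers([
    "radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030",
    "radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0",
    "radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000006",
    "radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80000645",
])
after_reset = parse_registers([
    "radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030",
    "radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200080C0",
    "radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000",
    "radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000",
])

for name, (old, new) in sorted(changed_registers(before_reset, after_reset).items()):
    print(f"{name}: 0x{old:08X} -> 0x{new:08X}")
```

The script only reports which values differ; it does not try to decode individual status bits, since the bit layout is not given in this report.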

mesa-dri-drivers-9.0.1-3.fc18.x86_64
libdrm-2.4.40-1.fc18.x86_64

Kernels: kernel-3.7.3-201.fc18.x86_64, kernel-devel-3.8.0-0.rc3.git1.2.fc19.x86_64

I have not tried 3.8-rc4 yet.

Laptop: Lenovo ThinkPad W500
Comment 1 Alex Deucher 2013-01-21 21:09:17 UTC
Is this still an issue with the latest bits from Dave's last pull request?
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
Comment 2 Shawn Starr 2013-01-22 17:00:33 UTC
(In reply to comment #1)
> Is this still an issue with the latest bits from Dave's last pull request?
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

To be determined. I will need to build the patch into the Fedora SRPM kernel or build a generic kernel .rpm from your tree.
Comment 3 Alex Deucher 2013-01-22 19:01:41 UTC
Are you using the same userspace components (mesa and ddx) across kernels?
Comment 4 Shawn Starr 2013-01-22 19:03:24 UTC
Yes, I am
Comment 5 Alex Deucher 2013-01-22 19:31:03 UTC
What was the last working kernel?  Any chance you could bisect?
Comment 6 letharion 2013-01-22 20:34:49 UTC
I suppose "GPU lockup CP stall / GPU resets over and over" is such a generic error that many unrelated bugs could be reported under it, but FWIW, I recently hit exactly the same error shortly (usually < 30 seconds) after starting any 3D-accelerated game.
I tried 3.5.7 and 3.6.11 with the same result, searched around for more info, ended up at this bug, tried installing 3.8.0-rc4, and voila, the problem is gone.

I guess this means a bisect is not interesting, but I'm willing to try if it helps in any way.
Comment 7 Shawn Starr 2013-09-05 13:50:49 UTC
Alex found some HW bugs that were documented internally; see the patches:

http://lists.x.org/archives/xorg-driver-ati/2013-September/025087.html
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044244.html

I'm going to try them out.
Comment 8 Shawn Starr 2013-09-11 14:25:44 UTC
This is pending close; I'm waiting until the end of the week, but so far the fixes work. The patches listed in this bug are obsolete as the work is being shifted around, but the logic seems to fix the reset issues.
Comment 9 Shawn Starr 2013-09-13 14:45:23 UTC
Closing. I have not had any more resets with the respective code changes. Many thanks to Alex for finding this issue!
Comment 10 Shawn Starr 2013-09-14 06:54:01 UTC
Reopen :/

At least now I can trigger the crash repeatedly.

1) Log into Second Life first

2) You need to patch some of the GLSL programs, as they fail to compile with the Mesa 9.2 GLSL compiler:

- #extension GL_ARB_texture_rectangle : enable
+/* #extension GL_ARB_texture_rectangle : enable */

3) Go to the Graphics options and under Shaders, enable:

- Basic Shaders
- Atmospheric Shaders
- Advanced Lighting Model
- Ambient Occlusion
- Depth of field
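
The shader edit from step 2 can be applied in bulk with a short script. This is a rough sketch, not something taken from the Second Life sources: the function names, the ".glsl" file suffix and the idea of pointing it at a shader directory are all assumptions, so adjust them to wherever the viewer keeps its shaders.

```python
import re
from pathlib import Path

# The directive that step 2 comments out.
EXT_RE = re.compile(r"#extension\s+GL_ARB_texture_rectangle\s*:\s*enable")


def comment_out_extension(source):
    """Wrap each matching #extension line in /* ... */, keeping indent and line ending."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        stripped = body.strip()
        if EXT_RE.fullmatch(stripped):
            indent = body[: len(body) - len(body.lstrip())]
            line = f"{indent}/* {stripped} */{ending}"
        out.append(line)
    return "".join(out)


def patch_tree(shader_dir, suffix=".glsl"):
    """Patch every file under shader_dir with the given suffix (suffix is an assumption)."""
    changed = []
    for path in Path(shader_dir).rglob(f"*{suffix}"):
        original = path.read_text()
        patched = comment_out_extension(original)
        if patched != original:
            path.write_text(patched)
            changed.append(path)
    return changed
```

Already-commented lines no longer match, so running it twice leaves the files unchanged. Back up the shader directory first.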

GPU will reset:

[566574.634495] switching from power state:
[566574.634497]         ui class: performance
[566574.634498]         internal class: none
[566574.634500]         caps: single_disp video 
[566574.634501]         uvd    vclk: 0 dclk: 0
[566574.634502]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900
[566574.634503]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[566574.634504]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[566574.634505]         status: c 
[566574.634506] switching to power state:
[566574.634507]         ui class: performance
[566574.634508]         internal class: none
[566574.634509]         caps: video 
[566574.634509]         uvd    vclk: 0 dclk: 0
[566574.634510]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[566574.634511]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[566574.634512]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[566574.634513]         status: r 
[566584.067826] switching from power state:
[566584.067830]         ui class: performance
[566584.067831]         internal class: none
[566584.067833]         caps: video 
[566584.067835]         uvd    vclk: 0 dclk: 0
[566584.067836]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[566584.067837]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[566584.067839]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[566584.067840]         status: c 
[566584.067841] switching to power state:
[566584.067842]         ui class: performance
[566584.067843]         internal class: none
[566584.067844]         caps: single_disp video 
[566584.067846]         uvd    vclk: 0 dclk: 0
[566584.067847]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900
[566584.067848]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[566584.067849]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[566584.067850]         status: r 
[568371.037065] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[568371.044281] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4541 last fence id 0x00000000017b4531)
[568371.111399] switching from power state:
[568371.111401]         ui class: performance
[568371.111402]         internal class: none
[568371.111403]         caps: single_disp video 
[568371.111403]         uvd    vclk: 0 dclk: 0
[568371.111405]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900
[568371.111405]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568371.111406]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568371.111406]         status: c 
[568371.111407] switching to power state:
[568371.111407]         ui class: performance
[568371.111408]         internal class: none
[568371.111408]         caps: video 
[568371.111409]         uvd    vclk: 0 dclk: 0
[568371.111409]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[568371.111410]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568371.111410]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568371.111411]         status: r 
[568371.544089] radeon 0000:01:00.0: GPU lockup CP stall for more than 10507msec
[568371.550588] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4532)
[568371.550591] radeon 0000:01:00.0: failed to get a new IB (-35)
[568371.555183] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[568371.561868] radeon 0000:01:00.0: Saved 505 dwords of commands on ring 0.
[568371.561878] radeon 0000:01:00.0: GPU softreset: 0x00000008
[568371.561880] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0002030
[568371.561882] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[568371.561884] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[568371.561886] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[568371.561888] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[568371.561890] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
[568371.561892] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80028645
[568371.561894] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[568371.626137] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
[568371.626193] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[568371.628298] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[568371.628301] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[568371.628303] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[568371.628305] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[568371.628307] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[568371.628309] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[568371.628311] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[568371.628314] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[568371.628320] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[568371.646369] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[568371.646430] radeon 0000:01:00.0: WB enabled
[568371.646434] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88022fa7ec00
[568371.646436] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88022fa7ec0c
[568371.677879] [drm] ring test on 0 succeeded in 1 usecs
[568371.858821] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[568371.866482] [drm:r600_resume] *ERROR* r600 startup failed on resume
[568381.872051] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[568381.878037] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4542 last fence id 0x00000000017b4532)
[568381.878040] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[568381.885559] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[568381.893970] radeon 0000:01:00.0: ib ring test failed (-35).
[568381.901954] radeon 0000:01:00.0: GPU softreset: 0x00000009
[568381.901957] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA2233030
[568381.901960] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[568381.901962] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[568381.901964] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[568381.901966] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
[568381.901968] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008086
[568381.901970] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80018645
[568381.901972] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[568381.952081] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[568381.952134] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[568381.954239] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[568381.954241] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[568381.954243] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[568381.954245] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[568381.954247] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[568381.954249] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[568381.954251] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[568381.954253] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[568381.954258] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[568381.958082] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[568381.958161] radeon 0000:01:00.0: WB enabled
[568381.958164] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88022fa7ec00
[568381.958167] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88022fa7ec0c
[568381.989585] [drm] ring test on 0 succeeded in 1 usecs
[568382.169970] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[568382.177864] [drm:r600_resume] *ERROR* r600 startup failed on resume
[568382.183058] [drm] ib test on ring 0 succeeded in 0 usecs
[568382.183523] switching from power state:
[568382.183525]         ui class: none
[568382.183527]         internal class: boot 
[568382.183528]         caps: video 
[568382.183530]         uvd    vclk: 0 dclk: 0
[568382.183531]                 power level 0    sclk: 60000 mclk: 70000 vddc: 1100
[568382.183533]                 power level 1    sclk: 60000 mclk: 70000 vddc: 1100
[568382.183534]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568382.183535]         status: c b 
[568382.183537] switching to power state:
[568382.183538]         ui class: performance
[568382.183539]         internal class: none
[568382.183540]         caps: video 
[568382.183541]         uvd    vclk: 0 dclk: 0
[568382.183542]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[568382.183544]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568382.183545]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568382.183546]         status: r 
[568391.876830] switching from power state:
[568391.876834]         ui class: performance
[568391.876835]         internal class: none
[568391.876837]         caps: video 
[568391.876838]         uvd    vclk: 0 dclk: 0
[568391.876840]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[568391.876841]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568391.876843]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568391.876843]         status: c 
[568391.876845] switching to power state:
[568391.876846]         ui class: performance
[568391.876847]         internal class: none
[568391.876848]         caps: single_disp video 
[568391.876850]         uvd    vclk: 0 dclk: 0
[568391.876851]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900
[568391.876852]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568391.876853]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568391.876854]         status: r 
[568407.191396] switching from power state:
[568407.191398]         ui class: performance
[568407.191399]         internal class: none
[568407.191400]         caps: single_disp video 
[568407.191400]         uvd    vclk: 0 dclk: 0
[568407.191401]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900
[568407.191402]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568407.191402]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568407.191403]         status: c 
[568407.191403] switching to power state:
[568407.191404]         ui class: performance
[568407.191404]         internal class: none
[568407.191405]         caps: video 
[568407.191405]         uvd    vclk: 0 dclk: 0
[568407.191406]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[568407.191406]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568407.191407]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568407.191407]         status: r 
[568429.590326] switching from power state:
[568429.590330]         ui class: performance
[568429.590332]         internal class: none
[568429.590333]         caps: video 
[568429.590335]         uvd    vclk: 0 dclk: 0
[568429.590337]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100
[568429.590338]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568429.590339]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568429.590340]         status: c 
[568429.590342] switching to power state:
[568429.590343]         ui class: performance
[568429.590344]         internal class: none
[568429.590345]         caps: single_disp video 
[568429.590347]         uvd    vclk: 0 dclk: 0
[568429.590348]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900
[568429.590349]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[568429.590350]                 power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[568429.590351]         status: r
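
The power-state dumps above are easier to compare once they are reduced to tuples of clock and voltage values. The following is only a sketch with made-up names (parse_power_states, sample_log); it assumes the line layout shown in this log and does not interpret the units of sclk, mclk or vddc.

```python
import re

HEADER_RE = re.compile(r"switching (from|to) power state:")
LEVEL_RE = re.compile(r"power level (\d+)\s+sclk: (\d+) mclk: (\d+) vddc: (\d+)")


def parse_power_states(lines):
    """Return a list of {"direction": "from"/"to", "levels": [(level, sclk, mclk, vddc), ...]}."""
    states = []
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            states.append({"direction": header.group(1), "levels": []})
            continue
        level = LEVEL_RE.search(line)
        if level and states:
            states[-1]["levels"].append(tuple(int(v) for v in level.groups()))
    return states


# Lines copied from the first state switch in the log above.
sample_log = [
    "[566574.634495] switching from power state:",
    "[566574.634502]                 power level 0    sclk: 11000 mclk: 40500 vddc: 900",
    "[566574.634503]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100",
    "[566574.634506] switching to power state:",
    "[566574.634510]                 power level 0    sclk: 30000 mclk: 70000 vddc: 1100",
    "[566574.634511]                 power level 1    sclk: 30000 mclk: 70000 vddc: 1100",
]

for state in parse_power_states(sample_log):
    print(state["direction"], state["levels"])
```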
Comment 11 Shawn Starr 2013-09-14 07:09:37 UTC
This is not 100% repeatable, but we can still reset the GPU, and that's not good.
Comment 12 Alex Deucher 2013-09-15 18:52:12 UTC
If this is specific to Second Life, please update the summary.
Comment 13 Shawn Starr 2013-09-16 14:09:39 UTC
Unsure. I'm using Linux 3.12-rc0 right now with my patched libdrm and patched Mesa builds, and I need to isolate whether the resets are triggered by either of these combinations:

1) EXA w/ composite enabled + Second Life with most GLSL programs enabled

2) GLAMOR w/ composite enabled + Second Life with most GLSL programs enabled.

Let's keep this open. I will stress the GPU with other tests without Second Life in an attempt to cause the GPU to stall/reset.
Comment 14 Shawn Starr 2013-09-17 03:15:40 UTC
This is not Second Life related at all. I managed to get the GPU to reset in the following way:

1) Set /sys/class/drm/card0/device/power_dpm_state to 'battery' and leave /sys/class/drm/card0/device/power_dpm_force_performance_level at 'auto'.

2) Have kwin running with compositing enabled, rendering backend: XRender (not OpenGL, as that shows black windows with GLAMOR).
3) Browse a webpage in Chromium/Chrome; the GPU then suddenly reset.
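
Step 1 can be scripted so the same DPM settings are applied before every test run. A minimal sketch, assuming the two sysfs files named in step 1 and the lowercase values the radeon driver accepts as far as I know ("battery", "balanced", "performance" for the state; "auto", "low", "high" for the forced level). The function names are hypothetical, and writing to the real sysfs files needs root.

```python
from pathlib import Path

# Path taken from step 1; card0 may differ on other machines.
DEVICE_DIR = Path("/sys/class/drm/card0/device")

# Accepted values as I understand the radeon DPM interface (assumption).
VALID_STATES = {"battery", "balanced", "performance"}
VALID_LEVELS = {"auto", "low", "high"}


def set_dpm(state, level, device_dir=DEVICE_DIR):
    """Write the DPM state and the forced performance level."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown DPM state: {state!r}")
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    device_dir = Path(device_dir)
    (device_dir / "power_dpm_state").write_text(state)
    (device_dir / "power_dpm_force_performance_level").write_text(level)


def get_dpm(device_dir=DEVICE_DIR):
    """Return (state, level) as currently reported by the two files."""
    device_dir = Path(device_dir)
    state = (device_dir / "power_dpm_state").read_text().strip()
    level = (device_dir / "power_dpm_force_performance_level").read_text().strip()
    return state, level


# Example (as root): set_dpm("battery", "auto"); print(get_dpm())
```

The device_dir argument exists only so the helpers can be tried against a scratch directory before touching the real hardware.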

If I recall correctly, in both cases, even when playing Second Life, I had set the DPM power state to battery even though the laptop was plugged into AC, as seen in this log from the latest reset:

[   55.572222] bridge0: port 2(vnet0) entered forwarding state
[   55.572229] bridge0: port 2(vnet0) entered forwarding state
[   70.624026] bridge0: port 2(vnet0) entered forwarding state
[  591.264107] device vnet1 entered promiscuous mode
[  591.273419] bridge0: port 3(vnet1) entered forwarding state
[  591.273425] bridge0: port 3(vnet1) entered forwarding state
[  606.303032] bridge0: port 3(vnet1) entered forwarding state
[  610.073896] perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 1924.749108] switching from power state:
[ 1924.749113]  ui class: performance
[ 1924.749115]  internal class: none
[ 1924.749116]  caps: single_disp video 
[ 1924.749118]  uvd    vclk: 0 dclk: 0
[ 1924.749120]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[ 1924.749121]          power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[ 1924.749123]          power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[ 1924.749124]  status: c 
[ 1924.749125] switching to power state:
[ 1924.749126]  ui class: battery
[ 1924.749127]  internal class: none
[ 1924.749128]  caps: single_disp video 
[ 1924.749130]  uvd    vclk: 0 dclk: 0
[ 1924.749131]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[ 1924.749132]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[ 1924.749133]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[ 1924.749134]  status: r 
[ 6797.378014] hrtimer: interrupt took 14736 ns
[15919.834055] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[15919.839527] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000011ebf9)
[15919.839532] radeon 0000:01:00.0: failed to get a new IB (-35)
[15919.845308] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[15920.072129] radeon 0000:01:00.0: Saved 1081 dwords of commands on ring 0.
[15920.072146] radeon 0000:01:00.0: GPU softreset: 0x00000009
[15920.072149] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xE4723030
[15920.072152] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00110103
[15920.072154] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[15920.072156] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[15920.072159] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
[15920.072161] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008086
[15920.072163] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80018645
[15920.072166] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[15920.129823] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[15920.129880] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[15920.131986] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[15920.131989] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[15920.131991] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[15920.131993] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[15920.131995] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[15920.131998] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[15920.132011] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[15920.132014] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[15920.132021] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[15920.149897] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[15920.149928] radeon 0000:01:00.0: WB enabled
[15920.149931] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88003715bc00
[15920.149934] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88003715bc0c
[15920.181446] [drm] ring test on 0 succeeded in 1 usecs
[15920.389589] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[15920.397386] [drm:r600_resume] *ERROR* r600 startup failed on resume
[15930.402047] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[15930.409147] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000011ec1b last fence id 0x000000000011ebff)
[15930.409150] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[15930.415268] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[15930.422761] radeon 0000:01:00.0: ib ring test failed (-35).
[15930.430056] radeon 0000:01:00.0: GPU softreset: 0x00000009
[15930.430059] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0783030
[15930.430061] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000103
[15930.430064] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200020C0
[15930.430066] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[15930.430068] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
[15930.430070] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008086
[15930.430072] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80018645
[15930.430074] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[15930.635438] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[15930.635495] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[15930.637603] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[15930.637606] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[15930.637608] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x2000A0C0
[15930.637610] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[15930.637612] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[15930.637614] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[15930.637617] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[15930.637619] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[15930.637624] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[15930.800267] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[15930.800293] radeon 0000:01:00.0: WB enabled
[15930.800297] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88003715bc00
[15930.800299] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88003715bc0c
[15930.831855] [drm] ring test on 0 succeeded in 1 usecs
[15931.040164] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[15931.047446] [drm:r600_resume] *ERROR* r600 startup failed on resume
[15931.052132] [drm] ib test on ring 0 succeeded in 0 usecs
[15931.052586] switching from power state:
[15931.052588]  ui class: none
[15931.052590]  internal class: boot 
[15931.052591]  caps: video 
[15931.052593]  uvd    vclk: 0 dclk: 0
[15931.052594]          power level 0    sclk: 60000 mclk: 70000 vddc: 1100
[15931.052596]          power level 1    sclk: 60000 mclk: 70000 vddc: 1100
[15931.052597]          power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[15931.052598]  status: c b 
[15931.052599] switching to power state:
[15931.052600]  ui class: battery
[15931.052601]  internal class: none
[15931.052602]  caps: single_disp video 
[15931.052604]  uvd    vclk: 0 dclk: 0
[15931.052605]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15931.052620]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15931.052621]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15931.052622]  status: r 
[15938.711325] switching from power state:
[15938.711327]  ui class: battery
[15938.711328]  internal class: none
[15938.711328]  caps: single_disp video 
[15938.711329]  uvd    vclk: 0 dclk: 0
[15938.711330]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15938.711331]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15938.711331]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15938.711332]  status: c 
[15938.711332] switching to power state:
[15938.711333]  ui class: battery
[15938.711333]  internal class: none
[15938.711334]  caps: video 
[15938.711335]  uvd    vclk: 0 dclk: 0
[15938.711335]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15938.711336]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15938.711336]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15938.711337]  status: r 
[15946.896158] switching from power state:
[15946.896164]  ui class: battery
[15946.896165]  internal class: none
[15946.896167]  caps: video 
[15946.896169]  uvd    vclk: 0 dclk: 0
[15946.896170]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15946.896172]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15946.896173]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15946.896174]  status: c 
[15946.896175] switching to power state:
[15946.896176]  ui class: battery
[15946.896177]  internal class: none
[15946.896178]  caps: single_disp video 
[15946.896180]  uvd    vclk: 0 dclk: 0
[15946.896181]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15946.896182]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15946.896184]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15946.896184]  status: r 
[15954.045444] switching from power state:
[15954.045446]  ui class: battery
[15954.045447]  internal class: none
[15954.045448]  caps: single_disp video 
[15954.045449]  uvd    vclk: 0 dclk: 0
[15954.045450]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15954.045450]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15954.045451]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15954.045451]  status: c 
[15954.045452] switching to power state:
[15954.045452]  ui class: battery
[15954.045453]  internal class: none
[15954.045454]  caps: video 
[15954.045454]  uvd    vclk: 0 dclk: 0
[15954.045455]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15954.045455]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15954.045456]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15954.045456]  status: r 
[15973.562587] switching from power state:
[15973.562591]  ui class: battery
[15973.562593]  internal class: none
[15973.562594]  caps: video 
[15973.562596]  uvd    vclk: 0 dclk: 0
[15973.562597]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15973.562599]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15973.562600]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15973.562601]  status: c 
[15973.562602] switching to power state:
[15973.562603]  ui class: battery
[15973.562604]  internal class: none
[15973.562605]  caps: single_disp video 
[15973.562607]  uvd    vclk: 0 dclk: 0
[15973.562608]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15973.562609]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15973.562610]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15973.562611]  status: r 
[15979.422353] switching from power state:
[15979.422355]  ui class: battery
[15979.422356]  internal class: none
[15979.422357]  caps: single_disp video 
[15979.422358]  uvd    vclk: 0 dclk: 0
[15979.422359]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15979.422359]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15979.422360]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15979.422361]  status: c 
[15979.422361] switching to power state:
[15979.422361]  ui class: battery
[15979.422362]  internal class: none
[15979.422363]  caps: video 
[15979.422363]  uvd    vclk: 0 dclk: 0
[15979.422364]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15979.422364]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15979.422365]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15979.422365]  status: r 
[15985.278874] switching from power state:
[15985.278878]  ui class: battery
[15985.278880]  internal class: none
[15985.278881]  caps: video 
[15985.278883]  uvd    vclk: 0 dclk: 0
[15985.278884]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15985.278886]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15985.278887]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15985.278888]  status: c 
[15985.278889] switching to power state:
[15985.278890]  ui class: battery
[15985.278891]  internal class: none
[15985.278892]  caps: single_disp video 
[15985.278894]  uvd    vclk: 0 dclk: 0
[15985.278895]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[15985.278896]          power level 1    sclk: 30000 mclk: 40500 vddc: 900
[15985.278897]          power level 2    sclk: 30000 mclk: 40500 vddc: 900
[15985.278898]  status: r
Comment 15 Shawn Starr 2013-09-17 03:17:46 UTC
(In reply to comment #14)
> 
> if I recall, In both places even when playing with Second Life, I set DPM
> power state to Battery even though the laptop has AC plugged in as seen in
> this log from the latest reset:
> 

The other time it was in performance mode, so it doesn't matter whether DPM is in the Performance or Battery state.
Comment 16 Alex Deucher 2013-09-17 13:05:41 UTC
Since this bug was opened before dpm was released, can you reproduce the problems without dpm enabled?  If not, then these are two different issues.
Comment 17 Shawn Starr 2013-09-24 11:11:50 UTC
It seems I can trigger the GPU reset by scrolling pages in Firefox, but after keeping DPM enabled for 2 days.

It would be good if DRI or DRM had a way to capture the commands being submitted to the GPU, so we could narrow down the condition that causes the reset.
Comment 18 Shawn Starr 2013-09-24 11:13:15 UTC
GPU reset: 3.12.0-0.rc1.git4.2.fc21.x86_64

[73351.965375] switching to power state:
[73351.965376]  ui class: performance
[73351.965378]  internal class: none
[73351.965380]  caps: single_disp video
[73351.965382]  uvd    vclk: 0 dclk: 0
[73351.965384]          power level 0    sclk: 11000 mclk: 40500 vddc: 900
[73351.965386]          power level 1    sclk: 30000 mclk: 70000 vddc: 1100
[73351.965388]          power level 2    sclk: 60000 mclk: 70000 vddc: 1100
[73351.965390]  status: r
[105011.490265] Bluetooth: Core ver 2.16
[105011.490671] NET: Registered protocol family 31
[105011.490672] Bluetooth: HCI device and connection manager initialized
[105011.490689] Bluetooth: HCI socket layer initialized
[105011.490691] Bluetooth: L2CAP socket layer initialized
[105011.490697] Bluetooth: SCO socket layer initialized
[105011.511026] Netfilter messages via NETLINK v0.30.
[106230.851066] device vnet1 entered promiscuous mode
[106230.855250] bridge0: port 3(vnet1) entered forwarding state
[106230.855262] bridge0: port 3(vnet1) entered forwarding state
[106245.856015] bridge0: port 3(vnet1) entered forwarding state
[109493.397651] traps: polkitd[24550] general protection ip:7fd65bd7c9d2 sp:7fff4df759a0 error:0 in libmozjs-17.0.so[7fd65bc45000+3a7000]
[195661.081052] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[195661.088281] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000ac1936 last fence id 0x0000000000ac1935)
[195661.308503] [drm] Disabling audio 0 support
[195661.309556] radeon 0000:01:00.0: Saved 25 dwords of commands on ring 0.
[195661.309568] radeon 0000:01:00.0: GPU softreset: 0x00000009
[195661.309570] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA2231030
[195661.309573] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[195661.309575] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200010C0
[195661.309577] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[195661.309580] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[195661.309582] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008004
[195661.309584] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80000645
[195661.309587] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[195661.369014] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[195661.369070] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[195661.371178] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[195661.371181] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[195661.371183] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[195661.371185] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[195661.371187] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[195661.371190] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[195661.371193] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[195661.371195] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[195661.371201] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[195661.388744] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[195661.388767] radeon 0000:01:00.0: WB enabled
[195661.388769] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880036dd6c00
[195661.388772] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880036dd6c0c
[195661.420215] [drm] ring test on 0 succeeded in 1 usecs
[195661.619774] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[195661.625627] [drm:r600_resume] *ERROR* r600 startup failed on resume
[195661.631867] [drm] ib test on ring 0 succeeded in 0 usecs
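
To compare the register dumps printed before and after the soft reset, something like the following could be used. It is only a rough sketch: the function names `parse_register_dumps` and `changed_registers` are made up for illustration, and it assumes the log lines keep the `R_<offset>_<NAME> = 0x<value>` layout seen above. The `*_SOFT_RESET` lines are skipped because they are written between the two status dumps rather than being part of either.

```python
import re

# Matches lines such as:
# [195661.309570] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA2231030
REG_LINE = re.compile(r"\b(R_[0-9A-F]{6}_\w+)\s*=\s*(0x[0-9A-Fa-f]+)")


def parse_register_dumps(log_text):
    """Return a list of {register_name: value} dicts, one per dump, in log order."""
    dumps = []
    current = {}
    for line in log_text.splitlines():
        match = REG_LINE.search(line)
        if match is None:
            continue
        name, value = match.group(1), int(match.group(2), 16)
        if name.endswith("_SOFT_RESET"):
            # Reset request lines sit between dumps; they are not status readings.
            continue
        if name in current:
            # Seeing the same register again means a new dump has started.
            dumps.append(current)
            current = {}
        current[name] = value
    if current:
        dumps.append(current)
    return dumps


def changed_registers(before, after):
    """Return {name: (old, new)} for registers whose value differs between dumps."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


if __name__ == "__main__":
    import sys

    found = parse_register_dumps(sys.stdin.read())
    for earlier, later in zip(found, found[1:]):
        for reg, (old, new) in sorted(changed_registers(earlier, later).items()):
            print(f"{reg}: 0x{old:08X} -> 0x{new:08X}")
```

Piping `dmesg` output into the script prints only the registers that changed across the reset. For the dump in this comment, R_00D034_DMA_STATUS_REG reads 0x44C83D57 both before and after, so it would not be listed.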
Comment 19 Shawn Starr 2013-09-27 03:59:01 UTC
Currently testing with: radeon.dpm=0 radeon.dynclks=1

No crashes so far after two days with dynclks, but I am not yet ready to say the GPU resets are related only to DPM.
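
To confirm which values the loaded module actually picked up from the kernel command line, a small check along these lines might help. This is a sketch, not part of the driver: `read_radeon_params` is a hypothetical helper, and it assumes the parameters are exposed read-only under `/sys/module/radeon/parameters`, which may differ by kernel version.

```python
from pathlib import Path


def read_radeon_params(names=("dpm", "dynclks"),
                       base="/sys/module/radeon/parameters"):
    """Return {param: value string, or None if the file cannot be read}."""
    result = {}
    for name in names:
        try:
            result[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Module not loaded, or this kernel does not expose the parameter.
            result[name] = None
    return result


if __name__ == "__main__":
    for key, value in read_radeon_params().items():
        print(f"radeon.{key} = {value}")
```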
Comment 20 Shawn Starr 2013-09-30 00:57:15 UTC
Created attachment 86821 [details]
Radeon crash with dynclks enabled

Comment 21 Shawn Starr 2013-09-30 00:57:40 UTC
The attached crash log is from a run with DPM disabled and dynclks enabled, which still caused a GPU reset.
Comment 22 Alex Deucher 2013-09-30 05:26:21 UTC
FWIW, the dynclks parameter doesn't actually do anything on r6xx+ asics.
Comment 23 Shawn Starr 2013-10-23 13:41:18 UTC
I'm going to disable tiling in the DDX with an xorg.conf option and resume testing.
Comment 24 Shawn Starr 2013-10-28 21:32:32 UTC
Using the following options caused GPU reset:

        Option     "ColorTiling"    "true"     # [<bool>]
        Option     "ColorTiling2D"  "false"    # [<bool>]
        Option     "RenderAccel"    "false"    # [<bool>]
        Option     "AccelMethod"    "exa"
        Option     "EXAPixmaps"     "True"     # [<bool>]

What's also interesting is that when the GPU resets, X hangs and doesn't recover. Rebooting the laptop works, but when GRUB finishes loading the kernel the laptop hangs because the GPU is in a bad state. I have to hard power off/on for it to work again.

Testing with only ColorTiling2D enabled now.
Comment 25 Shawn Starr 2014-11-10 05:12:05 UTC
Closing. The workaround of setting performance/high generally stops this from happening now.
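
For anyone wanting to script the workaround, here is a minimal sketch. It assumes "performance/high" means writing `performance` to the radeon `power_dpm_state` sysfs file and `high` to `power_dpm_force_performance_level`, and that the GPU is `card0`. Both are assumptions, not something confirmed in this report. `set_dpm_performance` is a hypothetical helper name, and it needs root to write to sysfs.

```python
from pathlib import Path

# Values the radeon DPM sysfs files are assumed to accept.
VALID_STATES = ("battery", "balanced", "performance")
VALID_LEVELS = ("auto", "low", "high")


def set_dpm_performance(device_dir="/sys/class/drm/card0/device",
                        state="performance", level="high"):
    """Write the DPM state and forced performance level for one radeon device."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown DPM state: {state!r}")
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    dev = Path(device_dir)
    (dev / "power_dpm_state").write_text(state + "\n")
    (dev / "power_dpm_force_performance_level").write_text(level + "\n")


if __name__ == "__main__":
    set_dpm_performance()
```

The settings do not persist across reboots, so this would have to be run at each boot (for example from an init script).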

