Using Linux kernel 3.7 up to 3.8-rc3, I am unable to have a stable session with my RV635 GPU:
Jan 19 03:45:26 segfault kernel: [15008.313696] radeon 0000:01:00.0: Saved 185 dwords of commands on ring 0.
Jan 19 03:45:26 segfault kernel: [15008.313704] radeon 0000:01:00.0: GPU softreset
Jan 19 03:45:26 segfault kernel: [15008.313711] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA0003030
Jan 19 03:45:26 segfault kernel: [15008.313717] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
Jan 19 03:45:26 segfault kernel: [15008.313723] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
Jan 19 03:45:26 segfault kernel: [15008.313730] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.313736] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.313742] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000006
Jan 19 03:45:26 segfault kernel: [15008.313748] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80000645
Jan 19 03:45:26 segfault kernel: [15008.313761] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Jan 19 03:45:26 segfault kernel: [15008.328772] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jan 19 03:45:26 segfault kernel: [15008.344782] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA0003030
Jan 19 03:45:26 segfault kernel: [15008.344785] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
Jan 19 03:45:26 segfault kernel: [15008.344787] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200080C0
Jan 19 03:45:26 segfault kernel: [15008.344789] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.344792] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.344794] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jan 19 03:45:26 segfault kernel: [15008.344797] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
Jan 19 03:45:26 segfault kernel: [15008.345799] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan 19 03:45:26 segfault kernel: [15008.348414] [drm] probing gen 2 caps for device 8086:2a41 = 1/0
Jan 19 03:45:26 segfault kernel: [15008.350360] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jan 19 03:45:26 segfault kernel: [15008.350399] radeon 0000:01:00.0: WB enabled
Jan 19 03:45:26 segfault kernel: [15008.350403] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880229236c00
Jan 19 03:45:26 segfault kernel: [15008.381778] [drm] ring test on 0 succeeded in 1 usecs
Jan 19 03:45:26 segfault kernel: [15008.384549] [drm] ib test on ring 0 succeeded in 0 usecs
Jan 19 03:46:12 segfault kernel: [15053.625108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
...
Jan 19 03:46:12 segfault kernel: [15053.975428] radeon 0000:01:00.0: Wait for MC idle timedout !
Jan 19 03:46:12 segfault kernel: [15054.123890] radeon 0000:01:00.0: Wait for MC idle timedout !
Jan 19 03:46:12 segfault kernel: [15054.125748] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jan 19 03:46:12 segfault kernel: [15054.125785] radeon 0000:01:00.0: WB enabled
Jan 19 03:46:12 segfault kernel: [15054.125789] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880229236c00
Jan 19 03:46:12 segfault kernel: [15054.157608] [drm] ring test on 0 succeeded in 0 usecs
Jan 19 03:46:23 segfault kernel: [15064.657103] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Jan 19 03:46:23 segfault kernel: [15064.657114] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000441b6 last fence id 0x00000000000441a8)
Jan 19 03:46:23 segfault kernel: [15064.657121] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jan 19 03:46:23 segfault kernel: [15064.657134] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jan 19 03:46:23 segfault kernel: [15064.657140] radeon 0000:01:00.0: ib ring test failed (-35).
Jan 19 03:46:23 segfault kernel: [15064.658211] radeon 0000:01:00.0: GPU softreset
Jan 19 03:46:23 segfault kernel: [15064.658218] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE57C24E0
Jan 19 03:46:23 segfault kernel: [15064.658224] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00113303
Jan 19 03:46:23 segfault kernel: [15064.658230] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200030C0
Jan 19 03:46:23 segfault kernel: [15064.658236] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x01000000
Jan 19 03:46:23 segfault kernel: [15064.658242] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00001002
Jan 19 03:46:23 segfault kernel: [15064.658248] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00028482
Jan 19 03:46:23 segfault kernel: [15064.658254] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80838645
Jan 19 03:46:23 segfault kernel: [15064.829116] radeon 0000:01:00.0: Wait for MC idle timedout !
Jan 19 03:46:23 segfault kernel: [15064.829123] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Jan 19 03:46:23 segfault kernel: [15064.844133] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jan 19 03:46:23 segfault kernel: [15064.860144] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA0003030
Jan 19 03:46:23 segfault kernel: [15064.860150] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
Jan 19 03:46:23 segfault kernel: [15064.860163] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x2000B0C0
Jan 19 03:46:23 segfault kernel: [15064.860169] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jan 19 03:46:23 segfault kernel: [15064.860175] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jan 19 03:46:23 segfault kernel: [15064.860181] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jan 19 03:46:23 segfault kernel: [15064.860191] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
Jan 19 03:46:23 segfault kernel: [15064.861197] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jan 19 04:39:23 segfault kernel: [ 2791.671107] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jan 19 04:39:23 segfault kernel: [ 2791.671115] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).

Then the console is flooded with the following, over and over:
[drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
radeon 0000:01:00.0: couldn't schedule ib

mesa-dri-drivers-9.0.1-3.fc18.x86_64
libdrm-2.4.40-1.fc18.x86_64
kernels: kernel-3.7.3-201.fc18.x86_64, kernel-devel-3.8.0-0.rc3.git1.2.fc19.x86_64
I have not tried 3.8-rc4 yet.
Laptop: Lenovo ThinkPad W500
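Every "GPU lockup (waiting for ...)" line in these logs prints the fence sequence number the driver was waiting on and, in most cases, the last fence id that had been signalled. When comparing many resets it can help to pull those numbers out mechanically. The following is a minimal, hypothetical Python sketch (the function and regex names are illustrative, not part of any radeon tool), assuming the log text matches the format shown above:

```python
import re

# Matches radeon lockup lines such as:
#   GPU lockup (waiting for 0x00000000000441b6 last fence id 0x00000000000441a8)
# The "last fence id" part is optional because some lockup lines omit it.
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for 0x(?P<waiting>[0-9a-fA-F]+)"
    r"(?: last fence id 0x(?P<last>[0-9a-fA-F]+))?\)"
)


def lockup_fences(log_text):
    """Return (waiting, last, gap) tuples for every GPU lockup line in log_text.

    'last' and 'gap' are None when the line does not report a last fence id.
    """
    results = []
    for match in LOCKUP_RE.finditer(log_text):
        waiting = int(match.group("waiting"), 16)
        last_hex = match.group("last")
        if last_hex is None:
            results.append((waiting, None, None))
        else:
            last = int(last_hex, 16)
            # Difference between the awaited sequence number and the last one reported.
            results.append((waiting, last, waiting - last))
    return results
```

Applied to the 03:46:23 lockup line above, it reports a gap of 14 (0x441b6 - 0x441a8). Lines such as "GPU lockup CP stall for more than 10000msec" are deliberately not matched, since they carry no fence numbers.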
Is this still an issue with the latest bits from Dave's last pull request? http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
(In reply to comment #1)
> Is this still an issue with the latest bits from Dave's last pull request?
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

To be determined. I will need to build the patch into the Fedora SRPM kernel or build a generic kernel .rpm from your tree.
Are you using the same userspace components (mesa and ddx) across kernels?
Yes, I am
What was the last working kernel? Any chance you could bisect?
I suppose "GPU lockup CP stall / GPU resets over and over" is such a generic error that many unrelated bugs could be reported under it, but FWIW, I recently experienced exactly the same error shortly after starting any 3D-accelerated game (usually within 30 seconds). I tried 3.5.7 and 3.6.11 and got the same result, searched around for more info, ended up in this issue, tried installing 3.8.0-rc4, and voila, the problem is gone. I guess this means a bisect isn't interesting, but I'm willing to try if it helps in any way.
Alex found some hardware bugs noted internally; see the patches:
http://lists.x.org/archives/xorg-driver-ati/2013-September/025087.html
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044244.html
I'm going to try them out.
This is pending closure; I'm waiting until the end of the week, but so far the fixes work. The patches listed in this bug are obsolete because the work is being shifted around, but their logic seems to fix the reset issues.
Closing. I have not had any more resets with the respective code changes. Many thanks to Alex for finding this issue!
Reopening :/ At least now I can trigger the crash repeatedly.
1) Log into Second Life first.
2) Patch some of the GLSL programs, as they will fail with the Mesa 9.2 GLSL compiler:
-#extension GL_ARB_texture_rectangle : enable
+/* #extension GL_ARB_texture_rectangle : enable */
3) Go to the Graphics options and, under Shaders, enable:
- Basic Shaders
- Atmospheric Shaders
- Advanced Lighting Model
- Ambient Occlusion
- Depth of field

The GPU will reset:
[566574.634495] switching from power state:
[566574.634497] ui class: performance
[566574.634498] internal class: none
[566574.634500] caps: single_disp video
[566574.634501] uvd vclk: 0 dclk: 0
[566574.634502] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[566574.634503] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[566574.634504] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[566574.634505] status: c
[566574.634506] switching to power state:
[566574.634507] ui class: performance
[566574.634508] internal class: none
[566574.634509] caps: video
[566574.634509] uvd vclk: 0 dclk: 0
[566574.634510] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[566574.634511] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[566574.634512] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[566574.634513] status: r
[566584.067826] switching from power state:
[566584.067830] ui class: performance
[566584.067831] internal class: none
[566584.067833] caps: video
[566584.067835] uvd vclk: 0 dclk: 0
[566584.067836] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[566584.067837] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[566584.067839] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[566584.067840] status: c
[566584.067841] switching to power state:
[566584.067842] ui class: performance
[566584.067843] internal class: none
[566584.067844] caps: single_disp video
[566584.067846] uvd vclk: 0 dclk: 0
[566584.067847] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[566584.067848] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[566584.067849] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[566584.067850] status: r
[568371.037065] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[568371.044281] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4541 last fence id 0x00000000017b4531)
[568371.111399] switching from power state:
[568371.111401] ui class: performance
[568371.111402] internal class: none
[568371.111403] caps: single_disp video
[568371.111403] uvd vclk: 0 dclk: 0
[568371.111405] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[568371.111405] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568371.111406] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568371.111406] status: c
[568371.111407] switching to power state:
[568371.111407] ui class: performance
[568371.111408] internal class: none
[568371.111408] caps: video
[568371.111409] uvd vclk: 0 dclk: 0
[568371.111409] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[568371.111410] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568371.111410] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568371.111411] status: r
[568371.544089] radeon 0000:01:00.0: GPU lockup CP stall for more than 10507msec
[568371.550588] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4532)
[568371.550591] radeon 0000:01:00.0: failed to get a new IB (-35)
[568371.555183] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[568371.561868] radeon 0000:01:00.0: Saved 505 dwords of commands on ring 0.
[568371.561878] radeon 0000:01:00.0: GPU softreset: 0x00000008
[568371.561880] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0002030
[568371.561882] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[568371.561884] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
[568371.561886] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[568371.561888] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[568371.561890] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
[568371.561892] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
[568371.561894] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[568371.626137] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
[568371.626193] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[568371.628298] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
[568371.628301] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[568371.628303] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
[568371.628305] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[568371.628307] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[568371.628309] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[568371.628311] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
[568371.628314] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[568371.628320] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[568371.646369] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[568371.646430] radeon 0000:01:00.0: WB enabled
[568371.646434] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88022fa7ec00
[568371.646436] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88022fa7ec0c
[568371.677879] [drm] ring test on 0 succeeded in 1 usecs
[568371.858821] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[568371.866482] [drm:r600_resume] *ERROR* r600 startup failed on resume
[568381.872051] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[568381.878037] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4542 last fence id 0x00000000017b4532)
[568381.878040] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[568381.885559] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[568381.893970] radeon 0000:01:00.0: ib ring test failed (-35).
[568381.901954] radeon 0000:01:00.0: GPU softreset: 0x00000009
[568381.901957] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA2233030
[568381.901960] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[568381.901962] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
[568381.901964] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[568381.901966] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00008002
[568381.901968] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008086
[568381.901970] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018645
[568381.901972] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[568381.952081] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[568381.952134] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[568381.954239] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
[568381.954241] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[568381.954243] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
[568381.954245] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[568381.954247] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[568381.954249] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[568381.954251] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
[568381.954253] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[568381.954258] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[568381.958082] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[568381.958161] radeon 0000:01:00.0: WB enabled
[568381.958164] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88022fa7ec00
[568381.958167] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88022fa7ec0c
[568381.989585] [drm] ring test on 0 succeeded in 1 usecs
[568382.169970] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[568382.177864] [drm:r600_resume] *ERROR* r600 startup failed on resume
[568382.183058] [drm] ib test on ring 0 succeeded in 0 usecs
[568382.183523] switching from power state:
[568382.183525] ui class: none
[568382.183527] internal class: boot
[568382.183528] caps: video
[568382.183530] uvd vclk: 0 dclk: 0
[568382.183531] power level 0 sclk: 60000 mclk: 70000 vddc: 1100
[568382.183533] power level 1 sclk: 60000 mclk: 70000 vddc: 1100
[568382.183534] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568382.183535] status: c b
[568382.183537] switching to power state:
[568382.183538] ui class: performance
[568382.183539] internal class: none
[568382.183540] caps: video
[568382.183541] uvd vclk: 0 dclk: 0
[568382.183542] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[568382.183544] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568382.183545] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568382.183546] status: r
[568391.876830] switching from power state:
[568391.876834] ui class: performance
[568391.876835] internal class: none
[568391.876837] caps: video
[568391.876838] uvd vclk: 0 dclk: 0
[568391.876840] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[568391.876841] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568391.876843] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568391.876843] status: c
[568391.876845] switching to power state:
[568391.876846] ui class: performance
[568391.876847] internal class: none
[568391.876848] caps: single_disp video
[568391.876850] uvd vclk: 0 dclk: 0
[568391.876851] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[568391.876852] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568391.876853] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568391.876854] status: r
[568407.191396] switching from power state:
[568407.191398] ui class: performance
[568407.191399] internal class: none
[568407.191400] caps: single_disp video
[568407.191400] uvd vclk: 0 dclk: 0
[568407.191401] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[568407.191402] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568407.191402] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568407.191403] status: c
[568407.191403] switching to power state:
[568407.191404] ui class: performance
[568407.191404] internal class: none
[568407.191405] caps: video
[568407.191405] uvd vclk: 0 dclk: 0
[568407.191406] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[568407.191406] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568407.191407] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568407.191407] status: r
[568429.590326] switching from power state:
[568429.590330] ui class: performance
[568429.590332] internal class: none
[568429.590333] caps: video
[568429.590335] uvd vclk: 0 dclk: 0
[568429.590337] power level 0 sclk: 30000 mclk: 70000 vddc: 1100
[568429.590338] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568429.590339] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568429.590340] status: c
[568429.590342] switching to power state:
[568429.590343] ui class: performance
[568429.590344] internal class: none
[568429.590345] caps: single_disp video
[568429.590347] uvd vclk: 0 dclk: 0
[568429.590348] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[568429.590349] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[568429.590350] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[568429.590351] status: r
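Step 2 of the Second Life reproduction (commenting out the GL_ARB_texture_rectangle directive so the Mesa 9.2 GLSL compiler accepts the shader) can be applied across a whole shader tree instead of by hand. This is a minimal Python sketch under stated assumptions: the shader directory path and the "*.glsl" file pattern are hypothetical and must be adjusted to wherever the viewer keeps its shaders, and the helper names are illustrative only.

```python
import re
from pathlib import Path

# Matches an uncommented "#extension GL_ARB_texture_rectangle : enable" line,
# allowing leading indentation and flexible spacing around the colon.
EXTENSION_RE = re.compile(
    r"^(?P<indent>[ \t]*)"
    r"(?P<directive>#extension\s+GL_ARB_texture_rectangle\s*:\s*enable)[ \t]*$",
    re.MULTILINE,
)


def comment_out_rect_extension(source):
    """Wrap each GL_ARB_texture_rectangle enable directive in a /* ... */ comment."""
    return EXTENSION_RE.sub(r"\g<indent>/* \g<directive> */", source)


def patch_shader_dir(shader_dir, pattern="*.glsl"):
    """Rewrite matching shader files under shader_dir in place.

    shader_dir and pattern are assumptions; returns the list of files changed.
    """
    changed = []
    for path in sorted(Path(shader_dir).rglob(pattern)):
        original = path.read_text()
        patched = comment_out_rect_extension(original)
        if patched != original:
            path.write_text(patched)
            changed.append(path)
    return changed
```

Because the regex only matches a line that begins with the directive, already-patched lines (which begin with "/*") are left alone, so running the script twice is harmless. Keep a backup of the shader tree before rewriting files in place.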
This is not 100% repeatable, but we can still reset the GPU, and that's not good.
If this is specific to Second Life, please update the summary.
Unsure. I'm using Linux 3.12-rc0 right now with my patched libdrm and Mesa builds, and I need to isolate whether the resets are triggered by either of these combinations:
1) EXA w/ composite enabled + Second Life with most GLSL programs enabled
2) GLAMOR w/ composite enabled + Second Life with most GLSL programs enabled
Let's keep this open. I will stress the GPU with other tests without Second Life in an attempt to cause the GPU to stall/reset.
This is not Second Life related at all. I managed to get the GPU to reset in the following way:
1) Set /sys/class/drm/card0/device/power_dpm_state to battery and leave /sys/class/drm/card0/device/power_dpm_force_performance_level at 'auto'.
2) Have KWin enabled with compositing, rendering backend: XRender (not OpenGL, as that shows black windows with GLAMOR).
3) Browse a webpage in Chromium/Chrome; the GPU suddenly resets.

If I recall correctly, in both cases, including when playing Second Life, I had set the DPM power state to Battery even though the laptop had AC plugged in, as seen in this log from the latest reset:
[ 55.572222] bridge0: port 2(vnet0) entered forwarding state
[ 55.572229] bridge0: port 2(vnet0) entered forwarding state
[ 70.624026] bridge0: port 2(vnet0) entered forwarding state
[ 591.264107] device vnet1 entered promiscuous mode
[ 591.273419] bridge0: port 3(vnet1) entered forwarding state
[ 591.273425] bridge0: port 3(vnet1) entered forwarding state
[ 606.303032] bridge0: port 3(vnet1) entered forwarding state
[ 610.073896] perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 1924.749108] switching from power state:
[ 1924.749113] ui class: performance
[ 1924.749115] internal class: none
[ 1924.749116] caps: single_disp video
[ 1924.749118] uvd vclk: 0 dclk: 0
[ 1924.749120] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[ 1924.749121] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[ 1924.749123] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[ 1924.749124] status: c
[ 1924.749125] switching to power state:
[ 1924.749126] ui class: battery
[ 1924.749127] internal class: none
[ 1924.749128] caps: single_disp video
[ 1924.749130] uvd vclk: 0 dclk: 0
[ 1924.749131] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[ 1924.749132] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[ 1924.749133] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[ 1924.749134] status: r
[ 6797.378014] hrtimer: interrupt took 14736 ns
[15919.834055] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[15919.839527] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000011ebf9)
[15919.839532] radeon 0000:01:00.0: failed to get a new IB (-35)
[15919.845308] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[15920.072129] radeon 0000:01:00.0: Saved 1081 dwords of commands on ring 0.
[15920.072146] radeon 0000:01:00.0: GPU softreset: 0x00000009
[15920.072149] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xE4723030
[15920.072152] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00110103
[15920.072154] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
[15920.072156] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[15920.072159] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00008002
[15920.072161] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008086
[15920.072163] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018645
[15920.072166] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15920.129823] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[15920.129880] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[15920.131986] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
[15920.131989] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[15920.131991] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
[15920.131993] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[15920.131995] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[15920.131998] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[15920.132011] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
[15920.132014] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15920.132021] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[15920.149897] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[15920.149928] radeon 0000:01:00.0: WB enabled
[15920.149931] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88003715bc00
[15920.149934] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88003715bc0c
[15920.181446] [drm] ring test on 0 succeeded in 1 usecs
[15920.389589] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[15920.397386] [drm:r600_resume] *ERROR* r600 startup failed on resume
[15930.402047] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[15930.409147] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000011ec1b last fence id 0x000000000011ebff)
[15930.409150] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[15930.415268] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[15930.422761] radeon 0000:01:00.0: ib ring test failed (-35).
[15930.430056] radeon 0000:01:00.0: GPU softreset: 0x00000009
[15930.430059] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0783030
[15930.430061] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000103
[15930.430064] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200020C0
[15930.430066] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[15930.430068] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00008002
[15930.430070] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008086
[15930.430072] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018645
[15930.430074] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15930.635438] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[15930.635495] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[15930.637603] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
[15930.637606] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[15930.637608] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x2000A0C0
[15930.637610] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[15930.637612] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[15930.637614] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[15930.637617] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
[15930.637619] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15930.637624] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[15930.800267] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[15930.800293] radeon 0000:01:00.0: WB enabled
[15930.800297] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88003715bc00
[15930.800299] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88003715bc0c
[15930.831855] [drm] ring test on 0 succeeded in 1 usecs
[15931.040164] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[15931.047446] [drm:r600_resume] *ERROR* r600 startup failed on resume
[15931.052132] [drm] ib test on ring 0 succeeded in 0 usecs
[15931.052586] switching from power state:
[15931.052588] ui class: none
[15931.052590] internal class: boot
[15931.052591] caps: video
[15931.052593] uvd vclk: 0 dclk: 0
[15931.052594] power level 0 sclk: 60000 mclk: 70000 vddc: 1100
[15931.052596] power level 1 sclk: 60000 mclk: 70000 vddc: 1100
[15931.052597] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[15931.052598] status: c b
[15931.052599] switching to power state:
[15931.052600] ui class: battery
[15931.052601] internal class: none
[15931.052602] caps: single_disp video
[15931.052604] uvd vclk: 0 dclk: 0
[15931.052605] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15931.052620] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15931.052621] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15931.052622] status: r
[15938.711325] switching from power state:
[15938.711327] ui class: battery
[15938.711328] internal class: none
[15938.711328] caps: single_disp video
[15938.711329] uvd vclk: 0 dclk: 0
[15938.711330] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15938.711331] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15938.711331] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15938.711332] status: c
[15938.711332] switching to power state:
[15938.711333] ui class: battery
[15938.711333] internal class: none
[15938.711334] caps: video
[15938.711335] uvd vclk: 0 dclk: 0
[15938.711335] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15938.711336] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15938.711336] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15938.711337] status: r
[15946.896158] switching from power state:
[15946.896164] ui class: battery
[15946.896165] internal class: none
[15946.896167] caps: video
[15946.896169] uvd vclk: 0 dclk: 0
[15946.896170] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15946.896172] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15946.896173] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15946.896174] status: c
[15946.896175] switching to power state:
[15946.896176] ui class: battery
[15946.896177] internal class: none
[15946.896178] caps: single_disp video
[15946.896180] uvd vclk: 0 dclk: 0
[15946.896181] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15946.896182] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15946.896184] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15946.896184] status: r
[15954.045444] switching from power state:
[15954.045446] ui class: battery
[15954.045447] internal class: none
[15954.045448] caps: single_disp video
[15954.045449] uvd vclk: 0 dclk: 0
[15954.045450] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15954.045450] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15954.045451] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15954.045451] status: c
[15954.045452] switching to power state:
[15954.045452] ui class: battery
[15954.045453] internal class: none
[15954.045454] caps: video
[15954.045454] uvd vclk: 0 dclk: 0
[15954.045455] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15954.045455] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15954.045456] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15954.045456] status: r
[15973.562587] switching from power state:
[15973.562591] ui class: battery
[15973.562593] internal class: none
[15973.562594] caps: video
[15973.562596] uvd vclk: 0 dclk: 0
[15973.562597] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15973.562599] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15973.562600] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15973.562601] status: c
[15973.562602] switching to power state:
[15973.562603] ui class: battery
[15973.562604] internal class: none
[15973.562605] caps: single_disp video
[15973.562607] uvd vclk: 0 dclk: 0
[15973.562608] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15973.562609] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15973.562610] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15973.562611] status: r
[15979.422353] switching from power state:
[15979.422355] ui class: battery
[15979.422356] internal class: none
[15979.422357] caps: single_disp video
[15979.422358] uvd vclk: 0 dclk: 0
[15979.422359] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15979.422359] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15979.422360] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15979.422361] status: c
[15979.422361] switching to power state:
[15979.422361] ui class: battery
[15979.422362] internal class: none
[15979.422363] caps: video
[15979.422363] uvd vclk: 0 dclk: 0
[15979.422364] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15979.422364] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15979.422365] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15979.422365] status: r
[15985.278874] switching from power state:
[15985.278878] ui class: battery
[15985.278880] internal class: none
[15985.278881] caps: video
[15985.278883] uvd vclk: 0 dclk: 0
[15985.278884] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15985.278886] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15985.278887] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15985.278888] status: c
[15985.278889] switching to power state:
[15985.278890] ui class: battery
[15985.278891] internal class: none
[15985.278892] caps: single_disp video
[15985.278894] uvd vclk: 0 dclk: 0
[15985.278895] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[15985.278896] power level 1 sclk: 30000 mclk: 40500 vddc: 900
[15985.278897] power level 2 sclk: 30000 mclk: 40500 vddc: 900
[15985.278898] status: r
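Step 1 of the non-Second Life reproduction sets two radeon DPM sysfs attributes. A minimal Python sketch of that step follows, assuming the attribute names quoted in the report and the value sets "battery"/"balanced"/"performance" and "auto"/"low"/"high" used by the radeon driver of this period. The helper name set_dpm is hypothetical; the device directory is a parameter so the logic can be exercised without touching real hardware.

```python
from pathlib import Path

# Accepted values, assumed from the radeon DPM sysfs interface of this era.
DPM_STATES = {"battery", "balanced", "performance"}
PERF_LEVELS = {"auto", "low", "high"}


def set_dpm(device_dir, state="battery", level="auto"):
    """Write the DPM state and forced performance level for one radeon device.

    device_dir is normally /sys/class/drm/card0/device; writing there needs root.
    Returns the values read back from the two attributes.
    """
    if state not in DPM_STATES:
        raise ValueError(f"unknown DPM state: {state!r}")
    if level not in PERF_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    device = Path(device_dir)
    state_file = device / "power_dpm_state"
    level_file = device / "power_dpm_force_performance_level"
    state_file.write_text(state + "\n")
    level_file.write_text(level + "\n")
    # Read back so the caller can confirm what the driver actually accepted.
    return state_file.read_text().strip(), level_file.read_text().strip()
```

Run as root, set_dpm("/sys/class/drm/card0/device") reproduces the battery/auto combination from the steps above; set_dpm(..., "performance") covers the other case mentioned in the follow-up comment. Validating the values before writing gives a clearer error than the kernel's rejection of an unknown string.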
(In reply to comment #14)
> If I recall, in both places, even when playing Second Life, I set the DPM
> power state to Battery even though the laptop has AC plugged in, as seen in
> this log from the latest reset:

The other time was in performance mode, so it doesn't matter whether DPM is in the Performance or Battery state.
Since this bug was opened before dpm was released, can you reproduce the problems without dpm enabled? If not, then these are two different issues.
It seems I can trigger the GPU reset by scrolling pages in Firefox, but only after leaving DPM on for 2 days. It would be good if DRI or DRM had a way to capture the commands being submitted to the GPU, so we could narrow down the condition that causes the reset.
GPU reset: 3.12.0-0.rc1.git4.2.fc21.x86_64
[73351.965375] switching to power state:
[73351.965376] ui class: performance
[73351.965378] internal class: none
[73351.965380] caps: single_disp video
[73351.965382] uvd vclk: 0 dclk: 0
[73351.965384] power level 0 sclk: 11000 mclk: 40500 vddc: 900
[73351.965386] power level 1 sclk: 30000 mclk: 70000 vddc: 1100
[73351.965388] power level 2 sclk: 60000 mclk: 70000 vddc: 1100
[73351.965390] status: r
[105011.490265] Bluetooth: Core ver 2.16
[105011.490671] NET: Registered protocol family 31
[105011.490672] Bluetooth: HCI device and connection manager initialized
[105011.490689] Bluetooth: HCI socket layer initialized
[105011.490691] Bluetooth: L2CAP socket layer initialized
[105011.490697] Bluetooth: SCO socket layer initialized
[105011.511026] Netfilter messages via NETLINK v0.30.
[106230.851066] device vnet1 entered promiscuous mode
[106230.855250] bridge0: port 3(vnet1) entered forwarding state
[106230.855262] bridge0: port 3(vnet1) entered forwarding state
[106245.856015] bridge0: port 3(vnet1) entered forwarding state
[109493.397651] traps: polkitd[24550] general protection ip:7fd65bd7c9d2 sp:7fff4df759a0 error:0 in libmozjs-17.0.so[7fd65bc45000+3a7000]
[195661.081052] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[195661.088281] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000ac1936 last fence id 0x0000000000ac1935)
[195661.308503] [drm] Disabling audio 0 support
[195661.309556] radeon 0000:01:00.0: Saved 25 dwords of commands on ring 0.
[195661.309568] radeon 0000:01:00.0: GPU softreset: 0x00000009
[195661.309570] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA2231030
[195661.309573] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[195661.309575] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200010C0
[195661.309577] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[195661.309580] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[195661.309582] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008004
[195661.309584] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80000645
[195661.309587] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[195661.369014] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[195661.369070] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[195661.371178] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
[195661.371181] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[195661.371183] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
[195661.371185] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[195661.371187] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[195661.371190] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[195661.371193] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
[195661.371195] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[195661.371201] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[195661.388744] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[195661.388767] radeon 0000:01:00.0: WB enabled
[195661.388769] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880036dd6c00
[195661.388772] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880036dd6c0c
[195661.420215] [drm] ring test on 0 succeeded in 1 usecs
[195661.619774] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[195661.625627] [drm:r600_resume] *ERROR* r600 startup failed on resume
[195661.631867] [drm] ib test on ring 0 succeeded in 0 usecs
Currently testing with: radeon.dpm=0 radeon.dynclks=1
No crashes so far after two days with dynclks, but I am not yet ready to say the GPU resets are related only to DPM.
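When testing with boot options like these, it can help to confirm what the loaded radeon module actually received, since a misspelled option is easy to miss. Loaded modules expose readable parameters as files under /sys/module/<name>/parameters; the minimal Python sketch below reads them, assuming the radeon module is loaded and declares dpm and dynclks as readable parameters (read_module_params is a hypothetical helper). A value being set does not mean the driver acts on it.

```python
from pathlib import Path

# Assumption: readable module parameters appear as one file per parameter here.
RADEON_PARAMS = Path("/sys/module/radeon/parameters")


def read_module_params(names, params_dir=RADEON_PARAMS):
    """Return {name: value}; value is None when the parameter file is absent."""
    params_dir = Path(params_dir)
    values = {}
    for name in names:
        path = params_dir / name
        values[name] = path.read_text().strip() if path.is_file() else None
    return values
```

For example, read_module_params(["dpm", "dynclks"]) after booting with radeon.dpm=0 radeon.dynclks=1 should show dpm as "0" if the option was applied.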
Created attachment 86821
Radeon crash with dynclks enabled
The attached crash occurred with DPM disabled and dynclks enabled, and it caused a GPU reset.
FWIW, the dynclks parameter doesn't actually do anything on r6xx+ ASICs.
I'm going to disable tiling in the DDX with an xorg.conf option and resume testing.
Using the following options caused a GPU reset:
    Option "ColorTiling" "true" # [<bool>]
    Option "ColorTiling2D" "false" # [<bool>]
    Option "RenderAccel" "false" # [<bool>]
    Option "AccelMethod" "exa"
    Option "EXAPixmaps" "True" # [<bool>]
What's also interesting is that when the GPU resets, X hangs and doesn't recover. Rebooting the laptop works, but when GRUB finishes loading the kernel, the laptop hangs because the GPU is still in a bad state. I have to hard power off/on for it to work again. Now testing with only ColorTiling2D enabled.
Closing: the workaround of setting DPM to performance/high generally stops this from happening now.
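On kernels with radeon DPM (3.11 and later, with DPM enabled), "performance/high" presumably maps to the sysfs attributes power_dpm_state and power_dpm_force_performance_level. A minimal Python sketch of applying that workaround follows, assuming the GPU is card0 and the script runs as root; apply_dpm_workaround is a hypothetical helper, and the setting does not persist across reboots.

```python
from pathlib import Path

# Assumption: the radeon GPU is card0; pass another device dir on multi-GPU systems.
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")

VALID_STATES = {"battery", "balanced", "performance"}
VALID_LEVELS = {"auto", "low", "high"}


def apply_dpm_workaround(device_dir=DEFAULT_DEVICE_DIR, state="performance", level="high"):
    """Write the DPM power state and forced performance level (requires root)."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown power_dpm_state: {state!r}")
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    device_dir = Path(device_dir)
    (device_dir / "power_dpm_state").write_text(state + "\n")
    (device_dir / "power_dpm_force_performance_level").write_text(level + "\n")
    # Read both values back so the caller can check what the driver reports.
    return (
        (device_dir / "power_dpm_state").read_text().strip(),
        (device_dir / "power_dpm_force_performance_level").read_text().strip(),
    )
```

Calling apply_dpm_workaround() as root targets card0; apply_dpm_workaround("/sys/class/drm/card1/device") would target a second card.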
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.