Summary: | [r600][RV635] GPU lockup CP stall / GPU resets over and over - Kernel 3.7 to 3.12 inclusive | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Shawn Starr <shawn.starr> | ||||
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> | ||||
Status: | RESOLVED FIXED | QA Contact: | |||||
Severity: | major | ||||||
Priority: | medium | CC: | haeuslsc, nsoranzo | ||||
Version: | unspecified | ||||||
Hardware: | x86-64 (AMD64) | ||||||
OS: | Linux (All) | ||||||
Whiteboard: | |||||||
i915 platform: | i915 features: | ||||||
Description
Shawn Starr
2013-01-21 05:07:01 UTC
Is this still an issue with the latest bits from Dave's last pull request? http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

(In reply to comment #1)
> Is this still an issue with the latest bits from Dave's last pull request?
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

To be determined. I will need to build the patch into the Fedora SRPM kernel or build a generic kernel .rpm from your tree.

Are you using the same userspace components (mesa and ddx) across kernels?

Yes, I am.

What was the last working kernel? Any chance you could bisect?

I suppose "GPU lockup CP stall / GPU resets over and over" is probably such a generic error that a lot of unrelated bugs could be reported under it, but FWIW, I recently experienced exactly the same error shortly (usually < 30 seconds) after starting any 3D-accelerated game. I tried 3.5.7 and 3.6.11 with the same result, searched around for more info, ended up at this issue, tried installing 3.8.0-rc4, and voila, the problem is gone. I guess this means a bisect is not interesting, but I'm willing to try if it helps in any way.

Alex found some HW bug issues noted internally; see patches: http://lists.x.org/archives/xorg-driver-ati/2013-September/025087.html http://lists.freedesktop.org/archives/mesa-dev/2013-September/044244.html I'm going to try them out.

This is pending close, waiting until the end of the week. So far the fixes work. The patches listed in this bug are obsolete because the work is being shifted around, but the logic seems to fix the reset issues.

Closing, I have not had any resets anymore with the respective code changes. Much thanks to Alex for finding this issue!

Reopen :/ At least now I can trigger the crash repeatedly.
1) Log into Second Life first 2) You need to patch some of the GLSL programs as they will fail with Mesa 9.2 GLSL compiler - #extension GL_ARB_texture_rectangle : enable +/* #extension GL_ARB_texture_rectangle : enable */ 3) Go to the Graphics options and under Shaders, enable: - Basic Shaders - Atmospheric Shaders - Advanced Lighting Model - Ambient Occlusion - Depth of field GPU will reset: [566574.634495] switching from power state: [566574.634497] ui class: performance [566574.634498] internal class: none [566574.634500] caps: single_disp video [566574.634501] uvd vclk: 0 dclk: 0 [566574.634502] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [566574.634503] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [566574.634504] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [566574.634505] status: c [566574.634506] switching to power state: [566574.634507] ui class: performance [566574.634508] internal class: none [566574.634509] caps: video [566574.634509] uvd vclk: 0 dclk: 0 [566574.634510] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [566574.634511] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [566574.634512] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [566574.634513] status: r [566584.067826] switching from power state: [566584.067830] ui class: performance [566584.067831] internal class: none [566584.067833] caps: video [566584.067835] uvd vclk: 0 dclk: 0 [566584.067836] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [566584.067837] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [566584.067839] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [566584.067840] status: c [566584.067841] switching to power state: [566584.067842] ui class: performance [566584.067843] internal class: none [566584.067844] caps: single_disp video [566584.067846] uvd vclk: 0 dclk: 0 [566584.067847] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [566584.067848] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [566584.067849] power level 2 sclk: 60000 mclk: 70000 vddc: 
1100 [566584.067850] status: r [568371.037065] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [568371.044281] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4541 last fence id 0x00000000017b4531) [568371.111399] switching from power state: [568371.111401] ui class: performance [568371.111402] internal class: none [568371.111403] caps: single_disp video [568371.111403] uvd vclk: 0 dclk: 0 [568371.111405] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [568371.111405] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568371.111406] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568371.111406] status: c [568371.111407] switching to power state: [568371.111407] ui class: performance [568371.111408] internal class: none [568371.111408] caps: video [568371.111409] uvd vclk: 0 dclk: 0 [568371.111409] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [568371.111410] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568371.111410] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568371.111411] status: r [568371.544089] radeon 0000:01:00.0: GPU lockup CP stall for more than 10507msec [568371.550588] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4532) [568371.550591] radeon 0000:01:00.0: failed to get a new IB (-35) [568371.555183] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib ! [568371.561868] radeon 0000:01:00.0: Saved 505 dwords of commands on ring 0. 
[568371.561878] radeon 0000:01:00.0: GPU softreset: 0x00000008 [568371.561880] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0002030 [568371.561882] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [568371.561884] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0 [568371.561886] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [568371.561888] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [568371.561890] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186 [568371.561892] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645 [568371.561894] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [568371.626137] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001 [568371.626193] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100 [568371.628298] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030 [568371.628301] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [568371.628303] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0 [568371.628305] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [568371.628307] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [568371.628309] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [568371.628311] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000 [568371.628314] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [568371.628320] radeon 0000:01:00.0: GPU reset succeeded, trying to resume [568371.646369] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). 
[568371.646430] radeon 0000:01:00.0: WB enabled [568371.646434] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88022fa7ec00 [568371.646436] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88022fa7ec0c [568371.677879] [drm] ring test on 0 succeeded in 1 usecs [568371.858821] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) [568371.866482] [drm:r600_resume] *ERROR* r600 startup failed on resume [568381.872051] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [568381.878037] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000017b4542 last fence id 0x00000000017b4532) [568381.878040] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35). [568381.885559] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35). [568381.893970] radeon 0000:01:00.0: ib ring test failed (-35). [568381.901954] radeon 0000:01:00.0: GPU softreset: 0x00000009 [568381.901957] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA2233030 [568381.901960] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [568381.901962] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0 [568381.901964] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [568381.901966] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00008002 [568381.901968] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008086 [568381.901970] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018645 [568381.901972] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [568381.952081] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF [568381.952134] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100 [568381.954239] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030 [568381.954241] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [568381.954243] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0 [568381.954245] radeon 0000:01:00.0: 
R_008674_CP_STALLED_STAT1 = 0x00000000 [568381.954247] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [568381.954249] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [568381.954251] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000 [568381.954253] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [568381.954258] radeon 0000:01:00.0: GPU reset succeeded, trying to resume [568381.958082] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). [568381.958161] radeon 0000:01:00.0: WB enabled [568381.958164] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88022fa7ec00 [568381.958167] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88022fa7ec0c [568381.989585] [drm] ring test on 0 succeeded in 1 usecs [568382.169970] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) [568382.177864] [drm:r600_resume] *ERROR* r600 startup failed on resume [568382.183058] [drm] ib test on ring 0 succeeded in 0 usecs [568382.183523] switching from power state: [568382.183525] ui class: none [568382.183527] internal class: boot [568382.183528] caps: video [568382.183530] uvd vclk: 0 dclk: 0 [568382.183531] power level 0 sclk: 60000 mclk: 70000 vddc: 1100 [568382.183533] power level 1 sclk: 60000 mclk: 70000 vddc: 1100 [568382.183534] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568382.183535] status: c b [568382.183537] switching to power state: [568382.183538] ui class: performance [568382.183539] internal class: none [568382.183540] caps: video [568382.183541] uvd vclk: 0 dclk: 0 [568382.183542] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [568382.183544] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568382.183545] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568382.183546] status: r [568391.876830] switching from power state: [568391.876834] ui class: performance [568391.876835] internal class: none [568391.876837] caps: 
video [568391.876838] uvd vclk: 0 dclk: 0 [568391.876840] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [568391.876841] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568391.876843] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568391.876843] status: c [568391.876845] switching to power state: [568391.876846] ui class: performance [568391.876847] internal class: none [568391.876848] caps: single_disp video [568391.876850] uvd vclk: 0 dclk: 0 [568391.876851] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [568391.876852] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568391.876853] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568391.876854] status: r [568407.191396] switching from power state: [568407.191398] ui class: performance [568407.191399] internal class: none [568407.191400] caps: single_disp video [568407.191400] uvd vclk: 0 dclk: 0 [568407.191401] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [568407.191402] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568407.191402] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568407.191403] status: c [568407.191403] switching to power state: [568407.191404] ui class: performance [568407.191404] internal class: none [568407.191405] caps: video [568407.191405] uvd vclk: 0 dclk: 0 [568407.191406] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [568407.191406] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568407.191407] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568407.191407] status: r [568429.590326] switching from power state: [568429.590330] ui class: performance [568429.590332] internal class: none [568429.590333] caps: video [568429.590335] uvd vclk: 0 dclk: 0 [568429.590337] power level 0 sclk: 30000 mclk: 70000 vddc: 1100 [568429.590338] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568429.590339] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568429.590340] status: c [568429.590342] switching to power state: [568429.590343] ui class: performance [568429.590344] 
internal class: none [568429.590345] caps: single_disp video [568429.590347] uvd vclk: 0 dclk: 0 [568429.590348] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [568429.590349] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [568429.590350] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [568429.590351] status: r

This is not 100% repeatable, but we can still reset the GPU and that's not good.

If this is specific to Second Life, please update the summary.

Unsure. I'm using Linux 3.12-rc0 right now, with my patched libdrm and patched Mesa builds, and need to isolate whether the resets are being triggered by either of these combinations:
1) EXA w/ composite enabled + Second Life with most GLSL programs enabled
2) GLAMOR w/ composite enabled + Second Life with most GLSL programs enabled.

Let's keep this open for now. I will stress the GPU with other tests w/o Second Life in an attempt to cause the GPU to stall/reset.

This is not Second Life related at all. I managed to get the GPU to reset in the following way:
1) Set /sys/class/drm/card0/device/power_dpm_state to Battery and leave /sys/class/drm/card0/device/power_dpm_force_performance_level in 'auto' mode.
2) Have kwin enabled with composite, rendering: XRender (not OpenGL, as this will show black windows with GLAMOR)
3) Browse a webpage in Chromium/Chrome; the GPU suddenly reset.

If I recall, in both cases, even when playing with Second Life, I had set the DPM power state to Battery even though the laptop has AC plugged in, as seen in this log from the latest reset:

[ 55.572222] bridge0: port 2(vnet0) entered forwarding state [ 55.572229] bridge0: port 2(vnet0) entered forwarding state [ 70.624026] bridge0: port 2(vnet0) entered forwarding state [ 591.264107] device vnet1 entered promiscuous mode [ 591.273419] bridge0: port 3(vnet1) entered forwarding state [ 591.273425] bridge0: port 3(vnet1) entered forwarding state [ 606.303032] bridge0: port 3(vnet1) entered forwarding state [ 610.073896] perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 [ 1924.749108] switching from power state: [ 1924.749113] ui class: performance [ 1924.749115] internal class: none [ 1924.749116] caps: single_disp video [ 1924.749118] uvd vclk: 0 dclk: 0 [ 1924.749120] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [ 1924.749121] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [ 1924.749123] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [ 1924.749124] status: c [ 1924.749125] switching to power state: [ 1924.749126] ui class: battery [ 1924.749127] internal class: none [ 1924.749128] caps: single_disp video [ 1924.749130] uvd vclk: 0 dclk: 0 [ 1924.749131] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [ 1924.749132] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [ 1924.749133] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [ 1924.749134] status: r [ 6797.378014] hrtimer: interrupt took 14736 ns [15919.834055] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [15919.839527] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000011ebf9) [15919.839532] radeon 0000:01:00.0: failed to get a new IB (-35) [15919.845308] [drm:radeon_cs_ib_chunk]
*ERROR* Failed to get ib ! [15920.072129] radeon 0000:01:00.0: Saved 1081 dwords of commands on ring 0. [15920.072146] radeon 0000:01:00.0: GPU softreset: 0x00000009 [15920.072149] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xE4723030 [15920.072152] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00110103 [15920.072154] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0 [15920.072156] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [15920.072159] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00008002 [15920.072161] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008086 [15920.072163] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018645 [15920.072166] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [15920.129823] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF [15920.129880] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100 [15920.131986] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030 [15920.131989] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [15920.131991] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0 [15920.131993] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [15920.131995] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [15920.131998] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [15920.132011] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000 [15920.132014] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [15920.132021] radeon 0000:01:00.0: GPU reset succeeded, trying to resume [15920.149897] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). 
[15920.149928] radeon 0000:01:00.0: WB enabled [15920.149931] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88003715bc00 [15920.149934] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88003715bc0c [15920.181446] [drm] ring test on 0 succeeded in 1 usecs [15920.389589] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) [15920.397386] [drm:r600_resume] *ERROR* r600 startup failed on resume [15930.402047] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [15930.409147] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000011ec1b last fence id 0x000000000011ebff) [15930.409150] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35). [15930.415268] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35). [15930.422761] radeon 0000:01:00.0: ib ring test failed (-35). [15930.430056] radeon 0000:01:00.0: GPU softreset: 0x00000009 [15930.430059] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0783030 [15930.430061] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000103 [15930.430064] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200020C0 [15930.430066] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [15930.430068] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00008002 [15930.430070] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008086 [15930.430072] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018645 [15930.430074] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [15930.635438] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF [15930.635495] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100 [15930.637603] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030 [15930.637606] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [15930.637608] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x2000A0C0 [15930.637610] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 
[15930.637612] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [15930.637614] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [15930.637617] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000 [15930.637619] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [15930.637624] radeon 0000:01:00.0: GPU reset succeeded, trying to resume [15930.800267] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). [15930.800293] radeon 0000:01:00.0: WB enabled [15930.800297] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88003715bc00 [15930.800299] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff88003715bc0c [15930.831855] [drm] ring test on 0 succeeded in 1 usecs [15931.040164] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) [15931.047446] [drm:r600_resume] *ERROR* r600 startup failed on resume [15931.052132] [drm] ib test on ring 0 succeeded in 0 usecs [15931.052586] switching from power state: [15931.052588] ui class: none [15931.052590] internal class: boot [15931.052591] caps: video [15931.052593] uvd vclk: 0 dclk: 0 [15931.052594] power level 0 sclk: 60000 mclk: 70000 vddc: 1100 [15931.052596] power level 1 sclk: 60000 mclk: 70000 vddc: 1100 [15931.052597] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [15931.052598] status: c b [15931.052599] switching to power state: [15931.052600] ui class: battery [15931.052601] internal class: none [15931.052602] caps: single_disp video [15931.052604] uvd vclk: 0 dclk: 0 [15931.052605] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15931.052620] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15931.052621] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15931.052622] status: r [15938.711325] switching from power state: [15938.711327] ui class: battery [15938.711328] internal class: none [15938.711328] caps: single_disp video [15938.711329] uvd vclk: 0 dclk: 0 [15938.711330] 
power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15938.711331] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15938.711331] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15938.711332] status: c [15938.711332] switching to power state: [15938.711333] ui class: battery [15938.711333] internal class: none [15938.711334] caps: video [15938.711335] uvd vclk: 0 dclk: 0 [15938.711335] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15938.711336] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15938.711336] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15938.711337] status: r [15946.896158] switching from power state: [15946.896164] ui class: battery [15946.896165] internal class: none [15946.896167] caps: video [15946.896169] uvd vclk: 0 dclk: 0 [15946.896170] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15946.896172] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15946.896173] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15946.896174] status: c [15946.896175] switching to power state: [15946.896176] ui class: battery [15946.896177] internal class: none [15946.896178] caps: single_disp video [15946.896180] uvd vclk: 0 dclk: 0 [15946.896181] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15946.896182] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15946.896184] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15946.896184] status: r [15954.045444] switching from power state: [15954.045446] ui class: battery [15954.045447] internal class: none [15954.045448] caps: single_disp video [15954.045449] uvd vclk: 0 dclk: 0 [15954.045450] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15954.045450] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15954.045451] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15954.045451] status: c [15954.045452] switching to power state: [15954.045452] ui class: battery [15954.045453] internal class: none [15954.045454] caps: video [15954.045454] uvd vclk: 0 dclk: 0 [15954.045455] power level 0 sclk: 11000 mclk: 40500 
vddc: 900 [15954.045455] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15954.045456] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15954.045456] status: r [15973.562587] switching from power state: [15973.562591] ui class: battery [15973.562593] internal class: none [15973.562594] caps: video [15973.562596] uvd vclk: 0 dclk: 0 [15973.562597] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15973.562599] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15973.562600] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15973.562601] status: c [15973.562602] switching to power state: [15973.562603] ui class: battery [15973.562604] internal class: none [15973.562605] caps: single_disp video [15973.562607] uvd vclk: 0 dclk: 0 [15973.562608] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15973.562609] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15973.562610] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15973.562611] status: r [15979.422353] switching from power state: [15979.422355] ui class: battery [15979.422356] internal class: none [15979.422357] caps: single_disp video [15979.422358] uvd vclk: 0 dclk: 0 [15979.422359] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15979.422359] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15979.422360] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15979.422361] status: c [15979.422361] switching to power state: [15979.422361] ui class: battery [15979.422362] internal class: none [15979.422363] caps: video [15979.422363] uvd vclk: 0 dclk: 0 [15979.422364] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15979.422364] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15979.422365] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15979.422365] status: r [15985.278874] switching from power state: [15985.278878] ui class: battery [15985.278880] internal class: none [15985.278881] caps: video [15985.278883] uvd vclk: 0 dclk: 0 [15985.278884] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15985.278886] power level 1 
sclk: 30000 mclk: 40500 vddc: 900 [15985.278887] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15985.278888] status: c [15985.278889] switching to power state: [15985.278890] ui class: battery [15985.278891] internal class: none [15985.278892] caps: single_disp video [15985.278894] uvd vclk: 0 dclk: 0 [15985.278895] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [15985.278896] power level 1 sclk: 30000 mclk: 40500 vddc: 900 [15985.278897] power level 2 sclk: 30000 mclk: 40500 vddc: 900 [15985.278898] status: r

(In reply to comment #14)
>
> if I recall, In both places even when playing with Second Life, I set DPM
> power state to Battery even though the laptop has AC plugged in as seen in
> this log from the latest reset:
>

The other time was performance mode, so it doesn't matter whether DPM is in the Performance or Battery state.

Since this bug was opened before dpm was released, can you reproduce the problems without dpm enabled? If not, then these are two different issues.

It seems I can cause the GPU reset with Firefox and scrolling pages, but only after keeping DPM on for 2 days. It would be good if the dri or drm had a way to capture the commands being submitted to the GPU, so we could narrow down the condition that causes the reset.
GPU reset: 3.12.0-0.rc1.git4.2.fc21.x86_64 [73351.965375] switching to power state: [73351.965376] ui class: performance [73351.965378] internal class: none [73351.965380] caps: single_disp video [73351.965382] uvd vclk: 0 dclk: 0 [73351.965384] power level 0 sclk: 11000 mclk: 40500 vddc: 900 [73351.965386] power level 1 sclk: 30000 mclk: 70000 vddc: 1100 [73351.965388] power level 2 sclk: 60000 mclk: 70000 vddc: 1100 [73351.965390] status: r [105011.490265] Bluetooth: Core ver 2.16 [105011.490671] NET: Registered protocol family 31 [105011.490672] Bluetooth: HCI device and connection manager initialized [105011.490689] Bluetooth: HCI socket layer initialized [105011.490691] Bluetooth: L2CAP socket layer initialized [105011.490697] Bluetooth: SCO socket layer initialized [105011.511026] Netfilter messages via NETLINK v0.30. [106230.851066] device vnet1 entered promiscuous mode [106230.855250] bridge0: port 3(vnet1) entered forwarding state [106230.855262] bridge0: port 3(vnet1) entered forwarding state [106245.856015] bridge0: port 3(vnet1) entered forwarding state [109493.397651] traps: polkitd[24550] general protection ip:7fd65bd7c9d2 sp:7fff4df759a0 error:0 in libmozjs-17.0.so[7fd65bc45000+3a7000] [195661.081052] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [195661.088281] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000ac1936 last fence id 0x0000000000ac1935) [195661.308503] [drm] Disabling audio 0 support [195661.309556] radeon 0000:01:00.0: Saved 25 dwords of commands on ring 0.
[195661.309568] radeon 0000:01:00.0: GPU softreset: 0x00000009 [195661.309570] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA2231030 [195661.309573] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [195661.309575] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200010C0 [195661.309577] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [195661.309580] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [195661.309582] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008004 [195661.309584] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80000645 [195661.309587] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [195661.369014] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF [195661.369070] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100 [195661.371178] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030 [195661.371181] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003 [195661.371183] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0 [195661.371185] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [195661.371187] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [195661.371190] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [195661.371193] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000 [195661.371195] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [195661.371201] radeon 0000:01:00.0: GPU reset succeeded, trying to resume [195661.388744] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). 
[195661.388767] radeon 0000:01:00.0: WB enabled [195661.388769] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880036dd6c00 [195661.388772] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880036dd6c0c [195661.420215] [drm] ring test on 0 succeeded in 1 usecs [195661.619774] [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) [195661.625627] [drm:r600_resume] *ERROR* r600 startup failed on resume [195661.631867] [drm] ib test on ring 0 succeeded in 0 usecs

Currently testing with: radeon.dpm=0 radeon.dynclks=1

No crashes so far after two days with dynclks, but I am not yet ready to say the GPU resets are only DPM related.

Created attachment 86821 [details]
Radeon crash with dynclks enabled
Attached crash is without DPM enabled, with dynclks enabled, and it caused a GPU reset.

FWIW, the dynclks parameter doesn't actually do anything on r6xx+ asics.

I'm going to disable tiling in the DDX with an xorg config option and resume testing.

Using the following options caused a GPU reset:
Option "ColorTiling" "true" # [<bool>]
Option "ColorTiling2D" "false" # [<bool>]
Option "RenderAccel" "false" # [<bool>]
Option "AccelMethod" "exa"
Option "EXAPixmaps" "True" # [<bool>]

What's also interesting is that when the GPU resets, X hangs (it doesn't recover). Rebooting the laptop works, but when grub finishes loading the kernel the laptop hangs because the GPU is in a bad state. I have to hard power off/on for it to work again. Testing with only ColorTiling2D enabled now.

Closing, the workaround of setting to performance/high generally stops this from happening now.
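A minimal sketch of that workaround, assuming "performance/high" means writing performance to power_dpm_state and high to power_dpm_force_performance_level, the two sysfs files named earlier in this report. The function name set_radeon_dpm is hypothetical, and the thread does not confirm that this is exactly how the reporter applied it. Writing to the real files requires root.

```python
from pathlib import Path

# Values the radeon driver accepts for these two sysfs files.
DPM_STATES = {"battery", "balanced", "performance"}
DPM_LEVELS = {"auto", "low", "high"}


def set_radeon_dpm(device_dir, state="performance", level="high"):
    """Write the DPM state and forced performance level under device_dir.

    device_dir is the card's sysfs device directory, for example
    /sys/class/drm/card0/device (the path quoted in this report).
    Returns the (state, level) pair read back from the two files.
    """
    if state not in DPM_STATES:
        raise ValueError(f"unknown DPM state: {state!r}")
    if level not in DPM_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")

    device = Path(device_dir)
    state_file = device / "power_dpm_state"
    level_file = device / "power_dpm_force_performance_level"

    # Each file takes a single keyword followed by a newline.
    state_file.write_text(state + "\n")
    level_file.write_text(level + "\n")

    # Read both values back so the caller can confirm what was stored.
    return state_file.read_text().strip(), level_file.read_text().strip()
```

On the affected laptop this would be called as root with /sys/class/drm/card0/device. Pointing it at a scratch directory exercises the same code without touching the hardware.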