Bug 66972

Summary: GPU lockup CP stall with radeon.dpm=1 on BARTS/6850
Product: DRI
Reporter: Andre Heider <a.heider>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: normal    
Priority: medium    
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments:
  Description                            Flags
  Dmesg                                  none
  add module parameter to disable aspm   none
  debugging output                       none
  dmesg with mc reg dump                 none

Description Andre Heider 2013-07-16 17:40:36 UTC
Using 3.11-rc1 with merged drm-fixes (d1ce3d5 uvesafb: Really allow mtrr being 0, as documented and warn()ed) on top.

There are no glaring problems without radeon.dpm=1, but with dpm enabled I get this:

[   18.930980] radeon 0000:01:00.0: GPU lockup CP stall for more than 10756msec
[   18.930983] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000003 last fence id 0x0000000000000001)
[   19.029921] radeon 0000:01:00.0: Saved 87 dwords of commands on ring 0.
[   19.030032] radeon 0000:01:00.0: GPU softreset: 0x00000108
[   19.030035] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA00038A0
[   19.030037] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[   19.030039] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[   19.030041] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200200C0
[   19.030043] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[   19.030046] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
[   19.030048] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00011100
[   19.030050] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028580
[   19.030052] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80838042
[   19.030054] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   19.221068] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
[   19.221120] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000500
[   19.222267] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[   19.222270] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[   19.222272] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[   19.222274] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200200C0
[   19.222276] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[   19.222278] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   19.222280] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   19.222282] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   19.222285] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[   19.222287] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   19.222293] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[   19.222299] radeon 0000:01:00.0: GPU softreset: 0x00000100
[   19.222301] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[   19.222303] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[   19.222305] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[   19.222308] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200200C0
[   19.222310] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[   19.222312] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   19.222314] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   19.222316] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   19.222318] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[   19.222320] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   19.222476] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000400
[   19.223623] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[   19.223625] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[   19.223627] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[   19.223630] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200200C0
[   19.223632] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[   19.223634] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   19.223636] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   19.223638] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   19.223640] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[   19.223642] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   20.283957] [drm] PCIE gen 2 link speeds already enabled
[   20.286156] [drm] PCIE GART of 512M enabled (table at 0x0000000000273000).
[   20.286273] radeon 0000:01:00.0: WB enabled
[   20.286277] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880800cb7c00
[   20.286280] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880800cb7c0c
[   20.287780] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc9000c932118
[   20.304059] [drm] ring test on 0 succeeded in 3 usecs
[   20.304131] [drm] ring test on 3 succeeded in 1 usecs
[   20.478915] [drm] ring test on 5 succeeded in 2 usecs
[   20.478930] [drm] UVD initialized successfully.
[   20.480805] [drm] ib test on ring 0 succeeded in 0 usecs
[   20.480851] [drm] ib test on ring 3 succeeded in 1 usecs
[   30.631806] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   30.631813] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x0000000000000002)
[   30.631816] [drm:r600_uvd_ib_test] *ERROR* radeon: fence wait failed (-35).
[   30.631822] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
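
The lockup lines in the log above follow a regular pattern, so the fence ids can be pulled out mechanically when comparing several boots. The following is only a minimal sketch: the regular expression is derived from the two lines quoted in this report and is assumed, not guaranteed, to match other kernel versions. The names `LOCKUP_RE` and `find_lockups` are made up for illustration.

```python
import re

# Matches the "GPU lockup (waiting for ... last fence id ...)" lines quoted in
# this report. The pattern is an assumption based on those lines only.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+radeon\s+(?P<dev>\S+):\s+GPU lockup "
    r"\(waiting for (?P<waiting>0x[0-9a-fA-F]+) "
    r"last fence id (?P<last>0x[0-9a-fA-F]+)\)"
)


def find_lockups(dmesg_text):
    """Return one dict per lockup line: timestamp, device and both fence ids."""
    lockups = []
    for match in LOCKUP_RE.finditer(dmesg_text):
        lockups.append({
            "timestamp": float(match.group("ts")),
            "device": match.group("dev"),
            "waiting_for": int(match.group("waiting"), 16),
            "last_fence": int(match.group("last"), 16),
        })
    return lockups


# The two lockup lines from the log above.
SAMPLE = """\
[   18.930983] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000003 last fence id 0x0000000000000001)
[   30.631813] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x0000000000000002)
"""

for entry in find_lockups(SAMPLE):
    print(entry)
```

In both lockups the fence being waited for is two ahead of the last signalled fence id.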

lightdm takes unusually long to show up because of the first stall. Eventually I can log in (xfce without compositing), but just running dmesg in a fullscreen terminal makes X crash (the backtrace is useless; let me know if one with debug symbols would help).

From before the stall (dmesg | grep 'radeon\|drm'):
[    6.441501] [drm] Initialized drm 1.1.0 20060810
[    6.547794] [drm] Memory usable by graphics device = 2048M
[    6.571559] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    6.571560] [drm] Driver supports precise vblank timestamp query.
[    6.591921] [drm] Wrong MCH_SSKPD value: 0x20100406
[    6.591923] [drm] This can cause pipe underruns and display issues.
[    6.591924] [drm] Please upgrade your BIOS to fix this.
[    6.604625] [drm] radeon kernel modesetting enabled.
[    6.604655] fb: conflicting fb hw usage radeondrmfb vs EFI VGA - removing generic driver
[    6.837612] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    6.837623] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[    6.837814] [drm] initializing kernel modesetting (BARTS 0x1002:0x6739 0x174B:0xE174).
[    6.837838] [drm] register mmio base: 0xF7B20000
[    6.837839] [drm] register mmio size: 131072
[    6.838016] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[    6.838018] radeon 0000:01:00.0: GTT: 512M 0x0000000040000000 - 0x000000005FFFFFFF
[    6.838019] [drm] Detected VRAM RAM=1024M, BAR=256M
[    6.838020] [drm] RAM width 256bits DDR
[    6.838133] [drm] radeon: 1024M of VRAM memory ready
[    6.838134] [drm] radeon: 512M of GTT memory ready.
[    6.843307] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    6.843559] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[    6.843733] [drm] Loading BARTS Microcode
[    6.861606] [drm] PCIE GART of 512M enabled (table at 0x0000000000273000).
[    6.861727] radeon 0000:01:00.0: WB enabled
[    6.861729] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880800cb7c00
[    6.861730] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880800cb7c0c
[    6.861732] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc9000c932118
[    6.861734] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    6.861734] [drm] Driver supports precise vblank timestamp query.
[    6.861750] radeon 0000:01:00.0: irq 50 for MSI/MSI-X
[    6.861758] radeon 0000:01:00.0: radeon: using MSI.
[    6.861783] [drm] radeon: irq initialized.
[    6.878276] [drm] ring test on 0 succeeded in 3 usecs
[    6.878339] [drm] ring test on 3 succeeded in 1 usecs
[    7.063144] [drm] ring test on 5 succeeded in 2 usecs
[    7.063151] [drm] UVD initialized successfully.
[    7.063365] [drm] ib test on ring 0 succeeded in 0 usecs
[    7.063408] [drm] ib test on ring 3 succeeded in 0 usecs
[    7.214700] [drm] ib test on ring 5 succeeded
[    7.215287] [drm] Radeon Display Connectors
[    7.215289] [drm] Connector 0:
[    7.215289] [drm]   DP-4
[    7.215290] [drm]   HPD4
[    7.215291] [drm]   DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
[    7.215292] [drm]   Encoders:
[    7.215293] [drm]     DFP1: INTERNAL_UNIPHY2
[    7.215293] [drm] Connector 1:
[    7.215294] [drm]   HDMI-A-4
[    7.215294] [drm]   HPD3
[    7.215295] [drm]   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
[    7.215296] [drm]   Encoders:
[    7.215296] [drm]     DFP2: INTERNAL_UNIPHY2
[    7.215297] [drm] Connector 2:
[    7.215298] [drm]   DVI-I-1
[    7.215298] [drm]   HPD6
[    7.215299] [drm]   DDC: 0x6470 0x6470 0x6474 0x6474 0x6478 0x6478 0x647c 0x647c
[    7.215300] [drm]   Encoders:
[    7.215300] [drm]     DFP3: INTERNAL_UNIPHY
[    7.215301] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    7.215302] [drm] Connector 3:
[    7.215302] [drm]   DVI-D-1
[    7.215303] [drm]   HPD1
[    7.215304] [drm]   DDC: 0x6480 0x6480 0x6484 0x6484 0x6488 0x6488 0x648c 0x648c
[    7.215304] [drm]   Encoders:
[    7.215305] [drm]     DFP4: INTERNAL_UNIPHY1
[    7.215401] [drm] Internal thermal controller with fan control
[    7.227372] [drm] radeon: dpm initialized
[    7.278110] [drm] fb mappable at 0xE0375000
[    7.278113] [drm] vram apper at 0xE0000000
[    7.278113] [drm] size 9216000
[    7.278114] [drm] fb depth is 24
[    7.278115] [drm]    pitch is 7680
[    7.278193] fbcon: radeondrmfb (fb1) is primary device
[    7.636682] radeon 0000:01:00.0: fb1: radeondrmfb frame buffer device
[    7.636684] [drm] Initialized radeon 2.34.0 20080528 for 0000:01:00.0 on minor 1
[    7.818982] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
Comment 1 Mike Lothian 2013-07-16 18:32:25 UTC
Created attachment 82496 [details]
Dmesg

From drm-fixes-3.11
Comment 2 Mike Lothian 2013-07-16 18:33:15 UTC
I'm seeing similar things on my 6600 - it runs very slowly (20fps on Lost Coast) on Linus' tree and doesn't launch properly on drm-fixes-3.11.

It was working fine on the wip5 branch.
Comment 3 Alex Deucher 2013-07-16 20:49:20 UTC
Created attachment 82502 [details] [review]
add module parameter to disable aspm

Try this patch, which adds a new module parameter to disable ASPM. Add radeon.aspm=0 to your kernel command line in grub to disable ASPM support.
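
After rebooting, it can help to confirm that the options actually reached the kernel. The following is a minimal sketch that lists the radeon.* options found on a kernel command line. It assumes a Linux system where /proc/cmdline is readable; `radeon_params` and `read_cmdline` are hypothetical helper names, and the radeon.aspm option only has an effect with the patch from this comment applied.

```python
def radeon_params(cmdline):
    """Collect radeon.<name>=<value> options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("radeon.") and "=" in token:
            name, value = token[len("radeon."):].split("=", 1)
            params[name] = value
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Example command line; everything except the two radeon options is made up.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro radeon.dpm=1 radeon.aspm=0"
print(radeon_params(example))
```

On the affected machine, `radeon_params(read_cmdline())` would show whether dpm and aspm were set as intended for the running boot.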
Comment 4 Andre Heider 2013-07-16 21:34:30 UTC
Thanks, but with that patch and "radeon.dpm=1 radeon.aspm=0" I still get the stall. And now lightdm just displays garbage; before the patch it was able to come up after a delay.

Also, I noticed this message: "ACPI FADT declares the system doesn't support PCIe ASPM, so disable it". Is it related?

[   18.907265] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   18.907271] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000003 last fence id 0x0000000000000001)
[   18.985214] radeon 0000:01:00.0: Saved 87 dwords of commands on ring 0.
[   18.985225] radeon 0000:01:00.0: GPU softreset: 0x00000008
[   18.985227] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
[   18.985230] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[   18.985232] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[   18.985234] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
[   18.985236] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[   18.985238] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   18.985241] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00004100
[   18.985243] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020180
[   18.985245] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80028042
[   18.985247] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   18.997899] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
[   18.997952] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[   18.999099] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[   18.999101] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[   18.999103] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[   18.999106] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[   18.999108] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[   18.999110] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   18.999112] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   18.999114] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   18.999116] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[   18.999118] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   18.999125] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[   19.020762] [drm] PCIE gen 2 link speeds already enabled
[   19.022654] [drm] PCIE GART of 512M enabled (table at 0x0000000000273000).
[   19.022749] radeon 0000:01:00.0: WB enabled
[   19.022751] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8807fff9fc00
[   19.022752] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8807fff9fc0c
[   19.024253] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc9000c932118
[   19.040412] [drm] ring test on 0 succeeded in 3 usecs
[   19.040474] [drm] ring test on 3 succeeded in 1 usecs
[   19.215251] [drm] ring test on 5 succeeded in 2 usecs
[   19.215258] [drm] UVD initialized successfully.
[   19.217129] [drm] ib test on ring 0 succeeded in 0 usecs
[   19.217182] [drm] ib test on ring 3 succeeded in 1 usecs
[   29.368594] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   29.368601] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x0000000000000002)
[   29.368604] [drm:r600_uvd_ib_test] *ERROR* radeon: fence wait failed (-35).
[   29.368610] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
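
The register dumps printed before and after the soft reset in the log above can be compared mechanically to see which status registers changed. The following is a minimal sketch: the line format is assumed from the dumps quoted in this report, and `parse_dumps` and `changed_registers` are hypothetical helper names. It treats every GRBM_STATUS line as the start of a new dump.

```python
import re

# Matches lines such as "radeon 0000:01:00.0:   GRBM_STATUS   = 0xA0003828".
# Lines like "GRBM_SOFT_RESET=0x00004001" (no spaces around "=") do not match.
REG_RE = re.compile(
    r"radeon \S+:\s+(?P<name>[A-Za-z0-9_]+)\s+= (?P<value>0x[0-9A-Fa-f]+)"
)


def parse_dumps(dmesg_text):
    """Split a dmesg excerpt into register dumps, one dict per dump."""
    dumps = []
    for match in REG_RE.finditer(dmesg_text):
        name = match.group("name")
        value = int(match.group("value"), 16)
        if name == "GRBM_STATUS" or not dumps:
            dumps.append({})  # a GRBM_STATUS line starts a new dump
        dumps[-1][name] = value
    return dumps


def changed_registers(before, after):
    """Return {name: (old, new)} for registers whose value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# A few lines taken from the dumps in the log above.
SAMPLE = """\
[   18.985227] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
[   18.985234] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
[   18.985247] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   18.997899] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
[   18.999099] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[   18.999106] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[   18.999118] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
"""

before, after = parse_dumps(SAMPLE)
for name, (old, new) in changed_registers(before, after).items():
    print(f"{name}: {old:#010x} -> {new:#010x}")
```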
Comment 5 Alex Deucher 2013-07-17 01:43:21 UTC
Created attachment 82518 [details] [review]
debugging output

Can you attach dmesg output with dpm enabled and this patch applied?
Comment 6 Andre Heider 2013-07-17 15:18:01 UTC
Created attachment 82547 [details]
dmesg with mc reg dump

3.11-rc1 + drm-fixes + patch from comment 5
Comment 7 Alex Deucher 2013-07-17 16:40:45 UTC

*** This bug has been marked as a duplicate of bug 66932 ***
