Bug 88364

Summary: Xorg hangs after videocard switching
Product: DRI    Reporter: Liss <lissamour>
Component: DRM/Radeon    Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED    QA Contact:
Severity: major    
Priority: medium    
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description          Flags
dmesg log            none
full dmesg output    none

Description Liss 2015-01-13 10:33:36 UTC
Created attachment 112158
dmesg log

Steps to reproduce:
1. Launch any program on the discrete video card, e.g. DRI_PRIME=1 glxgears

I expected the program to run on the discrete video card, but after a short time Xorg hung.

Ubuntu 14.10 64-bit, kernel 3.18.1, latest Mesa and drivers from the oibaf PPA.

0a:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun LE [Radeon HD 8550M / R5 M230] (rev ff)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)

cat /sys/kernel/debug/vgaswitcheroo/switch 
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:0a:00.0
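Editor's note: each line of the vgaswitcheroo switch file above packs several colon-separated fields. The sketch below splits them out; the field meanings (index, client type, '+' for the active client, power state with a "Dyn" prefix when the driver controls power dynamically, PCI address) are my reading of the vga_switcheroo output format and should be treated as assumptions. Under that reading, "DynOff" on the DIS line means the Radeon is runtime-powered-down until something like DRI_PRIME=1 wakes it. SwitcherooClient and parse_switch_line are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SwitcherooClient:
    index: int
    kind: str      # assumed: "IGD" (integrated) or "DIS" (discrete), "-Audio" suffix for audio clients
    active: bool   # assumed: '+' marks the client currently driving the outputs
    power: str     # assumed: "Pwr"/"Off", prefixed with "Dyn" under driver power control
    pci_addr: str  # PCI address, which itself contains colons


def parse_switch_line(line: str) -> SwitcherooClient:
    # Split at most 4 times so the colon-separated PCI address stays intact.
    index, kind, active, power, pci_addr = line.rstrip("\n").split(":", 4)
    return SwitcherooClient(int(index), kind, active == "+", power, pci_addr)
```

For example, parse_switch_line("1:DIS: :DynOff:0000:0a:00.0") yields a discrete, inactive client in the "DynOff" state at 0000:0a:00.0.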
Comment 1 Liss 2015-01-13 19:57:37 UTC
Also, sometimes Xorg doesn't hang, but then, if I try to run something again, I see that it runs through llvmpipe.

LIBGL_DEBUG=verbose DRI_PRIME=1 glxgears -info
libGL: screen 0 does not appear to be DRI3 capable
libGL error: failed to open drm device: Invalid argument
libGL error: failed to load driver: radeonsi
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
libGL: Can't open configuration file /home/lissamour/.drirc: No such file or directory.
libGL: Can't open configuration file /home/lissamour/.drirc: No such file or directory.
GL_RENDERER   = Gallium 0.4 on llvmpipe (LLVM 3.5, 128 bits)
GL_VERSION    = 3.0 Mesa 10.5.0-devel (git-bed6f20 2015-01-13 utopic-oibaf-ppa)
GL_VENDOR     = VMware, Inc.
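Editor's note: the key symptom in this output is that GL_RENDERER names llvmpipe, Mesa's CPU renderer, so the DRI_PRIME request fell back to software instead of reaching radeonsi. A minimal sketch of checking for that automatically, assuming the "KEY = value" layout shown above (parse_gl_info and is_software_fallback are hypothetical helpers, and the list of software renderer names is an assumption limited to Gallium's llvmpipe and softpipe):

```python
def parse_gl_info(output: str) -> dict[str, str]:
    """Collect 'GL_* = value' lines such as those printed by `glxgears -info`."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        # Only keep GL_* keys; skips libGL debug lines and the command line itself.
        if sep and key.strip().startswith("GL_"):
            info[key.strip()] = value.strip()
    return info


def is_software_fallback(info: dict[str, str]) -> bool:
    # A Gallium software renderer here means the discrete GPU was not used.
    renderer = info.get("GL_RENDERER", "").lower()
    return any(name in renderer for name in ("llvmpipe", "softpipe"))
```

Fed the three GL_* lines above, is_software_fallback returns True.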
Comment 2 Liss 2015-01-24 21:48:11 UTC
It looks like I have found a pattern in the behavior of my Radeon card.
If I try to run something on the Radeon after a cold start, Xorg hangs. In dmesg I find the following log:
[  176.277431] radeon 0000:0a:00.0: ring 0 stalled for more than 10368msec
[  176.277443] radeon 0000:0a:00.0: GPU lockup (current fence id 0x000000000000008f last fence id 0x0000000000000092 on ring 0)
[  176.854734] radeon 0000:0a:00.0: Saved 81 dwords of commands on ring 0.
[  176.854802] radeon 0000:0a:00.0: GPU softreset: 0x00000049
[  176.854805] radeon 0000:0a:00.0:   GRBM_STATUS               = 0xE7D28028
[  176.854808] radeon 0000:0a:00.0:   GRBM_STATUS_SE0           = 0xEDC00000
[  176.854810] radeon 0000:0a:00.0:   GRBM_STATUS_SE1           = 0x00000006
[  176.854813] radeon 0000:0a:00.0:   SRBM_STATUS               = 0x200000C0
[  176.854870] radeon 0000:0a:00.0:   SRBM_STATUS2              = 0x00000000
[  176.854872] radeon 0000:0a:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  176.854875] radeon 0000:0a:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010800
[  176.854878] radeon 0000:0a:00.0:   R_00867C_CP_BUSY_STAT     = 0x00408006
[  176.854880] radeon 0000:0a:00.0:   R_008680_CP_STAT          = 0x84038647
[  176.854883] radeon 0000:0a:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  176.854886] radeon 0000:0a:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  176.854888] radeon 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  176.854891] radeon 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  177.427677] radeon 0000:0a:00.0: GRBM_SOFT_RESET=0x0000DDFF
[  177.427732] radeon 0000:0a:00.0: SRBM_SOFT_RESET=0x00000100
[  177.427887] radeon 0000:0a:00.0:   GRBM_STATUS               = 0x8000B028
[  177.427889] radeon 0000:0a:00.0:   GRBM_STATUS_SE0           = 0x00000006
[  177.427892] radeon 0000:0a:00.0:   GRBM_STATUS_SE1           = 0x00000006
[  177.427894] radeon 0000:0a:00.0:   SRBM_STATUS               = 0x200000C0
[  177.427951] radeon 0000:0a:00.0:   SRBM_STATUS2              = 0x00000000
[  177.427954] radeon 0000:0a:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  177.427956] radeon 0000:0a:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  177.427959] radeon 0000:0a:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  177.427961] radeon 0000:0a:00.0:   R_008680_CP_STAT          = 0x00000000
[  177.427964] radeon 0000:0a:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  177.427966] radeon 0000:0a:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  177.428090] radeon 0000:0a:00.0: GPU reset succeeded, trying to resume
[  177.440734] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[  177.440739] [drm] PCIE gen 2 link speeds already enabled
[  177.440912] [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
[  177.441017] radeon 0000:0a:00.0: WB enabled
[  177.441020] radeon 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880036b55c00
[  177.441022] radeon 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff880036b55c04
[  177.441024] radeon 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff880036b55c08
[  177.441026] radeon 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880036b55c0c
[  177.441028] radeon 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffff880036b55c10
[  177.830669] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[  177.830698] [drm:si_resume [radeon]] *ERROR* si startup failed on resume

If I restart the notebook and try to run something again, I get a black window (Xorg keeps working) and the following dmesg log:
[  860.035779] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[  860.035788] [drm] PCIE gen 2 link speeds already enabled
[  860.037672] [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
[  860.037836] radeon 0000:0a:00.0: WB enabled
[  860.037842] radeon 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801584dcc00
[  860.037846] radeon 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff8801584dcc04
[  860.037849] radeon 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff8801584dcc08
[  860.037853] radeon 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8801584dcc0c
[  860.037856] radeon 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffff8801584dcc10
[  860.233816] [drm] ring test on 0 succeeded in 1 usecs
[  860.233822] [drm] ring test on 1 succeeded in 1 usecs
[  860.233827] [drm] ring test on 2 succeeded in 1 usecs
[  860.233836] [drm] ring test on 3 succeeded in 4 usecs
[  860.233844] [drm] ring test on 4 succeeded in 4 usecs
[  860.233876] [drm] ib test on ring 0 succeeded in 0 usecs
[  860.233901] [drm] ib test on ring 1 succeeded in 0 usecs
[  860.233924] [drm] ib test on ring 2 succeeded in 0 usecs
[  860.233937] [drm] ib test on ring 3 succeeded in 0 usecs
[  860.233949] [drm] ib test on ring 4 succeeded in 0 usecs
[  870.734094] radeon 0000:0a:00.0: ring 0 stalled for more than 10276msec
[  870.734101] radeon 0000:0a:00.0: GPU lockup (current fence id 0x00000000000000b0 last fence id 0x00000000000000b2 on ring 0)
[  871.234271] radeon 0000:0a:00.0: ring 0 stalled for more than 10776msec
[  871.234277] radeon 0000:0a:00.0: GPU lockup (current fence id 0x00000000000000b0 last fence id 0x00000000000000b2 on ring 0)
Comment 3 Michel Dänzer 2015-01-28 07:10:07 UTC
(In reply to Liss from comment #2)
> If I restart notebook and try to run something again I'll get black window
> (Xorg will continue to work)

Note that DRI_PRIME currently requires a compositing manager for the contents to be visible.


> and next dmesg log:

Looks like the normal messages from powering up and initializing the Radeon GPU.
Comment 4 Liss 2015-01-29 20:47:34 UTC
(In reply to Michel Dänzer from comment #3)
> (In reply to Liss from comment #2)
> > If I restart notebook and try to run something again I'll get black window
> > (Xorg will continue to work)
> 
> Note that DRI_PRIME currently requires a compositing manager for the
> contents to be visible.

I'm using Mutter as WM, so there shouldn't be any problems with compositing.
Comment 5 Liss 2015-02-05 22:32:26 UTC
I found that the last working version for me is 3.16.7. The bug occurs starting from 3.17-rc1.
Comment 6 Alex Deucher 2015-02-05 22:34:47 UTC
(In reply to Liss from comment #5)
> I found that last working version for me is 3.16.7. Bug occurs starting from
> 3.17-rc1

Can you use git to bisect and figure out which commit caused the problem?
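Editor's note: conceptually, git bisect is a binary search over the commits between a known-good and a known-bad revision, driven by `git bisect start <bad> <good>` followed by `git bisect good` or `git bisect bad` after testing each candidate. The sketch below shows only that search on a linear history; real git bisect also handles merge history, and first_bad is a hypothetical name. It assumes that once a commit is bad, every later commit is bad too.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice this step means: build the kernel, boot it,
        # and try DRI_PRIME=1 glxgears.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]
```

Each test halves the remaining range, which is why a whole release cycle of kernel commits can be narrowed to one commit in a few dozen builds.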
Comment 7 Liss 2015-02-09 08:13:13 UTC
Here is bisect output:
636e2582658742b94e7620becce58f939996c961 is the first bad commit
commit 636e2582658742b94e7620becce58f939996c961
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Jun 6 18:43:45 2014 -0400

    drm/radeon/dpm: add support for SVI2 voltage for SI
    
    Some newer boards use SVI2 for voltage control rather
    than GPIO.
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 53449e086e151c24e90607b713c9bd416c658b55 72d3d0491cabfd6921a038565d8eeaca34726d5f M	drivers
Comment 8 Liss 2015-04-08 21:44:16 UTC
Error still reproducible on kernel 4.0-rc7.
Comment 9 Alex Deucher 2015-04-09 13:23:44 UTC
Can you attach the full dmesg output (i.e., I want to see the boot-up output, not the hang messages)? Also, does booting with either radeon.runpm=0 or radeon.dpm=0 on the kernel command line in GRUB help?
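Editor's note: after editing the GRUB entry it is worth confirming that the option actually reached the kernel, since the running command line is exposed in /proc/cmdline. A minimal sketch that extracts radeon.* parameters from such a string; radeon_params is a hypothetical helper, and the plain whitespace split assumes no quoted values containing spaces.

```python
def radeon_params(cmdline: str) -> dict[str, str]:
    """Return radeon.* module parameters found on a kernel command line."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name.startswith("radeon."):
            # A bare flag without '=' is recorded with an empty value.
            params[name[len("radeon."):]] = value if sep else ""
    return params
```

For example, radeon_params("BOOT_IMAGE=/boot/vmlinuz quiet splash radeon.runpm=0") returns {"runpm": "0"}; on a live system the argument would be the contents of /proc/cmdline.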
Comment 10 Liss 2015-04-10 20:05:53 UTC
Created attachment 115011
full dmesg output
Comment 11 Liss 2015-04-10 20:06:40 UTC
If I boot the system with the radeon.runpm=0 option, I get the behavior described in comment #1. If I boot with radeon.dpm=0, I can run glxgears on the Radeon card, but the system hangs after some time and does not respond even to SysRq keys.
Comment 12 Liss 2015-06-19 09:11:42 UTC
Bug still exists in 4.1-rc8.
Comment 13 Liss 2015-08-14 21:23:33 UTC
kernel 4.2-rc6, no changes
Comment 14 Martin Peres 2019-11-19 09:00:41 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/572.