Bug 99881

Summary: Lockup/Freezes on Laptop with switchable graphics
Product: DRI
Reporter: Matthew Fox <matthew>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                          Flags
dmesg log                            none
lspci.log                            none
dmesg.log 2                          none
Xorg.1.log                           none
Xorg.0.log                           none
xrandr.log                           none
Xorg log before xrandr               none
Xorg log after xrandr                none
vgaswitcheroo switch before xrandr   none
vgaswitcheroo switch after xrandr    none
dmesg before xrandr                  none
dmesg after xrandr                   none
xrandr.log 2                         none
Xorg.0.log                           none
Xorg.1.log                           none
dmesg                                none

Description Matthew Fox 2017-02-21 00:06:09 UTC
Created attachment 129781 [details]
dmesg log

Hi,

I have an HP Pavilion dv6-3111sa laptop (circa 2010) with 2 GPUs:

01:05.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RS880M [Mobility Radeon HD 4225/4250] [1002:9712]
02:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430/5450/5470] [1002:68e0] (rev ff)

I am running Ubuntu 16.04.2 with kernel Ubuntu 4.8.0-36.36~16.04.1-generic 4.8.11

The screen usually freezes for a fraction of a second and then again a few seconds later. It may do this several times. In addition, the computer usually locks up before/after graphical login requiring a hard shutdown, although it doesn't always lock up. It seems to be preventing the computer from shutting down normally as well.

This appears in dmesg output whenever a freeze occurs:

[  186.427140] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  186.431201] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[  186.431293] radeon 0000:02:00.0: WB enabled
[  186.431301] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff958c0f4f3c00
[  186.431306] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff958c0f4f3c0c
[  186.431703] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffad3d81a1c418
[  186.447926] [drm] ring test on 0 succeeded in 1 usecs
[  186.447934] [drm] ring test on 3 succeeded in 2 usecs
[  186.634582] [drm] ring test on 5 succeeded in 1 usecs
[  186.634592] [drm] UVD initialized successfully.
[  186.634648] [drm] ib test on ring 0 succeeded in 0 usecs
[  186.634686] [drm] ib test on ring 3 succeeded in 0 usecs
[  186.805724] [drm] ib test on ring 5 succeeded
[  186.838322] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[  186.942052] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[  196.033454] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[  196.646111] snd_hda_intel 0000:02:00.1: Cannot lock devices!

Adding radeon.runpm=0 to my boot cmdline solves the issues as a workaround.
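
To confirm that the option actually reached the kernel, the command line can be inspected for it. The following is a minimal sketch, not part of the original report: the helper name `kernel_param` and the sample command line are made up for illustration, and on a running system the string would normally come from /proc/cmdline.

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # Only count tokens of the form name=value; later ones override earlier ones.
        if key == name and sep:
            value = val
    return value


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet splash radeon.runpm=0"

print("radeon.runpm =", kernel_param(SAMPLE_CMDLINE, "radeon.runpm"))
```

If the function returns None, the parameter was not passed and the driver default applies.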

With previous ubuntu/kernel versions, the main issue was the freezing which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops and the freezes do not occur at regular intervals.

I'm not sure from the dmesg block whether this is a graphics or a sound issue. There are also some ACPI errors in the dmesg log, so maybe it's a firmware problem or faulty hardware? I tried some lower-level debugging previously but couldn't conclude anything.

Thanks for any assistance.
Comment 1 Matthew Fox 2017-02-21 00:10:32 UTC
Created attachment 129782 [details]
lspci.log
Comment 2 Michel Dänzer 2017-02-21 01:21:47 UTC
It sounds like you have the environment variable DRI_PRIME=1 set for all applications?

Those dmesg messages are normal when the dedicated GPU is powered up, which takes some time. With runpm enabled, it's powered off automatically when nothing uses it for a while.
Comment 3 Matthew Fox 2017-02-21 01:52:14 UTC
Hi Michel,

Just a slight correction to my description - I am running Ubuntu Gnome 16.04.2.

This is a fresh install and I have not set that env var anywhere. Where could I check for that?

Does that mean with radeon.runpm=0 the laptop would be using more power & generating more heat?

Thanks

Matthew
Comment 4 Michel Dänzer 2017-02-21 02:07:45 UTC
(In reply to Matthew Fox from comment #3)
> This is a fresh install and I have not set that env var anywhere. Where
> could I check for that?

What does

 env | grep DRI_

say?


> Does that mean with radeon.runpm=0 the laptop would be using more power &
> generating more heat?

Yes (assuming the dedicated GPU is off most of the time with runpm on).
Comment 5 Matthew Fox 2017-02-21 02:31:46 UTC
> What does
> 
>  env | grep DRI_
> 
> say?

That printed nothing.

My session with runtime pm enabled (no radeon.runpm=0 in cmdline) had been running for a couple of hours without problem (apart from a bit of freezing at the start). However, just after running that command, some new radeon errors appeared in dmesg that I haven't seen before. I think they were ring test failures. The PC has locked up now anyway so I can only hard shut it down. I was switching ttys with CTRL+ALT at the same time which might have caused it.
Comment 6 Michel Dänzer 2017-02-21 02:55:09 UTC
Note that you should run

 env | grep DRI_

in an X terminal, not in a console TTY.
Comment 7 Matthew Fox 2017-02-21 03:08:15 UTC
(In reply to Michel Dänzer from comment #6)
> Note that you should run
> 
>  env | grep DRI_
> 
> in an X terminal, not in a console TTY.

Same result in both :/
Comment 8 Michel Dänzer 2017-02-21 03:11:20 UTC
Please attach the corresponding Xorg log file.
Comment 9 Matthew Fox 2017-02-21 04:09:19 UTC
(In reply to Michel Dänzer from comment #8)
> Please attach the corresponding Xorg log file.

Hi,

The only Xorg logs I have are for my new session. They weren't in /var/log/ but

/home/matthew/.local/share/xorg/Xorg.1.log
/var/lib/gdm3/.local/share/xorg/Xorg.0.log

for some reason. They are attached.

Also attached is a dmesg log for my current session.

When I said:

With previous ubuntu/kernel versions, the main issue was the freezing which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops and the freezes do not occur at regular intervals.

- this seems to be true of my current kernel. From the current dmesg.log, the 'Disabling via vga_switcheroo' happened at 14, 33, 41 and finally 48 (seven seconds apart, except 14-33):

[   14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   15.586313] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[   15.588655] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[   15.588728] radeon 0000:02:00.0: WB enabled
[   15.588731] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[   15.588733] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[   15.589099] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[   15.605265] [drm] ring test on 0 succeeded in 1 usecs
[   15.605270] [drm] ring test on 3 succeeded in 2 usecs
[   15.791907] [drm] ring test on 5 succeeded in 1 usecs
[   15.791914] [drm] UVD initialized successfully.
[   15.791956] [drm] ib test on ring 0 succeeded in 0 usecs
[   15.791986] [drm] ib test on ring 3 succeeded in 0 usecs
[   16.482332] [drm] ib test on ring 5 succeeded
[   16.515177] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   16.619344] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[   33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   33.389563] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[   34.733597] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[   34.735932] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[   34.736006] radeon 0000:02:00.0: WB enabled
[   34.736009] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[   34.736011] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[   34.736378] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[   34.753251] [drm] ring test on 0 succeeded in 1 usecs
[   34.753256] [drm] ring test on 3 succeeded in 2 usecs
[   34.939919] [drm] ring test on 5 succeeded in 1 usecs
[   34.939926] [drm] UVD initialized successfully.
[   34.939969] [drm] ib test on ring 0 succeeded in 0 usecs
[   34.940006] [drm] ib test on ring 3 succeeded in 0 usecs
[   35.617560] [drm] ib test on ring 5 succeeded
[   35.650390] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   35.753848] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[   41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   41.325632] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[   42.665278] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[   42.667593] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[   42.667666] radeon 0000:02:00.0: WB enabled
[   42.667670] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[   42.667671] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[   42.668038] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[   42.684185] [drm] ring test on 0 succeeded in 1 usecs
[   42.684189] [drm] ring test on 3 succeeded in 2 usecs
[   42.870780] [drm] ring test on 5 succeeded in 1 usecs
[   42.870784] [drm] UVD initialized successfully.
[   42.870821] [drm] ib test on ring 0 succeeded in 0 usecs
[   42.870850] [drm] ib test on ring 3 succeeded in 0 usecs
[   43.553259] [drm] ib test on ring 5 succeeded
[   43.582109] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   43.685717] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[   48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   49.261324] snd_hda_intel 0000:02:00.1: Cannot lock devices!
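
The spacing between the power-down events can be read off the log mechanically rather than by eye. This is a rough sketch, assuming the usual `[ seconds.micros] message` dmesg prefix; the function names are hypothetical and the sample contains only the four "Disabling" lines quoted above plus one "Enabling" line to show that other messages are ignored.

```python
import re

# Matches a dmesg line with a monotonic timestamp prefix, e.g. "[   14.146303] text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def event_times(dmesg_text, needle):
    """Return the timestamps (seconds) of all lines whose message contains `needle`."""
    times = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(2):
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Return the gaps between consecutive timestamps, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


SAMPLE = """\
[   14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   16.515177] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
"""

print(intervals(event_times(SAMPLE, "Disabling via vga_switcheroo")))
```

For the four timestamps quoted above this gives gaps of about 18.9, 7.9 and 7.9 seconds, so the last three power-downs are closer to eight seconds apart than seven.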
Comment 10 Matthew Fox 2017-02-21 04:11:44 UTC
Created attachment 129783 [details]
dmesg.log 2
Comment 11 Matthew Fox 2017-02-21 04:15:03 UTC
Created attachment 129784 [details]
Xorg.1.log
Comment 12 Matthew Fox 2017-02-21 04:16:44 UTC
Created attachment 129785 [details]
Xorg.0.log
Comment 13 Michel Dänzer 2017-02-21 06:20:18 UTC
Please attach the output of xrandr.

With runpm enabled, if you run xrandr, does the dedicated GPU turn on and the corresponding messages appear in dmesg?
Comment 14 Matthew Fox 2017-02-21 20:28:27 UTC
Hi,

It's rare that the PC doesn't lock up with runpm enabled so I've only been able to test this a couple of times.

In the first try, the PC had stabilized (stopped freezing) after a while. I then ran xrandr. Immediately afterwards I ran cat /sys/kernel/debug/vgaswitcheroo/switch, which showed that the discrete gpu had powered up. dmesg showed 1 block of gpu initialization lines. A few seconds later vgaswitcheroo/switch showed the discrete gpu as being off. dmesg also showed 2 or 3 blocks of the gpu initialization. It looked like the gpu was being enabled and disabled repeatedly. The computer then locked up a few seconds later. I don't have any logs for this session.

In the second try, the PC had stabilized. I ran xrandr and vgaswitcheroo/switch had changed from 'DynOff' to 'DynPwr' for the discrete gpu. dmesg showed 1 block of the gpu initialization. The computer locked up a few seconds later. The logs I have were captured straight after xrandr had run so the 'dmesg after' log only shows one of the gpu initialization blocks but I suspect the gpu was being enabled and disabled repeatedly before the PC locked up. I wasn't able to run dmesg again before the lockup to confirm.
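
For anyone repeating this test, the state in vgaswitcheroo/switch can be polled from a script so that a DynOff/DynPwr flip is recorded before a lockup. The sketch below is an illustration only: it assumes each line has the form `index:type:active:power:pci-address`, and the sample lines are made up to match the devices in this report, not copied from the attachments.

```python
from typing import List, NamedTuple


class SwitchEntry(NamedTuple):
    index: int    # client number in the switch file
    kind: str     # e.g. "IGD", "DIS", "DIS-Audio"
    active: bool  # True when the entry is marked "+"
    power: str    # e.g. "Pwr", "Off", "DynPwr", "DynOff"
    pci: str      # PCI address, e.g. "0000:02:00.0"


def parse_switch(text: str) -> List[SwitchEntry]:
    """Parse the contents of a vgaswitcheroo switch file into entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The PCI address itself contains colons, so split at most four times.
        index, kind, active, power, pci = line.split(":", 4)
        entries.append(SwitchEntry(int(index), kind, active.strip() == "+", power, pci))
    return entries


# Hypothetical contents, for illustration only.
SAMPLE = "0:IGD:+:Pwr:0000:01:05.0\n1:DIS: :DynOff:0000:02:00.0\n"

for entry in parse_switch(SAMPLE):
    print(entry.pci, entry.kind, entry.power)
```

On a live system the text would be read from /sys/kernel/debug/vgaswitcheroo/switch, which normally requires root and a mounted debugfs.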
Comment 15 Matthew Fox 2017-02-21 20:29:44 UTC
Created attachment 129808 [details]
xrandr.log
Comment 16 Matthew Fox 2017-02-21 20:30:51 UTC
Created attachment 129809 [details]
Xorg log before xrandr
Comment 17 Matthew Fox 2017-02-21 20:32:34 UTC
Created attachment 129810 [details]
Xorg log after xrandr
Comment 18 Matthew Fox 2017-02-21 20:33:50 UTC
Created attachment 129811 [details]
vgaswitcheroo switch before xrandr
Comment 19 Matthew Fox 2017-02-21 20:34:48 UTC
Created attachment 129812 [details]
vgaswitcheroo switch after xrandr
Comment 20 Matthew Fox 2017-02-21 20:36:08 UTC
Created attachment 129813 [details]
dmesg before xrandr
Comment 21 Matthew Fox 2017-02-21 20:36:47 UTC
Created attachment 129814 [details]
dmesg after xrandr
Comment 22 Michel Dänzer 2017-02-22 09:25:15 UTC
I suspect what happens is that some client occasionally asks the X server to probe the connected displays, similar to xrandr. This powers up the dGPU, in order to probe its display connectors. That takes some time, during which the X server freezes.

Assuming you don't need the dGPU display outputs, adding the below to /etc/X11/xorg.conf may serve as a workaround. You can still use the dGPU for applications by setting the environment variable DRI_PRIME=1.

Section "ServerFlags"
        Option  "AutoAddGPU" "off"
EndSection

Section "Device"
        Identifier "Device0"
        Option  "AccelMethod" "glamor"
        Option  "DRI" "3"
EndSection
Comment 23 Matthew Fox 2017-02-22 16:29:30 UTC
That workaround doesn't seem to have any effect so I'll run with radeon.runpm=0

Thanks for your help anyway.
Comment 24 Michel Dänzer 2017-02-23 00:58:24 UTC
(In reply to Matthew Fox from comment #23)
> That workaround doesn't seem to have any effect [...]

At the very least, it should have visible effects in the Xorg log file and xrandr output. Please attach those with the attempted workaround.
Comment 25 Matthew Fox 2017-02-23 03:29:30 UTC
Hi,

/etc/X11/xorg.conf didn't exist so I created it with the contents you specified.

So I'm now running with runpm enabled and the xorg.conf in place.
Comment 26 Matthew Fox 2017-02-23 03:30:45 UTC
Created attachment 129854 [details]
xrandr.log 2
Comment 27 Matthew Fox 2017-02-23 03:34:53 UTC
Created attachment 129855 [details]
Xorg.0.log
Comment 28 Matthew Fox 2017-02-23 03:36:03 UTC
Created attachment 129856 [details]
Xorg.1.log
Comment 29 Matthew Fox 2017-02-23 03:38:34 UTC
Created attachment 129857 [details]
dmesg
Comment 30 Matthew Fox 2017-02-23 04:03:39 UTC
Just to confirm, the freezes and hard lockups still occur, along with the corresponding messages in dmesg, which I have also attached.

This may be more sound related but I previously found in the kernel source (file http://lxr.free-electrons.com/source/sound/pci/hda/hda_intel.c?v=4.8):

static int register_vga_switcheroo(struct azx *chip)
{
        struct hda_intel *hda = container_of(chip, struct hda_intel, chip);
        int err;

        if (!hda->use_vga_switcheroo)
                return 0;
        /* FIXME: currently only handling DIS controller
         * is there any machine with two switchable HDMI audio controllers?
         */
        err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
                                                   VGA_SWITCHEROO_DIS);
        if (err < 0)
                return err;
        hda->vga_switcheroo_registered = 1;

        /* register as an optimus hdmi audio power domain */
        vga_switcheroo_init_domain_pm_optimus_hdmi_audio(chip->card->dev,
                                                         &hda->hdmi_pm_domain);
        return 0;
}

In dmesg, these lines always appear along with the gpu init lines:

snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: Cannot lock devices!

'CORB reset timeout#2, CORBRP = 65535' appears red in dmesg and
'Cannot lock devices!' appears white in dmesg.

0000:02:00.1 is the audio device attached to the discrete GPU (the discrete GPU itself is 02:00.0).

From lspci, there's another audio device:
00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) [1002:4383] (rev 40)

Now in the function above, it says '...is there any machine with two switchable HDMI audio controllers?' - I wonder if that's the case here? Which might be causing problems and the associated sound messages in dmesg?
Comment 31 Michel Dänzer 2017-02-23 08:24:09 UTC
(In reply to Matthew Fox from comment #30)
> Just to confirm, the freezes and hard lockups still occur and the
> corresponding messages in dmesg which I also attached.

Weird; the xrandr output and Xorg log file show that the workaround is working as intended, Xorg is no longer using the dGPU; not sure why it's still getting powered on.


I'm not sure about the sound messages, but I'd guess they're a symptom of the dGPU powering on, not its cause. You could check whether radeon.audio=0 on the kernel command line makes any difference though, just in case.
Comment 33 Matthew Fox 2017-02-23 20:51:50 UTC
Hi,

With runpm enabled & radeon.audio=0, the computer locks up requiring a hard shutdown.

With runpm enabled & radeon.audio=0 & xorg.conf workaround, ditto, except that sometimes the computer will instead lock up for 10 seconds or so. During that time caps lock will toggle on/off, but pressed keys are not printed on screen. The mouse cursor moves on screen, but clicks do not register. After the freeze, the key presses and mouse clicks that were held back are delivered.

Alex - yes it does.
Comment 34 Martin Peres 2019-11-19 09:25:19 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/774.
