99881 – Lockup/Freezes on Laptop with switchable graphics

Summary: Lockup/Freezes on Laptop with switchable graphics

Status:	RESOLVED MOVED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Radeon
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Default DRI bug account

Reported:	2017-02-21 00:06 UTC by Matthew Fox
Modified:	2019-11-19 09:25 UTC
CC List:	0 users

Attachments
dmesg log (81.31 KB, text/plain) 2017-02-21 00:06 UTC, Matthew Fox
lspci.log (31.10 KB, text/plain) 2017-02-21 00:10 UTC, Matthew Fox
dmesg.log 2 (73.87 KB, text/plain) 2017-02-21 04:11 UTC, Matthew Fox
Xorg.1.log (45.49 KB, text/plain) 2017-02-21 04:15 UTC, Matthew Fox
Xorg.0.log (44.50 KB, text/plain) 2017-02-21 04:16 UTC, Matthew Fox
xrandr.log (489 bytes, text/plain) 2017-02-21 20:29 UTC, Matthew Fox
Xorg log before xrandr (49.58 KB, text/plain) 2017-02-21 20:30 UTC, Matthew Fox
Xorg log after xrandr (50.07 KB, text/plain) 2017-02-21 20:32 UTC, Matthew Fox
vgaswitcheroo switch before xrandr (84 bytes, text/plain) 2017-02-21 20:33 UTC, Matthew Fox
vgaswitcheroo switch after xrandr (84 bytes, text/plain) 2017-02-21 20:34 UTC, Matthew Fox
dmesg before xrandr (77.20 KB, text/plain) 2017-02-21 20:36 UTC, Matthew Fox
dmesg after xrandr (78.30 KB, text/plain) 2017-02-21 20:36 UTC, Matthew Fox
xrandr.log 2 (424 bytes, text/plain) 2017-02-23 03:30 UTC, Matthew Fox
Xorg.0.log (41.16 KB, text/plain) 2017-02-23 03:34 UTC, Matthew Fox
Xorg.1.log (42.72 KB, text/plain) 2017-02-23 03:36 UTC, Matthew Fox
dmesg (74.41 KB, text/plain) 2017-02-23 03:38 UTC, Matthew Fox

Description Matthew Fox 2017-02-21 00:06:09 UTC

Created attachment 129781 [details]
dmesg log

Hi,

I have an HP Pavilion dv6-3111sa laptop (circa 2010) with 2 GPUs:

01:05.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RS880M [Mobility Radeon HD 4225/4250] [1002:9712]
02:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430/5450/5470] [1002:68e0] (rev ff)

I am running Ubuntu 16.04.2 with kernel Ubuntu 4.8.0-36.36~16.04.1-generic 4.8.11

The screen usually freezes for a fraction of a second and then again a few seconds later. It may do this several times. In addition, the computer usually locks up before/after graphical login requiring a hard shutdown, although it doesn't always lock up. It seems to be preventing the computer from shutting down normally as well.

This appears in dmesg output whenever a freeze occurs:

[  186.427140] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  186.431201] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[  186.431293] radeon 0000:02:00.0: WB enabled
[  186.431301] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff958c0f4f3c00
[  186.431306] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff958c0f4f3c0c
[  186.431703] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffad3d81a1c418
[  186.447926] [drm] ring test on 0 succeeded in 1 usecs
[  186.447934] [drm] ring test on 3 succeeded in 2 usecs
[  186.634582] [drm] ring test on 5 succeeded in 1 usecs
[  186.634592] [drm] UVD initialized successfully.
[  186.634648] [drm] ib test on ring 0 succeeded in 0 usecs
[  186.634686] [drm] ib test on ring 3 succeeded in 0 usecs
[  186.805724] [drm] ib test on ring 5 succeeded
[  186.838322] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[  186.942052] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[  196.033454] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[  196.646111] snd_hda_intel 0000:02:00.1: Cannot lock devices!

Adding radeon.runpm=0 to my boot cmdline solves the issues as a workaround.
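To confirm that a parameter such as radeon.runpm=0 actually reached the running kernel, the boot command line can be inspected. The following is only a minimal Python sketch: it assumes the usual space-separated key=value layout of /proc/cmdline, does not handle quoted values, and the helper name radeon_params is made up for illustration.

```python
def radeon_params(cmdline: str) -> dict:
    """Return the radeon.* module parameters found on a kernel command line."""
    params = {}
    for token in cmdline.split():
        # Only tokens of the form radeon.<name>=<value> are of interest here.
        if token.startswith("radeon.") and "=" in token:
            key, value = token.split("=", 1)
            params[key[len("radeon."):]] = value
    return params
```

On the affected machine this could be called as radeon_params(open("/proc/cmdline").read()); with the workaround in place the result should contain a "runpm" entry with the value "0".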

With previous ubuntu/kernel versions, the main issue was the freezing which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops and the freezes do not occur at regular intervals.

I'm not sure from the dmesg block if this is a graphics or sound issue. There are also some ACPI errors in the dmesg log, so maybe it's a firmware problem or faulty hardware? I tried some lower level debugging previously but couldn't conclude anything.

Thanks for any assistance.

Comment 1 Matthew Fox 2017-02-21 00:10:32 UTC

Created attachment 129782 [details]
lspci.log

Comment 2 Michel Dänzer 2017-02-21 01:21:47 UTC

It sounds like you have the environment variable DRI_PRIME=1 set for all applications?

Those dmesg messages are normal when the dedicated GPU is powered up, which takes some time. With runpm enabled, it's powered off automatically when nothing uses it for a while.

Comment 3 Matthew Fox 2017-02-21 01:52:14 UTC

Hi Michel,

Just a slight correction to my description - I am running Ubuntu GNOME 16.04.2.

This is a fresh install and I have not set that env var anywhere. Where could I check for that?

Does that mean with radeon.runpm=0 the laptop would be using more power & generating more heat?

Thanks

Matthew

Comment 4 Michel Dänzer 2017-02-21 02:07:45 UTC

(In reply to Matthew Fox from comment #3)
> This is a fresh install and I have not set that env var anywhere. Where
> could I check for that?

What does

 env | grep DRI_

say?


> Does that mean with radeon.runpm=0 the laptop would be using more power &
> generating more heat?

Yes (assuming the dedicated GPU is off most of the time with runpm on).

Comment 5 Matthew Fox 2017-02-21 02:31:46 UTC

> What does
> 
>  env | grep DRI_
> 
> say?

That printed nothing.

My session with runtime pm enabled (no radeon.runpm=0 in cmdline) had been running for a couple of hours without problem (apart from a bit of freezing at the start). However, just after running that command, some new radeon errors appeared in dmesg that I haven't seen before. I think they were ring test failures. The PC has locked up now anyway so I can only hard shut it down. I was switching ttys with CTRL+ALT at the same time which might have caused it.

Comment 6 Michel Dänzer 2017-02-21 02:55:09 UTC

Note that you should run

 env | grep DRI_

in an X terminal, not in a console TTY.

Comment 7 Matthew Fox 2017-02-21 03:08:15 UTC

(In reply to Michel Dänzer from comment #6)
> Note that you should run
> 
>  env | grep DRI_
> 
> in an X terminal, not in a console TTY.

Same result in both :/

Comment 8 Michel Dänzer 2017-02-21 03:11:20 UTC

Please attach the corresponding Xorg log file.

Comment 9 Matthew Fox 2017-02-21 04:09:19 UTC

(In reply to Michel Dänzer from comment #8)
> Please attach the corresponding Xorg log file.

Hi,

The only Xorg logs I have are for my new session. They weren't in /var/log/ but

/home/matthew/.local/share/xorg/Xorg.1.log
/var/lib/gdm3/.local/share/xorg/Xorg.0.log

for some reason. They are attached.

Also attached is a dmesg log for my current session.

In the description I said:

> With previous ubuntu/kernel versions, the main issue was the freezing which
> would happen every seven seconds with the corresponding dmesg block. This
> would continue ad infinitum, although on rare occasions it would stop after
> many freezes. However with my current kernel this pattern doesn't seem to
> occur - it freezes a few times before the freezing stops and the freezes do
> not occur at regular intervals.

In fact the regular pattern does seem to occur with my current kernel as well. In the current dmesg.log, 'Disabling via vga_switcheroo' appears at 14, 33, 41 and finally 48 seconds (just under eight seconds apart, except 14-33):

[   14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   15.586313] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[   15.588655] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[   15.588728] radeon 0000:02:00.0: WB enabled
[   15.588731] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[   15.588733] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[   15.589099] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[   15.605265] [drm] ring test on 0 succeeded in 1 usecs
[   15.605270] [drm] ring test on 3 succeeded in 2 usecs
[   15.791907] [drm] ring test on 5 succeeded in 1 usecs
[   15.791914] [drm] UVD initialized successfully.
[   15.791956] [drm] ib test on ring 0 succeeded in 0 usecs
[   15.791986] [drm] ib test on ring 3 succeeded in 0 usecs
[   16.482332] [drm] ib test on ring 5 succeeded
[   16.515177] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   16.619344] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[   33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   33.389563] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[   34.733597] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[   34.735932] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[   34.736006] radeon 0000:02:00.0: WB enabled
[   34.736009] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[   34.736011] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[   34.736378] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[   34.753251] [drm] ring test on 0 succeeded in 1 usecs
[   34.753256] [drm] ring test on 3 succeeded in 2 usecs
[   34.939919] [drm] ring test on 5 succeeded in 1 usecs
[   34.939926] [drm] UVD initialized successfully.
[   34.939969] [drm] ib test on ring 0 succeeded in 0 usecs
[   34.940006] [drm] ib test on ring 3 succeeded in 0 usecs
[   35.617560] [drm] ib test on ring 5 succeeded
[   35.650390] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   35.753848] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[   41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   41.325632] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[   42.665278] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[   42.667593] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[   42.667666] radeon 0000:02:00.0: WB enabled
[   42.667670] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[   42.667671] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[   42.668038] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[   42.684185] [drm] ring test on 0 succeeded in 1 usecs
[   42.684189] [drm] ring test on 3 succeeded in 2 usecs
[   42.870780] [drm] ring test on 5 succeeded in 1 usecs
[   42.870784] [drm] UVD initialized successfully.
[   42.870821] [drm] ib test on ring 0 succeeded in 0 usecs
[   42.870850] [drm] ib test on ring 3 succeeded in 0 usecs
[   43.553259] [drm] ib test on ring 5 succeeded
[   43.582109] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[   43.685717] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[   48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   49.261324] snd_hda_intel 0000:02:00.1: Cannot lock devices!
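The spacing between the power-down messages can be checked mechanically rather than by eye. Below is a small Python sketch, assuming the dmesg lines carry the default "[seconds.microseconds]" timestamp prefix; the function name disable_intervals is only illustrative, and the sample lines are the four 'Disabling via vga_switcheroo' lines from the log above.

```python
import re

# Captures the kernel timestamp of every "Disabling via vga_switcheroo" line.
DISABLE_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*Disabling via vga_switcheroo", re.MULTILINE
)


def disable_intervals(dmesg_text: str) -> list:
    """Return the seconds elapsed between consecutive power-down messages."""
    stamps = [float(m.group(1)) for m in DISABLE_RE.finditer(dmesg_text)]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


sample = """\
[   14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[   48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
"""
print(disable_intervals(sample))
```

For these four lines the first gap is about 18.9 seconds and the following two are each about 7.9 seconds, which matches the regular pattern described above. The same function could be fed the full output of dmesg.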

Comment 10 Matthew Fox 2017-02-21 04:11:44 UTC

Created attachment 129783 [details]
dmesg.log 2

Comment 11 Matthew Fox 2017-02-21 04:15:03 UTC

Created attachment 129784 [details]
Xorg.1.log

Comment 12 Matthew Fox 2017-02-21 04:16:44 UTC

Created attachment 129785 [details]
Xorg.0.log

Comment 13 Michel Dänzer 2017-02-21 06:20:18 UTC

Please attach the output of xrandr.

With runpm enabled, if you run xrandr, does the dedicated GPU turn on and the corresponding messages appear in dmesg?

Comment 14 Matthew Fox 2017-02-21 20:28:27 UTC

Hi,

It's rare that the PC doesn't lock up with runpm enabled so I've only been able to test this a couple of times.

In the first try, the PC had stabilized (stopped freezing) after a while. I then ran xrandr. Immediately afterwards I ran cat /sys/kernel/debug/vgaswitcheroo/switch, which showed that the discrete GPU had powered up. dmesg showed 1 block of GPU initialization lines. A few seconds later vgaswitcheroo/switch showed the discrete GPU as being off. dmesg also showed 2 or 3 blocks of the GPU initialization. It looked like the GPU was being enabled and disabled repeatedly. The computer then locked up a few seconds later. I don't have any logs for this session.

In the second try, the PC had stabilized. I ran xrandr and vgaswitcheroo/switch had changed from 'DynOff' to 'DynPwr' for the discrete gpu. dmesg showed 1 block of the gpu initialization. The computer locked up a few seconds later. The logs I have were captured straight after xrandr had run so the 'dmesg after' log only shows one of the gpu initialization blocks but I suspect the gpu was being enabled and disabled repeatedly before the PC locked up. I wasn't able to run dmesg again before the lockup to confirm.
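The state reported by vgaswitcheroo/switch can also be read programmatically, which makes it easier to poll before a lockup. This is a hedged Python sketch: it assumes each line has the colon-separated layout index:client:active:power:pci-address, which is how the kernel's debugfs file is normally laid out, and the sample values are illustrative rather than copied from the attachments. The names SwitchEntry and parse_switch are hypothetical.

```python
from typing import NamedTuple


class SwitchEntry(NamedTuple):
    index: int
    client: str   # e.g. "IGD", "DIS" or "DIS-Audio"
    active: bool  # "+" marks the currently active client
    power: str    # e.g. "Pwr", "Off", "DynPwr", "DynOff"
    pci: str      # PCI address such as "0000:02:00.0"


def parse_switch(text: str) -> list:
    """Parse the contents of /sys/kernel/debug/vgaswitcheroo/switch."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split at most 4 times.
        index, client, active, power, pci = line.split(":", 4)
        entries.append(
            SwitchEntry(int(index), client, active == "+", power, pci.strip())
        )
    return entries


# Illustrative sample only; real contents depend on the machine.
sample = "0:IGD:+:Pwr:0000:01:05.0\n1:DIS: :DynOff:0000:02:00.0\n"
for entry in parse_switch(sample):
    print(entry.client, entry.power, entry.pci)
```

Reading the real file needs root and a mounted debugfs. A change of the discrete GPU's power field from 'DynOff' to 'DynPwr' would correspond to the transition described above.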

Comment 15 Matthew Fox 2017-02-21 20:29:44 UTC

Created attachment 129808 [details]
xrandr.log

Comment 16 Matthew Fox 2017-02-21 20:30:51 UTC

Created attachment 129809 [details]
Xorg log before xrandr

Comment 17 Matthew Fox 2017-02-21 20:32:34 UTC

Created attachment 129810 [details]
Xorg log after xrandr

Comment 18 Matthew Fox 2017-02-21 20:33:50 UTC

Created attachment 129811 [details]
vgaswitcheroo switch before xrandr

Comment 19 Matthew Fox 2017-02-21 20:34:48 UTC

Created attachment 129812 [details]
vgaswitcheroo switch after xrandr

Comment 20 Matthew Fox 2017-02-21 20:36:08 UTC

Created attachment 129813 [details]
dmesg before xrandr

Comment 21 Matthew Fox 2017-02-21 20:36:47 UTC

Created attachment 129814 [details]
dmesg after xrandr

Comment 22 Michel Dänzer 2017-02-22 09:25:15 UTC

I suspect what happens is that some client occasionally asks the X server to probe the connected displays, similar to xrandr. This powers up the dGPU, in order to probe its display connectors. That takes some time, during which the X server freezes.

Assuming you don't need the dGPU display outputs, adding the below to /etc/X11/xorg.conf may serve as a workaround. You can still use the dGPU for applications by setting the environment variable DRI_PRIME=1.

Section "ServerFlags"
        Option  "AutoAddGPU" "off"
EndSection

Section "Device"
        Identifier "Device0"
        Option  "AccelMethod" "glamor"
        Option  "DRI" "3"
EndSection

Comment 23 Matthew Fox 2017-02-22 16:29:30 UTC

That workaround doesn't seem to have any effect so I'll run with radeon.runpm=0

Thanks for your help anyway.

Comment 24 Michel Dänzer 2017-02-23 00:58:24 UTC

(In reply to Matthew Fox from comment #23)
> That workaround doesn't seem to have any effect [...]

At the very least, it should have visible effects in the Xorg log file and xrandr output. Please attach those with the attempted workaround.

Comment 25 Matthew Fox 2017-02-23 03:29:30 UTC

Hi,

/etc/X11/xorg.conf didn't exist so I created it with the contents you specified.

So I'm now running with runpm enabled and the xorg.conf in place.

Comment 26 Matthew Fox 2017-02-23 03:30:45 UTC

Created attachment 129854 [details]
xrandr.log 2

Comment 27 Matthew Fox 2017-02-23 03:34:53 UTC

Created attachment 129855 [details]
Xorg.0.log

Comment 28 Matthew Fox 2017-02-23 03:36:03 UTC

Created attachment 129856 [details]
Xorg.1.log

Comment 29 Matthew Fox 2017-02-23 03:38:34 UTC

Created attachment 129857 [details]
dmesg

Comment 30 Matthew Fox 2017-02-23 04:03:39 UTC

Just to confirm, the freezes and hard lockups still occur, along with the corresponding messages in dmesg, which I have also attached.

This may be more sound related, but I previously found this in the kernel source (file http://lxr.free-electrons.com/source/sound/pci/hda/hda_intel.c?v=4.8):

static int register_vga_switcheroo(struct azx *chip)
{
        struct hda_intel *hda = container_of(chip, struct hda_intel, chip);
        int err;

        if (!hda->use_vga_switcheroo)
                return 0;
        /* FIXME: currently only handling DIS controller
         * is there any machine with two switchable HDMI audio controllers?
         */
        err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
                                                   VGA_SWITCHEROO_DIS);
        if (err < 0)
                return err;
        hda->vga_switcheroo_registered = 1;

        /* register as an optimus hdmi audio power domain */
        vga_switcheroo_init_domain_pm_optimus_hdmi_audio(chip->card->dev,
                                                         &hda->hdmi_pm_domain);
        return 0;
}

In dmesg, these lines always appear along with the gpu init lines:

snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: Cannot lock devices!

'CORB reset timeout#2, CORBRP = 65535' appears red in dmesg and
'Cannot lock devices!' appears white in dmesg.

0000:02:00.1 is the audio device attached to the discrete GPU (the discrete GPU is 02:00.0)

From lspci, there's another audio device:
00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) [1002:4383] (rev 40)

The comment in the function above asks '...is there any machine with two switchable HDMI audio controllers?' I wonder if that's the case here, and whether it might be causing the problems and the associated sound messages in dmesg?

Comment 31 Michel Dänzer 2017-02-23 08:24:09 UTC

(In reply to Matthew Fox from comment #30)
> Just to confirm, the freezes and hard lockups still occur and the
> corresponding messages in dmesg which I also attached.

Weird; the xrandr output and Xorg log file show that the workaround is working as intended: Xorg is no longer using the dGPU. I'm not sure why it's still getting powered on.


I'm not sure about the sound messages, but I'd guess they're a symptom of the dGPU powering on, not its cause. You could check whether radeon.audio=0 on the kernel command line makes any difference though, just in case.

Comment 32 Alex Deucher 2017-02-23 13:53:36 UTC

Does your kernel have this patch?
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/gpu/drm/radeon/radeon_device.c?id=066f1f0b4719eb4573ef09bfc63c2bbb6f7676ca

Comment 33 Matthew Fox 2017-02-23 20:51:50 UTC

Hi,

With runpm enabled & radeon.audio=0, the computer locks up requiring a hard shutdown.

With runpm enabled & radeon.audio=0 & xorg.conf workaround, ditto, except that sometimes the computer instead freezes for 10 seconds or so. During that time caps lock still toggles on/off, but pressed keys are not printed on screen, and the mouse cursor moves on screen but clicks do not register. After the freeze, the key presses that didn't print are printed, and the same goes for the mouse clicks.

Alex - yes it does.

Comment 34 Martin Peres 2019-11-19 09:25:19 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/774.