| Summary | Lockup/Freezes on Laptop with switchable graphics |
|---|---|
| Product | DRI |
| Component | DRM/Radeon |
| Status | RESOLVED MOVED |
| Severity | normal |
| Priority | medium |
| Version | unspecified |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | Matthew Fox <matthew> |
| Assignee | Default DRI bug account <dri-devel> |
| QA Contact | |
| Whiteboard | |
| i915 platform | |
| i915 features | |
Created attachment 129782 [details]
lspci.log
It sounds like you have the environment variable DRI_PRIME=1 set for all applications? Those dmesg messages are normal when the dedicated GPU is powered up, which takes some time. With runpm enabled, it's powered off automatically when nothing uses it for a while.

Hi Michel,

Just a slight correction to my description - I am running Ubuntu Gnome 16.04.2.

This is a fresh install and I have not set that env var anywhere. Where could I check for that?

Does that mean with radeon.runpm=0 the laptop would be using more power & generating more heat?

Thanks
Matthew

(In reply to Matthew Fox from comment #3)
> This is a fresh install and I have not set that env var anywhere. Where
> could I check for that?

What does

env | grep DRI_

say?

> Does that mean with radeon.runpm=0 the laptop would be using more power &
> generating more heat?

Yes (assuming the dedicated GPU is off most of the time with runpm on).

> What does
>
> env | grep DRI_
>
> say?

That printed nothing.
My session with runtime pm enabled (no radeon.runpm=0 in cmdline) had been running for a couple of hours without problem (apart from a bit of freezing at the start). However, just after running that command, some new radeon errors appeared in dmesg that I haven't seen before. I think they were ring test failures. The PC has locked up now anyway so I can only hard shut it down. I was switching ttys with CTRL+ALT at the same time which might have caused it.
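For anyone repeating the DRI_PRIME check above against an already running program rather than the current shell, the following is a minimal sketch in Python. It assumes a Linux system where /proc/<pid>/environ is readable for the target process (normally only for your own processes, or as root), and it only reflects the environment the process was started with. The function names are made up for this illustration.

```python
from pathlib import Path


def dri_variables(environ_blob: bytes) -> dict:
    """Return the DRI_* variables found in a NUL-separated environment blob."""
    found = {}
    for entry in environ_blob.split(b"\0"):
        name, sep, value = entry.partition(b"=")
        # Skip empty entries and anything that is not NAME=value.
        if sep and name.startswith(b"DRI_"):
            found[name.decode()] = value.decode(errors="replace")
    return found


def dri_variables_of(pid: int) -> dict:
    """Read /proc/<pid>/environ (Linux only) and keep the DRI_* entries."""
    return dri_variables(Path(f"/proc/{pid}/environ").read_bytes())
```

Calling dri_variables_of() with the PID of a desktop process (for example the one reported by pgrep for the session's shell) should return an empty dict when DRI_PRIME is not set for it, which matches the empty output of env | grep DRI_.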
Note that you should run

env | grep DRI_

in an X terminal, not in a console TTY.

(In reply to Michel Dänzer from comment #6)
> Note that you should run
>
> env | grep DRI_
>
> in an X terminal, not in a console TTY.

Same result in both :/

Please attach the corresponding Xorg log file.

(In reply to Michel Dänzer from comment #8)
> Please attach the corresponding Xorg log file.

Hi,

The only Xorg logs I have are for my new session. They weren't in /var/log/ but

/home/matthew/.local/share/xorg/Xorg.1.log
/var/lib/gdm3/.local/share/xorg/Xorg.0.log

for some reason. They are attached. Also attached is a dmesg log for my current session.

When I said:

> With previous ubuntu/kernel versions, the main issue was the freezing which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops and the freezes do not occur at regular intervals.

- this seems to be true of my current kernel. From the current dmesg.log, the 'Disabling via vga_switcheroo' happened at 14, 33, 41 and finally 48 (seven seconds apart, except 14-33):

[ 14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 15.586313] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 15.588655] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 15.588728] radeon 0000:02:00.0: WB enabled
[ 15.588731] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[ 15.588733] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[ 15.589099] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[ 15.605265] [drm] ring test on 0 succeeded in 1 usecs
[ 15.605270] [drm] ring test on 3 succeeded in 2 usecs
[ 15.791907] [drm] ring test on 5 succeeded in 1 usecs
[ 15.791914] [drm] UVD initialized successfully.
[ 15.791956] [drm] ib test on ring 0 succeeded in 0 usecs
[ 15.791986] [drm] ib test on ring 3 succeeded in 0 usecs
[ 16.482332] [drm] ib test on ring 5 succeeded
[ 16.515177] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 16.619344] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[ 33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 33.389563] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[ 34.733597] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 34.735932] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 34.736006] radeon 0000:02:00.0: WB enabled
[ 34.736009] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[ 34.736011] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[ 34.736378] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[ 34.753251] [drm] ring test on 0 succeeded in 1 usecs
[ 34.753256] [drm] ring test on 3 succeeded in 2 usecs
[ 34.939919] [drm] ring test on 5 succeeded in 1 usecs
[ 34.939926] [drm] UVD initialized successfully.
[ 34.939969] [drm] ib test on ring 0 succeeded in 0 usecs
[ 34.940006] [drm] ib test on ring 3 succeeded in 0 usecs
[ 35.617560] [drm] ib test on ring 5 succeeded
[ 35.650390] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 35.753848] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[ 41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 41.325632] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[ 42.665278] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 42.667593] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 42.667666] radeon 0000:02:00.0: WB enabled
[ 42.667670] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[ 42.667671] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[ 42.668038] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[ 42.684185] [drm] ring test on 0 succeeded in 1 usecs
[ 42.684189] [drm] ring test on 3 succeeded in 2 usecs
[ 42.870780] [drm] ring test on 5 succeeded in 1 usecs
[ 42.870784] [drm] UVD initialized successfully.
[ 42.870821] [drm] ib test on ring 0 succeeded in 0 usecs
[ 42.870850] [drm] ib test on ring 3 succeeded in 0 usecs
[ 43.553259] [drm] ib test on ring 5 succeeded
[ 43.582109] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 43.685717] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[ 48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 49.261324] snd_hda_intel 0000:02:00.1: Cannot lock devices!

Created attachment 129783 [details]
dmesg.log 2
Created attachment 129784 [details]
Xorg.1.log
Created attachment 129785 [details]
Xorg.0.log
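The spacing between the 'Disabling via vga_switcheroo' events quoted in the comment above can be measured rather than read off by eye. The sketch below is not part of the original report; it assumes dmesg output with the default "[ seconds.micros]" timestamp prefix, and the function names are made up for this illustration.

```python
import re

# A dmesg timestamp followed by the snd_hda_intel power-down message.
DISABLE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+snd_hda_intel\s+\S+\s+Disabling via vga_switcheroo"
)


def disable_timestamps(log_text: str) -> list:
    """Timestamps (in seconds) of every 'Disabling via vga_switcheroo' line."""
    return [float(m.group(1)) for m in DISABLE_RE.finditer(log_text)]


def intervals(timestamps: list) -> list:
    """Gaps between consecutive timestamps, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(timestamps, timestamps[1:])]
```

Applied to the excerpt above (for example intervals(disable_timestamps(open("dmesg.log").read())), with the file name being a placeholder), the four events at 14.146303, 33.089549, 41.025246 and 48.960919 give gaps of roughly 18.9, 7.9 and 7.9 seconds, so the last two power-down cycles are almost exactly the same length.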
Please attach the output of xrandr.

With runpm enabled, if you run xrandr, does the dedicated GPU turn on and the corresponding messages appear in dmesg?

Hi,

It's rare that the PC doesn't lock up with runpm enabled so I've only been able to test this a couple of times.

In the first try, the PC had stabilized (stopped freezing) after a while. I then ran xrandr. Immediately afterwards I ran cat /sys/kernel/debug/vgaswitcheroo/switch, which showed that the discrete gpu had powered up. dmesg showed 1 block of gpu initialization lines. A few seconds later vgaswitcheroo/switch showed the discrete gpu as being off. dmesg also showed 2 or 3 blocks of the gpu initialization. It looked like the gpu was being enabled and disabled repeatedly. The computer then locked up a few seconds later. I don't have any logs for this session.

In the second try, the PC had stabilized. I ran xrandr and vgaswitcheroo/switch had changed from 'DynOff' to 'DynPwr' for the discrete gpu. dmesg showed 1 block of the gpu initialization. The computer locked up a few seconds later. The logs I have were captured straight after xrandr had run so the 'dmesg after' log only shows one of the gpu initialization blocks but I suspect the gpu was being enabled and disabled repeatedly before the PC locked up. I wasn't able to run dmesg again before the lockup to confirm.

Created attachment 129808 [details]
xrandr.log
Created attachment 129809 [details]
Xorg log before xrandr
Created attachment 129810 [details]
Xorg log after xrandr
Created attachment 129811 [details]
vgaswitcheroo switch before xrandr
Created attachment 129812 [details]
vgaswitcheroo switch after xrandr
Created attachment 129813 [details]
dmesg before xrandr
Created attachment 129814 [details]
dmesg after xrandr
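The 'DynOff' and 'DynPwr' states mentioned in the comment above come from /sys/kernel/debug/vgaswitcheroo/switch. As a rough sketch for logging that state repeatedly while reproducing the freeze, the Python below parses the file's contents. It assumes the usual one-client-per-line layout "index:client:active:power:pci-address"; that layout is an assumption about the kernel's output, not something taken from the attachments, and the names used here are made up for this illustration.

```python
from typing import NamedTuple, Optional


class SwitchEntry(NamedTuple):
    index: int
    client: str   # e.g. "IGD", "DIS", "DIS-Audio"
    active: bool  # True when the third column holds "+"
    power: str    # e.g. "Pwr", "Off", "DynPwr", "DynOff"
    pci: str      # PCI address, which itself contains colons


def parse_switch(text: str) -> list:
    """Parse the contents of the vgaswitcheroo switch file into entries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split only four times: the PCI address keeps its own colons.
        index, client, active, power, pci = line.split(":", 4)
        entries.append(
            SwitchEntry(int(index), client, active == "+", power, pci.strip())
        )
    return entries


def discrete_gpu_power(text: str) -> Optional[str]:
    """Power state of the discrete GPU entry, or None if there is none."""
    for entry in parse_switch(text):
        if entry.client == "DIS":
            return entry.power
    return None
```

Reading the file needs root and a mounted debugfs. Polling discrete_gpu_power() once a second and printing the result next to a timestamp would show whether the dGPU really cycles between 'DynOff' and 'DynPwr' before a lockup.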
I suspect what happens is that some client occasionally asks the X server to probe the connected displays, similar to xrandr. This powers up the dGPU, in order to probe its display connectors. That takes some time, during which the X server freezes.

Assuming you don't need the dGPU display outputs, adding the below to /etc/X11/xorg.conf may serve as a workaround. You can still use the dGPU for applications by setting the environment variable DRI_PRIME=1.

Section "ServerFlags"
    Option "AutoAddGPU" "off"
EndSection

Section "Device"
    Identifier "Device0"
    Option "AccelMethod" "glamor"
    Option "DRI" "3"
EndSection

That workaround doesn't seem to have any effect so I'll run with radeon.runpm=0

Thanks for your help anyway.

(In reply to Matthew Fox from comment #23)
> That workaround doesn't seem to have any effect [...]

At the very least, it should have visible effects in the Xorg log file and xrandr output. Please attach those with the attempted workaround.

Hi,

/etc/X11/xorg.conf didn't exist so I created it with the contents you specified. So I'm now running with runpm enabled and the xorg.conf in place.

Created attachment 129854 [details]
xrandr.log 2
Created attachment 129855 [details]
Xorg.0.log
Created attachment 129856 [details]
Xorg.1.log
Created attachment 129857 [details]
dmesg
Just to confirm, the freezes and hard lockups still occur and the corresponding messages in dmesg which I also attached.

This may be more sound related but I previously found in the kernel source (file http://lxr.free-electrons.com/source/sound/pci/hda/hda_intel.c?v=4.8):

static int register_vga_switcheroo(struct azx *chip)
{
    struct hda_intel *hda = container_of(chip, struct hda_intel, chip);
    int err;

    if (!hda->use_vga_switcheroo)
        return 0;
    /* FIXME: currently only handling DIS controller
     * is there any machine with two switchable HDMI audio controllers?
     */
    err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
                                               VGA_SWITCHEROO_DIS);
    if (err < 0)
        return err;
    hda->vga_switcheroo_registered = 1;

    /* register as an optimus hdmi audio power domain */
    vga_switcheroo_init_domain_pm_optimus_hdmi_audio(chip->card->dev,
                                                     &hda->hdmi_pm_domain);
    return 0;
}

In dmesg, these lines always appear along with the gpu init lines:

snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: Cannot lock devices!

'CORB reset timeout#2, CORBRP = 65535' appears red in dmesg and 'Cannot lock devices!' appears white in dmesg.

0000:02:00.1 is the Discrete audio attached to the discrete GPU (the discrete GPU is 02:00.0)

From lspci, there's another audio device:

00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) [1002:4383] (rev 40)

Now in the function above, it says '...is there any machine with two switchable HDMI audio controllers?' - I wonder if that's the case here? Which might be causing problems and the associated sound messages in dmesg?
(In reply to Matthew Fox from comment #30)
> Just to confirm, the freezes and hard lockups still occur and the
> corresponding messages in dmesg which I also attached.

Weird; the xrandr output and Xorg log file show that the workaround is working as intended, Xorg is no longer using the dGPU; not sure why it's still getting powered on.

I'm not sure about the sound messages, but I'd guess they're a symptom of the dGPU powering on, not its cause. You could try if radeon.audio=0 on the kernel command line makes any difference though, just in case.

Does your kernel have this patch?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/gpu/drm/radeon/radeon_device.c?id=066f1f0b4719eb4573ef09bfc63c2bbb6f7676ca

Hi,

With runpm enabled & radeon.audio=0, the computer locks up requiring a hard shutdown.

With runpm enabled & radeon.audio=0 & xorg.conf workaround, ditto. Except sometimes instead the computer will lock up for 10 seconds or so during which time the caps lock will toggle on/off, pressed keys will not be printed on screen. Mouse cursor will move on screen but clicks will not happen. After the freeze, the key presses that didn't print, print and same for the mouse clicks.

Alex - yes it does.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/774.
Created attachment 129781 [details]
dmesg log

Hi,

I have a HP Pavilion dv6-3111sa laptop (circa 2010) with 2 GPUs:

01:05.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RS880M [Mobility Radeon HD 4225/4250] [1002:9712]
02:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430/5450/5470] [1002:68e0] (rev ff)

I am running Ubuntu 16.04.2 with kernel Ubuntu 4.8.0-36.36~16.04.1-generic 4.8.11

The screen usually freezes for a fraction of a second and then again a few seconds later. It may do this several times. In addition, the computer usually locks up before/after graphical login requiring a hard shutdown, although it doesn't always lock up. It seems to be preventing the computer from shutting down normally as well.

This appears in dmesg output whenever a freeze occurs:

[ 186.427140] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 186.431201] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 186.431293] radeon 0000:02:00.0: WB enabled
[ 186.431301] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff958c0f4f3c00
[ 186.431306] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff958c0f4f3c0c
[ 186.431703] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffad3d81a1c418
[ 186.447926] [drm] ring test on 0 succeeded in 1 usecs
[ 186.447934] [drm] ring test on 3 succeeded in 2 usecs
[ 186.634582] [drm] ring test on 5 succeeded in 1 usecs
[ 186.634592] [drm] UVD initialized successfully.
[ 186.634648] [drm] ib test on ring 0 succeeded in 0 usecs
[ 186.634686] [drm] ib test on ring 3 succeeded in 0 usecs
[ 186.805724] [drm] ib test on ring 5 succeeded
[ 186.838322] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 186.942052] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[ 196.033454] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 196.646111] snd_hda_intel 0000:02:00.1: Cannot lock devices!

Adding radeon.runpm=0 to my boot cmdline solves the issues as a workaround.

With previous ubuntu/kernel versions, the main issue was the freezing which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops and the freezes do not occur at regular intervals.

I'm not sure if this is a graphics or sound issue from the dmesg block. There's also some ACPI errors in the dmesg log so maybe a firmware problem, or faulty hardware? I tried some lower level debugging previously but couldn't conclude anything.

Thanks for any assistance.
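Since the report depends on whether radeon.runpm=0 (and, in a later comment, radeon.audio=0) was actually on the boot command line for a given session, it can help to record that alongside each log. A minimal sketch, assuming the parameters are passed as plain "radeon.name=value" tokens and that the running kernel's command line is read from /proc/cmdline; the function name is made up for this illustration.

```python
def radeon_options(cmdline: str) -> dict:
    """Collect radeon.* module parameters from a kernel command line string."""
    options = {}
    for token in cmdline.split():
        if token.startswith("radeon."):
            # "radeon.runpm=0" becomes {"runpm": "0"}.
            name, _, value = token[len("radeon."):].partition("=")
            options[name] = value
    return options
```

For example, radeon_options(open("/proc/cmdline").read()) returns {"runpm": "0"} on a boot where only the runpm workaround is active, and an empty dict when runtime power management is left at its default.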