Created attachment 79884
dmesg output when trying to switch back to radeon gpu

I have two GPUs in my system:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6600M/6700M/7600M Series]
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

This is a MacBookPro8,2, so the gmux is controlled by the apple-gmux driver. If I suspend the system to RAM while on the integrated GPU (i.e. the Intel GPU) and then switch back to the Radeon after resume, I get a GPU hang. I've attached the dmesg output I get when I try this. I'm using Linux 3.10-rc3. I don't have X running when doing this (vgaswitcheroo won't allow the switch otherwise).
It looks like vgaswitcheroo doesn't properly enable the dGPU on resume, so the driver tries to resume disabled hardware.
Are there any traces/dumps which I could produce to help debug this?
There's nothing that needs to be debugged per se. Someone just needs to implement support either for making sure the dGPU is powered up when the driver resumes, or for having the driver defer its resume until the dGPU is powered up.
I'm not sure if this has anything to do with this bug, but the PCI config space is all messed up when running on the integrated GPU:

# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :Off:0000:01:00.0
2:DIS-Audio: :Off:0000:01:00.1
# lspci -s 01:00 -xx
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6600M/6700M/7600M Series] (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks/Whistler HDMI Audio [Radeon HD 6000 Series] (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Maybe this is why suspend/resume doesn't work? Should the apple-gmux driver remove the radeon device from the list of PCI devices when it is in this state (like hotplugging)?
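The switch file above lists one client per line as index:type:active:power:PCI address, where "+" marks the GPU currently in use. A small parser makes it easy to check which devices are powered off before running lspci or suspending. This is only a sketch that assumes the fields visible in this dump; `parse_switch` and `powered_off` are hypothetical helpers, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class SwitcherooClient:
    index: int
    client_id: str  # "IGD", "DIS" or "DIS-Audio" in the dump above
    active: bool    # "+" marks the client currently driving the display
    power: str      # "Pwr" or "Off" in the dump above
    pci_addr: str   # e.g. "0000:01:00.0"


def parse_switch(text: str) -> list[SwitcherooClient]:
    """Parse the contents of /sys/kernel/debug/vgaswitcheroo/switch (hypothetical helper)."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split at most 4 times.
        index, client_id, active, power, pci_addr = line.split(":", 4)
        clients.append(SwitcherooClient(int(index), client_id, active == "+", power, pci_addr))
    return clients


def powered_off(clients: list[SwitcherooClient]) -> list[str]:
    """PCI addresses of clients reported as "Off", i.e. the ones that read back as ff above."""
    return [c.pci_addr for c in clients if c.power == "Off"]


# The switch file contents from this comment.
SAMPLE = "\n".join([
    "0:IGD:+:Pwr:0000:00:02.0",
    "1:DIS: :Off:0000:01:00.0",
    "2:DIS-Audio: :Off:0000:01:00.1",
])

print(powered_off(parse_switch(SAMPLE)))
```

On a live system the same functions could be fed `open("/sys/kernel/debug/vgaswitcheroo/switch").read()`, which needs root and a mounted debugfs.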
(In reply to comment #4)
> I'm not sure if this has anything to do with this bug, but the PCI config
> space is all messed up when running on the integrated GPU:

When you disable the GPU, the hardware is physically powered off, so you are accessing non-existent registers. You need to power up the GPU using vgaswitcheroo before loading/resuming the driver or accessing the config space with lspci.
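A PCI config read that no device answers returns all ones, which is why every byte in the dump is ff (and why lspci prints "rev ff"). In particular, the Vendor ID, the little-endian 16-bit word at offset 0, reads 0xffff, a value no real vendor uses. The sketch below parses `lspci -xx` output and applies that check. The helper names are hypothetical, and the regular expression assumes the hex-dump layout shown in this thread.

```python
import re

# One hex-dump row of `lspci -xx`, e.g. "00: ff ff ff ... ff"
ROW = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)\s*$")


def config_bytes(dump: str) -> bytes:
    """Collect the config-space bytes from `lspci -xx` output for one device."""
    data = bytearray()
    for line in dump.splitlines():
        m = ROW.match(line.strip())
        if m:  # the "01:00.0 VGA compatible controller: ..." title line does not match
            data.extend(int(b, 16) for b in m.group(2).split())
    return bytes(data)


def looks_powered_off(cfg: bytes) -> bool:
    """True if the Vendor ID (little-endian word at offset 0) is the all-ones value 0xffff."""
    if len(cfg) < 2:
        raise ValueError("need at least the 2-byte Vendor ID")
    return int.from_bytes(cfg[0:2], "little") == 0xFFFF


dump = "01:00.0 VGA compatible controller: ... (rev ff)\n00: " + " ".join(["ff"] * 16)
print(looks_powered_off(config_bytes(dump)))
```

Checking the Vendor ID, rather than requiring every byte to be ff, mirrors how PCI enumeration treats a non-responding function.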
(In reply to comment #5)
> (In reply to comment #4)
> > I'm not sure if this has anything to do with this bug, but the PCI config
> > space is all messed up when running on the integrated GPU:
>
> When you disable the GPU the hardware is physically powered off so you are
> accessing non-existent registers. You need to power up the GPU using
> vgaswitcheroo before loading/resuming the driver or access the config space
> with lspci.

I guess what I was trying to ask was: does having the device in this state when suspend starts confuse the power management subsystem? Will it try to "restore" the PCI configuration space to 0xff?

The radeon GPU is definitely turned on by the firmware on resume, and the apple-gmux driver turns it off again if that was its state when suspend started.

I tried a quick hack that calls the vga-switcheroo "ON" function from the apple-gmux driver's suspend hook, but that didn't work. However, if I do it manually (i.e. echo ON > /sys/kernel/debug/vgaswitcheroo/switch) before suspending, it does seem to fix it.
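The manual workaround that worked here can be scripted from user space. This is a minimal sketch, not a kernel fix. It assumes root privileges, debugfs mounted at /sys/kernel/debug, and suspend to RAM triggered by writing "mem" to /sys/power/state; that write returns only after the machine has resumed. The final "OFF" write, which powers the inactive GPU down again after resume, is an assumption about the desired end state and was not tested in this thread. The paths are parameters so the function can be exercised against ordinary files.

```python
from pathlib import Path

# Paths from this thread / the standard sysfs suspend interface; both need root.
SWITCH = Path("/sys/kernel/debug/vgaswitcheroo/switch")
POWER_STATE = Path("/sys/power/state")


def suspend_with_dgpu_on(switch: Path = SWITCH,
                         power_state: Path = POWER_STATE) -> list[tuple[str, str]]:
    """Power the dGPU on, suspend to RAM, then power the inactive GPU off again."""
    writes: list[tuple[str, str]] = []

    def write(path: Path, value: str) -> None:
        path.write_text(value + "\n")  # same as `echo value > path`
        writes.append((str(path), value))

    write(switch, "ON")        # the manual workaround reported above
    write(power_state, "mem")  # blocks until the system has resumed
    write(switch, "OFF")       # assumption: return to the pre-suspend "dGPU off" state
    return writes
```

Calling `suspend_with_dgpu_on()` with no arguments would actually suspend the machine, so try it with scratch files first.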
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > I'm not sure if this has anything to do with this bug, but the PCI config
> > > space is all messed up when running on the integrated GPU:
> >
> > When you disable the GPU the hardware is physically powered off so you are
> > accessing non-existent registers. You need to power up the GPU using
> > vgaswitcheroo before loading/resuming the driver or access the config space
> > with lspci.
>
> I guess what I was trying to say was, does having the device in this state
> before you start suspend make the power management subsystem confused? Will
> it try to "restore" the pci configuration space to 0xff?

The reason you are getting 0xff is that you are accessing a disabled device. lspci would need to power up the GPU using vgaswitcheroo before accessing the hardware.

>
> The radeon GPU is definitely turned on by the firmware on resume and the
> apple-gmux driver turns it off if that was the state of the system when
> suspend was started.

I suspect the gmux driver resumes first and disables the hardware before the radeon driver resumes, so the radeon driver resumes on disabled hardware.

>
> I tried a quick hack to call the "ON" function for vga-switcheroo from the
> apple-gmux driver suspend hook. But that didn't work. However, if I do it
> manually (i.e. echo ON > /sys/kernel/debug/vgaswitcheroo/switch) before
> suspend, then it does seem to fix it.

Someone needs to sort out the interactions between the vgaswitcheroo drivers and the GPU drivers so that they do the right thing on suspend and resume, and when userspace utilities like lspci access the hardware.
Additionally, since the audio device is part of the GPU, it needs to work properly with vgaswitcheroo. Dave Airlie was doing some work to support all of this properly, but it's not complete yet:
http://cgit.freedesktop.org/~airlied/linux/log/?h=switchy-wip
http://cgit.freedesktop.org/~airlied/linux/log/?h=nv-pm-ops2-wip
Created attachment 80547
pci_restore_config_dword debugging when resuming with dGPU off at suspend

It is log output like this that makes me worry about the 0xff reads from the PCI config space. I enabled this in dynamic_debug:

drivers/pci/pci.c:964 [pci]pci_restore_config_dword =p "restoring config space at offset %#x (was %#x, writing %#x)\012"

With that enabled, the log shows 0xff values being written to the PCI config space on restore when the integrated GPU was in use at suspend. Surely this cannot help. My guess is that the device should be removed from the kernel when its power is turned off, but that's just a hunch.
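The debug message suggests compare-then-write logic: read the live dword, and if it differs from the copy saved at suspend, write the saved copy back. The simplified user-space model below is not the kernel code; the function names and register values are illustrative. It shows why a snapshot taken while the dGPU was powered off is harmful. After the firmware powers the GPU back on, every dword that reads back with real data differs from the all-ones snapshot and is overwritten with 0xffffffff.

```python
def restore_config_dword(cfg: dict[int, int], offset: int, saved: int, log: list[str]) -> None:
    """Model of one restore step: write the saved dword only if the live one differs."""
    current = cfg[offset]  # read the live value
    if current == saved:
        return  # already matches the snapshot; nothing to write
    log.append("restoring config space at offset %#x (was %#x, writing %#x)"
               % (offset, current, saved))
    cfg[offset] = saved  # write the saved dword back


def restore_config_space(cfg: dict[int, int], saved_space: dict[int, int], log: list[str]) -> None:
    for offset in sorted(saved_space):
        restore_config_dword(cfg, offset, saved_space[offset], log)


# Illustrative values: after resume the firmware has powered the dGPU on, so it answers...
live = {0x00: 0xABCD1002, 0x04: 0x00100007, 0x10: 0xC000000C}
# ...but the snapshot was taken while it was powered off, so every saved dword is all ones.
saved_while_off = {offset: 0xFFFFFFFF for offset in live}

log: list[str] = []
restore_config_space(live, saved_while_off, log)
for line in log:
    print(line)
```

In this model the only safe options are to skip the restore for a device that was off at suspend, or to take the snapshot while it is powered; which of these the kernel should do is the open question in this bug.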
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/331.