Bug 65068 - vgaswitcheroo doesn't deal with powered off dGPU on resume
Summary: vgaswitcheroo doesn't deal with powered off dGPU on resume
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2013-05-28 07:59 UTC by Austin Lund
Modified: 2019-11-19 08:33 UTC
CC List: 1 user

Attachments
dmesg output when trying to switch back to radeon gpu. (4.83 KB, text/plain)
2013-05-28 07:59 UTC, Austin Lund
pci_restore_config_dword debugging when resuming with dGPU off at suspend (7.29 KB, text/plain)
2013-06-09 04:05 UTC, Austin Lund

Description Austin Lund 2013-05-28 07:59:18 UTC
Created attachment 79884
dmesg output when trying to switch back to radeon gpu.

I have two GPUs in my system:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6600M/6700M/7600M Series]

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

This is a MacBookPro8,2, so the display mux (gmux) is controlled by the apple-gmux driver.

If I suspend the system to RAM while running on the integrated GPU (i.e. the Intel GPU) and then switch back to the Radeon after resume, I get a GPU hang.

I've attached the dmesg output that I get when I try this.

I'm using Linux 3.10-rc3. X is not running when I do this (vgaswitcheroo won't allow the switch while it is).
Comment 1 Alex Deucher 2013-05-29 04:39:45 UTC
It looks like vgaswitcheroo doesn't properly enable the dGPU on resume, so the driver tries to resume disabled hardware.
Comment 2 Austin Lund 2013-05-29 10:03:09 UTC
Are there any traces/dumps which I could produce to help debug this?
Comment 3 Alex Deucher 2013-05-29 13:29:03 UTC
There's nothing that needs to be debugged per se.  Someone just needs to implement support for making sure the dGPU is powered up when the driver resumes or having the driver defer resume until the dGPU is powered up.
Comment 4 Austin Lund 2013-05-31 23:57:54 UTC
I'm not sure if this has anything to do with this bug, but the PCI config space is all messed up when running on the integrated GPU:

# cat /sys/kernel/debug/vgaswitcheroo/switch 
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :Off:0000:01:00.0
2:DIS-Audio: :Off:0000:01:00.1
# lspci -s 01:00 -xx
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6600M/6700M/7600M Series] (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks/Whistler HDMI Audio [Radeon HD 6000 Series] (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Maybe this is why suspend/resume doesn't work?  Should the apple gmuxer remove the radeon device from the list of pci devices when in this state?  (Like hotplugging?)
Comment 5 Alex Deucher 2013-06-01 20:27:12 UTC
(In reply to comment #4)
> I'm not sure if this has anything to do with this bug, but the PCI config
> space is all messed up when running on the integrated GPU:

When you disable the GPU the hardware is physically powered off so you are accessing non-existent registers.  You need to power up the GPU using vgaswitcheroo before loading/resuming the driver or access the config space with lspci.
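A minimal shell sketch of this advice, assuming root access, debugfs mounted at /sys/kernel/debug, and the switch-file line format shown in comment 4 (index:client:active:power:PCI address). The helper names switcheroo_state and dump_dgpu_config are hypothetical. The sketch checks the dGPU's power state and writes ON to the switch file before dumping config space, so lspci reads powered hardware instead of getting 0xff back.

```shell
#!/bin/sh
# Hypothetical helpers; assumes root and debugfs mounted at /sys/kernel/debug.
SWITCH_FILE=${SWITCH_FILE:-/sys/kernel/debug/vgaswitcheroo/switch}

# Print the power field (e.g. Pwr or Off) for a PCI address like 0000:01:00.0.
# A line such as "1:DIS: :Off:0000:01:00.0" split on ':' puts the power
# state in field 4 and the PCI address in fields 5-7.
switcheroo_state() {
    awk -F: -v addr="$1" '($5 ":" $6 ":" $7) == addr { print $4 }' "${2:-$SWITCH_FILE}"
}

# Power the discrete GPU up if vgaswitcheroo reports it Off, then dump its
# config space. Without the power-up, every read returns 0xff.
dump_dgpu_config() {
    addr=${1:-0000:01:00.0}
    if [ "$(switcheroo_state "$addr")" = "Off" ]; then
        echo ON > "$SWITCH_FILE"    # powers on the inactive GPU(s)
    fi
    lspci -s "$addr" -xx
}
```

For example, run dump_dgpu_config 0000:01:00.0 as root. Writing ON only powers up the inactive GPU; it does not change which GPU drives the display.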
Comment 6 Austin Lund 2013-06-03 01:39:43 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > I'm not sure if this has anything to do with this bug, but the PCI config
> > space is all messed up when running on the integrated GPU:
> 
> When you disable the GPU the hardware is physically powered off so you are
> accessing non-existent registers.  You need to power up the GPU using
> vgaswitcheroo before loading/resuming the driver or access the config space
> with lspci.

I guess what I was trying to say was, does having the device in this state before you start suspend make the power management subsystem confused?  Will it try to "restore" the pci configuration space to 0xff?  

The radeon GPU is definitely turned on by the firmware on resume and the apple-gmux driver turns it off if that was the state of the system when suspend was started.

I tried a quick hack to call the "ON" function for vga-switcheroo from the apple-gmux driver suspend hook.  But that didn't work.  However, if I do it manually (i.e. echo ON > /sys/kernel/debug/vgaswitcheroo/switch) before suspend, then it does seem to fix it.
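The manual workaround above could be automated with a sleep hook. This is a sketch assuming a systemd-based system, which runs executables in /usr/lib/systemd/system-sleep/ with pre or post as the first argument and the sleep type as the second; the hook file name and the dgpu_sleep_hook function are hypothetical. It only reproduces the manual echo ON before suspend and does not fix the resume ordering inside the kernel.

```shell
#!/bin/sh
# Hypothetical hook, e.g. installed as /usr/lib/systemd/system-sleep/dgpu-on
# (name is an assumption). systemd-sleep calls it as: <hook> pre|post <type>.
SWITCH_FILE=${SWITCH_FILE:-/sys/kernel/debug/vgaswitcheroo/switch}

dgpu_sleep_hook() {
    case "$1" in
        pre)
            # Same as the manual workaround: power the dGPU up before the
            # system suspends so radeon never resumes on powered-off hardware.
            echo ON > "$SWITCH_FILE"
            ;;
        post)
            # Optional, untested assumption: power the inactive dGPU back
            # down after resume to recover the power savings.
            echo OFF > "$SWITCH_FILE"
            ;;
    esac
}

dgpu_sleep_hook "$@"
```

Doing this from userspace before suspend starts mirrors the manual echo that worked, rather than the in-kernel apple-gmux suspend hook that did not.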
Comment 7 Alex Deucher 2013-06-03 13:18:22 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > I'm not sure if this has anything to do with this bug, but the PCI config
> > > space is all messed up when running on the integrated GPU:
> > 
> > When you disable the GPU the hardware is physically powered off so you are
> > accessing non-existent registers.  You need to power up the GPU using
> > vgaswitcheroo before loading/resuming the driver or access the config space
> > with lspci.
> 
> I guess what I was trying to say was, does having the device in this state
> before you start suspend make the power management subsystem confused?  Will
> it try to "restore" the pci configuration space to 0xff?  

The reason you are getting 0xff is because you are accessing a disabled device.  lspci would need to power up the GPU using vgaswitcheroo before accessing the hardware.

> 
> The radeon GPU is definitely turned on by the firmware on resume and the
> apple-gmux driver turns it off if that was the state of the system when
> suspend was started.

I suspect the gmux driver resumes first and disables the hardware before the radeon driver resumes. So the radeon driver resumes on disabled hardware.

> 
> I tried a quick hack to call the "ON" function for vga-switcheroo from the
> apple-gmux driver suspend hook.  But that didn't work.  However, if I do it
> manually (i.e. echo ON > /sys/kernel/debug/vgaswitcheroo/switch) before
> suspend, then it does seem to fix it.

Someone needs to sort out the interactions between the vgaswitcheroo drivers and the GPU drivers so that they do the right thing on suspend and resume, and on hardware access from userspace utilities like lspci.
Comment 8 Alex Deucher 2013-06-03 13:22:09 UTC
Additionally, since the audio device is part of the GPU, it needs to work properly with vgaswitcheroo.  Dave Airlie was doing some work to support all of this properly, but it's not complete yet:
http://cgit.freedesktop.org/~airlied/linux/log/?h=switchy-wip
http://cgit.freedesktop.org/~airlied/linux/log/?h=nv-pm-ops2-wip
Comment 9 Austin Lund 2013-06-09 04:05:56 UTC
Created attachment 80547
pci_restore_config_dword debugging when resuming with dGPU off at suspend

It is log output like this that worries me about the 0xff reads from the PCI config space.

I set this in dynamic_debug:

drivers/pci/pci.c:964 [pci]pci_restore_config_dword =p "restoring config space at offset %#x (was %#x, writing %#x)\012"

With that enabled, the log shows 0xff values being written back to the PCI config space on restore when using the integrated GPU. Surely this cannot help. My guess is that the hardware should be removed from the kernel when its power is turned off, but that's just a hunch.
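A sketch of enabling the same debug message and filtering for all-ones writes, assuming CONFIG_DYNAMIC_DEBUG and debugfs at /sys/kernel/debug. The control-file query uses the standard func keyword; the helper names enable_restore_debug and ones_restores are hypothetical, and matching on "writing 0xffffffff)" assumes the saved dword was read as all ones while the dGPU was powered off.

```shell
#!/bin/sh
# Assumes CONFIG_DYNAMIC_DEBUG=y and debugfs mounted at /sys/kernel/debug.
DDEBUG_CONTROL=/sys/kernel/debug/dynamic_debug/control

# Enable the debug message printed by pci_restore_config_dword().
enable_restore_debug() {
    echo 'func pci_restore_config_dword +p' > "$DDEBUG_CONTROL"
}

# Read kernel log lines on stdin and keep only config-space restores that
# write back an all-ones dword, i.e. a value saved from a powered-off device.
ones_restores() {
    grep 'restoring config space at offset' | grep 'writing 0xffffffff)'
}
```

Typical use: run enable_restore_debug as root, suspend and resume with the dGPU off, then pipe dmesg into ones_restores.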
Comment 10 Martin Peres 2019-11-19 08:33:01 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/331.

