Bug 106957

Summary: GPU runtime suspend broken since 4.17
Product: DRI
Reporter: prg
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: lukas
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments:
Description                      Flags
dmesg from 4.17.2                none
dmesg with debug patch           none
dmesg with second debug patch    none

Description prg 2018-06-18 20:49:51 UTC
Created attachment 140212 [details]
dmesg from 4.17.2

I have a notebook with an Intel IGP and a Radeon HD5650. Since commit 07f4f97d7b4bf325d9f558c5b58230387e4e57e0 the dGPU constantly stays in the DynPwr state, i.e. it never runtime suspends. This still happens in 4.17.2, so the patch mentioned in bug #106597 doesn't help.
Comment 1 Lukas Wunner 2018-06-18 22:06:04 UTC
Hm, what does the following show?

cat /sys/bus/pci/devices/0000:02:00.0/power/control               # GPU
cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status        # GPU
cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_usage         # GPU
cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_active_kids   # GPU
cat /sys/bus/pci/devices/0000:02:00.1/power/control               # HDA
cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_status        # HDA
cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_usage         # HDA
cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_active_kids   # HDA

I can't see anything in dmesg indicating that the HDA controller is bound to a driver. What does the following show?

ls -l /sys/bus/pci/devices/0000:02:00.1/driver
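
The commands above can be collected into one helper. This is only a sketch: the function name show_rpm and the SYSFS_PCI override are made up for illustration (the override exists so the helper can be pointed at a test directory), and attributes that do not exist on the running kernel are reported as such instead of failing.

```shell
# show_rpm DEV...: print the runtime PM attributes of each PCI function.
# SYSFS_PCI is a hypothetical override; it defaults to the real sysfs path.
show_rpm() {
    base="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$@"; do
        for attr in control runtime_status runtime_usage runtime_active_kids; do
            f="$base/$dev/power/$attr"
            if [ -r "$f" ]; then
                printf '%s %s: %s\n' "$dev" "$attr" "$(cat "$f")"
            else
                # runtime_usage and runtime_active_kids are not present on every kernel
                printf '%s %s: (not available)\n' "$dev" "$attr"
            fi
        done
    done
}
```

For the devices in this report it would be called as: show_rpm 0000:02:00.0 0000:02:00.1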
Comment 2 prg 2018-06-19 16:08:08 UTC
# cat /sys/bus/pci/devices/0000:02:00.0/power/control               # GPU
auto

# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_status        # GPU
active

# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_usage         # GPU
cat: '/sys/bus/pci/devices/0000:02:00.0/power/runtime_usage': No such file or directory

# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_active_kids   # GPU
cat: '/sys/bus/pci/devices/0000:02:00.0/power/runtime_active_kids': No such file or directory

# cat /sys/bus/pci/devices/0000:02:00.1/power/control               # HDA
auto

# cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_status        # HDA
active

# cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_usage         # HDA
cat: '/sys/bus/pci/devices/0000:02:00.1/power/runtime_usage': No such file or directory

# cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_active_kids   # HDA
cat: '/sys/bus/pci/devices/0000:02:00.1/power/runtime_active_kids': No such file or directory

# ls -l /sys/bus/pci/devices/0000:02:00.1/driver
lrwxrwxrwx 1 root root 0 Jun 19 18:04 /sys/bus/pci/devices/0000:02:00.1/driver -> ../../../../bus/pci/drivers/snd_hda_intel
Comment 3 Lukas Wunner 2018-06-19 17:51:09 UTC
Okay so the HDA controller is bound to a driver and is runtime active.  Naturally, if it's runtime active it'll keep the GPU awake.  Question is what's keeping it active.

Could you check if there are any user space processes accessing the HDA controller:
sudo lsof /dev/snd/controlC1

You got "No such file or directory" for some of the commands I gave you because the kernel isn't compiled with CONFIG_PM_ADVANCED_DEBUG=y.  Could you enable that option and try again?  Thanks!
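
One way to check whether the running kernel has that option enabled, as a rough sketch: the function name has_pm_advanced_debug is made up, and where the kernel config can be found is an assumption (/proc/config.gz only exists with CONFIG_IKCONFIG_PROC, and /boot/config-$(uname -r) is a distribution convention).

```shell
# has_pm_advanced_debug FILE: succeed if the kernel config in FILE
# (plain text or gzip-compressed) sets CONFIG_PM_ADVANCED_DEBUG=y.
has_pm_advanced_debug() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -q '^CONFIG_PM_ADVANCED_DEBUG=y$'
}
```

Example: has_pm_advanced_debug "/boot/config-$(uname -r)" && echo enabled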
Comment 4 prg 2018-06-19 18:32:49 UTC
# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_usage
1

# cat /sys/bus/pci/devices/0000:02:00.0/power/runtime_active_kids
0

# cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_usage
0

# cat /sys/bus/pci/devices/0000:02:00.1/power/runtime_active_kids
0

# lsof /dev/snd/controlC1

No output. Yes, I did run this as root.
Comment 5 Lukas Wunner 2018-06-19 20:09:41 UTC
Okay the HDA's runtime ref counter is 0 and it has no active children, so it should suspend. Chances are it doesn't because rpm_idle() fails for some reason.
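
The condition described here (runtime PM allowed, device still active, usage count 0, no active children) can be tested mechanically. The following is a sketch only; the function name rpm_unexpectedly_active is made up, and it needs the runtime_usage and runtime_active_kids attributes, which in this report only appeared after enabling CONFIG_PM_ADVANCED_DEBUG.

```shell
# rpm_unexpectedly_active DEVDIR: succeed if the device under DEVDIR is
# runtime active although sysfs shows nothing that holds it awake.
rpm_unexpectedly_active() {
    p="$1/power"
    [ "$(cat "$p/control" 2>/dev/null)" = auto ] &&
    [ "$(cat "$p/runtime_status" 2>/dev/null)" = active ] &&
    [ "$(cat "$p/runtime_usage" 2>/dev/null)" = 0 ] &&
    [ "$(cat "$p/runtime_active_kids" 2>/dev/null)" = 0 ]
}
```

Example: rpm_unexpectedly_active /sys/bus/pci/devices/0000:02:00.1 && echo "should have suspended"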

Could you try this debug patch that I had created for #106597 and post the dmesg output?

https://bugs.freedesktop.org/attachment.cgi?id=139706&action=edit

You can add "log_buf_len=10M ignore_loglevel" to the command line to ensure that dmesg isn't truncated and contains all debug output.
Comment 6 prg 2018-06-19 20:25:01 UTC
Created attachment 140235 [details]
dmesg with debug patch
Comment 7 Lukas Wunner 2018-06-20 04:12:30 UTC
I'm having a déjà vu here :-)

[    7.840122] snd_hda_intel 0000:02:00.1: azx_runtime_idle: !power_save_controller = 0, !azx_has_pm_runtime(chip) = 0, azx_bus(chip)->codec_powered = 0x1, !chip->running = 0

The single codec on this HDA controller is considered powered on, hence the HDA controller refuses to runtime suspend. Same problem as in #106597.
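
To pull just that value out of a long dmesg, something like the following can be used. It is a sketch that assumes the line format printed by the debug patch quoted above (mainline kernels do not print it); the function name codec_powered is made up.

```shell
# codec_powered: read kernel log lines on stdin and print
# "<PCI address> <codec_powered mask>" for each azx_runtime_idle debug line.
codec_powered() {
    sed -n 's/.*snd_hda_intel \([0-9a-f:.]*\): azx_runtime_idle:.*codec_powered = \(0x[0-9a-fA-F]*\).*/\1 \2/p'
}
```

Example: dmesg | codec_powered
A non-zero mask means at least one codec is still considered powered on.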

What does the following show:
grep . /sys/bus/hdaudio/devices/hdaudioC1D0/widgets/*/power_caps

And could you try this debug patch (in lieu of the other one) to narrow down the root cause further:
https://bugs.freedesktop.org/attachment.cgi?id=139735&action=edit

Thanks!
Comment 8 prg 2018-06-20 07:27:05 UTC
Created attachment 140242 [details]
dmesg with second debug patch

# grep . /sys/bus/hdaudio/devices/hdaudioC1D0/widgets/*/power_caps
/sys/bus/hdaudio/devices/hdaudioC1D0/widgets/01/power_caps:0x00000009
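
For reference, the mask can be decoded as follows. This sketch assumes that power_caps exposes the codec's "Supported Power States" parameter and uses the bit positions of the AC_PWRST_* definitions in the kernel's include/sound/hda_verbs.h; the function name decode_power_caps is made up.

```shell
# decode_power_caps MASK: print the names of the bits set in MASK.
decode_power_caps() {
    caps=$1
    out=""
    # bit:name pairs, following the AC_PWRST_* bit positions (assumption, see above)
    for entry in 0:D0 1:D1 2:D2 3:D3 4:D3cold 29:S3D3cold 30:CLKSTOP 31:EPSS; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( ($caps >> $bit) & 1 )) -ne 0 ]; then
            out="$out${out:+ }$name"
        fi
    done
    printf '%s\n' "$out"
}
```

decode_power_caps 0x00000009 prints "D0 D3": only D0 and D3 are set, CLKSTOP and EPSS are not.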
Comment 9 Lukas Wunner 2018-06-21 19:43:30 UTC
Fixed with commit 57cb54e53bdd ("ALSA: hda - Force to link down at runtime suspend on ATI/AMD HDMI"), which is now queued for 4.18-rc2 and marked for stable. It will probably appear in 4.17.3:
https://git.kernel.org/tiwai/sound/c/57cb54e53bdd
Comment 10 Lukas Wunner 2018-06-24 16:50:26 UTC
It looks like no pull request was sent out for the sound subsystem this week, so I'm afraid the fix will not appear in mainline earlier than 4.18-rc3.
Comment 11 Lukas Wunner 2018-06-29 11:15:05 UTC
The fix landed in Linus' tree yesterday:
https://git.kernel.org/linus/57cb54e53bdd
